Konstantin Avrachenkov, Paulo Gonçalves, S. de Nigris, Esteban Bautista, Patrice Abry, Dynamic Networks : Temporal and Structural Capture Approach (DANTE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de l'Informatique du Parallélisme (LIP), Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École normale supérieure - Lyon (ENS Lyon)-Institut Rhône-Alpin des systèmes complexes (IXXI), École normale supérieure - Lyon (ENS Lyon)-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Laboratoire de Physique de l'ENS Lyon (Phys-ENS), École normale supérieure - Lyon (ENS Lyon)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Network Engineering and Operations (NEO ), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), ANR-10-LABX-0070,MILYON,Community of mathematics and fundamental computer science in Lyon(2010), École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Lyon (ENS Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS)-Institut Rhône-Alpin des systèmes complexes (IXXI), École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL), École normale supérieure de Lyon (ENS de Lyon)-Université Lumière - Lyon 2 (UL2)-Université Jean Moulin - Lyon 3 (UJML), and Université de Lyon-Université de Lyon-Centre National de la Recherche Scientifique (CNRS)
International audience; Graph Semi-supervised learning (gSSL) aims to classify data exploiting two initial inputs: firstly, the data are structured in a network whose edges convey information on the proximity, in a wide sense, of two data points (e.g. correlation or spatial proximity) and, second, there is a partial information on some nodes, which have previously been labelled. Thus, the classification problem is usually a balance between two terms: one diffusing the information from the labelled points to the unlabelled ones through the network and another one that constrains the solution to be similar, on the labelled nodes, to the given labels. In practice, popular SSL methods as Standard Laplacian (SL), Normalized Laplacian (NL) or Page Rank (PR), exploit those operators defined on graphs to spread the labels and, from a random walk perspective, the classification of a given point is given the maximum of the expected number of visits from one class. Anomalous diffusion can alter the way a graph is "explored" and, therefore, it can alter classification performance. In a nushell, Lévy flights/walks are a way to create superdiffusive regimes: the customary rule for their ignition is to allow the walkers to perform non-local jumps, whose length is distributed according to a fat-tailed pdf with diverging second moment. Mathematically speaking, there have been several attempts to convert the Lévy flight phenomenon on networks and, in the context of gSSL, we settled to the use of fractional operators. Such operators are defined on the spectrum of the Standard Laplacian matrix by elevating it to some power γ: namely if the Laplacian reads L i,j = D i,j − A i,j , where D i,j is the diagonal matrix whose diagonal are the vertexes' degrees and A i,j is the adjacency matrix, then L γ = QΛ γ Q T , where Λ = diag(λ 0 ,. .. , λ N −1) are the eigenvalues and Q is the matrix whose columns are the eigenvectors. Such operators were investigated in [1, 2] and in such works it was shown that, indeed, they connect to a Lévy Flight type of process on graphs. Thus, in our SSL context, we cast those operators in the SSL problem in each different incarnation (SL, PR and NL) [3, 4] and we investigated the beneficial effect of such a procedure for classification, as displayed in Fig. 1. The L γ operator actually possesses two quite contrasting behaviours in the 0 < γ ≤ 1 and γ > 1 regimes: in the former, it is still a valid stochastic matrix (i.e. the off-diagonal terms are negative) and, therefore, it implies a random walk process. This random walk, in the Lévy Flight style, indeed performs long-range transitions and this enhanced and faster exploration allows for a better classification in cases in which the graph is badly constructed: for instance, in the figure, the existence of edges across the two moons spoils classification with the standard method, but we override this issue through the use of fractional methods. On the other hand, the γ > 1 regime, the operator L γ presents positive off-diagonal terms, so the associated graph, defined from the weights W γ i,j , has negative edges. Consequently, it is not possible anymore to draw a random walk process from it but one can still resort, for instance, to an anisotropic diffusion interpretation [5]. In this case, the effect of the operator is evident in traditionally problematic cases for standard methods, like very skewed and unbalanced ones, in which a class possesses more labels or it is denser edge-wise (Fig.1b). a b Fig. 1: (a) Classification of the Two Moons dataset with Page Rank and Fractional Page Rank. (b) Accuracy in classification for a dataset composed by connected modules with unbalanced labels. [1] Alejandro P. Riascos and José L. Mateos. Long-range navigation on complex networks using Lévy random walks. Phys. Rev.E, 86(5):056110, 2012. [2] Alejandro P. Riascos and José L. Mateos. Fractional dynamics on networks: Emergence of anomalous diffusion and Lévy flights. Phys. Rev. E, 90(3):032809, 2014. flights for graph based semi-supervised classification. In 26th colloquium GRETSI, 2017. [5] Andrew V. Knyazev. Signed laplacian for spectral clustering revisited. arXiv preprint arXiv:1701.01394, 2017.