Google matrix

A Google matrix is a particular stochastic matrix that is used by Google's PageRank algorithm. The matrix represents a graph with edges representing links between pages. The PageRank of each page can then be generated iteratively from the Google matrix using the power method. However, in order for the power method to converge, the matrix must be stochastic, irreducible and aperiodic.

In order to generate the Google matrix G, we must first generate an adjacency matrix A which represents the relations between pages or nodes. Assuming there are N pages, we can fill out A by setting A_ij = 1 if node j has a link to node i, and A_ij = 0 otherwise. From A one constructs the stochastic matrix S: each nonzero column of A is divided by its sum, and each all-zero column (a dangling node with no outgoing links) is replaced by a column whose entries are all 1/N. (Eq. 1)

Then the final Google matrix G can be expressed via S as:

G_ij = α S_ij + (1 − α)/N. (Eq. 2)

By construction, the sum of the non-negative elements inside each matrix column is equal to unity. The numerical coefficient α is known as the damping factor. Usually S is a sparse matrix; in modern directed networks it has only about ten nonzero elements per row or column, so only about 10N multiplications are needed to multiply a vector by the matrix G. An example of the construction of S via Eq. (1) for a simple network is given in the article CheiRank. For the actual matrix, Google uses a damping factor α around 0.85. The term (1 − α) gives a surfer the probability to jump randomly to any page. The matrix G belongs to the class of Perron–Frobenius operators of Markov chains.
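The construction of G from A described above can be sketched in a few lines of NumPy. This is a minimal illustration on a small hypothetical 4-page link graph (the graph itself is invented for the example); it builds S per Eq. (1), including the dangling-node replacement, and then applies Eq. (2):

```python
import numpy as np

# Hypothetical 4-page link graph; A[i, j] = 1 if page j links to page i.
A = np.array([
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],  # last column is a dangling node: page 3 has no outlinks
], dtype=float)

N = A.shape[0]
col_sums = A.sum(axis=0)

# Eq. (1): normalize each nonzero column of A; replace all-zero
# (dangling) columns with uniform columns of 1/N.
S = np.where(col_sums > 0, A / np.where(col_sums > 0, col_sums, 1), 1.0 / N)

# Eq. (2): damping factor alpha = 0.85, as used by Google.
alpha = 0.85
G = alpha * S + (1 - alpha) / N

# Every column of G sums to unity, as stated above.
assert np.allclose(G.sum(axis=0), 1.0)
```

For a real web-scale graph one would of course keep S in a sparse format rather than a dense array; the dense version is only meant to make the two equations concrete.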
Examples of the Google matrix structure are shown in Fig. 1 for the Wikipedia articles hyperlink network in 2009 at small scale, and in Fig. 2 for the University of Cambridge network in 2006 at large scale. For 0 < α < 1 there is only one maximal eigenvalue λ = 1, whose corresponding right eigenvector has non-negative elements P_i that can be viewed as a stationary probability distribution. These probabilities, ordered by decreasing value, give the PageRank vector P_i with the PageRank index K_i used by Google search to rank webpages. For the World Wide Web one usually has P ∝ 1/K^β with β ≈ 0.9. The number of nodes with a given PageRank value scales as N_P ∝ 1/P^ν with the exponent ν = 1 + 1/β ≈ 2.1. The left eigenvector at λ = 1 has constant matrix elements. For 0 < α all eigenvalues move as λ_i → α λ_i, except the maximal eigenvalue λ = 1, which remains unchanged. The PageRank vector varies with α, but the other eigenvectors with λ_i < 1 remain unchanged due to their orthogonality to the constant left vector at λ = 1. The gap of 1 − α ≈ 0.15 between λ = 1 and the other eigenvalues gives rapid convergence of a random initial vector to the PageRank, after approximately 50 multiplications by the matrix G.
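The convergence behavior described above can be sketched with the power method itself: starting from the uniform distribution, repeated multiplication by G drives the vector toward the λ = 1 eigenvector. The 3-page graph below is invented for illustration; with α = 0.85 roughly 50 multiplications suffice, since the residual components decay at least as fast as α^50 ≈ 3 × 10⁻⁴:

```python
import numpy as np

# Illustrative 3-page graph (hypothetical, not from the article's figures).
N = 3
A = np.array([[0, 0, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)
col = A.sum(axis=0)
S = np.where(col > 0, A / np.where(col > 0, col, 1), 1.0 / N)
alpha = 0.85
G = alpha * S + (1 - alpha) / N

# Power method: start from the uniform distribution and iterate p <- G p.
p = np.full(N, 1.0 / N)
for _ in range(50):          # ~50 multiplications suffice for alpha = 0.85
    p = G @ p                # columns of G sum to 1, so p stays a distribution

# PageRank ordering K_i: page indices sorted by decreasing probability P_i.
rank = np.argsort(-p)
```

Because G is dense only through the rank-one term (1 − α)/N, a production implementation would apply that term analytically and keep only the sparse part of S, preserving the ~10N cost per multiplication mentioned earlier.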
