SEO Researcher News 
Link Analysis Algorithms
July 17th, 2006
Relevance and Authority
When a user queries a search engine with a keyword, he expects more than just relevant results. For example, someone who searches for ‘Bali vacations’ would be disappointed to get a personal blog page with a story about John Doe’s awesome Bali vacation last summer. Most likely, what the user was looking for was a travel agent like Expedia. Thus it is critical that users get results that are not just relevant but also authoritative. And the more pages appear on the Internet every day, the further search algorithms shift from relevance toward authority.
Nowadays the relevance of a page is defined differently. It is not just about keyword saturation or copy structure. Today the context in which a page exists defines its relevance. The context is the set of pages that link to, or are linked from, the given page. If these pages are about Bali vacations, then it is natural to expect a page they link to to be about Bali vacations as well. The page content is then used to adjust the algorithm’s results in cases where links point to an irrelevant page, for example a widely used free web statistics system.
Link Analysis Ranking Algorithms
So why is page content analysis no longer enough to get relevant search results? There is a problem of abundance: the number of pages considered relevant on the basis of content analysis is too big for a human to digest. This is where searching for authoritative pages helps to narrow down the results. But authority is an even vaguer notion than relevance. Authority has to express the importance and the weight of a web document. The nature of the Web as an interlinked hypertext environment suggests that links can be used to measure the degree of “public recognition” of web pages.
This idea has existed since the creation of the Internet, and Jon Kleinberg was one of the first to create a workable approach, which he described in his seminal work “Authoritative Sources in a Hyperlinked Environment” (1998). He suggested that web pages can be either “hubs” or “authorities”. An authority is a page that has many incoming links, that is, a high in-degree. Authorities returned as relevant to a query should show an overlap in the pages pointing to them. Those pages, which contain links to the relevant resources, are called hubs. Hubs determine the relevance of authorities on a given topic and make it possible to discard non-relevant pages that merely have a high in-degree.
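To make the hub and authority idea more concrete, here is a minimal Python sketch of the mutual reinforcement between the two scores. It is a simplification and not Kleinberg's full method: it skips the query-specific subgraph his paper works on, and the function names (hits, normalize), the fixed iteration count and the sample page names are my own assumptions.

```python
import math


def normalize(scores):
    """Scale scores so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for (source, target) link pairs."""
    # Duplicate links and links from a page to itself are dropped.
    edges = {(s, t) for s, t in links if s != t}
    pages = {p for edge in edges for p in edge}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for s, t in edges:
            new_auth[t] += hub[s]
        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hub = {p: 0.0 for p in pages}
        for s, t in edges:
            new_hub[s] += new_auth[t]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical sample: h1..h3 all link to both "a" and "b";
# x1..x3 link only to "counter" (think of a free statistics widget).
sample_links = [(h, a) for h in ("h1", "h2", "h3") for a in ("a", "b")]
sample_links += [(x, "counter") for x in ("x1", "x2", "x3")]
hub, auth = hits(sample_links)
for page in ("a", "b", "counter"):
    print(page, round(auth[page], 4))
```

In this sample “counter” has exactly as many incoming links as “a”, but the pages pointing to it share no links with the hubs of “a” and “b”, so its authority score fades toward zero with each iteration.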

Link analysis ranking algorithms use hyperlink graphs similar to the one shown above. The nodes of the hyperlink graph are web pages, and the links are its directed edges. The graph is simple: if there is more than one link between two nodes, only one is considered, and links from a page to itself are ignored. Different weights can be assigned to edges (links) according to web page analysis or other factors that search engines consider important, e.g. link or domain age.
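The graph construction described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name build_link_graph and the optional weight callback are hypothetical, repeated links are collapsed per direction (so a link from A to B and a link from B to A are both kept), and every kept edge gets a weight of 1.0 unless a callback is supplied.

```python
def build_link_graph(raw_links, weight=None):
    """Build a simple directed graph: {source: {target: edge_weight}}."""
    graph = {}
    for source, target in raw_links:
        # Every page becomes a node, even if it has no outgoing links.
        graph.setdefault(source, {})
        graph.setdefault(target, {})
        if source == target:
            continue  # links from a page to itself are ignored
        if target in graph[source]:
            continue  # repeated links collapse into a single edge
        # 'weight' is a hypothetical hook for factors such as link or domain age.
        graph[source][target] = weight(source, target) if weight else 1.0
    return graph


raw = [("a", "b"), ("a", "b"), ("b", "b"), ("b", "a"), ("c", "a")]
print(build_link_graph(raw))
```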
In upcoming posts I will describe the most widely used link analysis algorithms.
References
- Kleinberg, J. ‘Authoritative sources in a hyperlinked environment’. Technical Report RJ 10076, IBM, May 1997. Available at http://citeseer.ist.psu.edu/article/kleinberg98authoritative.html
- Borodin, A., Roberts, G.O., Rosenthal, J.S. and Tsaparas, P. ‘Finding authorities and hubs from link structures on the World Wide Web’. In Proceedings of the 10th International World Wide Web Conference, Hong Kong, May 2001. Available at http://citeseer.ist.psu.edu/borodin01finding.html
A Few Words on the Internet Search and SEO
July 9th, 2006
Internet Search
Over the last decade the Internet has grown into an integral part of our everyday life. What was previously mostly a network of research institutions has become an unprecedented medium for commercial and social use. With the number of web pages now exceeding 12 billion, it is extremely difficult to find information without using search engines.
Trust Problem in the Link-Based Popularity Ranking
July 3rd, 2006
The idea behind popularity ranking algorithms is that, by linking to a page, you imply that the page deserves attention. Search engines use links to determine the authority of pages on the topics described by the link anchor text. The problem is that every link is treated as a positive endorsement, regardless of the real intention of the person linking. Search engines do not yet have an effective way to distinguish between positive and negative endorsements in links. Interested? Read on!
Popularity Ranking Faults
July 1st, 2006
The influence of search engines, and Google in particular, on the popularity of web pages is hard to exaggerate. When a user queries a search engine for a keyword or a key phrase and, having looked through the first two or three SERPs, still can’t get the desired results, he eventually gives up and starts looking for something else. When a web page is not in the SERPs, it doesn’t exist.
In the past things were simpler. A few dozen pages in the search results could easily be sorted according to keyword density and prominence in the body text and/or meta tags. Now there is a problem of abundance: there are hundreds of thousands, even millions, of pages to be ranked by relevance and quality. And the notions of relevance and quality have become very complex.
To measure the relevance and quality of a page, search engines employ various algorithms, of which the most widely used are the popularity-based ones. The idea is that the more links point to a page, the more popular the page is, and the higher it ranks in SERPs. In reality things are not that simple. Ranking algorithms have become quite sophisticated in evaluating page popularity. They are able to detect artificially inflated link popularity and assign different weights to certain parameters in order to counter manipulation of the rankings. They can also use other ways to determine popularity: for example, Google ranks older websites higher, on the assumption that websites with a longer history are well established.
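The naive version of the popularity idea, before any of the refinements mentioned above, is simply counting incoming links. Here is a rough Python sketch of that baseline; the function name rank_by_inlinks and the sample page names are made up for illustration, and real engines do far more than this.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Order pages by number of distinct incoming links, most linked first."""
    # Repeated links and links from a page to itself do not add popularity.
    edges = {(s, t) for s, t in links if s != t}
    indegree = Counter(target for _, target in edges)
    pages = {p for edge in edges for p in edge}
    # Ties are broken alphabetically so the order is stable.
    return sorted(pages, key=lambda p: (-indegree[p], p))


sample = [("a", "c"), ("b", "c"), ("a", "b"), ("a", "c")]
print(rank_by_inlinks(sample))
```

Because repeated links from the same page are counted once, the second link from “a” to “c” does not raise the rank of “c”. That is the simplest possible guard against inflated link counts.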
Rich Get Richer
The problem is: can new websites be discriminated against in SERPs because of popularity-based ranking systems? If a new website has no incoming links, it can’t be found in search results. If it is not in SERPs, no one will find it and link to it. It is a vicious circle. Conversely, more popular sites become even more popular: ‘rich get richer’.
How do search engine ranking algorithms affect website popularity? Is there a pattern in how popularity evolves? Interested? Read on!
Website is a Marketing Being: Project Perception
June 29th, 2006
Vitaly Kolesnik has written a very interesting article on the perception of the key elements of a web project and their relationships.
I took the liberty of summing up the main ideas of his article as well as the critique of his approach.
Website Project
Compared to traditional media, websites have greater potential to evolve. A freshly made website is more like the DNA of a future creature than a ready-to-use product.
Very often there is a strong desire to create a website, but the perception of the desired result is rather vague. There is a need to get a picture of how the website will evolve. There is a need to understand the relationships between the parts of the project. There is a need to balance these parts and align their development with the general course. Interested? Read on!




