SEO Articles
Link Analysis Algorithms: HITS
July 18th, 2006HITS Algorithm
This algorithm was first described by Jon Kleinberg in his work “Authoritative Sources in a Hyperlinked Environment†(1998). The idea behind the HITS (Hyperlink Induced Topic Distillation) algorithm is that the authorities and hubs mutually reinforce each other. Authority weight of a page is calculated as a sum of hub weights pointing to it, and weight of a hub – as a sum of weights of authorities pointed to by it. In other words a hub is as good as the authorities linked by it, and vice versa.
The notation of the algorithm is as follows. Let S be a set of pages for which hub and authority weights are being calculated, n – the number of pages in the set. Then H is a subset of S containing pages acting as hubs, and A is a subset of S containing authorities. Since each page can be an authority and a hub, A and H overlap. For every page i in its hub role F(i) is the number of outgoing links. For every page i in its authority role B(i) is the number of incoming links. The n-dimensional vector of authority weights is denoted as a, and vector of hub weight – as h. Then hub and authority weights are calculated by the following formula:

The process is iterative. First all the weights receive value of 1. Then hubs and authority weights are calculated and the vectors are normalized. This stage is repeated until vectors a and h converge.
The algorithm however has a number of flaws. For example the nature of mutual reinforcement creates the following situation. Consider a hub that points to many authorities (hub B on the picture below), and a number of hubs pointing to a single authority (authority A on the picture). If the number of authorities pointed to by B is larger then the number of hubs pointing to A, then the HITS algorithm will allocate all the weight to the authorities in the right part of the picture and the authority A will get a weight near zero.

The reason is that hub B will initially get a very high score and propagate it to the authorities it links to. In the same time hubs on the left side will get very low score, and consequently A will get low weight too, although obviously it deserves more.
Cited Resources
- Kleinberg, J. May 1997, ‘Authoritative sources in a hyperlinked environment’. Technical Report RJ 10076, IBM,. Available at http://citeseer.ist.psu.edu/article/kleinberg98authoritative.html
Did you like it? Was it useful? Bookmark or share this post:
4 Responses to “Link Analysis Algorithms: HITS”



Digg This!
Technorati
Del.icio.us
Furl
Blinklist
Ma.gnolia
Yahoo! My Web


July 19th, 2006 at 5:20 am
[...] In general AT(k) algorithm uses the same formula as HITS. The difference is that when calculating the weight of a hub we consider top k authorities only, i.e. Fk(i) is a subset of outgoing links F(i). If the number of outgoing links |F(i)| is less or equal k than the AT(k) algorithm works exactly the same as HITS. Posted by oleg.ishenko Filed in Search Engines Technology [...]
August 20th, 2006 at 6:49 am
[...] Ð’ общем алгоритм AT(k) иÑпользует фактичеÑки такую же формулу как и алгоритм HITS. Разница ÑоÑтоит лишь в том, что при раÑчете веÑа хаба учитываютÑÑ Ñ‚Ð¾Ð»ÑŒÐºÐ¾ k authority-Ñтраниц Ñ Ð½Ð°Ð¸Ð±Ð¾Ð»ÑŒÑˆÐ¸Ð¼Ð¸ значениÑми авторитетноÑти.only, Ñ‚.е.. Fk(i) еÑть подмножеÑтво иÑходÑщих ÑÑылокF(i). ЕÑли количеÑтво Ñлементов (кардинальноÑть) |F(i)| равна или меньше k то алгоритм AT(k) полноÑтью повторÑет процедуру алгоритма HITS. Posted by oleg.ishenko Filed in Технологии ПоиÑковых Машин [...]
February 26th, 2007 at 9:09 am
I read your many article & understand properly., i would like to know could i used hyperlinks to ranking a image using matric citation with content based retrivel system
April 22nd, 2009 at 11:59 pm
please help me.Do you have HITS algorithm? you can share with me.thanks
email:triantivirus@gmail.com