Hyperlink Induced Topic Search (HITS) Algorithm

HITS Algorithm:

• Hyperlink Induced Topic Search Algorithm (HITS) identifies good authorities and hubs for a query topic by assigning two scores to a page:

i) Authority Score:

A page is a good authoritative page with respect to a given query if it is referenced by many pages related to that query.

ii) Hub Score:

A page is a good hub page with respect to a given query if it points to many good authoritative pages with respect to that query.

The steps are as follows:

1) Submit query q to a search engine. Let S (the root set) be the set of top n pages returned by the search engine.

2) Expand S into a larger set T (the base set): add pages that are pointed to by any page in S, and add pages that point to any page in S.

3) Find the subgraph SG of the web graph that is induced by T.
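
Steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `out_links` dictionary is a hypothetical toy web graph, the root set is hard-coded rather than fetched from a search engine, and `build_base_set` is a made-up helper name.

```python
def build_base_set(root_set, out_links):
    """Expand root set S into base set T and return T with the induced edges."""
    base = set(root_set)
    for page, targets in out_links.items():
        if page in root_set:
            base.update(targets)      # pages pointed to by a page in S
        elif any(t in root_set for t in targets):
            base.add(page)            # pages that point to a page in S
    # Induced subgraph: keep only the links whose two endpoints are both in T.
    edges = {(p, q) for p in base for q in out_links.get(p, ()) if q in base}
    return base, edges


# Hypothetical toy web graph: page -> pages it links to.
out_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["e"],
    "d": ["a"],
    "e": ["f"],
}
root = {"a", "b"}  # stands in for the top-n search results S

base, edges = build_base_set(root, out_links)
print(sorted(base))   # ['a', 'b', 'c', 'd']
print(sorted(edges))  # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('d', 'a')]
```

Page e is linked from c, but c is not in S, so e stays outside T and the link from c to e is dropped from the induced subgraph.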

4) Compute the authority score and hub score of each web page in T using the subgraph SG(V, E).

Given a page p, let

a(p) - Authority score

h(p) - Hub score

(p, q) - directed edge from p to q

5) Apply following operations:

i) Operation I

Update each a(p) as the sum of the hub scores of all web pages that point to p.

ii) Operation O

Update each h(p) as the sum of the authority scores of all web pages pointed to by p.

6) To apply the operations to all pages of the graph at once, we can use the matrix representation of operations I and O:

• Let A be the adjacency matrix of SG: $A(p, q)$ is 1 if p has a link to q; otherwise the entry is 0.

• Let $h_i$ be the vector of hub scores after i iterations.

• Let $a_i$ be the vector of authority scores after i iterations.

• Operation I: $a_i = A^T h_{i-1}$

• Operation O: $h_i = A a_i$
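
As a rough illustration of the matrix form, one iteration on a small graph might look like the sketch below. The 3-page adjacency matrix is hypothetical, `transpose` and `mat_vec` are illustrative helper names, and starting every hub score at 1 is an assumption (a common choice, not something stated above).

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


# Hypothetical graph: page 0 links to pages 1 and 2, page 1 links to page 2.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

h_prev = [1, 1, 1]                      # h_0: assumed initial hub scores
a_next = mat_vec(transpose(A), h_prev)  # Operation I: a_1 = A^T h_0
h_next = mat_vec(A, a_next)             # Operation O: h_1 = A a_1

print(a_next)  # [0, 1, 2]
print(h_next)  # [3, 2, 0]
```

Page 2 receives the most links and gets the highest authority score, while page 0 points to both authorities and gets the highest hub score.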

7) Because the scores can grow without bound, normalize each score vector after step 6 so that its maximum value is 1.

8) Repeat the updates and normalization until the scores converge.

9) Sort pages in descending order of authority scores.

10) Display the top authority pages.
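
Steps 4 to 10 can be put together as in the following minimal sketch. It is not a reference implementation: the function name `hits`, the initial scores of 1, the `max_iter` and `tol` parameters, and the toy graph are all assumptions made for illustration, and the set of pages is assumed to be non-empty.

```python
def hits(pages, edges, max_iter=100, tol=1e-9):
    """Iterate operations I and O with max-normalization until the scores settle."""
    auth = {p: 1.0 for p in pages}  # assumed starting scores
    hub = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        # Operation I: a(q) is the sum of h(p) over all edges (p, q).
        new_auth = {p: 0.0 for p in pages}
        for p, q in edges:
            new_auth[q] += hub[p]
        # Operation O: h(p) is the sum of a(q) over all edges (p, q).
        new_hub = {p: 0.0 for p in pages}
        for p, q in edges:
            new_hub[p] += new_auth[q]
        # Normalize each score set so that its maximum value is 1.
        for scores in (new_auth, new_hub):
            top = max(scores.values()) or 1.0  # avoid dividing by zero
            for p in scores:
                scores[p] /= top
        # Largest change in any score since the previous iteration.
        delta = max(
            max(abs(new_auth[p] - auth[p]), abs(new_hub[p] - hub[p]))
            for p in pages
        )
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub


# Hypothetical induced subgraph on four pages.
pages = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]

auth, hub = hits(pages, edges)
ranking = sorted(pages, key=lambda p: auth[p], reverse=True)
print(ranking[:2])  # ['c', 'b']
```

Dividing by the maximum keeps every score in the range 0 to 1, matching step 7. Other normalizations (for example, dividing by the Euclidean norm) are also used in practice and give the same ranking.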