A number of you seem to have questions about the authorities and hubs computation in Project part 2. In short, this is what you have to do:
 Step 1: Find the top-k results from TF/IDF. Call that your "root set".
 At this stage you have 10 documents.
 Step 2: Find all the documents that the root set points to or is pointed to by. Together with the root set itself, call that your "base set".
 At this stage you will have, say, 80 documents.
 Step 3: Create the adjacency matrix for these 80 documents.
 The size of the adjacency matrix would be 80 x 80. You will need to make more calls to LinkAnalysis to populate this matrix.
 Step 4: Create an initial authorities vector and an initial hubs vector. Then use the techniques from the slides to iteratively compute the next authorities and hubs values. Remember to normalize after every iteration. Repeat this until the values converge.
 You can test convergence by checking whether the sum of the squares of the differences between the current values and the previous values is less than some threshold you choose.
 Step 5: Print out the top-N authorities and top-N hubs.
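 Steps 3 to 5 can be sketched roughly as follows. This is only an illustration in Python, not the project's required implementation: the function names, the `out_links` dictionary (standing in for whatever your LinkAnalysis calls return), the choice of Euclidean-length normalization, the default threshold, and the order of the two updates are all assumptions, so follow the slides where they differ.

```python
import math

def build_adjacency(base_set, out_links):
    """Step 3: adj[i][j] = 1 if base_set[i] links to base_set[j].

    out_links is a hypothetical dict {doc_id: iterable of linked doc_ids};
    in the project this information would come from LinkAnalysis.
    """
    index = {doc: i for i, doc in enumerate(base_set)}
    n = len(base_set)
    adj = [[0] * n for _ in range(n)]
    for doc in base_set:
        for target in out_links.get(doc, ()):
            if target in index:  # ignore links that leave the base set
                adj[index[doc]][index[target]] = 1
    return adj

def normalize(vec):
    """Scale vec to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def hits(adj, threshold=1e-10, max_iter=1000):
    """Step 4: iterate authority and hub updates until convergence."""
    n = len(adj)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(max_iter):
        # Authority of j: sum of hub scores of the documents pointing to j.
        new_auth = normalize(
            [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        )
        # Hub of i: sum of authority scores of the documents i points to.
        new_hub = normalize(
            [sum(adj[i][j] * new_auth[j] for j in range(n)) for i in range(n)]
        )
        # Sum of squared differences between current and previous values.
        change = sum((x - y) ** 2 for x, y in zip(new_auth, auth))
        change += sum((x - y) ** 2 for x, y in zip(new_hub, hub))
        auth, hub = new_auth, new_hub
        if change < threshold:
            break
    return auth, hub

def top_n(base_set, scores, n):
    """Step 5: the n (doc_id, score) pairs with the highest scores."""
    ranked = sorted(zip(base_set, scores), key=lambda p: p[1], reverse=True)
    return ranked[:n]
```

 For an 80-document base set the plain nested loops are fast enough; a matrix library would only matter for much larger sets.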
 I hope this clears up some of the confusion.
 Thanks and Regards,
Sushovan De