Saturday, October 22, 2011

[CSE 494] Clarification about project - A/H

Several of you have asked about the authorities and hubs computation in Project part 2. In short, this is what you have to do:
 
Step 1: Find the top-k results from TF/IDF. Call that your "root set".
With k = 10, you have 10 documents at this stage.
 
Step 2: Find all the documents that the root set points to or is pointed to by, and add them to the root set. Call the result your "base set".
At this stage you will have, say, 80 documents.
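
As a rough illustration of Step 2, here is a minimal Python sketch (the project itself may well be in another language). The `out_links` and `in_links` dictionaries are hypothetical stand-ins for whatever the LinkAnalysis calls return; the document ids are made up.

```python
def build_base_set(root_set, out_links, in_links):
    """Expand the root set with every page it links to or is linked from."""
    base = set(root_set)  # the base set always contains the root set
    for doc in root_set:
        base.update(out_links.get(doc, ()))  # pages that doc points to
        base.update(in_links.get(doc, ()))   # pages that point to doc
    return sorted(base)  # fixed order, so each doc gets a stable matrix index


# Toy link data with hypothetical document ids.
out_links = {1: [4, 5], 2: [5]}
in_links = {1: [7], 2: [1, 8]}
base_set = build_base_set([1, 2], out_links, in_links)
print(base_set)
```

Sorting the result is just one convenient way to fix an ordering that the later steps can reuse for matrix rows and columns.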
 
Step 3: Create the adjacency matrix for these 80 documents.
The adjacency matrix will be 80 x 80. You will need to make more calls to LinkAnalysis to populate it, because you need the links of every document in the base set, not just those of the root set.
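
One possible way to fill the matrix in Step 3, sketched in Python. The `out_links` dictionary is a hypothetical stand-in for the results of the LinkAnalysis calls, and the convention that entry [i][j] is 1 when document i points to document j is an assumption; use whichever orientation the slides use.

```python
def build_adjacency(base_set, out_links):
    """Return an n x n 0/1 matrix where adj[i][j] = 1 if doc i links to doc j."""
    index = {doc: i for i, doc in enumerate(base_set)}  # doc id -> row/column
    n = len(base_set)
    adj = [[0] * n for _ in range(n)]
    for doc in base_set:
        for target in out_links.get(doc, ()):
            if target in index:  # ignore links that leave the base set
                adj[index[doc]][index[target]] = 1
    return adj


# Toy example: doc 9 is outside the base set, so that link is dropped.
adj = build_adjacency([1, 2, 3], {1: [2, 3, 9], 2: [3]})
print(adj)
```

Links to documents outside the base set are simply skipped, which keeps the matrix square.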
 
Step 4: Create an initial authorities vector and an initial hubs vector. Then use the techniques from the slides to iteratively compute the next authorities and hubs values. Remember to normalize after every iteration. Repeat until the values converge.
You can test convergence by checking whether the sum of the squares of the differences between the current values and the previous values is less than some threshold you choose.
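
The loop in Step 4 could look roughly like the Python sketch below. It is only a sketch under some assumptions: the vectors start as all ones, normalization divides by the Euclidean length, `adj[i][j] = 1` means document i points to document j, and the `threshold` and `max_iter` defaults are arbitrary. Follow the slides where they differ.

```python
import math


def normalize(vec):
    """Scale vec to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def hits(adj, threshold=1e-10, max_iter=1000):
    """Iterate authority and hub scores until they stop changing."""
    n = len(adj)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(max_iter):
        # Authority of j: sum of the hub scores of the pages pointing to j.
        new_auth = normalize(
            [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)])
        # Hub of i: sum of the authority scores of the pages i points to.
        new_hub = normalize(
            [sum(adj[i][j] * new_auth[j] for j in range(n)) for i in range(n)])
        # Sum of squared differences between current and previous values.
        change = (sum((a - b) ** 2 for a, b in zip(new_auth, auth))
                  + sum((a - b) ** 2 for a, b in zip(new_hub, hub)))
        auth, hub = new_auth, new_hub
        if change < threshold:
            break
    return auth, hub


auth, hub = hits([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
print(auth, hub)
```

In this toy graph the third document is pointed to by both of the others, so it ends up with the highest authority score, while the first document, which points to both, ends up as the best hub.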
 
Step 5: Print out the top-N authorities and top-N hubs.
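
For Step 5, a small Python sketch of picking the top-N entries. The document ids and scores here are made-up values, and `top_n` is a hypothetical helper name.

```python
def top_n(doc_ids, scores, n):
    """Return the n (doc_id, score) pairs with the highest scores."""
    ranked = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Made-up scores for three documents.
best = top_n([11, 12, 13], [0.0, 0.53, 0.85], 2)
for doc_id, score in best:
    print(doc_id, score)
```

Call it once with the authorities vector and once with the hubs vector, using the same document ordering as the adjacency matrix.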
 
I hope this clears up some of the confusion.
 
Thanks and Regards,
Sushovan De
