Tuesday, October 4, 2011

(Highly recommended) reading for next class: a chapter on map-reduce architectures

In preparation for the next class, I recommend that you read the following chapter
(up to and including Section 2.3.2).


It has a nice description of large-scale file systems and the map-reduce architecture.

Rao

A self-contained textbook chapter on just PageRank

The chapter is here: http://infolab.stanford.edu/~ullman/mmds/ch5.pdf

It touches on pretty much all the PageRank topics I discussed in class (but doesn't have anything on A&H).

rao

Homework 1 solutions posted; graded homeworks will be returned today

Solutions for Homework 1 are posted; the graded homeworks will be returned today in class.

Rao


Office hours shift again for today: 2:30--3:30

Sorry for these frequent changes; my son's school has early closings on a rather large number of days this semester.

Rao

Monday, October 3, 2011

You may skip part 2 of the PageRank/A&H question on the homework

Part 2 of the A&H/PageRank question requires understanding an idiosyncrasy of A&H called the "tyranny of the majority." I was supposed to have covered it
on Friday but didn't get around to it, so you may skip that part (it won't be graded).

rao

On the self-link issue (question 5 in the homework)

Question 5 in the homework has two nodes that have outgoing links, but those links point back to the pages themselves.

So the question is whether these pages should be considered sink pages or not.

One reasonable interpretation would be to consider them sink pages, since there is no way of leaving those pages and going anywhere else.

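However, the specific "repair" we have for sink pages (adding the Z matrix, which spreads a sink page's outgoing probability uniformly over all N pages) only works if we consider a page to be a sink page ONLY IF IT HAS NO OUTGOING LINKS
(not even links pointing back to itself). Otherwise, the repair will leave you with a probability mass of more than 1.0 on that page's outgoing links: the self-link already carries 1.0, and the repair adds another 1.0 spread over all the pages.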

So, for this problem, you have to consider these self-looping pages as non-sink pages (despite the obvious fact that you can't leave them ;-).
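A minimal sketch of the two interpretations, assuming a column-stochastic link matrix M (M[i][j] = 1/outdeg(j) when page j links to page i) and a sink repair Z that adds 1/N to every entry of a sink page's column. The four-page graph is hypothetical, not the one from question 5: page 2 links only to itself and page 3 has no outgoing links at all. Exact fractions are used so the column sums are easy to read.

```python
from fractions import Fraction


def link_matrix(n, links):
    """Column-stochastic M: M[i][j] = 1/outdeg(j) if page j links to page i."""
    m = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            m[i][j] = Fraction(1, len(targets))
    return m


def sink_repair(m, is_sink):
    """Return M + Z, where Z adds 1/N to every entry of each column flagged as a sink."""
    n = len(m)
    return [
        [m[i][j] + (Fraction(1, n) if is_sink(j) else 0) for j in range(n)]
        for i in range(n)
    ]


def column_sums(m):
    n = len(m)
    return [sum(m[i][j] for i in range(n)) for j in range(n)]


# Hypothetical graph: page 2 only links to itself, page 3 has no out-links.
links = {0: [1, 3], 1: [0, 2], 2: [2], 3: []}
m = link_matrix(4, links)

# Rule used for the homework: a sink has NO outgoing links, not even a self-link.
strict = sink_repair(m, lambda j: len(links[j]) == 0)
# Alternative rule: also treat self-loop-only pages as sinks.
loose = sink_repair(m, lambda j: all(t == j for t in links[j]))

print([float(s) for s in column_sums(strict)])  # [1.0, 1.0, 1.0, 1.0]
print([float(s) for s in column_sums(loose)])   # [1.0, 1.0, 2.0, 1.0]
```

Under the strict rule every column still sums to 1; under the looser rule the self-looping page's column sums to 2, which is the excess probability mass described above.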

========

On a related note: although the PageRank approach has been developed with both the Z and K matrices, if you are planning to use a uniform reset matrix, then in a way the sink-page repair is superfluous (basically, once the K matrix is added, you already have an irreducible matrix).
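A small sketch of that remark, assuming the combination M* = c(M + Z) + (1 - c)K with a uniform K whose entries are all 1/N; the graph (page 3 has no out-links) and the value c = 0.8 are hypothetical choices for illustration. Even without Z, every entry of cM + (1 - c)K is strictly positive, which is enough for irreducibility. The sink page's column then sums to 1 - c rather than 1, though, so Z is what keeps every column summing to 1.

```python
from fractions import Fraction


def link_matrix(n, links):
    """Column-stochastic M: M[i][j] = 1/outdeg(j) if page j links to page i."""
    m = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in links.items():
        for i in targets:
            m[i][j] = Fraction(1, len(targets))
    return m


def add_sink_repair(m, links):
    """M + Z: columns of pages with no out-links get 1/N in every entry."""
    n = len(m)
    return [
        [m[i][j] + (Fraction(1, n) if not links[j] else 0) for j in range(n)]
        for i in range(n)
    ]


def with_uniform_reset(m, c):
    """c*M + (1 - c)*K, where K is the uniform reset matrix (every entry 1/N)."""
    n = len(m)
    return [[c * m[i][j] + (1 - c) * Fraction(1, n) for j in range(n)] for i in range(n)]


def column_sums(m):
    n = len(m)
    return [sum(m[i][j] for i in range(n)) for j in range(n)]


links = {0: [1, 3], 1: [0, 2], 2: [2], 3: []}  # hypothetical graph; page 3 is a sink
c = Fraction(4, 5)                              # illustrative damping factor
m = link_matrix(4, links)

without_z = with_uniform_reset(m, c)
with_z = with_uniform_reset(add_sink_repair(m, links), c)

# Every entry is positive, so the chain is irreducible even without Z.
print(all(x > 0 for row in without_z for x in row))  # True
print([float(s) for s in column_sums(without_z)])    # [1.0, 1.0, 1.0, 0.2]
print([float(s) for s in column_sums(with_z)])       # [1.0, 1.0, 1.0, 1.0]
```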

Rao

Project part 2 released. Due 10/27.

Part 2 of the project has been officially released. It is due October 27th.

regards
Rao