Thursday, December 15, 2011

The extra credit points

Here are the extra credit points.

For project extra credit, the points you got are normalized by the total for the regular project and multiplied by the weight
(so if you get 10 extra credit points on a project that is graded for 50 pts, and counts for 10% of your grade, then you get (10/50)*10 = 2 cumulative points)

The UGs got extra credit for doing the midterm at home (it was counted as 3.5 cumulative points)

The UGs also got extra credit if they did the last question on the final (which counts for [x/100]*15 cumulative points)

The social networks extra credit homework counted for 3.5 cumulative points.

All said, there was a maximum of 14 cumulative points that one could have amassed.

This will likely be the last mail about the class cumulatives. You should be able to find your letter grades from the registrar as and when they get posted.

Wishing you all a wonderful (if all too brief) holiday season. 
Rao



(hopefully) final cumulatives

Folks

 Here are the final cumulatives for the class. They contain your final exam marks, project 3 and demo marks, and a participation grade. Let us know if you find any errors anywhere.
[Note that the final was graded out of 115 for grads and 100 for UG--with the answers to the paper reviews kept as extra credit portion]

I computed the regular cumulative with 5% for participation, 25% for homeworks, 30% for exams and 40% for project. I will try some other weighted averages and take the maximum among those.
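
For concreteness, here is a minimal sketch of that computation. Only the first weight set (5/25/30/40) is from this mail; the alternative set is a hypothetical stand-in for the "other weighted averages":

```python
# Sketch of the cumulative computation described above. Only the first
# weight set (5/25/30/40) is stated in the mail; the second is a
# hypothetical example of an alternative weighting to try.
WEIGHT_SETS = [
    {"participation": 0.05, "homeworks": 0.25, "exams": 0.30, "project": 0.40},
    {"participation": 0.05, "homeworks": 0.30, "exams": 0.35, "project": 0.30},  # hypothetical
]

def best_cumulative(scores):
    """scores: component marks on a 0-100 scale. Returns the max over weightings."""
    return max(sum(w * scores[c] for c, w in ws.items()) for ws in WEIGHT_SETS)

print(best_cumulative({"participation": 90, "homeworks": 80, "exams": 75, "project": 85}))
```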

I have not computed the extra credit part---given that the grading is relative, adding extra credit to the total at the outset makes extra credit "mandatory". My idea is to set the thresholds first based on regular marks and
then consider bumping your grade based on your extra credit points. 

I normally ask the students at the top of the ladder in the 494 and 598 sections to suggest grade cutoffs for their respective sections. I offer the same chance to the current two top candidates. 
(of course, this is only advisory--but does give me useful input in that the students know the comparative level of difficulty of the class).

regards
Rao

Monday, December 12, 2011

Submitting the final

Folks

 You can submit the final in hard copy by just pushing it under my door (BY 560). If you come after 8AM, the department front desk should be open and you can also submit it there, telling them that it is for me.

Please don't speed on the roads on my account; +/- a few minutes is fine. 

Rao

Sunday, December 11, 2011

Clarification re: I.2

More than one person seems to have been confused by the word "learning"  in I.2 below. 
What I am asking is whether the information extracted by most IE (information extraction) methods is meant to be stored in RDF or OWL format.


 Semantic web has RDF and OWL standards for specifying structure. Information 
extraction aims to extract structure rather than wait for it being manually specified. Are 
current day IE approaches aimed at learning what is normally specified in RDF or in 
OWL? Why?

Re: Clarification on Qn VIII

yes both are per click.

Rao


On Sun, Dec 11, 2011 at 1:32 PM, Kalin Jonas <kjonas@asu.edu> wrote:
I notice you replaced A's pay rate with per-click, did you also mean to replace B's rate with per-click?


On Sat, Dec 10, 2011 at 10:21 PM, Subbarao Kambhampati <rao@asu.edu> wrote:
In the first part of the question VIII, please change "per impression" to "per click" (basically, the advertiser won't pay
unless the user actually goes to the advertiser's page by clicking on the ad)

VIII. [3+3] A search engine has bids on advertisements from two advertisers, A and B. 
Assume that A and B want their ads to be shown w.r.t the same query Q.  Suppose further 
that the search engine cannot show more than one advertisement. A is willing to pay 20c 
per click. B is willing to pay 10c per impression.  How should the search engine 
decide as to whose ad it should show? Pay particular attention to whether the  search 
engine has all the information to make a decision, and if it needs more information, how 
it will get it.


Saturday, December 10, 2011

Clarification on Qn VIII

In the first part of the question VIII, please change "per impression" to "per click" (basically, the advertiser won't pay
unless the user actually goes to the advertiser's page by clicking on the ad)

VIII. [3+3] A search engine has bids on advertisements from two advertisers, A and B. 
Assume that A and B want their ads to be shown w.r.t the same query Q.  Suppose further 
that the search engine cannot show more than one advertisement. A is willing to pay 20c 
per click. B is willing to pay 10c per impression.  How should the search engine 
decide as to whose ad it should show? Pay particular attention to whether the  search 
engine has all the information to make a decision, and if it needs more information, how 
it will get it.

Friday, December 9, 2011

one of the search engine projects from the class

Hi all:

 Sushovan, the TA, came and asked me to check out the project by one of the students in the class. 


The project is by Sathishkumar Poornachandran

The student seems to have done a pretty slick job--with online query completion thrown in. If only he had the page snippets too ;-)

Rao

 

Thursday, December 8, 2011

Final released...

Folks

 The CSE494/598 Take home final is released. You can access it from the URL http://rakaposhi.eas.asu.edu/cse494/final-f11.html 

Please carefully read and follow the instructions about the honor-code that applies to this take-home. You are essentially allowed only to
use the class notes and lectures, and to ask me for clarifications. NO discussions with other humans or web-trawling for answers are allowed.
Your signature on the first page serves as your oath that you followed these rules.

Please also check back at this URL for any errata/modifications to the final (I will also send emails if any errors are found).


Rao
(Released 8:35AM on Thursday 12/8; Due back in hardcopy Monday 12/12 9AM)


Wednesday, December 7, 2011

Heads up on the last question of the exam..

I have the exam pretty much set--I will release it tomorrow morning.

For 598 students who are night owls, here is the last question on the exam (which is only required for 598 students; and is extra-credit for 494 ones).
You can get a head start if you want.

=======

 

 [IX] [15pt] [Required for 598 students. Optional-extra-credit for 494 ones]

For the purposes of this question, assume that you are planning to pick a topic for your next research paper (presumably because you want to make progress towards your dissertation/thesis). Here is the link to the proceedings of WWW 2011

http://www.www2011india.com/proceeding/forms/pcontents.htm

 

Suppose you are trying to work on extending one of these papers. Select a paper. Read at least the abstract and introduction. Now, use only the space given to answer:

 

·       In your own words, write down what the paper is trying to do and how it is related to what we discussed in the course.

·      What follow-on work would you be interested in doing on the paper, and why?

[Corrected] Re: Solutions for the social networks extra credit homework posted

There was an error in the solutions that were posted; please download them again from the link to get the correct version.

rao


On Wed, Dec 7, 2011 at 11:57 AM, Subbarao Kambhampati <rao@asu.edu> wrote:
I didn't have the solution for the last question. Basically, you should see a straight line (or close to one) if you plot
# citations w.r.t. rank in log-log space.

The magnitude of the slope of that best-fit line gives the power-law exponent r.
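
For concreteness, here is a minimal sketch of that fit (the citation counts are made up):

```python
import numpy as np

# Hypothetical citation counts, sorted so that rank 1 is the most cited.
citations = np.array([1200, 480, 310, 200, 150, 118, 95, 80, 70, 62], dtype=float)
ranks = np.arange(1, len(citations) + 1)

# A power law, citations ~ C / rank**r, is a straight line in log-log space;
# the slope of the least-squares fit estimates -r.
slope, intercept = np.polyfit(np.log(ranks), np.log(citations), 1)
print("estimated power-law exponent r =", -slope)
```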

Rao


final exam will be take home

Folks

 This is just to confirm that the final will be take-home. The exam itself will be almost in the in-class style: you will have to answer questions in the
space provided and submit it. You just get to do it at home.

I am planning to release it by tomorrow (Thu), and it will be due Monday morning. 

Rao

fulton course evaluations

Folks

 I understand that today is the last day for the Fulton course evaluations. You should have received mails about them. I would encourage you all to take the time to do them--as they are the most useful feedback I get about how the course went.

 (Just so you know, instructors will get those only sometime in January--*and* they will be anonymous--so you don't need to worry that it might affect your grade ;-)

cheers
Rao

Solutions for the social networks extra credit homework posted

I didn't have the solution for the last question. Basically, you should see a straight line (or close to one) if you plot
# citations w.r.t. rank in log-log space.

The magnitude of the slope of that best-fit line gives the power-law exponent r.

Rao

Sunday, December 4, 2011

Class participation questionnaire--please bring it in hard-copy to Tuesday's class

Please print the attached questionnaire, fill it in, and bring it to Tuesday's class.

If you don't turn it in on Tuesday, I will have no statistics to judge your class participation by. 

Rao

Poll on Final Exam format

Please vote on your preference for the final exam format (in-class vs. take-home) at


Vote early, but not often (I ask for your name as part of the vote so I know you don't vote more than once :)

Rao

Friday, December 2, 2011

[CSE494] Signup link for demo slots

Here is the link to sign up for demo slots. Please just put in a fake phone number when signing up.

 

http://cse494.schedulething.com/

 

To reiterate, each demo slot will be 10-12 minutes long, and will involve you running some queries using all of the parts of the project you implemented and telling me about any extra credit parts you implemented. All the demos will be held in BYENG 558 (the office right next to Dr Rao's).

 

Thanks and Regards,

Sushovan De

 

[CSE 494] Demo time slots registration will open 5pm today

Just a quick heads up – at 5:30pm today, I will send out a link that will allow you to register for time slots for the demo. The demo slots are Monday to Friday all next week (12/5 – 12/9), between 1 and 4pm. Each demo will be 10 to 12 minutes long, and I will ask you to run a few sample queries on every part of the project and tell me about any extra credit you have implemented.

 

Thanks and Regards,

Sushovan De

 

DUE: 12/6: Mandatory Interactive Review Blog qn (You should post your answer as a comment to this thread )


"List five nontrivial ideas you came to appreciate during the course of this semester"
(You cannot list generic statements like "I thought pagerank was cool". Any cross-topic connections
you saw are particularly welcome.)

Note that you must post your response as a comment to this particular message on the class blog.
All comments must be done by the beginning of Tuesday's class (12/6).

If you have forgotten what we have done, see the class home page for a description. 


Rao
---

Tuesday, November 29, 2011

masochist special: cse471 seats for grad students...

Sorry for spamming the whole class, but a couple of students asked for overrides to get into the 598 section of cse471.

I told them to send me their name and student id, and I will let them know if more seats become available (a few might be
available by the time the class starts).

This is just to let others--who may be interested but didn't think there was any use in asking--know about this possibility.

I am always willing to try to make an exception for masochists who want to suffer through multiple courses from me.

cheers
Rao

Tuesday, November 15, 2011

On the question of whether relational schemas can capture books with multiple authors

In class, I said that XMLSchema, in contrast to RDBMS, allows us to easily capture books with multiple authors.

Two of you pointed out that an RDBMS can handle multiple authors by storing authors in a table along with book IDs (say, ISBN numbers)

<isbn1, author11>
<isbn1, author12>
<isbn2,author21> 

etc., with the ISBN number working as a foreign key into the book table.

While this way of doing it allows books with an arbitrary number of authors, it does not allow us to constrain the number of authors (say, to be always
between 1 and 3).

To do that, you will have to either use extra-RDBMS ideas such as triggers, or put up to three columns in the book table, resigned to the fact that when there are
no second or third authors there will be null values in those columns. This is the price we pay for writing a single table for data that essentially is a union of three schemas--books with
one author, books with two authors, and books with three authors. It is here that XMLSchema differs, by allowing specification of disjunctive schemas.

Rao




Colorless green ideas and universal grammar

Here is a mail I had sent once to my AI class (http://rakaposhi.eas.asu.edu/cse471)
about the Colorless green ideas example. FYI.

Rao

=============

Colorless green ideas sleeping furiously (Chomsky, Universal Grammars etc. (Long.))


  • To: cse471-f06@parichaalak.eas.asu.edu
  • Subject: Colorless green ideas sleeping furiously (Chomsky, Universal Grammars etc. (Long.))
  • From: "Subbarao Kambhampati" <rao@asu.edu>
  • Date: Thu, 16 Nov 2006 23:14:06 -0700
  • Sender: subbarao2z2@gmail.com


As I mentioned in the class today, for the longest time, and by that I
mean, until well into late 50's, the conventional scientific wisdom
was that infants come into this world with a "Tabula Rasa" (blank slate) mind, and pick up everything by learning (supervised or
reinforced) and observation. The reigning doctrine was
"behaviorism"--you can condition/reinforce any behavior into any
organism. To behaviorists, children were but cuter (pavlovian) rats, and language acquisition was no different than the acquisition of
maze-following skills. B.F. Skinner was the leading exponent of behaviorism
and was, in early fifties, writing book after book expounding on how
behaviorism can explain all sorts of human behavior.
[Skinner was such an ardent behaviorist that there was even an
apocryphal urban legend that said he raised his own daughter in a
"skinner box" to test his behaviorism hypotheses--see
http://www.snopes.com/science/skinner.htm ]

When Skinner came around to applying behaviorism explanations to
language acquisition and wrote the book on "Verbal Behavior", it was expected to further shore up the behaviorism doctrine, and become a
classic. What became a classic instead is a critical scholarly 1959
"review" of the book by a then little-known linguist named Noam
Chomsky (
http://cogprints.ecs.soton.ac.uk/archive/00001148/00/chomsky.htm ).

Chomsky essentially killed the book as well as much of the euphoria of behaviorism by an argument that has since then come to be known as
the "poverty of stimulus" argument. He said that behaviorism and
stimulus/response reinforcement does not quite explain how it is that children seem to be able to generate sentences that they have never
heard. In other words, there are not enough examples (hence a
poverty of "stimuli") for the children to learn the entire
language--grammar and sentences together (even for children--such as mine--with overly talkative parents ;-) [You will note that the argument
that something cannot be learned is being made in terms of the
inordinate number of examples needed to learn it. As we saw in
the class, the difficulty of learning tasks is measured in terms of "sample complexity".]

As an alternative explanation, Chomsky cited his own work on
"generative grammars"--a set of grammar rules that can generate
"grammatically correct" sentences from a language. He said that it must be the case that children come into the world with grammar rules
already in their head. Since the grammars of different world languages
are different, he posited that the children come into this world with
a "universal" grammar. Using the language being spoken around them,
they then set the "parameters" (or knobs, if you will) on their
universal grammar such that it becomes customized to the specific language environment they are in. Once they have the customized
grammar, they then are in the business of learning word sense (or
semantics). "Colorless green ideas sleep furiously" is one of
Chomsky's famous examples, which he uses to show that even small kids
with limited vocabularies can tell automatically whether a sentence is
grammatically correct. [Even here, in learning semantics, children
come into the world with pretty strong biases--including the so-called
"whole object hypothesis". If I point towards a big moving bus and say
"Bus", the kid hypothesizes that the whole big thing is called
bus--not just the wheels, or the hubcaps, or some subset of the
parts. Pretty strong bias, come to think of it--what if I said
"Driver" pointing towards the driver side of the bus?]

Chomsky of course went on to become the demi-god of cognitive science
in general, and mathematical linguistics in particular (and you obviously heard of him in your CSE 355, when you probably learned
about the Chomskian hierarchy of languages--which is in terms of their
grammar complexity). A lot of research has been done since Chomsky's
original work, to shore up the support for the universal grammar
hypothesis. It is so much of an accepted fact (dogma) now that it
(universal grammar) in turn is seen as yet more evidence that all
humans evolved from a common set of ancestors--as against evolving
separately and independently (the "Lucy" theory,
http://www.pbs.org/wgbh/evolution/library/07/1/l_071_01.html ; by the
way Don Johanson, who found the Lucy skeleton in Ethiopia, is right here
at ASU--check out http://www.asu.edu/clas/iho/dcj.html ). The basic
argument is that the rank similarity of the human languages cannot be
explained without it. (Of course, there are much stronger arguments for the common ancestor theory--including the fact that we are all
same species--any man from anywhere in the world can mate with any
woman from anywhere in the world and produce healthy offspring).

So that is some of the background on the universal grammar. By the way, note that none of the above says that conditioning will not be
effective in changing one's behavior--you probably saw the recent press
accounts of the infamous Wendell Johnson orphan stuttering experiments
(http://www.jsonline.com/news/nat/jun01/stutter11061001.asp). All
Chomsky's argument says is that conditioning and reinforcement are
only part of the story and cannot alone explain language acquisition; evolution did a whole other part too.

Now for a couple of references. Probably the best-written lay-person
book on human language acquisition is Steven Pinker's "Language
Instinct"( http://www.amazon.com/exec/obidos/tg/detail/-/0060976519/104-1220170-3641559).
A very nice and eminently watchable 3-part PBS series on language
acquisition is "Human Language"
( http://equinoxfilms.home.mindspring.com/HLseries.html).

That is all for now. Now for more important things, like the Seinfeld rerun
already in progress.

Rao

Epilogue/Postscript: These days of course, a search on Chomsky on the
web is more likely to turn up references to his political views than
his linguistic ones. He is sort of a one-man vocal and loyal
opposition to many of the US government policies. For example, he
wrote one of the first dissenting "non-party line" opinions on 9/11. Whether I agree with him or not, I am grateful that there is
someone unafraid of speaking his mind--especially in these days of
hyper-patriotism, where FBI thinks it normal to monitor you because
you are against war, and your freedoms are being re-defined as the ones Ashcroft hasn't yet gotten around to excising into the "Patriot
Act".

----------------
"I simply want to tell you that there are some men in this world who
were born to do our unpleasant jobs for us. Your father's one of
them."
[..] "I always thought Maycomb folks were the best folks in the world,
least that's what they seemed like."
"We're the safest folks in the world" said Miss Maudie. "We're so
rarely called on to be Christians, but when we are, we've got men like
Atticus to go for us."

--Miss Maudie and Jem talking about Atticus Finch, as
Scout and Dill look on..


============

Wednesday, November 9, 2011

[CSE 494] Regarding Extra Credit in Project Phase 3

I would like to clarify how the points will be awarded for extra credit in Phase 3. You can pick up a maximum of 10 points as extra credit, and it will be graded on a curve. You may attempt any number of extra credit problems from the problem statement, but please remember that points are awarded for a thorough analysis. Doing a little bit of three different extra credit questions will gain you fewer points than doing one problem thoroughly.

 

Thanks and Regards,

Sushovan De

 

Re: questions about today's class

This is a good question.

This is the difference between probability (which assumes that the model parameters are known, and infers/computes the probability of an event) and statistics (which first learns the parameters).

In your example below, if you assume that the model parameters are given (e.g., saying that the coin is a fair one), you can say that the next toss is going to come up heads with half probability. If, however, you cannot assume the model--i.e., you have to learn/update it first--then
the "likelihood" that the 2 million heads out of 3 million tosses are caused by a biased coin is much higher than the likelihood that they are caused by a fair coin. So, you learn/estimate the biased probabilities first (the model parameters), and subsequently infer, using the new model, that the next toss will come up heads with higher probability.

Rao


On Wed, Nov 9, 2011 at 12:59 AM, Shaunak Shah <skshah7@asu.edu> wrote:
Hello Dr. Rao,

I have a question not from today's class but on the topic "need for smoothing" in classification. 
Why do we consider that if a coin comes up as Heads 2 million times out of 3 million tosses, then it should be a biased coin?
Should we not think otherwise--that the coin is not actually biased and is just having a bad day--because we know for sure that the probability of Heads/Tails is 0.5?

--
Shaunak Shah


On Tue, Nov 8, 2011 at 7:10 PM, Subbarao Kambhampati <rao@asu.edu> wrote:
folks

 I couldn't call on several raised hands today. If you remember your questions, feel free to ask me by email and
I will respond. 
(and if I think the question is of general interest I will post my answer to the blog too)

rao



Tuesday, November 8, 2011

Complications and the Brazil reference..

In case you wondered about the Brazil reference..

rao

Re: questions about today's class


True value of the item for whom? Reserve price is the minimum price at which the seller is willing to part with the item. 

The whole point of selling is  that the item being sold is hopefully valued higher by the buyer than by the seller.

The tender-based auction you describe below is a first-price auction. For a first-price auction, bidding true values (i.e., submitting a tender that truly reflects the max price you are willing to pay) is not a dominant strategy and is thus not encouraged. [If you have trouble coming to terms with this, remember that doing the bidding in an ascending or descending price open auction gives the fairest price for the item in the long run. A second-price auction emulates it using sealed bids.]

Rao

ps: You have to first realize that there is no such thing as the "true value" of an item independent of the buyers. The value of an item depends on what buyers are willing to pay for it (in other words, the worth of an item is what the market bears for it). An appraised value is basically only an approximation, with some assumptions about the buyer population.
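
To make the mechanics concrete, here is a minimal sketch of a sealed-bid second-price auction with the reserve price handled as a seller bid, as described in my earlier reply (all numbers are hypothetical):

```python
# All numbers are hypothetical; "seller" is a reserved bidder name here.
def second_price_with_reserve(bids, reserve):
    """bids: {bidder: amount}. Returns (winner, price) or (None, None) if unsold."""
    all_bids = dict(bids, seller=reserve)     # the seller bids the reserve
    ranked = sorted(all_bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    if winner == "seller":                    # no one beat the reserve: not sold
        return None, None
    second_price = ranked[1][1]               # pay the second-highest bid,
    return winner, second_price               # which is at least the reserve

print(second_price_with_reserve({"A": 0.20, "B": 0.10}, reserve=0.05))  # ('A', 0.10)
print(second_price_with_reserve({"A": 0.20, "B": 0.10}, reserve=0.15))  # ('A', 0.15)
print(second_price_with_reserve({"A": 0.20, "B": 0.10}, reserve=0.25))  # (None, None)
```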

On Tue, Nov 8, 2011 at 7:46 PM, shubhendra singh <Shubhendra.Singh@asu.edu> wrote:
If there is a reserve price, isn't it indicating a true value of the item? Like in India, tenders are filled in the same way (closed, sealed; ignoring corruption for a while). The government sets the reserve price, people bid, and the one who is willing to pay the highest price is given the project, at the quotation price he gave.
 
Shubhendra Singh


From: Subbarao Kambhampati <rao@asu.edu>
To: shubhendra singh <Shubhendra.Singh@asu.edu>
Sent: Tuesday, 8 November 2011 7:28 PM
Subject: Re: questions about today's class

I assume you are talking about "reserve price"--i.e., a price below which you don't want to sell the item.

The way to handle reserve prices is to assume that the seller (in the case of search ads, the search engine) is also bidding--with the minimum (reserve) price.

If the seller wins, then basically that means there are no bids from others that the seller is willing to take (and so basically the item is not sold).

Rao


On Tue, Nov 8, 2011 at 7:23 PM, shubhendra singh <Shubhendra.Singh@asu.edu> wrote:
In a Vickrey second-price auction, is minimum pricing of an item done? I mean, for, let's say, a query/keywords: we know the number of hits, and we (thinking of ourselves as the search engine) already know how important this query is. Do we decide the minimum quotation for it?
 
Shubhendra Singh


From: Subbarao Kambhampati <rao@asu.edu>
To: Rao Kambhampati <rao@asu.edu>
Sent: Tuesday, 8 November 2011 7:10 PM
Subject: questions about today's class

folks

 I couldn't call on several raised hands today. If you remember your questions, feel free to ask me by email and
I will respond. 
(and if I think the question is of general interest I will post my answer to the blog too)

rao







Re: questions about today's class

I assume you are talking about "reserve price"--i.e., a price below which you don't want to sell the item.

The way to handle reserve prices is to assume that the seller (in the case of search ads, the search engine) is also bidding--with the minimum (reserve) price.

If the seller wins, then basically that means there are no bids from others that the seller is willing to take (and so basically the item is not sold).

Rao


On Tue, Nov 8, 2011 at 7:23 PM, shubhendra singh <Shubhendra.Singh@asu.edu> wrote:
In a Vickrey second-price auction, is minimum pricing of an item done? I mean, for, let's say, a query/keywords: we know the number of hits, and we (thinking of ourselves as the search engine) already know how important this query is. Do we decide the minimum quotation for it?
 
Shubhendra Singh


From: Subbarao Kambhampati <rao@asu.edu>
To: Rao Kambhampati <rao@asu.edu>
Sent: Tuesday, 8 November 2011 7:10 PM
Subject: questions about today's class

folks

 I couldn't call on several raised hands today. If you remember your questions, feel free to ask me by email and
I will respond. 
(and if I think the question is of general interest I will post my answer to the blog too)

rao




questions about today's class

folks

 I couldn't call on several raised hands today. If you remember your questions, feel free to ask me by email and
I will respond. 
(and if I think the question is of general interest I will post my answer to the blog too)

rao

[CSE 494] Homework 4, question 1 typo

HW4 question 1 says:

Show the cluster dissimilarity measure (which is defined as the sum of the similarities of docs from their respective cluster centers).

It should read “the sum of the dissimilarities.”
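
For concreteness, here is a minimal sketch of the corrected measure; the document vectors, cluster assignments, and the use of 1 - cosine similarity as the dissimilarity are all illustrative assumptions:

```python
import numpy as np

def cluster_dissimilarity(docs, assignments, centers):
    """Sum over docs of (1 - cosine similarity) to the assigned cluster center."""
    total = 0.0
    for doc, c in zip(docs, assignments):
        center = centers[c]
        cos = doc @ center / (np.linalg.norm(doc) * np.linalg.norm(center))
        total += 1.0 - cos
    return total

docs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
centers = np.array([[0.95, 0.05], [0.0, 1.0]])
print(cluster_dissimilarity(docs, assignments=[0, 0, 1], centers=centers))
```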

 

Thanks to the two of you who pointed this out.

 

Thanks and Regards,

Sushovan De

 

Monday, November 7, 2011

Added an *optional* homework on social networks as promised

It is in the homeworks list--note that this is *OPTIONAL* and will be counted as extra credit if you choose to do it (you will have to view the lectures I had sent earlier to 
make sense of the questions).

The homework will be due by the last day of the class. 

Rao

ps: Note that there will be a *required* homework that is also going to be due on that day. So if you want to do this one, it would be wiser to allocate time early on.


Midterm solutions...

Sorry for the delay in getting these to you.. I gave the undergraduates a chance to do the midterm again at home (as an extra homework) and I was waiting for them to 
be done before sending these.

Rao

Sunday, November 6, 2011

Current cumulatives

Folks:

 I am attaching the current cumulatives. I scaled homework 1 to 5pt, homework 2 to 6pt, the google paper to 2pt--so a total of 11pt; the midterm to 20pt, the pop quiz to 2pt,
and the projects to 10pt--thus a total of 44pt.

The extra credit points are kept separate from the main cumulative.

If you see any entry errors, please let us know.

Rao

Friday, November 4, 2011

Google's Timelier Search Results

Google has tweaked its search algorithm to present results that are as recent as possible for a particular query. You can read more about this here: http://www.informationweek.com/news/development/web/231902412

Preethi

Social networks lectures..

Folks:

 Because I decided to add lecture topics on Map-reduce and search advertising, I will have to skip one of the topics I normally cover. I reluctantly decided that it would be "Social Networks"--mostly because it is a self-contained 2-class topic for which video lectures from a previous offering are available (see below).

In addition to being quite engaging, this topic also comes bundled with a discussion on power laws and their importance. 

I would encourage you to watch the video lectures. I will also give optional homework questions on the topic. If you choose to do them, you can get extra credit.

Rao





===========
Social networks and their applications on the Web
  • L14 Audio of [March 5, 2010] (Video of the lecture: video part 1 (the first 1hr 5min, 4gb) and video part 2 (the remaining 10+ min)). Efficient computation of pagerank (and how it is important not to represent the M* matrix explicitly, given that it is not sparse); doing block-based pagerank iteration to avoid thrashing; the use of asynchronous pagerank iteration to improve convergence.
    [starts at 25min] Social networks start. Connections between link analysis and the general field of social network analysis. Some iconic examples of social network analysis (Typhoid Mary, Patient Zero, the Web graph (and its small-world nature), the official Florida ballot viral spread, the Saddam capture, the Aardvark acquisition). Applications of social networks. Graph-based representation and analysis of social networks. Measures of influence and centrality. Small-world phenomena--and their examples in the Kevin Bacon game and Erdos numbers.
  • L15 Audio of [March 9, 2010] (Video of the lecture: video part 1 (the first 1hr 5min, 4gb) and video part 2 (the remaining 10+ min)). Milgram experiment; six degrees of separation; (uniform) random networks and their properties; realizing that the small-world probability increases sharply to 1 right near k=1, where k is the average (expected) degree of the random network. If human networks are (uniform) random, then they will have small-world phenomena (since k, i.e., the average number of friends per person, is almost always greater than 1). Trying to confirm whether large-scale social networks are in fact uniform random by comparing their degree distribution to the Poisson degree distribution expected for random networks. Realizing that most real-world network degree distributions instead correspond to negative-sloping straight lines in log-log space (which means they are of the form P=1/k^r, which are called power laws). Discussion of the properties of power-law distributions (which have long tails that fall off only polynomially rather than exponentially). Implications of long tails on everything from the probability of existence of such highly-linked sites as Google to the ability to make money selling West Wing DVDs and Iranian classical music CDs on the web. Discussion of generative models which can result in power-law distributions over network degrees (a code sketch of one such model follows this list).
  • L16 Audio of [March 11, 2010] (Video of the lecture: video part 1 (the first 1hr; battery died after that :( )). Attacks vs. disruptions on power-law vs. exponential networks; navigation on social networks; applications of social networks; discussion of trust and reputation; TrustRank (a pagerank variant); discussion of social search (and Aardvark); discussion of other power laws in cse494--Zipf's law, Heaps' law (and even Benford's law).
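
As referenced in the L15 summary above, here is a minimal sketch of one generative model that yields power-law degree distributions: preferential attachment, where newcomers link to existing nodes in proportion to their current degree. The network size and links-per-node are illustrative choices.

```python
import random
from collections import Counter

# Preferential attachment: each new node links to existing nodes with
# probability proportional to their current degree.
def preferential_attachment(n_nodes, links_per_node=2):
    endpoints = [0, 1]                     # seed edge; each node appears in this
    degrees = Counter(endpoints)           # list once per unit of degree
    for new in range(2, n_nodes):
        for _ in range(links_per_node):
            old = random.choice(endpoints) # degree-proportional choice
            endpoints += [new, old]
            degrees[new] += 1
            degrees[old] += 1
    return degrees

degree_histogram = Counter(preferential_attachment(10_000).values())
for k in sorted(degree_histogram)[:10]:
    # frequency falls off roughly as a power of k (a straight line in log-log)
    print(k, degree_histogram[k])
```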

Tuesday, November 1, 2011

Bi-annual Grade Anxiety Amelioration Program

Folks

 I had been meaning to send this mail before the course withdrawal deadline, but got side-tracked by various deadlines. Better late than never..

I wanted to reiterate something I had said at the beginning of the semester about letter grades in this course. 

As I said, I don't go with 95-->A+; 90-->A  sort of scale. So, you shouldn't worry too much based just on your
cumulatives. 

I also recognize that people who can't keep up with the demands of the course leave early on (or don't register to begin with--thanks to the miracle of internet-word-of-mouth)--thus I normally don't have the
same kind of bottom distribution as might be present in other classes. 

My general advice to anyone who has come up to this point in the course and has done all assignments, and is enjoying the 
course, is to de-stress (note the "de" not "di") and continue (unless you need absolute guarantee of an A+). Your personal mileage of course may vary.

Rao


ps: If it helps, here are the final cumulatives and the actual reported grades for a previous offering of this course. The first 
three are 471 and the rest are 598. This is for illustrative purposes.
 There are *no* implicit guarantees that if you get those points, you will get those grades. If you have other questions, feel free to 
send me email.





 
88.40  A+
79.57  A
62.83  B

91.97  A+
90.61  A+
87.55  A+
85.76  A
85.11  A
84.34  A
84.00  A
83.88  A
83.85  A
81.27  A-
79.92  B+
78.76  B+
78.63  B+
74.76  B
69.64  B
69.35  B

You might want to read this easy-to-read paper on the Netflix prize before next class...

http://rakaposhi.eas.asu.edu/cse494/lsi-for-collab-filtering.pdf

It is in the readings and tells you how collaborative filtering and LSI-style techniques are combined. We will talk about it only briefly next class.

rao

office hours cancelled (again!)

Folks

 Due to a conflict I can't hold office hours today. If you were planning to meet with me, you can talk to me anytime tomorrow morning.

sorry
rao

Thursday, October 27, 2011

Homework on clustering/classification/recommendation systems assigned

It will be due on nov 10th (2 weeks from today).

Rao

No office hours today

Sorry for the late notice but I am at lunch with the visitor and won't be available for office hours

Rao

midterm marks

Folks

 Here are the midterm marks listed according to posting id. I will return the graded exams in the class today.

Rao

Graded out of 65 points
Average for 494 students: 44
Standard deviation: 11
Max: 57.5; Min: 27

Average for 598 students: 46
Standard deviation: 10.22
Max: 63; Min: 14


Undergrad
Score (out of 65), followed by posting ID (IDs listed without a score have no recorded midterm score):
4257-016
51.5 0076-297
48.5 2980-612
7246-789
49 4620-483
40.5 7825-853
27 2875-514
31 7467-586
57.5 6692-410
54 1598-032
8943-657
7581-369
7137-693
GRAD
47.5 4171-660
33 0833-776
32.5 9676-570
46.5 3681-882
51.5 2635-502
50 2340-625
30.5 0196-819
61.5 8103-075
35 3613-980
47.5 9374-305
60 5365-064
53.5 5355-367
41 0233-710
46 5552-935
22.5 6370-935
37 8290-753
63 9661-342
45 2987-507
50 3633-513
56 6311-340
42.5 8207-143
52.5 1942-482
53 1864-476
40 2855-634
51.5 5497-814
52 9400-307
54 7074-143
44 7106-112
56 3236-450
58.5 8487-449
45 2671-439
14 6013-730
52.5 0062-742
34.5 6622-345
46 5146-793
59 9522-612
51 5068-787
35.5 4967-738
51.5 1847-585
45.5 6146-759
45 5929-040
45 4074-088
43.5 3661-462

Wednesday, October 26, 2011

[Reminder: Talk relevant to the last 1/3rd of CSE 494] Invited Seminar: From Data to Decisions (Ullas Nambiar, IBM IRL; 10/27 11AM; BY 420)



---------- Forwarded message ----------
From: Subbarao Kambhampati <rao@asu.edu>
Date: Tue, Oct 25, 2011 at 6:36 AM
Subject: Invited Seminar: From Data to Decisions (Ullas Nambiar, IBM IRL; 10/27 11AM; BY 420)
To: Rao Kambhampati <rao@asu.edu>


From Data to Decisions 

  Folks:
     This talk is relevant to the topics we will cover in the last 1/3rd of the CSE494 course.  --Rao


                                                                       Dr. Ullas Nambiar
                                                                   IBM India Research Labs

 11AM. Thu, October 27, 2011.  BY 420  



Abstract: 

  Creating a single view of entities of interest is increasingly becoming a requirement for effective insight generation about product/service quality and customer expectations. The explosive growth of data both within an enterprise and outside (Web & Social Media) has made the task of creating a unified view challenging and interesting. In this talk, I will show how Big Data impacts Single View Creation, present use cases showing that the problem is not just one of business but is also present in areas of common interest like better traffic management, public safety, etc. I will then give a deep dive on a solution that augments enterprise data by extracting relevant information from Web sources.


Bio: 
Dr. Ullas Nambiar is a researcher in the Information Management & Analytics group at IBM Research India. He received a PhD in Computer Science from ASU in 2005. His research focuses on data integration, analytics, and retrieval over distributed heterogeneous sources. He is a Senior Member of ACM. More details are available at
www.research.ibm.com/people/u/ubnambiar


Tuesday, October 25, 2011

Invited Seminar: From Data to Decisions (Ullas Nambiar, IBM IRL; 10/27 11AM; BY 420)

From Data to Decisions 

                                                     Dr. Ullas Nambiar, IBM India Research Labs

 11AM. Thu, October 27, 2011.  BY 420  



Abstract: 

  Creating a single view of entities of interest is increasingly becoming a requirement for effective insight generation about product/service quality and customer expectations. The explosive growth of data both within an enterprise and outside (Web & Social Media) has made the task of creating a unified view challenging and interesting. In this talk, I will show how Big Data impacts Single View Creation, present use cases showing that the problem is not just one of business but is also present in areas of common interest like better traffic management, public safety, etc. I will then give a deep dive on a solution that augments enterprise data by extracting relevant information from Web sources.


Bio: 
Dr. Ullas Nambiar is a researcher in the Information Management & Analytics group at IBM Research India. He received a PhD in Computer Science from ASU in 2005. His research focuses on data integration, analytics, and retrieval over distributed heterogeneous sources. He is a Senior Member of ACM. More details are available at
www.research.ibm.com/people/u/ubnambiar

Saturday, October 22, 2011

[CSE 494] Clarification about project - A/H

A number of you seem to have some questions about the authorities and hubs computation in Project part 2. In short this is what you have to do:
 
Step 1: Find the top-k results from TF/IDF. Call that your "root set".
At this stage you have 10 documents.
 
Step 2: Find all the documents that the root set points to and is pointed to by. Call that your "base set".
At this stage you will have, say, 80 documents.
 
Step 3: Create the adjacency matrix for these 80 documents.
The size of the adjacency matrix would be 80 x 80. You will need to make more calls to LinkAnalysis to populate this matrix.
 
Step 4: Create an initial authorities vector and an initial hubs vector. Then use the techniques from the slides to iteratively compute the next authorities and hubs values. Remember to normalize after every iteration. Repeat this until it converges.
You can test convergence by checking whether the sum of the squares of the differences between the current values and the previous values is less than some threshold you choose.
 
Step 5: Print out the top-N authorities and top-N hubs.
 
I hope this clears some of the confusion.
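
If it helps, here is a minimal numpy sketch of Steps 3-5 (the toy 3 x 3 adjacency matrix and the convergence threshold are illustrative stand-ins; the calls to LinkAnalysis that populate the real matrix are not shown):

```python
import numpy as np

# Toy stand-in for the ~80 x 80 base-set matrix: adj[i][j] = 1 if page i links to page j.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)

auth = np.ones(adj.shape[0])   # initial authorities vector (Step 4)
hub = np.ones(adj.shape[0])    # initial hubs vector
threshold = 1e-9               # convergence threshold (your choice)

while True:
    new_auth = adj.T @ hub                  # authority: sum of hub scores pointing in
    new_hub = adj @ new_auth                # hub: sum of authority scores pointed to
    new_auth /= np.linalg.norm(new_auth)    # normalize after every iteration
    new_hub /= np.linalg.norm(new_hub)
    converged = (np.sum((new_auth - auth) ** 2) < threshold and
                 np.sum((new_hub - hub) ** 2) < threshold)
    auth, hub = new_auth, new_hub
    if converged:
        break

print("pages by authority score:", np.argsort(-auth))  # Step 5
print("pages by hub score:", np.argsort(-hub))
```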
 
Thanks and Regards,
Sushovan De

Monday, October 17, 2011

*Please* do not spend time on the hidden slides..

I just had a couple of students who were confused because they tried to make sense of hidden slides in the ppt lecture notes.

As I mentioned multiple times, you should focus only on the non-hidden slides. Like junk DNA, the hidden slides may not always have
a direct connection to what we discussed this time in the class (and in some cases may even be erroneous).

If you are viewing slides on the computer, use the slideshow mode--which automatically skips over hidden slides. If you are printing, remember to
click-off the "hidden slides" radio button. 

regards
Rao

ps: on a different topic--you might want to note that the page-rank vector doesn't have to be normalized if you start with [1/n ... 1/n]' as the initial vector.
Multiplying a probability distribution by M*--a stochastic matrix--will give rise to another probability distribution (i.e., the entries will all be positive and will
add up to 1).
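
To see this concretely, here is a small numpy sketch (the link matrix and the c=0.85 reset probability are toy choices); the iterate's entries remain a probability distribution with no re-normalization step:

```python
import numpy as np

n = 4
# Toy link matrix: M[i][j] = 1/outdegree(j) if page j links to page i.
M = np.array([[0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)
M[:, M.sum(axis=0) == 0] = 1.0 / n   # treat sink pages as linking everywhere
M /= M.sum(axis=0)                   # make every column sum to 1
c = 0.85                             # illustrative reset probability
M_star = c * M + (1 - c) * np.ones((n, n)) / n   # M* is column-stochastic

p = np.full(n, 1.0 / n)              # start with [1/n ... 1/n]'
for _ in range(50):
    p = M_star @ p                   # p.sum() stays 1.0: no normalization needed
print(p, p.sum())
```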

Graded homework 2 is available at Sushovan's desk

Homework 2 is now graded. In case you want to look at your graded work before the exam, you can pick it up from Sushovan's desk (he sits 
in the lab across from my office).

Otherwise, I will bring them to class either tomorrow or Thu.

Rao

Homework 2 - Solution for Question 3

The solution for HW2 – Q3 was missing from the solutions. Here it is:

Answer 1

td =

0.2500 0.5300 0.7500
0.7300 0.5000 0.2300

 To find the first term of the SVD, we find the eigenvectors of td * td'
td * td' =

0.9059 0.6200
0.6200 0.8358

 <Eigen vector calculation here>

Eigenvectors of td * td':

First eigenvector:
-0.7268
-0.6869

Second eigenvector:
0.6869
-0.7268

The eigenvalues are 1.4918 and 0.2499

[Note: the negations of these vectors are also a valid solution]

Therefore, the first term of the SVD is

-0.7268 0.6869
-0.6869 -0.7268

The second term of the SVD contains the square roots of the eigenvalues found above

1.2214 0 0
0 0.4999 0

To find the third term of the SVD, we find the eigenvectors of td' * td

td' * td =
0.5954 0.4975 0.3554
0.4975 0.5309 0.5125
0.3554 0.5125 0.6154

<eigen computation>

The eigenvectors are:

First eigenvector:
0.5593
0.5965
0.5756

Second eigenvector:
-0.7179
0.0013
0.6962

Third eigenvector:
-0.4146
0.8026
-0.4290

Therefore the third and last term in the SVD is
-0.5593 0.7179 -0.4146
-0.5965 -0.0013 0.8026
-0.5756 -0.6962 -0.4290

 <But we did not find the right signs for this matrix here, so we need to find it in a different way (see the recent mail sent by Dr Rao)>
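
One way to sanity-check the numbers above (and sidestep the sign issue) is numpy; this sketch also recovers each v_i as td' u_i / s_i, which keeps its sign consistent with the chosen u_i. Note that numpy's own sign choices may differ from the hand computation.

```python
import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

u, s, vt = np.linalg.svd(td)
print(u)   # term-factor matrix: columns are eigenvectors of td * td'
print(s)   # singular values ~[1.2214, 0.4999]: square roots of the eigenvalues

# Recover each v_i with signs consistent with the chosen u_i:
for i in range(len(s)):
    v_i = td.T @ u[:, i] / s[i]
    print(v_i)
```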

 Answer 2

After removing the lesser of the two eigenvalues, your s matrix becomes:

1.2214 0 0
0      0 0

 

Then to recreate the matrix, you multiply u * s * v'

0.4965 0.5296 0.5110
0.4692 0.5005 0.4829

Does it look close to the original? In my opinion, NO, it does not. We did eliminate a significant part of the variance.

(Also accept a YES answer if it claims that if you scale it properly, you would end up sort of where the original documents were.)

Answer 3

The query vector is q = [1, 0]

In the factor space, it would be q * tf

qf = -0.7268 -0.6869

The documents in the factor space are:

D1: -0.6831 0.3589
D2: -0.7286 -0.0006
D3: -0.7031 -0.3481

The similarities are:
sim(q,D1) = 0.3239

sim(q,D2) = 0.7274
sim(q,D3) = 0.9561

Answer 4

Before the transformation: <scatter plot omitted>

After the transformation: <scatter plot omitted>

Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis). So we can get rid of the x-axis and still differentiate between them

Answer 5

The new keyword added is a linear combination of the previous keywords (0.5 * k1 + 0.5 * k2).

It will have absolutely no impact on the calculations at all.

It tells us that SVD manages to find the real dimensionality of the data.

 

Thanks and Regards,
Sushovan De

additional office hours for mid-term: 4-5:30pm today (Monday)

I will be available for questions between 4 to 5:30pm today.

Also, tomorrow, the office hours will be from 2:30 to 3:30.

rao

Thursday, October 13, 2011

Precision-Recall question in Homework 1

In the Precision-Recall question of homework 1, the expected answer is this:

<precision-recall curve omitted>

In the slides, the precision-recall curve was plotted at specific recall values (0, 0.1, 0.2, etc.). Some of you misinterpreted that to mean “You should plot the precision-recall curve at only those points at which a relevant document is retrieved.” That is not true in general.
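
For concreteness, here is a minimal sketch of how such a curve is computed from a ranked result list (the relevance flags and the count of relevant documents are made up):

```python
# rels[i] = 1 if the i-th ranked result is relevant; counts are made up.
rels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
total_relevant = 5   # relevant docs in the whole collection

points = []
found = 0
for rank, rel in enumerate(rels, start=1):
    found += rel
    points.append((found / total_relevant, found / rank))  # (recall, precision)

# Interpolated precision at a standard recall level r: the maximum precision
# over all points whose recall is >= r (this is how the slide curves are drawn).
for level in [i / 10 for i in range(11)]:
    interp = max((p for r, p in points if r >= level), default=0.0)
    print(f"recall {level:.1f}: interpolated precision {interp:.2f}")
```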

 

Thanks and Regards,

Sushovan De

 

page synopses ==> page snippets..

Apparently some of you are confused about what "page synopsis" means--I meant the short description that accompanies each result.
They are also called snippets.

In the following, the synopses/snippets are the text in black:


  1. Writing a Synopsis

    The synopsis. I've placed some great synopsis how-to links and books below. Don't miss the sample synopsis page link. Some Books to Help. Using the links ...
  2. Tips About How To Write a Synopsis from Fiction Writer's Connection -

    Once you already have an agent and you are discussing future projects, you can present your ideas in this one-page synopsis format for your agent to look at ...
  3. How to Write a One-Page Synopsis or Treatment | eHow.com

    Feb 16, 2009 – How to Write a One-Page Synopsis or Treatment. If you are trying to sell a screenplay, you will often be asked to submit a "one-sheet" or one ...