This is the class blog for CSE494/598 at ASU. The class homepage is http://rakaposhi.eas.asu.edu/cse494
Thursday, October 27, 2011
Homework on clustering/classification/recommendation systems assigned
No office hours today
Rao
midterm marks
Wednesday, October 26, 2011
[Reminder: Talk relevant to the last 1/3rd of CSE 494] Invited Seminar: From Data to Decisions (Ullas Nambiar, IBM IRL; 10/27 11AM; BY 420)
From: Subbarao Kambhampati <rao@asu.edu>
Date: Tue, Oct 25, 2011 at 6:36 AM
Subject: Invited Seminar: From Data to Decisions (Ullas Nambiar, IBM IRL; 10/27 11AM; BY 420)
To: Rao Kambhampati <rao@asu.edu>
Abstract:
Creating a single view of entities of interest is increasingly becoming a requirement for effective insight generation about product/service quality and customer expectations. The explosive growth of data both within an enterprise and outside (Web & Social Media) has made the task of creating a unified view challenging and interesting. In this talk, I will show how Big Data impacts Single View Creation, present use cases that show that the problem is not just that of business but even present in areas of common interest like better traffic management, public safety etc. I will then give a deep dive on a solution that augments enterprise data by extracting relevant information from Web sources.
Bio:
Dr. Ullas Nambiar is a researcher in the Information Management & Analytics group at IBM Research India. He received a PhD in Computer Science in 2005 from ASU. His research focusses on data integration, analytics and retrieval over distributed heterogeneous sources. He is a Senior Member of ACM. More details are available at www.research.ibm.com/people/u/ubnambiar
Tuesday, October 25, 2011
Invited Seminar: From Data to Decisions (Ullas Nambiar, IBM IRL; 10/27 11AM; BY 420)
Saturday, October 22, 2011
[CSE 494] Clarification about project - A/H
Sushovan De
Monday, October 17, 2011
*Please* do not spend time on the hidden slides..
Graded homework 2 is available at Sushovan's desk
Homework 2 - Solution for Question 3
The solution for HW2 – Q3 was missing from the solutions. Here it is:
Answer 1
td =
0.2500 0.5300 0.7500
0.7300 0.5000 0.2300
To find the first term of the svd, we find the eigen vectors of td * td'
td * td' =
0.9059 0.6200
0.6200 0.8358
<Eigen vector calculation here>
Eigen vectors of td * td'
-0.7268
-0.6869
And
0.6869
-0.7268
The eigen values are 1.4918 and 0.2499
[Note: the eigen vectors might come out with the opposite signs; that is a valid solution too]
Therefore, the first term of the svd is
-0.7268 0.6869
-0.6869 -0.7268
The second term of the svd has the square roots of the eigen values found above on its diagonal
1.2214 0 0
0 0.4999 0
To find the third term of the svd, we find the eigen vectors of td' * td
td' * td =
0.5954 0.4975 0.3554
0.4975 0.5309 0.5125
0.3554 0.5125 0.6154
<eigen computation>
Eigen vectors are
First eigen vector:
0.5593
0.5965
0.5756
Second eigen vector:
-0.7179
0.0013
0.6962
Third eigen vector:
-0.4146
0.8026
-0.4290
Therefore the third and last term in the svd is
-0.5593 0.7179 -0.4146
-0.5965 -0.0013 0.8026
-0.5756 -0.6962 -0.4290
<But we did not find the right signs for the matrix here, so we need to find this matrix in a different way, (see recent mail sent by Dr Rao)>
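One way to get signs that agree is to compute each right singular vector from the matching left one, v_i = td' * u_i / sigma_i, instead of taking eigen vectors of td' * td separately. The sketch below is only an illustration of that idea in plain Python, not necessarily the method from the mail mentioned above; the function name `svd_2xn` is made up here, and it assumes a 2 x n matrix whose rows are not orthogonal and whose rank is 2. It returns only the two singular vectors that go with non-zero singular values, which is all that is needed to rebuild td.

```python
import math

td = [[0.25, 0.53, 0.75],
      [0.73, 0.50, 0.23]]


def svd_2xn(m):
    """Thin SVD of a 2 x n matrix via the eigen vectors of m * m'.

    Hypothetical helper for illustration. Assumes the off-diagonal entry
    of m * m' is non-zero and that both eigen values are positive.
    """
    n = len(m[0])
    # Entries of the symmetric 2 x 2 matrix m * m' = [[a, b], [b, d]].
    a = sum(x * x for x in m[0])
    b = sum(x * y for x, y in zip(m[0], m[1]))
    d = sum(y * y for y in m[1])
    # Eigen values from the characteristic polynomial, largest first.
    mean = (a + d) / 2
    gap = math.sqrt(((a - d) / 2) ** 2 + b * b)
    u_cols, sigmas, v_cols = [], [], []
    for lam in (mean + gap, mean - gap):
        # (b, lam - a) solves (m m' - lam I) x = 0; normalize it.
        norm = math.hypot(b, lam - a)
        u = [b / norm, (lam - a) / norm]
        sigma = math.sqrt(lam)
        # v = m' u / sigma ties the sign of v to the sign chosen for u.
        v = [(m[0][j] * u[0] + m[1][j] * u[1]) / sigma for j in range(n)]
        u_cols.append(u)
        sigmas.append(sigma)
        v_cols.append(v)
    return u_cols, sigmas, v_cols


u_cols, sigmas, v_cols = svd_2xn(td)

# Rebuild td as the sum of sigma_i * u_i * v_i' to check the signs agree.
rebuilt = [[sum(sigmas[k] * u_cols[k][i] * v_cols[k][j] for k in range(2))
            for j in range(3)] for i in range(2)]
print(sigmas)
print(rebuilt)
```

The vectors may come out negated relative to the ones listed above. That is fine as long as u_i and v_i are negated together, which this construction guarantees.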
Answer 2
After removing the lesser of the two eigen values, your s matrix becomes:
1.2214 0 0
0 0 0
Then to recreate the matrix, you multiply u * s * v'
0.4965 0.5296 0.5110
0.4692 0.5005 0.4829
Does it look close to the original? In my opinion, NO, it does not. We did eliminate a significant part of the variance.
(Also accept a YES answer if it claims that if you scale it properly, you would end up sort of where the original documents were.)
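The rank-1 reconstruction can be checked with a few lines of plain Python. This is a minimal sketch that reuses the rounded numbers from Answer 1 (so the results agree only to about three decimals) and also measures how much of the squared Frobenius norm of td is lost; the variable names are illustrative.

```python
# Leading singular value and vectors, copied (rounded) from Answer 1.
sigma1 = 1.2214
u1 = [-0.7268, -0.6869]
v1 = [-0.5593, -0.5965, -0.5756]

td = [[0.25, 0.53, 0.75],
      [0.73, 0.50, 0.23]]

# Rank-1 approximation: sigma1 * u1 * v1' (an outer product).
approx = [[sigma1 * ui * vj for vj in v1] for ui in u1]

# Share of the squared Frobenius norm that the approximation drops.
total = sum(x * x for row in td for x in row)
error = sum((x - y) ** 2
            for row, arow in zip(td, approx)
            for x, y in zip(row, arow))
lost_fraction = error / total

for row in approx:
    print([round(x, 4) for x in row])
print(round(lost_fraction, 2))
```

The lost share equals the dropped eigen value divided by the sum of the eigen values, 0.2499 / (1.4918 + 0.2499), which is roughly 0.14.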
Answer 3
The query vector is q = [1, 0]
In the factor space, it would be q * tf
qf = -0.7268 -0.6869
The documents in the factor space are:
D1: -0.6831 0.3589
D2: -0.7286 -0.0006
D3: -0.7031 -0.3481
The similarities are:
sim(q,D1) = 0.3239
sim(q,D2) = 0.7274
sim(q,D3) = 0.9561
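The similarities above are cosine similarities between the query and each document in the factor space. A short sketch, using the rounded coordinates listed in this answer (the helper name `cosine` is just for illustration):

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Query and documents in the factor space, as given in Answer 3.
qf = [-0.7268, -0.6869]
docs = {
    "D1": [-0.6831, 0.3589],
    "D2": [-0.7286, -0.0006],
    "D3": [-0.7031, -0.3481],
}

sims = {name: cosine(qf, vec) for name, vec in docs.items()}
for name, value in sims.items():
    print(name, round(value, 4))
```

Because both factors are kept here, the transformation is only a rotation, so these values should match the cosine similarities computed in the original term space.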
Answer 4
Before the transformation: [plot not preserved]
After the transformation: [plot not preserved]
Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis). So we can get rid of the x-axis and still differentiate between them
Answer 5
The newly added values are a linear combination of the previous keywords (0.5 * k1 + 0.5 * k2).
It will have no impact upon the calculations.
It tells us that svd manages to find the real dimensionality of the data.
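A quick way to see that the extra keyword adds no new dimension: with three keywords and three documents the term-document matrix is square, and its determinant is zero when one row is a linear combination of the others. A zero determinant means at least one singular value is zero, so the data still has only two real dimensions. A minimal sketch (the name `det3` is illustrative):

```python
k1 = [0.25, 0.53, 0.75]
k2 = [0.73, 0.50, 0.23]
# New keyword: the average of the two existing keyword rows.
k3 = [0.5 * a + 0.5 * b for a, b in zip(k1, k2)]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


det = det3([k1, k2, k3])
print(det)  # zero up to floating-point rounding
```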
Thanks and Regards,
Sushovan De
additional office hours for mid-term: 4-5:30pm today (Monday)
Thursday, October 13, 2011
Precision-Recall question in Homework 1
In the Precision-Recall question of homework 1, the expected answer is this:
In the slides, the precision recall curve was plotted at specific recall values (0, 0.1, 0.2, etc.). Some of you misinterpreted that to mean “You should plot the precision-recall curve at only those points at which a relevant document is retrieved.” That is not true in general.
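To make the distinction concrete, here is a small sketch that computes a (recall, precision) point after every retrieved document, and then reads off precision at the recall levels 0, 0.1, ..., 1.0. The ranking is made up, and the rule used for the fixed recall levels (take the best precision at any recall at or above the level) is the usual interpolation convention, assumed here rather than taken from the slides.

```python
def pr_points(relevance, total_relevant):
    """(recall, precision) after each retrieved document, in rank order."""
    points = []
    hits = 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        points.append((hits / total_relevant, hits / rank))
    return points


def interpolated_precision(points, levels=11):
    """Precision at evenly spaced recall levels from 0 to 1.

    Assumed convention: best precision at any recall >= the level.
    """
    curve = []
    for i in range(levels):
        level = i / (levels - 1)
        candidates = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(candidates, default=0.0))
    return curve


# Hypothetical ranking: 1 = relevant, 0 = not relevant; 4 relevant in total.
ranking = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
points = pr_points(ranking, total_relevant=4)
curve = interpolated_precision(points)

for recall, precision in points:
    print(round(recall, 2), round(precision, 2))
print([round(p, 2) for p in curve])
```

Note that `points` has one entry per retrieved document, not only one per relevant document.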
Thanks and Regards,
Sushovan De
page synopses ==> page snippets..
- Writing a Synopsis
  charlottedillon.com/synopsis.html
  The synopsis. I've placed some great synopsis how-to links and books below. Don't miss the sample synopsis page link. Some Books to Help. Using the links ...
- Tips About How To Write a Synopsis from Fiction Writer's Connection
  www.fictionwriters.com/tips-synopsis.html
  Once you already have an agent and you are discussing future projects, you can present your ideas in this one-page synopsis format for your agent to look at ...
- How to Write a One-Page Synopsis or Treatment | eHow.com
  Feb 16, 2009 – How to Write a One-Page Synopsis or Treatment. If you are trying to sell a screenplay, you will often be asked to submit a "one-sheet" or one ...
Wednesday, October 12, 2011
On getting the eigen vector signs to work out while computing SVD by hand
Tuesday, October 11, 2011
Midterm coverage
Saturday, October 8, 2011
practice exam with solutions
Thursday, October 6, 2011
Paper Reading assignment: Due 10/13 (Updated the homework page too)
- Homework 3 (or think of it as Homework 2 part b). Due 10/13 in class. (We might discuss your answers)
- ("Reading Comprehension") Read the paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine", which contains a description of the Google search engine circa 1998 (i.e., before it became a company). Answer the following questions:
- What are Fancy hits?
- Why are there two types of barrels--the short and the long?
- How is indexing parallelized?
- How does Google show that it doesn't quite care about recall?
- How does Google avoid crawling the same URL multiple times?
- What are some of the memory saving things they do?
- Do they use TF/IDF directly or indirectly?
- Do they normalize the vectors? (why not?)
- Can they support proximity queries?
- How are "page synopses" made?
- List some of the parameters that need to be initialized for their search engine. Are the default values they pick reasonable?
Solutions for homework 2 are posted
Tuesday, October 4, 2011
(highly recommended) reading for next class: a chapter on map-reduce architectures..
A self-contained textbook chapter on just page-rank
Office hours shift again for today: 2:30--3:30
Monday, October 3, 2011
You may skip part 2 of the pagerank/a&h question on the homework..
on the self-link issue.. (qn 5 in homework)
So, for this problem, you have to consider these self-looping pages as non-sink pages (despite the obvious fact that you can't leave them ;-).
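As a rough illustration of what treating a self-looping page as a non-sink means, the sketch below runs a plain power iteration in which a page whose only link points to itself keeps passing its rank along that link, while a page with no links at all is treated as a sink. The three-page graph, the damping value 0.85, and the choice to spread a true sink's rank uniformly are all assumptions made for this example, not the homework's setup.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of page -> list of out-links."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p]
            if targets:
                # Any out-link, even only a self-link, makes p a non-sink:
                # its rank flows along its links as usual.
                for t in targets:
                    new[t] += damping * rank[p] / len(targets)
            else:
                # True sink (no out-links): spread its rank uniformly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical graph: C links only to itself.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

Under these assumptions the self-looping page ends up with most of the rank, since the only way out of it is the random jump.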