Monday, October 17, 2011

Homework 2 - Solution for Question 3

The solution for HW2 – Q3 was missing from the posted solutions. Here it is:

Answer 1

td =

0.2500 0.5300 0.7500
0.7300 0.5000 0.2300

To find the first term of the SVD, we find the eigenvectors of td * td'.
td * td' =

0.9059 0.6200
0.6200 0.8358

<Eigenvector calculation here>

The eigenvectors of td * td' are:

-0.7268
-0.6869

and

0.6869
-0.7268

The eigenvalues are 1.4918 and 0.2499.

[Note: the eigenvectors might also come out negated; that is a valid solution too.]
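For anyone who wants to check this step numerically, here is a small NumPy sketch (not part of the original hand calculation); the eigenvector signs it returns may differ from the ones above, which is fine:

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Eigen-decomposition of the symmetric 2x2 matrix td * td'
vals, vecs = np.linalg.eigh(td @ td.T)

# eigh returns eigenvalues in ascending order; flip to descending to match the text
order = np.argsort(vals)[::-1]
print(vals[order])       # approximately [1.4918, 0.2499]
print(vecs[:, order])    # columns are the eigenvectors (up to sign)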

Therefore, the first term of the SVD is:

-0.7268 0.6869
-0.6869 -0.7268

The second term of the SVD is a diagonal matrix containing the square roots of the eigenvalues found above:

1.2214 0 0
0 0.4999 0

To find the third term of the SVD, we find the eigenvectors of td' * td.

td' * td =

0.5954 0.4975 0.3554
0.4975 0.5309 0.5125
0.3554 0.5125 0.6154

<eigen computation>

The eigenvectors are:

First eigenvector:
0.5593
0.5965
0.5756

Second eigenvector:
-0.7179
0.0013
0.6962

Third eigenvector:
-0.4146
0.8026
-0.4290

Therefore, the third and last term in the SVD is:

-0.5593 0.7179 -0.4146
-0.5965 -0.0013 0.8026
-0.5756 -0.6962 -0.4290

<But we did not find the right signs for this matrix here, so we need to find it in a different way (see the recent mail sent by Dr. Rao)>
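One way to sidestep the sign problem is to get all three terms from a library call. This is a NumPy sketch, not the hand method the homework asks for; np.linalg.svd picks the signs consistently so that u * s * v' reproduces td:

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

u, s, vt = np.linalg.svd(td)   # s is approximately [1.2214, 0.4999]

# Rebuild the 2x3 singular-value matrix and verify the factorization
S = np.zeros_like(td)
np.fill_diagonal(S, s)
print(np.allclose(u @ S @ vt, td))   # True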

 Answer 2

After removing the lesser of the two singular values, your S matrix becomes:

1.2214 0 0
0      0 0

 

Then, to recreate the matrix, you multiply u * s * v':

0.4965 0.5296 0.5110
0.4692 0.5005 0.4829

Does it look close to the original? In my opinion, NO, it does not. We did eliminate a significant part of the variance.

(Also accept a YES answer if it argues that, with proper scaling, you would end up roughly where the original documents were.)
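For reference, a minimal sketch of the same rank-1 reconstruction, assuming u, s, vt come from np.linalg.svd as above:

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])
u, s, vt = np.linalg.svd(td)

# Keep only the larger singular value; the smaller one is zeroed out
td_rank1 = s[0] * np.outer(u[:, 0], vt[0, :])
print(td_rank1)   # approximately [[0.4965 0.5296 0.5110]
                  #                [0.4692 0.5005 0.4829]]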

Answer 3

The query vector is q = [1, 0]

In the factor space, it would be q * tf

qf = -0.7268 -0.6869

The documents in the factor space are:

D1: -0.6831 0.3589
D2: -0.7286 -0.0006
D3: -0.7031 -0.3481

The similarities are:
sim(q, D1) = 0.3239
sim(q, D2) = 0.7274
sim(q, D3) = 0.9561
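These numbers can be reproduced with a short cosine-similarity sketch that uses the factor-space coordinates listed above (with the signs as given in this solution):

import numpy as np

qf = np.array([-0.7268, -0.6869])
docs = {
    "D1": np.array([-0.6831,  0.3589]),
    "D2": np.array([-0.7286, -0.0006]),
    "D3": np.array([-0.7031, -0.3481]),
}

# Cosine similarity between the query and each document in the factor space
for name, d in docs.items():
    cos = qf @ d / (np.linalg.norm(qf) * np.linalg.norm(d))
    print(name, round(cos, 4))   # approximately 0.3239, 0.7274, 0.9561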

Answer 4

Before the transformation: <plot omitted>

After the transformation: <plot omitted>

Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis). So we can get rid of the x-axis and still differentiate between them.
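The plots themselves are not reproduced here, but a sketch along these lines would regenerate them, assuming the "before" picture shows each document as a point in the original two-keyword space and the "after" picture shows the factor-space coordinates from Answer 3:

import numpy as np
import matplotlib.pyplot as plt

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Before: each document is a point (weight on keyword 1, weight on keyword 2)
before = td.T

# After: the factor-space coordinates of D1, D2, D3 from Answer 3
after = np.array([[-0.6831,  0.3589],
                  [-0.7286, -0.0006],
                  [-0.7031, -0.3481]])

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(before[:, 0], before[:, 1])
ax1.set_title("Before the transformation")
ax2.scatter(after[:, 0], after[:, 1])
ax2.set_title("After the transformation")
plt.show()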

Answer 5

The new values added are a linear combination of the previous keywords (0.5 * k1 + 0.5 * k2).

It will have absolutely no impact on the calculations at all.

It tells us that SVD manages to find the real dimensionality of the data.
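A quick way to see this claim numerically (a sketch, assuming the new keyword is appended as a row equal to 0.5 * k1 + 0.5 * k2):

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Append a new keyword row that is a linear combination of the existing two
new_row = 0.5 * td[0] + 0.5 * td[1]
td_new = np.vstack([td, new_row])

print(np.linalg.matrix_rank(td_new))   # still 2
print(np.linalg.svd(td_new)[1])        # the third singular value is (numerically) zero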

 

Thanks and Regards,
Sushovan De
