The solution for HW2 – Q3 was missing from the posted solutions. Here it is:
Answer 1
td =
0.2500 0.5300 0.7500
0.7300 0.5000 0.2300
To find the first term of the SVD, we find the eigenvectors of td * td'.
td * td' =
0.9059 0.6200
0.6200 0.8358
<Eigen vector calculation here>
Eigenvectors of td * td':
-0.7268
-0.6869
And
0.6869
-0.7268
The eigenvalues are 1.4918 and 0.2499.
[Note: the eigenvectors may equally be negated; that is a valid solution too.]
Therefore, the first term of the SVD is
-0.7268 0.6869
-0.6869 -0.7268
The second term of the SVD has the square roots of the eigenvalues found above on its diagonal, padded to the 2x3 shape of td:
1.2214 0 0
0 0.4999 0
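For anyone who wants to verify these first two terms, here is a minimal numpy sketch (the variable names are mine, and the signs of the eigenvectors numpy returns may differ from the ones above):

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Eigendecomposition of the 2x2 Gram matrix td * td'
vals, vecs = np.linalg.eigh(td @ td.T)

# eigh returns eigenvalues in ascending order; reorder to descending
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

print(vals)   # ~ [1.4918, 0.2499]
print(vecs)   # columns are the eigenvectors (signs may be flipped)

# Second term of the SVD: square roots of the eigenvalues on the
# diagonal, padded to the 2x3 shape of td
S = np.zeros_like(td)
S[[0, 1], [0, 1]] = np.sqrt(vals)
print(S)      # ~ [[1.2214, 0, 0], [0, 0.4999, 0]]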
To find the third term of the SVD, we find the eigenvectors of td' * td.
td' * td =
0.5954 0.4975 0.3554
0.4975 0.5309 0.5125
0.3554 0.5125 0.6154
<eigen computation>
The eigenvectors are:
First eigenvector:
0.5593
0.5965
0.5756
Second eigenvector:
-0.7179
0.0013
0.6962
Third eigenvector:
-0.4146
0.8026
-0.4290
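These can be checked with the same recipe (a sketch; numpy may flip the sign of any individual eigenvector):

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Eigendecomposition of the 3x3 Gram matrix td' * td
vals3, vecs3 = np.linalg.eigh(td.T @ td)
order = np.argsort(vals3)[::-1]

print(vals3[order])     # ~ [1.4918, 0.2499, 0] -- same nonzero eigenvalues as td * td'
print(vecs3[:, order])  # columns are the three eigenvectors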
Therefore, the third and last term in the SVD (the matrix v, whose transpose v' appears in the product) is
-0.5593 0.7179 -0.4146
-0.5965 -0.0013 0.8026
-0.5756 -0.6962 -0.4290
<But we did not find the right signs for the matrix here, so we need to find this matrix in a different way (see the recent mail sent by Dr. Rao).>
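For reference, here is one standard way to get the signs right (a sketch; I am assuming this is essentially what the mail describes): since td = u * s * v', each column of v belonging to a nonzero singular value satisfies v_i = td' * u_i / sigma_i, so its sign is inherited directly from u_i.

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

vals, U = np.linalg.eigh(td @ td.T)
order = np.argsort(vals)[::-1]
sigma, U = np.sqrt(vals[order]), U[:, order]

# Columns of V that go with the nonzero singular values, with signs
# consistent with whatever signs U came with
V2 = td.T @ U / sigma
print(V2)
# The third column of V spans the null space of td; its sign is
# arbitrary and never matters, because sigma_3 = 0.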
Answer 2
After removing the smaller of the two singular values, your s matrix becomes:
1.2214 0 0
0 0 0
Then, to recreate the matrix, you multiply u * s * v':
0.4965 0.5296 0.5110
0.4692 0.5005 0.4829
Does it look close to the original? In my opinion, NO, it does not: we eliminated a significant part of the variance.
(A YES answer is also acceptable if it argues that, with proper scaling, you would end up roughly where the original documents were.)
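A quick numpy check of this rank-1 reconstruction:

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

U, s, Vt = np.linalg.svd(td)

# Keep only the largest singular value and rebuild the matrix
td_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
print(td_rank1)
# ~ [[0.4965, 0.5296, 0.5110],
#    [0.4692, 0.5005, 0.4829]]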
Answer 3
The query vector is q = [1, 0].
In the factor space, it becomes qf = q * tf:
qf = -0.7268 -0.6869
(The signs follow the same eigenvector sign convention used for the document coordinates below, so the similarity values do not depend on the sign choice.)
The documents in the factor space are:
D1: -0.6831 0.3589
D2: -0.7286 -0.0006
D3: -0.7031 -0.3481
The cosine similarities are:
sim(q,D1) = 0.3239
sim(q,D2) = 0.7274
sim(q,D3) = 0.9561
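These values can be reproduced as follows (a numpy sketch; the eigenvector sign ambiguity cancels out in the cosine, so the similarities come out the same either way):

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

U, s, Vt = np.linalg.svd(td)   # U is 2x2, s holds the two singular values

q = np.array([1.0, 0.0])
qf = q @ U             # the query folded into the factor space
docs = Vt[:2].T * s    # document coordinates in the factor space, one per row

for i, d in enumerate(docs, start=1):
    sim = (d @ qf) / (np.linalg.norm(d) * np.linalg.norm(qf))
    print(f"sim(q,D{i}) = {sim:.4f}")   # ~ 0.3239, 0.7274, 0.9561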
Answer 4
Before the transformation: [figure omitted]
After the transformation: [figure omitted]
Clearly, after the transformation the documents are easily separable using only one axis (the y-axis), so we can get rid of the x-axis and still differentiate between them.
Answer 5
The newly added value is a linear combination of the previous keywords: 0.5 * k1 + 0.5 * k2.
It will have no real impact on the calculations: the new row is linearly dependent on the existing ones, so it adds no new dimension to the data.
It tells us that the SVD manages to find the real dimensionality of the data, as the sketch below shows.
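A quick way to see this (a numpy sketch; k3 is my name for the new keyword row):

import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# The new keyword row is a linear combination of the existing two
k3 = 0.5 * td[0] + 0.5 * td[1]
td_new = np.vstack([td, k3])

print(np.linalg.svd(td_new, compute_uv=False))
# The third singular value is ~0: the data is still intrinsically
# two-dimensional, and the SVD finds that real dimensionality.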
Thanks and Regards,
Sushovan De