The solution for HW2 – Q3 was missing from the solutions. Here it is:

# Answer 1

td =

    0.2500    0.5300    0.7500
    0.7300    0.5000    0.2300

To find the first term of the SVD (the matrix U in td = U * S * V'), we find the eigenvectors of td * td'.

td * td' =

    0.9059    0.6200
    0.6200    0.8358
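As a quick sanity check, the same product in numpy (variable names are mine):

```python
import numpy as np

# Term-document matrix from the problem (2 terms x 3 documents).
td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# The left factor U of the SVD comes from the eigenvectors of td * td'.
tt = td @ td.T
print(np.round(tt, 4))
```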

<Eigen vector calculation here>

Eigenvectors of td * td':

    -0.7268
    -0.6869

and

     0.6869
    -0.7268

The eigenvalues are 1.4918 and 0.2499.

[Note: either eigenvector may also be negated as a whole; that is a valid solution too.]
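The eigen-decomposition can be verified with numpy; note that `eigh` returns eigenvalues in ascending order, and the signs of the eigenvectors are arbitrary:

```python
import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Eigen-decomposition of the symmetric matrix td * td'.
vals, vecs = np.linalg.eigh(td @ td.T)
print(np.round(vals, 4))   # eigenvalues, in ascending order
print(np.round(vecs, 4))   # eigenvectors as columns (signs are arbitrary)
```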

Therefore, the first term of the SVD is

    -0.7268    0.6869
    -0.6869   -0.7268

The second term of the SVD is S, a 2x3 matrix with the square roots of the eigenvalues found above on its diagonal:

    1.2214    0         0
    0         0.4999    0
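In numpy terms, S is built with the same 2x3 shape as td (a sketch, hard-coding the eigenvalues found above):

```python
import numpy as np

# Eigenvalues of td * td' found above, largest first.
eigvals = np.array([1.4918, 0.2499])

# Singular values are the square roots of the eigenvalues; S has the
# same 2x3 shape as td, with the singular values on the diagonal.
s = np.zeros((2, 3))
s[[0, 1], [0, 1]] = np.sqrt(eigvals)
print(np.round(s, 4))
```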

To find the third term of the SVD, we find the eigenvectors of td' * td.

td' * td =

    0.5954    0.4975    0.3554
    0.4975    0.5309    0.5125
    0.3554    0.5125    0.6154

<eigen computation>

The eigenvectors are:

First eigenvector:

     0.5593
     0.5965
     0.5756

Second eigenvector:

    -0.7179
     0.0013
     0.6962

Third eigenvector:

    -0.4146
     0.8026
    -0.4290

Therefore, the third and last term in the SVD is

    -0.5593    0.7179   -0.4146
    -0.5965   -0.0013    0.8026
    -0.5756   -0.6962   -0.4290

<But the eigenvector computation does not determine the right signs for this matrix, so we need to find it in a different way; see the recent mail sent by Dr. Rao.>
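One way around the sign problem (not necessarily the method in Dr. Rao's mail) is to recover each v_i directly from u_i via v_i = td' * u_i / sigma_i, or simply to let a library compute the whole SVD at once so the signs come out mutually consistent. A numpy sketch:

```python
import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# Computing U, S, V' in one call keeps their signs mutually consistent.
u, sv, vt = np.linalg.svd(td, full_matrices=False)   # u: 2x2, vt: 2x3
print(np.allclose(u @ np.diag(sv) @ vt, td))         # True: exact reconstruction

# Equivalently, v_i = td' * u_i / sigma_i pins down the signs of V'.
vt_alt = (td.T @ u / sv).T
print(np.allclose(vt_alt, vt))                       # True: same signs
```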

# Answer 2

After removing the lesser of the two singular values (equivalently, the lesser eigenvalue), your S matrix becomes:

    1.2214    0    0
    0         0    0

Then, to recreate the matrix, you multiply u * s * v':

    0.4965    0.5296    0.5110
    0.4692    0.5005    0.4829

Does it look close to the original? In my opinion, NO, it does not. We did eliminate a significant part of the variance.

(Also accept a YES answer if it claims that if you scale it properly, you would end up sort of where the original documents were.)
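For reference, the rank-1 reconstruction can be reproduced with numpy; the product sigma_1 * u_1 * v_1' is the same under either sign convention, since flipping u_1 also flips v_1:

```python
import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])
u, sv, vt = np.linalg.svd(td)

# Keep only the largest singular value: the rank-1 approximation of td.
td_rank1 = sv[0] * np.outer(u[:, 0], vt[0, :])
print(np.round(td_rank1, 4))
```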

# Answer 3

The query vector is q = [1, 0]

In the factor space, the query is projected onto the factors, qf = tf' * q (using the sign-consistent tf that goes with the V matrix above; this is where the sign issue noted in Answer 1 matters):

    qf = -0.7268   -0.6869

The documents in the factor space are:

    D1:  -0.6831    0.3589
    D2:  -0.7286   -0.0006
    D3:  -0.7031   -0.3481

The similarities are:

    sim(q, D1) = 0.3239
    sim(q, D2) = 0.7274
    sim(q, D3) = 0.9561
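Answer 3 can be checked end-to-end with numpy; the cosine similarity makes the sign convention irrelevant here, because U and V' come from the same SVD call:

```python
import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])
u, sv, vt = np.linalg.svd(td, full_matrices=False)

# Query for term 1 only, projected into the factor space: qf = U' q.
q = np.array([1.0, 0.0])
qf = u.T @ q

# Documents in the factor space: row j of V scaled by the singular values.
docs = (sv[:, None] * vt).T        # shape (3, 2): one row per document

sims = [d @ qf / (np.linalg.norm(d) * np.linalg.norm(qf)) for d in docs]
print(np.round(sims, 4))
```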

# Answer 4

Before the transformation: <plot omitted>

After the transformation: <plot omitted>

Clearly, after the transformation, the documents are easily separable using only one axis (the y-axis), so we can drop the x-axis and still differentiate between them.

# Answer 5

The new keyword added is a linear combination of the previous keywords, (0.5 * k1 + 0.5 * k2).

It has no impact on the result: the SVD still finds only two nonzero singular values.

This tells us that SVD manages to find the real dimensionality of the data.
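A quick numpy check of this claim (the extra keyword row is constructed per the problem statement):

```python
import numpy as np

td = np.array([[0.25, 0.53, 0.75],
               [0.73, 0.50, 0.23]])

# New keyword row k3 = 0.5*k1 + 0.5*k2: a linear combination of existing rows.
td_new = np.vstack([td, 0.5 * td[0] + 0.5 * td[1]])

sv = np.linalg.svd(td_new, compute_uv=False)
print(np.round(sv, 4))                   # the third singular value is ~0
print(np.linalg.matrix_rank(td_new))     # rank is still 2
```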

Thanks and Regards,

Sushovan De
