function be considered a similarity function or whether there are some minimum desiderata.
You should keep in mind that similarity is really the flip side of "distance": the more distant two
objects are from each other, the less similar they are.
This brings us to desirable properties of distance (and thus similarity) metrics.
The typical required properties of a distance metric are given below (from the Wikipedia entry).
You will want to check whether each of the metrics we have seen so far (Jaccard, Euclidean, cosine-theta) satisfies all these properties.
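To make the exercise concrete, the three candidate distances can be sketched as below (a sketch, not definitive code; it assumes the usual conventions that Jaccard and cosine *distances* are 1 minus the corresponding similarities, and that the Jaccard inputs are sets):

```python
import math

def jaccard_distance(a, b):
    """1 - Jaccard similarity; a and b are sets (assumed convention)."""
    if not a and not b:
        return 0.0  # two empty sets are taken as identical
    return 1.0 - len(a & b) / len(a | b)

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def cosine_distance(x, y):
    """1 - cosine similarity between two nonzero vectors (assumed convention)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / (nx * ny)
```

You can then test each axiom against these definitions by hand or with a few sample points.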
In the literature, people do consider "distance" functions that do not satisfy all these properties. Typically, the first one to be sacrificed is
the triangle inequality; sometimes even the symmetry property is given up. An example of an asymmetric "distance" between two probability distributions
is the KL-divergence (see http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence )
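The asymmetry is easy to see numerically. Here is a minimal sketch of discrete KL-divergence (it assumes both distributions are given as aligned probability lists, and that q is nonzero wherever p is):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as aligned lists.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]
q = [0.5, 0.5]

# KL(P || Q) ≈ 0.368 but KL(Q || P) ≈ 0.511: the two directions disagree,
# so KL-divergence violates the symmetry axiom.
```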
A metric on a set X is a function (called the distance function or simply distance)
d : X × X → R
(where R is the set of real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:
- d(x, y) ≥ 0 (non-negativity)
- d(x, y) = 0 if and only if x = y (identity of indiscernibles. Note that conditions 1 and 2 together give positive definiteness)
- d(x, y) = d(y, x) (symmetry)
- d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality).
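One way to explore these four conditions is a brute-force check over a finite sample of points. The sketch below (helper names are my own) can *refute* an axiom by finding a counterexample, but passing it only suggests, never proves, that the axiom holds in general:

```python
import itertools
import math

def check_metric_properties(d, points, tol=1e-9):
    """Test the four metric axioms for distance function d over sample points.
    A False entry means a counterexample was found in the sample."""
    results = {"non-negativity": True, "identity": True,
               "symmetry": True, "triangle": True}
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < -tol:
            results["non-negativity"] = False
        if x == y and abs(d(x, y)) > tol:
            results["identity"] = False
        if x != y and abs(d(x, y)) <= tol:
            results["identity"] = False
        if abs(d(x, y) - d(y, x)) > tol:
            results["symmetry"] = False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            results["triangle"] = False
    return results

# Example: Euclidean distance on a handful of 2-D sample points.
def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
results = check_metric_properties(euclidean, sample)
```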
Rao
I checked whether the Jaccard, Euclidean, and cosine-theta metrics satisfy the four required conditions above. Following are my observations:
Property                              Jaccard   Euclidean   Cosine-theta
d(x, y) ≥ 0                           Yes       Yes         Yes
d(x, y) = 0 if and only if x = y      No        Yes         No
d(x, y) = d(y, x) (symmetry)          Yes       Yes         Yes
d(x, z) ≤ d(x, y) + d(y, z)           No        Yes         No
Please let me know if there is a mistake.
-Rashmi Dubey