Write a short note on vector based matching.

Vector-Based Matching

Boolean retrieval systems cannot make use of term frequency data. Beyond this limitation, other deficiencies of the Boolean retrieval model have led to the development of alternative models of retrieval. The vector model-based SMART system is one of them.

In the vector model of retrieval, relevance or similarity is measured by either a distance or an angle. With a distance measure, documents that lie close together in the vector space are likely to be highly similar; with an angular measure, documents that point 'in the same direction' are closely related. The SMART system primarily uses an angular measure.
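The difference between the two kinds of measure can be illustrated with a minimal sketch. It assumes documents and queries are plain lists of term weights, uses Euclidean distance as the distance measure, and uses the cosine of the angle as the angular measure. The function names and sample vectors are hypothetical and are not taken from the SMART system.

```python
import math

def euclidean_distance(u, v):
    """Distance measure: small values mean the vectors lie close together."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Angular measure: values near 1 mean the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat an empty vector as unrelated
    return dot / (norm_u * norm_v)

# Hypothetical term-weight vectors over a three-term vocabulary.
query = [1.0, 2.0, 0.0]
doc_same_direction = [2.0, 4.0, 0.0]  # same term proportions, twice as long
doc_nearby = [1.0, 2.0, 1.0]          # close in space, slightly different direction

for name, doc in [("same direction", doc_same_direction), ("nearby", doc_nearby)]:
    print(name, euclidean_distance(query, doc), cosine_similarity(query, doc))
```

In this example the distance measure ranks `doc_nearby` first, while the angular measure ranks `doc_same_direction` first, because scaling a vector changes its position but not its direction.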

1. Metrics

Metrics are dissimilarity measures. Every metric assigns a distance of 0 between a document and itself, which makes metrics unsuitable as similarity measures, where high values represent similar documents. Metrics are therefore referred to as dissimilarity measures. A metric can be converted into a similarity measure by applying a transformation. However, if one is willing to accept 0 as the measure of maximum similarity, no transformation is needed.

Converting a metric into a similarity measure linearly makes the result dependent on the value of the chosen constant. For example, if a metric $\mu$ is transformed linearly into a similarity measure $\sigma$, then $\sigma$ is defined as

$\sigma = k - \mu$

where k is a constant and has a fixed value.

If the two documents are identical, then $\mu=0$ and the similarity value is k. Whatever the value of k, $\sigma$ becomes negative for documents at a distance greater than k from the query. This can be interpreted as follows: a document with positive similarity is relevant to the query, while a document with negative similarity is not. The measure therefore depends on the value of the constant k, which is not desirable.
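The dependence on k can be seen in a short sketch. The function name `linear_similarity`, the sample distance, and the two values of k are assumptions chosen only for illustration.

```python
def linear_similarity(mu, k):
    """Linear transform of a distance mu into a similarity: sigma = k - mu."""
    return k - mu

distance = 3.0  # hypothetical distance between a document and the query

for k in (2.0, 5.0):
    sigma = linear_similarity(distance, k)
    verdict = "relevant" if sigma > 0 else "not relevant"
    print(f"k={k}: sigma={sigma} -> {verdict}")
```

The same document at the same distance is judged not relevant when k is 2 and relevant when k is 5, so the outcome is decided by the arbitrary constant rather than by the document.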

Another way to transform a metric into a similarity measure is to use an inversion transform that maps the distance into a fixed positive range of numbers, say (0,1], that is, values greater than 0 and less than or equal to 1. A simple transformation of this kind is

$\sigma=b^{-\mu}$

where b has a fixed value such that $b\gt1$, for example $b=2$ or $b=e$, the base of the natural logarithms. This transform has a sharp peak at $\mu=0$ and slopes gradually away towards 0 as $\mu$ becomes larger.

More complex transforms with different properties include

$\sigma=b^{-\mu^2}$

Transforms of this kind provide a measure that is flat at $\mu=0$ and decreases gradually towards 0 as $\mu$ increases.

Figure 1: Similarity calculated from (a) distance $\sigma=b^{-\mu}$ and (b) distance squared $\sigma=b^{-\mu^2}$
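The two inversion transforms can be compared numerically with a minimal sketch. It assumes $b=e$ by default; the function names and the sample distances are hypothetical.

```python
import math

def exp_similarity(mu, b=math.e):
    """sigma = b ** (-mu): sharp peak at mu = 0."""
    return b ** (-mu)

def exp_squared_similarity(mu, b=math.e):
    """sigma = b ** (-mu ** 2): flat near mu = 0, then falls off faster."""
    return b ** (-(mu ** 2))

# Hypothetical sample distances; both transforms stay within (0, 1].
for mu in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(f"mu={mu}: {exp_similarity(mu):.4f}  {exp_squared_similarity(mu):.4f}")
```

For distances below 1 the squared form stays closer to 1, which corresponds to the flat top in (b), while for distances above 1 it drops towards 0 more quickly than the simple form in (a).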