Explain Learning Vector Quantization Algorithm?

Mumbai University > Computer Engineering > Sem 7 > Soft Computing

Marks: 10

Year: Dec 2015

1 Answer

One of the most frequently used unsupervised clustering algorithms is the learning vector quantizer (LVQ), developed by Kohonen. Several versions of LVQ exist; this answer describes LVQ-I.

Ripley defined clustering algorithms as those algorithms where the purpose is to divide a set of $n$ observations into $m$ groups such that members of the same group are more alike than members of different groups. The aim of a clustering algorithm is therefore to construct clusters of similar input vectors (patterns), where similarity is usually measured in terms of Euclidean distance. LVQ-I performs such clustering.

The training process of LVQ-I is based on competition. Referring to the network architecture in Figure 4.2(b), each output unit $o_k$ represents a single cluster, and the competition is among the cluster output units. During training, the cluster unit whose weight vector is "closest" to the current input pattern is declared the winner. The corresponding weight vector, and those of neighboring units, are then adjusted to better resemble the input pattern. The "closeness" of an input pattern to a weight vector is usually measured using the Euclidean distance. The weight update is given as

$$\Delta u_{ki}(t)=\begin{cases}\eta(t)\,[z_{i,p}-u_{ki}(t-1)] & \text{if } o_k \in \kappa_{k,p}(t)\\[2pt]0 & \text{otherwise}\end{cases}\tag{4.19}$$

where $\eta(t)$ is a decaying learning rate, and $\kappa_{k,p}(t)$ is the set of neighbors of the winning cluster unit $o_k$ for pattern $p$. It is not strictly necessary that LVQ-I make use of a neighborhood function; in that case, only the weights of the winning output unit are updated.
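As a minimal NumPy sketch of the update in equation (4.19) (the function name and array layout are illustrative assumptions, not part of the original text):

```python
import numpy as np

def lvq1_update(U, z, neighbors, eta):
    """Apply the LVQ-I weight update of equation (4.19).

    U         : (K, I) weight matrix, one row u_k per cluster output unit
    z         : (I,) current input pattern z_p
    neighbors : indices of units in the neighborhood kappa_{k,p}(t),
                including the winning unit itself
    eta       : current value of the decaying learning rate eta(t)
    """
    for k in neighbors:
        # Delta u_ki = eta(t) * [z_ip - u_ki(t-1)]; all other weights unchanged
        U[k] += eta * (z - U[k])
    return U
```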

*[Figure: LVQ-I network architecture, in which each output unit $o_k$ represents one cluster]*

An illustration of clustering, as done by LVQ-I, is given in Figure 4.2. The input space, defined by the two input units $z_1$ and $z_2$, is represented in Figure 4.2(a), while Figure 4.2(b) illustrates the LVQ-I network architecture required to form the clusters. Note that although only three classes exist, four output units are necessary – one for each cluster. Fewer output units would lead to errors, since patterns of different classes would be grouped in the same cluster, while too many clusters may cause overfitting. For the problem illustrated in Figure 4.2(a), an additional cluster unit may cause a separate cluster to learn the single $\times$ in cluster 4.

For LVQ-I, weights are initialized either to random values sampled from a uniform distribution, or by taking the first input patterns as the initial weight vectors. For the example in Figure 4.2(b), the latter results in the weights $u_{11} = z_{1,1},\ u_{12} = z_{2,1},\ u_{21} = z_{1,2},\ u_{22} = z_{2,2}$, etc.

The Kohonen LVQ-I algorithm is summarized in Algorithm 4.2 below; a code sketch follows it.

**Algorithm 4.2: Learning Vector Quantizer-I Training Algorithm**

1. Initialize the network weights, the learning rate, and the neighborhood radius.
2. While stopping condition(s) are not true:
   1. For each pattern $p$:
      1. Compute the Euclidean distance $d_{k,p}$ between input vector $z_p$ and each weight vector $u_k=(u_{k1},u_{k2},\cdots,u_{kI})$ as $$d_{k,p}(z_p,u_k)=\sqrt{\sum^I_{i=1}(z_{i,p}-u_{ki})^2}\tag{4.20}$$
      2. Find the output unit $o_k$ for which the distance $d_{k,p}$ is the smallest.
      3. Update all the weights in the neighborhood $\kappa_{k,p}$ using equation (4.19).
   2. Update the learning rate.
   3. Reduce the neighborhood radius at specified learning iterations.
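As an illustrative sketch, Algorithm 4.2 might be implemented in NumPy as follows, assuming a one-dimensional (linear) arrangement of cluster units, first-pattern weight initialization, a linearly decaying learning rate, and a fixed epoch budget as the stopping condition; the function name, decay schedules, and parameter defaults are assumptions, not prescribed by the algorithm:

```python
import numpy as np

def train_lvq1(Z, K, eta0=0.1, radius0=1, max_epochs=100):
    """Illustrative LVQ-I training loop (Algorithm 4.2).

    Z : (P, I) array of training patterns, with P >= K
    K : number of cluster output units
    """
    U = Z[:K].astype(float)              # first K patterns as initial weights
    radius = radius0
    for epoch in range(max_epochs):      # stopping condition: epoch budget
        eta = eta0 * (1.0 - epoch / max_epochs)   # decaying learning rate
        for z in Z:
            # Euclidean distance d_{k,p} to every weight vector (eq. 4.20)
            d = np.sqrt(((z - U) ** 2).sum(axis=1))
            k_win = int(np.argmin(d))    # winning output unit o_k
            # neighborhood kappa_{k,p}: units within `radius` of the winner
            lo, hi = max(0, k_win - radius), min(K, k_win + radius + 1)
            # weight update of equation (4.19)
            U[lo:hi] += eta * (z - U[lo:hi])
        if radius > 0 and (epoch + 1) % 10 == 0:
            radius -= 1                  # shrink neighborhood periodically
    return U
```

A call like `U = train_lvq1(Z, K=4)` would return one weight vector per cluster unit, matching the four output units of Figure 4.2(b).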

Stopping conditions may be

  • a maximum number of epochs is reached,
  • weight adjustments are sufficiently small,
  • a small enough quantization error has been reached, where the quantization error is defined as

$$Q_T=\frac{\sum^{P_T}_{p=1}\|z_p-u_k\|^2_2}{P_T}\tag{4.21}$$

where $u_k$ is the weight vector of the cluster unit that wins pattern $z_p$, and $P_T$ is the total number of training patterns.
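As a sketch (function and variable names are illustrative), the quantization error of equation (4.21) can be computed by assigning each pattern to its closest weight vector:

```python
import numpy as np

def quantization_error(Z, U):
    """Quantization error Q_T of equation (4.21).

    Z : (P_T, I) training patterns; U : (K, I) cluster weight vectors.
    """
    # squared Euclidean distance of every pattern to every unit, shape (P_T, K)
    d2 = ((Z[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
    # keep each pattern's winning (smallest) squared distance, then average
    return d2.min(axis=1).mean()
```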

One problem that may occur in LVQ networks is that one cluster unit comes to dominate as the winning unit, in which case most patterns end up in a single cluster. To prevent one output unit from dominating, a "conscience" factor is incorporated into the function that determines the winning output unit. The conscience factor penalizes an output unit for winning too many times. The activation value of the output units is calculated using

$$o_{k,p}=\begin{cases}1 & \text{if } d_{k,p}(z_p,u_k)-b_k(t)=\min\limits_{\forall k}\{d_{k,p}(z_p,u_k)-b_k(t)\}\\[2pt]0 & \text{otherwise}\end{cases}\tag{4.22}$$

where

$$b_k(t)=\gamma\left(\frac{1}{I}-g_k(t)\right)\tag{4.23}$$

and

$$g_k(t)=g_k(t-1)+\beta\,(o_{k,p}-g_k(t-1))\tag{4.24}$$

In the above, $d_{k,p}$ is the Euclidean distance as defined in equation (4.20), $I$ is the total number of input units, and $g_k(0) = 0$. Thus, $b_k(0) = \gamma/I$, which initially gives each output unit an equal chance to be the winner; $b_k(t)$ is the conscience factor defined for each output unit. The more often an output unit wins, the larger $g_k(t)$ becomes and the more negative $b_k(t)$ becomes; in effect, a penalty $|b_k(t)|$ is added to the distance $d_{k,p}$. Usually, for normalized inputs, $\beta = 0.0001$ and $\gamma = 10$.
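A minimal sketch of one conscience-based winner selection step, combining equations (4.22)–(4.24) under the reconstruction of $b_k(t)$ given above; the function and variable names are assumptions for illustration:

```python
import numpy as np

def conscience_winner(z, U, g, beta=0.0001, gamma=10.0):
    """One winner-selection step with a conscience factor (eqs. 4.22-4.24).

    z : (I,) input pattern; U : (K, I) weight vectors;
    g : (K,) running win frequencies g_k, initialized to zeros.
    """
    K, I = U.shape
    d = np.sqrt(((z - U) ** 2).sum(axis=1))   # d_{k,p}, equation (4.20)
    b = gamma * (1.0 / I - g)                 # conscience factor, eq. (4.23)
    k_win = int(np.argmin(d - b))             # biased competition, eq. (4.22)
    o = np.zeros(K)
    o[k_win] = 1.0                            # o_{k,p} = 1 for the winner
    g += beta * (o - g)                       # update win frequencies, eq. (4.24)
    return k_win, g
```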
