Explain dynamic time-warping with regard to speech recognition



Dynamic Time Warping (DTW):

(i) We have to find the best possible warping of the time axis of one or both sequences so that they can be compared optimally. The criterion used is the minimization of the global error. The problem is formulated as a sequential optimization strategy in which the current estimate of the global error function is updated at each step.

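(ii) Consider the problem of template matching for speech recognition, where a stored reference template is compared with an input utterance that may have been spoken at a different rate. The warping function is required to minimize the overall cost function.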

Plot of linear time warping

(iii) The major optimizations to the DTW algorithm arise from observations on the nature of good paths through the grid.

(iv) The optimization of the DTW algorithm is based on several conditions, which can be summarized as follows:

a) Monotonic condition - The function must be monotonic. The path is not allowed to turn back on itself: both the i and j indices either stay the same or increase; they never decrease.

b) Continuity condition - The function must not skip. The path advances one step at a time.

c) Boundary condition - The function must match the end points of the two templates. The path starts at the bottom left and ends at the top right.

d) Adjustment window condition - A good path is unlikely to wander very far from the diagonal. The distance that the path is allowed to wander is the window length r.

e) Slope constraint - The path should not be too steep or too shallow.
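The first four conditions can be checked mechanically for a candidate path. The following is a minimal sketch, assuming the path is given as a list of 1-based (i, j) pairs; the function name `check_warping_path` and the parameter `r` are illustrative, and the window is measured from the line i = j, which is only one possible reading of "the diagonal" when the templates differ in length. The slope constraint is not checked here.

```python
def check_warping_path(path, n, m, r=None):
    """Return the sorted names of the conditions that a warping path violates.

    path: list of (i, j) pairs, 1-based as in the text.
    n, m: lengths of the two templates.
    r:    optional adjustment-window length (illustrative parameter).
    """
    violations = set()

    # Boundary condition: start at the bottom left, end at the top right.
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        violations.add("boundary")

    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        di, dj = i1 - i0, j1 - j0
        # Monotonic condition: neither index may decrease.
        if di < 0 or dj < 0:
            violations.add("monotonic")
        # Continuity condition: no index may jump by more than one step.
        if di > 1 or dj > 1:
            violations.add("continuity")

    # Adjustment window (assumption: distance is measured from the line i = j).
    if r is not None and any(abs(i - j) > r for i, j in path):
        violations.add("window")

    return sorted(violations)


print(check_warping_path([(1, 1), (2, 2), (3, 3)], 3, 3))
print(check_warping_path([(1, 1), (3, 2), (3, 3)], 3, 3))
```

An empty list means the path satisfies all of the checked conditions.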

The parallelogram defined by the slope limits in time warping

(v) The search space is restricted to a parallelogram:

a) Start with the first point in each template. Compute the distance between them, which is the cost function.

b) Now move on to the next point in the shorter template and find its cost with the next point in the longer template, and then with the point after that in the longer template. Find the minimum cost, starting from point (1,1). The next point in the warping function is fixed at the minimum-cost point.

c) Go on traversing in the forward direction for each point in the shorter template until the end points of both templates meet.
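To illustrate how a slope limit turns the search space into a parallelogram, here is a rough sketch. The answer does not state the slope limits, so the bounds used here (a maximum slope of 2 and a minimum slope of 1/2, measured both from the start point and back from the end point) are an assumption, and the names `in_parallelogram`, `allowed_cells` and `max_slope` are illustrative.

```python
def in_parallelogram(i, j, n, m, max_slope=2.0):
    """True if grid cell (i, j), 1-based, lies inside the slope-limited region.

    n, m: lengths of the two templates (i runs to n, j runs to m).
    max_slope: assumed steepest allowed slope; the shallowest is 1 / max_slope.
    """
    min_slope = 1.0 / max_slope
    # Reachable from the start point (1, 1) without breaking the slope limits.
    from_start = min_slope * (i - 1) <= (j - 1) <= max_slope * (i - 1)
    # The end point (n, m) is still reachable without breaking the slope limits.
    to_end = min_slope * (n - i) <= (m - j) <= max_slope * (n - i)
    return from_start and to_end


def allowed_cells(n, m, max_slope=2.0):
    """List every grid cell that the search is allowed to visit."""
    return [(i, j)
            for i in range(1, n + 1)
            for j in range(1, m + 1)
            if in_parallelogram(i, j, n, m, max_slope)]


cells = allowed_cells(5, 5)
print(len(cells), "of", 5 * 5, "cells need to be evaluated")
```

Cells outside the parallelogram never need a local distance computed, which is where the saving comes from.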

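(vi) The global distance D(i,j) is computed as: D(i,j) = min[D(i-1,j-1), D(i-1,j), D(i,j-1)] + d(i,j), where d(i,j) is the local distance between point i of one template and point j of the other. Since the path starts at (1,1) and never moves backwards, each point is reached from one of its three predecessors.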
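The global distance computation can be sketched as a small dynamic-programming routine. This is a minimal illustration, not a production recognizer: it assumes the path starts at (1,1), so each cell is reached from its already-computed neighbours at (i-1,j-1), (i-1,j) and (i,j-1); it uses single numbers as frames and the absolute difference as the local distance d(i,j), whereas real speech frames are feature vectors. The name `dtw_distance` and the optional window parameter `r` are illustrative.

```python
import math


def dtw_distance(a, b, r=None):
    """Global DTW distance between two sequences of numbers.

    a, b: the two templates (lists of numbers standing in for acoustic frames).
    r:    optional adjustment-window length; cells with |i - j| > r are skipped.
    """
    n, m = len(a), len(b)
    # Extra row and column of infinities so every predecessor lookup is valid.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # virtual start cell just before (1, 1)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if r is not None and abs(i - j) > r:
                continue  # outside the adjustment window, leave as infinity
            d = abs(a[i - 1] - b[j - 1])  # local distance d(i, j)
            # Cheapest of the three allowed predecessors, plus the local cost.
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    return D[n][m]


# The second sequence repeats one frame, so warping aligns them at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# With r=0 only the diagonal is allowed, so no warping is possible.
print(dtw_distance([1, 2, 3], [2, 3, 4], r=0))
```

In template matching, the input utterance would be compared against every stored template in this way and assigned to the one with the smallest global distance.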

The DTW algorithm is able to achieve the following goals:

(i) The temporal integration of local distances between the acoustic frames can be effectively established.

(ii) Time variations in speech sounds are normalized.

(iii) In the case of continuous speech, the approach effectively segments the speech and there is no need for any explicit segmentation.

There are a number of limitations of the DTW-based approach to sequence recognition:

(i) Comparison of templates requires end point detection, and this can be error-prone under realistic acoustic conditions.

(ii) A strong mathematical structure is required for global distance computation and minimization.

(iii) Since continuous speech is not just a concatenation of words, a mechanism to represent the context is required.

Process of DTW for matching two templates
