Explain, with the related equations: (a) short-time energy, (b) short-time zero-crossing rate

Subject: Speech Processing

Topic: Speech Analysis in Time Domain

Difficulty: Low

1 Answer

(i) Short-time processing techniques (in both time and frequency) produce parameter signals of the form

$ U(n) = \sum_{m=-\infty}^{\infty} T[S(m)] W(n-m) $ .....(1)

where, S(n): Speech signal

T[·]: Transformation (linear or nonlinear) applied to the speech signal

U(n): Resulting short-time parameter signal

W(n): Window function

U(n) corresponds to the short-time energy if T in eq. (1) is the squaring operation, and to the short-time average magnitude if T is the absolute-magnitude operation:

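$ \begin{aligned} E(n) &= \sum_{m=-\infty}^{\infty} [S(m)W(n-m)]^2 \\ &= \sum_{m=-\infty}^{\infty} S^2(m)W^2(n-m) \\ &= \sum_{m=-\infty}^{\infty} S^2(m)\,h(n-m), \quad h(n) = W^2(n) \\ &= S^2(n) * h(n) \end{aligned} $

$ \begin{aligned} M(n) &= \sum_{m=-\infty}^{\infty} |S(m)|\,W(n-m) \\ &= |S(n)| * W(n) \end{aligned} $

where $*$ denotes convolution.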

Short-time Energy Measurement

(ii) Squaring the signal to calculate energy emphasizes high amplitudes.

(iii) Magnitude (amplitude) measurements do not emphasize high amplitudes as strongly and are simpler to calculate.

(iv) Such measurements help segment speech into smaller phonetic units.

(v) Voiced and unvoiced speech can be told apart because of the large difference in amplitude between them. The amplitude of unvoiced segments is not as high as the amplitude of voiced segments.

Short-time energy, short-time average magnitude
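The energy and magnitude sums above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the toy signal, and the rectangular window of length N = 4 are assumptions made for the example, not part of the original answer.

```python
def short_time_energy(s, w):
    """E(n) = sum over m of [s(m) * w(n - m)]^2, for each sample index n."""
    N = len(w)
    out = []
    for n in range(len(s)):
        # w(n - m) is nonzero only for n - N + 1 <= m <= n
        lo = max(0, n - N + 1)
        out.append(sum((s[m] * w[n - m]) ** 2 for m in range(lo, n + 1)))
    return out


def short_time_magnitude(s, w):
    """M(n) = sum over m of |s(m)| * w(n - m), for each sample index n."""
    N = len(w)
    out = []
    for n in range(len(s)):
        lo = max(0, n - N + 1)
        out.append(sum(abs(s[m]) * w[n - m] for m in range(lo, n + 1)))
    return out


# Toy signal (made up for illustration): a low-amplitude "unvoiced-like"
# stretch followed by a high-amplitude "voiced-like" stretch.
signal = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
window = [1.0] * 4  # rectangular window, N = 4

E = short_time_energy(signal, window)
M = short_time_magnitude(signal, window)

# Compare the last frame of each stretch (n = 3 and n = 7).
print("energy ratio:   ", E[7] / E[3])  # about 100
print("magnitude ratio:", M[7] / M[3])  # about 10
```

A tenfold amplitude difference shows up as roughly a hundredfold difference in energy but only a tenfold difference in average magnitude, which is the sense in which squaring emphasizes high amplitudes.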


Short-Time Average Zero Crossing Rate (ZCR)

(i) The zero-crossing rate (ZCR) provides useful spectral information at low computational cost.

(ii) In a speech signal S(n), a zero crossing occurs when the waveform crosses the time reference axis, i.e. when S(n) = 0 or when successive samples have different algebraic signs.

(iii) ZCR (in zero crossings/s) is an accurate spectral measure for narrowband signals (e.g. sinusoids): a sinusoid has two zero crossings per period, so $F_0 = \frac{ZCR}{2}$

(iv) For discrete-time signals with ZCR in zero crossings/sample, $ F_0 = \frac{(ZCR)(F_s)}{2} $, where $F_s$ is the sampling frequency in samples/s.

(v) The ZCR can be defined as U(n) in eq. (1), with T[S(m)] = 0.5|sgn[S(m)] - sgn[S(m-1)]|, where sgn is the algebraic sign of S(m) and W(n) is a rectangular window of length N. Scaling the window by $\frac{1}{N}$ yields zero crossings/sample; scaling it by $\frac{F_s}{N}$ yields zero crossings/s.

(vi) An appropriate way of defining the zero-crossing rate is $ Z(n) = \sum_{m=-\infty}^{\infty} |sgn[S(m)] - sgn[S(m-1)]| \, w(n-m) $

sgn[S(m)] = $ \begin{cases} 1 & S(m) \geq 0 \\ -1 & S(m) \lt 0 \end{cases} $

$ w(m) = \begin{cases} \frac{1}{2N} & 0 \leq m \leq N-1 \\ 0 & \text{otherwise} \end{cases} $

(vii) The ZCR varies slowly, following the corresponding vocal tract movements, so U(n) can be heavily decimated.

Block diagram of short-time average zero crossing detection system
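The definition of Z(n) in (vi) and the frequency relation in (iv) can be checked with a small plain-Python sketch. The function names, the 100 Hz test sinusoid, the 8000 Hz sampling rate, and the frame length N = 800 are all assumptions chosen for illustration.

```python
import math


def sgn(x):
    """Algebraic sign as defined above: +1 for x >= 0, -1 for x < 0."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(s, N):
    """Z(n) in zero crossings/sample, rectangular window w = 1/(2N) on 0..N-1."""
    z = []
    for n in range(len(s)):
        # The window covers m = n-N+1 .. n; s(m-1) does not exist for m = 0.
        lo = max(1, n - N + 1)
        total = sum(abs(sgn(s[m]) - sgn(s[m - 1])) for m in range(lo, n + 1))
        z.append(total / (2 * N))
    return z


# Hypothetical test signal: a 100 Hz sinusoid sampled at Fs = 8000 Hz.
# The half-sample phase offset keeps exact zeros off the sample grid.
Fs, f0, N = 8000, 100, 800
s = [math.sin(2 * math.pi * f0 * (m + 0.5) / Fs) for m in range(2 * N)]

Z = zero_crossing_rate(s, N)
zcr_per_sample = Z[-1]                 # last full frame
f0_estimate = zcr_per_sample * Fs / 2  # F0 = (ZCR)(Fs)/2
print(zcr_per_sample, f0_estimate)     # roughly 0.025 and 100 Hz
```

Each sign change contributes 2 to the sum, which is why the window is scaled by 1/(2N) rather than 1/N. For real speech, which is not narrowband, the same number is better read as a rough indicator of where the spectral energy is concentrated than as a pitch estimate.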
