RESEARCH
Acoustic & Speech
We investigate speech signal processing schemes for acoustic modeling, with the goal of more robust speech recognition. Our aim is to conduct state-of-the-art research that provides effective means for achieving:
Endpoint Detection (EPD) for isolated word detection
1. Introduction
An important problem in speech processing is to detect the presence of speech in a background of noise. This problem is often referred to as the endpoint location problem [1]. The accurate detection of a word's start and end points means that subsequent processing of the data can be kept to a minimum.
2. The Reason Why
1. A major cause of errors in isolated-word automatic speech recognition systems is the inaccurate detection of the beginning and ending boundaries of test and reference patterns [2]. It is essential for automatic speech recognition algorithms that speech segments be reliably separated from nonspeech.
2. An effective endpointing algorithm is also required because the computation needed to process the speech is minimized when the endpoints are accurately located [3].
3. Algorithm
3.1 Requirements of the algorithm
3.2 Example of a simple endpoint detection algorithm
We introduce the commonly used algorithm proposed in [1]. This algorithm uses two measures of the signal - the energy and the zero-crossing rate. Three thresholds are computed:
ITU - the upper energy threshold
ITL - the lower energy threshold
IZCT - the zero-crossing rate threshold
For more information on how these are computed, see Rabiner and Sambur [1].
The method proceeds as follows. Search from the beginning of the signal until the energy crosses ITU. Then back off towards the beginning of the signal until the first point at which the energy falls below ITL is reached. This is the provisional beginning point, N1. The provisional end point, N2, is selected in a similar way, searching from the end of the signal.
For the beginning point, now examine the zero-crossing rate over the 250 ms of signal preceding N1. If this measure exceeds the IZCT threshold 3 or more times, N1 is moved to the first point at which the IZCT threshold is exceeded. The resulting N1 is taken as the beginning point. A similar procedure is applied to the end point N2. For a more detailed explanation of the algorithm, refer to Rabiner and Sambur [1].
Note: For the algorithm to perform correctly, the first 100 ms of the speech signal must contain no speech.
References
[1] Rabiner, L.R. and Sambur, M.R., "An Algorithm for Determining the Endpoints of Isolated Utterances", The Bell System Technical Journal, Vol. 54, No. 2, February 1975, pp. 297-315.
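The search procedure of section 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation from [1]: the function names `frame_measures` and `find_endpoints` are hypothetical, the energy measure is taken to be the sum of sample magnitudes per frame, frames are assumed to be 10 ms long (so that 250 ms corresponds to 25 frames), and the thresholds ITU, ITL and IZCT are passed in as parameters rather than computed, since their computation is described in [1]. For the end point, "the first point at which IZCT is exceeded" is interpreted here as the point farthest from N2.

```python
def frame_measures(samples, frame_len):
    """Split samples into non-overlapping frames and return two lists:
    per-frame energy (assumed: sum of magnitudes) and zero-crossing count."""
    energy, zcr = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy.append(sum(abs(s) for s in frame))
        # A zero crossing is a sign change between consecutive samples.
        zcr.append(sum(1 for a, b in zip(frame, frame[1:])
                       if (a >= 0) != (b >= 0)))
    return energy, zcr


def find_endpoints(energy, zcr, itu, itl, izct, lookback=25):
    """Return (n1, n2) frame indices, or None if the energy never exceeds ITU.
    lookback is the 250 ms window in frames (assumed 10 ms frames)."""
    n = len(energy)

    # Search from the beginning until the energy crosses ITU.
    n1 = next((i for i in range(n) if energy[i] > itu), None)
    if n1 is None:
        return None
    # Back off towards the beginning until the energy falls below ITL.
    while n1 > 0 and energy[n1] >= itl:
        n1 -= 1

    # Same idea from the end of the signal, moving in the other direction.
    n2 = next(i for i in range(n - 1, -1, -1) if energy[i] > itu)
    while n2 < n - 1 and energy[n2] >= itl:
        n2 += 1

    # Examine the zero-crossing rate in the window preceding N1.
    over = [i for i in range(max(0, n1 - lookback), n1) if zcr[i] > izct]
    if len(over) >= 3:
        n1 = over[0]   # earliest frame exceeding IZCT

    # Examine the zero-crossing rate in the window following N2.
    over = [i for i in range(n2 + 1, min(n, n2 + 1 + lookback))
            if zcr[i] > izct]
    if len(over) >= 3:
        n2 = over[-1]  # latest frame exceeding IZCT

    return n1, n2
```

A caller would first compute `energy, zcr = frame_measures(samples, frame_len)` with `frame_len` set to the number of samples in 10 ms, derive the three thresholds from the leading silence as described in [1], and then call `find_endpoints`. The returned indices are frame numbers; multiplying by `frame_len` converts them back to sample positions.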