Acoustic & Speech

We investigate various speech signal processing schemes for acoustic modeling so that more robust speech recognition can be achieved. Our aim is to perform state-of-the-art research that provides effective means of achieving:

Endpoint Detection (EPD) for isolated word detection

Author: Administrator | Date: 2021-04-08 20:59:00


1. Introduction

2. The Reason Why

3. Algorithm


1. Introduction        

An important problem in speech processing is to detect the presence of speech in a background of noise. This problem is often referred to as the endpoint location problem [1]. The accurate detection of a word's start and end points means that subsequent processing of the data can be kept to a minimum.


2. The Reason Why       

1. A major cause of errors in isolated-word automatic speech recognition systems is the inaccurate detection of the beginning and ending boundaries of test and reference patterns [2]. It is essential for automatic speech recognition algorithms that speech segments be reliably separated from non-speech.

2. An effective endpointing algorithm is also required because the computation needed to process the speech is minimized when the endpoints are accurately located [3].


3. Algorithm       

  3.1 Requirements of the algorithm

  • Reliability & Robustness : The automatic, unsupervised endpointer must be reliable, i.e., robust enough to avoid misclassification in difficult working conditions such as varying signal-to-noise ratio and variable loudness [3].
  • Accuracy : Missing weak segments such as the following is not acceptable:
    1. A word beginning or ending with low-energy phonemes (weak fricatives).
    2. An unvoiced plosive.
    3. A nasal.
    4. A short breath, smack, or lip noise.
  • Adaptivity : The algorithm must be adaptive so that it can cope with changing environments, especially variable background noise.
  • Simplicity : Simplicity is another desired feature, of special significance when the algorithm is intended to be part of the recognizer. Algorithms that handle difficult conditions such as telephone speech, for example, are usually very complex.
  • Real-time processing : Real-time processing is also desired and is only possible if the algorithm is not complex.
  • No a priori knowledge of noise : An ideal algorithm requires no a priori knowledge of the noise and must be able to cope with a variable signal-to-noise ratio.

  3.2 Example of a simple endpoint detection algorithm

We introduce the commonly used algorithm proposed in [1]. This algorithm uses two measures of the signal: the energy and the zero-crossing rate.
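As a rough illustration of these two measures, the sketch below computes them frame by frame in plain Python. The function name `frame_measures`, the use of non-overlapping frames, and the choice of a sum of magnitudes as the "energy" are assumptions made for this example, not details taken from the text above.

```python
def frame_measures(samples, frame_len):
    """Return (energy, zcr) lists with one value per non-overlapping frame."""
    energy, zcr = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time "energy" taken as the sum of magnitudes (an assumption;
        # a sum of squares is an equally common choice).
        energy.append(sum(abs(x) for x in frame))
        # Zero-crossing count: sign changes between consecutive samples,
        # with zero treated as non-negative.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr.append(crossings)
    return energy, zcr


energy, zcr = frame_measures([1, -1, 1, -1, 0, 0, 0, 0], frame_len=4)
print(energy, zcr)  # [4, 0] [3, 0]
```

At a sampling rate of 10 kHz, for instance, a 10 ms frame would correspond to `frame_len=100`.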

Three thresholds are computed:

  • ITU - Upper energy threshold.
  • ITL - Lower energy threshold.
  • IZCT - Zero-crossing rate threshold.
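One way the three thresholds might be derived from per-frame measurements is sketched below. The function name `compute_thresholds` and its parameters are hypothetical. The constants (0.03, 4, 5, the two-standard-deviation margin, and the cap of 25 crossings per 10 ms frame) are the values usually quoted from [1]; treat them as assumptions to be checked against the paper.

```python
from statistics import mean, pstdev


def compute_thresholds(energy, zcr, n_silence, zc_cap=25):
    """Estimate (ITL, ITU, IZCT) from per-frame energy and zero-crossing counts.

    n_silence: number of leading frames assumed to contain only background noise.
    zc_cap:    fixed ceiling on IZCT (assumed: 25 crossings per 10 ms frame).
    """
    silence_zc = zcr[:n_silence]
    # IZCT: noise mean plus two standard deviations, capped at zc_cap.
    izct = min(zc_cap, mean(silence_zc) + 2 * pstdev(silence_zc))

    imx = max(energy)               # peak energy over the whole recording
    imn = mean(energy[:n_silence])  # background (silence) energy
    i1 = 0.03 * (imx - imn) + imn   # 3% of the way from noise level to peak
    i2 = 4 * imn                    # four times the noise level
    itl = min(i1, i2)               # lower energy threshold
    itu = 5 * itl                   # upper energy threshold
    return itl, itu, izct
```

With 10 ms frames, `n_silence=10` would cover an initial 100 ms of the recording.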


For more information on how these are computed, see Rabiner and Sambur [1]. The method proceeds as follows. Search from the beginning of the signal until the energy crosses ITU. Then back off toward the beginning of the signal until the first point at which the energy falls below ITL is reached. This is the provisional beginning point, N1. The provisional end point, N2, is selected in a similar way, searching from the end of the signal. Next, examine the zero-crossing rate over the 250 ms of signal preceding N1. If this measure exceeds the IZCT threshold three or more times, N1 is moved back to the first point at which the IZCT threshold is exceeded; otherwise, N1 is kept as the beginning point. A similar procedure is then applied to the end point N2.

For a more detailed explanation of the algorithm refer to Rabiner and Sambur [1].
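The search described above can be sketched roughly as follows. This is an interpretation under stated assumptions, not the reference implementation from [1]: the function names are hypothetical, the inputs are per-frame lists of energy and zero-crossing counts, frames are assumed to be 10 ms long (so 25 frames cover 250 ms), N1 is taken as the earliest frame of the run that stays at or above ITL, and the end point is found by running the same search on the reversed sequences.

```python
def find_start(energy, zcr, itl, itu, izct, lookback=25):
    """Return the index of the beginning frame, or None if ITU is never crossed."""
    # 1. Search forward for the first frame whose energy exceeds ITU.
    above = next((i for i, e in enumerate(energy) if e > itu), None)
    if above is None:
        return None
    # 2. Back off toward the beginning until the energy falls below ITL.
    n1 = above
    while n1 > 0 and energy[n1 - 1] >= itl:
        n1 -= 1
    # 3. Inspect the zero-crossing measure in the frames just before N1
    #    (lookback=25 frames is 250 ms under the 10 ms frame assumption).
    window_start = max(0, n1 - lookback)
    exceed = [i for i in range(window_start, n1) if zcr[i] > izct]
    # 4. Move N1 back only if IZCT is exceeded three or more times.
    if len(exceed) >= 3:
        n1 = exceed[0]
    return n1


def find_endpoints(energy, zcr, itl, itu, izct, lookback=25):
    """Return (N1, N2) as frame indices, or None if no speech is found."""
    n1 = find_start(energy, zcr, itl, itu, izct, lookback)
    if n1 is None:
        return None
    # The end point mirrors the start search on the time-reversed sequences.
    last = len(energy) - 1
    n2 = last - find_start(energy[::-1], zcr[::-1], itl, itu, izct, lookback)
    return n1, n2


energy = [0, 0, 0, 0, 0, 3, 10, 10, 3, 0, 0, 0]
zcr = [0, 0, 9, 9, 9, 1, 1, 1, 1, 0, 0, 0]
print(find_endpoints(energy, zcr, itl=2, itu=8, izct=5))  # (2, 8)
```

In this toy input the energy alone would place N1 at frame 5, but the three high zero-crossing frames just before it (a stand-in for a weak fricative) pull the beginning point back to frame 2.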

Note: For the algorithm to perform correctly, the first 100 ms of the recording must contain no speech, since this interval is used to characterize the background noise.


[1] Rabiner, L.R. and Sambur, M.R., "An Algorithm for Determining the Endpoints of Isolated Utterances". The Bell System Technical Journal, Vol. 54, No. 2, February 1975, pp. 297-315.
[2] Junqua, J.-C., Mak, B. and Reaves, B., "A Robust Algorithm for Word Boundary Detection in the Presence of Noise". IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 3, July 1994, pp. 406-412.
[3] Savoji, M.H., "Endpointing of Speech Signals". Speech Communication, Vol. 8, No. 1, March 1989, pp. 46-60.

