Acoustic & Speech

We investigate various speech signal processing schemes for acoustic modeling so that more robust speech recognition can be achieved. Our aim is to conduct state-of-the-art research that provides effective means of achieving:

Environmental Robustness

Author: Administrator | Date: 2021-04-08 21:17:06 | Views: 184


1. Acoustic environment

2. Single Channel Environment compensation

3. Multi Channel Environment compensation

4. Environmental model adaptation

5. Application Demos


(1) Additive noise

1) Stationary noise

2) Non-stationary noise

3) Lombard effect

4) Human auditory system's robustness

(2) Reverberation

- Additional acoustic paths are created by reflections from walls and other objects.

- The received signal is a sum of time-delayed, amplitude-attenuated copies of the source.

- Modeled by estimating the impulse response.
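
The sum-of-delayed-copies view of reverberation can be sketched as a convolution with an impulse response. This is a minimal illustration, not a measured room: the function names, delays, and gains below are assumptions chosen for the example.

```python
import numpy as np

def synthetic_impulse_response(fs, delays_s, gains):
    """Build an impulse response as a sum of delayed, attenuated impulses."""
    length = int(round(max(delays_s) * fs)) + 1
    h = np.zeros(length)
    for delay, gain in zip(delays_s, gains):
        h[int(round(delay * fs))] += gain
    return h

def reverberate(x, h):
    """Reverberant signal = source convolved with the impulse response."""
    return np.convolve(x, h)

fs = 16000
# Direct path plus two reflections (hypothetical delays and gains).
h = synthetic_impulse_response(fs, delays_s=[0.0, 0.010, 0.025],
                               gains=[1.0, 0.6, 0.3])
x = np.random.default_rng(0).standard_normal(fs)  # 1 s stand-in for speech
y = reverberate(x, h)
```

Each output sample is the direct sound plus the reflections that have arrived by that time, which is exactly the summation described above.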

(3) Model of environment

- Convolutional noise : channel distortion.

- Additive noise : background noise.
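
The two distortion types are usually written as a single signal model. As a sketch, with symbols introduced here for illustration (clean speech x, channel impulse response h, background noise n, observed signal y):

```latex
y[t] = x[t] * h[t] + n[t]
\quad\Longleftrightarrow\quad
Y(\omega) = X(\omega)\,H(\omega) + N(\omega)
```

When the additive term is negligible, taking the log magnitude gives log|Y| = log|X| + log|H|, so the channel appears as an additive offset in the log-spectral and cepstral domains.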


Human auditory system's robustness:
- Cocktail party effect.
- Selective attention to the sound source of interest.

Lombard effect:
- Speakers tend to increase their vocal effort in the presence of background noise.
- Results in higher amplitude and pitch.

Non-stationary noise:
- Statistical properties change over time.
- Ex) door slams, radio, TV, a companion's voice, lip smacks, breath, ...

Stationary noise:
- White noise: flat power spectrum, wide-band.
- Colored noise: non-flat power spectrum, such as pink noise.
- Ex) computer fan, air conditioning, ...
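
White and colored stationary noise can be generated synthetically for experiments. The sketch below is one common way to do it, assuming pink noise is obtained by shaping white noise so that power falls off as 1/f; the helper names are hypothetical.

```python
import numpy as np

def white_noise(n, rng):
    """Gaussian white noise: flat power spectrum on average."""
    return rng.standard_normal(n)

def pink_noise(n, rng):
    """Shape white noise in the frequency domain so that power ~ 1/f."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)             # DC component is removed
    scale[1:] = 1.0 / np.sqrt(freqs[1:])     # amplitude ~ f^(-1/2)
    x = np.fft.irfft(spectrum * scale, n)
    return x / np.std(x)

def band_power_ratio(x):
    """Power in the lowest quarter of the band over the highest quarter."""
    power = np.abs(np.fft.rfft(x)) ** 2
    q = len(power) // 4
    return power[1:q].sum() / power[-q:].sum()

rng = np.random.default_rng(0)
white = white_noise(1 << 14, rng)
pink = pink_noise(1 << 14, rng)
```

For white noise the ratio returned by `band_power_ratio` stays close to 1, while for pink noise most of the power sits in the low band.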


(1) Noise suppression

1) Spectral subtraction [2]

2) Wiener filtering

3) Band-pass filter

          - Reduces noise at specific frequencies using an FIR or IIR filter.

4) Perceptual noise reduction [3]

          - Noise reduction that exploits a characteristic of human hearing (the masking effect).
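
For the band-pass filter item, an FIR design can be sketched with the windowed-sinc method. This is an illustrative design, not the filter used in the original system; the 300-3400 Hz pass band, tap count, and function names are assumptions.

```python
import numpy as np

def bandpass_fir(low_hz, high_hz, fs, num_taps=201):
    """Windowed-sinc FIR band-pass: difference of two ideal low-pass filters."""
    n = np.arange(num_taps) - (num_taps - 1) / 2

    def lowpass(cutoff_hz):
        fc = cutoff_hz / fs                    # cutoff in cycles per sample
        return 2 * fc * np.sinc(2 * fc * n)    # ideal low-pass impulse response

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(num_taps)

def apply_fir(x, taps):
    """Filter x and keep the output aligned with the input."""
    return np.convolve(x, taps, mode="same")

fs = 16000
taps = bandpass_fir(300.0, 3400.0, fs)
t = np.arange(fs) / fs
in_band = np.sin(2 * np.pi * 1000 * t)     # inside the pass band
out_band = np.sin(2 * np.pi * 6000 * t)    # outside the pass band
kept = apply_fir(in_band, taps)
removed = apply_fir(out_band, taps)
```

The 1 kHz tone passes almost unchanged and the 6 kHz tone is strongly attenuated, apart from short transients at the signal edges.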

Spectral subtraction:
- Assumes the speech is corrupted by additive noise.
- Restores the speech by subtracting the estimated noise component from the corrupted speech spectrum.
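
A minimal sketch of magnitude spectral subtraction in the spirit of [2] follows. It assumes that a speech-free segment is available for the noise estimate and reuses the noisy phase; the frame size, spectral floor, and function name are illustrative choices, not values from the original work.

```python
import numpy as np

FRAME, HOP = 512, 256
WINDOW = np.hanning(FRAME + 1)[:-1]  # periodic Hann: 50% overlap-add sums to 1

def spectral_subtraction(noisy, noise_only, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from each frame."""
    def frames(x):
        count = 1 + (len(x) - FRAME) // HOP
        idx = np.arange(FRAME)[None, :] + HOP * np.arange(count)[:, None]
        return x[idx] * WINDOW

    # Average noise magnitude spectrum over the speech-free segment.
    noise_mag = np.abs(np.fft.rfft(frames(noise_only), axis=1)).mean(axis=0)

    spec = np.fft.rfft(frames(noisy), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Subtract the noise estimate; a small floor avoids negative magnitudes.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=FRAME, axis=1)

    # Overlap-add the processed frames back into a waveform.
    out = np.zeros(len(noisy))
    for i, frame in enumerate(clean):
        out[i * HOP:i * HOP + FRAME] += frame
    return out

fs = 16000
rng = np.random.default_rng(0)
t = np.arange(2 * fs) / fs
speech = np.sin(2 * np.pi * 440 * t)               # stand-in for speech
noisy = speech + 0.3 * rng.standard_normal(len(t))
noise_ref = 0.3 * rng.standard_normal(fs // 2)     # speech-free segment
enhanced = spectral_subtraction(noisy, noise_ref)
```

The floor trades residual "musical" noise against speech distortion; real systems typically also smooth or track the noise estimate over time.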


 Sample of Noise Reduction



(2) Compensation on feature domain

1) Cepstral Mean Subtraction (CMS)

2) Model based methods


Cepstral mean subtraction:
- Removes convolutional distortions.
- Normalizes the features by subtracting the mean of the cepstrum.
- Suitable for real-time processing.
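
Because a fixed channel adds a constant offset to every cepstral vector, subtracting the mean removes it. The sketch below shows the per-utterance form and, as an assumption about how a real-time variant might look, a running-mean form; `alpha` and both function names are hypothetical.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """cepstra: (num_frames, num_coeffs). Subtract the per-utterance mean."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def running_cms(cepstra, alpha=0.995):
    """Frame-by-frame variant using an exponentially weighted running mean."""
    mean = cepstra[0].astype(float)
    out = np.empty(cepstra.shape, dtype=float)
    for t, frame in enumerate(cepstra):
        mean = alpha * mean + (1 - alpha) * frame
        out[t] = frame - mean
    return out

rng = np.random.default_rng(0)
features = rng.standard_normal((200, 13))      # stand-in cepstral features
channel_offset = rng.standard_normal(13)       # constant channel term
normalized = cepstral_mean_subtraction(features + channel_offset)
```

Both forms give the same output with or without the channel offset, which is the sense in which CMS removes convolutional distortion.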


1) Multichannel noise suppression

     A) Fixed beamformer

          - Delay-and-sum beamformer

          - Superdirective beamformers

     B) Adaptive beamformer

          - Generalized sidelobe canceller

               a) Fixed beamformer (FBF): forms a beam in the look direction so that the target signal is passed and all other signals are attenuated.

               b) Blocking matrix (BM): forms a null in the look direction so that the target signal is suppressed and all noise signals are passed.

               c) Multiple-input canceller (MC): generates replicas of the components correlated with the noise interference, which are then subtracted from the FBF output.
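
The three blocks of the generalized sidelobe canceller can be sketched in the time domain. This is a simplified illustration that assumes the channels are already time-aligned toward the target, so the fixed beamformer is a plain average, and that uses NLMS for the canceller; the tap count, step size `mu`, and the two-microphone scenario are assumptions.

```python
import numpy as np

def gsc(channels, num_taps=16, mu=0.1, eps=1e-8):
    """channels: (num_mics, num_samples), steered so the target is identical
    in every channel. Returns (fixed beamformer output, GSC output)."""
    num_mics, num_samples = channels.shape
    # a) Fixed beamformer: delay-and-sum reduces to an average after steering.
    fbf = channels.mean(axis=0)
    # b) Blocking matrix: adjacent-channel differences cancel the target.
    blocked = channels[:-1] - channels[1:]
    # c) Multiple-input canceller: NLMS filters estimate the noise left in
    #    the FBF output from the target-free blocked signals.
    weights = np.zeros((num_mics - 1, num_taps))
    out = np.zeros(num_samples)
    for t in range(num_taps - 1, num_samples):
        taps = blocked[:, t - num_taps + 1:t + 1][:, ::-1]
        out[t] = fbf[t] - np.sum(weights * taps)
        weights += mu * out[t] * taps / (np.sum(taps * taps) + eps)
    return fbf, out

rng = np.random.default_rng(0)
n = 8000
target = np.sin(2 * np.pi * 0.05 * np.arange(n))
noise = rng.standard_normal(n)
delayed = np.concatenate([[0.0], noise[:-1]])
# The interferer reaches the second microphone later and weaker.
channels = np.stack([target + noise, target + 0.5 * delayed])
fbf_out, gsc_out = gsc(channels)
```

After the adaptive filters converge, the GSC output contains noticeably less interference than the fixed beamformer alone.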




2) Blind source separation

     - Estimates the original source signals using only the mixed signals observed at each input channel.
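
As a toy illustration of blind source separation, the sketch below handles the simplest case: two sources mixed instantaneously (no delays or reverberation) into two channels. It whitens the mixtures and then searches for the rotation that makes the outputs most non-Gaussian, an ICA-style criterion swapped in here for illustration; it is not the method of any specific system described on this page.

```python
import numpy as np

def kurtosis(x):
    """Excess kurtosis: zero for a Gaussian signal."""
    x = (x - x.mean()) / x.std()
    return np.mean(x ** 4) - 3.0

def separate_two_sources(mixed, num_angles=180):
    """mixed: (2, num_samples) instantaneous mixtures of two sources."""
    centered = mixed - mixed.mean(axis=1, keepdims=True)
    # Whiten: decorrelate the channels and scale them to unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered))
    white = (eigvecs / np.sqrt(eigvals)).T @ centered
    # After whitening only a rotation is unknown; choose the angle that
    # maximizes the non-Gaussianity of the two outputs.
    best, best_score = white, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, num_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        rotated = np.array([[c, s], [-s, c]]) @ white
        score = abs(kurtosis(rotated[0])) + abs(kurtosis(rotated[1]))
        if score > best_score:
            best, best_score = rotated, score
    return best

rng = np.random.default_rng(0)
n = 20000
sources = np.vstack([np.sin(2 * np.pi * 0.013 * np.arange(n)),
                     rng.uniform(-1.0, 1.0, n)])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])    # unknown to the algorithm
recovered = separate_two_sources(mixing @ sources)
```

The sources are recovered only up to order, sign, and scale, which is inherent to the blind setting.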





(1) Retraining on corrupted speech

1) Using noise waveforms from the new environment

2) Multistyle training
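
Both retraining options need corrupted training speech, which is often produced by adding recorded noise to clean utterances. The sketch below assumes multistyle data is built by mixing each utterance with several noises at several signal-to-noise ratios; the function names and the SNR list are hypothetical.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng):
    """Add a random excerpt of `noise` to `clean` at the requested SNR (dB)."""
    start = rng.integers(0, len(noise) - len(clean) + 1)
    excerpt = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(excerpt ** 2)
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * excerpt

def multistyle_set(clean_utterances, noises, snrs_db, rng):
    """One corrupted copy of each utterance per (noise, SNR) pair."""
    return [mix_at_snr(u, n, snr, rng)
            for u in clean_utterances for n in noises for snr in snrs_db]

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 0.01 * np.arange(4000))   # stand-in utterance
noise = rng.standard_normal(4000)                    # stand-in noise recording
noisy_10db = mix_at_snr(clean, noise, 10.0, rng)
training_set = multistyle_set([clean], [noise], [0.0, 10.0, 20.0], rng)
```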


(2) Adaptation

1) MAP (Maximum A Posteriori) adaptation [4][5]

2) MLLR (Maximum Likelihood Linear Regression) adaptation [6]

(3) Parallel model combination [7]

- Obtains the distribution of noisy speech from the distributions of clean speech and noise, each modeled as a Gaussian mixture.

- The combination is performed in the linear-spectral or log-spectral domain.

- The clean speech models and the noise model are combined according to a mismatch function.

- The estimate of the corrupted speech model is transformed back into the cepstral domain.
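
The steps above can be sketched for a single pair of Gaussians using the log-normal approximation, one of the variants discussed in [7]. The sketch covers static cepstral parameters only and assumes an orthonormal, untruncated DCT between the log-spectral and cepstral domains; the function names and the `gain` parameter are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix mapping log-spectral vectors to cepstra."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def log_to_linear(mu, sigma):
    """Moments of exp(z) for z ~ N(mu, sigma)."""
    mean = np.exp(mu + np.diag(sigma) / 2)
    cov = np.outer(mean, mean) * (np.exp(sigma) - 1)
    return mean, cov

def linear_to_log(mean, cov):
    """Inverse of log_to_linear, assuming the result is log-normal."""
    sigma = np.log(cov / np.outer(mean, mean) + 1)
    mu = np.log(mean) - np.diag(sigma) / 2
    return mu, sigma

def pmc_combine(mu_speech_c, cov_speech_c, mu_noise_c, cov_noise_c, gain=1.0):
    """Combine cepstral-domain speech and noise Gaussians."""
    c = dct_matrix(len(mu_speech_c))
    c_inv = c.T                        # orthonormal: inverse is the transpose

    def to_linear(mu_c, cov_c):
        return log_to_linear(c_inv @ mu_c, c_inv @ cov_c @ c_inv.T)

    mean_s, cov_s = to_linear(mu_speech_c, cov_speech_c)
    mean_n, cov_n = to_linear(mu_noise_c, cov_noise_c)
    # Mismatch function for additive noise in the linear spectral domain.
    mean_y = gain * mean_s + mean_n
    cov_y = gain ** 2 * cov_s + cov_n
    mu_l, sigma_l = linear_to_log(mean_y, cov_y)
    return c @ mu_l, c @ sigma_l @ c.T  # back to the cepstral domain

n = 8
rng = np.random.default_rng(0)
dct = dct_matrix(n)
mu_speech_c = dct @ rng.normal(2.0, 0.5, n)
cov_speech_c = 0.1 * np.eye(n)
mu_noise_c = dct @ np.full(n, 1.0)
cov_noise_c = 0.05 * np.eye(n)
mu_noisy_c, cov_noisy_c = pmc_combine(mu_speech_c, cov_speech_c,
                                      mu_noise_c, cov_noise_c)
```

For a full HMM system the same combination would be applied to every pair of speech and noise mixture components.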


MLLR adaptation:
- Uses a set of linear regression transforms to map both the means and the covariances so as to maximize the likelihood of the adaptation data.
- Gives reasonable performance with less adaptation data than MAP.
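
A heavily simplified sketch of the mean transform follows. It assumes a single global transform, identity covariances, and a hard assignment of each adaptation frame to one Gaussian, under which the maximum likelihood estimate reduces to ordinary least squares; the full method in [6] uses state occupation probabilities and the model covariances, and also adapts variances. All names are hypothetical.

```python
import numpy as np

def extend(means):
    """Prepend a bias term: xi = [1, mu]."""
    return np.hstack([np.ones((len(means), 1)), means])

def estimate_mean_transform(means, frames, alignment):
    """means: (G, D); frames: (T, D); alignment: (T,) Gaussian index per frame.
    Returns W of shape (D, D + 1) such that adapted mean = W @ [1, mu]."""
    xi = extend(means)[alignment]                     # (T, D + 1)
    w_t, *_ = np.linalg.lstsq(xi, frames, rcond=None)
    return w_t.T

def adapt_means(means, w):
    """Apply the shared transform to every Gaussian mean."""
    return extend(means) @ w.T

rng = np.random.default_rng(0)
means = rng.normal(size=(8, 3))                       # original model means
a_true = np.eye(3) + 0.2 * rng.normal(size=(3, 3))    # simulated mismatch
b_true = rng.normal(size=3)
alignment = rng.integers(0, 8, size=2000)
frames = means[alignment] @ a_true.T + b_true + 0.1 * rng.normal(size=(2000, 3))
w = estimate_mean_transform(means, frames, alignment)
adapted = adapt_means(means, w)
```

Because one transform is shared by many Gaussians, even components with few or no adaptation frames are moved, which is why the approach works with little data.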

MAP adaptation:
- Modifies the model parameters using limited new training data, guided by prior knowledge.
- Assumes a prior density function for the Gaussian mixture HMM parameters.
- Applies Baum-Welch re-estimation, as in maximum likelihood estimation (MLE).
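
For a single Gaussian mean, the MAP update interpolates between the prior mean and the adaptation data. The sketch below assumes the occupation probabilities have already been computed by a Baum-Welch pass; `tau` is a hypothetical name for the weight given to the prior.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP update of one Gaussian mean.

    frames:     (T, D) adaptation vectors
    posteriors: (T,) occupation probabilities of this Gaussian per frame
    tau:        weight of the prior, in units of equivalent frames
    """
    occupancy = posteriors.sum()
    weighted_sum = posteriors @ frames
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)

prior_mean = np.zeros(3)
frames = np.full((10, 3), 2.0)
no_data = map_adapt_mean(prior_mean, frames, np.zeros(10))
some_data = map_adapt_mean(prior_mean, frames, np.ones(10))
much_data = map_adapt_mean(prior_mean, np.full((10000, 3), 2.0), np.ones(10000))
```

With no adaptation data the prior mean is kept, and as the amount of data grows the estimate approaches the maximum likelihood estimate.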

5. Application Demos    

       Noise Suppressed ASR System Demo



       Noise Suppressed ASR Demo in Car Environments (Embedded System)


      Multi-channel noise reduction based ASR System Demo




[1] X. Huang, A. Acero and H. Hon, Spoken Language Processing, Prentice Hall PTR, 2001.
[2] S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Trans. on ASSP, Vol.ASSP-27, No.2, pp.113-120, April 1979.
[3] R. M. Udrea, S. Ciochina and D. N. Vizireanu, "Reduction of Background Noise from Affected Speech using a Spectral Subtraction Algorithm Based on Masking Properties of the Human Ear," 7th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, pp. 135-138, 2005.
[4] C.-H. Lee, C.-H. Lin and B.-H. Juang, "A Study on Speaker Adaptation of the Parameters of Continuous Density Hidden Markov Models," IEEE Trans. on Signal Processing, Vol.39, No.4, pp.806-814, April 1991.
[5] J.-L. Gauvain and C.-H. Lee, "Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains," IEEE Trans. on Speech and Audio Processing, Vol.2, No.2, April 1994.
[6] M. J. F Gales and P. C. Woodland, "Mean and variance adaptation within the MLLR framework," Computer Speech and Language, Vol.10, pp.249-264, 1996.
[7] M. J. F Gales and S. J. Young, "Robust Continuous Speech Recognition Using Parallel Model Combination," IEEE Trans. on Speech and Audio Processing, Vol.4, No.5, pp.352-359, Sep. 1996. 

