Acoustic & Speech

Investigates various speech signal processing schemes for acoustic modeling so that more robust speech recognition can be achieved. Our aim is to perform state-of-the-art research that provides effective means for achieving:

Speaker Identification

Author: Administrator  Date: 2021-06-03 16:12:19  Views: 6


1. Introduction

2. Robust Speaker Identification

3. Implementation

1. Introduction        

  • (1) Long distance between speech source and microphone

    When the microphone is located far from the speech source in a room, the SNR of the captured speech drops and its spectrum is distorted by reverberation from walls and obstacles. We study speaker identification algorithms that are robust to long distances and reverberation.

    (2) Reverberation by reflections

    The room impulse response distorts the speech spectrum because the observed signal is the convolution of the speech signal with that response. Every propagation path other than the direct path contributes to reverberation. Reverberation degrades speech intelligibility and causes distortions such as coloration and smearing.
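The convolution model described above can be sketched in a few lines. This is a minimal illustration, not the lab's implementation: the room impulse response is a hypothetical toy (a direct-path impulse followed by an exponentially decaying noise tail), and the names `synthetic_rir`, `reverberate`, and the `rt60` parameter are assumptions introduced for the example.

```python
import numpy as np

def synthetic_rir(fs=16000, rt60=0.5, length_s=0.4, seed=0):
    """Toy room impulse response: a direct-path impulse plus an
    exponentially decaying noise tail (hypothetical model, not measured)."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # The envelope falls by 60 dB (amplitude factor 1e-3) after rt60 seconds.
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = 0.1 * rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir

def reverberate(speech, rir):
    """Observed signal = clean signal convolved with the room impulse response."""
    return np.convolve(speech, rir)

fs = 16000
clean = np.zeros(fs // 10)
clean[0] = 1.0  # a single click stands in for a speech signal
rir = synthetic_rir(fs)
observed = reverberate(clean, rir)

# Direct-to-reverberant ratio: energy of the direct path vs. all other paths.
direct_energy = rir[0] ** 2
reverb_energy = np.sum(rir[1:] ** 2)
drr_db = 10 * np.log10(direct_energy / reverb_energy)
```

Because the input is a single click, `observed` simply reproduces the impulse response, which makes the smearing in time easy to see. A lower `drr_db` corresponds to a microphone that receives more reflected energy relative to the direct path.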

2.  Robust Speaker Identification        

  • (1) Cepstral Feature Normalization

    The cepstral features are a weighted combination of the filterbank log-energies. Channel distortion caused by reverberation appears in the cepstral domain as a constant bias added to the cepstral coefficients over the current utterance (the linear channel effect) [1].

    Feature warping maps the observed distribution of each cepstral coefficient onto a fixed target distribution [1]. Arbitrarily distorted feature distributions are thereby normalized to the same target, which reduces the effect of reverberation.
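A rough sketch of this kind of warping is given below. It assumes a standard normal target and a rank-based mapping inside a sliding window, in the spirit of [1]; the function name `warp_features`, the window length, and the edge handling are choices made for this example rather than details taken from this page.

```python
import numpy as np
from statistics import NormalDist

def warp_features(cepstra, window=301):
    """Warp each cepstral coefficient stream toward a standard normal.

    cepstra: array of shape (num_frames, num_coeffs).
    window:  sliding-window length in frames (hypothetical default).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames, num_coeffs = cepstra.shape
    half = window // 2
    inv_cdf = NormalDist().inv_cdf
    warped = np.empty_like(cepstra)
    for t in range(num_frames):
        # Window centred on frame t, truncated at the utterance edges.
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        block = cepstra[lo:hi]
        n = hi - lo
        for k in range(num_coeffs):
            # Ascending rank (1..n) of the current value inside the window.
            rank = 1 + int(np.sum(block[:, k] < cepstra[t, k]))
            # Map the rank to a quantile, then to the standard normal value.
            warped[t, k] = inv_cdf((rank - 0.5) / n)
    return warped

rng = np.random.default_rng(0)
features = rng.standard_normal((200, 3))
bias = np.array([5.0, -3.0, 0.5])  # constant per-coefficient channel bias
warped_clean = warp_features(features, window=51)
warped_biased = warp_features(features + bias, window=51)
```

Since only the rank of each value within its window matters, adding a constant bias to a coefficient leaves the warped output unchanged, so `warped_clean` and `warped_biased` agree.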


  • (2) i-vector speaker model



To account for both the variability between speakers and the variability of the recording environment, the i-vector approach, which is based on joint factor analysis (JFA), builds a speaker model that includes speaker variability and channel variability [2].


The speaker model accounts for two sources of variability: speaker variability and channel variability. With the UBM as the common, speaker-independent model, the model for a specific speaker is written as

    M = m + Tw

where m is the UBM mean supervector [2].


To combine the UBM, which is trained on speech from arbitrary speakers, with the variability intrinsic to one speaker, we need the total variability matrix T, which is shared by all speakers, and the factor w (the i-vector), which captures the variability of the specific speaker. The result is the comprehensive speaker model M.
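To make the roles of T, w, and M concrete, the sketch below builds a toy supervector and recovers w from sufficient statistics using the posterior-mean estimate described in [2]. All dimensions, the random matrices, and the function name `extract_ivector` are assumptions for illustration; a real system would compute the statistics from UBM frame posteriors and train T on a large corpus.

```python
import numpy as np

def extract_ivector(T, sigma, counts, centered_first_order):
    """Posterior mean of w for the model M = m + T w.

    T:      (C*F, R) total variability matrix.
    sigma:  (C*F,) diagonal UBM covariances stacked into one vector.
    counts: (C,) zero-order statistics (soft frame counts per UBM component).
    centered_first_order: (C*F,) first-order statistics centred on the UBM means.
    """
    num_components = counts.shape[0]
    feat_dim = T.shape[0] // num_components
    # Expand each component count over its F feature dimensions.
    n_expanded = np.repeat(counts, feat_dim)
    t_scaled = T / sigma[:, None]  # Sigma^{-1} T
    # Posterior precision: I + T' Sigma^{-1} N T.
    precision = np.eye(T.shape[1]) + T.T @ (n_expanded[:, None] * t_scaled)
    return np.linalg.solve(precision, t_scaled.T @ centered_first_order)

rng = np.random.default_rng(0)
C, F, R = 4, 3, 2                     # toy sizes: components, feature dim, rank
m = rng.standard_normal(C * F)        # UBM mean supervector
# Orthonormal columns keep the toy T well conditioned.
T, _ = np.linalg.qr(rng.standard_normal((C * F, R)))
sigma = np.ones(C * F)
w_true = np.array([0.5, -1.0])
M = m + T @ w_true                    # speaker- and channel-dependent supervector

counts = np.array([800.0, 1200.0, 1000.0, 900.0])
# Noise-free first-order statistics centred on the UBM means.
centered = np.repeat(counts, F) * (M - m)
w_hat = extract_ivector(T, sigma, counts, centered)
```

With many frames per component, the standard normal prior on w has little influence and `w_hat` lands close to `w_true`. With few frames the estimate is pulled toward zero.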

3. Implementation      



    1) Register the target speaker i-vector (Record → Stop → i-vector extraction)
    2) Get the test speaker i-vector
    3) Compare the test i-vector to registered target i-vectors
    4) Most similar speaker name
    5) Scores of each speaker
    6) Score threshold
    7) Speech signal wave
    8) Registered target speaker list
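The comparison and thresholding steps listed above might look like the following sketch. The page does not say which similarity measure the demo uses; cosine similarity is assumed here because it is a common choice for i-vectors [2]. The names `cosine_score` and `identify`, the threshold value, and the three-dimensional example vectors are hypothetical.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two i-vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(test_ivector, enrolled, threshold=0.5):
    """Score a test i-vector against every registered speaker.

    Returns (name, scores), where name is the most similar registered
    speaker, or None when even the best score is below the threshold.
    """
    scores = {name: cosine_score(test_ivector, vec)
              for name, vec in enrolled.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Registered target speakers (toy 3-dimensional i-vectors).
enrolled = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob": np.array([0.0, 1.0, 0.0]),
}

name, scores = identify(np.array([0.9, 0.1, 0.0]), enrolled)
unknown, _ = identify(np.array([0.0, 0.0, 1.0]), enrolled)
```

Here `name` is `"alice"`, and `unknown` is `None` because the second test vector is orthogonal to both registered speakers. Returning the full `scores` dictionary mirrors the per-speaker score display in the list above.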




[1] J. Pelecanos and S. Sridharan, “Feature warping for robust speaker verification,” in Proc. 2001: A Speaker Odyssey, The Speaker Recognition Workshop, 2001, pp. 213–218.

[2] N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, “Front-end factor analysis for speaker verification,” IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 788–798, May 2011.


