RESEARCH

Acoustic & Speech

We investigate various speech signal processing schemes for acoustic modeling so that more robust speech recognition can be achieved. Our aim is to perform state-of-the-art research that provides effective means for achieving:

 

Automatic Speech Recognition

Author: Administrator | Date: 2021-04-08 21:34:06 | Views: 289

 

 

     Contents

1. Introduction

2. Acoustic Modeling of Speech Recognition Unit

3. Statistical Language Modeling

4. Word Network

5. Lexical Decoding

6. Application Demos

 

 

 

1. Introduction       

An automatic speech recognition system is composed of feature extraction, acoustic modeling, language modeling, and search. We estimate the parameters of the acoustic models from training data and estimate the language model from text corpora. We then decode the speech signal into a recognized word sequence using the acoustic models, the language model, and the word network.
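The way these components combine is commonly written as the following decision rule. This is the standard textbook formulation, not a formula taken from this page; the symbols O (the sequence of extracted feature vectors) and W (a candidate word sequence) are notation assumed here for illustration.

```latex
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} P(O \mid W)\, P(W)
```

Here P(O | W) is supplied by the acoustic models, P(W) by the language model, and the maximization over W is the search carried out over the word network.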

 

 

 

2. Acoustic Modeling of Speech Recognition Unit        

An acoustic model describes how the speech signal is represented. In recent years, the most frequently used acoustic model has been the HMM (Hidden Markov Model). Each HMM models the temporal and spectral variation of one speech recognition unit. We estimate the parameters of the acoustic models using training data.

  1. Choice of speech recognition units
    • whole words: context-independent or context-dependent
    • subword segments: phone, syllable, semisyllable, triphone, diphone, etc.
  2. Training of speech recognition unit models
    • Baum-Welch algorithm
    • Discriminative training
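
Baum-Welch re-estimation is built on forward and backward probabilities computed over each training utterance. The following is a minimal sketch of the forward pass only, assuming a discrete-observation HMM; the function name `forward_likelihood` and the toy parameters `pi`, `A`, and `B` are hypothetical and not taken from the system described here.

```python
def forward_likelihood(pi, A, B, obs):
    """Return P(obs | model) for a discrete-observation HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability that state i emits symbol k
    obs     : list of observed symbol indices
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over all final states
    return sum(alpha)


# Toy two-state left-to-right model (illustrative values only).
pi = [1.0, 0.0]
A = [[0.7, 0.3],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]

likelihood = forward_likelihood(pi, A, B, [0, 0, 1])
print(likelihood)
```

A practical trainer would use continuous output densities and scaled or log-domain arithmetic to avoid underflow on long utterances; those details are omitted here.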

 

3. Statistical Language Modeling        

Statistical language models capture the probabilistic relationships among the words in a sequence and are estimated directly from text corpora. We mainly use bigram or trigram models as our n-gram language models.
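
As an illustration of how a bigram model can be estimated from a corpus, here is a minimal sketch using maximum-likelihood counts without smoothing. The function name `train_bigram`, the sentence boundary tokens `<s>` and `</s>`, and the three-sentence corpus are assumptions made for this example; a practical model would add smoothing or back-off for unseen word pairs.

```python
from collections import Counter


def train_bigram(sentences):
    """Estimate P(w2 | w1) by relative frequency (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except the last one serves as a bigram history.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Tiny illustrative corpus.
corpus = ["call home", "call office", "go home"]
probs = train_bigram(corpus)
print(probs[("call", "home")])
print(probs[("<s>", "call")])
```

A trigram model follows the same pattern with two-word histories.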

 

4. Word Network        

We use two kinds of word networks: the linear lexicon and the lexical tree. A linear lexicon arranges the words in parallel and is used for small-vocabulary recognition. A lexical tree shares the common initial parts of word pronunciations and is used for large-vocabulary recognition.
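
The difference between the two networks can be sketched by counting phone nodes. The example below builds a prefix tree over phone sequences and compares its size with that of a linear lexicon; the helper names `build_lexical_tree` and `count_nodes` and the three pronunciations are hypothetical, chosen only to show the prefix sharing.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree whose edges are phones and whose nodes list the words ending there."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse an existing child when the prefix is shared.
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []}
            )
        node["words"].append(word)
    return root


def count_nodes(node):
    """Count phone nodes below the given node (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Illustrative pronunciations (assumed, not from a real dictionary).
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
}

# A linear lexicon keeps one separate phone chain per word.
linear_size = sum(len(phones) for phones in lexicon.values())
tree = build_lexical_tree(lexicon)
tree_size = count_nodes(tree)
print(linear_size, tree_size)
```

In this toy case the shared prefix "s p" is stored once in the tree, so the search has fewer phone models to evaluate at word beginnings. The trade-off is that the word identity is known only at the end of a path, which affects where language model scores can be applied.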

 

 

5. Lexical Decoding        

Lexical decoding of continuous speech finds the highest-scoring word sequence among all possible word sequences, given the observation sequence, the acoustic model, and the language model, by searching the word network. In evaluation (recognition), Viterbi decoding and the forward-backward algorithm are used.
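
To make the Viterbi step concrete, here is a minimal state-level sketch for a discrete-observation HMM, working in the log domain. It is not the decoder used in the system described here: a full decoder would run the same recursion over the word network and add language model scores at word boundaries. The function name `viterbi` and the toy parameters are assumptions.

```python
import math


def viterbi(pi, A, B, obs):
    """Return (best state path, its log probability) for a discrete HMM."""
    def log(x):
        # Treat zero probability as log(0) = -inf instead of raising an error.
        return math.log(x) if x > 0 else float("-inf")

    n = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j.
            best_i = max(range(n), key=lambda i: delta[i] + log(A[i][j]))
            new_delta.append(delta[best_i] + log(A[best_i][j]) + log(B[j][o]))
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)

    # Pick the best final state, then trace back through the stored pointers.
    state = max(range(n), key=lambda i: delta[i])
    best_score = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Toy two-state left-to-right model (illustrative values only).
pi = [1.0, 0.0]
A = [[0.7, 0.3],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]

path, score = viterbi(pi, A, B, [0, 0, 1])
print(path, math.exp(score))
```

Unlike the forward algorithm, which sums over all state paths, Viterbi keeps only the single best path into each state, which is what makes the traceback to a recognized sequence possible.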

 

 

6. Application Demos        

  6.1 Voice Navigation

  6.2 Keyword Recognition

  6.3 LVCSR Demo

  6.4 LVCSR Demo (English)
