Image & Vision

Investigates Image data fusion techniques that combine image and track data from multiple sensors to achieve improved accuracies and more specific inferences than could be achieved by the use of a single sensor alone. Our aim is to explore the state-of-the-art image processing algorithms for achieving effective data fusion as in:


Multi-Modal Voice Activity Detection Demo

작성자 관리자 날짜 2021-06-08 23:38:21 조회수 389

Multi-Modal Voice Activity Detection Demo

1.     Introduction

A.     Voice Activity Detection (VAD) is an important step in a voice command recognition system such as speech interface based smart TV and voice command car navigation system. For such a system, using only acoustic information, however, may result performance degradation in noisy environments. Some recent studies have focused on improving VAD performance by using additional information such as video signal However many methods typically suffer difficulties in dealing with illumination change or global motion of image frames. To mitigate these problems, a Local Variance Histogram (LVH) as a statistical measure of lip motion change is proposed here.

2.     Main Algorithm and Principle

A.     An overview of the proposed algorithm is shown in Fig. 1. First, a whole face is detected, and the eyes are located from the detected face. Positions of the eyes are used to localize the lip region. If it is successful in detecting the lip region, the associated LVH is calculated.

Fig 1. Overview of our VVAD algorithm

B.       The measure of Local Variance is as follows:

Fig 2. Local Variance Histogram


C.      By observing that an open mouth consists of separated lips and teeth regions exhibiting high intensity values and the gap in-between exhibiting low intensity values, it is hypothesized that LVH of an open mouth would result larger values of histogram in high variance bins. An example of a closed and opened mouth underscoring the observation is shown in Fig. 3. Consequently, the number of pixels with high value of LVH is used as a visual cue for detecting speech as follows:

Fig 3. Example of Local Variance Histogram

D.     Consequently, the number of pixels with high value of LVH is used as a visual cue for detecting speech as follows:

Fig 4. Plot of local variance histogram and feature extraction result

 3.     Demo

댓글 (0)

등록된 댓글이 없습니다.
작성 권한이 없습니다.