Pohang University of Science and Technology (POSTECH)
Advanced Signal Processing Lab.

Speech Signal Processing

Introduction

Speech recognition (SR) chip
* Supports a speech interface to machines.
* Embedded in mobile computing devices.
* Real-time processing possible.

Noise immunity
* Interference with the speech signal (music, etc.).
* Informal speech amounts to linguistic noise.

Overview

Ensemble interval histogram (EIH) model
* Simulates biological auditory system.
* Filter bank followed by a zero-crossing rate counter.
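
The zero-crossing stage of the EIH front end can be illustrated with a short Python sketch. This is a minimal illustration, not the lab's implementation: the function name `zero_crossing_rate` and the parameters `frame_len` and `hop` are assumptions, and the filter bank is omitted, so the counter below would be applied to each band-pass channel separately.

```python
import numpy as np

def zero_crossing_rate(signal, frame_len, hop):
    """Return, per frame, the fraction of adjacent sample pairs that change sign."""
    signal = np.asarray(signal, dtype=float)
    rates = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        signs = np.sign(frame)
        signs[signs == 0] = 1  # treat exact zeros as positive samples
        crossings = np.count_nonzero(signs[1:] != signs[:-1])
        rates.append(crossings / (frame_len - 1))
    return np.array(rates)

# Usage: a 1 kHz tone crosses zero more often than a 100 Hz tone.
fs = 8000
t = np.arange(fs) / fs
low = zero_crossing_rate(np.sin(2 * np.pi * 100 * t + 0.1), 400, 400)
high = zero_crossing_rate(np.sin(2 * np.pi * 1000 * t + 0.1), 400, 400)
print(low.mean(), high.mean())
```

For a pure tone the rate grows with frequency, which is why crossing statistics collected per channel carry spectral information.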

Blind Source Separation

* Goal: estimate the sources s1 and s2 from the observed data x1 and x2 using a BSS algorithm (ICA, etc.).

* Assumptions
- Sources (s1, s2) are unknown but statistically independent.
- Number of microphones ≥ number of sources.

* Given a measure F of independence, find an estimate of the source S.
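
The idea of searching for the estimate that maximizes an independence measure F can be sketched for the two-microphone, two-source case. Everything specific here is an assumption made for illustration: F is taken to be the sum of absolute excess kurtosis, the mixing is assumed linear and instantaneous, and the helper names `whiten`, `independence`, and `separate_two_sources` are hypothetical. The page does not state which measure or algorithm the lab used.

```python
import numpy as np

def whiten(x):
    """Center x (channels x samples) and decorrelate it to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return (eigvecs / np.sqrt(eigvals)).T @ x

def independence(y):
    """Assumed measure F: sum of absolute excess kurtosis of each channel."""
    return float(np.sum(np.abs(np.mean(y ** 4, axis=1) - 3.0)))

def separate_two_sources(x, n_angles=180):
    """Grid-search the rotation of the whitened data that maximizes F."""
    z = whiten(np.asarray(x, dtype=float))
    best_y, best_f = z, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, s], [-s, c]]) @ z
        f = independence(y)
        if f > best_f:
            best_y, best_f = y, f
    return best_y

# Usage: mix a square wave and uniform noise, then separate them.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000, endpoint=False)
sources = np.vstack([np.sign(np.sin(2 * np.pi * 7 * t + 0.3)),
                     rng.uniform(-1, 1, t.size)])
mixed = np.array([[1.0, 0.6], [0.4, 1.0]]) @ sources
estimates = separate_two_sources(mixed)
```

The recovered channels match the sources only up to ordering, sign, and scale, which is the usual ambiguity of blind separation.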

Acoustic Decoder

Acoustic model
* Speech signal samples
* Probabilistic finite-state machine (FSM)

Word recognition
* Find a word model or pattern best fitted to an observation.
* Linear systolic array architecture developed.
* Connected-word input assumed.
* One-pass approach.

Measure of fitness
* Arithmetic difference
* Likelihood
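
The two kinds of fitness measure can be contrasted in a small Python sketch. It assumes equal-length feature sequences and, for the likelihood case, a diagonal Gaussian model; the function names and the toy templates are hypothetical and are not taken from the lab's system.

```python
import numpy as np

def arithmetic_difference(observation, reference):
    """Sum of absolute differences: smaller means a better fit."""
    obs = np.asarray(observation, dtype=float)
    ref = np.asarray(reference, dtype=float)
    return float(np.sum(np.abs(obs - ref)))

def gaussian_log_likelihood(observation, mean, var):
    """Log-likelihood under a diagonal Gaussian: larger means a better fit."""
    obs = np.asarray(observation, dtype=float)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (obs - mean) ** 2 / var))

def best_word(observation, templates):
    """Pick the template word with the smallest arithmetic difference."""
    return min(templates, key=lambda w: arithmetic_difference(observation, templates[w]))

# Usage with two toy word templates.
templates = {"yes": [1.0, 2.0, 3.0], "no": [3.0, 2.0, 1.0]}
print(best_word([1.0, 2.0, 2.5], templates))
```

A distance is minimized and a likelihood is maximized, so a recognizer built on either one still selects the single best-fitting word model.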

The Systolic Algorithm for Pattern Recognition

Dynamic time-warping (DTW) algorithm
* Warping function: temporal correspondence between test and reference patterns.
* Use dynamic programming (DP) technique to find optimal warping function.
* Hardware architecture required for real-time processing.
* Parallel arrangement of processing elements (PE).
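
The DP recurrence behind DTW, and the reason it suits a parallel array of PEs, can be sketched in Python. This is an illustrative software model under simplifying assumptions: features are scalars, the local distance is an absolute difference, and the step pattern allows diagonal, vertical, and horizontal moves. The names `dtw` and `dtw_wavefront` are hypothetical and do not describe the lab's hardware design.

```python
import numpy as np

def dtw(test, ref):
    """Return (total distance, warping path) between two scalar sequences."""
    n, m = len(test), len(ref)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(test[i - 1] - ref[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end to recover the warping function.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(cost[n, m]), path

def dtw_wavefront(test, ref):
    """Same recurrence, evaluated one anti-diagonal (i + j = k) at a time."""
    n, m = len(test), len(ref)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for k in range(2, n + m + 1):
        # Cells on one anti-diagonal depend only on earlier anti-diagonals,
        # so each could be assigned to its own processing element.
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            local = abs(test[i - 1] - ref[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])

distance, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

The wavefront ordering gives the same result as the row-by-row loop; it only changes the evaluation order to expose the cells that can be computed simultaneously.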

Processor States

[Figures: processor states of an active node and an inactive node]

Linear systolic array for HMM

Sample 3-D map (distance indicated by brightness in right image).

Speech Recognition Board

Mic → FPGA (DTW) → PLX 9054RDK-LITE → PCI bus

Spoken Tagging

* Error-corrective linguistic decoder
* Noise in input sentences
- Errors introduced during speech-to-text conversion.
- Insertion, deletion, and substitution of phonemes.
- Incorrect word boundaries.

* Morphological analysis combined with error correction.

* Morphotactics explored up to the word level.
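
Phoneme insertions, deletions, and substitutions are commonly scored with an edit distance, which gives one simple way to picture error correction against a lexicon. The sketch below is an assumption for illustration, not the decoder described here: the names `edit_distance` and `closest_entry` are hypothetical, all three error types cost 1, and the tiny romanized lexicon is invented.

```python
def edit_distance(recognized, reference):
    """Minimum number of phoneme insertions, deletions, and substitutions."""
    n, m = len(recognized), len(reference)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete every recognized phoneme
    for j in range(m + 1):
        dist[0][j] = j  # insert every reference phoneme
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if recognized[i - 1] == reference[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                 # deletion
                             dist[i][j - 1] + 1,                 # insertion
                             dist[i - 1][j - 1] + substitution)  # match or substitution
    return dist[n][m]

def closest_entry(recognized, lexicon):
    """Return the lexicon key whose phoneme string is nearest to the input."""
    return min(lexicon, key=lambda word: edit_distance(recognized, lexicon[word]))

# Usage with a hypothetical two-entry lexicon of romanized phoneme strings.
lexicon = {"school": list("hakgyo"), "student": list("haksaeng")}
print(closest_entry(list("hakyo"), lexicon))
```

A full decoder would weight the three error types by how likely the recognizer is to make them and would also consider alternative word boundaries, which this sketch leaves out.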

Speech Variables

Tag set T
* Refined part-of-speech category set.
* KORTEM tag set with 45 elements.
* Tag sequence

Morpheme set M
* About 20,000 morphs included.
* Morpheme sequence

Phoneme set R
* 40 letters of the Korean alphabet included.
* Phoneme sequence

State sequence of
Word generation model: equal to the tag sequence.
Morpheme generation model:
Phoneme generation model:

Input sequence of
Morpheme generation model:
Phoneme generation model:

Observed speech sample sequence

Dependency of Random Variables

Simultaneous SR and tagging

Probabilistic Estimation Model

Probabilistic model

MAP estimates
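
As a generic illustration of MAP estimation over a hidden state sequence, the sketch below runs the Viterbi algorithm on a toy hidden Markov model. It is not the model of this page, whose equations are not reproduced here: the two states, the probability tables, and the function name `viterbi` are all hypothetical.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit, observations):
    """Return the most probable state sequence and its joint log-probability."""
    n_states, length = len(log_init), len(observations)
    score = np.empty((length, n_states))
    back = np.zeros((length, n_states), dtype=int)
    score[0] = log_init + log_emit[:, observations[0]]
    for t in range(1, length):
        # candidates[i, j]: best score ending in state i, then moving to j
        candidates = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(candidates, axis=0)
        score[t] = candidates[back[t], np.arange(n_states)] + log_emit[:, observations[t]]
    states = [int(np.argmax(score[-1]))]
    for t in range(length - 1, 0, -1):
        states.append(int(back[t, states[-1]]))
    states.reverse()
    return states, float(np.max(score[-1]))

# Hypothetical two-state model with two observation symbols.
init = np.log([0.6, 0.4])
trans = np.log([[0.7, 0.3], [0.4, 0.6]])
emit = np.log([[0.9, 0.1], [0.2, 0.8]])
best_states, best_logp = viterbi(init, trans, emit, [0, 0, 1])
```

Working in log-probabilities turns products into sums, which avoids numerical underflow on long sequences.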

Conclusion

* Noise immunity pursued at various stages of the recognition process.

* Voice-tracking approach applied to the noise cancellation problem.

* Linear systolic array structures for DTW and HMM enabled real-time word recognition.

* Spoken tagging system considered for handling linguistic noise.