Speech Recognition

Claudio Becchetti, Lucio Prina Ricotti

Mentioned 1

Automatic Speech Recognition (ASR) is the enabling technology for hands-free dictation and voice-triggered computer menus. It is becoming increasingly prevalent in environments such as private telephone exchanges and real-time information services. Speech Recognition introduces the principles of ASR systems, including the theory and implementation issues behind multi-speaker continuous speech recognition. Focusing on the algorithms employed in commercial and laboratory systems, the treatment enables the reader to devise practical solutions for ASR system problems. It addresses in detail C++ programming techniques used to develop ASR applications, thus offering skills that will prove useful in any large C++ based software project. Possible extensions of the well-established ASR technology are highlighted, based on "Hidden Markov Models" applied to fields such as modelling and prediction of econometric series. Features include: * Accompanying website containing all C++ source code of a complete laboratory multi-speaker continuous-speech ASR system (e.g. Initialisation, Training, Recognition, Evaluation, etc.) www.wiley.com/go/becchetti_speech * Detailed theoretical, mathematical and technical explanations of ASR * A practical account of the functioning of ASR A crucial source of information for researchers, developers and project managers involved with ASR systems, Speech Recognition is also structured for use by students of digital signal processing, speech recognition and C++ programming techniques.

More on Amazon.com

Mentioned in questions and answers.

I'm developing a project that identifies Phonemes to be able to identify whether someone is saying either "Yes" or "No".

So far in the project, I have used Zero-crossings to identify what the person is saying, this works really well and seems simple enough to understand. The project, however, needs a few enhancements and has to be developed using a Hidden Markov Model.

My question is this:

I want to develop a Hidden Markov Model, without erasing the work that I have already completed. I.e. I strip the data that do not warrant consideration by counting the number of zero-crossings as well as the summation of the blocks.

I do not understand what data I would need to train the HMM in order to be able to identify these Phonemes. E.g.

With Zero-crossings I have identifies that:

Yes - Zero-crossings start low and then the value increases

No - Zero-crossings start low and then do not increase with value.

Could I train my HMM algorithm so that it interprets these values?

Or could anyone suggest a method of which I can train the HMM to be able to identify the word that is inputted in the sample?

Hope someone can help :)!

Automated phoneme segmentation is a tough problem, so I'll provide some of my favored resources that touch on the topic in various levels of detail.

This paper: http://www.seas.upenn.edu/~jan/Files/Iscas99Speech.pdf

This paper: http://www.ll.mit.edu/publications/journal/pdf/vol08_no2/8.2.1.languageidentification.pdf

This resource is very good: http://research.microsoft.com/pubs/118769/Book-Chap-HuangDeng2010.pdf

This book gives some good examples for phoneme identification: http://www.amazon.com/Speech-Recognition-Theory-C-Implementation/dp/0471977306/

This book is pretty good, too: http://www.amazon.com/Statistical-Methods-Recognition-Language-Communication/dp/0262100665/

The books are expensive, but they are worth it (in my opinion)