Frequency Modulation features for speech front-end processing

In the past, features based mainly on amplitude information, such as Mel frequency cepstral coefficients (MFCCs), have been used for speech characterization, while the phase has usually been ignored. Recently, phase has been shown to enhance speech synthesis intelligibility in several human perception experiments. Consequently, phase-based features have received increasing research attention for speech processing applications. One of the main motivations for frequency modulation (FM) features is the psychophysical evidence for the existence of separate pathways to track changes in amplitude and frequency. Following Teager’s experiment, it was argued that frequency modulation is introduced when air flows through the vocal tract: the air not only oscillates between the vocal tract walls, but the effective mass and cross-sectional area also change rapidly.

Hence, we have investigated the AM-FM model of speech processing and proposed new FM extraction methods designed for speech front-end processing. Our experiments show that FM features contain information about the speaker, and we have used them for both classical and forensic speaker recognition, as well as for language identification. We have shown that FM features, combined with MFCCs, improve the overall performance of automatic forensic speaker recognition systems on the NIST 2001 SRE database. For speaker recognition on the NIST 2008 speaker recognition database, the combination produced substantial improvements relative to an MFCC-only system. These evaluations strongly support the hypothesis that the FM and MFCC features are highly complementary in nature.
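To make the AM-FM model concrete, the sketch below treats a narrowband signal as x(t) = a(t) cos(φ(t)) and recovers the amplitude envelope a(t) and the instantaneous frequency (the FM component) from the analytic signal. This is a generic Hilbert-transform demodulation, not the FM extraction methods proposed by the authors, which are not described in this abstract. The function name `am_fm_demodulate`, the synthetic test signal, and all parameter values are assumptions chosen for illustration; for real speech, the signal would typically be split into sub-bands by a bandpass filter bank before demodulation.

```python
import numpy as np
from scipy.signal import hilbert


def am_fm_demodulate(x, fs):
    """Split a narrowband signal into AM and FM components.

    Illustrative helper (hypothetical name): returns the amplitude
    envelope and the instantaneous frequency in Hz.
    """
    analytic = hilbert(x)                    # x(t) + j * HilbertTransform{x}(t)
    envelope = np.abs(analytic)              # AM component a(t)
    phase = np.unwrap(np.angle(analytic))    # continuous phase phi(t)
    # FM component: discrete derivative of the phase, scaled to Hz.
    # The result has one sample fewer than x.
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return envelope, inst_freq


# Synthetic single-band test signal (all values are assumptions):
# a 1 kHz carrier with 50 Hz peak frequency deviation and 30% AM depth.
fs = 8000
t = np.arange(4000) / fs
carrier_hz, deviation_hz, fm_rate_hz, am_rate_hz = 1000.0, 50.0, 20.0, 10.0

a = 1.0 + 0.3 * np.cos(2 * np.pi * am_rate_hz * t)
phi = (2 * np.pi * carrier_hz * t
       + (deviation_hz / fm_rate_hz) * np.sin(2 * np.pi * fm_rate_hz * t))
x = a * np.cos(phi)

envelope, inst_freq = am_fm_demodulate(x, fs)

# Skip a few samples at each end, where edge effects may appear.
core = slice(200, -200)
print("mean instantaneous frequency (Hz):", inst_freq[core].mean())
print("max envelope error:", np.abs(envelope[core] - a[core]).max())
```

In a speaker recognition front end, a per-band summary of the instantaneous frequency (for example, a frame average) could serve as an FM feature alongside MFCCs, but that choice is likewise an assumption here rather than something stated in the abstract.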

Authors: Tharmarajah Thiruvaran, Mohaddeseh Nosratighods, Eliathamby Ambikairajah, Julien Epps

Event: SF08: Search and Information Extraction from Audio Data Workshop
