Extraction of perceptual formant information from Cochlear Model

This paper proposes a method for representing perceptual formant information. A two-dimensional hydro-mechanical Cochlear Model (CM) converts a speech signal into a human cochlear response, which is much more accurate in the perceptual domain than a Psychoacoustic Masking Model (PMM). A "Peak Track" technique is proposed to collect information from the CM response. With this technique, formant regions are located in the CM response and salient formant information is extracted, comprising two stages of information (pitch-dependent and pitch-independent). The salient formant information is used to predict temporally localized distortions of speech quality, e.g. SB/SF/SI/SD in the Diagnostic Acceptability Measure (DAM). Correlation coefficients between subjective scores and predictions are around 0.9.
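The abstract does not say which correlation coefficient is reported. As an illustration only, the sketch below assumes the common Pearson coefficient and shows how agreement between subjective scores and predicted scores could be computed. The function name `pearson` and the example values are hypothetical placeholders, not data or code from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Assumption: the paper's "correlation coefficient" is Pearson's r;
    the abstract does not state this explicitly.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up placeholder values, NOT results from the paper.
    subjective = [55.0, 62.0, 70.0, 48.0, 81.0]
    predicted = [53.0, 65.0, 68.0, 50.0, 79.0]
    print(f"r = {pearson(subjective, predicted):.3f}")
```

A value near 1 indicates that the predictions rise and fall with the subjective scores; it does not by itself show that the two are on the same scale.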

Authors: Wenliang Lu and D. Sen

Event: SF08: Search and Information Extraction from Audio Data Workshop
