Much of the research conducted at QUT’s Speech and Audio Research Laboratory is geared toward meta-data extraction from audio. We will provide an overview of the research projects currently underway at QUT and some insights into future directions.
Speaker recognition: Speaker recognition has been a core research topic of the Speech Lab for approximately a decade. For most of that time we have participated regularly, and very successfully, in the Speaker Recognition Evaluations conducted by NIST. Current research directions emphasise verification with short utterances and tailoring state-of-the-art speaker recognition techniques to resource-poor scenarios.
Speech indexing and search: Our research in this area uses phonetic recognition to build indexes that can be searched for an unrestricted set of search terms. Many systems rely on word-level indexes derived from large-vocabulary continuous speech recognition (LVCSR) systems with limited vocabularies, so terms outside the recogniser's vocabulary cannot be found. In contrast, a phonetic system is inherently open-vocabulary, allowing for queries with proper nouns and rare terms.
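The open-vocabulary property can be illustrated with a minimal sketch. It assumes each recording has already been decoded into a phone sequence, and it is not the Speech Lab's system: the function names (build_phone_index, search), the phone labels and the two toy utterances are hypothetical, and matching is exact, whereas a practical phonetic search would have to tolerate phone recognition errors.

```python
from collections import defaultdict

def build_phone_index(utterances, n=3):
    """Map each phone n-gram to the (utterance id, position) pairs where it starts."""
    index = defaultdict(list)
    for utt_id, phones in utterances.items():
        for i in range(len(phones) - n + 1):
            index[tuple(phones[i:i + n])].append((utt_id, i))
    return index

def search(index, utterances, query, n=3):
    """Return (utterance id, start) for every exact occurrence of the query phones."""
    query = list(query)
    if len(query) < n:
        raise ValueError("query must contain at least n phones")
    hits = []
    # Look up candidates by the first n-gram, then verify the full sequence.
    for utt_id, start in index.get(tuple(query[:n]), []):
        if utterances[utt_id][start:start + len(query)] == query:
            hits.append((utt_id, start))
    return hits

# Hypothetical phone transcripts (illustrative data only).
utterances = {
    "utt1": ["DH", "AH", "K", "W", "IY", "N", "Z", "L", "AE", "N", "D"],  # "the Queensland"
    "utt2": ["K", "W", "IY", "N", "B", "IY"],                             # "queen bee"
}
index = build_phone_index(utterances)

# A proper noun is found without ever appearing in a word vocabulary.
queensland = ["K", "W", "IY", "N", "Z", "L", "AE", "N", "D"]
print(search(index, utterances, queensland))  # [('utt1', 2)]
```

Because the index stores phone n-grams rather than words, any term that can be converted to a phone sequence can be queried, which is the contrast with a fixed-vocabulary word index drawn above.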
Speech detection: Speech detection is a fundamental speech technology. Our research focus in this area is speech detection in low-SNR environments, particularly as a conditioning step for further speech processing.
Speaker diarisation: Speaker diarisation is the process of annotating audio with speaker labels, generally consisting of a speaker change-point detection (speaker segmentation) step and a speaker clustering step. Applications for this information include speaker indexing of audio, as well as gathering speech segments from the same speaker to enable speaker adaptation for improved speech recognition performance.
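The two steps named above can be sketched on toy data. This is an illustrative simplification under stated assumptions, not the method used at QUT: each frame is reduced to a single hypothetical scalar feature, change points are found by comparing the means of adjacent windows, and segments are clustered greedily by their mean. The function names and thresholds are invented for the example; practical systems work on multi-dimensional spectral features with statistical distance measures.

```python
from statistics import mean

def change_points(frames, window=2, threshold=3.0):
    """Segmentation: return frame indices where adjacent window means differ most."""
    dist = {}
    for i in range(window, len(frames) - window + 1):
        left = mean(frames[i - window:i])
        right = mean(frames[i:i + window])
        dist[i] = abs(left - right)
    # Keep only local peaks in distance that exceed the threshold.
    return [i for i, d in dist.items()
            if d > threshold
            and d >= dist.get(i - 1, 0.0)
            and d >= dist.get(i + 1, 0.0)]

def cluster_segments(frames, boundaries, merge_threshold=1.0):
    """Clustering: give segments with similar means the same speaker label."""
    edges = [0] + boundaries + [len(frames)]
    centroids = []  # one representative mean per hypothesised speaker
    labelled = []
    for start, end in zip(edges, edges[1:]):
        seg_mean = mean(frames[start:end])
        for label, centroid in enumerate(centroids):
            if abs(seg_mean - centroid) < merge_threshold:
                break  # close enough to an existing speaker
        else:
            centroids.append(seg_mean)  # no match: start a new speaker
            label = len(centroids) - 1
        labelled.append((start, end, label))
    return labelled

# Toy feature track (illustrative data only): speaker A, then B, then A again.
frames = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0, 0.1, 0.0, -0.1, 0.0]
boundaries = change_points(frames)
print(boundaries)                            # [4, 8]
print(cluster_segments(frames, boundaries))  # [(0, 4, 0), (4, 8, 1), (8, 12, 0)]
```

The output labels the first and third segments with the same speaker, which is the grouping that makes speaker indexing and per-speaker adaptation possible.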
Authors: Robbie Vogt, Brendan Baker, Sridha Sridharan
Event: SF08: Search and Information Extraction from Audio Data Workshop