Real not fake - we need a real AVT corpus of Australian English

The most obvious requirements for a new corpus are a broad range of texts, and speech recorded with audio-visual equipment of the quality that can be expected in projected applications, preferably in multiple sets that vary in quality and in how well the equipment is placed, including phone and codec quality. The applications will also involve difficult environments, extending beyond reverberant rooms to noisy stadia. The problem is not just filtering or accommodating noise and reverberation, but also the speaker's own compensation for and accommodation to the noise. This cannot be simulated by simply adding canned background noise to clean recordings, although the noise can be played to the speaker over headphones to allow more flexibility in training. A range of Australian office, bush and crowd background noise, as well as models for a variety of reverberant environments, is highly desirable.

Another issue is recognizing and synthesizing all the subtleties of speech. Both read lists of words and read stories, reports or novels differ from spontaneous conversation, from professionally delivered lectures, and from commercial radio, television, film and other media. It is important that a range of naturally occurring samples is recorded and marked up. Similarly, in analyzing emotions and expressions, we need to be able to analyze genuine, naturally occurring data, not the results of Hollywood or NIDA acting classes.

Another question is the extent to which we record foreign, immigrant and Indigenous English - or aren't such speakers entitled to pizza or taxis? Again, we should be recording basic data and using it to develop models, and perhaps even to synthesize a broader range of accents that can be used in training and robustification of speech recognition systems.

In terms of tagging and grammatical mark up, tagging with common parts of speech and bracketing according to commonly agreed principles are needed. The mark up need not implement any particular grammatical model, but it should allow subsequent taggers and processes to add optional annotations and to compare them, and all such annotations should be regarded as hypothetical and thus suspect. It would also be useful to have tagging for anaphora resolution, and to allow tagging with senses and with tonal, emotional, gestural or expressional features. Facilities should be available to allow easy extraction of text with or without any of the standard or optional layers of mark up.
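
One way to picture the layered mark up described above is as stand-off annotation, where the text is stored once and each layer points into it. The following is only a minimal sketch under that assumption; the names Annotation, Utterance, render and disagreements, and the example tags, are hypothetical and are not part of any proposed corpus format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Annotation:
    start: int   # index of the first token covered
    end: int     # index one past the last token covered
    label: str   # e.g. a part-of-speech tag or an emotion label
    source: str  # tagger or annotator that proposed it (a hypothesis only)


@dataclass
class Utterance:
    tokens: List[str]
    # Layer name (e.g. "pos", "emotion") -> annotations in that layer.
    layers: Dict[str, List[Annotation]] = field(default_factory=dict)

    def annotate(self, layer: str, start: int, end: int,
                 label: str, source: str) -> None:
        """Add one hypothesised annotation to a (possibly new) layer."""
        self.layers.setdefault(layer, []).append(
            Annotation(start, end, label, source))

    def plain_text(self) -> str:
        """Extract the text with no mark up at all."""
        return " ".join(self.tokens)

    def render(self, layer: str, source: Optional[str] = None) -> str:
        """Extract the text with single-token labels of one layer shown
        as token/LABEL. If several sources label the same token and no
        source is given, the most recently added label is shown."""
        tags: Dict[int, str] = {}
        for a in self.layers.get(layer, []):
            if a.end - a.start == 1 and (source is None or a.source == source):
                tags[a.start] = a.label
        return " ".join(f"{tok}/{tags[i]}" if i in tags else tok
                        for i, tok in enumerate(self.tokens))

    def disagreements(self, layer: str, source_a: str,
                      source_b: str) -> List[Tuple[int, int, str, str]]:
        """Spans that both sources annotated in this layer but labelled
        differently, as (start, end, label_a, label_b)."""
        def by_span(source: str) -> Dict[Tuple[int, int], str]:
            return {(a.start, a.end): a.label
                    for a in self.layers.get(layer, [])
                    if a.source == source}
        a_labels, b_labels = by_span(source_a), by_span(source_b)
        return [(s, e, a_labels[(s, e)], b_labels[(s, e)])
                for (s, e) in sorted(a_labels.keys() & b_labels.keys())
                if a_labels[(s, e)] != b_labels[(s, e)]]
```

Because every annotation records its source, later taggers can add their own hypotheses alongside earlier ones without overwriting them, and any layer can be left out when the text is extracted.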

Authors: David M W Powers

Event: SF08: Designing the Australian National Corpus Workshop
