The creation of a corpus that includes fresh spoken English data offers exciting prospects: opening up new areas of analysis and deepening our understanding of Australian English. To make the most of this opportunity, the data must be of high quality and of maximal use to researchers with differing interests.
To this end, we draw on design principles from other spoken corpora and on our collective experience to specify considerations for the construction of the spoken component of the Australian National Corpus (AuNC). For the data to be useful to linguists as well as to other researchers from a range of theoretical backgrounds, we need to consider:
a) the quality of recordings (e.g., high enough quality to allow instrumental analysis of phonetic and prosodic detail);
b) the range of recordings (e.g., carefully selected recordings of speakers in different contexts and under different conditions, allowing investigation of the effects of such changes);
c) supporting documentation (e.g., details about interlocutors and recordings, allowing new insights into regional and social variation);
d) transcription and coding (e.g., to a high enough level to readily allow discourse analytic research and comparison with other national spoken corpora); and
e) the searchability of the documentation, transcription, and coding as well as the recordings themselves.
As an argument for best-practice processes, we present detailed proposals in each of these areas, along with a sample recording that demonstrates the value of the information obtainable when such measures are in place.
Authors: Jean Mulder, Cara Penry Williams, and Debbie Loakes
Event: SF08: Designing the Australian National Corpus Workshop