Towards the Design of the Australian National Corpus (ANC)

Although several corpora of Australian English already exist, they have not been widely used because they are small in size and narrow in scope compared with well-known corpora developed in the United Kingdom and the United States, such as the British National Corpus (BNC), the American National Corpus (ANC) and the Corpus of Contemporary American English (COCA). For an Australian corpus to be widely used, it needs to be comparable to those large corpora, so that linguists and people in other fields can make comparisons across corpora. This paper therefore reviews the designs of widely used corpora from around the world and proposes a design for the Australian National Corpus. It outlines what needs to be taken into consideration when compiling an Australian corpus, such as the timeline, the genres or categories to be included, and the selection criteria for texts in each genre or subgenre. Designing the corpus carefully before any data are collected or accepted for inclusion will help avoid wasting resources (i.e. data that have been gathered but cannot be used, or too much of the same kind of data). It will also provide a guideline for potential contributors, helping them to collect the right kind of data or to refine their corpora before donating them to the Australian National Corpus.

Authors: Phuong Dzung Pho

Event: SF08: Designing the Australian National Corpus Workshop
