Language documentation archives and corpora share the aim of storing bodies of language data. The nature of the data and the uses to which it may be put vary, but common technological challenges arise in each case. In this paper we suggest that the experience gained in dealing with the particular challenges of documentary material, especially handling large amounts of audio and video data, is relevant to the project of designing a national corpus in the 21st century. Expertise in these areas already exists in Australia in organisations such as the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) and the Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC). Drawing on such expertise would not only assist in the design and construction of a national corpus, but would also help to avoid unnecessary duplication and allow existing resources to form part of a national network of language resources that includes a national corpus. The design of a corpus should also take into account the holdings of the existing archives, so that new collections complement those that already exist (e.g. Australian Indigenous languages archived at AIATSIS). We suggest that there are at least two areas which may not currently be adequately documented and which should therefore be priorities in a national project. These are:
a. The non-Indigenous languages used by the Indigenous population: Aboriginal English, Kriols, Torres Strait Creole; and
b. Community languages, considered as distinct varieties (for example, Australian Italian as a distinct variety of Italian).
Authors: Simon Musgrave, Sarah Cutfield
Event: SF08: Designing the Australian National Corpus Workshop