Small-scale and large-scale corpus analysis - two complementary perspectives

The majority of current corpus linguistic research deals with the automatic, computerized analysis of large data collections. This allows a focus on lexicogrammar and its patterning (e.g. collocation, frames, etc.) and ensures both empirical validity and a high degree of representativeness in the results. It has many useful applications in lexicography, language learning, forensic linguistics, language variation studies, etc. However, such research is less well suited to the analysis of semantic and pragmatic features beyond those that are formally identifiable. Thus, large-scale corpus analyses can usefully be complemented by small-scale analyses of data sets that are small enough for manual or semi-automated, context-sensitive analysis but large enough to show at least some patterns and allow some generalisability. Within both approaches, a distinction can be made between corpus-based/corpus-driven and text-based/text-driven research (Bednarek 2006). This paper outlines both approaches, illustrates them with examples from research, and shows that each kind of analysis has its advantages and disadvantages and that the two can be seen as providing complementary corpus linguistic perspectives on linguistic data.

Authors: Monika Bednarek

Event: SF08: Designing the Australian National Corpus Workshop
