Workshop "Developing standards for phonological corpora: corpus creation, encoding and data format"

University of Augsburg, July 29 - 31, 2009 (funded by the VW Foundation)


Many phonological corpora are currently being compiled, and many researchers in corpus phonology would welcome the sharing and reuse of existing corpora. In practice, however, the reuse (and extension) of existing corpora is severely impeded because the corpora use different data formats for both metadata and annotation, and because the various annotation tools offer only limited interoperability. Moreover, there are as yet no commonly accepted ISO guidelines for the encoding of phonological corpora, and the existing TEI guidelines for the encoding of spoken language are not yet widely known.

Thus, for both existing and planned phonological corpora, internationally recognized standards for data formats, annotations and the integration of metadata are of the highest importance. The objective of the workshop was to develop such standards by specifying principles and methods for creating, encoding and managing phonological corpora. The workshop brought together four groups of international experts who rarely meet otherwise: researchers who are compiling phonological corpora, developers of annotation tools, experts on XML data formats, and members of the relevant ISO committee and of the TEI.