Corpus-oriented lexicographic database for Beserman Udmurt

Arkhangelskiy, Timofey and Serdobolskaya, Natalia and Usacheva, Maria (2017) Corpus-oriented lexicographic database for Beserman Udmurt. Acta Linguistica Academica, 64 (3). pp. 397-415. ISSN 2559-8201




The Beserman Udmurt documentation project is a long-term undertaking aimed primarily at collecting lexicographic and corpus data in the field. During our work on the project, we developed a pipeline for collecting, annotating and publishing our data. In this paper, we describe this pipeline and present the web interface we developed to provide public access to Beserman materials. We use the TLex lexicographic software for compiling the dictionary and FieldWorks FLEx for annotating the corpus. Once annotated, the data are exported to XML and loaded into the online web interface, where the two types of data become interconnected and searchable. We propose solutions to challenges that arise in projects of this kind and reflect on the constraints imposed on lexicographic databases developed in long-term projects aimed at the description of under-resourced languages. We suggest that the proposed pipeline and web interface could be employed by similar projects dealing with other minority languages. The web interface, based on the database and a corpus of oral Beserman texts, is available online at
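The abstract does not say how the exported dictionary and corpus become "interconnected". One common approach, offered here only as an assumption, is to join the two XML exports on the lemma: each dictionary entry then links to the corpus sentences in which its lemma occurs. The sketch below uses hypothetical, simplified XML (the element names `entry`, `gloss`, `sentence`, `word` and the `lemma` attribute are illustrative, not the actual TLex or FLEx export schemas) and placeholder lemmas rather than real Beserman data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified exports: the element and attribute names are
# illustrative assumptions, not the real TLex or FLEx XML schemas.
DICTIONARY_XML = """
<dictionary>
  <entry id="e1" lemma="lemma_a"><gloss>gloss A</gloss></entry>
  <entry id="e2" lemma="lemma_b"><gloss>gloss B</gloss></entry>
</dictionary>
"""

CORPUS_XML = """
<corpus>
  <text id="t1">
    <sentence id="s1">
      <word form="form_a1" lemma="lemma_a"/>
      <word form="form_b1" lemma="lemma_b"/>
    </sentence>
    <sentence id="s2">
      <word form="form_a2" lemma="lemma_a"/>
    </sentence>
  </text>
</corpus>
"""


def load_dictionary(xml_text):
    """Map each lemma to its (hypothetical) entry ID and gloss."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.get("lemma"): {"id": entry.get("id"), "gloss": entry.findtext("gloss")}
        for entry in root.iter("entry")
    }


def index_corpus(xml_text):
    """Map each lemma to the IDs of the sentences it occurs in, in corpus order."""
    root = ET.fromstring(xml_text.strip())
    index = defaultdict(list)
    for sentence in root.iter("sentence"):
        sentence_id = sentence.get("id")
        for word in sentence.iter("word"):
            occurrences = index[word.get("lemma")]
            if sentence_id not in occurrences:  # list each sentence once
                occurrences.append(sentence_id)
    return index


def examples_for_entry(lemma, dictionary, corpus_index):
    """Join a dictionary entry with its corpus occurrences; None if no entry."""
    entry = dictionary.get(lemma)
    if entry is None:
        return None
    return {
        "entry": entry["id"],
        "gloss": entry["gloss"],
        "sentences": list(corpus_index.get(lemma, [])),
    }


dictionary = load_dictionary(DICTIONARY_XML)
corpus_index = index_corpus(CORPUS_XML)
print(examples_for_entry("lemma_a", dictionary, corpus_index))
# → {'entry': 'e1', 'gloss': 'gloss A', 'sentences': ['s1', 's2']}
```

Building the lemma index once and looking entries up in it keeps each dictionary-to-corpus query cheap; a real system would also need to handle homonymous lemmas, which a plain lemma key cannot distinguish.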

Item Type: Article
Subjects: P Language and Literature > P0 Philology. Linguistics
Depositing User: László Sallai-Tóth
Date Deposited: 06 Oct 2017 06:44
Last Modified: 30 Sep 2019 23:15