REAL

ELTE Poetry Corpus: A Machine Annotated Database of Canonical Hungarian Poetry

Horváth, Péter and Kundráth, Péter and Indig, Balázs and Fellegi, Zsófia and Szlávich, Eszter and Bajzát, Tímea Borbála and Sárközi-Lindner, Zsófia and Vida, Bence and Karabulut, Aslihan and Timári, Mária and Palkó, Gábor (2022) ELTE Poetry Corpus : A Machine Annotated Database of Canonical Hungarian Poetry. In: Proceedings of the 13th Language Resources and Evaluation Conference. European Language Resources Association (ELRA), Paris, pp. 3471-3478. ISBN 9791095546726

Available under License Creative Commons Attribution Non-commercial.

Abstract

ELTE Poetry Corpus is a database that stores canonical Hungarian poetry with automatically generated annotations of the poems’ structural units, grammatical features and sound devices, i.e., rhyme patterns, rhyme pairs, rhythm, alliterations and the main phonological features of words. The corpus has an open-access online query tool with several search functions. The paper presents the main stages of the annotation process and the tools used for each stage. It also presents the TEI XML format of the different versions of the corpus, each of which contains an increasing number of annotation layers. We have also specified our own XML format for the corpus, slightly different from TEI, to make queries on the corpus easier and faster to execute. We discuss the results of a manual evaluation of the quality of the automatic rhythm annotation, as well as the results of an automatic evaluation of the different rule sets used for automatically annotating rhyme patterns. Finally, the paper gives an overview of the main functions of the online query tool developed for the corpus.

Item Type: Book Section
Uncontrolled Keywords: poetry corpus, Hungarian, automatic annotation, sound devices
Subjects: P Language and Literature > P0 Philology. Linguistics
P Language and Literature > PH Finno-Ugrian, Basque languages and literatures > PH04 Hungarian language and literature
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 27 Feb 2023 08:43
Last Modified: 27 Feb 2023 08:43
URI: http://real.mtak.hu/id/eprint/160443
