SzegedKoref: A Hungarian Coreference Corpus

Vincze, Veronika and Hegedűs, Klára and Sliz-Nagy, Alex and Farkas, Richárd (2018) SzegedKoref: A Hungarian Coreference Corpus. In: 11th edition of the Language Resources and Evaluation Conference.

Abstract

In this paper we introduce SzegedKoref, a Hungarian corpus in which coreference relations are manually annotated. For annotation, we selected a subset of texts from the Szeged Treebank, the largest Hungarian treebank, which is manually annotated at several linguistic layers. The corpus contains approximately 55,000 tokens and 4,000 sentences. Given its size, the corpus can be used for training and testing machine-learning-based coreference resolution systems, which we would like to implement in the near future. We present the annotated texts, describe the annotated categories of anaphoric relations, report on the annotation process, and offer several examples of each annotated category. Two linguistic phenomena, phonologically empty pronouns and pronouns referring to subordinate clauses, are important characteristics of Hungarian coreference relations, and we discuss both in the paper.

Item Type: Conference or Workshop Item (Paper)
Subjects: T Technology (applied and technical sciences) > T2 Technology (General)
Depositing User: Dr Richárd Farkas
Date Deposited: 30 Sep 2018 18:12
Last Modified: 30 Sep 2018 18:12
URI: http://real.mtak.hu/id/eprint/86150
