REAL

XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Martelli, Federico and Bejgu, Andrei Stefan and Campagnano, Cesare and Čibej, Jaka and Costa, Rute and Gantar, Apolonija and Kallas, Jelena and Koeva, Svetla and Koppel, Kristina and Krek, Simon and Langemets, Margit and Lipp, Veronika and Nimb, Sanni and Olsen, Sussi and Pedersen, Bolette Sandford and Quochi, Valeria and Salgado, Ana and Simon, László and Tiberius, Carole and Ureña-Ruiz, Rafael-J. and Navigli, Roberto (2023) XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs. In: CEUR Workshop Proceedings. CEUR-WS, p. 35.

Full text (PDF, 256kB): _Clic_it_2023__XL_WA___NUOVO_TEMPLATE.pdf
Available under License Creative Commons Attribution.

Abstract

Word alignment plays a crucial role in several NLP tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually curated datasets, which are not always available, especially for mid- and low-resource languages. To address this limitation, we propose XL-WA, a novel, entirely manually curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize to unseen language pairs.
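Scoring a word aligner against a gold benchmark usually means comparing sets of links, where each link pairs a source token index with a target token index. The sketch below is illustrative only: it assumes links are written in the common `i-j` text format (as output by tools such as fast_align) and that the gold data has a single set of links (no sure/possible distinction). The helper names `parse_links` and `corpus_scores` are hypothetical, and the exact metric and file format used for XL-WA are not stated in this record. Counts are micro-averaged over the whole corpus.

```python
from typing import Dict, Iterable, Set, Tuple

# A link is (source token index, target token index), 0-based.
Link = Tuple[int, int]


def parse_links(text: str) -> Set[Link]:
    """Parse whitespace-separated 'i-j' links, e.g. '0-0 1-2 2-1' (assumed format)."""
    links: Set[Link] = set()
    for item in text.split():
        src, tgt = item.split("-")
        links.add((int(src), int(tgt)))
    return links


def corpus_scores(pairs: Iterable[Tuple[Set[Link], Set[Link]]]) -> Dict[str, float]:
    """Micro-averaged precision, recall, F1 and AER over (predicted, gold) pairs.

    Assumes a single gold link set per sentence (sure == possible), in which
    case the alignment error rate (AER) equals 1 - F1.
    """
    hits = n_pred = n_gold = 0
    for predicted, gold in pairs:
        hits += len(predicted & gold)  # links found in both sets
        n_pred += len(predicted)
        n_gold += len(gold)

    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    aer = 1.0 - 2 * hits / (n_pred + n_gold) if n_pred + n_gold else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "aer": aer}


if __name__ == "__main__":
    gold = parse_links("0-0 1-2 2-1")
    pred = parse_links("0-0 1-2 2-2")
    print(corpus_scores([(pred, gold)]))
```

In the usage example, two of the three predicted links match the gold set, so precision, recall and F1 are all 2/3 and AER is 1/3. Summing counts before dividing (micro-averaging) weights long sentences more heavily than averaging per-sentence scores would.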

Item Type: Book Section
Subjects: P Language and Literature > P0 Philology. Linguistics
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 07 Dec 2023 10:21
Last Modified: 12 Dec 2023 11:42
URI: http://real.mtak.hu/id/eprint/181998
