Martelli, Federico and Bejgu, Andrei Stefan and Campagnano, Cesare and Čibej, Jaka and Costa, Rute and Gantar, Apolonija and Kallas, Jelena and Koeva, Svetla and Koppel, Kristina and Krek, Simon and Langemets, Margit and Lipp, Veronika and Nimb, Sanni and Olsen, Sussi and Pedersen, Bolette Sandford and Quochi, Valeria and Salgado, Ana and Simon, László and Tiberius, Carole and Ureña-Ruiz, Rafael-J. and Navigli, Roberto (2023) XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs. In: CEUR Workshop Proceedings. CEUR-WS, p. 35.
Text: _Clic_it_2023__XL_WA___NUOVO_TEMPLATE.pdf (256kB). Available under License Creative Commons Attribution.
Abstract
Word alignment plays a crucial role in several NLP tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually curated datasets, which are not always available, especially in mid- and low-resource languages. To address this limitation, we propose XL-WA, a novel, entirely manually curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize to unseen language pairs.
Item Type: | Book Section |
---|---|
Subjects: | P Language and Literature / nyelvészet és irodalom > P0 Philology. Linguistics / filológia, nyelvészet |
SWORD Depositor: | MTMT SWORD |
Depositing User: | MTMT SWORD |
Date Deposited: | 07 Dec 2023 10:21 |
Last Modified: | 12 Dec 2023 11:42 |
URI: | http://real.mtak.hu/id/eprint/181998 |