Dodé, Réka and Yang, Zijian Győző (2024) Further Keyword Generation Experiment in Hungarian with Fine-tuning PULI LlumiX 32K Model. In: 2024 IEEE 3rd Conference on Information Technology and Data Science (CITDS) Proceedings. University of Debrecen, Debrecen, pp. 20-24. ISBN 9798350387889
Abstract
Our research continues an investigation into using neural models to generate and extract keywords from lengthy texts, based on data from the REAL repository and author-provided keywords. Previously, we tested three models: fastText for keyword extraction, framed as multi-label classification, as a baseline; PULI GPT-3SX, a fine-tuned Hungarian language model, for keyword generation; and a further trained Llama-2-7B-32K model. In this study, we fine-tuned a new model, PULI LlumiX 32K, on the same data; it combines Hungarian language knowledge with Llama-2-7B-32K’s 32,000-token input capacity. We assessed how well the models generate new, relevant keywords, comparing their output with the author-provided keywords and examining keywords that are not present in the text. The PULI LlumiX 32K model outperformed both the PULI GPT-3SX language model and the Llama-2-7B-32K model. For PULI LlumiX 32K and Llama-2-7B-32K, approximately 20% of the generated keywords were not present in the text, a ratio similar to that of the author keywords. PULI GPT-3SX had a higher ratio of about 30%. Some new keywords were relevant, while others were inaccurate due to erroneous phrases.
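The abstract reports the share of keywords that do not occur in the source text. As a rough illustration only, the sketch below computes such a ratio with plain case-insensitive substring matching. The function name `absent_keyword_ratio` and the toy data are hypothetical, and the paper's actual matching procedure (for example, whether inflected Hungarian forms are lemmatized first) is not described here, so this should not be read as the authors' method.

```python
def absent_keyword_ratio(keywords, text):
    """Return the fraction of keywords that do not occur verbatim in text.

    Assumption: a keyword counts as "present" if it appears as a
    case-insensitive substring; no stemming or lemmatization is applied.
    """
    if not keywords:
        return 0.0
    haystack = text.casefold()
    absent = [kw for kw in keywords if kw.casefold() not in haystack]
    return len(absent) / len(keywords)


# Toy data invented for illustration; not taken from the REAL repository.
text = "We fine-tuned a Hungarian language model for keyword generation."
generated = [
    "language model",
    "keyword generation",
    "fine-tuning",      # absent: the text contains "fine-tuned"
    "Hungarian",
    "summarization",    # absent
]
print(absent_keyword_ratio(generated, text))
```

Verbatim matching treats "fine-tuning" as absent even though "fine-tuned" is in the text, which hints at why a morphologically rich language such as Hungarian would likely need normalization before a ratio like this is meaningful.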
Item Type: | Book Section |
---|---|
Uncontrolled Keywords: | PULI LlumiX 32K, generated keywords, finetuning, author-provided keywords, Llama-2-7B-32K, PULI GPT3SX, Hungarian language model |
Subjects: | P Language and Literature / nyelvészet és irodalom > P0 Philology. Linguistics / filológia, nyelvészet; P Language and Literature / nyelvészet és irodalom > PH Finno-Ugrian, Basque languages and literatures / finnugor és baszk nyelvek és irodalom > PH04 Hungarian language and literature / magyar nyelv és irodalom |
SWORD Depositor: | MTMT SWORD |
Depositing User: | MTMT SWORD |
Date Deposited: | 08 Oct 2024 12:26 |
Last Modified: | 08 Oct 2024 12:26 |
URI: | https://real.mtak.hu/id/eprint/207081 |