Yang, Zijian Győző and Laki, László János and Váradi, Tamás and Prószéky, Gábor (2023) Mono- and Multilingual GPT-3 Models for Hungarian. In: Text, Speech, and Dialogue: 26th International Conference, TSD 2023, Pilsen, Czech Republic, September 4–6, 2023, Proceedings. Lecture Notes in Computer Science, 14102. Springer Nature Switzerland, Cham, pp. 94-104. ISBN 9783031404979; 9783031404986
Abstract
In recent years, the growth in size of Transformer-based language models has accelerated significantly. Global technology companies are training ever larger models that require enormous resources and training data. With these experiments, they aim to demonstrate that sufficiently large models with abundant training data can solve any natural language processing task, even without fine-tuning. It may not be feasible to compete directly in this race, but there is an opportunity to conduct experiments with larger models in its shadow. Our aim is to train large language models for Hungarian. According to research on knowledge transfer, a language model can acquire valuable knowledge from other languages. Furthermore, for the model to be able to solve translation tasks, it also needs multilingual knowledge. In our research, we trained a Hungarian monolingual and a Hungarian-English-Chinese trilingual GPT language model, each with 6.7 billion parameters, on more than 1 TB of text data. In our experiments, we also fine-tuned our model with the prompts provided by the Stanford Alpaca dataset. Using this methodology, we built an instruct GPT, which, as far as we know, is the first multilingual large language model in this region that can follow instructions.
Item Type: | Book Section |
---|---|
Uncontrolled Keywords: | GPT-3, multilingual large language model, instruct GPT |
Subjects: | P Language and Literature / nyelvészet és irodalom > P0 Philology. Linguistics / filológia, nyelvészet; Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány |
SWORD Depositor: | MTMT SWORD |
Depositing User: | MTMT SWORD |
Date Deposited: | 19 Sep 2023 10:23 |
Last Modified: | 19 Sep 2023 10:23 |
URI: | http://real.mtak.hu/id/eprint/173960 |