REAL

Mono- and Multilingual GPT-3 Models for Hungarian

Yang, Zijian Győző and Laki, László János and Váradi, Tamás and Prószéky, Gábor (2023) Mono- and Multilingual GPT-3 Models for Hungarian. In: Text, Speech, and Dialogue : 26th International Conference, TSD 2023, Pilsen, Czech Republic, September 4–6, 2023, Proceedings. Lecture Notes in Computer Science, 14102 . Springer Nature Switzerland, Cham, pp. 94-104. ISBN 9783031404979; 9783031404986

Text: TSD_2023_GPT.pdf (253kB)

Abstract

In recent years, the growth in size of Transformer-based language models has accelerated significantly. Global technology companies are training larger and larger models that require enormous resources and training data. With these experiments, they aim to demonstrate that sufficiently large models with abundant training data can solve any natural language processing task even without fine-tuning. It may not be feasible to compete directly in this race, but there is an opportunity to conduct experiments in the direction of larger models in their shadow. Our aim is to train large language models for Hungarian. Research on knowledge transfer has shown that a language model can adapt valuable knowledge from other languages. Furthermore, for the model to be able to solve translation tasks, it also needs multilingual knowledge. In our research, we trained a monolingual Hungarian and a trilingual Hungarian-English-Chinese GPT language model, each with 6.7 billion parameters, on more than 1 TB of text data. In our experiments, we also fine-tuned our model on the prompts provided by the Stanford Alpaca dataset. Employing this methodology, we built an instruct GPT which, as far as we know, is the first multilingual large language model in this region that can follow instructions.
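
The abstract mentions instruction fine-tuning on the Stanford Alpaca prompts. Below is a minimal sketch of how Alpaca-style records ("instruction", "input", "output" fields) are commonly turned into supervised fine-tuning pairs; the prompt template is the publicly documented Alpaca one, while the helper names and the Hungarian example are illustrative assumptions, not taken from the paper.

# Minimal sketch: building supervised fine-tuning examples from Alpaca-style records.
# The record fields ("instruction", "input", "output") follow the Stanford Alpaca
# dataset; the helper names below are illustrative, not from the paper.

ALPACA_PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

ALPACA_PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_sft_example(record: dict) -> dict:
    """Turn one Alpaca record into a prompt/completion pair for fine-tuning."""
    if record.get("input"):
        prompt = ALPACA_PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        prompt = ALPACA_PROMPT_NO_INPUT.format(instruction=record["instruction"])
    # During fine-tuning, the model learns to continue the prompt with the output.
    return {"prompt": prompt, "completion": record["output"]}


if __name__ == "__main__":
    # Hypothetical bilingual example in the spirit of the paper's Hungarian focus.
    example = {
        "instruction": "Translate the sentence to Hungarian.",
        "input": "The weather is nice today.",
        "output": "Ma szép az idő.",
    }
    pair = build_sft_example(example)
    print(pair["prompt"] + pair["completion"])

In practice, each concatenated prompt+completion string would be tokenized and the loss computed on the completion tokens; the exact training setup used for the 6.7B models is described in the paper itself.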

Item Type: Book Section
Uncontrolled Keywords: GPT-3, multilingual large language model, instruct GPT
Subjects: P Language and Literature / nyelvészet és irodalom > P0 Philology. Linguistics / filológia, nyelvészet
Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 19 Sep 2023 10:23
Last Modified: 19 Sep 2023 10:23
URI: http://real.mtak.hu/id/eprint/173960
