Hatvani, Péter and Yang, Zijian Győző (2025) Automated detection of toxic comments in Hungarian. ANNALES MATHEMATICAE ET INFORMATICAE, 61. pp. 108-117. ISSN 1787-6117
Text: 108_117_hatvani.pdf - Published Version (903kB)
Abstract
Moderating toxic online comments in Hungarian remains a challenging NLP task. We introduce the first openly available Hungarian corpus for toxic comment classification, though limited in size (n = 655), sourced from social media and political news forums. We fine-tuned three BERT-based classifiers (huBERT, multilingual BERT, and huBERT-SetFit) and applied data augmentation techniques to expand the training dataset. The best-performing model, huBERT-SetFit, achieved an F1 score of 93.7%. Our results demonstrate the effectiveness of transformer-based models for toxicity detection in low-resource, linguistically complex settings.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | toxicity, online hate, nlp, classification, logistic regression |
| Subjects: | Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány |
| Depositing User: | Tibor Gál |
| Date Deposited: | 11 Nov 2025 09:56 |
| Last Modified: | 11 Nov 2025 09:56 |
| URI: | https://real.mtak.hu/id/eprint/228838 |