Automated detection of toxic comments in Hungarian

Hatvani, Péter and Yang, Zijian Győző (2025) Automated detection of toxic comments in Hungarian. ANNALES MATHEMATICAE ET INFORMATICAE, 61. pp. 108-117. ISSN 1787-6117

108_117_hatvani.pdf - Published Version

Official URL: https://doi.org/10.33039/ami.2025.10.007

Abstract

Moderating toxic online comments in Hungarian remains a challenging NLP task. We introduce the first openly available Hungarian corpus for toxic comment classification, a small dataset (n = 655) sourced from social media and political news forums. We fine-tuned three BERT-based classifiers (huBERT, multilingual BERT, and huBERT-SetFit) and applied data augmentation techniques to expand the training dataset. The best-performing model, huBERT-SetFit, achieved an F1 score of 93.7%. Our results demonstrate the effectiveness of transformer-based models for toxicity detection in low-resource, linguistically complex settings.
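
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score is computed for a binary toxic/non-toxic classifier. It is illustrative only: the abstract does not state which F1 variant (binary, macro, or weighted) was used, and the function name `f1_score_binary` and the label lists are hypothetical, not taken from the paper's data.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score of the `positive` class (assumed here: 1 = toxic)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only (1 = toxic, 0 = non-toxic).
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because F1 balances precision against recall, it is generally more informative than plain accuracy when the two classes are not equally represented.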

Item Type:	Article
Uncontrolled Keywords:	toxicity, online hate, nlp, classification, logistic regression
Subjects:	Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány
Depositing User:	Tibor Gál
Date Deposited:	11 Nov 2025 09:56
Last Modified:	11 Nov 2025 09:56
URI:	https://real.mtak.hu/id/eprint/228838
