REAL

Automated detection of toxic comments in Hungarian

Hatvani, Péter and Yang, Zijian Győző (2025) Automated detection of toxic comments in Hungarian. ANNALES MATHEMATICAE ET INFORMATICAE, 61. pp. 108-117. ISSN 1787-6117

108_117_hatvani.pdf - Published Version

Abstract

Moderating toxic online comments in Hungarian remains a challenging NLP task. We introduce the first openly available Hungarian corpus for toxic comment classification. The corpus is small (n = 655) and is sourced from social media and political news forums. We fine-tuned three BERT-based classifiers (huBERT, multilingual BERT, and huBERT-SetFit) and applied data augmentation techniques to expand the training dataset. The best-performing model, huBERT-SetFit, achieved an F1 score of 93.7%. Our results demonstrate the effectiveness of transformer-based models for toxicity detection in low-resource, linguistically complex settings.

Item Type: Article
Uncontrolled Keywords: toxicity, online hate, nlp, classification, logistic regression
Subjects: Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány
Depositing User: Tibor Gál
Date Deposited: 11 Nov 2025 09:56
Last Modified: 11 Nov 2025 09:56
URI: https://real.mtak.hu/id/eprint/228838
