Kmetty, Zoltán and Kollányi, Bence and Boros, Krisztián (2024) Boosting Classification Reliability of NLP Transformer Models in the Long Run—Challenges of Time in Opinion Prediction Regarding COVID-19 Vaccine. SN COMPUTER SCIENCE, 6 (1). ISSN 2662-995X
Text: s42979-024-03553-2.pdf - Published Version (903kB). Restricted to registered users only until 20 December 2025.
Abstract
Transformer-based machine learning models have become an essential tool for many natural language processing (NLP) tasks since the introduction of the method. A common objective of these projects is to classify text data. Classification models are often extended to a different topic and/or time period. In these situations, it is difficult to decide how long a classification model remains suitable and when it is worth re-training it. This paper compares different approaches to fine-tuning a BERT model for a long-running classification task. We use data from different periods to fine-tune our original BERT model, and we also measure how much a second round of annotation could boost the classification quality. Our corpus contains over 8 million comments on COVID-19 vaccination in Hungary posted between September 2020 and December 2021. Our results show that the best solution is to use all available unlabeled comments to fine-tune a model. It is not advisable to focus only on comments containing words that our model has not encountered before; a more efficient solution is to randomly sample comments from the new period. Fine-tuning does not prevent the model from losing performance; it merely slows the decline. In a rapidly changing linguistic environment, it is not possible to maintain model performance without regularly annotating new texts.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Comment classification, BERT, Fine-tuning, Temporal analysis, Concept drift, Vaccination |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science; T Technology > T2 Technology (General) |
SWORD Depositor: | MTMT SWORD |
Depositing User: | MTMT SWORD |
Date Deposited: | 20 Dec 2024 09:49 |
Last Modified: | 20 Dec 2024 09:49 |
URI: | https://real.mtak.hu/id/eprint/212317 |