Koltai, Júlia and Rakovics, Zsófia and Kmetty, Zoltán and Számel, Kata and Ungvári, Borbála and Váradi, Bendegúz and Huszár, Ákos (2025) Classifying social position with social media behavioral data. EPJ DATA SCIENCE, 14 (1). No.-60. ISSN 2193-1127
Text: s13688-025-00578-2.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
The main question of our study is how far social position can be predicted solely on the basis of digital behavior. The reflection of offline inequalities in the digital space has been heavily researched since the digital revolution. Nevertheless, few datasets measure both social inequalities and digital behavior: scientists have information either on people’s social status or on their observed digital behavior, but not on both. When digital behavioral data are analyzed, however large-scale they are, information on the social position of the users is hardly ever available. In the current paper, we analyze a special dataset collected with a data donation technique, which contains information on both the social position and the observed digital behavior of participants, and which is representative of the internet user population of Hungary. Using diverse models, we explored how well basic indicators of digital behavior on Facebook can classify users’ social class, measured by the 5-category version of the European Socio-economic Classification (ESeC). The results show that, based on basic quantitative indicators of digital behavior and usage, the models cannot classify users’ social position to a high degree, whether position is measured as social class or as socio-economic status. Nevertheless, including socio-demographic characteristics as features increased the predictive power of the models, which could then differentiate between the lowest and highest social positions to a high degree. The models based purely on observed digital behavior performed best at identifying those in the lowest social position.
Among the features that played an important role in this classification, usage time, frequency, network size, and language characteristics (especially the diversity of the language used and of punctuation) should be highlighted, while diverse Facebook activities and detected interest categories also played a role. These results are in line with those of previous studies on the same topic that relied on smaller-scale, non-representative, or self-reported survey-based data.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Observed digital behavior; Social position; Social inequalities; Social media; XGBoost; Classification |
| Subjects: | H Social Sciences > H Social Sciences (General) |
| SWORD Depositor: | MTMT SWORD |
| Depositing User: | MTMT SWORD |
| Date Deposited: | 18 Aug 2025 11:40 |
| Last Modified: | 18 Aug 2025 11:40 |
| URI: | https://real.mtak.hu/id/eprint/222423 |




