REAL

Classifying social position with social media behavioral data

Koltai, Júlia and Rakovics, Zsófia and Kmetty, Zoltán and Számel, Kata and Ungvári, Borbála and Váradi, Bendegúz and Huszár, Ákos (2025) Classifying social position with social media behavioral data. EPJ DATA SCIENCE, 14 (1). No.-60. ISSN 2193-1127

s13688-025-00578-2.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

The main question of our study is how far social position can be predicted based solely on digital behavior. The reflection of offline inequalities in the digital space has been heavily researched since the digital revolution. Nevertheless, few datasets measure both social inequalities and digital behavior: scientists have information either on people's social status or on their observed digital behavior, but not on both. When digital behavioral data are analyzed, however large in scale, information on the social position of the users is rarely available. In the current paper, we analyze a special dataset collected with a data donation technique, which contains information on both the social position and the observed digital behavior of participants, and which is representative of the internet user population of Hungary. Using diverse models, we explored how well basic indicators measuring digital behavior on Facebook can classify users’ social class, measured by the 5-category version of the European Socio-economic Classification (ESeC). The results show that, based on basic quantitative indicators of digital behavior and usage, the models cannot classify users’ social position with high performance, either for social class or for socio-economic status. Nevertheless, the inclusion of socio-demographic characteristics as features increased the predictive power of the models, which could then differentiate between the lowest and highest social positions with high performance. The models based purely on observed digital behavior identified those in the lowest social position with the highest performance.
Among the features that played an important role in this classification, usage time, frequency, network size, and language characteristics (especially the diversity of the language used and punctuation) should be highlighted, while diverse Facebook activities and detected interest categories also played a role. These results are in line with those of previous studies on the same topic that relied on smaller-scale, non-representative, or self-reported survey-based data.

Item Type: Article
Uncontrolled Keywords: Observed digital behavior; Social position; Social inequalities; Social media; XGBoost; Classification
Subjects: H Social Sciences > H Social Sciences (General)
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 18 Aug 2025 11:40
Last Modified: 18 Aug 2025 11:40
URI: https://real.mtak.hu/id/eprint/222423
