REAL

Identifying missing data handling methods with text mining

Boros, Krisztián and Kmetty, Zoltán (2024) Identifying missing data handling methods with text mining. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, Online. ISSN 2364-415X

s41060-024-00582-1.pdf - Published Version

Abstract

Missing data is an inevitable aspect of all empirical research. Researchers have developed several techniques for handling missing data to avoid information loss and bias. Over the past 50 years, these methods have become more efficient and also more complex. Building on previous review studies, this paper analyzes which missing data handling methods are used across scientific disciplines. The analysis covers nearly 50,000 scientific articles published between 1999 and 2016, which JSTOR provided in text format. We used a text-mining approach to extract the necessary information from this corpus. Our results show that the use of advanced missing data handling methods, such as Multiple Imputation or Full Information Maximum Likelihood estimation, grew steadily over the examination period. At the same time, simpler methods, such as listwise and pairwise deletion, remain in widespread use.
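
The abstract does not say how the text-mining step was implemented. As a rough illustration only, a dictionary-based approach could flag each article that mentions a method and then count mentions per publication year. The sketch below assumes this simple keyword matching. The pattern dictionary and the function names `detect_methods` and `count_by_year` are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical pattern dictionary: method label -> regular expression.
# These patterns are illustrative assumptions, not the paper's dictionary.
METHOD_PATTERNS = {
    "multiple imputation": re.compile(
        r"\bmultiple\s+imputations?\b", re.IGNORECASE),
    "FIML": re.compile(
        r"\bfull[\s-]+information\s+maximum[\s-]+likelihood\b|\bFIML\b",
        re.IGNORECASE),
    # Also catches the coordinated form "listwise and pairwise deletion".
    "listwise deletion": re.compile(
        r"\blistwise(?:\s+(?:and|or)\s+pairwise)?\s+deletion\b",
        re.IGNORECASE),
    "pairwise deletion": re.compile(
        r"\bpairwise\s+deletion\b", re.IGNORECASE),
}


def detect_methods(text):
    """Return the set of method labels mentioned at least once in text."""
    return {label for label, pattern in METHOD_PATTERNS.items()
            if pattern.search(text)}


def count_by_year(articles):
    """Count, per publication year, how many articles mention each method.

    articles: iterable of (year, full_text) pairs.
    """
    counts = {}
    for year, text in articles:
        # Each article contributes at most one count per method.
        counts.setdefault(year, Counter()).update(detect_methods(text))
    return counts
```

Counting each article at most once per method gives the number of articles mentioning a method in a given year. Turning that into a usage share would additionally require the total number of articles per year.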

Item Type: Article
Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4450 Databases
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 18 Jun 2024 06:19
Last Modified: 18 Jun 2024 06:19
URI: https://real.mtak.hu/id/eprint/197731
