REAL

Efficiency of Random Sampling Based Data Size Reduction on Computing Time and Validity of Clustering in Data Mining

Cebeci, Zeynel and Yildiz, Figen (2016) Efficiency of Random Sampling Based Data Size Reduction on Computing Time and Validity of Clustering in Data Mining. AGRÁRINFORMATIKA / JOURNAL OF AGRICULTURAL INFORMATICS, 7 (1). pp. 53-64. ISSN 2061-862X

266_1161_1_PB_u.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

In data mining, cluster analysis is one of the most widely used techniques for discovering existing groups in datasets. However, traditional clustering algorithms have become insufficient for analyzing big data, which has emerged from the enormous increase in the amount of data collected in recent years. Scalability has therefore been one of the most intensively studied research topics in clustering big data. Parallel clustering algorithms and techniques based on the Map-Reduce framework running on multiple machines are becoming popular ways to scale big data analysis. However, applying sampling techniques to big datasets can still be an alternative or complementary way to run traditional algorithms on single machines. The results obtained in this study showed that data size reduction by simple random sampling can be used successfully in cluster analysis of large datasets. The clustering validity obtained by running the K-means algorithm on the sample datasets was found to be as high as that obtained on the complete datasets. Additionally, the execution time required for cluster analysis on the sample datasets was significantly shorter than that required for the complete datasets.
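The workflow summarized in the abstract (draw a simple random sample, run K-means on it, then compare validity and run time against the complete dataset) can be illustrated with a minimal sketch. This is not the authors' code: the synthetic three-cluster data, the 10% sampling rate, the plain Lloyd-style K-means, and the choice of the Rand index as the external validity index are all assumptions made for illustration, and the helper names (`make_blobs`, `kmeans`, `rand_index`, `run`) are hypothetical.

```python
import random
import time
from collections import Counter
from math import comb


def make_blobs(n_per_cluster, centers, spread, rng):
    """Synthetic 2-D points with known labels (illustrative data only)."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd-style K-means; returns a cluster index for each point."""
    centroids = rng.sample(points, k)  # random initial centroids
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return assignment


def rand_index(true_labels, pred_labels):
    """Rand index (an external validity index) from contingency counts."""
    total = comb(len(true_labels), 2)
    both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    # Agreeing pairs = pairs grouped together in both + pairs separated in both.
    return (total + 2 * both - same_true - same_pred) / total


def run(points, labels, k, rng):
    """Cluster the points and return (Rand index, elapsed seconds)."""
    start = time.perf_counter()
    pred = kmeans(points, k, rng)
    elapsed = time.perf_counter() - start
    return rand_index(labels, pred), elapsed


rng = random.Random(42)
points, labels = make_blobs(2000, [(0, 0), (10, 0), (0, 10)], 1.0, rng)

# Simple random sampling without replacement: every point is equally likely.
sample_size = len(points) // 10  # assumed 10% rate, not taken from the paper
idx = rng.sample(range(len(points)), sample_size)
sample_points = [points[i] for i in idx]
sample_labels = [labels[i] for i in idx]

full_ri, full_time = run(points, labels, 3, rng)
sample_ri, sample_time = run(sample_points, sample_labels, 3, rng)
print(f"full:   n={len(points)}, Rand index={full_ri:.3f}, time={full_time:.4f}s")
print(f"sample: n={len(sample_points)}, Rand index={sample_ri:.3f}, time={sample_time:.4f}s")
```

The printed values depend on the random seed and the machine, so none are quoted here. K-means with a single random initialization can also settle in a poor local optimum, so a fair comparison would repeat both runs over several seeds.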

Item Type: Article
Uncontrolled Keywords: data reduction, random sampling, cluster analysis, external validity indices, big data, k-means clustering
Subjects: H Social Sciences / társadalomtudományok > HD Industries. Land use. Labor / ipar, földhasználat, munkaügy > HD30.2 Knowledge management. Information technology management / Tudásmenedzsment
S Agriculture / mezőgazdaság > S1 Agriculture (General) / mezőgazdaság általában
Z Bibliography. Library Science. Information Resources / könyvtártudomány > ZA Information resources / információforrások
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 26 Aug 2016 06:25
Last Modified: 05 Jun 2024 13:09
URI: https://real.mtak.hu/id/eprint/39153
