REAL

Comparative cluster labelling involving external text sources

Kruzslicz, Ferenc and Kovács, Balázs and Hornyák, Miklós (2017) Comparative cluster labelling involving external text sources. Hungarian Statistical Review, 95 (K21). pp. 101-127. ISSN 0039-0690

Abstract

Giving clear, straightforward names to the individual groups produced by clustering is essential to making research usable. This is especially so when the clustering is the real outcome of the analysis and not just a tool for data preparation. In this case, the underlying concept of the cluster itself makes the result meaningful and useful. However, a cluster comes alive in the investigator’s mind only when it can be defined or described in words. The method introduced in this paper aims to facilitate and partly automate this verbal characterisation process. An external text database is joined to the clustered objects, which adds new, previously unused features to the data set. Clusters are then described by labels produced by text mining analytics. The validity of the clustering can be characterised by the shape of the final word cloud.
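
The abstract does not give the labelling procedure itself, so the following is only a minimal sketch of the general idea it describes: external text is joined to clustered objects by object identifier, and each cluster is labelled with the terms that set it apart from the other clusters. The function names `tokenize` and `label_clusters`, the toy data, and the TF-IDF-style score are all assumptions made for illustration and should not be read as the authors' method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


def label_clusters(assignments, external_texts, top_n=3):
    """Return the top_n most distinctive terms per cluster.

    assignments:    {object_id: cluster_id}, the result of some clustering
    external_texts: {object_id: text}, the external text source
    Both structures are assumed for this sketch.
    """
    # Join step: pool the external text of every object into its cluster.
    cluster_terms = defaultdict(Counter)
    for obj_id, cluster_id in assignments.items():
        cluster_terms[cluster_id].update(tokenize(external_texts.get(obj_id, "")))

    # Count in how many clusters each term occurs.
    n_clusters = len(cluster_terms)
    cluster_freq = Counter()
    for counts in cluster_terms.values():
        cluster_freq.update(counts.keys())

    # Comparative step: a TF-IDF-style score (an assumption, not taken from
    # the paper) that favours terms frequent in one cluster and rare elsewhere.
    labels = {}
    for cluster_id, counts in cluster_terms.items():
        scores = {
            term: count * math.log((1 + n_clusters) / cluster_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        labels[cluster_id] = ranked[:top_n]
    return labels


# Toy data, invented for illustration only.
assignments = {"a": 0, "b": 0, "c": 1, "d": 1}
external_texts = {
    "a": "retail shop sells food and drink",
    "b": "food retail chain",
    "c": "software company develops software",
    "d": "software and cloud services company",
}

labels = label_clusters(assignments, external_texts, top_n=2)
for cluster_id, terms in sorted(labels.items()):
    print(cluster_id, terms)
```

In this sketch the word "and" occurs in both toy clusters and is therefore down-weighted, which is the comparative element: a label is chosen for how well it separates one cluster from the rest, not only for how often it appears.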

Item Type: Article
Subjects: H Social Sciences / társadalomtudományok > HA Statistics / statisztika
SWORD Depositor: MTMT SWORD
Depositing User: Erika Bilicsi
Date Deposited: 20 Dec 2017 08:12
Last Modified: 10 Mar 2022 12:25
URI: http://real.mtak.hu/id/eprint/53463
