REAL

Synthetic Test Data Generation for Hierarchical Graph Clustering Methods

Szilágyi, L. and Kovács, Levente and Szilágyi, S.M. (2014) Synthetic Test Data Generation for Hierarchical Graph Clustering Methods. LECTURE NOTES IN COMPUTER SCIENCE, 8835. pp. 303-310. ISSN 0302-9743

Abstract

Recent achievements in graph-based clustering algorithms revealed the need for large-scale test data sets. This paper introduces a procedure that can provide synthetic but realistic test data to the hierarchical Markov clustering algorithm. Being created according to the structure and properties of the SCOP95 protein sequence data set, the synthetic data act as a collection of proteins organized in a four-level hierarchy and a similarity matrix containing pairwise similarity values of the proteins. An ultimate high-speed TRIBE-MCL algorithm was employed to validate the synthetic data. Generated data sets have a healthy amount of variability due to the randomness in the processing, and are suitable for testing graph-based clustering algorithms on large-scale data.
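
The abstract describes the two outputs of the generator: items placed in a four-level hierarchy, and a matrix of pairwise similarity values. The following is only a minimal sketch of that data layout, not the procedure from the paper. The function name, the branching factors, the base similarity values per shared level and the noise range are all assumptions made for illustration; the paper derives its parameters from SCOP95, which this sketch does not attempt.

```python
import random


def generate_synthetic_data(n_items=50, branching=(3, 2, 2, 2), seed=0):
    """Return (labels, sim) for a toy four-level hierarchy.

    labels: one 4-tuple of group indices per item (coarsest level first).
    sim:    symmetric n_items x n_items list of lists with values in [0, 1].
    All numeric choices below are hypothetical, not taken from the paper.
    """
    rng = random.Random(seed)

    # Assign every item a random path through the four levels.
    labels = [tuple(rng.randrange(b) for b in branching) for _ in range(n_items)]

    # Assumed base similarity for 0, 1, 2, 3 or 4 shared leading levels.
    base = [0.0, 0.1, 0.3, 0.6, 0.9]

    sim = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        sim[i][i] = 1.0
        for j in range(i + 1, n_items):
            # Count how many leading hierarchy levels the two items share.
            shared = 0
            for a, b in zip(labels[i], labels[j]):
                if a != b:
                    break
                shared += 1
            # Add bounded random noise so repeated runs with other seeds vary.
            noisy = base[shared] + rng.uniform(-0.05, 0.05)
            value = min(1.0, max(0.0, noisy))
            sim[i][j] = sim[j][i] = value
    return labels, sim
```

Calling `generate_synthetic_data(seed=1)` and `generate_synthetic_data(seed=2)` gives different data sets with the same overall structure, which mirrors the variability the abstract mentions. A graph-based clustering method could then be run on `sim` and its output compared against `labels`.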

Item Type: Article
Subjects: Q Science / természettudomány > QA Mathematics / matematika > QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 22 Dec 2014 17:09
Last Modified: 22 Dec 2014 17:09
URI: http://real.mtak.hu/id/eprint/19659
