Szilágyi, L. and Kovács, Levente and Szilágyi, S.M. (2014) Synthetic Test Data Generation for Hierarchical Graph Clustering Methods. Lecture Notes in Computer Science, 8835. pp. 303-310. ISSN 0302-9743
Text: 88350303_SzLaci.pdf (1MB)
Abstract
Recent achievements in graph-based clustering algorithms have revealed the need for large-scale test data sets. This paper introduces a procedure that provides synthetic but realistic test data for the hierarchical Markov clustering algorithm. Created according to the structure and properties of the SCOP95 protein sequence data set, the synthetic data act as a collection of proteins organized in a four-level hierarchy, together with a similarity matrix containing the pairwise similarity values of the proteins. An extremely fast TRIBE-MCL algorithm was employed to validate the synthetic data. Owing to the randomness built into the procedure, the generated data sets exhibit a healthy amount of variability, which makes them suitable for testing graph-based clustering algorithms on large-scale data.
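The abstract does not spell out the generator, so the following is only a minimal sketch of the general idea under stated assumptions: proteins are placed in a four-level hierarchy (loosely modeled on SCOP's class, fold, superfamily and family levels), pairwise similarity grows with the number of shared levels plus random noise, and a plain textbook Markov clustering (MCL) run then checks whether the families are recovered. The similarity levels in `LEVEL_SIMILARITY`, the noise model, and all function names (`generate_hierarchy`, `similarity_matrix`, `mcl`, `clusters_from`) are hypothetical. The MCL here is the basic expansion/inflation loop, not the TRIBE-MCL pipeline or the accelerated implementation used in the paper.

```python
import itertools
import random

# Hypothetical similarity by number of shared hierarchy levels (0..4);
# 4 means same family. Values are illustrative assumptions, not from the paper.
LEVEL_SIMILARITY = (0.0, 0.0, 0.05, 0.15, 0.9)


def generate_hierarchy(branching=2, proteins_per_family=3):
    """One 4-level label path (e.g. class, fold, superfamily, family) per protein."""
    labels = []
    for path in itertools.product(range(branching), repeat=4):
        labels.extend([path] * proteins_per_family)
    return labels


def shared_depth(a, b):
    """Number of leading hierarchy levels two label paths have in common."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return depth


def similarity_matrix(labels, noise=0.2, seed=0):
    """Symmetric similarity matrix: level-based value with multiplicative noise."""
    rng = random.Random(seed)
    n = len(labels)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # self-similarity doubles as the MCL self-loop
        for j in range(i + 1, n):
            base = LEVEL_SIMILARITY[shared_depth(labels[i], labels[j])]
            sim[i][j] = sim[j][i] = base * (1.0 + rng.uniform(-noise, noise))
    return sim


def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mcl(sim, inflation=2.0, prune=1e-5, max_iter=100, tol=1e-9):
    """Plain MCL: alternate expansion and inflation until the matrix stops changing."""
    m = normalize_columns([row[:] for row in sim])
    n = len(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random-walk probabilities
        # inflation: prune tiny entries, raise the rest element-wise, renormalize
        nxt = normalize_columns(
            [[v ** inflation if v > prune else 0.0 for v in row] for row in expanded]
        )
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    return m


def clusters_from(m, eps=1e-6):
    """Clusters = connected components of the converged matrix's nonzero entries."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    labels = generate_hierarchy()
    clusters = clusters_from(mcl(similarity_matrix(labels)))
    print(len(labels), "proteins,", len(clusters), "clusters")
```

On this toy input (48 proteins in 16 families) the families are expected to emerge as the MCL clusters, and changing `seed` gives a different random instance with the same structure. Raising `inflation` tends to produce finer clusters, lowering it coarser ones.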
| Field | Value |
|---|---|
| Item Type | Article |
| Subjects | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| SWORD Depositor | MTMT SWORD |
| Depositing User | MTMT SWORD |
| Date Deposited | 22 Dec 2014 17:09 |
| Last Modified | 22 Dec 2014 17:09 |
| URI | http://real.mtak.hu/id/eprint/19659 |