Transcription factor binding site detection using convolutional neural networks with a functional group-based data representation

Pap, Gergely and Györgypál, Zoltán and Ádám, Krisztián and Tóth, László and Hegedűs, Zoltán (2021) Transcription factor binding site detection using convolutional neural networks with a functional group-based data representation. JOURNAL OF PHYSICS-CONFERENCE SERIES, 1824 (1). ISSN 1742-6588

PapJPhysicsConfSer.pdf
Available under License Creative Commons Attribution.

Abstract

Transcription factors (TFs) play an essential role in molecular biology by regulating gene expression. TF binding sites vary considerably, and the large number of possible binding locations makes their detection challenging. Recently, several machine learning approaches have used nucleotide sequence data to classify DNA sequences with respect to Transcription Factor Binding Sites (TFBS). We propose a novel training strategy that replaces the traditional 1D nucleotide-based DNA sequence representation with a 2D topological matrix of sub-nucleotide chemical functional groups, which substantially define the protein-binding ability of DNA fragments. We train convolutional neural networks on this novel Functional Group DNA Representation (FGDR) to solve a TFBS classification task. We compare the efficiency of our approach with that of previous nucleotide-based training approaches and show that learning from FGDR data has several benefits for TFBS classification. Moreover, we argue that training deep neural networks on the FGDR representation produces competitive results while introducing only a pre-processing conversion step. Finally, we show that an ensemble of models trained on the nucleotide and FGDR representations achieves higher classification performance than any of the single-input approaches. © Published under licence by IOP Publishing Ltd.
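
The abstract mentions two ideas that a short sketch can make concrete: the traditional nucleotide-based input, and combining the outputs of two models in an ensemble. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The one-hot layout, the handling of unknown bases, the function names `one_hot_encode` and `ensemble_average`, and the plain averaging rule are all hypothetical choices. The FGDR matrix itself is not sketched, because the abstract does not specify its layout.

```python
# Illustrative only: the encoding layout and the averaging rule are assumptions,
# not details taken from the paper.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 matrix of 0/1 values.

    Bases outside A, C, G, T (for example N) become all-zero rows.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return rows


def ensemble_average(prob_a, prob_b):
    """Average per-sequence binding probabilities produced by two models."""
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same sequences")
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]


# Hypothetical inputs: a short sequence and made-up scores from two models.
encoded = one_hot_encode("ACGTN")
combined = ensemble_average([0.9, 0.2], [0.7, 0.4])
```

In this sketch, `prob_a` and `prob_b` stand in for the scores that a nucleotide-based model and an FGDR-based model would assign to the same sequences. Averaging is only one of several possible ways to combine them.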

Item Type: Article
Additional Information: Institute of Informatics, University of Szeged, Árpád Square 2, Szeged, H-6720, Hungary; Institute of Biophysics, Biological Research Centre, Temesvári Blvd. 62, Szeged, H-6726, Hungary; Department of Biochemistry and Medical Chemistry, University of Pécs, Pécs, Hungary. Export Date: 28 January 2022
Subjects: Q Science / természettudomány > QA Mathematics / matematika > QA76 Computer software / programozás
Q Science / természettudomány > QH Natural history / természetrajz > QH301 Biology / biológia > QH3011 Biochemistry / biokémia
Q Science / természettudomány > QH Natural history / természetrajz > QH301 Biology / biológia > QH3020 Biophysics / biofizika
SWORD Depositor: MTMT SWORD
Depositing User: MTMT SWORD
Date Deposited: 07 Feb 2022 10:38
Last Modified: 07 Feb 2022 10:38
URI: http://real.mtak.hu/id/eprint/137529