Direct AUC optimization of regulatory motifs
- Source :
- Bioinformatics
- Publication Year :
- 2017
- Publisher :
- Oxford University Press (OUP), 2017.
Abstract
- Motivation: The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the many computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and can fail to fully utilize the information in the input sequences. Results: We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Preliminary results also suggest that CDAUC may be useful for improving the interpretability of convolutional kernels generated by emerging deep learning approaches for predicting TF sequence specificities. Availability and Implementation: CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8. Supplementary information: Supplementary data are available at Bioinformatics online.
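The coordinate-wise idea in the abstract can be illustrated with a minimal sketch. This is not the CDAUC implementation: it assumes a plain linear scorer `w · x` rather than a motif scanned along a sequence, and it searches the breakpoints by brute force instead of using the computational geometry step the authors describe. All names (`auc`, `best_step`, `coordinate_ascent`) are hypothetical. When one weight changes by `t`, each positive/negative pair's score margin is linear in `t`, so the AUC is piece-wise constant in `t` and can be maximized exactly by testing one point per interval.

```python
import numpy as np

def auc_from_margins(m):
    """AUC from pairwise margins (positive score minus negative score); ties count 0.5."""
    return float(np.mean((m > 0) + 0.5 * (m == 0)))

def auc(w, X_pos, X_neg):
    """AUC of the linear scorer w on positive and negative feature matrices."""
    return auc_from_margins((X_pos @ w)[:, None] - (X_neg @ w)[None, :])

def best_step(w, X_pos, X_neg, j):
    """Exactly maximize AUC over the step t added to w[j] (brute force over intervals)."""
    base = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]    # margins at t = 0
    slope = X_pos[:, j][:, None] - X_neg[:, j][None, :]   # change of each margin per unit t
    moving = slope != 0
    breaks = np.unique(-base[moving] / slope[moving])     # t values where a pair flips
    if breaks.size == 0:
        return 0.0, auc_from_margins(base)
    mids = (breaks[:-1] + breaks[1:]) / 2
    # One candidate per constant interval; t = 0 comes first so ties keep w unchanged.
    candidates = np.concatenate(([0.0, breaks[0] - 1.0], mids, [breaks[-1] + 1.0]))
    best_t, best_auc = 0.0, -1.0
    for t in candidates:
        a = auc_from_margins(base + t * slope)
        if a > best_auc:
            best_t, best_auc = float(t), a
    return best_t, best_auc

def coordinate_ascent(w, X_pos, X_neg, sweeps=3):
    """Cycle over coordinates, taking the exact best step for each one in turn."""
    w = np.asarray(w, dtype=float).copy()
    for _ in range(sweeps):
        for j in range(w.size):
            t, _ = best_step(w, X_pos, X_neg, j)
            w[j] += t
    return w
```

Because `t = 0` is always among the candidates, no step lowers the AUC computed from the margins. The brute-force search costs time quadratic in the number of pairs, so it is suitable only for illustration.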
- Subjects :
- 0301 basic medicine
Statistics and Probability
Computer science
0206 medical engineering
Nucleotide Motif
02 engineering and technology
computer.software_genre
Biochemistry
03 medical and health sciences
Discriminative model
Area under curve
Genetic variation
Humans
Nucleotide Motifs
Promoter Regions, Genetic
Molecular Biology
Reggen
Binding Sites
business.industry
Deep learning
Computational Biology
ISMB/ECCB 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
Computer Science Applications
DNA binding site
Computational Mathematics
030104 developmental biology
ROC Curve
Computational Theory and Mathematics
Artificial intelligence
Data mining
K562 Cells
business
computer
Algorithms
020602 bioinformatics
Protein Binding
Transcription Factors
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 33
- Database :
- OpenAIRE
- Journal :
- Bioinformatics
- Accession number :
- edsair.doi.dedup.....9b43d6acf44481a3814f3b2dedf69c70
- Full Text :
- https://doi.org/10.1093/bioinformatics/btx255