Direct AUC optimization of regulatory motifs

Authors :
Hongbo Zhang
Lin Zhu
De-Shuang Huang
Source :
Bioinformatics
Publication Year :
2017
Publisher :
Oxford University Press (OUP), 2017.

Abstract

Motivation: The discovery of transcription factor binding site (TFBS) motifs is essential for untangling the complex mechanism of genetic variation under different developmental and environmental conditions. Among the large number of computational approaches for de novo identification of TFBS motifs, discriminative motif learning (DML) methods have proven promising for harnessing the discovery power of the large amount of accumulated high-throughput binding data. However, they have to sacrifice accuracy for speed and can fail to fully utilize the information in the input sequences.

Results: We propose a novel algorithm called CDAUC for optimizing DML-learned motifs based on the area under the receiver-operating characteristic curve (AUC) criterion, which has been widely used in the literature to evaluate the significance of extracted motifs. We show that when the considered AUC loss function is optimized in a coordinate-wise manner, the cost function of each resultant sub-problem is a piece-wise constant function, whose optimal value can be found exactly and efficiently. Further, a key step of each iteration of CDAUC can be efficiently solved as a computational geometry problem. Experimental results on real-world high-throughput datasets illustrate that CDAUC outperforms competing methods for refining DML motifs, while being one order of magnitude faster. Preliminary results also suggest that CDAUC may be useful for improving the interpretability of convolutional kernels generated by emerging deep learning approaches for predicting TF sequence specificities.

Availability and Implementation: CDAUC is available at: https://drive.google.com/drive/folders/0BxOW5MtIZbJjNFpCeHlBVWJHeW8.

Supplementary information: Supplementary data are available at Bioinformatics online.
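
The abstract's observation that a coordinate-wise AUC sub-problem is piece-wise constant can be illustrated with a small sketch. This is not the CDAUC implementation: it assumes a plain linear scorer `w · x` over generic feature vectors rather than a motif model, and the names `auc` and `best_coordinate_step` are hypothetical. With every coordinate but one held fixed, each positive/negative pair changes its ordering at a single breakpoint, so the AUC can only change at those breakpoints and testing one point per interval finds the exact optimum for that coordinate.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    diff = pos_scores[:, None] - neg_scores[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def best_coordinate_step(w, X_pos, X_neg, j):
    """Exactly maximize AUC over a shift t added to w[j] (linear scorer assumed)."""
    # Pairwise differences of the current scores and of feature j.
    base = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]
    slope = X_pos[:, j][:, None] - X_neg[:, j][None, :]
    # A pair changes its ordering only at t = -base / slope (when slope != 0).
    mask = slope != 0
    breaks = np.unique(-base[mask] / slope[mask])
    if breaks.size == 0:
        return 0.0, auc(X_pos @ w, X_neg @ w)
    # AUC is constant between breakpoints, so one test point per interval suffices.
    # t = 0 comes first so that "no change" wins when nothing improves.
    candidates = np.concatenate((
        [0.0],
        [breaks[0] - 1.0],
        (breaks[:-1] + breaks[1:]) / 2.0,
        [breaks[-1] + 1.0],
    ))
    best_t, best_auc = 0.0, -1.0
    for t in candidates:
        w_try = w.copy()
        w_try[j] += t
        a = auc(X_pos @ w_try, X_neg @ w_try)
        if a > best_auc:
            best_t, best_auc = float(t), a
    return best_t, best_auc

# Toy data (made up for illustration): two positives, two negatives, two features.
X_pos = np.array([[1.0, 0.0], [2.0, 1.0]])
X_neg = np.array([[0.0, 1.0], [1.0, 3.0]])
w = np.array([0.0, 1.0])

start_auc = auc(X_pos @ w, X_neg @ w)                 # 0.125
step, new_auc = best_coordinate_step(w, X_pos, X_neg, j=1)
print(start_auc, step, new_auc)                       # 0.125 -2.0 0.875
```

The sketch re-evaluates the AUC from scratch for every candidate, which is quadratic in the number of pairs per candidate; the abstract indicates that CDAUC solves the corresponding step far more efficiently, as a computational geometry problem.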

Details

ISSN :
1367-4811 and 1367-4803
Volume :
33
Database :
OpenAIRE
Journal :
Bioinformatics
Accession number :
edsair.doi.dedup.....9b43d6acf44481a3814f3b2dedf69c70
Full Text :
https://doi.org/10.1093/bioinformatics/btx255