1. Enricherator: A Bayesian Method for Inferring Regularized Genome-wide Enrichments from Sequencing Count Data.
- Author
Schroeder JW and Freddolino PL
- Subjects
- Humans, Genomics methods, Algorithms, Computational Biology methods, Software, Sequence Analysis, DNA methods, Chromatin Immunoprecipitation Sequencing methods, Bayes Theorem, High-Throughput Nucleotide Sequencing methods
- Abstract
A pervasive question in biological research on gene regulation, chromatin structure, or genomics is: where, and to what extent, does a signal of interest arise genome-wide? This question is addressed by a variety of methods whose final output is high-throughput sequencing data, including ChIP-seq for protein-DNA interactions,1 GapR-seq for measuring supercoiling,2 and HBD-seq or DRIP-seq for R-loop positioning.3,4 Current computational methods for calculating genome-wide enrichment of the signal of interest usually do not properly handle the count-based nature of sequencing data, often do not exploit its local correlation structure, and do not regularize enrichment estimates. This can result in unrealistic estimates of the true underlying biological enrichment, unrealistically low confidence in point estimates of enrichment (or no confidence estimates at all), unrealistic fluctuations in enrichment estimates between very closely spaced (<10 bp) genomic loci owing to noise inherent in sequencing data, and a multiple-hypothesis testing problem when interpreting genome-wide enrichment estimates. We developed a tool called Enricherator to infer genome-wide enrichments from sequencing count data. Enricherator uses the variational Bayes algorithm to fit a generalized linear model to sequencing count data and to sample from the approximate posterior distribution of enrichment estimates (https://github.com/jwschroeder3/enricherator). Enrichments inferred by Enricherator more precisely identify known binding sites in cases where low coverage between binding sites leads to false-positive peak calls in these noisy regions of the genome; these benefits extend to published datasets.
- Competing Interests
Declaration of competing interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: P.L. Freddolino is an Editorial Board Member/Editor-in-Chief/Associate Editor/Guest Editor for Scientific Reports and EcoSal Plus; neither organization was involved in the editorial review or the decision to publish this article. P.L. Freddolino is on the Scientific Advisory Board and is a Consultant for CircNova, Inc; CircNova provided no financial support for this work and was not involved in any way in the performance of the research, manuscript preparation, editorial review, or decision to publish this article. J.W. Schroeder declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 Elsevier Ltd. All rights reserved.)
- Published
- 2024
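The abstract contrasts unregularized, per-locus enrichment estimates with a count-based generalized linear model whose enrichment estimates borrow strength from neighbouring loci. As a purely illustrative sketch (not Enricherator's actual model, priors, or code; the authors' implementation is at the GitHub URL in the abstract), the Python below fits a toy Poisson GLM with a hypothetical per-bin baseline `a[i]` and log-enrichment `b[i]`, a Gaussian ridge prior on `b`, and a Gaussian random-walk prior linking adjacent bins. It swaps in a simple MAP (penalized-likelihood) fit by coordinate-wise Newton ascent for the variational Bayes procedure the authors use, so it returns point estimates only, with no posterior samples. All function names, parameter values, and the simulated counts are assumptions made for illustration.

```python
import math


def naive_log_enrichment(chip, inp, pseudocount=1.0):
    """Per-bin log((chip + pc) / (input + pc)): no pooling, no regularization."""
    return [math.log((y + pseudocount) / (x + pseudocount)) for y, x in zip(chip, inp)]


def _clip(step, limit=1.0):
    # Cap Newton steps so the exp() terms cannot blow up after an early overshoot.
    return max(-limit, min(limit, step))


def smoothed_log_enrichment(chip, inp, s_chip=1.0, s_inp=1.0,
                            var_a=100.0, var_b=4.0, var_rw=0.25, n_iter=1000):
    """MAP fit of a toy Poisson GLM (hypothetical; NOT Enricherator's model).

    inp[i]  ~ Poisson(s_inp  * exp(a[i]))          # local coverage/background
    chip[i] ~ Poisson(s_chip * exp(a[i] + b[i]))   # b[i] = log enrichment
    Priors: a[i] ~ N(0, var_a); b[i] ~ N(0, var_b); and a Gaussian random
    walk (b[i] - b[i-1]) ~ N(0, var_rw) that borrows strength from neighbours.
    The log posterior is concave; it is maximized by coordinate-wise Newton steps.
    """
    n = len(chip)
    a = [math.log((x + 0.5) / s_inp) for x in inp]
    b = [0.0] * n
    for _ in range(n_iter):
        # Update each baseline a[i] given b (the a[i] are conditionally independent).
        for i in range(n):
            mu_in = s_inp * math.exp(a[i])
            mu_chip = s_chip * math.exp(a[i] + b[i])
            grad = inp[i] + chip[i] - mu_in - mu_chip - a[i] / var_a
            hess = mu_in + mu_chip + 1.0 / var_a
            a[i] += _clip(grad / hess)
        # Gauss-Seidel sweep over b, using the freshest neighbour values.
        for i in range(n):
            mu_chip = s_chip * math.exp(a[i] + b[i])
            grad = chip[i] - mu_chip - b[i] / var_b
            hess = mu_chip + 1.0 / var_b
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    grad -= (b[i] - b[j]) / var_rw
                    hess += 1.0 / var_rw
            b[i] += _clip(grad / hess)
    return b


# Made-up counts for 20 bins with equal library sizes.
inp = [10] * 20
chip = [10] * 20
inp[3], chip[3] = 0, 2          # low-coverage bin: input dropout, 2 ChIP reads
for i in range(8, 12):
    chip[i] = 40                # a genuine four-bin enrichment

naive = naive_log_enrichment(chip, inp)
smooth = smoothed_log_enrichment(chip, inp)
print(f"bin 3: naive={naive[3]:.2f} smoothed={smooth[3]:.2f}")
print(f"peak:  naive={max(naive[8:12]):.2f} smoothed={max(smooth[8:12]):.2f}")
```

With these made-up counts, the naive per-bin ratio at the low-coverage bin 3 (log 3 ≈ 1.10) is close to that of the real four-bin peak (log(41/11) ≈ 1.32), whereas the random-walk prior pulls bin 3 toward its unenriched neighbours and leaves the peak clearly enriched. This is only a miniature analogue of the low-coverage false-positive problem the abstract describes; in the sketch, `var_b` and `var_rw` control how strongly estimates are shrunk toward zero and smoothed across adjacent bins.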