MIPHENO: data normalization for high throughput metabolite analysis
- Source :
- BMC Bioinformatics, Vol 13, Iss 1, p 10 (2012)
- Publication Year :
- 2012
- Publisher :
- Springer Science and Business Media LLC, 2012.
Abstract
- Background: High throughput methodologies such as microarrays, mass spectrometry and plate-based small molecule screens are increasingly used to facilitate discoveries from gene function to drug candidate identification. These large-scale experiments are typically carried out over the course of months and years, often without the controls needed to compare directly across the dataset. Few methods are available to facilitate comparisons of high throughput metabolic data generated in batches where explicit in-group controls for normalization are lacking.
Results: Here we describe MIPHENO (Mutant Identification by Probabilistic High throughput-Enabled Normalization), an approach for post-hoc normalization of quantitative first-pass screening data in the absence of explicit in-group controls. This approach includes a quality control step and facilitates cross-experiment comparisons that decrease the false non-discovery rates, while maintaining the high accuracy needed to limit false positives in first-pass screening. Results from simulation show an improvement in both accuracy and false non-discovery rate over a range of population parameters (p < 2.2 × 10^-16) and a modest but significant (p < 2.2 × 10^-16) improvement in area under the receiver operator characteristic curve of 0.955 for MIPHENO vs 0.923 for a group-based statistic (z-score). Analysis of the high throughput phenotypic data from the Arabidopsis Chloroplast 2010 Project (http://www.plastid.msu.edu/) showed ~4-fold increase in the ability to detect previously described or expected phenotypes over the group-based statistic.
Conclusions: Results demonstrate MIPHENO offers substantial benefit in improving the ability to detect putative mutant phenotypes from post-hoc analysis of large data sets. Additionally, it facilitates data interpretation and permits cross-dataset comparison where group-based controls are missing. MIPHENO is applicable to a wide range of high throughput screenings and the code is freely available as Additional file 1 as well as through an R package in CRAN.
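The abstract contrasts a group-based statistic (z-score) with post-hoc normalization across batches that lack in-group controls. The Python sketch below illustrates that general contrast only; it is not the MIPHENO implementation, which is distributed as an R package. The function names, the sample plate data, and the choice of rescaling each batch by its median are all illustrative assumptions, with the median-based step resting on the further assumption that most lines in a first-pass screen behave like wild type.

```python
from statistics import mean, median, stdev


def group_z_scores(values):
    """Group-based statistic: z-score of each value within its own batch."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation of the batch
    return [(v - mu) / sd for v in values]


def median_scale_batches(batches, target=1.0):
    """Hypothetical post-hoc normalization: rescale each batch so that its
    median equals `target`, making values comparable across batches.

    Assumption (not from the abstract): the batch median approximates the
    unaffected (wild-type) response.
    """
    return {
        name: [v * target / median(vals) for v in vals]
        for name, vals in batches.items()
    }


# Illustrative data only: two plates measured on different scales.
plates = {
    "plate_A": [10.0, 11.0, 9.0, 10.0, 20.0],
    "plate_B": [20.0, 22.0, 18.0, 20.0, 21.0],
}

scaled = median_scale_batches(plates)
pooled = [v for vals in scaled.values() for v in vals]
```

After rescaling, values from both plates share a common reference point and can be pooled, whereas `group_z_scores` only ranks each value against the other members of its own batch.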
- Subjects :
- Quality Control
0106 biological sciences
Normalization (statistics)
Chloroplasts
Population
Arabidopsis
Computational biology
Biology
lcsh:Computer applications to medicine. Medical informatics
Bioinformatics
01 natural sciences
Biochemistry
Database normalization
03 medical and health sciences
Structural Biology
False positive paradox
education
lcsh:QH301-705.5
Molecular Biology
Chromatography, High Pressure Liquid
Statistic
Plant Proteins
030304 developmental biology
0303 health sciences
education.field_of_study
Receiver operating characteristic
Methodology Article
Applied Mathematics
Probabilistic logic
Microarray Analysis
Computer Science Applications
Phenotype
lcsh:Biology (General)
Area Under Curve
Mutation
Metabolome
lcsh:R858-859.7
DNA microarray
010606 plant biology & botany
Details
- ISSN :
- 14712105
- Volume :
- 13
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....336a530cc8a438ef01ece3fd6a8a2f5e
- Full Text :
- https://doi.org/10.1186/1471-2105-13-10