CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints
- Source :
- BMC Bioinformatics
- Publication Year :
- 2016
- Publisher :
- Springer Science and Business Media LLC, 2016.
Abstract
- Background: As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc.) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes.
- Methods: We introduce CC-PROMISE as an integrated data analysis method that combines components of canonical correlation (CC) and projection onto the most interesting evidence (PROMISE). For each gene, CC-PROMISE first uses CC to compute scores that represent the association of two forms of molecular data with each other. Next, these scores are substituted into PROMISE to evaluate the statistical evidence that the molecular data show a biologically meaningful relationship with the endpoints.
- Results: CC-PROMISE shows outstanding performance in simulation studies and an example application involving pediatric leukemia. In simulation studies, CC-PROMISE controls the type I error (misleading significance) rate very near the nominal level across 100 distinct null settings in which no molecular-endpoint association exists. Also, CC-PROMISE has better statistical power than three other methods that control type I error in 396 of 400 (99%) alternative settings for which a molecular-endpoint association is present; the power advantage of CC-PROMISE exceeds 30% in 127 of the 400 (32%) alternative settings. These advantages of CC-PROMISE are also observed in an example application.
- Conclusion: CC-PROMISE very effectively identifies genes for which some form of molecular data shows a biologically meaningful association with multiple related endpoints.
- Availability: The R package CCPROMISE is currently available from www.stjuderesearch.org/site/depts/biostats/software.
- Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1217-0) contains supplementary material, which is available to authorized users.
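The two-step procedure described in the Methods part of the abstract (canonical correlation scores for one gene, then a projection-style test against several endpoints) can be illustrated with a minimal Python sketch. This is not the CCPROMISE R package and does not reproduce the authors' statistic. The function names, the choice of summing the first canonical pair into one score, the use of Pearson correlation, the sign vector for the expected direction of each endpoint, and the two-sided permutation test are all assumptions made for illustration.

```python
import numpy as np


def first_canonical_scores(X, Y):
    """Per-subject score from the first canonical pair of two data blocks.

    X (n x p) and Y (n x q) hold two forms of molecular data for one gene.
    Assumes n > p, n > q and full column rank (illustrative simplification).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)            # orthonormal basis for the X block
    Qy, _ = np.linalg.qr(Yc)            # orthonormal basis for the Y block
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    u = Qx @ U[:, 0]                    # first canonical variate of X
    v = Qy @ Vt[0, :]                   # first canonical variate of Y
    # Assumption: combine the pair into a single score by summing.
    return u + v, float(s[0])           # s[0] is the first canonical correlation


def projected_statistic(score, endpoints, signs):
    """Mean of sign-weighted Pearson correlations between score and endpoints."""
    zs = (score - score.mean()) / score.std()
    ze = (endpoints - endpoints.mean(axis=0)) / endpoints.std(axis=0)
    r = zs @ ze / len(score)            # one correlation per endpoint
    return float(np.mean(signs * r))


def permutation_pvalue(score, endpoints, signs, n_perm=1000, seed=0):
    """Two-sided permutation p-value; endpoint rows are permuted jointly.

    The absolute value makes the test insensitive to the arbitrary sign
    of the canonical variates.
    """
    rng = np.random.default_rng(seed)
    observed = abs(projected_statistic(score, endpoints, signs))
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(score))
        if abs(projected_statistic(score, endpoints[perm], signs)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated data only: a latent signal z drives both molecular blocks
    # and two endpoints with opposite expected directions.
    rng = np.random.default_rng(1)
    n = 60
    z = rng.normal(size=n)
    X = z[:, None] + 0.5 * rng.normal(size=(n, 3))   # e.g. methylation probes
    Y = z[:, None] + 0.5 * rng.normal(size=(n, 2))   # e.g. expression probes
    E = np.column_stack([z + 0.5 * rng.normal(size=n),
                         -z + 0.5 * rng.normal(size=n)])
    score, cc = first_canonical_scores(X, Y)
    stat, p = permutation_pvalue(score, E, np.array([1.0, -1.0]))
    print(f"canonical correlation={cc:.3f}, |statistic|={stat:.3f}, p={p:.4f}")
```

Permuting whole endpoint rows keeps the correlation among endpoints intact under the null, which is why the canonical scores only need to be computed once per gene in this sketch.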
- Subjects :
- 0301 basic medicine
Association (object-oriented programming)
Genomics
Microarray
Biology
01 natural sciences
Biochemistry
Statistical power
010104 statistics & probability
03 medical and health sciences
Canonical correlation
Structural Biology
Humans
Sequencing
0101 mathematics
Molecular Biology
Oligonucleotide Array Sequence Analysis
Leukemia
Research
Applied Mathematics
Sequence Analysis, DNA
DNA Methylation
Computer Science Applications
Nominal level
Integrated data analysis
Projection (relational algebra)
030104 developmental biology
Null (SQL)
Data mining
Transcriptome
Projection onto the most interesting statistical evidence
Software
Type I and type II errors
Details
- ISSN :
- 1471-2105
- Volume :
- 17
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....eb529194edb2e19dafd70ab0bd224ce1