1. Procrustes is a machine-learning approach that removes cross-platform batch effects from clinical RNA sequencing data.
- Author
- Kotlov N, Shaposhnikov K, Tazearslan C, Chasse M, Baisangurov A, Podsvirova S, Fernandez D, Abdou M, Kaneunyenye L, Morgan K, Cheremushkin I, Zemskiy P, Chelushkin M, Sorokina M, Belova E, Khorkova S, Lozinsky Y, Nuzhdina K, Vasileva E, Kravchenko D, Suryamohan K, Nomie K, Curran J, Fowler N, and Bagaev A
- Subjects
- Humans; Tissue Fixation/methods; Sequence Analysis, RNA/methods; Machine Learning; Gene Expression Profiling/methods; RNA/genetics
- Abstract
With the increased use of gene expression profiling for personalized oncology, optimized RNA sequencing (RNA-seq) protocols and algorithms are necessary to provide comparable expression measurements between exome capture (EC)-based and poly-A RNA-seq. Here, we developed and optimized an EC-based protocol for processing formalin-fixed, paraffin-embedded samples and a machine-learning algorithm, Procrustes, to overcome batch effects across RNA-seq data obtained using different sample preparation protocols like EC-based or poly-A RNA-seq protocols. Applying Procrustes to samples processed using EC and poly-A RNA-seq protocols showed the expression of 61% of genes (N = 20,062) to correlate across both protocols (concordance correlation coefficient > 0.8, versus 26% before transformation by Procrustes), including 84% of cancer-specific and cancer microenvironment-related genes (versus 36% before applying Procrustes; N = 1,438). Benchmarking analyses also showed Procrustes to outperform other batch correction methods. Finally, we showed that Procrustes can project RNA-seq data for a single sample to a larger cohort of RNA-seq data. Future application of Procrustes will enable direct gene expression analysis for single tumor samples to support gene expression-based treatment decisions. (© 2024. The Author(s).)
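The agreement criterion quoted in the abstract (concordance correlation coefficient > 0.8) can be illustrated with a short sketch. It assumes Lin's standard definition of the coefficient with population (1/n) moments; the abstract does not say which estimator the authors used. The function names, gene labels, and expression values below are hypothetical, and the sketch does not implement Procrustes itself, only the per-gene agreement check.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x, mean_y = fmean(x), fmean(y)
    # Population (1/n) moments, following Lin's definition (an assumption here).
    var_x = fmean((a - mean_x) ** 2 for a in x)
    var_y = fmean((b - mean_y) ** 2 for b in y)
    cov_xy = fmean((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant samples")
    return 2 * cov_xy / denom


def fraction_concordant(expr_a, expr_b, threshold=0.8):
    """Share of shared genes whose cross-protocol CCC exceeds the threshold."""
    genes = expr_a.keys() & expr_b.keys()
    passing = sum(
        concordance_correlation(expr_a[g], expr_b[g]) > threshold for g in genes
    )
    return passing / len(genes)


# Hypothetical toy data: one expression value per paired sample, per gene.
ec = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
polya = {"GENE_A": [1.1, 2.0, 2.9, 4.0], "GENE_B": [3.0, 4.0, 5.0, 6.0]}
```

In this toy data, GENE_B is perfectly linearly correlated between the two protocols but shifted by a constant offset, so its concordance coefficient falls well below the threshold. That is the kind of systematic cross-protocol difference a batch correction method is meant to remove, and it is why concordance rather than plain correlation is a natural criterion here.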
- Published
- 2024