1. XSI—a genotype compression tool for compressive genomics in large biobanks.
- Author
- Wertenbroek, Rick; Rubinacci, Simone; Xenarios, Ioannis; Thoma, Yann; Delaneau, Olivier
- Subjects
- *GENOMICS, *GENOTYPES, *BIOBANKS, *ALLELES, *NUCLEOTIDE sequencing
- Abstract
Motivation: Generation of genotype data has been growing exponentially over the last decade. The large size of recent datasets brings a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses.
Results: We show that xSqueezeIt (XSI) allows for a file size reduction of 4-20× compared with compressed BCF and demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes with 8× faster loading times, 5× faster runs-of-homozygosity computation, 30× faster dot-product computation and 280× faster allele counting.
Availability and implementation: The XSI file format specifications, API and command line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt.
Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2022