Greene, Daniel, Genomics England Research Consortium, Pirri, Daniela, Frudd, Karen, Sackey, Ege, Al-Owain, Mohammed, Giese, Arnaud PJ [0000-0001-7228-9542], Ramzan, Khushnooda, Riaz, Sehar, Yamanaka, Itaru [0000-0003-0293-8070], Boeckx, Nele, Thys, Chantal, Gelb, Bruce D [0000-0001-8527-5027], Brennan, Paul, Hartill, Verity, Harvengt, Julie, Kosho, Tomoki [0000-0002-8344-7507], Mansour, Sahar, Masuno, Mitsuo, Ohata, Takako, Stewart, Helen, Taibah, Khalid, Turner, Claire LS, Imtiaz, Faiqa, Riazuddin, Saima [0000-0002-8645-4761], Morisaki, Takayuki, Ostergaard, Pia [0000-0002-2190-1356], Loeys, Bart L [0000-0003-3703-9518], Morisaki, Hiroko, Ahmed, Zubair M [0000-0003-2914-4502], Birdsey, Graeme M [0000-0002-0981-8672], Freson, Kathleen [0000-0002-4381-2442], Mumford, Andrew, Turro, Ernest [0000-0002-1820-6563], and Apollo - University of Cambridge Repository
Acknowledgements: This research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and National Health Service (NHS) England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. GS was performed by Illumina at Illumina Laboratory Services and was overseen by Genomics England. We thank all NHS clinicians who have contributed clinical phenotype data to the 100,000 Genomes Rare Diseases Programme and all staff at Genomics England who have contributed to the sequencing, maintenance of the research environment and assembly of the standard bioinformatic files that were required for our analyses. We thank the participants of the rare diseases program who made this research possible. We are grateful to V. Keeley for providing access to paternal DNA (ERG), F. Elmslie for inviting a patient to the clinic (ERG) and T. Jaworek for technical assistance (GPR156). D.G. was supported by the Cambridge British Heart Foundation (BHF) Centre of Research Excellence (RE/18/1/34212) and the Wellcome Collaborative (219506/Z/19/Z). V.H. was supported by a Medical Research Council (MRC)/National Institute for Health and Care Research Clinical Academic Research Partnership (MR/V037617/1). G.M.B. and K. Frudd were funded by BHF (PG/17/33/32990). G.M.B. and D.P. were funded by BHF (PG/20/16/35047). E.S. was supported by the Swiss Federal National Fund for Scientific Research (CRSII5_177191/1). S.M. and P.O. were supported by the MRC (MR/P011543/1) and BHF (RG/17/7/33217). K. Freson was supported by Katholieke Universiteit (KU) Leuven Special Research Fund (BOF) (C14/19/096) and Research Foundation – Flanders (G072921N). Work at the University of Maryland, Baltimore was supported by the National Institute on Deafness and Other Communication Disorders/National Institutes of Health (R01DC016295 to Z.M.A.). M.A.-O., F.I. and K.R. were supported by the King Salman Center for Disability Research (85722). E.T. was supported by the Mindich Child Health and Development Institute, the Charles Bronfman Institute for Personalized Medicine and the Lowy Foundation USA.
Funder: Cambridge BHF Centre of Research Excellence [RE/18/1/34212] and Wellcome Collaborative Award 219506/Z/19/Z
Funder: BHF Project grant PG/17/33/32990
Funder: Swiss National Science Foundation grant CRSII5_177191
Funder: King Salman Center for Disability Research # 85722
Funder: MRC/NIHR Clinical Academic Research Partnership MR/V037617/1
Funder: Medical Research Council grant MR/P011543/1 and British Heart Foundation grant RG/17/7/33217
Funder: NIDCD/NIH R01DC016295
Funder: BHF Project grants PG/20/16/35047 & PG/17/33/32990
Funder: KULeuven BOF grant C14/19/096, FWO grant G072921N
The genetic etiologies of more than half of rare diseases remain unknown. Standardized genome sequencing and phenotyping of large patient cohorts provide an opportunity for discovering the unknown etiologies, but this depends on efficient and powerful analytical methods. We built a compact database, the ‘Rareservoir’, containing the rare variant genotypes and phenotypes of 77,539 participants sequenced by the 100,000 Genomes Project. We then used the Bayesian genetic association method BeviMed to infer associations between genes and each of 269 rare disease classes assigned by clinicians to the participants. We identified 241 known and 19 previously unidentified associations.
We validated associations with ERG, PMEPA1 and GPR156 by searching for pedigrees in other cohorts and using bioinformatic and experimental approaches. We provide evidence that (1) loss-of-function variants in the Erythroblast Transformation Specific (ETS)-family transcription factor encoding gene ERG lead to primary lymphoedema, (2) truncating variants in the last exon of transforming growth factor-β regulator PMEPA1 result in Loeys–Dietz syndrome and (3) loss-of-function variants in GPR156 give rise to recessive congenital hearing impairment. The Rareservoir provides a lightweight, flexible and portable system for synthesizing the genetic and phenotypic data required to study rare disease cohorts with tens of thousands of participants.