57 results on '"Narzisi, G."'
Search Results
2. A strategy for building and using a human reference pangenome [version 2; peer review: 2 approved]
- Author
-
Llamas, B., Narzisi, G., Schneider, V., Audano, P., Biederstedt, E., Blauvelt, L., Bradbury, P., Chang, X., Chin, C., Fungtammasan, A., Clarke, W., Cleary, A., Ebler, J., Eizenga, J., Sibbesen, J., Markello, C., Garrison, E., Garg, S., Hickey, G., Lazo, G., Lin, M., Mahmoud, M., Marschall, T., Minkin, I., Monlong, J., Musunuri, R., Sagayaradj, S., Novak, A., Rautiainen, M., Regier, A., Sedlazeck, F., Siren, J., Souilmi, Y., Wagner, J., Wrightsman, T., Yokoyama, T., Zeng, Q., Zook, J., Paten, B., and Busby, B.
- Subjects
Science, Medicine
- Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.
- Published
- 2021
3. Determination of protein structure and dynamics combining immune algorithms and pattern search methods
- Author
-
Anile, A. M., Cutello, V., Narzisi, G., Nicosia, G., and Spinella, S.
- Published
- 2007
4. ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data
- Author
-
Dolzhenko, E, Bennett, MF, Richmond, PA, Trost, B, Chen, S, van Vugt, JJFA, Nguyen, C, Narzisi, G, Gainullin, VG, Gross, AM, Lajoie, BR, Taft, RJ, Wasserman, WW, Scherer, SW, Veldink, JH, Bentley, DR, Yuen, RKC, Bahlo, M, and Eberle, MA
- Abstract
Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.
- Published
- 2020
5. Adherence issues related to sublingual immunotherapy as perceived by allergists
- Author
-
Scurati, S., Frati, F., Passalacqua, G., Puccinelli, P., Hilaire, C., Incorvaia, C., D Avino, G., Comi, R., Lo Schiavo, M., Pezzuto, F., Montera, C., Pio, A., Teresa Ielpo, M., Cellini, F., Vicentini, L., Pecorari, R., Aresu, T., Capra, L., Benedictis, E., Bombi, C., Zauli, D., Vanzi, A., Alberto Paltrinieri, C., Bondioli, A., Paletta, I., Ventura, D., Mei, F., Paolini, F., Colangelo, C., Cavallucci, E., Cucinelli, F., Tinari, R., Ermini, G., Beltrami, V., Novembre, E., Begliomini, C., Marchese, E., Solito, E., Ammannati, V., Molino, G., Galli, E., Baldassini, M., Di Michele, L., Calvani, M., Gidaro, M., Venuti, A., Li Bianchi, E., Benassi, F., Pocobelli, D., Zangari, P., Rocco, M. G., Lo Vecchio, A., Pingitore, G., Grimaldi, O., Schiavino, D., Perrone, N., Antonietta Frieri, M., Di Rienzo, V., Tripodi, S., Scarpa, A., Tomsic, M., Bonaguro, R., Enrico Senna, G., Sirena, A., Turatello, F., Crescioli, S., Favero, E., Billeri, L., Chieco Bianchi, F., Gemignani, C., Zanforlin, M., Angiola Crivellaro, M., Hendrick, B., Maltauro, A., Masieri, S., Elisabetta Conte, M., Fama, M., Pozzan, M., Bonadonna, P., Casanova, S., Vallerani, E., Schiappoli, M., Borghesan, F., Giro, G., Casotto, S., Berardino, L., Zanoni, G., Ariano, R., Aquilina, R., Pellegrino, R., Marsico, P., Del Giudice, A., Narzisi, G., Tomaselli, V., Fornaca, G., Favro, M., Loperfido, B., Gallo, C., Buffoni, S., Gani, F., Raviolo, P., Faggionato, S., Truffelli, T., Vivalda, L., Albano, M., Enzo Rossi, R., Lattuada, G., Bona, F., Quaglio, L., Chiesa, A., Trapani, M., Seminara, R., Cucchi, B., Oderda, S., Borio, G., Galeasso, G., Garbaccio, P., Marco, A., Marengo, F., Cadario, G., Manzoni, S., Vinay, C., Curcio, A., Silvestri, A., Peduto, A., Riario-Sforza, G. 
G., Maria Forgnone, A., Barocelli, P., Tartaglia, N., Feyles, G., Giacone, A., Ricca, V., Guida, G., Nebiolo, F., Bommarito, L., Heffler, E., Vietti, F., Galimberti, M., Savi, E., Pappacoda, A., Bottero, P., Porcu, S., Felice, G., Berra, D., Francesca Spina, M., Pravettoni, V., Calamari, A. M., Varin, E., Iemoli, E., Lietti, D., Ghiglioni, D., Alessandro Fiocchi, Tosi, A., Poppa, M., Caviglia, A., Restuccia, M., Russello, M., Alciato, P., Manzotti, G., Ranghino, E., Luraschi, G., Rapetti, A., Rivolta, F., Allegri, F., Terracciano, L., Agostinis, F., Paolo Piras, P., Ronchi, G., Gaspardini, G., Caria, V., Tolu, F., Fantasia, D., Carta, P., Moraschini, A., Quilleri, R., Santelli, A., Prandini, P., Del Giudice, G., Apollonio, A., Bonazza, L., Teresa Franzini, M., Branchi, S., Zanca, M., Rinaldi, S., Catelli, L., Zanoletti, T., Cosentino, C., Della Torre, F., Cremonte, L., Musazzi, D., Suli, C., Rivolta, L., Ottolenghi, A., Marino, G., Sterza, G., Sambugaro, R., Orlandini, A., Minale, P., Voltolini, S., Bignardi, D., Omodeo, P., Tiri, A., Milani, S., Ronchi, B., Licardi, G., Bruni, P., Scibilia, J., Schroeder, J., Crosti, F., Maltagliati, A., Alesina, M. R., Mosca, M., Leone, G., Napolitano, G., Di Gruttola, G., Scala, G., Mascio, S., Valente, A., Marchetiello, I., Catello, R., Gazulli, A., Del Prete, A., Varricchio, A. 
M., Carbone, A., Forestieri, A., Stillitano, M., Leonetti, L., Tirroni, E., Castellano, F., Abbagnara, F., Romano, F., Levanti, C., Cilia, M., Longo, R., Ferrari, A., Merenda, R., Di Ponti, A., Guercio, E., Surace, L., Ammendola, G., Tansella, F., Peccarisi, L., Stragapede, L., Minenna, M., Granato, M., Fuiano, N., Pannofino, A., Ciuffreda, S., Giannotta, A., Morero, G., D'Oronzio, L., Taddeo, G., Nettis, E., Cinquepalmi, G., Lamanna, C., Mastrandrea, F., Minelli, M., Salamino, F., Muratore, L., Latorre, F., Quarta, C., Ventura, M., D'Ippolito, G., Giannoccaro, F., Dambra, P., Pinto, L., Triggiani, M., Munno, G., Manfredi, G., Lonero, G., Damiano, V., Errico, G., Di Leo, E., Manzari, F., Spagna, V., Arsieni, A., Matarrese, A., Mazzarella, G., Scarcia, G., Scarano, R., Ferrannini, A., Pastore, A., Maionchi, P., Filannino, L., Tria, M., Giuliano, G., Damiani, E., Scichilone, N., Marchese, M., Lucania, A., Marino, M., Strazzeri, L., Tumminello, S., Vitale, G. I., Gulotta, S., Gragotto, G., Zambito, M., Greco, D., Valenti, G., Licitra, G., Cannata, E., Filpi, R., Contraffatto, M., Sichili, S., Randazzo, S., Scarantino, G., Lo Porto, B., Pavone, F., Di Bartolo, C., Paternò, A., Rapisarda, F., Laudani, E., Leonardi, S., Padua, V., Cabibbo, G., Marino Guzzardi, G., Deluca, F., Agozzino, C., Pettinato, R., and Ghini, M.
- Subjects
medicine.medical_specialty, Pathology, genetic structures, efficacy, Alternative medicine, Medicine (miscellaneous), Adherence, Cost, Efficacy, Side effects, Sublingual immunotherapy, Settore MED/10 - Malattie Dell'Apparato Respiratorio, sublingual immunotherapy, ALLERGEN, cost, medicine, Subcutaneous immunotherapy, Sublingual immunotherapy, adherence, Clinical efficacy, Intensive care medicine, Pharmacology, Toxicology and Pharmaceutics (miscellaneous), sublingual immunoterapy, Original Research, Asthma, AEROALLERGENS, side effects, business.industry, Health Policy, medicine.disease, Slit, eye diseases, Clinical trial, Patient Preference and Adherence, immunotherapy, sense organs, Allergists, ADHERENCE TO TREATMENT, business, Social Sciences (miscellaneous)
- Abstract
Silvia Scurati (1), Franco Frati (1), Gianni Passalacqua (2), Paola Puccinelli (1), Cecile Hilaire (1), Cristoforo Incorvaia (3), Italian Study Group on SLIT Compliance. (1) Scientific and Medical Department, Stallergenes, Milan, Italy; (2) Allergy and Respiratory Diseases, Department of Internal Medicine, Genoa; (3) Allergy/Pulmonary Rehabilitation, ICP Hospital, Milan, Italy. Objectives: Sublingual immunotherapy (SLIT) is a viable alternative to subcutaneous immunotherapy to treat allergic rhinitis and asthma, and is widely used in clinical practice in many European countries. The clinical efficacy of SLIT has been established in a number of clinical trials and meta-analyses. However, because SLIT is self-administered by patients without medical supervision, the degree of patient adherence with treatment is still a concern. The objective of this study was to evaluate the perception by allergists of issues related to SLIT adherence. Methods: We performed a questionnaire-based survey of 296 Italian allergists, based on the adherence issues known from previous studies. The perception of importance of each item was assessed by a VAS scale ranging from 0 to 10. Results: Patient perception of clinical efficacy was considered the most important factor (ranked 1 by 54% of allergists), followed by the possibility of reimbursement (ranked 1 by 34%), and by the absence of side effects (ranked 1 by 21%). Patient education, regular follow-up, and ease of use of SLIT were ranked first by less than 20% of allergists. Conclusion: These findings indicate that clinical efficacy, cost, and side effects are perceived as the major issues influencing patient adherence to SLIT, and that further improvement of adherence is likely to be achieved by improving the patient information provided by prescribers. Keywords: adherence, sublingual immunotherapy, efficacy, cost, side effects
- Published
- 2010
6. How to escape traps using clonal selection algorithms
- Author
-
Cutello, Vincenzo, Narzisi, G, Nicosia, Giuseppe, Pavone, Mario Francesco, and Sorace, Giuseppe
- Subjects
Evolutionary Algorithms, Immune Algorithms, Clonal Selection Algorithms
- Published
- 2004
7. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.)
- Author
-
Ming, R, VanBuren, R, Liu, Y, Yang, M, Han, Y, Li, LT, Zhang, Q, Kim, MJ, Schatz, MC, Campbell, M, Li, J, Bowers, JE, Tang, H, Lyons, E, Ferguson, AA, Narzisi, G, Nelson, DR, Blaby-Haas, CE, Gschwend, AR, Jiao, Y, Der, JP, Zeng, F, Han, J, Min, XJ, Hudson, KA, Singh, R, Grennan, AK, Karpowicz, SJ, Watling, JR, Ito, K, Robinson, SA, Hudson, ME, Yu, Q, Mockler, TC, Carroll, A, Zheng, Y, Sunkar, R, Jia, R, Chen, N, Arro, J, Wai, CM, Wafula, E, Spence, A, Xu, L, Zhang, J, Peery, R, Haus, MJ, Xiong, W, Walsh, JA, Wu, J, Wang, ML, Zhu, YJ, Paull, RE, Britt, AB, Du, C, Downie, SR, Schuler, MA, Michael, TP, Long, SP, Ort, DR, William Schopf, J, Gang, DR, Jiang, N, Yandell, M, dePamphilis, CW, Merchant, SS, Paterson, AH, Buchanan, BB, Li, S, and Shen-Miller, J
- Abstract
© 2013 Ming et al. Background: Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan. Results: The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment. Conclusions: The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.
- Published
- 2013
8. Reevaluating Assembly Evaluations with Feature Response Curves : GAGE and Assemblathons
- Author
-
Vezzi, Francesco, Narzisi, G., and Mishra, B.
- Abstract
In just the last decade, a multitude of bio-technologies and software pipelines have emerged to revolutionize genomics. To further their central goal, they aim to accelerate and improve the quality of de novo whole-genome assembly starting from short DNA sequences/reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, in the absence of any metric that captures the most fundamental "features" of a high-quality assembly, there is no obvious recipe for users to select the most desirable assembler/assembly. This situation has prompted the scientific community to rely on crowd-sourcing through international competitions, such as Assemblathons or GAGE, with the intention of identifying the best assembler(s) and their features. Somewhat circuitously, the only available approach to gauge de novo assemblies and assemblers relies solely on the availability of a high-quality fully assembled reference genome sequence. Still worse, reference-guided evaluations are often both difficult to analyze, leading to conclusions that are difficult to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed FRCbam, which is capable of evaluating de novo assemblies from the read-layouts even when no reference exists. We extend the FRCurve approach to cases where lay-out information may have been obscured, as is true in many deBruijn-graph-based algorithms. As a by-product, FRCurve now expands its applicability to a much wider class of assemblers - thus, identifying higher-quality members of this group, their inter-relations as well as sensitivity to carefully selected features, with or without the support of a reference sequence or layout for the reads. 
The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted.
- Published
- 2012
9. Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies
- Author
-
Schatz, M. C., Phillippy, A. M., Sommer, D. D., Delcher, A. L., Puiu, D., Narzisi, G., Salzberg, S. L., and Pop, M.
- Published
- 2011
10. Determination of protein structure and dynamics combining immune algorithms and pattern search methods
- Author
-
Anile, A. M., Cutello, V., Narzisi, G., Nicosia, G., and Spinella, S.
- Published
- 2006
11. Robust Bio-active Peptide Prediction Using Multi-objective Optimization.
- Author
-
Narzisi, G., Nicosia, G., and Stracquadanio, G.
- Published
- 2010
12. Modeling and simulation of e-mail social networks: A new stochastic agent-based approach.
- Author
-
Menges, F., Mishra, B., and Narzisi, G.
- Published
- 2008
13. Lipschitzian Pattern Search and Immunological Algorithm with Quasi-Newton Method for the Protein Folding Problem: An Innovative Multistage Approach.
- Author
-
Apolloni, Bruno, Marinaro, Maria, Nicosia, Giuseppe, Tagliaferri, Roberto, Anile, A.M., Cutello, V., Narzisi, G., Nicosia, G., and Spinella, S.
- Abstract
In this work we show an innovative approach to the protein folding problem based on a hybrid Immune Algorithm (IA) and a quasi-Newton method starting from a population of promising protein conformations created by the global optimizer DIRECT. The new method has been tested on the Met-enkephalin peptide, which is a paradigmatic example of a multiple-minima problem, 1POLY, 1ROP and the three-helix protein 1BDC. The experimental results show that the multistage approach is a competitive and effective search method in the conformational search space of real proteins, in terms of solution quality and computational cost, compared with the results of current state-of-the-art algorithms. Keywords: Deterministic Search, DIRECT, Immune Algorithms, Clonal Selection Algorithms, Quasi-Newton method, Hybrid methods, Protein Folding, Protein Structure Prediction, Bioinformatics.
- Published
- 2006
14. A novel approach to multihazard modeling and simulation.
- Author
-
Smith SW, Portelli I, Narzisi G, Nelson LS, Menges F, Rekow ED, Mincer JS, Mishra B, and Goldfrank LR
- Published
- 2009
15. Real coded clonal selection algorithm for unconstrained global optimization using a hybrid inversely proportional hypermutation operator
- Author
-
Cutello, V., Nicosia, G., Pavone, M., and Narzisi, G.
16. Small variant benchmark from a complete assembly of X and Y chromosomes.
- Author
-
Wagner J, Olson ND, McDaniel J, Harris L, Pinto BJ, Jáspez D, Muñoz-Barrera A, Rubio-Rodríguez LA, Lorenzo-Salazar JM, Flores C, Sahraeian SME, Narzisi G, Byrska-Bishop M, Evani US, Xiao C, Lake JA, Fontana P, Greenberg C, Freed D, Mootor MFE, Boutros PC, Murray L, Shafin K, Carroll A, Sedlazeck FJ, Wilson M, and Zook JM
- Subjects
Humans; Male; DNA Copy Number Variations; Genome, Human; Genetic Variation; Genomics/methods; Chromosomes, Human, Y/genetics; Chromosomes, Human, X/genetics; Benchmarking
- Abstract
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges benchmarking variants in long homopolymers and tandem repeats, complex gene conversions, copy number variable gene arrays, and human satellites. Competing interests: JAL is an employee of PacBio. SMES is an employee of Roche Sequencing Solutions. DF is an employee of Sentieon, Inc., and holds stock options as part of the standard compensation package. PCB sits on the Scientific Advisory Boards of Intersect Diagnostics Inc., Sage Bionetworks and BioSymetrics Inc. LM is an employee and shareholder of Illumina Inc. KS and AC are employees of Google LLC and own Alphabet stock as part of the standard compensation package. FJS has support from ONT, Illumina, Pacbio and Genentech. The remaining authors declare no competing interests. (© 2025. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.)
- Published
- 2025
17. Development and extensive sequencing of a broadly-consented Genome in a Bottle matched tumor-normal pair.
- Author
-
McDaniel JH, Patel V, Olson ND, He HJ, He Z, Cole KD, Schmitt A, Sikkink K, Sedlazeck FJ, Doddapaneni H, Jhangiani SN, Muzny DM, Gingras MC, Mehta H, Paulin LF, Hastie AR, Yu HC, Weigman V, Rojas A, Kennedy K, Remington J, Gonzalez I, Sudkamp M, Wiseman K, Lajoie BR, Levy S, Jain M, Akeson S, Narzisi G, Steinsnyder Z, Reeves C, Shelton J, Kingan SB, Lambert C, Bayabyan P, Wenger AM, McLaughlin IJ, Adamson A, Kingsley C, Wescott M, Kim Y, Paten B, Park J, Violich I, Miga KH, Gardner J, McNulty B, Rosen G, McCoy R, Brundu F, Sayyari E, Scheffler K, Truong S, Catreux S, Hannah LC, Lipson D, Benjamin H, Iremadze N, Soifer I, Eacker S, Wood M, Cross E, Husar G, Gross S, Vernich M, Kolmogorov M, Ahmad T, Keskus A, Bryant A, Thibaud-Nissen F, Trow J, Proszynski J, Hirschberg JW, Ryon K, Mason CE, Wagner J, Xiao C, Liss AS, and Zook JM
- Abstract
The Genome in a Bottle Consortium (GIAB), hosted by the National Institute of Standards and Technology (NIST), is developing new matched tumor-normal samples, the first to be explicitly consented for public dissemination of genomic data and cell lines. Here, we describe a comprehensive genomic dataset from the first individual, HG008, including DNA from an adherent, epithelial-like pancreatic ductal adenocarcinoma (PDAC) tumor cell line and matched normal cells from duodenal and pancreatic tissues. Data for the tumor-normal matched samples comes from thirteen distinct state-of-the-art whole genome measurement technologies, including high depth short and long-read bulk whole genome sequencing (WGS), single cell WGS, and Hi-C, and karyotyping. These data will be used by the GIAB Consortium to develop matched tumor-normal benchmarks for somatic variant detection. We expect these data to facilitate innovation for whole genome measurement technologies, de novo assembly of tumor and normal genomes, and bioinformatic tools to identify small and structural somatic mutations. This first-of-its-kind broadly consented open-access resource will facilitate further understanding of sequencing methods used for cancer biology. Competing interests: A.S. and K.S. are employees of Arima Genomics. L.F.P. from BCM, was sponsored by Genentech Inc until September 2023. F.J.S. from BCM, received research support from Illumina, ONT and Pacbio. A.R.H. and H-C.Y. are employees of Bionano Genomics and own stock shares and options of Bionano Genomics, Inc. V.W., K.K., J.R., and I.G. are employees of BioSkryb Genomics. M.S., K.B., B.R.L. and S.L. are employees of Element Biosciences. S.B.K., C.L., P.B., A.M.W., I.J.M., A.A., C.K., M.W., and Y.K. are employees and shareholders of PacBio, Inc. D.L., H.B., N.I., and I.S. are employees and shareholders of Ultima Genomics. S.E. and M.W. are employees of Phase Genomics. E.C., G.H., S.G., and M.V. are employees of KromaTiD, Inc.; E.C. is also a shareholder. F.B., E.S., K.S., S.T. and S.C. are employees of Illumina, Inc. All other authors have no competing interests.
- Published
- 2024
18. Efficient indexing and querying of annotations in a pangenome graph.
- Author
-
Novak AM, Chung D, Hickey G, Djebali S, Yokoyama TT, Garrison E, Narzisi G, Paten B, and Monlong J
- Abstract
The current reference genome is the backbone of diverse and rich annotations. Simple text formats, like VCF or BED, have been widely adopted and helped the critical exchange of genomic information. There is a dire need for tools and formats enabling pangenomic annotation to facilitate such enrichment of pangenomic references. The Graph Alignment Format (GAF) is a text format, tab-delimited like BED/VCF files, which was proposed to represent alignments. GAF could also be used to store paths representing annotations in a pangenome graph, but there are no tools to index and query them efficiently. Here, we present extensions to vg and HTSlib that provide efficient sorting, indexing, and querying for GAF files. With this approach, annotations overlapping a subgraph can be extracted quickly. Paths are sorted based on the IDs of traversed nodes, compressed with BGZIP, and indexed with HTSlib/tabix via our extensions for the GAF format. Compared to the binary GAM format, GAF files are easier to edit or inspect because they are plain text, and we show that they are twice as fast to sort and half as large on disk. In addition, we updated vg annotate, which takes BED or GFF3 annotation files relative to linear sequences and projects them into the pangenome. It can now produce GAF files representing these annotations' paths through the pangenome. We showcase these new tools on several applications. We projected annotations for all Human Pangenome Reference Consortium Year 1 haplotypes, including genes, segmental duplications, tandem repeats and repeats annotations, into the Minigraph-Cactus pangenome (GRCh38-based v1.1). We also projected known variants from the GWAS Catalog and expression QTLs from the GTEx project into the pangenome. Finally, we reanalyzed ATAC-seq data from ENCODE to demonstrate what a coverage track could look like in a pangenome graph. 
These rich annotations can be quickly queried with vg and visualized using existing tools like the Sequence Tube Map or Bandage.
- Published
- 2024
19. DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies.
- Author
-
Park J, Cook DE, Chang PC, Kolesnikov A, Brambrink L, Mier JC, Gardner J, McNulty B, Sacco S, Keskus A, Bryant A, Ahmad T, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Yoo B, Pushel I, Lansdon LA, Bi C, Walter A, Gibson M, Pastinen T, Farooqi MS, Robine N, Miga KH, Carroll A, Kolmogorov M, Paten B, and Shafin K
- Abstract
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies now offer potential advantages in terms of repeat mapping and variant phasing. We present DeepSomatic, a deep learning method for detecting somatic SNVs and insertions and deletions (indels) from both short-read and long-read data, with modes for whole-genome and exome sequencing, and able to run on tumor-normal, tumor-only, and with FFPE-prepared samples. To help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available a dataset of five matched tumor-normal cell line pairs sequenced with Illumina, PacBio HiFi, and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples and technologies (short-read and long-read), DeepSomatic consistently outperforms existing callers, particularly for indels. Competing interests: K.S., D.E.C., P.C., A. Kolesnikov, L.B., J.C.M., and A.C. are employees of Google LLC and own Alphabet stock as part of the standard compensation package. M.S.F. is a part of the speakers bureau for Bayer and PacBio.
- Published
- 2024
- Full Text
- View/download PDF
20. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads.
- Author
-
Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, and Kolmogorov M
- Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels., Competing Interests: Competing interests. S.A. is an employee and stockholder of Oxford Nanopore Technologies. A.K., P.C., K.S., D.C., A.C. are employees of Google LLC and own Alphabet stock as part of the standard compensation package. E.G. served on advisory boards for Jazz Pharmaceuticals and Syndax Pharmaceuticals. M.S.F. is part of the speakers bureau for Bayer and PacBio. The remaining authors declare no competing interests.
- Published
- 2024
- Full Text
- View/download PDF
21. Osteocalcin of maternal and embryonic origins synergize to establish homeostasis in offspring.
- Author
-
Correa Pinto Junior D, Canal Delgado I, Yang H, Clemenceau A, Corvelo A, Narzisi G, Musunuri R, Meyer Berger J, Hendricks LE, Tokumura K, Luo N, Li H, Oury F, Ducy P, Yadav VK, Li X, and Karsenty G
- Subjects
- Animals, Female, Humans, Mice, Pregnancy, Homeostasis, Insulin metabolism, Insulin Secretion, Mammals metabolism, Osteocalcin genetics, Osteocalcin metabolism, Blood Glucose analysis, Blood Glucose metabolism, Prenatal Exposure Delayed Effects metabolism
- Abstract
Many physiological osteocalcin-regulated functions are affected in adult offspring of mothers experiencing unhealthy pregnancy. Furthermore, osteocalcin signaling during gestation influences cognition and adrenal steroidogenesis in adult mice. Together these observations suggest that osteocalcin may broadly function during pregnancy to determine organismal homeostasis in adult mammals. To test this hypothesis, we analyzed in unchallenged wildtype and Osteocalcin-deficient, newborn and adult mice of various genotypes and origin maintained on different genetic backgrounds, the functions of osteocalcin in the pancreas, liver and testes and their molecular underpinnings. This analysis revealed that providing mothers are Osteocalcin-deficient, Osteocalcin haploinsufficiency in embryos hampers insulin secretion, liver gluconeogenesis, glucose homeostasis, testes steroidogenesis in adult offspring; inhibits cell proliferation in developing pancreatic islets and testes; and disrupts distinct programs of gene expression in these organs and in the brain. This study indicates that osteocalcin exerts dominant functions in most organs it influences. Furthermore, through their synergistic regulation of multiple physiological functions, osteocalcin of maternal and embryonic origins contributes to the establishment and maintenance of organismal homeostasis in newborn and adult offspring., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
22. Osteocalcin of maternal and embryonic origins synergize to establish homeostasis in offspring.
- Author
-
Pinto DC, Delgado IC, Yang H, Clemenceau A, Corvelo A, Narzisi G, Musunuri R, Berger JM, Hendricks LE, Tokumura K, Luo N, Li H, Oury F, Ducy P, Yadav VK, Li X, and Karsenty G
- Abstract
Many physiological functions regulated by osteocalcin are affected in adult offspring of mothers experiencing an unhealthy pregnancy. Furthermore, osteocalcin signaling during gestation influences cognition and adrenal steroidogenesis in adult mice. Together these observations suggest that osteocalcin functions during pregnancy may be a broader determinant of organismal homeostasis in adult mammals than previously thought. To test this hypothesis, we analyzed in unchallenged wildtype and Osteocalcin-deficient, newborn, and adult mice of various genotypes and origin, and that were maintained on different genetic backgrounds, the functions of osteocalcin in the pancreas, liver and testes and their molecular underpinnings. This analysis revealed that providing mothers are themselves Osteocalcin-deficient, Osteocalcin haploinsufficiency in embryos hampers insulin secretion, liver gluconeogenesis, glucose homeostasis, testes steroidogenesis in adult offspring; inhibits cell proliferation in developing pancreatic islets and testes; and disrupts distinct programs of gene expression in these organs and in the brain. This study indicates that through their synergistic regulation of multiple physiological functions, osteocalcin of maternal and embryonic origins contributes to the establishment and maintenance of organismal homeostasis in newborn and adult offspring.
- Published
- 2023
- Full Text
- View/download PDF
23. Unexpected frequency of the pathogenic AR CAG repeat expansion in the general population.
- Author
-
Zanovello M, Ibáñez K, Brown AL, Sivakumar P, Bombaci A, Santos L, van Vugt JJFA, Narzisi G, Karra R, Scholz SW, Ding J, Gibbs JR, Chiò A, Dalgard C, Weisburd B, Hanna MG, Greensmith L, Phatnani H, Veldink JH, Traynor BJ, Polke J, Houlden H, Fratta P, and Tucci A
- Subjects
- Humans, Male, Muscular Atrophy, Polymerase Chain Reaction, Trinucleotide Repeat Expansion genetics, Receptors, Androgen genetics, Muscular Atrophy, Spinal genetics
- Abstract
CAG repeat expansions in exon 1 of the AR gene on the X chromosome cause spinal and bulbar muscular atrophy, a male-specific progressive neuromuscular disorder associated with a variety of extra-neurological symptoms. The disease has a reported male prevalence of approximately 1:30 000 or less, but the AR repeat expansion frequency is unknown. We established a pipeline, which combines the use of the ExpansionHunter tool and visual validation, to detect AR CAG expansion on whole-genome sequencing data, benchmarked it to fragment PCR sizing, and applied it to 74 277 unrelated individuals from four large cohorts. Our pipeline showed sensitivity of 100% [95% confidence interval (CI) 90.8-100%], specificity of 99% (95% CI 94.2-99.7%), and a positive predictive value of 97.4% (95% CI 84.4-99.6%). We found the mutation frequency to be 1:3182 (95% CI 1:2309-1:4386, n = 117 734) X chromosomes, 10 times more frequent than the reported disease prevalence. Modelling using the novel mutation frequency led to an estimated disease prevalence of 1:6887 males, more than four times more frequent than the reported disease prevalence. This discrepancy is possibly due to underdiagnosis of this neuromuscular condition, reduced penetrance, and/or pleomorphic clinical manifestations., (© The Author(s) 2023. Published by Oxford University Press on behalf of the Guarantors of Brain.)
- Published
- 2023
- Full Text
- View/download PDF
24. Integrative transcriptomic analysis of the amyotrophic lateral sclerosis spinal cord implicates glial activation and suggests new risk genes.
- Author
-
Humphrey J, Venkatesh S, Hasan R, Herb JT, de Paiva Lopes K, Küçükali F, Byrska-Bishop M, Evani US, Narzisi G, Fagegaltier D, Sleegers K, Phatnani H, Knowles DA, Fratta P, and Raj T
- Subjects
- Humans, Retrospective Studies, Transcriptome, Spinal Cord metabolism, Amyotrophic Lateral Sclerosis genetics, Amyotrophic Lateral Sclerosis metabolism, Neurodegenerative Diseases metabolism
- Abstract
Amyotrophic lateral sclerosis (ALS) is a progressively fatal neurodegenerative disease affecting motor neurons in the brain and spinal cord. In this study, we investigated gene expression changes in ALS via RNA sequencing in 380 postmortem samples from cervical, thoracic and lumbar spinal cord segments from 154 individuals with ALS and 49 control individuals. We observed an increase in microglia and astrocyte gene expression, accompanied by a decrease in oligodendrocyte gene expression. By creating a gene co-expression network in the ALS samples, we identified several activated microglia modules that negatively correlate with retrospective disease duration. We mapped molecular quantitative trait loci and found several potential ALS risk loci that may act through gene expression or splicing in the spinal cord and assign putative cell types for FNBP1, ACSL5, SH3RF1 and NFASC. Finally, we outline how common genetic variants associated with splicing of C9orf72 act as proxies for the well-known repeat expansion, and we use the same mechanism to suggest ATXN3 as a putative risk gene., (© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.)
- Published
- 2023
- Full Text
- View/download PDF
25. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.
- Author
-
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, and Zody MC
- Subjects
- Female, High-Throughput Nucleotide Sequencing methods, Humans, INDEL Mutation, Male, Polymorphism, Single Nucleotide, Genome, Human, Whole Genome Sequencing
- Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies., Competing Interests: Declaration of interests E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. P.F. is an SAB member of Fabric Genomics, Inc., and Eagle Genomics, Ltd., (Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2022
- Full Text
- View/download PDF
26. Curated variation benchmarks for challenging medically relevant autosomal genes.
- Author
-
Wagner J, Olson ND, Harris L, McDaniel J, Cheng H, Fungtammasan A, Hwang YC, Gupta R, Wenger AM, Rowell WJ, Khan ZM, Farek J, Zhu Y, Pisupati A, Mahmoud M, Xiao C, Yoo B, Sahraeian SME, Miller DE, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Narzisi G, Evani US, Clarke WE, Lee J, Mason CE, Lincoln SE, Miga KH, Ebbert MTW, Shumate A, Li H, Chin CS, Zook JM, and Sedlazeck FJ
- Subjects
- Haplotypes genetics, Humans, Sequence Analysis, DNA, Genome, Human genetics
- Abstract
The repetitive nature and complexity of some medically relevant genes poses a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome., (© 2022. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.)
- Published
- 2022
- Full Text
- View/download PDF
27. Benchmarking challenging small variants with linked and long reads.
- Author
-
Wagner J, Olson ND, Harris L, Khan Z, Farek J, Mahmoud M, Stankovic A, Kovacevic V, Yoo B, Miller N, Rosenfeld JA, Ni B, Zarate S, Kirsche M, Aganezov S, Schatz MC, Narzisi G, Byrska-Bishop M, Clarke W, Evani US, Markello C, Shafin K, Zhou X, Sidow A, Bansal V, Ebert P, Marschall T, Lansdorp P, Hanlon V, Mattsson CA, Barrio AM, Fiddes IT, Xiao C, Fungtammasan A, Chin CS, Wenger AM, Rowell WJ, Sedlazeck FJ, Carroll A, Salit M, and Zook JM
- Abstract
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
- Published
- 2022
- Full Text
- View/download PDF
28. Author Correction: Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study.
- Author
-
Foox J, Tighe SW, Nicolet CM, Zook JM, Byrska-Bishop M, Clarke WE, Khayat MM, Mahmoud M, Laaguiby PK, Herbert ZT, Warner D, Grills GS, Jen J, Levy S, Xiang J, Alonso A, Zhao X, Zhang W, Teng F, Zhao Y, Lu H, Schroth GP, Narzisi G, Farmerie W, Sedlazeck FJ, Baldwin DA, and Mason CE
- Published
- 2021
- Full Text
- View/download PDF
29. Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study.
- Author
-
Foox J, Tighe SW, Nicolet CM, Zook JM, Byrska-Bishop M, Clarke WE, Khayat MM, Mahmoud M, Laaguiby PK, Herbert ZT, Warner D, Grills GS, Jen J, Levy S, Xiang J, Alonso A, Zhao X, Zhang W, Teng F, Zhao Y, Lu H, Schroth GP, Narzisi G, Farmerie W, Sedlazeck FJ, Baldwin DA, and Mason CE
- Subjects
- Base Pair Mismatch, Benchmarking, DNA genetics, DNA, Bacterial genetics, Genome, Bacterial, Genome, Human, Humans, High-Throughput Nucleotide Sequencing methods, High-Throughput Nucleotide Sequencing standards, Sequence Analysis, DNA methods, Sequence Analysis, DNA standards
- Abstract
Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling., (© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.)
- Published
- 2021
- Full Text
- View/download PDF
30. Feather Gene Expression Elucidates the Developmental Basis of Plumage Iridescence in African Starlings.
- Author
-
Rubenstein DR, Corvelo A, MacManes MD, Maia R, Narzisi G, Rousaki A, Vandenabeele P, Shawkey MD, and Solomon J
- Subjects
- Animals, Gene Expression, Iridescence, Pigmentation genetics, Feathers, Starlings
- Abstract
Iridescence is widespread in the living world, occurring in organisms as diverse as bacteria, plants, and animals. Yet, compared to pigment-based forms of coloration, we know surprisingly little about the developmental and molecular bases of the structural colors that give rise to iridescence. Birds display a rich diversity of iridescent structural colors that are produced in feathers by the arrangement of melanin-containing organelles called melanosomes into nanoscale configurations, but how these often unusually shaped melanosomes form, or how they are arranged into highly organized nanostructures, remains largely unknown. Here, we use functional genomics to explore the developmental basis of iridescent plumage using superb starlings (Lamprotornis superbus), which produce both iridescent blue and non-iridescent red feathers. Through morphological and chemical analyses, we confirm that hollow, flattened melanosomes in iridescent feathers are eumelanin-based, whereas melanosomes in non-iridescent feathers are solid and amorphous, suggesting that high pheomelanin content underlies red coloration. Intriguingly, the nanoscale arrangement of melanosomes within the barbules was surprisingly similar between feather types. After creating a new genome assembly, we use transcriptomics to show that non-iridescent feather development is associated with genes related to pigmentation, metabolism, and mitochondrial function, suggesting non-iridescent feathers are more energetically expensive to produce than iridescent feathers. However, iridescent feather development is associated with genes related to structural and cellular organization, suggesting that, while nanostructures themselves may passively assemble, barbules and melanosomes may require active organization to give them their shape. Together, our analyses suggest that iridescent feathers form through a combination of passive self-assembly and active processes., (© The American Genetic Association. 2021. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2021
- Full Text
- View/download PDF
31. Somatic variant analysis of linked-reads sequencing data with Lancet.
- Author
-
Musunuri R, Arora K, Corvelo A, Shah M, Shelton J, Zody MC, and Narzisi G
- Subjects
- Algorithms, Diploidy, Sequence Analysis, DNA, High-Throughput Nucleotide Sequencing, Software
- Abstract
Summary: We present a new version of the popular somatic variant caller, Lancet, that supports the analysis of linked-reads sequencing data. By seamlessly integrating barcodes and haplotype read assignments within the colored De Bruijn graph local-assembly framework, Lancet computes a barcode-aware coverage and identifies variants that disagree with the local haplotype structure., Availability and Implementation: Lancet is implemented in C++ and available for academic and non-commercial research purposes as an open-source package at https://github.com/nygenome/lancet., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2021
- Full Text
- View/download PDF
32. Coding and noncoding variants in EBF3 are involved in HADDS and simplex autism.
- Author
-
Padhi EM, Hayeck TJ, Cheng Z, Chatterjee S, Mannion BJ, Byrska-Bishop M, Willems M, Pinson L, Redon S, Benech C, Uguen K, Audebert-Bellanger S, Le Marechal C, Férec C, Efthymiou S, Rahman F, Maqbool S, Maroofian R, Houlden H, Musunuri R, Narzisi G, Abhyankar A, Hunter RD, Akiyama J, Fries LE, Ng JK, Mehinovic E, Stong N, Allen AS, Dickel DE, Bernier RA, Gorkin DU, Pennacchio LA, Zody MC, and Turner TN
- Subjects
- Autistic Disorder epidemiology, Autistic Disorder pathology, Enhancer Elements, Genetic genetics, Exome genetics, Female, Gene Regulatory Networks genetics, Humans, Male, Muscle Hypotonia epidemiology, Muscle Hypotonia pathology, Mutation genetics, Neurodevelopmental Disorders epidemiology, Neurodevelopmental Disorders pathology, Neurons metabolism, Neurons pathology, Autistic Disorder genetics, Genetic Predisposition to Disease, Muscle Hypotonia genetics, Neurodevelopmental Disorders genetics, Transcription Factors genetics
- Abstract
Background: Previous research in autism and other neurodevelopmental disorders (NDDs) has indicated an important contribution of protein-coding (coding) de novo variants (DNVs) within specific genes. The role of de novo noncoding variation has been observable as a general increase in genetic burden but has yet to be resolved to individual functional elements. In this study, we assessed whole-genome sequencing data in 2671 families with autism (discovery cohort of 516 families, replication cohort of 2155 families). We focused on DNVs in enhancers with characterized in vivo activity in the brain and identified an excess of DNVs in an enhancer named hs737., Results: We adapted the fitDNM statistical model to work in noncoding regions and tested enhancers for excess of DNVs in families with autism. We found only one enhancer (hs737) with nominal significance in the discovery (p = 0.0172), replication (p = 2.5 × 10⁻³), and combined dataset (p = 1.1 × 10⁻⁴). Each individual with a DNV in hs737 had shared phenotypes including being male, intact cognitive function, and hypotonia or motor delay. Our in vitro assessment of the DNVs showed they all reduce enhancer activity in a neuronal cell line. By epigenomic analyses, we found that hs737 is brain-specific and targets the transcription factor gene EBF3 in human fetal brain. EBF3 is genome-wide significant for coding DNVs in NDDs (missense p = 8.12 × 10⁻³⁵, loss-of-function p = 2.26 × 10⁻¹³) and is widely expressed in the body. Through characterization of promoters bound by EBF3 in neuronal cells, we saw enrichment for binding to NDD genes (p = 7.43 × 10⁻⁶, OR = 1.87) involved in gene regulation. Individuals with coding DNVs have greater phenotypic severity (hypotonia, ataxia, and delayed development syndrome [HADDS]) in comparison to individuals with noncoding DNVs that have autism and hypotonia., Conclusions: In this study, we identify DNVs in the hs737 enhancer in individuals with autism. Through multiple approaches, we find hs737 targets the gene EBF3 that is genome-wide significant in NDDs. By assessment of noncoding variation and the genes they affect, we are beginning to understand their impact on gene regulatory networks in NDDs., (© 2021. The Author(s).)
- Published
- 2021
- Full Text
- View/download PDF
33. The genomic basis of evolutionary differentiation among honey bees.
- Author
-
Fouks B, Brand P, Nguyen HN, Herman J, Camara F, Ence D, Hagen DE, Hoff KJ, Nachweide S, Romoth L, Walden KKO, Guigo R, Stanke M, Narzisi G, Yandell M, Robertson HM, Koeniger N, Chantawannakul P, Schatz MC, Worley KC, Robinson GE, Elsik CG, and Rueppell O
- Abstract
In contrast to the western honey bee, Apis mellifera, other honey bee species have been largely neglected despite their importance and diversity. The genetic basis of the evolutionary diversification of honey bees remains largely unknown. Here, we provide a genome-wide comparison of three honey bee species, each representing one of the three subgenera of honey bees, namely the dwarf (Apis florea), giant (A. dorsata), and cavity-nesting (A. mellifera) honey bees with bumblebees as an outgroup. Our analyses resolve the phylogeny of honey bees with the dwarf honey bees diverging first. We find that evolution of increased eusocial complexity in Apis proceeds via increases in the complexity of gene regulation, which is in agreement with previous studies. However, this process seems to be related to pathways other than transcriptional control. Positive selection patterns across Apis reveal a trade-off between maintaining genome stability and generating genetic diversity, with a rapidly evolving piRNA pathway leading to genomes depleted of transposable elements, and a rapidly evolving DNA repair pathway associated with high recombination rates in all Apis species. Diversification within Apis is accompanied by positive selection in several genes whose putative functions present candidate mechanisms for lineage-specific adaptations, such as migration, immunity, and nesting behavior., (© 2021 Fouks et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2021
- Full Text
- View/download PDF
34. A crowdsourced set of curated structural variants for the human genome.
- Author
-
Chapman LM, Spies N, Pai P, Lim CS, Carroll A, Narzisi G, Watson CM, Proukakis C, Clarke WE, Nariai N, Dawson E, Jones G, Blankenberg D, Brueffer C, Xiao C, Kolora SRR, Alexander N, Wolujewicz P, Ahmed AE, Smith G, Shehreen S, Wenger AM, Salit M, and Zook JM
- Subjects
- Heuristics, Humans, INDEL Mutation, Genome, Human, Genomic Structural Variation
- Abstract
A high quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies., Competing Interests: AC is an employee of Google Inc. AC is a former employee of DNAnexus Inc. NN is an employee of Illumina Inc.
- Published
- 2020
- Full Text
- View/download PDF
35. ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data.
- Author
-
Dolzhenko E, Bennett MF, Richmond PA, Trost B, Chen S, van Vugt JJFA, Nguyen C, Narzisi G, Gainullin VG, Gross AM, Lajoie BR, Taft RJ, Wasserman WW, Scherer SW, Veldink JH, Bentley DR, Yuen RKC, Bahlo M, and Eberle MA
- Subjects
- Case-Control Studies, Fragile X Syndrome genetics, Friedreich Ataxia genetics, High-Throughput Nucleotide Sequencing, Humans, Huntington Disease genetics, Microsatellite Repeats, Myotonic Dystrophy genetics, Whole Genome Sequencing, DNA Repeat Expansion, Software
- Abstract
Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.
- Published
- 2020
- Full Text
- View/download PDF
36. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions.
- Author
-
Dolzhenko E, Deshpande V, Schlesinger F, Krusche P, Petrovski R, Chen S, Emig-Agius D, Gross A, Narzisi G, Bowman B, Scheffler K, van Vugt JJFA, French C, Sanchis-Juan A, Ibáñez K, Tucci A, Lajoie BR, Veldink JH, Raymond FL, Taft RJ, Bentley DR, and Eberle MA
- Subjects
- Genotype, Microsatellite Repeats, Software
- Abstract
Summary: We describe a novel computational method for genotyping repeats using sequence graphs. This method addresses the long-standing need to accurately genotype medically important loci containing repeats adjacent to other variants or imperfect DNA repeats such as polyalanine repeats. Here we introduce a new version of our repeat genotyping software, ExpansionHunter, that uses this method to perform targeted genotyping of a broad class of such loci., Availability and Implementation: ExpansionHunter is implemented in C++ and is available under the Apache License Version 2.0. The source code, documentation, and Linux/macOS binaries are available at https://github.com/Illumina/ExpansionHunter/., Supplementary Information: Supplementary data are available at Bioinformatics online., (© The Author(s) 2019. Published by Oxford University Press.)
- Published
- 2019
- Full Text
- View/download PDF
37. A strategy for building and using a human reference pangenome.
- Author
-
Llamas B, Narzisi G, Schneider V, Audano PA, Biederstedt E, Blauvelt L, Bradbury P, Chang X, Chin CS, Fungtammasan A, Clarke WE, Cleary A, Ebler J, Eizenga J, Sibbesen JA, Markello CJ, Garrison E, Garg S, Hickey G, Lazo GR, Lin MF, Mahmoud M, Marschall T, Minkin I, Monlong J, Musunuri RL, Sagayaradj S, Novak AM, Rautiainen M, Regier A, Sedlazeck FJ, Siren J, Souilmi Y, Wagner J, Wrightsman T, Yokoyama TT, Zeng Q, Zook JM, Paten B, and Busby B
- Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized themselves into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed are reported in this manuscript., Competing Interests: No competing interests were disclosed., (Copyright: © 2019 Llamas B et al.)
- Published
- 2019
38. YES1 amplification is a mechanism of acquired resistance to EGFR inhibitors identified by transposon mutagenesis and clinical genomics.
- Author
-
Fan PD, Narzisi G, Jayaprakash AD, Venturini E, Robine N, Smibert P, Germer S, Yu HA, Jordan EJ, Paik PK, Janjigian YY, Chaft JE, Wang L, Jungbluth AA, Middha S, Spraggon L, Qiao H, Lovly CM, Kris MG, Riely GJ, Politi K, Varmus H, and Ladanyi M
- Subjects
- Cell Line, Tumor, Humans, Proto-Oncogene Proteins c-fyn genetics, Proto-Oncogene Proteins c-fyn metabolism, Proto-Oncogene Proteins pp60(c-src) genetics, Proto-Oncogene Proteins pp60(c-src) metabolism, DNA Transposable Elements, Drug Resistance, Neoplasm, Enzyme Inhibitors pharmacology, ErbB Receptors antagonists & inhibitors, ErbB Receptors genetics, ErbB Receptors metabolism, Gene Amplification, Gene Expression Regulation, Neoplastic drug effects, Lung Neoplasms drug therapy, Lung Neoplasms genetics, Lung Neoplasms metabolism, Lung Neoplasms pathology, Proto-Oncogene Proteins c-yes biosynthesis, Proto-Oncogene Proteins c-yes genetics
- Abstract
In ∼30% of patients with EGFR-mutant lung adenocarcinomas whose disease progresses on EGFR inhibitors, the basis for acquired resistance remains unclear. We have integrated transposon mutagenesis screening in an EGFR-mutant cell line and clinical genomic sequencing in cases of acquired resistance to identify mechanisms of resistance to EGFR inhibitors. The most prominent candidate genes identified by insertions in or near the genes during the screen were MET, a gene whose amplification is known to mediate resistance to EGFR inhibitors, and the gene encoding the Src family kinase YES1. Cell clones with transposon insertions that activated expression of YES1 exhibited resistance to all three generations of EGFR inhibitors and sensitivity to pharmacologic and siRNA-mediated inhibition of YES1. Analysis of clinical genomic sequencing data from cases of acquired resistance to EGFR inhibitors revealed amplification of YES1 in five cases, four of which lacked any other known mechanisms of resistance. Preinhibitor samples, available for two of the five patients, lacked YES1 amplification. None of 136 postinhibitor samples had detectable amplification of other Src family kinases (SRC and FYN). YES1 amplification was also found in 2 of 17 samples from ALK fusion-positive lung cancer patients who had progressed on ALK TKIs. Taken together, our findings identify acquired amplification of YES1 as a recurrent and targetable mechanism of resistance to EGFR inhibition in EGFR-mutant lung cancers and demonstrate the utility of transposon mutagenesis in discovering clinically relevant mechanisms of drug resistance. Competing Interests: Conflict of interest statement: H.A.Y. has served on the advisory boards for AstraZeneca and Boehringer Ingelheim. Y.Y.J. has received consulting fees from Bristol–Myers Squibb and honoraria from Pfizer, Genentech, and Boehringer Ingelheim. J.E.C. has received consulting fees from AstraZeneca, Genentech, Bristol–Myers Squibb, and Merck. M.G.K. has served as a consultant for AstraZeneca. C.M.L. has served on the Advisory Board for Cepheid Oncology and has received consulting fees from Pfizer, Novartis, AstraZeneca, Genoptix, Sequenom, Ariad, Takeda, and Foundation Medicine. G.J.R. has received consulting fees from Roche, and Memorial Sloan Kettering Cancer Center (MSKCC) has received support from Pfizer and Roche to fund G.J.R.’s clinical research. K.P. has received research funding from AstraZeneca, Roche, Kolltan, and Symphogen; honoraria for consulting or advisory roles from AstraZeneca, Merck, Novartis, and Tocagen; and royalties from intellectual property licensed by MSKCC to Molecular MD. M.L. has received advisory board compensation from Boehringer Ingelheim, AstraZeneca, Bristol-Myers Squibb, Takeda, and Bayer, and research support from LOXO Oncology. (Copyright © 2018 the Author(s). Published by PNAS.)
- Published
- 2018
39. Genome-wide somatic variant calling using localized colored de Bruijn graphs.
- Author
-
Narzisi G, Corvelo A, Arora K, Bergmann EA, Shah M, Musunuri R, Emde AK, Robine N, Vacic V, and Zody MC
- Abstract
Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. We demonstrate, through extensive experimental comparison on synthetic and real whole-genome sequencing datasets, that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system, which is essential for variant prioritization, and detects low-frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local-assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet. Competing Interests: The authors declare no competing interests.
- Published
- 2018
40. Detection of long repeat expansions from PCR-free whole-genome sequence data.
- Author
-
Dolzhenko E, van Vugt JJFA, Shaw RJ, Bekritsky MA, van Blitterswijk M, Narzisi G, Ajay SS, Rajan V, Lajoie BR, Johnson NH, Kingsbury Z, Humphray SJ, Schellevis RD, Brands WJ, Baker M, Rademakers R, Kooyman M, Tazelaar GHP, van Es MA, McLaughlin R, Sproviero W, Shatunov A, Jones A, Al Khleifat A, Pittman A, Morgan S, Hardiman O, Al-Chalabi A, Shaw C, Smith B, Neo EJ, Morrison K, Shaw PJ, Reeves C, Winterkorn L, Wexler NS, Housman DE, Ng CW, Li AL, Taft RJ, van den Berg LH, Bentley DR, Veldink JH, and Eberle MA
- Subjects
- Algorithms, C9orf72 Protein genetics, Databases, Genetic, Humans, Precision Medicine, Sensitivity and Specificity, Software, Amyotrophic Lateral Sclerosis genetics, DNA Repeat Expansion, Whole Genome Sequencing methods
- Abstract
Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions. (© 2017 Dolzhenko et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2017
41. Indel variant analysis of short-read sequencing data with Scalpel.
- Author
-
Fang H, Bergmann EA, Arora K, Vacic V, Zody MC, Iossifov I, O'Rawe JA, Wu Y, Jimenez Barron LT, Rosenbaum J, Ronemus M, Lee YH, Wang Z, Dikoglu E, Jobanputra V, Lyon GJ, Wigler M, Schatz MC, and Narzisi G
- Subjects
- Alleles, Genomics, Humans, Molecular Sequence Annotation, Polymorphism, Single Nucleotide, DNA Mutational Analysis methods, High-Throughput Nucleotide Sequencing methods, INDEL Mutation
- Abstract
As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ∼5 h after read mapping.
- Published
- 2016
42. The challenge of small-scale repeats for indel discovery.
- Author
-
Narzisi G and Schatz MC
- Abstract
Repetitive sequences are abundant in the human genome. Different classes of repetitive DNA sequences, including simple repeats, tandem repeats, segmental duplications, interspersed repeats, and other elements, collectively span more than 50% of the genome. Because repeat sequences occur in the genome at different scales they can cause various types of sequence analysis errors, including in alignment, de novo assembly, and annotation, among others. This mini-review highlights the challenges introduced by small-scale repeat sequences, especially near-identical tandem or closely located repeats and short tandem repeats, for discovering DNA insertion and deletion (indel) mutations from next-generation sequencing data. We also discuss the de Bruijn graph sequence assembly paradigm that is emerging as the most popular and promising approach for detecting indels. The human exome is taken as an example and highlights how these repetitive elements can obscure or introduce errors while detecting these types of mutations.
- Published
- 2015
43. The contribution of de novo coding mutations to autism spectrum disorder.
- Author
-
Iossifov I, O'Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D, Stessman HA, Witherspoon KT, Vives L, Patterson KE, Smith JD, Paeper B, Nickerson DA, Dea J, Dong S, Gonzalez LE, Mandell JD, Mane SM, Murtha MT, Sullivan CA, Walker MF, Waqar Z, Wei L, Willsey AJ, Yamrom B, Lee YH, Grabowska E, Dalkic E, Wang Z, Marks S, Andrews P, Leotta A, Kendall J, Hakker I, Rosenbaum J, Ma B, Rodgers L, Troge J, Narzisi G, Yoon S, Schatz MC, Ye K, McCombie WR, Shendure J, Eichler EE, State MW, and Wigler M
- Subjects
- Child, Cluster Analysis, Exome genetics, Female, Genes, Humans, Intelligence Tests, Male, Reproducibility of Results, Child Development Disorders, Pervasive genetics, Genetic Predisposition to Disease genetics, Mutation genetics, Open Reading Frames genetics
- Abstract
Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.
- Published
- 2014
44. Reducing INDEL calling errors in whole genome and exome sequencing data.
- Author
-
Fang H, Wu Y, Narzisi G, O'Rawe JA, Barrón LT, Rosenbaum J, Ronemus M, Iossifov I, Schatz MC, and Lyon GJ
- Abstract
Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). Results: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.
- Published
- 2014
45. Accurate de novo and transmitted indel detection in exome-capture data using microassembly.
- Author
-
Narzisi G, O'Rawe JA, Iossifov I, Fang H, Lee YH, Wang Z, Wu Y, Lyon GJ, Wigler M, and Schatz MC
- Subjects
- Algorithms, Computational Biology methods, DNA chemistry, Databases, Genetic, Humans, Mutation, Programming Languages, Sequence Alignment, Software, DNA Mutational Analysis methods, Exome, INDEL Mutation
- Abstract
We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
- Published
- 2014
46. Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.).
- Author
-
Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li LT, Zhang Q, Kim MJ, Schatz MC, Campbell M, Li J, Bowers JE, Tang H, Lyons E, Ferguson AA, Narzisi G, Nelson DR, Blaby-Haas CE, Gschwend AR, Jiao Y, Der JP, Zeng F, Han J, Min XJ, Hudson KA, Singh R, Grennan AK, Karpowicz SJ, Watling JR, Ito K, Robinson SA, Hudson ME, Yu Q, Mockler TC, Carroll A, Zheng Y, Sunkar R, Jia R, Chen N, Arro J, Wai CM, Wafula E, Spence A, Han Y, Xu L, Zhang J, Peery R, Haus MJ, Xiong W, Walsh JA, Wu J, Wang ML, Zhu YJ, Paull RE, Britt AB, Du C, Downie SR, Schuler MA, Michael TP, Long SP, Ort DR, Schopf JW, Gang DR, Jiang N, Yandell M, dePamphilis CW, Merchant SS, Paterson AH, Buchanan BB, Li S, and Shen-Miller J
- Subjects
- Adaptation, Biological, Amino Acid Substitution, Evolution, Molecular, Molecular Sequence Data, Mutation Rate, Nelumbo classification, Nelumbo physiology, Phylogeny, Vitis genetics, Genome, Plant, Nelumbo genetics
- Abstract
Background: Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan. Results: The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment. Conclusions: The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.
- Published
- 2013
47. Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies.
- Author
-
Schatz MC, Phillippy AM, Sommer DD, Delcher AL, Puiu D, Narzisi G, Salzberg SL, and Pop M
- Subjects
- Animals, Computational Biology, Computer Graphics, Data Display, High-Throughput Nucleotide Sequencing statistics & numerical data, Humans, Genomics statistics & numerical data, Sequence Analysis, DNA statistics & numerical data, Software
- Abstract
Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net.
- Published
- 2013
48. De novo gene disruptions in children on the autistic spectrum.
- Author
-
Iossifov I, Ronemus M, Levy D, Wang Z, Hakker I, Rosenbaum J, Yamrom B, Lee YH, Narzisi G, Leotta A, Kendall J, Grabowska E, Ma B, Marks S, Rodgers L, Stepansky A, Troge J, Andrews P, Bekritsky M, Pradhan K, Ghiban E, Kramer M, Parla J, Demeter R, Fulton LL, Fulton RS, Magrini VJ, Ye K, Darnell JC, Darnell RB, Mardis ER, Wilson RK, Schatz MC, McCombie WR, and Wigler M
- Subjects
- Child, Child Development Disorders, Pervasive etiology, Child, Preschool, Family Health, Female, Gene Dosage, Genetic Association Studies, Humans, Male, Models, Molecular, Parents, Phenotype, Child Development Disorders, Pervasive genetics, Fragile X Mental Retardation Protein genetics, Genetic Predisposition to Disease, Mutation genetics
- Abstract
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveal de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but gene-disrupting mutations (nonsense, splice site, and frame shifts) are twice as frequent, 59 to 28. Based on this differential and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest they are especially dosage-sensitive targets of cognitive disorders. (Copyright © 2012 Elsevier Inc. All rights reserved.)
- Published
- 2012
49. Reevaluating assembly evaluations with feature response curves: GAGE and assemblathons.
- Author
-
Vezzi F, Narzisi G, and Mishra B
- Subjects
- Algorithms, Computational Biology methods, Sequence Analysis, DNA methods, Software
- Abstract
In just the last decade, a multitude of bio-technologies and software pipelines have emerged to revolutionize genomics. To further their central goal, they aim to accelerate and improve the quality of de novo whole-genome assembly starting from short DNA sequences/reads. However, the performance of each of these tools is contingent on the length and quality of the sequencing data, the structure and complexity of the genome sequence, and the resolution and quality of long-range information. Furthermore, in the absence of any metric that captures the most fundamental "features" of a high-quality assembly, there is no obvious recipe for users to select the most desirable assembler/assembly. This situation has prompted the scientific community to rely on crowd-sourcing through international competitions, such as Assemblathons or GAGE, with the intention of identifying the best assembler(s) and their features. Somewhat circuitously, the only available approach to gauge de novo assemblies and assemblers relies solely on the availability of a high-quality fully assembled reference genome sequence. Still worse, reference-guided evaluations are often both difficult to analyze, leading to conclusions that are difficult to interpret. In this paper, we circumvent many of these issues by relying upon a tool, dubbed [Formula: see text], which is capable of evaluating de novo assemblies from the read-layouts even when no reference exists. We extend the FRCurve approach to cases where lay-out information may have been obscured, as is true in many deBruijn-graph-based algorithms. As a by-product, FRCurve now expands its applicability to a much wider class of assemblers - thus, identifying higher-quality members of this group, their inter-relations as well as sensitivity to carefully selected features, with or without the support of a reference sequence or layout for the reads. 
The paper concludes by reevaluating several recently conducted assembly competitions and the datasets that have resulted from them.
- Published
- 2012
50. Feature-by-feature--evaluating de novo sequence assembly.
- Author
-
Vezzi F, Narzisi G, and Mishra B
- Subjects
- Contig Mapping, Genome, Methods, Computational Biology methods, Sequence Analysis, DNA methods
- Abstract
The whole-genome sequence assembly (WGSA) problem is among one of the most studied problems in computational biology. Despite the availability of a plethora of tools (i.e., assemblers), all claiming to have solved the WGSA problem, little has been done to systematically compare their accuracy and power. Traditional methods rely on standard metrics and read simulation: while on the one hand, metrics like N50 and number of contigs focus only on size without proportionately emphasizing the information about the correctness of the assembly, comparisons performed on simulated dataset, on the other hand, can be highly biased by the non-realistic assumptions in the underlying read generator. Recently the Feature Response Curve (FRC) method was proposed to assess the overall assembly quality and correctness: FRC transparently captures the trade-offs between contigs' quality against their sizes. Nevertheless, the relationship among the different features and their relative importance remains unknown. In particular, FRC cannot account for the correlation among the different features. We analyzed the correlation among different features in order to better describe their relationships and their importance in gauging assembly quality and correctness. In particular, using multivariate techniques like principal and independent component analysis we were able to estimate the "excess-dimensionality" of the feature space. Moreover, principal component analysis allowed us to show how poorly the acclaimed N50 metric describes the assembly quality. Applying independent component analysis we identified a subset of features that better describe the assemblers performances. We demonstrated that by focusing on a reduced set of highly informative features we can use the FRC curve to better describe and compare the performances of different assemblers. 
Moreover, as a by-product of our analysis, we discovered how often evaluation based on simulated data, obtained with state of the art simulators, lead to not-so-realistic results.
- Published
- 2012