12 results on '"Ilia Minkin"'
Search Results
2. Structure-guided isoform identification for the human transcriptome
- Author
-
Markus J Sommer, Sooyoung Cha, Ales Varabyou, Natalia Rincon, Sukhwan Park, Ilia Minkin, Mihaela Pertea, Martin Steinegger, and Steven L Salzberg
- Subjects
protein structure prediction; genome annotation; mouse; transcriptomics; proteomics; machine learning; Medicine; Science; Biology (General); QH301-705.5
- Abstract
Recently developed methods to predict three-dimensional protein structure with high accuracy have opened new avenues for genome and proteome research. We explore a new hypothesis in genome annotation, namely whether computationally predicted structures can help to identify which of multiple possible gene isoforms represents a functional protein product. Guided by protein structure predictions, we evaluated over 230,000 isoforms of human protein-coding genes assembled from over 10,000 RNA sequencing experiments across many human tissues. From this set of assembled transcripts, we identified hundreds of isoforms with more confidently predicted structure and potentially superior function in comparison to canonical isoforms in the latest human gene database. We illustrate our new method with examples where structure provides a guide to function in combination with expression and evolutionary evidence. Additionally, we provide the complete set of structures as a resource to better understand the function of human genes and their isoforms. These results demonstrate the promise of protein structure prediction as a genome annotation tool, allowing us to refine even the most highly curated catalog of human proteins. More generally we demonstrate a practical, structure-guided approach that can be used to enhance the annotation of any genome.
- Published
- 2022
3. Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ
- Author
-
Ilia Minkin and Paul Medvedev
- Subjects
Science
- Abstract
Multiple whole-genome alignment is a challenging problem in bioinformatics, especially when computational resources are limited. Here the authors present SibeliaZ, an algorithm and software based on analysis of de Bruijn graphs, which provides improved computational efficiency and scalability.
- Published
- 2020
4. A strategy for building and using a human reference pangenome [version 2; peer review: 2 approved]
- Author
-
Bastien Llamas, Giuseppe Narzisi, Valerie Schneider, Peter A. Audano, Evan Biederstedt, Lon Blauvelt, Peter Bradbury, Xian Chang, Chen-Shan Chin, Arkarachai Fungtammasan, Wayne E. Clarke, Alan Cleary, Jana Ebler, Jordan Eizenga, Jonas A. Sibbesen, Charles J. Markello, Erik Garrison, Shilpa Garg, Glenn Hickey, Gerard R. Lazo, Michael F. Lin, Medhat Mahmoud, Tobias Marschall, Ilia Minkin, Jean Monlong, Rajeeva L. Musunuri, Sagayamary Sagayaradj, Adam M. Novak, Mikko Rautiainen, Allison Regier, Fritz J. Sedlazeck, Jouni Siren, Yassine Souilmi, Justin Wagner, Travis Wrightsman, Toshiyuki T. Yokoyama, Qiandong Zeng, Justin M. Zook, Benedict Paten, and Ben Busby
- Subjects
Medicine; Science
- Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized themselves into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed are reported in this manuscript.
- Published
- 2021
5. Scalable Pairwise Whole-Genome Homology Mapping of Long Genomes with BubbZ
- Author
-
Ilia Minkin and Paul Medvedev
- Subjects
Algorithms; Bioinformatics; Genomics; Science
- Abstract
Summary: Pairwise whole-genome homology mapping is the problem of finding all pairs of homologous intervals between a pair of genomes. As the number of available whole genomes has been rising dramatically in the last few years, there has been a need for more scalable homology mappers. In this paper, we develop an algorithm (BubbZ) for computing whole-genome pairwise homology mappings, especially in the context of all-to-all comparison for multiple genomes. BubbZ is based on an algorithm for computing chains in compacted de Bruijn graphs. We evaluate BubbZ on simulated datasets, a dataset composed of 16 long mouse genomes, and a large dataset of 1,600 Salmonella genomes. We show up to approximately an order of magnitude speed improvement, compared with MashMap2 and Minimap2, while retaining similar accuracy.
- Published
- 2020
6. A strategy for building and using a human reference pangenome [version 1; peer review: 1 approved, 1 approved with reservations]
- Author
-
Bastien Llamas, Giuseppe Narzisi, Valerie Schneider, Peter A. Audano, Evan Biederstedt, Lon Blauvelt, Peter Bradbury, Xian Chang, Chen-Shan Chin, Arkarachai Fungtammasan, Wayne E. Clarke, Alan Cleary, Jana Ebler, Jordan Eizenga, Jonas A. Sibbesen, Charles J. Markello, Erik Garrison, Shilpa Garg, Glenn Hickey, Gerard R. Lazo, Michael F. Lin, Medhat Mahmoud, Tobias Marschall, Ilia Minkin, Jean Monlong, Rajeeva L. Musunuri, Sagayamary Sagayaradj, Adam M. Novak, Mikko Rautiainen, Allison Regier, Fritz J. Sedlazeck, Jouni Siren, Yassine Souilmi, Justin Wagner, Travis Wrightsman, Toshiyuki T. Yokoyama, Qiandong Zeng, Justin M. Zook, Benedict Paten, and Ben Busby
- Subjects
Software Tool Article; Articles; Hackathon; Pangenome; Graph Genome; RNAseq; Structural Variant
- Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized themselves into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed are reported in this manuscript.
- Published
- 2019
7. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure
- Author
-
Ales Varabyou, Markus J. Sommer, Beril Erdogdu, Ida Shinder, Ilia Minkin, Kuan-Hao Chao, Sukhwan Park, Jakob Heinz, Christopher Pockrandt, Alaina Shumate, Natalia Rincon, Daniela Puiu, Martin Steinegger, Steven L. Salzberg, and Mihaela Pertea
- Abstract
The original CHESS database of human genes was assembled from nearly 10,000 RNA sequencing experiments in 53 human body sites produced by the Genotype-Tissue Expression (GTEx) project, and then augmented with genes from other databases to yield a comprehensive collection of protein-coding and noncoding transcripts. The construction of the new CHESS 3 database employed improved transcript assembly algorithms, a new machine learning classifier, and protein structure predictions to identify genes and transcripts likely to be functional and to eliminate those that appeared more likely to represent noise. The new catalog contains 41,356 genes on the GRCh38 reference human genome, of which 19,839 are protein-coding, and a total of 158,377 transcripts. These include 14,863 novel protein-coding transcripts. The total number of transcripts is substantially smaller than earlier versions due to improved transcriptome assembly methods and to a stricter protocol for filtering out noisy transcripts. Notably, CHESS 3 contains all of the transcripts in the MANE database, and at least one transcript corresponding to the vast majority of protein-coding genes in the RefSeq and GENCODE databases. CHESS 3 has also been mapped onto the complete CHM13 human genome, which gives a more-complete gene count of 43,773 genes and 19,968 protein-coding genes. The CHESS database is available at http://ccb.jhu.edu/chess.
- Published
- 2022
8. Author response: Structure-guided isoform identification for the human transcriptome
- Author
-
Markus J Sommer, Sooyoung Cha, Ales Varabyou, Natalia Rincon, Sukhwan Park, Ilia Minkin, Mihaela Pertea, Martin Steinegger, and Steven L Salzberg
- Published
- 2022
9. Highly accurate isoform identification for the human transcriptome
- Author
-
Markus J. Sommer, Sooyoung Cha, Ales Varabyou, Natalia Rincon, Sukhwan Park, Ilia Minkin, Mihaela Pertea, Martin Steinegger, and Steven L. Salzberg
- Abstract
We explore a new hypothesis in genome annotation, namely whether computationally predicted protein structures can help to identify which of multiple possible gene isoforms represents a functional protein product. Guided by structure predictions, we evaluated over 140,000 isoforms of human protein-coding genes assembled from over 10,000 RNA sequencing experiments across many human tissues. We illustrate our new method with examples where structure provides a guide to function in combination with expression and evolutionary evidence. Additionally, we provide the complete set of structures as a resource to better understand the function of human genes and their isoforms. These results demonstrate the promise of protein structure prediction as a genome annotation tool, allowing us to refine even the most highly-curated catalog of human proteins. One-Sentence Summary: We describe the use of 3D protein structures on a genome-wide scale to evaluate human protein isoforms for biological functionality.
- Published
- 2022
10. Structure-guided isoform identification for the human transcriptome.
- Author
-
Markus J. Sommer, Sooyoung Cha, Ales Varabyou, Natalia Rincon, Sukhwan Park, Ilia Minkin, Mihaela Pertea, Martin Steinegger, and Steven L. Salzberg
- Published
- 2022
11. A strategy for building and using a human reference pangenome
- Author
-
Wayne E. Clarke, Ilia Minkin, Yassine Souilmi, Shilpa Garg, Jouni Sirén, Arkarachai Fungtammasan, Medhat Mahmoud, Giuseppe Narzisi, Peter J. Bradbury, Ben Busby, Justin Wagner, Mikko Rautiainen, Gerard R. Lazo, Rajeeva Musunuri, Fritz J. Sedlazeck, Jean Monlong, Valerie A. Schneider, Erik Garrison, Xian Chang, Justin M. Zook, Bastien Llamas, Jonas Andreas Sibbesen, Allison A. Regier, Glenn Hickey, Evan Biederstedt, Toshiyuki T. Yokoyama, Sagayamary Sagayaradj, Lon Blauvelt, Qiandong Zeng, Peter A. Audano, Charles Markello, Alan Cleary, Benedict Paten, Adam M. Novak, Chen-Shan Chin, Michael F. Lin, Tobias Marschall, Jordan M. Eizenga, Travis Wrightsman, and Jana Ebler
- Subjects
Computer science; USable; General Biochemistry, Genetics and Molecular Biology; Pangenome; 03 medical and health sciences; Annotation; 0302 clinical medicine; Software; Leverage (statistics); General Pharmacology, Toxicology and Pharmaceutics; Hackathon; Structural Variant; 030304 developmental biology; 0303 health sciences; General Immunology and Microbiology; business.industry; Software Tool Article; General Medicine; Articles; Technical specifications; Data structure; RNAseq; Data science; Visualization; Graph Genome; Graph (abstract data type); business; 030217 neurology & neurosurgery
- Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized themselves into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed are reported in this manuscript.
- Published
- 2019
12. TwoPaCo: an efficient algorithm to build the compacted de Bruijn graph from many complete genomes
- Author
-
Paul Medvedev, Son Pham, and Ilia Minkin
- Subjects
0301 basic medicine; Statistics and Probability; FOS: Computer and information sciences; Primates; Theoretical computer science; Computer science; 0206 medical engineering; Population; 02 engineering and technology; Biochemistry; Genome; De Bruijn graph; 03 medical and health sciences; symbols.namesake; Computer Science - Data Structures and Algorithms; Animals; Humans; Quantitative Biology - Genomics; Data Structures and Algorithms (cs.DS); education; Molecular Biology; De Bruijn sequence; Genomics (q-bio.GN); education.field_of_study; Genome, Human; Genomics; Data structure; Graph; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; FOS: Biological sciences; symbols; Human genome; 020602 bioinformatics; Algorithms; Software
- Abstract
Motivation: de Bruijn graphs have been proposed as a data structure to facilitate the analysis of related whole genome sequences, in both a population and comparative genomic settings. However, current approaches do not scale well to many genomes of large size (such as mammalian genomes). Results: In this article, we present TwoPaCo, a simple and scalable low memory algorithm for the direct construction of the compacted de Bruijn graph from a set of complete genomes. We demonstrate that it can construct the graph for 100 simulated human genomes in less than a day and eight real primates in Availability and Implementation: Our code and data is available for download from github.com/medvedevgroup/TwoPaCo. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2016
Discovery Service for Jio Institute Digital Library