29 results on '"Numanagić, Ibrahim"'
Search Results
2. Geny: a genotyping tool for allelic decomposition of killer cell immunoglobulin-like receptor genes.
- Author
-
Zhou, Qinghui, Ghezelji, Mazyar, Hari, Ananth, Ford, Michael K. B., Holley, Connor, Sahinalp, S. Cenk, and Numanagić, Ibrahim
- Subjects
KILLER cell receptors, COMPUTATIONAL biology, PAN-genome, COMBINATORIAL optimization, NUCLEOTIDE sequencing
- Abstract
Introduction: Accurate genotyping of Killer cell Immunoglobulin-like Receptor (KIR) genes plays a pivotal role in enhancing our understanding of innate immune responses, disease correlations, and the advancement of personalized medicine. However, due to the high variability of the KIR region and high level of sequence similarity among different KIR genes, the generic genotyping workflows are unable to accurately infer copy numbers and complete genotypes of individual KIR genes from next-generation sequencing data. Thus, specialized genotyping tools are needed to genotype this complex region. Methods: Here, we introduce Geny, a new computational tool for precise genotyping of KIR genes. Geny utilizes available KIR allele databases and proposes a novel combination of expectation-maximization filtering schemes and integer linear programming-based combinatorial optimization models to resolve ambiguous reads, provide accurate copy number estimation, and estimate the correct allele of each copy of genes within the KIR region. Results & Discussion: We evaluated Geny on a large set of simulated short-read datasets covering the known validated KIR region assemblies and a set of Illumina short-read samples sequenced from 40 validated samples from the Human Pangenome Reference Consortium collection and showed that it outperforms the existing state-of-the-art KIR genotyping tools in terms of accuracy, precision, and recall. We envision Geny becoming a valuable resource for understanding immune system response and consequently advancing the field of patient-centric medicine. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
3. CYP2C8, CYP2C9, and CYP2C19 Characterization Using Next-Generation Sequencing and Haplotype Analysis: A GeT-RM Collaborative Project
- Author
-
Gaedigk, Andrea, Boone, Erin C., Scherer, Steven E., Lee, Seung-been, Numanagić, Ibrahim, Sahinalp, Cenk, Smith, Joshua D., McGee, Sean, Radhakrishnan, Aparna, Qin, Xiang, Wang, Wendy Y., Farrow, Emily G., Gonzaludo, Nina, Halpern, Aaron L., Nickerson, Deborah A., Miller, Neil A., Pratt, Victoria M., and Kalman, Lisa V.
- Published
- 2022
- Full Text
- View/download PDF
4. Fast characterization of segmental duplication structure in multiple genome assemblies
- Author
-
Išerić, Hamza, Alkan, Can, Hach, Faraz, and Numanagić, Ibrahim
- Published
- 2022
- Full Text
- View/download PDF
5. Diagnostics of viral infections using high-throughput genome sequencing data.
- Author
-
Ning, Haochen, Boyes, Ian, Numanagić, Ibrahim, Rott, Michael, Xing, Li, and Zhang, Xuekui
- Subjects
PLANT viruses, VIRUS diseases, NUCLEOTIDE sequencing, PLANT genomes, VIRAL genomes
- Abstract
Plant viral infections cause significant economic losses, totalling $350 billion USD in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. While some methods require manual or subjective verification to address these artifacts, others overlook them entirely, affecting the overall method performance and leading to imprecise or labour-intensive outcomes. To address these challenges, we introduce IIMI, a new automated analysis pipeline using machine learning to diagnose infections from 1583 plant viruses with HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI's superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Biologically-informed killer cell immunoglobulin-like receptor gene annotation tool.
- Author
-
Ford, Michael K B, Hari, Ananth, Zhou, Qinghui, Numanagić, Ibrahim, and Sahinalp, S Cenk
- Subjects
KILLER cells, PAN-genome, CELL physiology, DATABASES, SCIENTIFIC community, KILLER cell receptors
- Abstract
Summary: Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This article introduces BAKIR (Biologically informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community. Availability and implementation: BAKIR is available at github.com/algo-cancer/bakir [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets
- Author
-
Berger, Emily, Yorukoglu, Deniz, Zhang, Lillian, Nyquist, Sarah K., Shalek, Alex K., Kellis, Manolis, Numanagić, Ibrahim, and Berger, Bonnie
- Published
- 2020
- Full Text
- View/download PDF
8. Computational pharmacogenotype extraction from clinical next-generation sequencing.
- Author
-
Shugg, Tyler, Ly, Reynold C., Osei, Wilberforce, Rowe, Elizabeth J., Granfield, Caitlin A., Lynnes, Ty C., Medeiros, Elizabeth B., Hodge, Jennelle C., Breman, Amy M., Schneider, Bryan P., Sahinalp, S. Cenk, Numanagić, Ibrahim, Salisbury, Benjamin A., Bray, Steven M., Ratcliff, Ryan, and Skaar, Todd C.
- Subjects
NUCLEOTIDE sequencing, WHOLE genome sequencing, CYTOCHROME P-450 CYP2D6, CYTOCHROME P-450 CYP3A, CYTOCHROME P-450 CYP2C19
- Abstract
Background: Next-generation sequencing (NGS), including whole genome sequencing (WGS) and whole exome sequencing (WES), is increasingly being used for clinical care. While NGS data have the potential to be repurposed to support clinical pharmacogenomics (PGx), current computational approaches have not been widely validated using clinical data. In this study, we assessed the accuracy of the Aldy computational method to extract PGx genotypes from WGS and WES data for 14 and 13 major pharmacogenes, respectively. Methods: Germline DNA was isolated from whole blood samples collected for 264 patients seen at our institutional molecular solid tumor board. DNA was used for panel-based genotyping within our institutional Clinical Laboratory Improvement Amendments- (CLIA-) certified PGx laboratory. DNA was also sent to other CLIA-certified commercial laboratories for clinical WGS or WES. Aldy v3.3 and v4.4 were used to extract PGx genotypes from these NGS data, and results were compared to the panel-based genotyping reference standard that contained 45 star allele-defining variants within CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, CYP4F2, DPYD, G6PD, NUDT15, SLCO1B1, TPMT, and VKORC1. Results: Mean WGS read depth was >30x for all variant regions except for G6PD (average read depth was 29 reads), and mean WES read depth was >30x for all variant regions. For 94 patients with WGS, Aldy v3.3 diplotype calls were concordant with those from the genotyping reference standard in 99.5% of cases when excluding diplotypes with additional major star alleles not tested by targeted genotyping, ambiguous phasing, and CYP2D6 hybrid alleles. Aldy v3.3 identified 15 additional clinically actionable star alleles not covered by genotyping within CYP2B6, CYP2C19, DPYD, SLCO1B1, and NUDT15. Within the WGS cohort, Aldy v4.4 diplotype calls were concordant with those from genotyping in 99.7% of cases. When excluding patients with CYP2D6 copy number variation, all Aldy v4.4 diplotype calls except for one CYP3A4 diplotype call were concordant with genotyping for 161 patients in the WES cohort. Conclusion: Aldy v3.3 and v4.4 called diplotypes for major pharmacogenes from clinical WES and WGS data with >99% accuracy. These findings support the use of Aldy to repurpose clinical NGS data to inform clinical PGx. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
9. Analytical Validation of a Computational Method for Pharmacogenetic Genotyping from Clinical Whole Exome Sequencing
- Author
-
Ly, Reynold C., Shugg, Tyler, Ratcliff, Ryan, Osei, Wilberforce, Lynnes, Ty C., Pratt, Victoria M., Schneider, Bryan P., Radovich, Milan, Bray, Steven M., Salisbury, Benjamin A., Parikh, Baiju, Sahinalp, S. Cenk, Numanagić, Ibrahim, and Skaar, Todd C.
- Published
- 2022
- Full Text
- View/download PDF
10. Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
- Author
-
Numanagić, Ibrahim, Malikić, Salem, Ford, Michael, Qin, Xiang, Toji, Lorraine, Radovich, Milan, Skaar, Todd C., Pratt, Victoria M., Berger, Bonnie, Scherer, Steve, and Sahinalp, S. Cenk
- Published
- 2018
- Full Text
- View/download PDF
11. ORMAN: Optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms
- Author
-
Dao, Phuong, Numanagić, Ibrahim, Lin, Yen-Yi, Hach, Faraz, Karakoc, Emre, Donmez, Nilgun, Collins, Colin, Eichler, Evan E., and Sahinalp, S. Cenk
- Published
- 2014
- Full Text
- View/download PDF
12. SCALCE: boosting sequence compression algorithms using locally consistent encoding
- Author
-
Hach, Faraz, Numanagić, Ibrahim, Alkan, Can, and Sahinalp, S Cenk
- Published
- 2012
- Full Text
- View/download PDF
13. Statistical Binning for Barcoded Reads Improves Downstream Analyses
- Author
-
Shajii, Ariya, Numanagić, Ibrahim, Whelan, Christopher, and Berger, Bonnie
- Published
- 2018
- Full Text
- View/download PDF
14. eP373: Analytical validation of a computational method for pharmacogenetic genotyping from clinical exome sequencing
- Author
-
Ly, Reynold, Shugg, Tyler, Ratcliff, Ryan, Osei, Wilberforce, Pratt, Victoria, Schneider, Bryan, Radovich, Milan, Bray, Steven, Salisbury, Benjamin, Parikh, Baiju, Sahinalp, S. Cenk, Numanagić, Ibrahim, and Skaar, Todd
- Published
- 2022
- Full Text
- View/download PDF
15. A Python-based programming language for high-performance computational genomics.
- Author
-
Shajii, Ariya, Numanagić, Ibrahim, Leighton, Alexander T., Greenyer, Haley, Amarasinghe, Saman, and Berger, Bonnie
- Published
- 2021
- Full Text
- View/download PDF
16. Computational identification of micro-structural variations and their proteogenomic consequences in cancer.
- Author
-
Lin, Yen-Yi, Gawronski, Alexander, Hach, Faraz, Li, Sujun, Numanagić, Ibrahim, Sarrafi, Iman, Mishra, Swati, McPherson, Andrew, Collins, Colin C, and Radovich, Milan
- Subjects
NUCLEOTIDE sequencing, TRANSCRIPTOMES, PROTEOMICS, GENE fusion, PEPTIDES
- Abstract
Motivation: Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples. Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive intergenomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. Availability and implementation: MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
17. Optimal compressed representation of high throughput sequence data via light assembly.
- Author
-
Ginart, Antonio A., Hui, Joseph, Zhu, Kaiyuan, Numanagić, Ibrahim, Courtade, Thomas A., Sahinalp, S. Cenk, and Tse, David N.
- Subjects
DATA compression, DATA
- Abstract
The most effective genomic data compression methods either assemble reads into contigs, or replace them with their alignment positions on a reference genome. Such methods require significant computational resources, but faster alternatives that avoid using explicit or de novo-constructed references fail to match their performance. Here, we introduce a new reference-free compressed representation for genomic data based on light de novo assembly of reads, where each read is represented as a node in a (compact) trie. We show how to efficiently build such tries to compactly represent reads and demonstrate that among all methods using this representation (including all de novo assembly based methods), our method achieves the shortest possible output. We also provide a lower bound on the compression rate achievable on uniformly sampled genomic read data, which is approximated well by our method. Our method significantly improves the compression performance of alternatives without compromising speed. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
18. Discovery and genotyping of novel sequence insertions in many sequenced individuals.
- Author
-
Kavak, Pınar, Lin, Yen-Yi, Numanagić, Ibrahim, Asghari, Hossein, Güngör, Tunga, Alkan, Can, and Hach, Faraz
- Subjects
GENOTYPES, GENOMES, ALGORITHMS, GENETICS, ALLELES
- Abstract
Motivation: Despite recent advances in algorithm design to characterize structural variation using high-throughput short read sequencing (HTS) data, characterization of novel sequence insertions longer than the average read length remains a challenging task. This is mainly due to both computational difficulties and the complexities imposed by genomic repeats in generating reliable assemblies to accurately detect both the sequence content and the exact location of such insertions. Additionally, de novo genome assembly algorithms typically require a very high depth of coverage, which may be a limiting factor for most genome studies. Therefore, characterization of novel sequence insertions is not a routine part of most sequencing projects. There are only a handful of algorithms that are specifically developed for novel sequence insertion discovery that can bypass the need for the whole genome de novo assembly. Still, most such algorithms rely on high depth of coverage, and to our knowledge there is only one method (PopIns) that can use multi-sample data to "collectively" obtain a very high coverage dataset to accurately find insertions common in a given population. Result: Here, we present Pamir, a new algorithm to efficiently and accurately discover and genotype novel sequence insertions using either single or multiple genome sequencing datasets. Pamir is able to detect breakpoint locations of the insertions and calculate their zygosity (i.e. heterozygous versus homozygous) by analyzing multiple sequence signatures, matching one-end-anchored sequences to small-scale de novo assemblies of unmapped reads, and conducting strand-aware local assembly. We test the efficacy of Pamir on both simulated and real data, and demonstrate its potential use in accurate and routine identification of novel sequence insertions in genome projects. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
19. Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data.
- Author
-
Numanagić, Ibrahim, Malikić, Salem, Pratt, Victoria M., Skaar, Todd C., Flockhart, David A., and Sahinalp, S. Cenk
- Subjects
*GENOTYPES, *HIGH throughput screening (Drug development), *NUCLEOTIDE sequence, *GENETIC code, *GENETIC recombination, *PSEUDOGENES, *GENETIC algorithms
- Abstract
Motivation: CYP2D6 is a highly polymorphic gene which encodes the CYP2D6 enzyme, involved in the metabolism of 20-25% of all clinically prescribed drugs and other xenobiotics in the human body. CYP2D6 genotyping is recommended prior to treatment decisions involving one or more of the numerous drugs sensitive to CYP2D6 allelic composition. In this context, high-throughput sequencing (HTS) technologies provide a promising time-efficient and cost-effective alternative to currently used genotyping techniques. To achieve accurate interpretation of HTS data, however, one needs to overcome several obstacles such as high sequence similarity and genetic recombinations between CYP2D6 and evolutionarily related pseudogenes CYP2D7 and CYP2D8, high copy number variation among individuals and short read lengths generated by HTS technologies. Results: In this work, we present the first algorithm to computationally infer CYP2D6 genotype at basepair resolution from HTS data. Our algorithm is able to resolve complex genotypes, including alleles that are the products of duplication, deletion and fusion events involving CYP2D6 and its evolutionarily related cousin CYP2D7. Through extensive experiments using simulated and real datasets, we show that our algorithm accurately solves this important problem with potential clinical implications. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
20. Fast characterization of segmental duplications in genome assemblies.
- Author
-
Numanagić, Ibrahim, Gökkaya, Alim S, Zhang, Lillian, Berger, Bonnie, Alkan, Can, and Hach, Faraz
- Subjects
*NUCLEOTIDE sequence, *GENOMICS, *SCHIZOPHRENIA, *AUTISM, *GENOMES
- Abstract
Motivation: Segmental duplications (SDs), or low-copy repeats, are segments of DNA > 1 Kbp with high sequence identity that are copied to other regions of the genome. SDs are among the most important sources of evolution, a common cause of genomic structural variation and several are associated with diseases of genomic origin including schizophrenia and autism. Despite their functional importance, SDs present one of the major hurdles for de novo genome assembly due to the ambiguity they cause in building and traversing both state-of-the-art overlap-layout-consensus and de Bruijn graphs. This causes SD regions to be misassembled, collapsed into a unique representation, or completely missing from assembled reference genomes for various organisms. In turn, this missing or incorrect information limits our ability to fully understand the evolution and the architecture of the genomes. Despite the essential need to accurately characterize SDs in assemblies, there has been only one tool that was developed for this purpose, called Whole-Genome Assembly Comparison (WGAC); its primary goal is SD detection. WGAC is comprised of several steps that employ different tools and custom scripts, which makes this strategy difficult and time consuming to use. Thus there is still a need for algorithms to characterize within-assembly SDs quickly, accurately, and in a user friendly manner. Results: Here we introduce SEgmental Duplication Evaluation Framework (SEDEF) to rapidly detect SDs through sophisticated filtering strategies based on Jaccard similarity and local chaining. We show that SEDEF accurately detects SDs while maintaining substantial speed up over WGAC that translates into practical run times of minutes instead of weeks. Notably, our algorithm captures up to 25% ‘pairwise error’ between segments, whereas previous studies focused on only 10%, allowing us to more deeply track the evolutionary history of the genome. Availability and implementation: SEDEF is available at https://github.com/vpc-ccg/sedef. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
21. Geny: a genotyping tool for allelic decomposition of killer cell immunoglobulin-like receptor genes.
- Author
-
Zhou Q, Ghezelji M, Hari A, Ford MKB, Holley C, Sahinalp SC, and Numanagić I
- Subjects
- Humans, High-Throughput Nucleotide Sequencing methods, Computational Biology methods, Software, Receptors, KIR genetics, Alleles, Genotyping Techniques methods, Genotype
- Abstract
Introduction: Accurate genotyping of Killer cell Immunoglobulin-like Receptor (KIR) genes plays a pivotal role in enhancing our understanding of innate immune responses, disease correlations, and the advancement of personalized medicine. However, due to the high variability of the KIR region and high level of sequence similarity among different KIR genes, the generic genotyping workflows are unable to accurately infer copy numbers and complete genotypes of individual KIR genes from next-generation sequencing data. Thus, specialized genotyping tools are needed to genotype this complex region. Methods: Here, we introduce Geny, a new computational tool for precise genotyping of KIR genes. Geny utilizes available KIR allele databases and proposes a novel combination of expectation-maximization filtering schemes and integer linear programming-based combinatorial optimization models to resolve ambiguous reads, provide accurate copy number estimation, and estimate the correct allele of each copy of genes within the KIR region. Results & Discussion: We evaluated Geny on a large set of simulated short-read datasets covering the known validated KIR region assemblies and a set of Illumina short-read samples sequenced from 40 validated samples from the Human Pangenome Reference Consortium collection and showed that it outperforms the existing state-of-the-art KIR genotyping tools in terms of accuracy, precision, and recall. We envision Geny becoming a valuable resource for understanding immune system response and consequently advancing the field of patient-centric medicine. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The author(s) declared that they were an editorial board member of Frontiers, at the time of submission. This had no impact on the peer review process and the final decision. (Copyright © 2024 Zhou, Ghezelji, Hari, Ford, Holley, Sahinalp and Numanagić.)
- Published
- 2024
- Full Text
- View/download PDF
22. Biologically-informed Killer cell immunoglobulin-like receptor (KIR) gene annotation tool.
- Author
-
Ford MKB, Hari A, Zhou Q, Numanagić I, and Sahinalp SC
- Abstract
Natural killer (NK) cells are essential components of the innate immune system, with their activity significantly regulated by Killer cell Immunoglobulin-like Receptors (KIRs). The diversity and structural complexity of KIR genes present significant challenges for accurate genotyping, essential for understanding NK cell functions and their implications in health and disease. Traditional genotyping methods struggle with the variable nature of KIR genes, leading to inaccuracies that can impede immunogenetic research. These challenges extend to high-quality phased assemblies, which have been recently popularized by the Human Pangenome Consortium. This paper introduces BAKIR (Biologically-informed Annotator for KIR locus), a tailored computational tool designed to overcome the challenges of KIR genotyping and annotation on high-quality, phased genome assemblies. BAKIR aims to enhance the accuracy of KIR gene annotations by structuring its annotation pipeline around identifying key functional mutations, thereby improving the identification and subsequent relevance of gene and allele calls. It uses a multi-stage mapping, alignment, and variant calling process to ensure high-precision gene and allele identification, while also maintaining high recall for sequences that are significantly mutated or truncated relative to the known allele database. BAKIR has been evaluated on a subset of the HPRC assemblies, where BAKIR was able to improve many of the associated annotations and call novel variants. BAKIR is freely available on GitHub, offering ease of access and use through multiple installation methods, including pip, conda, and singularity container, and is equipped with a user-friendly command-line interface, thereby promoting its adoption in the scientific community.
- Published
- 2024
- Full Text
- View/download PDF
23. An efficient genotyper and star-allele caller for pharmacogenomics.
- Author
-
Hari A, Zhou Q, Gonzaludo N, Harting J, Scott SA, Qin X, Scherer S, Sahinalp SC, and Numanagić I
- Subjects
- Humans, Alleles, Genotype, Genomics, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, Pharmacogenetics, Polymorphism, Single Nucleotide
- Abstract
High-throughput sequencing provides sufficient means for determining genotypes of clinically important pharmacogenes that can be used to tailor medical decisions to individual patients. However, pharmacogene genotyping, also known as star-allele calling, is a challenging problem that requires accurate copy number calling, structural variation identification, variant calling, and phasing within each pharmacogene copy present in the sample. Here we introduce Aldy 4, a fast and efficient tool for genotyping pharmacogenes that uses combinatorial optimization for accurate star-allele calling across different sequencing technologies. Aldy 4 adds support for long reads and uses a novel phasing model and improved copy number and variant calling models. We compare Aldy 4 against the current state-of-the-art star-allele callers on a large and diverse set of samples and genes sequenced by various sequencing technologies, such as whole-genome and targeted Illumina sequencing, barcoded 10x Genomics, and Pacific Biosciences (PacBio) HiFi. We show that Aldy 4 is the most accurate star-allele caller with near-perfect accuracy in all evaluated contexts, and hope that Aldy remains an invaluable tool in the clinical toolbox even with the advent of long-read sequencing technologies. (© 2023 Hari et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2023
- Full Text
- View/download PDF
24. Sequre: a high-performance framework for rapid development of secure bioinformatics pipelines.
- Author
-
Smajlović H, Shajii A, Berger B, Cho H, and Numanagić I
- Published
- 2022
25. Seq: A High-Performance Language for Bioinformatics.
- Author
-
Shajii A, Numanagić I, Baghdadi R, Berger B, and Amarasinghe S
- Abstract
The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100 (a factor of over 10^6), and the amount of data to be analyzed has increased proportionally. Yet, as Moore's Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines. Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python (and is in many cases a drop-in replacement) yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.
- Published
- 2019
- Full Text
- View/download PDF
26. Publisher Correction: Optimal compressed representation of high throughput sequence data via light assembly.
- Author
-
Ginart AA, Hui J, Zhu K, Numanagić I, Courtade TA, Sahinalp SC, and Tse DN
- Abstract
The original version of this Article contained errors in the affiliations of the authors Ibrahim Numanagić and Thomas A. Courtade, which were incorrectly given as 'Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA' and 'Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA', respectively. Also, the hyperlink for the source code in the Data Availability section was incorrectly given as https://github.iu.edu/kzhu/assembltrie, which links to a page that is not publicly accessible. The source code is publicly accessible at https://github.com/kyzhu/assembltrie. Furthermore, in the PDF version of the Article, the right-hand side of Figure 3 was inadvertently cropped. These errors have now been corrected in both the PDF and HTML versions of the Article.
- Published
- 2018
- Full Text
- View/download PDF
27. Latent Variable Model for Aligning Barcoded Short-Reads Improves Downstream Analyses.
- Author
-
Shajii A, Numanagić I, and Berger B
- Published
- 2018
28. Comparison of high-throughput sequencing data compression tools.
- Author
-
Numanagić I, Bonfield JK, Hach F, Voges J, Ostermann J, Alberti C, Mattavelli M, and Sahinalp SC
- Subjects
- Animals, Cacao genetics, Drosophila melanogaster genetics, Escherichia coli genetics, Humans, Pseudomonas aeruginosa genetics, Computational Biology methods, Data Compression methods, High-Throughput Nucleotide Sequencing methods
- Abstract
High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.
- Published
- 2016
- Full Text
- View/download PDF
29. DeeZ: reference-based compression by local assembly.
- Author
-
Hach F, Numanagić I, and Sahinalp SC
- Subjects
- Base Sequence, Gene Library, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, Data Compression methods, Electronic Data Processing methods
- Published
- 2014
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library