7 results on "Vaser, R."
Search Results
2. Protein database search optimization based on CUDA and MPI
- Author
- Pavlovic, D., Vaser, R., Korpar, M., and Sikic, M.
- Subjects
- alignment, Smith-Waterman, sequence, GPU
- Abstract
Protein database search is an important method in computational biology. An average database contains a large number of sequences, which makes such searches time- and resource-consuming. The rapid growth of these databases in recent years has created a need to speed up searches and, consequently, the alignments performed on them. This paper presents an acceleration of the database search tool sw#DB, which is based on a CUDA implementation of the Smith-Waterman algorithm. We achieved the speedup by reducing the database size. The whole database was divided into seeds of a fixed length. The positions of these seeds and the corresponding sequence indexes from the database are then stored in a hash container. This allows a constant-time lookup of all the positions of a seed in every sequence of the database. This method filters the candidate sequences for a query, forwarding to sw#DB only those that contain at least one seed from the query, which reduces the number of alignments performed. Overall, this yields a speedup of about three times over the basic sw#DB tool, which relies solely on the Smith-Waterman algorithm, with almost no loss of accuracy. The implementation is written in CUDA and C. For large queries, an MPI implementation using multiple CUDA cards is used.
- Published
- 2013
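The seed-filtering step described in this abstract can be illustrated with a minimal Python sketch. It assumes plain string sequences, a Python dictionary standing in for the paper's hash container, and hypothetical function names (`build_seed_index`, `filter_candidates`); it is not the paper's CUDA/C implementation, and it omits the Smith-Waterman alignment that sw#DB would then run on the surviving candidates.

```python
from collections import defaultdict


def build_seed_index(database, k):
    """Map every length-k seed to the (sequence index, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_idx, seq in enumerate(database):
        # Slide a window of fixed length k over each database sequence.
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_idx, pos))
    return index


def filter_candidates(query, index, k):
    """Return sorted indices of database sequences sharing at least one seed with the query."""
    candidates = set()
    for pos in range(len(query) - k + 1):
        # Average constant-time dictionary lookup; .get avoids inserting missing seeds.
        for seq_idx, _ in index.get(query[pos:pos + k], ()):
            candidates.add(seq_idx)
    return sorted(candidates)
```

For example, with the toy database `["MKVLAT", "GGHHPP", "AVLATQ"]` and `k = 3`, the query `"KVLA"` (seeds `KVL` and `VLA`) keeps sequences 0 and 2 and discards sequence 1, so only two of the three sequences would be forwarded for full alignment.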
3. Genome assembly and chemogenomic profiling of National Flower of Singapore Papilionanthe Miss Joaquim 'Agnes' reveals metabolic pathways regulating floral traits.
- Author
- Lim AH, Low ZJ, Shingate PN, Hong JH, Chong SC, Ng CCY, Liu W, Vaser R, Šikić M, Sung WK, Nagarajan N, Tan P, and Teh BT
- Subjects
- Chromatin metabolism, Flowers genetics, Flowers metabolism, Gene Expression Regulation, Plant, Glycosyltransferases genetics, Metabolic Networks and Pathways, Singapore, Anthocyanins, Orchidaceae genetics
- Abstract
Singapore's National Flower, Papilionanthe (Ple.) Miss Joaquim 'Agnes' (PMJ), is a highly prized horticultural flower from the Orchidaceae family. A combination of short-read sequencing, single-molecule long-read sequencing and chromatin contact mapping was used to assemble the PMJ genome, spanning 2.5 Gb and 19 pseudo-chromosomal scaffolds. Genomic resources and chemical profiling provided insights towards identifying, understanding and elucidating various classes of secondary metabolite compounds synthesized by the flower. For example, the presence of anthocyanin pigments detected by chemical profiling coincides with the expression of ANTHOCYANIN SYNTHASE (ANS), an enzyme responsible for their synthesis. Similarly, the presence of vandaterosides (a unique class of glycosylated organic acids with the potential to slow skin aging), discovered using chemical profiling, implicated candidate glycosyltransferase family enzymes in vandateroside biosynthesis. Interestingly, although the flower has no noticeable scent, analysis of genes involved in the biosynthesis of volatile compounds, together with chemical profiling, revealed a combination of oxygenated hydrocarbons, including traces of linalool, beta-ionone and vanillin, that forms the scent profile of PMJ. In summary, by combining genomics and biochemistry, the findings expand the known biodiversity repertoire of the Orchidaceae family and provide insights into the genome and secondary metabolite processes of PMJ. (© 2022. The Author(s).)
- Published
- 2022
4. Time- and memory-efficient genome assembly with Raven.
- Author
- Vaser R and Šikić M
- Abstract
Whole-genome sequencing technologies cannot always read DNA molecules intact, a shortcoming that assemblers try to resolve by stitching the obtained fragments back together. Here, we present methods for improving de novo genome assembly from erroneous long reads, incorporated into a tool called Raven. Raven maintains similar performance across various genomes and has accuracy on par with other assemblers that support third-generation sequencing data. It is one of the fastest options while having the lowest memory consumption on the majority of benchmarked datasets. (© 2021. The Author(s), under exclusive licence to Springer Nature America, Inc.)
- Published
- 2021
5. Fast and accurate de novo genome assembly from long uncorrected reads.
- Author
- Vaser R, Sović I, Nagarajan N, and Šikić M
- Subjects
- Contig Mapping standards, Genomics standards, Sequence Alignment standards, Sequence Analysis, DNA standards, Algorithms, Contig Mapping methods, Genomics methods, Sequence Alignment methods, Sequence Analysis, DNA methods
- Abstract
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. (© 2017 Vaser et al.; Published by Cold Spring Harbor Laboratory Press.)
- Published
- 2017
6. SWORD-a highly efficient protein database search.
- Author
- Vaser R, Pavlović D, and Šikić M
- Subjects
- Algorithms, Software, Databases, Protein, Search Engine, Sequence Alignment
- Abstract
Motivation: Protein database search is one of the fundamental problems in bioinformatics. For decades, it has been explored and solved using different exact and heuristic approaches. However, the exponential growth of data in recent years has brought significant challenges to improving existing algorithms. BLAST has been the most successful tool for protein database search, but it is also becoming a bottleneck in many applications. Consequently, many different approaches have been developed to complement or replace it. In this article, we present SWORD, an efficient protein database search implementation that runs 8-16 times faster than BLAST in the sensitive mode and up to 68 times faster in the fast and less accurate mode. It is designed to be used in nearly all database search environments, but is especially suitable for large databases. Its sensitivity exceeds that of BLAST for the majority of input datasets, and it provides guaranteed optimal alignments.
Availability and Implementation: SWORD is freely available for download from https://github.com/rvaser/sword.
Contact: robert.vaser@fer.hr and mile.sikic@fer.hr
Supplementary Information: Supplementary data are available at Bioinformatics online.
(© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.)
- Published
- 2016
7. SIFT missense predictions for genomes.
- Author
- Vaser R, Adusumalli S, Leng SN, Sikic M, and Ng PC
- Subjects
- Databases, Protein, Genomics standards, Humans, Molecular Sequence Annotation, Phenotype, Reference Standards, Algorithms, Genomics methods, Mutation, Missense genetics
- Abstract
The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use was previously published in Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.
- Published
- 2016
Catalog
Discovery Service for Jio Institute Digital Library