Author: "Rizk G" / Topic: software - Searchworks@Jio Institute Digital Library Search Results

Your search for "Rizk G" returned 6 results.

1. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software.

Author: Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, and McHardy AC
Subjects: Algorithms, Benchmarking, Sequence Analysis, DNA, Metagenomics, Software
Abstract: Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Published: 2017
Full Text: View/download PDF

2. FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome.

Author: Wucher V, Legeai F, Hédan B, Rizk G, Lagoutte L, Leeb T, Jagannathan V, Cadieu E, David A, Lohi H, Cirera S, Fredholm M, Botherel N, Leegwater PAJ, Le Béguec C, Fieten H, Johnson J, Alföldi J, André C, Lindblad-Toh K, Hitte C, and Derrien T
Subjects: Animals, Benchmarking, Decision Trees, Dogs, Gene Expression Regulation, Humans, Mice, Molecular Sequence Annotation statistics & numerical data, Open Reading Frames, RNA, Long Noncoding classification, RNA, Long Noncoding metabolism, RNA, Messenger classification, RNA, Messenger genetics, RNA, Messenger metabolism, Sequence Analysis, RNA, Genome, Molecular Sequence Annotation methods, RNA, Long Noncoding genetics, Software, Transcriptome
Abstract: Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc. (© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.)
Published: 2017
Full Text: View/download PDF

3. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph.

Author: Benoit G, Lemaitre C, Lavenier D, Drezen E, Dayris T, Uricaru R, and Rizk G
Subjects: Animals, Computational Biology methods, Computer Simulation, Metagenomics, Probability, Algorithms, Caenorhabditis elegans genetics, Caenorhabditis elegans Proteins genetics, Computer Graphics, Data Compression methods, High-Throughput Nucleotide Sequencing methods, Software
Abstract: Background: Data volumes generated by next-generation sequencing (NGS) technologies are now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as the widely used gzip method. Results: We present a novel reference-free method meant to compress data issued from high throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn Graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph, by memorizing an anchoring k-mer and a list of bifurcations. The same probabilistic de Bruijn Graph is used to perform a lossy transformation of the quality scores, which yields higher compression rates without losing pertinent information for downstream analyses. Conclusions: LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq or metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole genome sequencing dataset, LEON divided the original file size by more than 20. LEON is open source software, distributed under the GNU Affero GPL license, available for download at http://gatb.inria.fr/software/leon/.
Published: 2015
Full Text: View/download PDF
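
The encoding idea in the abstract above (a Bloom filter of k-mers, each read stored as an anchoring k-mer plus a list of bifurcations) can be illustrated with a minimal sketch. This is not LEON's implementation: the class and function names are hypothetical, the Bloom filter is a toy one, the anchor is simply the first k-mer of the read, every k-mer is inserted without abundance filtering, and quality scores and sequencing errors are ignored.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes are illustrative assumptions, not LEON's."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # One salted hash per seed gives n_hashes bit positions.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def build_graph(reads, k):
    """Insert every k-mer of every read: the graph nodes live in the filter."""
    bf = BloomFilter()
    for read in reads:
        for i in range(len(read) - k + 1):
            bf.add(read[i:i + k])
    return bf


def _unique_successor(bf, kmer):
    """Return the next base if exactly one successor k-mer is in the filter."""
    nxt = [b for b in "ACGT" if kmer[1:] + b in bf]
    return nxt[0] if len(nxt) == 1 else None


def encode(read, bf, k):
    """Store the anchor k-mer, the read length, and bases at ambiguous nodes."""
    kmer, bifurcations = read[:k], []
    for base in read[k:]:
        if _unique_successor(bf, kmer) is None:
            bifurcations.append(base)
        kmer = kmer[1:] + base
    return read[:k], len(read), bifurcations


def decode(anchor, length, bifurcations, bf):
    """Walk the graph from the anchor, consuming a stored base at each fork."""
    seq, kmer, pending = anchor, anchor, iter(bifurcations)
    while len(seq) < length:
        base = _unique_successor(bf, kmer) or next(pending)
        seq += base
        kmer = kmer[1:] + base
    return seq
```

Because encoder and decoder query the same filter, a false positive only adds an extra recorded bifurcation; it does not break the round trip. For example, `decode(*encode(read, bf, k), bf)` should return `read` for any read whose k-mers were inserted by `build_graph`.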

4. GATB: Genome Assembly & Analysis Tool Box.

Author: Drezen E, Rizk G, Chikhi R, Deltel C, Lemaitre C, Peterlongo P, and Lavenier D
Subjects: Algorithms, Computer Graphics, Genome, Human genetics, Humans, Biostatistics methods, Genomics methods, High-Throughput Nucleotide Sequencing methods, Software
Abstract: Motivation: Efficient and fast next-generation sequencing (NGS) algorithms are essential to analyze the terabytes of data generated by the NGS machines. A serious bottleneck can be the design of such algorithms, as they require sophisticated data structures and advanced hardware implementation. Results: We propose an open-source library dedicated to genome assembly and analysis to speed up the development of efficient software. The library is based on a recent optimized de Bruijn graph implementation allowing complex genomes to be processed on desktop computers using fast algorithms with low memory footprints. Availability and Implementation: The GATB library is written in C++ and is available at http://gatb.inria.fr under the A-GPL license. Contact: lavenier@irisa.fr. Supplementary Information: Supplementary data are available at Bioinformatics online. (© The Author 2014. Published by Oxford University Press.)
Published: 2014
Full Text: View/download PDF

5. DSK: k-mer counting with very low memory usage.

Author: Rizk G, Lavenier D, and Chikhi R
Subjects: Algorithms, Genome, Human, Humans, Sequence Analysis, DNA methods, Sequence Analysis, RNA methods, Software
Abstract: Summary: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state-of-the-art k-mer counting methods require that a large data structure resides in memory. Such structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all k-mers present in the reads is partitioned, and partitions are saved to disk. Then, each partition is separately loaded in memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 h. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers. Availability: http://minia.genouest.org/dsk
Published: 2013
Full Text: View/download PDF
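
The partition-then-count scheme described in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not DSK's code: the function name and parameters are hypothetical, `zlib.crc32` stands in for whatever partitioning hash the real tool uses, partitions are plain-text temporary files, and reverse complements are not merged into canonical k-mers.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers_partitioned(reads, k, n_partitions=4, min_abundance=1):
    """Yield (k-mer, count) pairs while holding one partition's table in memory."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part_{i}.txt") for i in range(n_partitions)]
        files = [open(p, "w") for p in paths]
        try:
            # Pass 1: stream the reads and spill each k-mer to its partition.
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    # A deterministic hash sends equal k-mers to the same file.
                    idx = zlib.crc32(kmer.encode()) % n_partitions
                    files[idx].write(kmer + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: load one partition at a time into a temporary hash table.
        for path in paths:
            table = Counter()
            with open(path) as f:
                for line in f:
                    table[line.strip()] += 1
            for kmer, count in table.items():
                # Optional filtering of low-abundance k-mers.
                if count >= min_abundance:
                    yield kmer, count
```

Since all copies of a given k-mer land in the same partition, each per-partition count is already final. Raising `n_partitions` lowers peak memory at the cost of more files on disk, which is the memory and disk trade-off the abstract refers to.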

6. GASSST: global alignment short sequence search tool.

Author: Rizk G and Lavenier D
Subjects: Base Sequence, Sequence Analysis, DNA methods, Algorithms, Sequence Alignment methods, Software
Abstract: Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus twofold: achieving high performance with no restrictions on the number of indels, with a design that is still effective on long reads. Results: We propose a new efficient filtering step that discards most alignments coming from the seed phase before they are checked by the costly dynamic programming algorithm. We use a carefully designed series of filters of increasing complexity and efficiency to quickly eliminate most candidate alignments in a wide range of configurations. The main filter uses a precomputed table containing the alignment score of short four-base words aligned against each other. This table is reused several times by a new algorithm designed to approximate the score of the full dynamic programming algorithm. We compare the performance of GASSST against BWA, BFAST, SSAHA2 and PASS. We found that GASSST achieves high sensitivity in a wide range of configurations and faster overall execution time than other state-of-the-art aligners. Availability: GASSST is distributed under the CeCILL software license at http://www.irisa.fr/symbiose/projects/gassst/. Contact: guillaume.rizk@irisa.fr; dominique.lavenier@irisa.fr. Supplementary Information: Supplementary data are available at Bioinformatics online.
Published: 2010
Full Text: View/download PDF
