35 results for '"Sequence Analysis standards"'
Search Results
2. Synthetic DNA spike-ins (SDSIs) enable sample tracking and detection of inter-sample contamination in SARS-CoV-2 sequencing workflows.
- Author
-
Lagerborg KA, Normandin E, Bauer MR, Adams G, Figueroa K, Loreth C, Gladden-Young A, Shaw BM, Pearlman LR, Berenzy D, Dewey HB, Kales S, Dobbins ST, Shenoy ES, Hooper D, Pierce VM, Zachary KC, Park DJ, MacInnis BL, Tewhey R, Lemieux JE, Sabeti PC, Reilly SK, and Siddle KJ
- Subjects
- COVID-19 diagnosis, DNA Primers chemical synthesis, Genome, Viral genetics, Humans, Quality Control, RNA, Viral genetics, Reproducibility of Results, Sequence Analysis methods, Whole Genome Sequencing, Workflow, DNA Primers standards, SARS-CoV-2 genetics, Sequence Analysis standards
- Abstract
The global spread and continued evolution of SARS-CoV-2 has driven an unprecedented surge in viral genomic surveillance. Amplicon-based sequencing methods provide a sensitive, low-cost and rapid approach but suffer a high potential for contamination, which can undermine laboratory processes and results. This challenge will increase with the expanding global production of sequences across a variety of laboratories for epidemiological and clinical interpretation, as well as for genomic surveillance of emerging diseases in future outbreaks. We present SDSI + AmpSeq, an approach that uses 96 synthetic DNA spike-ins (SDSIs) to track samples and detect inter-sample contamination throughout the sequencing workflow. We apply SDSIs to the ARTIC Consortium's amplicon design, demonstrate their utility and efficiency in a real-time investigation of a suspected hospital cluster of SARS-CoV-2 cases and validate them across 6,676 diagnostic samples at multiple laboratories. We establish that SDSI + AmpSeq provides increased confidence in genomic data by detecting and correcting for relatively common, yet previously unobserved modes of error, including spillover and sample swaps, without impacting genome recovery., (© 2021. The Author(s), under exclusive licence to Springer Nature Limited.)
- Published
- 2022
- Full Text
- View/download PDF
3. A rapid, cost-effective tailed amplicon method for sequencing SARS-CoV-2.
- Author
-
Gohl DM, Garbe J, Grady P, Daniel J, Watson RHB, Auch B, Nelson A, Yohe S, and Beckman KB
- Subjects
- Benchmarking, COVID-19 diagnosis, COVID-19 epidemiology, COVID-19 Nucleic Acid Testing standards, Humans, Molecular Epidemiology, Mutation, RNA, Viral genetics, SARS-CoV-2 isolation & purification, Sequence Analysis methods, Sequence Analysis standards, COVID-19 virology, COVID-19 Nucleic Acid Testing methods, Genome, Viral genetics, SARS-CoV-2 genetics
- Abstract
Background: The global COVID-19 pandemic has led to an urgent need for scalable methods for clinical diagnostics and viral tracking. Next generation sequencing technologies have enabled large-scale genomic surveillance of SARS-CoV-2 as thousands of isolates are being sequenced around the world and deposited in public data repositories. A number of methods using both short- and long-read technologies are currently being applied for SARS-CoV-2 sequencing, including amplicon approaches, metagenomic methods, and sequence capture or enrichment methods. Given the small genome size, the ability to sequence SARS-CoV-2 at scale is limited by the cost and labor associated with making sequencing libraries., Results: Here we describe a low-cost, streamlined, all amplicon-based method for sequencing SARS-CoV-2, which bypasses costly and time-consuming library preparation steps. We benchmark this tailed amplicon method against both the ARTIC amplicon protocol and sequence capture approaches and show that an optimized tailed amplicon approach achieves comparable amplicon balance, coverage metrics, and variant calls to the ARTIC v3 approach., Conclusions: The tailed amplicon method we describe represents a cost-effective and highly scalable method for SARS-CoV-2 sequencing.
- Published
- 2020
- Full Text
- View/download PDF
4. *-DCC: A platform to collect, annotate, and explore a large variety of sequencing experiments.
- Author
-
Hörtenhuber M, Mukarram AK, Stoiber MH, Brown JB, and Daub CO
- Subjects
- Molecular Sequence Annotation standards, Sequence Analysis standards, Molecular Sequence Annotation methods, Sequence Analysis methods, Software
- Abstract
Background: Over the past few years the variety of experimental designs and protocols for sequencing experiments increased greatly. To ensure the wide usability of the produced data beyond an individual project, rich and systematic annotation of the underlying experiments is crucial., Findings: We first developed an annotation structure that captures the overall experimental design as well as the relevant details of the steps from the biological sample to the library preparation, the sequencing procedure, and the sequencing and processed files. Through various design features, such as controlled vocabularies and different field requirements, we ensured a high annotation quality, comparability, and ease of annotation. The structure can be easily adapted to a large variety of species. We then implemented the annotation strategy in a user-hosted web platform with data import, query, and export functionality., Conclusions: We present here an annotation structure and user-hosted platform for sequencing experiment data, suitable for lab-internal documentation, collaborations, and large-scale annotation efforts., (© The Author(s) 2020. Published by Oxford University Press.)
- Published
- 2020
- Full Text
- View/download PDF
5. Wheat Virus Identification Within Infected Tissue Using Nanopore Sequencing Technology.
- Author
-
Fellers JP, Webb C, Fellers MC, Shoup Rupp J, and De Wolf E
- Subjects
- Bunyaviridae classification, Bunyaviridae genetics, Luteovirus classification, Luteovirus genetics, Nanopores, Potyviridae classification, Potyviridae genetics, Plant Diseases virology, Plant Viruses classification, Plant Viruses genetics, Sequence Analysis standards, Triticum virology
- Abstract
Viral diseases are a limiting factor to wheat production. Viruses are difficult to diagnose in the early stages of disease development and are often confused with nutrient deficiencies or other abiotic problems. Immunological methods are useful to identify viruses, but specific antibodies may not be available or require high virus titer for detection. In 2015 and 2017, wheat plants containing the Wheat streak mosaic virus (WSMV) resistance gene Wsm2 were found to have symptoms characteristic of WSMV. Serologically, WSMV was detected in all four samples. Additionally, High Plains wheat mosaic virus (HPWMoV) was also detected in one of the samples. Barley yellow dwarf virus (BYDV) was not detected, and a detection kit was not readily available for Triticum mosaic virus (TriMV). Initially, cDNA cloning and Sanger sequencing were used to determine the presence of WSMV; however, the process was time-consuming and expensive. Subsequently, cDNA from infected wheat tissue was sequenced with single-strand, Oxford Nanopore sequencing technology (ONT). ONT was able to confirm the presence of WSMV. Additionally, TriMV was found in all of the samples and BYDV in three of the samples. Deep coverage sequencing of full-length, single-strand WSMV revealed variation compared with the WSMV Sidney-81 reference strain and may represent new variants which overcome Wsm2. These results demonstrate that ONT can more accurately identify causal virus agents and has sufficient resolution to provide evidence of causal variants.
- Published
- 2019
- Full Text
- View/download PDF
6. Viral Metagenomics in the Clinical Realm: Lessons Learned from a Swiss-Wide Ring Trial.
- Author
-
Junier T, Huber M, Schmutz S, Kufner V, Zagordi O, Neuenschwander S, Ramette A, Kubacki J, Bachofen C, Qi W, Laubscher F, Cordey S, Kaiser L, Beuret C, Barbié V, Fellay J, and Lebrand A
- Subjects
- Genome, Human, Humans, Metagenomics methods, Sequence Analysis methods, Switzerland, Clinical Laboratory Services standards, Genome, Viral, Laboratory Proficiency Testing methods, Metagenome, Metagenomics standards, Sequence Analysis standards
- Abstract
Shotgun metagenomics using next generation sequencing (NGS) is a promising technique to analyze both DNA and RNA microbial material from patient samples. Mostly used in a research setting, it is now increasingly being used in the clinical realm as well, notably to support diagnosis of viral infections, thereby calling for quality control and the implementation of ring trials (RT) to benchmark pipelines and ensure comparable results. The Swiss NGS clinical virology community therefore decided to conduct a RT in 2018, in order to benchmark current metagenomic workflows used at Swiss clinical virology laboratories, and thereby contribute to the definition of common best practices. The RT consisted of two parts (increments), in order to disentangle the variability arising from the experimental compared to the bioinformatics parts of the laboratory pipeline. In addition, the RT was also designed to assess the impact of databases compared to bioinformatics algorithms on the final results, by asking participants to perform the bioinformatics analysis with a common database, in addition to using their own in-house database. Five laboratories participated in the RT (seven pipelines were tested). We observed that the algorithms had a stronger impact on the overall performance than the choice of the reference database. Our results also suggest that differences in sample preparation can lead to significant differences in the performance, and that laboratories should aim for at least 5-10 Mio reads per sample and use depth of coverage in addition to other interpretation metrics such as the percent of coverage. Performance was generally lower when increasing the number of viruses per sample. The lessons learned from this pilot study will be useful for the development of larger-scale RTs to serve as regular quality control tests for laboratories performing NGS analyses of viruses in a clinical setting., Competing Interests: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript, or in the decision to publish the results.
- Published
- 2019
- Full Text
- View/download PDF
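A minimal sketch of the two coverage metrics the ring-trial entry above recommends reporting together (depth of coverage and percent of the genome covered), computed here from a per-position depth vector. The function name, the 10x breadth threshold, and the example depths are illustrative, not values from the study.

```python
import numpy as np

def coverage_metrics(per_base_depth, min_depth=10):
    """Summarize depth and breadth of coverage for one viral genome.

    per_base_depth: read depth at each reference position (e.g. from a pileup)
    min_depth: minimum depth for a position to count as "covered"
    """
    depth = np.asarray(per_base_depth)
    mean_depth = depth.mean()
    percent_covered = 100.0 * (depth >= min_depth).mean()
    return mean_depth, percent_covered

# Illustrative values only: a 10 kb genome with a coverage dropout in the middle.
depth = np.concatenate([np.full(4000, 120), np.zeros(500), np.full(5500, 80)])
mean_depth, breadth = coverage_metrics(depth)
print(f"mean depth: {mean_depth:.1f}x, breadth at >=10x: {breadth:.1f}%")
```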
7. "Decoding hereditary breast cancer" benefits and questions from multigene panel testing.
- Author
-
Colas C, Golmard L, de Pauw A, Caputo SM, and Stoppa-Lyonnet D
- Subjects
- Biomarkers, Tumor genetics, Early Detection of Cancer methods, Female, Genes, BRCA1, Genes, BRCA2, Genetic Testing methods, Genetic Variation, Humans, Ovarian Neoplasms genetics, Reproducibility of Results, Sequence Analysis methods, Breast Neoplasms genetics, Early Detection of Cancer standards, Genetic Predisposition to Disease, Genetic Testing standards, Sequence Analysis standards
- Abstract
Multigene panel testing for breast and ovarian cancer predisposition diagnosis is a useful tool as it makes it possible to sequence a considerable number of genes in a large number of individuals. More than 200 different multigene panels in which the two major BRCA1 and BRCA2 breast cancer predisposing genes are included are proposed by public or commercial laboratories. We review the clinical validity and clinical utility of the 26 genes most often included in these panels. Because clinical validity and utility are not established for all genes and due to the heterogeneity of tumour risk levels, there is substantial difficulty in the routine use of multigene panels if management guidelines and recommendations for testing relatives are not previously defined for each gene. Besides, the classification of variants of unknown significance (VUS) is a particular limitation and challenge. Efforts to classify VUSs and also to identify factors that modify cancer risks are now needed to produce personalised risk estimates. The complexity of information, the capacity to come back to patients when VUS are re-classified as pathogenic, and the expected large increase in the number of individuals to be tested especially when the aim of multigene panel testing is not only prevention but also treatment are challenging both for physicians and patients. Quality of tests, interpretation of results, information and accompaniment of patients must be at the heart of the guidelines of multigene panel testing., (Copyright © 2019. Published by Elsevier Ltd.)
- Published
- 2019
- Full Text
- View/download PDF
8. CellMarker: a manually curated resource of cell markers in human and mouse.
- Author
-
Zhang X, Lan Y, Xu J, Quan F, Zhao E, Deng C, Luo T, Xu L, Liao G, Yan M, Ping Y, Li F, Shi A, Bai J, Zhao T, Li X, and Xiao Y
- Subjects
- Animals, Humans, Mice, Sequence Analysis standards, Single-Cell Analysis standards, Databases, Genetic, Sequence Analysis methods, Single-Cell Analysis methods, Software
- Abstract
One of the most fundamental questions in biology is what types of cells form different tissues and organs in a functionally coordinated fashion. Larger-scale single-cell sequencing and biology experiment studies are now rapidly opening up new ways to address this question by revealing substantial cell markers for distinguishing different cell types in tissues. Here, we developed the CellMarker database (http://biocc.hrbmu.edu.cn/CellMarker/ or http://bio-bigdata.hrbmu.edu.cn/CellMarker/), aiming to provide a comprehensive and accurate resource of cell markers for various cell types in tissues of human and mouse. By manually curating over 100 000 published papers, 4124 entries including the cell marker information, tissue type, cell type, cancer information and source, were recorded. In total, 13 605 cell markers of 467 cell types in 158 human tissues/sub-tissues and 9148 cell markers of 389 cell types in 81 mouse tissues/sub-tissues were collected and deposited in CellMarker. CellMarker provides a user-friendly interface for browsing, searching and downloading markers of diverse cell types of different tissues. Furthermore, a summarized marker prevalence in each cell type is graphically and intuitively presented through a vivid statistical graph. We believe that CellMarker is a comprehensive and valuable resource for cell research in precisely identifying and characterizing cells, especially at the single-cell level.
- Published
- 2019
- Full Text
- View/download PDF
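A sketch of querying a downloaded CellMarker table with pandas, as a usage illustration only. The file name and the column names (speciesType, tissueType, cellName, cellMarker) are assumptions and should be checked against the actual export from the database.

```python
import pandas as pd

# Assumed file and column names for a CellMarker download; verify against
# the real export before relying on them.
markers = pd.read_csv("all_cell_markers.txt", sep="\t")

# Example query: markers reported for human liver cell types.
liver = markers[(markers["speciesType"] == "Human") & (markers["tissueType"] == "Liver")]
print(liver[["cellName", "cellMarker"]].drop_duplicates().head())
```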
9. The metagenomic data life-cycle: standards and best practices.
- Author
-
Ten Hoopen P, Finn RD, Bongo LA, Corre E, Fosso B, Meyer F, Mitchell A, Pelletier E, Pesole G, Santamaria M, Willassen NP, and Cochrane G
- Subjects
- Data Mining methods, Data Mining standards, Databases, Genetic, Metagenome, Sequence Analysis methods, Sequence Analysis standards, Workflow, Computational Biology methods, Computational Biology standards, Metagenomics methods, Metagenomics standards
- Abstract
Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonized way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (i) material sampling, (ii) material sequencing, (iii) data analysis, and (iv) data archiving and publishing. Taking examples from marine research, we summarize essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption is still needed. We emphasize the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing., (© The Author 2017. Published by Oxford University Press.)
- Published
- 2017
- Full Text
- View/download PDF
10. A proposal for standardization of transgenic reference sequences used in food forensics.
- Author
-
Moreira F, Carneiro J, and Pereira F
- Subjects
- Genes, Plant genetics, Genetic Variation, Humans, Plants, Genetically Modified genetics, Sequence Analysis standards
- Published
- 2017
- Full Text
- View/download PDF
11. Improving Synthetic Biology Communication: Recommended Practices for Visual Depiction and Digital Submission of Genetic Designs.
- Author
-
Hillson NJ, Plahar HA, Beal J, and Prithviraj R
- Subjects
- Humans, Workflow, Genetics standards, Publishing standards, Research standards, Sequence Analysis standards, Synthetic Biology standards
- Abstract
Research is communicated more effectively and reproducibly when articles depict genetic designs consistently and fully disclose the complete sequences of all reported constructs. ACS Synthetic Biology is now providing authors with updated guidance and piloting a new tool and publication workflow that facilitate compliance with these recommended practices and standards for visual representation and data exchange.
- Published
- 2016
- Full Text
- View/download PDF
12. Study Design for Sequencing Studies.
- Author
-
Honaas LA, Altman NS, and Krzywinski M
- Subjects
- Animals, Humans, Sequence Analysis, DNA methods, Sequence Analysis, DNA standards, Sequence Analysis, RNA methods, Sequence Analysis, RNA standards, High-Throughput Nucleotide Sequencing methods, High-Throughput Nucleotide Sequencing standards, Research Design, Sequence Analysis methods, Sequence Analysis standards
- Abstract
Once a biochemical method has been devised to sample RNA or DNA of interest, sequencing can be used to identify the sampled molecules with high fidelity and low bias. High-throughput sequencing has therefore become the primary data acquisition method for many genomics studies and is being used more and more to address molecular biology questions. By applying principles of statistical experimental design, sequencing experiments can be made more sensitive to the effects under study as well as more biologically sound, hence more replicable.
- Published
- 2016
- Full Text
- View/download PDF
13. Overview of Sequence Data Formats.
- Author
-
Zhang H
- Subjects
- Computational Biology methods, Computational Biology standards, Genomics methods, Sequence Alignment methods, Sequence Alignment standards, Molecular Sequence Data, Sequence Analysis methods, Sequence Analysis standards
- Abstract
Next-generation sequencing experiments can generate billions of short reads for each sample, and processing of the raw reads adds further information. Various file formats have been introduced or developed in order to store and manipulate this information. This chapter presents an overview of the file formats including FASTQ, FASTA, SAM/BAM, GFF/GTF, BED, and VCF that are commonly used in analysis of next-generation sequencing data.
- Published
- 2016
- Full Text
- View/download PDF
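To make the chapter's subject concrete, here is a minimal reader for one of the listed formats (FASTQ), assuming the plain four-lines-per-record layout; production pipelines would normally rely on an established parser such as Biopython or pysam rather than this sketch.

```python
def read_fastq(path):
    """Yield (read_id, sequence, quality) tuples from an uncompressed FASTQ file.

    Assumes the simple four-lines-per-record layout: @id, sequence, '+', and
    the Phred quality string.
    """
    with open(path) as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip()
            handle.readline()                 # '+' separator line, ignored
            qual = handle.readline().rstrip()
            yield header[1:], seq, qual

# Example usage: report read length and Phred+33 quality of the first base.
# for rid, seq, qual in read_fastq("sample.fastq"):
#     print(rid, len(seq), ord(qual[0]) - 33)
```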
14. Practical guidelines for B-cell receptor repertoire sequencing analysis.
- Author
-
Yaari G and Kleinstein SH
- Subjects
- Computational Biology methods, Computational Biology standards, Guidelines as Topic, High-Throughput Nucleotide Sequencing methods, High-Throughput Nucleotide Sequencing standards, Humans, Receptors, Antigen, B-Cell genetics, Sequence Analysis methods, Sequence Analysis standards
- Abstract
High-throughput sequencing of B-cell immunoglobulin repertoires is increasingly being applied to gain insights into the adaptive immune response in healthy individuals and in those with a wide range of diseases. Recent applications include the study of autoimmunity, infection, allergy, cancer and aging. As sequencing technologies continue to improve, these repertoire sequencing experiments are producing ever larger datasets, with tens- to hundreds-of-millions of sequences. These data require specialized bioinformatics pipelines to be analyzed effectively. Numerous methods and tools have been developed to handle different steps of the analysis, and integrated software suites have recently been made available. However, the field has yet to converge on a standard pipeline for data processing and analysis. Common file formats for data sharing are also lacking. Here we provide a set of practical guidelines for B-cell receptor repertoire sequencing analysis, starting from raw sequencing reads and proceeding through pre-processing, determination of population structure, and analysis of repertoire properties. These include methods for unique molecular identifiers and sequencing error correction, V(D)J assignment and detection of novel alleles, clonal assignment, lineage tree construction, somatic hypermutation modeling, selection analysis, and analysis of stereotyped or convergent responses. The guidelines presented here highlight the major steps involved in the analysis of B-cell repertoire sequencing data, along with recommendations on how to avoid common pitfalls.
- Published
- 2015
- Full Text
- View/download PDF
15. Challenges with using primer IDs to improve accuracy of next generation sequencing.
- Author
-
Brodin J, Hedskog C, Heddini A, Benard E, Neher RA, Mild M, and Albert J
- Subjects
- Base Sequence, DNA Primers, HIV-1 genetics, Molecular Sequence Data, Polymerase Chain Reaction, Sequence Analysis standards, Sequence Homology, Nucleic Acid, Sequence Analysis methods
- Abstract
Next generation sequencing technologies, such as ultra-deep pyrosequencing (UDPS), allow detailed investigation of complex populations, such as RNA viruses, but their utility is limited by errors introduced during sample preparation and sequencing. By tagging each individual cDNA molecule with a barcode, referred to as a Primer ID, before PCR and sequencing, these errors could theoretically be removed. Here we evaluated the Primer ID methodology on 257,846 UDPS reads generated from an HIV-1 SG3Δenv plasmid clone and plasma samples from three HIV-infected patients. The Primer ID consisted of 11 randomized nucleotides (4,194,304 combinations) in the primer for cDNA synthesis that introduced a unique sequence tag into each cDNA molecule. Consensus template sequences were constructed for reads with Primer IDs that were observed three or more times. Despite high numbers of input template molecules, the number of consensus template sequences was low. With 10,000 input molecules for the clone, as few as 97 consensus template sequences were obtained due to a highly skewed frequency of resampling. Furthermore, the number of sequenced templates was overestimated due to PCR errors in the Primer IDs. Finally, some consensus template sequences were erroneous due to hotspots for UDPS errors. The Primer ID methodology has the potential to provide highly accurate deep sequencing. However, it is important to be aware that there are remaining challenges with the methodology. In particular, it is important to find ways to obtain a more even frequency of resampling of template molecules, as well as to identify and remove artefactual consensus template sequences that have been generated by PCR errors in the Primer IDs.
- Published
- 2015
- Full Text
- View/download PDF
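A schematic sketch of the consensus step described in the entry above: reads are grouped by their Primer ID tag, tags seen fewer than three times are discarded (the threshold used in the study), and a simple per-position majority consensus is built. Function names and the assumption of equal-length, aligned reads are illustrative.

```python
from collections import Counter, defaultdict

def consensus_templates(tagged_reads, min_reads=3):
    """Collapse reads sharing a Primer ID into one consensus template sequence.

    tagged_reads: iterable of (primer_id, read_sequence) pairs; reads are
        assumed to be trimmed/aligned to the same length.
    min_reads: Primer IDs seen fewer than this many times are discarded,
        as in the study above.
    """
    groups = defaultdict(list)
    for primer_id, seq in tagged_reads:
        groups[primer_id].append(seq)

    consensus = {}
    for primer_id, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        # Majority base at each position across all reads carrying this tag.
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus

# reads = [("ACGTACGTACG", "TTGACCA"), ...]   # (11-nt Primer ID, read sequence)
# templates = consensus_templates(reads)
```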
16. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses.
- Author
-
Pightling AW, Petronella N, and Pagotto F
- Subjects
- Genome, Bacterial genetics, Polymorphism, Single Nucleotide genetics, Software, Listeria monocytogenes genetics, Sequence Analysis standards
- Abstract
The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results.
- Published
- 2014
- Full Text
- View/download PDF
17. Sequencing and validation of reference genes to analyze endogenous gene expression and quantify yellow dwarf viruses using RT-qPCR in viruliferous Rhopalosiphum padi.
- Author
-
Wu K, Liu W, Mar T, Liu Y, Wu Y, and Wang X
- Subjects
- Algorithms, Animals, Genes, Essential genetics, Reference Standards, Reproducibility of Results, Viral Load, Aphids genetics, Aphids virology, Gene Expression Profiling standards, Genes, Insect genetics, Luteovirus physiology, Reverse Transcriptase Polymerase Chain Reaction standards, Sequence Analysis standards
- Abstract
The bird cherry-oat aphid (Rhopalosiphum padi), an important pest of cereal crops, not only directly sucks sap from plants, but also transmits a number of plant viruses, collectively known as the yellow dwarf viruses (YDVs). For quantifying changes in gene expression in vector aphids, reverse transcription-quantitative polymerase chain reaction (RT-qPCR) is a touchstone method, but the selection and validation of housekeeping genes (HKGs) as reference genes to normalize the expression level of endogenous genes of the vector and for exogenous genes of the virus in the aphids is critical to obtaining valid results. Such an assessment has not been done, however, for R. padi and YDVs. Here, we tested three algorithms (GeNorm, NormFinder and BestKeeper) to assess the suitability of candidate reference genes (EF-1α, ACT1, GAPDH, 18S rRNA) in 6 combinations of YDV and vector aphid morph. EF-1α and ACT1 together or in combination with GAPDH or with GAPDH and 18S rRNA could confidently be used to normalize virus titre and expression levels of endogenous genes in winged or wingless R. padi infected with the Barley yellow dwarf virus (BYDV) isolates BYDV-PAV and BYDV-GAV. The use of only one reference gene, whether the most stably expressed (EF-1α) or the least stably expressed (18S rRNA), was not adequate for obtaining valid relative expression data from the RT-qPCR. Because of discrepancies among values for changes in relative expression obtained using 3 regions of the same gene, different regions of an endogenous aphid gene, including each terminus and the middle, should be analyzed at the same time with RT-qPCR. Our results highlight the necessity of choosing the best reference genes to obtain valid experimental data and provide several HKGs for relative quantification of virus titre in YDV-viruliferous aphids.
- Published
- 2014
- Full Text
- View/download PDF
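A sketch of the multi-reference-gene normalization the study above argues for: relative quantities are computed as 2^-ΔCt and the target is divided by the geometric mean of the reference-gene quantities (a geNorm-style normalization factor). Perfect amplification efficiency and the Ct values shown are assumptions for illustration, not data from the paper.

```python
from statistics import geometric_mean

def relative_quantity(ct_sample, ct_calibrator):
    """2^-dCt relative quantity, assuming ~100% amplification efficiency."""
    return 2.0 ** (ct_calibrator - ct_sample)

def normalized_expression(target_ct, reference_cts, calibrator):
    """Normalize a target gene against several reference genes at once.

    reference_cts and calibrator map gene names (e.g. EF-1a, ACT1) to Ct values;
    calibrator also holds the target gene's Ct under the key "target".
    The normalization factor is the geometric mean of the reference-gene
    relative quantities, as in geNorm-style multi-gene normalization.
    """
    target_rq = relative_quantity(target_ct, calibrator["target"])
    norm_factor = geometric_mean(
        [relative_quantity(ct, calibrator[gene]) for gene, ct in reference_cts.items()]
    )
    return target_rq / norm_factor

# Illustrative Ct values only:
calibrator = {"target": 24.0, "EF-1a": 18.0, "ACT1": 19.5}
sample_refs = {"EF-1a": 17.2, "ACT1": 18.9}
print(normalized_expression(target_ct=22.1, reference_cts=sample_refs, calibrator=calibrator))
```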
18. Methods-based proficiency testing in molecular genetic pathology.
- Author
-
Schrijver I, Aziz N, Jennings LJ, Richards CS, Voelkerding KV, and Weck KE
- Subjects
- Gene Library, Genetic Testing standards, Humans, Laboratory Proficiency Testing standards, Sequence Analysis methods, Sequence Analysis standards, Validation Studies as Topic, Workflow, Clinical Laboratory Services standards, Genetic Testing methods, Laboratory Proficiency Testing methods
- Abstract
This Perspectives article describes methods-based proficiency testing (MBPT), the benefits and limitations of MBPT, why the time is right for MBPT in molecular diagnostics, and how MBPT for next-generation sequencing is being developed by the College of American Pathologists., (Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.)
- Published
- 2014
- Full Text
- View/download PDF
19. The importance of patient engagement.
- Author
-
Williams JK, Daack-Hirsch S, Driessnack M, Downing NR, and Simon C
- Subjects
- Humans, Genetic Testing standards, Genetics, Medical standards, Incidental Findings, Sequence Analysis standards
- Published
- 2013
- Full Text
- View/download PDF
20. State-of-the-art methodologies dictate new standards for phylogenetic analysis.
- Author
-
Anisimova M, Liberles DA, Philippe H, Provan J, Pupko T, and von Haeseler A
- Subjects
- Genetics, Population, Sequence Analysis standards, Classification methods, Phylogeny
- Abstract
The intention of this editorial is to steer researchers through methodological choices in molecular evolution, drawing on the combined expertise of the authors. Our aim is not to review the most advanced methods for a specific task. Rather, we define several general guidelines to help with methodology choices at different stages of a typical phylogenetic 'pipeline'. We are not able to provide exhaustive citation of a literature that is vast and plentiful, but we point the reader to a set of classical textbooks that reflect the state-of-the-art. We do not wish to appear overly critical of outdated methodology but rather provide some practical guidance on the sort of issues which should be considered. We stress that a reported study should be well-motivated and evaluate a specific hypothesis or scientific question. However, a publishable study should not be merely a compilation of available sequences for a protein family of interest followed by some standard analyses, unless it specifically addresses a scientific hypothesis or question. The rapid pace at which sequence data accumulate quickly outdates such publications. Although clearly, discoveries stemming from data mining, reports of new tools and databases and review papers are also desirable.
- Published
- 2013
- Full Text
- View/download PDF
21. Don't just invite us to the table: authentic community engagement.
- Author
-
Terry SF
- Subjects
- Genetic Testing methods, Genome, Human, Guidelines as Topic, Humans, Mutation, Research Design, Sequence Analysis methods, Genetic Testing standards, Genetics, Medical standards, Incidental Findings, Sequence Analysis standards
- Published
- 2013
- Full Text
- View/download PDF
22. Genome interpreter vies for place in clinical market.
- Author
-
Baker M
- Subjects
- Genetic Diseases, Inborn genetics, Genetic Privacy, Humans, Sequence Analysis economics, Sequence Analysis ethics, Sequence Analysis standards, Genome, Human, Sequence Analysis instrumentation
- Published
- 2012
- Full Text
- View/download PDF
23. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy.
- Author
-
Pruitt KD, Tatusova T, Brown GR, and Maglott DR
- Subjects
- Genomics standards, Humans, Reference Standards, Sequence Analysis, DNA standards, Sequence Analysis, Protein standards, Sequence Analysis, RNA standards, Databases, Genetic, Molecular Sequence Annotation, Sequence Analysis standards
- Abstract
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16,000 organisms, 2.4 × 10(6) genomic records, 13 × 10(6) proteins and 2 × 10(6) RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).
- Published
- 2012
- Full Text
- View/download PDF
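A brief sketch of retrieving a single RefSeq record programmatically with Biopython's Entrez and SeqIO modules; the e-mail address and accession are placeholders, and NCBI's usage guidelines (rate limits, API keys) still apply.

```python
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"  # placeholder; NCBI asks for a real contact address

# Fetch one RefSeq transcript record in GenBank format (the accession is illustrative).
handle = Entrez.efetch(db="nucleotide", id="NM_000546", rettype="gb", retmode="text")
record = SeqIO.read(handle, "genbank")
handle.close()

print(record.id, record.description)
print(len(record.seq), "bp,", len(record.features), "annotated features")
```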
24. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data.
- Author
-
Patel RK and Jain M
- Subjects
- DNA Primers genetics, Data Compression, High-Throughput Nucleotide Sequencing, Polymerization, Quality Control, Sequence Analysis standards, Statistics as Topic, Sequence Analysis methods
- Abstract
Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amount of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in Perl programming language. The toolkit is comprised of user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
- Published
- 2012
- Full Text
- View/download PDF
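The toolkit above is a set of Perl tools with its own options; the following Python sketch only illustrates the kind of mean-quality and length filter it applies, with thresholds chosen arbitrarily rather than taken from the toolkit's defaults.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a read, assuming Phred+33 encoded qualities."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)

def quality_filter(records, min_mean_quality=20, min_length=50):
    """Yield reads that pass simple length and mean-quality cutoffs.

    records: iterable of (read_id, sequence, quality_string) tuples.
    Thresholds are illustrative, not the toolkit's defaults.
    """
    for read_id, seq, qual in records:
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_quality:
            yield read_id, seq, qual

# Example usage with any FASTQ parser that yields (id, sequence, quality) tuples:
# high_quality = list(quality_filter(parsed_fastq_records))
```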
25. Immunoglobulin sequence analysis and prognostication in CLL: guidelines from the ERIC review board for reliable interpretation of problematic cases.
- Author
-
Langerak AW, Davi F, Ghia P, Hadzidimitriou A, Murray F, Potter KN, Rosenquist R, Stamatopoulos K, and Belessi C
- Subjects
- Humans, Leukemia, Lymphocytic, Chronic, B-Cell diagnosis, Prognosis, Reference Standards, Sequence Analysis methods, Immunoglobulins genetics, Leukemia, Lymphocytic, Chronic, B-Cell immunology, Leukemia, Lymphocytic, Chronic, B-Cell pathology, Sequence Analysis standards
- Abstract
Immunoglobulin gene sequence analysis is widely utilized for prognostication in chronic lymphocytic leukemia (CLL) and the definition of standardized procedures has allowed reliable and reproducible results. Occasionally, a straightforward interpretation of the sequences is not possible because of the so-called 'problematic sequences' that do not fit the 'classic' interpretation and pose scientific questions at the cross-road between hematology and immunology. Thanks to a dedicated effort within the European Research Initiative on CLL (ERIC), we have now the possibility to present such cases, offer a scientific explanation and propose recommendations in terms of prognostication.
- Published
- 2011
- Full Text
- View/download PDF
26. Detection and removal of biases in the analysis of next-generation sequencing reads.
- Author
-
Schwartz S, Oren R, and Ast G
- Subjects
- Base Sequence, Bias, Data Interpretation, Statistical, Sequence Analysis standards
- Abstract
Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, misleadingly increasing the credibility of results due to them. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.
- Published
- 2011
- Full Text
- View/download PDF
27. Quality control issues and the identification of rare functional variants with next-generation sequencing data.
- Author
-
Hemmelmann C, Daw EW, and Wilson AF
- Subjects
- Algorithms, Exome genetics, Genetic Predisposition to Disease, Human Genome Project, Humans, Quality Control, Regression Analysis, Sequence Analysis standards, Genetic Variation, Molecular Epidemiology methods, Molecular Epidemiology standards
- Abstract
Next-generation sequencing of large numbers of individuals presents challenges in data preparation, quality control, and statistical analysis because of the rarity of the variants. The Genetic Analysis Workshop 17 (GAW17) data provide an opportunity to survey existing methods and compare these methods with novel ones. Specifically, the GAW17 Group 2 contributors investigate existing and newly proposed methods and study design strategies to identify rare variants, predict functional variants, and/or examine quality control. We introduce the eight Group 2 papers, summarize their approaches, and discuss their strengths and weaknesses. For these investigations, some groups used only the genotype data, whereas others also used the simulated phenotype data. Although the eight Group 2 contributions covered a wide variety of topics under the general idea of identifying rare variants, they can be grouped into three broad categories according to their common research interests: functionality of variants and quality control issues, family-based analyses, and association analyses of unrelated individuals. The aims of the first subgroup were quite different. These were population structure analyses that used rare variants to predict functionality and examine the accuracy of genotype calls. The aims of the family-based analyses were to select which families should be sequenced and to identify high-risk pedigrees; the aim of the association analyses was to identify variants or genes with regression-based methods. However, power to detect associations was low in all three association studies. Thus this work shows opportunities for incorporating rare variants into the genetic and statistical analyses of common diseases., (© 2011 Wiley Periodicals, Inc.)
- Published
- 2011
- Full Text
- View/download PDF
28. Standardizing the next generation of bioinformatics software development with BioHDF (HDF5).
- Author
-
Mason CE, Zumbo P, Sanders S, Folk M, Robinson D, Aydt R, Gollery M, Welsh M, Olson NE, and Smith TM
- Subjects
- Computational Biology, Computer Simulation, Database Management Systems, Databases, Genetic, Sequence Alignment standards, Sequence Alignment trends, Sequence Analysis standards, Sequence Analysis trends, Software standards, Software trends, Software Design, User-Computer Interface, Sequence Alignment statistics & numerical data, Sequence Analysis statistics & numerical data
- Abstract
Next Generation Sequencing technologies are limited by the lack of standard bioinformatics infrastructures that can reduce data storage, increase data processing performance, and integrate diverse information. HDF technologies address these requirements and have a long history of use in data-intensive science communities. They include general data file formats, libraries, and tools for working with the data. Compared to emerging standards, such as the SAM/BAM formats, HDF5-based systems demonstrate significantly better scalability, can support multiple indexes, store multiple data types, and are self-describing. For these reasons, HDF5 and its BioHDF extension are well suited for implementing data models to support the next generation of bioinformatics applications.
- Published
- 2010
- Full Text
- View/download PDF
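A minimal h5py sketch in the spirit of the entry above, storing reads and per-base qualities as compressed HDF5 datasets; the group and dataset layout is invented for illustration and is not the BioHDF schema.

```python
import h5py
import numpy as np

reads = ["ACGTACGTGG", "TTGCACGTAA"]
quals = np.array([[30, 31, 32, 30, 29, 33, 35, 30, 28, 27],
                  [31, 30, 30, 29, 28, 30, 33, 34, 30, 29]], dtype=np.uint8)

with h5py.File("reads.h5", "w") as f:
    run = f.create_group("run_001")  # invented layout, not the BioHDF schema
    run.create_dataset("sequences", data=reads, dtype=h5py.string_dtype())
    run.create_dataset("qualities", data=quals, compression="gzip")

with h5py.File("reads.h5", "r") as f:
    print(f["run_001/sequences"].asstr()[:], f["run_001/qualities"].shape)
```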
29. A standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences.
- Author
-
Alcantara LC, Cassol S, Libin P, Deforche K, Pybus OG, Van Ranst M, Galvão-Castro B, Vandamme AM, and de Oliveira T
- Subjects
- Base Sequence, Genotype, HIV-1 classification, HIV-1 genetics, Hepacivirus classification, Hepacivirus genetics, Hepatitis B virus classification, Hepatitis B virus genetics, Phylogeny, Recombination, Genetic, Reference Standards, Sequence Alignment, Sequence Analysis standards, Viruses genetics, Genetic Variation, Software, Viruses classification
- Abstract
Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid, high-throughput genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms including the Stanford (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/) and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/JAVA web application and are freely accessible on a number of servers including: http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/, http://jose.med.kuleuven.be/genotypetool/html/.
- Published
- 2009
- Full Text
- View/download PDF
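A schematic sketch of the sliding-window assignment logic described in the entry above. The phylogenetic bootstrap/bootscanning scoring is delegated to a caller-supplied placeholder function; the 70% support cutoff mirrors the abstract, and the window size, step, and all names are assumptions.

```python
def genotype_by_window(query_aln, references, score_window,
                       window=400, step=50, support_threshold=70):
    """Assign a genotype to each window of a query aligned with the references.

    query_aln: the query row of a multiple alignment (string).
    references: dict mapping genotype label -> aligned reference sequence.
    score_window: placeholder callable returning (best_label, bootstrap_support)
        for one window; in the real tool this is a phylogenetic bootscan analysis.
    """
    assignments = []
    for start in range(0, len(query_aln) - window + 1, step):
        seg = slice(start, start + window)
        label, support = score_window(query_aln[seg],
                                      {g: ref[seg] for g, ref in references.items()})
        assignments.append((start, label if support > support_threshold else "unassigned"))
    return assignments

# A query whose successive windows switch between genotype labels would be
# flagged as a putative recombinant.
```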
30. NCBI Reference Sequences: current status, policy and new initiatives.
- Author
-
Pruitt KD, Tatusova T, Klimke W, and Maglott DR
- Subjects
- Animals, Exons, Genomics standards, Humans, Mice, Proteins chemistry, Pseudogenes, RNA, Untranslated chemistry, Reference Standards, Databases, Genetic, Sequence Analysis standards
- Abstract
NCBI's Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) is a curated non-redundant collection of sequences representing genomes, transcripts and proteins. RefSeq records integrate information from multiple sources and represent a current description of the sequence, the gene and sequence features. The database includes over 5300 organisms spanning prokaryotes, eukaryotes and viruses, with records for more than 5.5 x 10(6) proteins (RefSeq release 30). Feature annotation is applied by a combination of curation, collaboration, propagation from other sources and computation. We report here on the recent growth of the database, recent changes to feature annotations and record types for eukaryotic (primarily vertebrate) species and policies regarding species inclusion and genome annotation. In addition, we introduce RefSeqGene, a new initiative to support reporting variation data on a stable genomic coordinate system.
- Published
- 2009
- Full Text
- View/download PDF
31. Genetic sequences: how are they patented?
- Author
-
Dufresne G and Duval M
- Subjects
- European Union, Government Agencies, Government Regulation, Sequence Analysis classification, United States, Amino Acid Sequence, Base Sequence, Patents as Topic, Sequence Analysis methods, Sequence Analysis standards
- Published
- 2004
- Full Text
- View/download PDF
32. NCL: a C++ class library for interpreting data files in NEXUS format.
- Author
-
Lewis PO
- Subjects
- Databases, Bibliographic standards, Databases, Genetic standards, Information Storage and Retrieval methods, Information Storage and Retrieval standards, Programming Languages, Sequence Analysis methods, Sequence Analysis standards, Software
- Abstract
Unlabelled: The NEXUS Class Library (NCL) is a collection of C++ classes designed to simplify interpreting data files written in the NEXUS format used by many computer programs for phylogenetic analyses. The NEXUS format allows different programs to share the same data files, even though none of the programs can interpret all of the data stored therein. Because users are not required to reformat the data file for each program, use of the NEXUS format prevents cut-and-paste errors as well as the proliferation of copies of the original data file. The purpose of making the NCL available is to encourage the use of the NEXUS format by making it relatively easy for programmers to add the ability to interpret NEXUS files in newly developed software., Availability: The NCL is freely available under the GNU General Public License from http://hydrodictyon.eeb.uconn.edu/ncl/, Supplementary Information: Documentation for the NCL (general information and source code documentation) is available in HTML format at http://hydrodictyon.eeb.uconn.edu/ncl/
- Published
- 2003
- Full Text
- View/download PDF
33. QA/QC as a pressing need for microarray analysis: meeting report from CAMDA'02.
- Author
-
Johnson K and Lin S
- Subjects
- Quality Control, Reference Values, Reproducibility of Results, Sensitivity and Specificity, Sequence Analysis methods, United States, Databases, Nucleic Acid standards, Oligonucleotide Array Sequence Analysis methods, Oligonucleotide Array Sequence Analysis standards, Sequence Analysis standards
- Published
- 2003
34. Comparing the success of different prediction software in sequence analysis: a review.
- Author
-
Bajić VB
- Subjects
- Computational Biology, Quality Control, Sequence Analysis standards, Sequence Analysis statistics & numerical data, Software standards
- Abstract
The abundance of computer software for different types of prediction in DNA and protein sequence analyses raises the problem of adequate ranking of prediction program quality. A single measure of success of predictor software, which adequately ranks the predictors, does not exist. A typical example of such an incomplete measure is the so-called correlation coefficient. This paper provides an overview and short analysis of several different measures of prediction quality. Frequently, some of these measures give results contradictory to each other even when they relate to the same prediction scores. This may lead to confusion. In order to overcome some of the problems, a few new measures are proposed including some variants of a 'generalised distance from the ideal predictor score'; these are based on topological properties, rather than on statistics. In order to provide a sort of a balanced ranking, the averaged score measure (ASM) is introduced. The ASM provides a possibility for the selection of the predictor that probably has the best overall performance. The method presented in the paper applies to the ranking problem of any prediction software whose results can be properly represented in a true positive-false positive framework, thus providing a natural set-up for linear biological sequence analysis.
- Published
- 2000
- Full Text
- View/download PDF
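A worked sketch of the standard measures discussed in the review above, computed in the true positive/false positive framework it mentions; the counts are invented, and the averaged score measure (ASM) proposed in the paper is not reproduced here.

```python
from math import sqrt

def prediction_measures(tp, fp, tn, fn):
    """Common prediction-quality measures in the TP/FP/TN/FN framework."""
    sensitivity = tp / (tp + fn)        # true positive rate (recall)
    specificity = tn / (tn + fp)        # true negative rate
    precision = tp / (tp + fp)
    # Matthews correlation coefficient, the review's example of a single
    # measure that is incomplete when used on its own.
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "mcc": mcc}

# Illustrative counts for two hypothetical predictors:
print(prediction_measures(tp=80, fp=20, tn=900, fn=20))
print(prediction_measures(tp=60, fp=5, tn=915, fn=40))
```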
35. Impact of genomics on inflammation research.
- Author
-
Rediske J and Crowl R
- Subjects
- Animals, Chemokines genetics, DNA, Complementary genetics, Humans, Research, Sequence Analysis standards, Inflammation genetics
- Published
- 1996
- Full Text
- View/download PDF