1. System to assess genome sequencing needs for viral protein diagnostics and therapeutics.
- Author
- Gardner SN, Kuczmarski TA, Zhou CE, Lam MW, and Slezak TR
- Subjects
- Computational Biology methods; DNA Viruses genetics; DNA Viruses isolation & purification; Humans; Monte Carlo Method; RNA Viruses genetics; RNA Viruses isolation & purification; Sequence Analysis, DNA; Viral Proteins genetics; Virus Diseases drug therapy; Virus Diseases virology; Base Sequence; DNA Viruses classification; Genome, Viral; RNA Viruses classification; Viral Proteins chemistry; Virus Diseases diagnosis
- Abstract
Computational analyses of genome sequences may elucidate protein signatures unique to a target pathogen. We constructed a Protein Signature Pipeline to guide the selection of short peptide sequences to serve as targets for detection and therapeutics. In silico identification of good target peptides that are conserved among strains and unique compared to other species generates a list of peptides. These peptides may be developed in the laboratory as targets of antibody, peptide, and ligand binding for detection assays and therapeutics or as targets for vaccine development. In this paper, we assess how the amount of sequence data affects our ability to identify conserved, unique protein signature candidates. To determine the amount of sequence data required to select good protein signature candidates, we have built a computationally intensive system called the Sequencing Analysis Pipeline (SAP). The SAP performs thousands of Monte Carlo simulations, each calling the Protein Signature Pipeline, to assess how the amount of sequence data for a target organism affects the ability to predict peptide signature candidates. Viral species differ substantially in the number of genomes required to predict protein signature targets. Patterns do not appear based on genome structure. There are more protein than DNA signatures due to greater intraspecific conservation at the protein than at the nucleotide level. We conclude that it is necessary to use the SAP as a dynamic system to assess the need for continued sequencing for each species individually and to update predictions with each additional genome that is sequenced.
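The resampling idea described in the abstract can be illustrated with a toy sketch. This is not the authors' Sequencing Analysis Pipeline or Protein Signature Pipeline. It assumes that a "signature candidate" is simply a fixed-length peptide present in every sampled target sequence and absent from all background sequences, and every name in it (`kmers`, `signature_candidates`, `simulate`, the example sequences) is hypothetical.

```python
import random


def kmers(seq, k):
    """Return the set of all length-k substrings (peptides) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_candidates(targets, background, k):
    """Peptides found in every target sequence and in no background sequence.

    Assumption: this stands in for a real signature pipeline, which would
    apply far more elaborate conservation and uniqueness criteria.
    """
    conserved = set.intersection(*(kmers(s, k) for s in targets))
    seen_elsewhere = set().union(*(kmers(s, k) for s in background))
    return conserved - seen_elsewhere


def simulate(targets, background, k, sample_size, trials, seed=0):
    """Monte Carlo estimate of the candidate count for a given sample size.

    Returns (mean candidate count over random subsets of the targets,
    candidate count when every target sequence is used).
    """
    rng = random.Random(seed)
    full_count = len(signature_candidates(targets, background, k))
    counts = []
    for _ in range(trials):
        sample = rng.sample(targets, sample_size)
        counts.append(len(signature_candidates(sample, background, k)))
    return sum(counts) / trials, full_count


# Hypothetical toy data: three "strains" of a target and one near neighbor.
targets = ["MKTAYIAKQR", "MKTAYIAKQG", "MKTAYLAKQR"]
background = ["MKTAQQQ"]

for n in range(1, len(targets) + 1):
    mean_count, full_count = simulate(targets, background, k=4,
                                      sample_size=n, trials=200)
    print(n, mean_count, full_count)
```

In this sketch, small samples overestimate the number of candidates because peptides that look conserved in a few sequences fail once more strains are included. The gap between the sampled mean and the full-data count gives a rough sense of whether additional sequencing would still change the predictions.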
- Published
- 2005