Author: "Peters B" / Journal: bmc bioinformatics - Searchworks@Jio Institute Digital Library Search Results

Search for "Peters B": 18 results


1. Cost sensitive hierarchical document classification to triage PubMed abstracts for manual curation

Author: Seymour Emily, Damle Rohini, Sette Alessandro, and Peters Bjoern
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: The Immune Epitope Database (IEDB) project manually curates information from published journal articles that describe immune epitopes derived from a wide variety of organisms and associated with different diseases. In the past, abstracts of scientific articles were retrieved by broad keyword queries of PubMed, and were classified as relevant (curatable) or irrelevant (not curatable) to the scope of the database by a Naïve Bayes classifier. The curatable abstracts were subsequently manually classified into categories corresponding to different disease domains. Over the past four years, we have examined how to further improve this approach in order to enhance classification performance and to reduce the need for manual intervention. Results: Utilizing 89,884 abstracts classified by a domain expert as curatable or uncuratable, we found that an SVM classifier outperformed the previously used Naïve Bayes classifier for curatability predictions with an AUC of 0.899 and 0.854, respectively. Next, using a non-hierarchical and a hierarchical application of SVM classifiers trained on 22,833 curatable abstracts manually classified into three levels of disease specific categories we demonstrated that a hierarchical application of SVM classifiers outperformed non-hierarchical SVM classifiers for categorization. Finally, to optimize the hierarchical SVM classifiers' error profile for the curation process, cost sensitivity functions were developed to avoid serious misclassifications. We tested our design on a benchmark dataset of 1,388 references and achieved an overall category prediction accuracy of 94.4%, 93.9%, and 82.1% at the three levels of categorization, respectively. Conclusions: A hierarchical application of SVM algorithms with cost sensitive output weighting enabled high quality reference classification with few serious misclassifications. This enabled us to significantly reduce the manual component of abstract categorization. Our findings are relevant to other databases that are developing their own document classifier schema and the datasets we make available provide large scale real-life benchmark sets for method developers.
Published: 2011

2. Peptide binding predictions for HLA DR, DP and DQ molecules

Author: Lund Ole, Sette Alessandro, Kim Yohan, Sidney John, Wang Peng, Nielsen Morten, and Peters Bjoern
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: MHC class II binding predictions are widely used to identify epitope candidates in infectious agents, allergens, cancer and autoantigens. The vast majority of prediction algorithms for human MHC class II to date have targeted HLA molecules encoded in the DR locus. This reflects a significant gap in knowledge as HLA DP and DQ molecules are presumably equally important, and have only been studied less because they are more difficult to handle experimentally. Results: In this study, we aimed to narrow this gap by providing a large scale dataset of over 17,000 HLA-peptide binding affinities for a set of 11 HLA DP and DQ alleles. We also expanded our dataset for HLA DR alleles resulting in a total of 40,000 MHC class II binding affinities covering 26 allelic variants. Utilizing this dataset, we generated prediction tools utilizing several machine learning algorithms and evaluated their performance. Conclusion: We found that 1) prediction methodologies developed for HLA DR molecules perform equally well for DP or DQ molecules. 2) Prediction performances were significantly increased compared to previous reports due to the larger amounts of training data available. 3) The presence of homologous peptides between training and testing datasets should be avoided to give real-world estimates of prediction performance metrics, but the relative ranking of different predictors is largely unaffected by the presence of homologous peptides, and predictors intended for end-user applications should include all training data for maximum performance. 4) The recently developed NN-align prediction method significantly outperformed all other algorithms, including a naïve consensus based on all prediction methods. A new consensus method dropping the comparably weak ARB prediction method could outperform the NN-align method, but further research into how to best combine MHC class II binding predictions is required.
Published: 2010

3. Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior

Author: Sette Alessandro, Pinilla Clemencia, Sidney John, Kim Yohan, and Peters Bjoern
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Experts in peptide:MHC binding studies are often able to estimate the impact of a single residue substitution based on a heuristic understanding of amino acid similarity in an experimental context. Our aim is to quantify this measure of similarity to improve peptide:MHC binding prediction methods. This should help compensate for holes and bias in the sequence space coverage of existing peptide binding datasets. Results: Here, a novel amino acid similarity matrix (PMBEC) is directly derived from the binding affinity data of combinatorial peptide mixtures. Like BLOSUM62, this matrix captures well-known physicochemical properties of amino acid residues. However, PMBEC differs markedly from existing matrices in cases where residue substitution involves a reversal of electrostatic charge. To demonstrate its usefulness, we have developed a new peptide:MHC class I binding prediction method, using the matrix as a Bayesian prior. We show that the new method can compensate for missing information on specific residues in the training data. We also carried out a large-scale benchmark, and its results indicate that prediction performance of the new method is comparable to that of the best neural network based approaches for peptide:MHC class I binding. Conclusion: A novel amino acid similarity matrix has been derived for peptide:MHC binding interactions. One prominent feature of the matrix is that it disfavors substitution of residues with opposite charges. Given that the matrix was derived from experimentally determined peptide:MHC binding affinity measurements, this feature is likely shared by all peptide:protein interactions. In addition, we have demonstrated the usefulness of the matrix as a Bayesian prior in an improved scoring-matrix based peptide:MHC class I prediction method. A software implementation of the method is available at: http://www.mhc-pathway.net/smmpmbec.
Published: 2009

4. ElliPro: a new structure-based tool for the prediction of antibody epitopes

Author: Fusseder Nicholas, Bourne Philip E, Li Wei, Bui Huynh-Hoa, Ponomarenko Julia, Sette Alessandro, and Peters Bjoern
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes. Results: Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro. Conclusion: The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.
Published: 2008

5. Automating document classification for the Immune Epitope Database

Author: Sette Alessandro, Zhang Qing, Morgan Alexander A, Wang Peng, and Peters Bjoern
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose. Results: We here report our experience in automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. Improvements on the basic classifier performance were made by a) utilizing information stored in PubMed beyond the abstract itself b) applying standard feature selection criteria and c) extracting domain specific feature patterns that e.g. identify peptides sequences. We have implemented the classifier into the curation process determining if abstracts are clearly relevant, clearly irrelevant, or if no certain classification can be made, in which case the abstracts are manually classified. Testing this classification scheme on an independent dataset, we achieve 95% sensitivity and specificity in the 51.1% of abstracts that were automatically classified. Conclusion: By implementing text classification, we have sped up the reference selection process without sacrificing sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools, as well as a large dataset which can serve as a benchmark for tool developers.
Published: 2007
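Note on entry 5: the abstract describes a three-way decision (clearly relevant, clearly irrelevant, or sent to manual classification). A minimal sketch of that kind of triage, assuming the classifier outputs a probability of relevance, might look as follows. The function name `triage` and the thresholds 0.1 and 0.9 are hypothetical and are not taken from the paper.

```python
def triage(probability_relevant, lower=0.1, upper=0.9):
    """Route an abstract based on a classifier's probability of relevance.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= probability_relevant <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability_relevant >= upper:
        return "relevant"        # confident enough to accept automatically
    if probability_relevant <= lower:
        return "irrelevant"      # confident enough to reject automatically
    return "manual review"       # no certain classification can be made


decisions = [triage(p) for p in (0.97, 0.50, 0.02)]
```

Widening the gap between the two thresholds sends more abstracts to manual review in exchange for fewer automatic errors.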

6. Curation of complex, context-dependent immunological data

Author: Sidney John, Chan Russell K, de Castro Romulo, Ponomarenko Julia, Bourne Philip E, Bui Huynh-Hoa, Mokili John, Sathiamurthy Muthu, Grey Howard, Fleri Ward, Salimi Nima, Zarebski Laura, Vaughan Kerrie, Vita Randi, Wilson Stephen S, Stewart Scott, Way Scott, Peters Bjoern, and Sette Alessandro
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: The Immune Epitope Database and Analysis Resource (IEDB) is dedicated to capturing, housing and analyzing complex immune epitope related data (http://www.immuneepitope.org). Description: To identify and extract relevant data from the scientific literature in an efficient and accurate manner, novel processes were developed for manual and semi-automated annotation. Conclusion: Formalized curation strategies enable the processing of a large volume of context-dependent data, which are now available to the scientific community in an accessible and transparent format. The experiences described herein are applicable to other databases housing complex biological data and requiring a high level of curation expertise.
Published: 2006

7. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method

Author: Sette Alessandro and Peters Bjoern
Subjects: Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Many processes in molecular biology involve the recognition of short sequences of nucleic- or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as a sequence motif, a scoring matrix or an artificial neural network. The purpose of these models is two-fold. First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition. Second, such models can be used to predict the experimental outcome for yet untested sequences. In the past we reported the development of a method to generate such models called the Stabilized Matrix Method (SMM). This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences. Results: Herein we report the implementation of the SMM algorithm as a publicly available software package. Specific features determining the type of problems the method is most appropriate for are discussed. Advantageous features of the package are: (1) the output generated is easy to interpret, (2) input and output are both quantitative, (3) specific computational strategies to handle experimental noise are built in, (4) the algorithm is designed to effectively handle bounded experimental data, (5) experimental data from randomized peptide libraries and conventional peptides can easily be combined, and (6) it is possible to incorporate pair interactions between positions of a sequence. Conclusion: Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.
Published: 2005
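Note on entry 7: the abstract mentions scoring matrices as one kind of sequence-specificity model. The sketch below shows only how a position-specific scoring matrix is typically applied to a peptide (a sum of per-position contributions plus an offset). It does not reproduce how SMM fits such a matrix. The matrix values, the offset, and the choice to score unlisted residues as 0.0 are all illustrative assumptions.

```python
def score_peptide(peptide, matrix, offset=0.0):
    """Sum position-specific residue contributions for one peptide.

    `matrix` holds one dict per position, mapping residue -> contribution.
    Residues missing from a column contribute 0.0 (an assumption of this sketch).
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal the number of matrix columns")
    return offset + sum(
        column.get(residue, 0.0) for residue, column in zip(peptide, matrix)
    )


# Toy three-position matrix with made-up values.
toy_matrix = [
    {"A": -0.5, "K": 0.25},
    {"L": -1.0, "G": 0.5},
    {"V": -0.75},
]
score = score_peptide("ALV", toy_matrix, offset=4.0)  # 4.0 - 0.5 - 1.0 - 0.75
```

Because every position contributes independently, the output is easy to interpret: each matrix entry states how much one residue at one position shifts the predicted value.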

8. PEPMatch: a tool to identify short peptide sequence matches in large sets of proteins.

Author: Marrama D, Chronister WD, Westernberg L, Vita R, Koşaloğlu-Yalçın Z, Sette A, Nielsen M, Greenbaum JA, and Peters B
Subjects: Humans, Amino Acid Sequence, Peptides chemistry, Algorithms, Epitopes, T-Lymphocyte, Proteome, Software, Neoplasms
Abstract: Background: Numerous tools exist for biological sequence comparisons and search. One case of particular interest for immunologists is finding matches for linear peptide T cell epitopes, typically between 8 and 15 residues in length, in a large set of protein sequences. Both to find exact matches or matches that account for residue substitutions. The utility of such tools is critical in applications ranging from identifying conservation across viral epitopes, identifying putative epitope targets for allergens, and finding matches for cancer-associated neoepitopes to examine the role of tolerance in tumor recognition. Results: We defined a set of benchmarks that reflect the different practical applications of short peptide sequence matching. We evaluated a suite of existing methods for speed and recall and developed a new tool, PEPMatch. The tool uses a deterministic k-mer mapping algorithm that preprocesses proteomes before searching, achieving a 50-fold increase in speed over methods such as the Basic Local Alignment Search Tool (BLAST) without compromising recall. PEPMatch's code and benchmark datasets are publicly available. Conclusions: PEPMatch offers significant speed and recall advantages for peptide sequence matching. While it is of immediate utility for immunologists, the developed benchmarking framework also provides a standard against which future tools can be evaluated for improvements. The tool is available at https://nextgen-tools.iedb.org, and the source code can be found at https://github.com/IEDB/PEPMatch.
Published: 2023
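Note on entry 8: the abstract describes preprocessing proteomes into a k-mer mapping before searching. The sketch below illustrates that general idea for exact matches only, using an in-memory dictionary. It is not PEPMatch's implementation or API. The function names, the toy sequences, and the choice of k = 5 are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Preprocess: map every k-mer to the (protein_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for offset in range(len(sequence) - k + 1):
            index[sequence[offset:offset + k]].append((protein_id, offset))
    return index


def find_exact_matches(peptide, proteins, index, k):
    """Return sorted (protein_id, start) pairs where the whole peptide occurs."""
    if len(peptide) < k:
        raise ValueError("peptide must be at least k residues long")
    hits = []
    # Seed with the peptide's first k-mer, then verify the full peptide in place.
    for protein_id, offset in index.get(peptide[:k], []):
        if proteins[protein_id].startswith(peptide, offset):
            hits.append((protein_id, offset))
    return sorted(hits)


# Toy "proteome" with made-up sequences.
proteins = {"P1": "MKTAYIAKQRQISFVKSHFSRQ", "P2": "GGAYIAKQRQGG"}
index = build_kmer_index(proteins, k=5)
matches = find_exact_matches("AYIAKQRQ", proteins, index, k=5)
```

The index is built once per proteome, so each query costs a dictionary lookup plus a short verification instead of a scan over every protein.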

9. Benchmark datasets of immune receptor-epitope structural complexes.

Author: Mahajan S, Yan Z, Jespersen MC, Jensen KK, Marcatili P, Nielsen M, Sette A, and Peters B
Subjects: Epitopes immunology, Humans, Receptors, Immunologic immunology, Antigen-Antibody Complex, Databases, Protein, Epitopes metabolism, Receptors, Immunologic metabolism
Abstract: Background: The development of accurate epitope prediction tools is important in facilitating disease diagnostics, treatment and vaccine development. The advent of new approaches making use of antibody and TCR sequence information to predict receptor-specific epitopes has the potential to transform the epitope prediction field. Development and validation of this new generation of epitope prediction methods would benefit from regularly updated high-quality receptor-antigen complex datasets. Results: To address the need for high-quality datasets to benchmark performance of this new generation of receptor-specific epitope prediction tools, a webserver called SCEptRe (Structural Complexes of Epitope-Receptor) was created. SCEptRe extracts weekly updated 3D complexes of antibody-antigen, TCR-pMHC and MHC-ligand from the Immune Epitope Database and clusters them based on antigen, receptor and epitope features to generate benchmark datasets. SCEptRe also provides annotated information such as CDR sequences and VDJ genes on the receptors. Users can generate custom datasets by selecting thresholds for structural quality and clustering parameters (e.g. resolution, R-free factor, antigen or epitope sequence identity) based on their need. Conclusions: SCEptRe provides weekly updated, user-customized comprehensive benchmark datasets of immune receptor-epitope structural complexes. These datasets can be used to develop and benchmark performance of receptor-specific epitope prediction tools in the future. SCEptRe is freely accessible at http://tools.iedb.org/sceptre.
Published: 2019

10. Reporting and connecting cell type names and gating definitions through ontologies.

Author: Overton JA, Vita R, Dunn P, Burel JG, Bukhari SAC, Cheung KH, Kleinstein SH, Diehl AD, and Peters B
Subjects: Humans, Immune System metabolism, Protein Subunits metabolism, Proteins metabolism, Biological Ontologies, Databases, Factual
Abstract: Background: Human immunology studies often rely on the isolation and quantification of cell populations from an input sample based on flow cytometry and related techniques. Such techniques classify cells into populations based on the detection of a pattern of markers. The description of the cell populations targeted in such experiments typically have two complementary components: the description of the cell type targeted (e.g. 'T cells'), and the description of the marker pattern utilized (e.g. CD14-, CD3+). Results: We here describe our attempts to use ontologies to cross-compare cell types and marker patterns (also referred to as gating definitions). We used a large set of such gating definitions and corresponding cell types submitted by different investigators into ImmPort, a central database for immunology studies, to examine the ability to parse gating definitions using terms from the Protein Ontology (PRO) and cell type descriptions, using the Cell Ontology (CL). We then used logical axioms from CL to detect discrepancies between the two. Conclusions: We suggest adoption of our proposed format for describing gating and cell type definitions to make comparisons easier. We also suggest a number of new terms to describe gating definitions in flow cytometry that are not based on molecular markers captured in PRO, but on forward- and side-scatter of light during data acquisition, which is more appropriate to capture in the Ontology for Biomedical Investigations (OBI). Finally, our approach results in suggestions on what logical axioms and new cell types could be considered for addition to the Cell Ontology.
Published: 2019

11. GOnet: a tool for interactive Gene Ontology analysis.

Author: Pomaznoy M, Ha B, and Peters B
Subjects: Algorithms, Animals, Humans, Mice, Gene Ontology, Software
Abstract: Background: Biological interpretation of gene/protein lists resulting from -omics experiments can be a complex task. A common approach consists of reviewing Gene Ontology (GO) annotations for entries in such lists and searching for enrichment patterns. Unfortunately, there is a gap between machine-readable output of GO software and its human-interpretable form. This gap can be bridged by allowing users to simultaneously visualize and interact with term-term and gene-term relationships. Results: We created the open-source GOnet web-application (available at http://tools.dice-database.org/GOnet/), which takes a list of gene or protein entries from human or mouse data and performs GO term annotation analysis (mapping of provided entries to GO subsets) or GO term enrichment analysis (scanning for GO categories overrepresented in the input list). The application is capable of producing parsable data formats and importantly, interactive visualizations of the GO analysis results. The interactive results allow exploration of genes and GO terms as a graph that depicts the natural hierarchy of the terms and retains relationships between terms and genes/proteins. As a result, GOnet provides insight into the functional interconnection of the submitted entries. Conclusions: The application can be used for GO analysis of any biological data sources resulting in gene/protein lists. It can be helpful for experimentalists as well as computational biologists working on biological interpretation of -omics data resulting in such lists.
Published: 2018

12. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions.

Author: Kim Y, Sidney J, Buus S, Sette A, Nielsen M, and Peters B
Subjects: Alleles, Animals, Benchmarking, Epitopes immunology, HLA Antigens genetics, HLA Antigens immunology, Humans, Mice, Protein Binding, Reproducibility of Results, Computational Biology methods, HLA Antigens metabolism, Oligopeptides metabolism
Abstract: Background: It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set. Results: We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates. Conclusion: It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.
Published: 2014

13. Peptide binding predictions for HLA DR, DP and DQ molecules.

Author: Wang P, Sidney J, Kim Y, Sette A, Lund O, Nielsen M, and Peters B
Subjects: Alleles, Binding Sites, Epitopes genetics, Epitopes immunology, Genes, MHC Class II, HLA-DP Antigens genetics, HLA-DP Antigens immunology, HLA-DQ Antigens genetics, HLA-DQ Antigens immunology, HLA-DR Antigens genetics, HLA-DR Antigens immunology, Humans, Peptides immunology, Peptides metabolism, HLA-DP Antigens chemistry, HLA-DQ Antigens chemistry, HLA-DR Antigens chemistry, Peptides chemistry
Abstract: Background: MHC class II binding predictions are widely used to identify epitope candidates in infectious agents, allergens, cancer and autoantigens. The vast majority of prediction algorithms for human MHC class II to date have targeted HLA molecules encoded in the DR locus. This reflects a significant gap in knowledge as HLA DP and DQ molecules are presumably equally important, and have only been studied less because they are more difficult to handle experimentally. Results: In this study, we aimed to narrow this gap by providing a large scale dataset of over 17,000 HLA-peptide binding affinities for a set of 11 HLA DP and DQ alleles. We also expanded our dataset for HLA DR alleles resulting in a total of 40,000 MHC class II binding affinities covering 26 allelic variants. Utilizing this dataset, we generated prediction tools utilizing several machine learning algorithms and evaluated their performance. Conclusion: We found that 1) prediction methodologies developed for HLA DR molecules perform equally well for DP or DQ molecules. 2) Prediction performances were significantly increased compared to previous reports due to the larger amounts of training data available. 3) The presence of homologous peptides between training and testing datasets should be avoided to give real-world estimates of prediction performance metrics, but the relative ranking of different predictors is largely unaffected by the presence of homologous peptides, and predictors intended for end-user applications should include all training data for maximum performance. 4) The recently developed NN-align prediction method significantly outperformed all other algorithms, including a naïve consensus based on all prediction methods. A new consensus method dropping the comparably weak ARB prediction method could outperform the NN-align method, but further research into how to best combine MHC class II binding predictions is required.
Published: 2010

14. Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior.

Author: Kim Y, Sidney J, Pinilla C, Sette A, and Peters B
Subjects: Amino Acid Sequence, Bayes Theorem, Binding Sites, Databases, Protein, Peptides metabolism, Sequence Analysis, Protein, Amino Acids chemistry, Computational Biology methods, Histocompatibility Antigens Class I chemistry, Peptides chemistry
Abstract: Background: Experts in peptide:MHC binding studies are often able to estimate the impact of a single residue substitution based on a heuristic understanding of amino acid similarity in an experimental context. Our aim is to quantify this measure of similarity to improve peptide:MHC binding prediction methods. This should help compensate for holes and bias in the sequence space coverage of existing peptide binding datasets. Results: Here, a novel amino acid similarity matrix (PMBEC) is directly derived from the binding affinity data of combinatorial peptide mixtures. Like BLOSUM62, this matrix captures well-known physicochemical properties of amino acid residues. However, PMBEC differs markedly from existing matrices in cases where residue substitution involves a reversal of electrostatic charge. To demonstrate its usefulness, we have developed a new peptide:MHC class I binding prediction method, using the matrix as a Bayesian prior. We show that the new method can compensate for missing information on specific residues in the training data. We also carried out a large-scale benchmark, and its results indicate that prediction performance of the new method is comparable to that of the best neural network based approaches for peptide:MHC class I binding. Conclusion: A novel amino acid similarity matrix has been derived for peptide:MHC binding interactions. One prominent feature of the matrix is that it disfavors substitution of residues with opposite charges. Given that the matrix was derived from experimentally determined peptide:MHC binding affinity measurements, this feature is likely shared by all peptide:protein interactions. In addition, we have demonstrated the usefulness of the matrix as a Bayesian prior in an improved scoring-matrix based peptide:MHC class I prediction method. A software implementation of the method is available at: http://www.mhc-pathway.net/smmpmbec.
Published: 2009

15. ElliPro: a new structure-based tool for the prediction of antibody epitopes.

Author: Ponomarenko J, Bui HH, Li W, Fusseder N, Bourne PE, Sette A, and Peters B
Subjects: Algorithms, Binding Sites, Antibody genetics, Cluster Analysis, Humans, Internet, Models, Immunological, Protein Conformation, User-Computer Interface, Computational Biology methods, Epitope Mapping methods, Epitopes, B-Lymphocyte chemistry, Sequence Analysis, Protein methods, Software
Abstract: Background: Reliable prediction of antibody, or B-cell, epitopes remains challenging yet highly desirable for the design of vaccines and immunodiagnostics. A correlation between antigenicity, solvent accessibility, and flexibility in proteins was demonstrated. Subsequently, Thornton and colleagues proposed a method for identifying continuous epitopes in the protein regions protruding from the protein's globular surface. The aim of this work was to implement that method as a web-tool and evaluate its performance on discontinuous epitopes known from the structures of antibody-protein complexes. Results: Here we present ElliPro, a web-tool that implements Thornton's method and, together with a residue clustering algorithm, the MODELLER program and the Jmol viewer, allows the prediction and visualization of antibody epitopes in a given protein sequence or structure. ElliPro has been tested on a benchmark dataset of discontinuous epitopes inferred from 3D structures of antibody-protein complexes. In comparison with six other structure-based methods that can be used for epitope prediction, ElliPro performed the best and gave an AUC value of 0.732, when the most significant prediction was considered for each protein. Since the rank of the best prediction was at most in the top three for more than 70% of proteins and never exceeded five, ElliPro is considered a useful research tool for identifying antibody epitopes in protein antigens. ElliPro is available at http://tools.immuneepitope.org/tools/ElliPro. Conclusion: The results from ElliPro suggest that further research on antibody epitopes considering more features that discriminate epitopes from non-epitopes may further improve predictions. As ElliPro is based on the geometrical properties of protein structure and does not require training, it might be more generally applied for predicting different types of protein-protein interactions.
Published: 2008
Full Text: View/download PDF

16. Automating document classification for the Immune Epitope Database.

Author: Wang P, Morgan AA, Zhang Q, Sette A, and Peters B
Subjects: Abstracting and Indexing methods, Artificial Intelligence, Documentation methods, Epitopes classification, Information Storage and Retrieval methods, Databases, Factual, Epitope Mapping methods, Epitopes chemistry, Epitopes immunology, Natural Language Processing, Periodicals as Topic, PubMed
Abstract: Background: The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose. Results: We here report our experience in automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. Improvements on the basic classifier performance were made by a) utilizing information stored in PubMed beyond the abstract itself, b) applying standard feature selection criteria, and c) extracting domain-specific feature patterns that, e.g., identify peptide sequences. We have implemented the classifier into the curation process, determining if abstracts are clearly relevant, clearly irrelevant, or if no certain classification can be made, in which case the abstracts are manually classified. Testing this classification scheme on an independent dataset, we achieve 95% sensitivity and specificity in the 51.1% of abstracts that were automatically classified. Conclusion: By implementing text classification, we have sped up the reference selection process without sacrificing sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools, as well as a large dataset which can serve as a benchmark for tool developers.
Published: 2007
Full Text: View/download PDF

17. Curation of complex, context-dependent immunological data.

Author: Vita R, Vaughan K, Zarebski L, Salimi N, Fleri W, Grey H, Sathiamurthy M, Mokili J, Bui HH, Bourne PE, Ponomarenko J, de Castro R Jr, Chan RK, Sidney J, Wilson SS, Stewart S, Way S, Peters B, and Sette A
Subjects: Animals, Artificial Intelligence, Databases, Factual, Databases, Protein, Humans, Immune System, Information Storage and Retrieval, Models, Statistical, Pattern Recognition, Automated, Allergy and Immunology, Computational Biology methods, Database Management Systems, Epitopes chemistry
Abstract: Background: The Immune Epitope Database and Analysis Resource (IEDB) is dedicated to capturing, housing and analyzing complex immune epitope related data (http://www.immuneepitope.org). Description: To identify and extract relevant data from the scientific literature in an efficient and accurate manner, novel processes were developed for manual and semi-automated annotation. Conclusion: Formalized curation strategies enable the processing of a large volume of context-dependent data, which are now available to the scientific community in an accessible and transparent format. The experiences described herein are applicable to other databases housing complex biological data and requiring a high level of curation expertise.
Published: 2006
Full Text: View/download PDF

18. Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method.

Author: Peters B and Sette A
Subjects: Algorithms, Amino Acid Sequence, Computer Simulation, Data Interpretation, Statistical, Databases, Protein, Models, Biological, Models, Statistical, Neural Networks, Computer, Peptide Library, Peptides chemistry, Programming Languages, Protein Binding, Sensitivity and Specificity, Software, Biology methods, Computational Biology methods
Abstract: Background: Many processes in molecular biology involve the recognition of short sequences of nucleic or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as a sequence motif, a scoring matrix or an artificial neural network. The purpose of these models is two-fold. First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition. Second, such models can be used to predict the experimental outcome for yet untested sequences. In the past we reported the development of a method to generate such models called the Stabilized Matrix Method (SMM). This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences. Results: Herein we report the implementation of the SMM algorithm as a publicly available software package. Specific features determining the type of problems the method is most appropriate for are discussed. Advantageous features of the package are: (1) the output generated is easy to interpret, (2) input and output are both quantitative, (3) specific computational strategies to handle experimental noise are built in, (4) the algorithm is designed to effectively handle bounded experimental data, (5) experimental data from randomized peptide libraries and conventional peptides can easily be combined, and (6) it is possible to incorporate pair interactions between positions of a sequence. Conclusion: Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.
Published: 2005
Full Text: View/download PDF
