16 results on '"PROTEIN ANNOTATION"'
Search Results
2. Protein Annotation by Secondary Structure Based Alignments (PASSTA)
- Author
-
Bannert, Constantin; Stoye, Jens; Hutchison, David (editor); Kanade, Takeo (editor); Kittler, Josef (editor); Kleinberg, Jon M. (editor); Mattern, Friedemann (editor); Mitchell, John C. (editor); Naor, Moni (editor); Nierstrasz, Oscar (editor); Pandu Rangan, C. (editor); Steffen, Bernhard (editor); Sudan, Madhu (editor); Terzopoulos, Demetri (editor); Tygar, Doug (editor); Vardi, Moshe Y. (editor); Weikum, Gerhard (editor); Istrail, Sorin (editor); Pevzner, Pavel (editor); Waterman, Michael (editor); Berthold, Michael R. (editor); Glen, Robert C. (editor); Diederichs, Kay (editor); Kohlbacher, Oliver (editor); and Fischer, Ingrid (editor)
- Published
- 2005
3. Columba: Multidimensional Data Integration of Protein Annotations
- Author
-
Rother, Kristian; Müller, Heiko; Trissl, Silke; Koch, Ina; Steinke, Thomas; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf; Goos, Gerhard (editor); Hartmanis, Juris (editor); van Leeuwen, Jan (editor); Istrail, Sorin (editor); Pevzner, Pavel (editor); Waterman, Michael (editor); and Rahm, Erhard (editor)
- Published
- 2004
4. Functional Annotation and Analysis of Dual Oxidase 1 (DUOX1): a Potential Anti-pyocyanin Immune Component
- Author
-
Rashid, Muhammad Ibrahim, Ali, Amjad, and Andleeb, Saadia
- Published
- 2019
5. Aligning Discovered Patterns from Protein Family Sequences
- Author
-
Dennis Zhuang, Andrew K. C. Wong, and En-Shiun Annie Lee
- Subjects
Smith–Waterman algorithm, Protein family, Computational biology, Plasma protein binding, PROSITE, Biology, computer.software_genre, Hierarchical clustering, ComputingMethodologies_PATTERNRECOGNITION, Protein Annotation, Protein function prediction, Data mining, computer, Function (biology)
- Abstract
A basic task in protein analysis is to discover a set of sequence patterns that characterizes the function of a protein family. To address this task, we introduce a synthesized pattern representation called Aligned Pattern (AP) Cluster to discover potential functional segments in protein sequences. We apply our algorithm to identify and display the binding segments for the Cytochrome C and Ubiquitin protein families. The resulting AP Clusters correspond to protein binding segments that surround the binding residues. When compared to the results from the protein annotation databases PROSITE and Pfam, ours are more efficient in computation and more comprehensive in quality. The significance of the AP Cluster is that it is able to capture subtle variations of the binding segments in protein families. It could thus help to reduce time-consuming simulations and experimentation in protein analysis.
- Published
- 2012
6. Metric Labeling and Semi-metric Embedding for Protein Annotation Prediction
- Author
-
Emre Sefer and Carl Kingsford
- Subjects
Markov random field, Computer science, Heuristic, business.industry, Function (mathematics), Machine learning, computer.software_genre, Protein Annotation, Metric (mathematics), Convex optimization, Embedding, Protein function prediction, Artificial intelligence, business, Algorithm, computer
- Abstract
Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multi-label classification problem where the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide information about the similarities between functions. We explore the use of the Metric Labeling combinatorial optimization problem to make use of heuristically computed distances between functions to make more accurate predictions of protein function in networks derived from both physical interactions and a combination of other data types. To do this, we give a new technique (based on convex optimization) for converting heuristic semimetric distances into a metric with minimum least-squared distortion (LSD). The Metric Labeling approach is shown to outperform five existing techniques for inferring function from networks. These results suggest Metric Labeling is useful for protein function prediction, and that LSD minimization can help solve the problem of converting heuristic distances to a metric.
- Published
- 2011
7. MMRF for Proteome Annotation Applied to Human Protein Disease Prediction
- Author
-
Agapito Ledezma, Beatriz García-Jiménez, and Araceli Sanchis
- Subjects
Biological data, Protein family, Computer science, business.industry, Relational data mining, Machine learning, computer.software_genre, Annotation, RDM, Protein Annotation, Interaction network, Proteome, Artificial intelligence, business, computer
- Abstract
Knowing the biological processes in which every gene and protein participates is essential for designing disease treatments. These annotations are still unknown for many genes and proteins. Since making annotations from in-vivo experiments is costly, computational predictors are needed for different kinds of annotation such as metabolic pathway, interaction network, protein family, tissue, disease and so on. Biological data has an intrinsic relational structure, including genes and proteins, which can be grouped by many criteria. This hinders the possibility of finding good hypotheses when attribute-value representation is used. Hence, we propose the generic Modular Multi-Relational Framework (MMRF) to predict different kinds of gene and protein annotation using Relational Data Mining (RDM). The specific MMRF application to annotate human proteins with diseases verifies that group knowledge (mainly protein-protein interaction pairs) improves the prediction, in particular doubling the area under the precision-recall curve.
- Published
- 2011
8. Detection of Protein Domains in Eukaryotic Genome Sequences
- Author
-
Arli Aditya Parikesit, Peter F. Stadler, and Sonja J. Prohaska
- Subjects
Genetics, Annotation, Protein Annotation, Eukaryotic genome, Protein domain, Genome project, Computational biology, Biology, Gene, Genome, Organism
- Abstract
Large-scale studies of the origins and evolution of regulatory mechanisms require quantitative estimates of the abundance and co-occurrence of functional protein domains in the genomes of very diverse organisms. Current databases, such as SUPERFAMILY, are not able to provide such quantitative data because of species-specific differences and biases in the existing transcript and protein annotations on which they are based. Here we show that the combination of de novo gene predictors and subsequent HMM-based annotation of SCOP domains in the predicted peptides leads to consistent estimates with acceptable accuracy.
- Published
- 2010
9. Integration and Mining of Genomic Annotations: Experiences and Perspectives in GFINDer Data Warehousing
- Author
-
Stefano Ceri, Marco Masseroli, and Alessandro Campi
- Subjects
Annotation, Web server, Information retrieval, Protein Annotation, Computer science, Federated databases, Data mining, Gene Annotation, computer.software_genre, Data type, computer, Data warehouse
- Abstract
Many tasks in bioinformatics require the comprehensive evaluation of different types of data, generally available in distributed and heterogeneous data sources. Several approaches, including federated databases, multidatabases and mediator-based systems, have been proposed to integrate data from multiple sources. Yet data warehousing seems to be the most adequate when numerous data need to be integrated, efficiently processed, and mined comprehensively. To support biological interpretation of high-throughput gene lists, we previously developed GFINDer (Genome Functional INtegrated Discoverer, http://www.bioinformatics.polimi.it/GFINDer/), a web server that statistically analyzes and mines functional and phenotypic gene annotations sparsely available in numerous databanks to highlight annotation categories significantly enriched or depleted in the considered gene lists. GFINDer includes a data warehouse that integrates gene and protein annotations of several organisms expressed through various controlled terminologies and ontologies. Here, we describe the GFINDer data warehouse and discuss the lessons learned in its construction and five-year maintenance and development.
- Published
- 2009
10. Automatic Classification of Enzyme Family in Protein Annotation
- Author
-
Cassia Trojahn dos Santos, Ney Lemke, and Ana L. C. Bazzan
- Subjects
Computer science, business.industry, Process (engineering), Enzyme Commission number, Genome project, Machine learning, computer.software_genre, Class (biology), Support vector machine, Annotation, Protein Annotation, Automatic indexing, Artificial intelligence, business, computer
- Abstract
Most of the tasks in genome annotation can be at least partially automated. Since this annotation is time-consuming, facilitating some parts of the process, thus freeing the specialist to carry out more valuable tasks, has been the motivation of many tools and annotation environments. In particular, annotation of protein function can benefit from knowledge about enzymatic processes. The use of sequence homology alone is not a good approach to derive this knowledge when there are only a few homologues of the sequence to be annotated. The alternative is to use motifs. This paper uses a symbolic machine learning approach to derive rules for the classification of enzymes according to the Enzyme Commission (EC). Our results show that, for the top class, the average global classification error is 3.13%. Our technique also produces a set of rules relating structural to functional information, which is important to understand the protein three-dimensional structure and determine its biological function.
- Published
- 2009
11. Implementing an Interactive Web-Based DAS Client
- Author
-
Xavier Messeguer and Bernat Gel
- Subjects
Multimedia, Computer science, business.industry, computer.software_genre, Visualization, World Wide Web, Annotation, Interactivity, Protein Annotation, Web application, User interface, Zoom, business, computer, Interactive visualization
- Abstract
The Distributed Annotation System (DAS) allows clients to access many dispersed genome and protein annotation sources in a coordinated manner. Here we present DASGenExp, a web-based DAS client for interactive visualisation and exploration of genome-based annotations, inspired by the Google Maps user interface. The client is easy to use and intuitive, and integrates some unique functions not found in other DAS clients: interactivity, multiple genomes at the same time, arbitrary zoom windows, etc. DASGenExp can be freely accessed at http://gralggen.lsi.upc.edu/recerca/DASgenexp/ .
- Published
- 2008
12. Protein Function Prediction Based on Patterns in Biological Networks
- Author
-
Mustafa Kirac and Gultekin Ozsoyoglu
- Subjects
Computational biology, Biology, computer.software_genre, Annotation, ComputingMethodologies_PATTERNRECOGNITION, Protein Annotation, Protein Interaction Networks, False positive paradox, Protein function prediction, Data mining, Critical Assessment of Function Annotation, Precision and recall, computer, Biological network
- Abstract
In this paper, we propose a pattern-based protein function annotation framework, employing protein interaction networks, to predict annotation functions of proteins. More specifically, we first detect patterns that appear in the neighborhood of proteins with a particular functionality, and then transfer annotations between two proteins only if they have similar annotation patterns. We show that, in comparison with other techniques, our approach predicts protein annotations more effectively. Our technique (a) produces the highest prediction accuracy of 70-80% precision and recall for different organism specific datasets, and (b) is robust to false positives in protein interaction networks.
- Published
- 2008
13. What’s New? What’s Certain? – Scoring Search Results in the Presence of Overlapping Data Sources
- Author
-
Silke Trißl, Philipp Hussels, and Ulf Leser
- Subjects
SQL, Information retrieval, Computer science, Rank (computer programming), Subject (documents), computer.software_genre, Degree (music), Strength of evidence, Ranking, Protein Annotation, Data mining, computer, Data integration, computer.programming_language
- Abstract
Data integration projects in the life sciences often gather data on a particular subject from multiple sources. Some of these sources overlap to a certain degree. Therefore, integrated search results may be supported by one, few, or all data sources. To reflect these differences, results should be ranked according to the number of data sources that support them. What such a ranking should look like is not clear per se. Either results supported by only few sources are ranked high because this information is potentially new, or such results are ranked low because the strength of evidence supporting them is limited. We present two scoring schemes to rank search results in the integrated protein annotation database Columba. We define a surprisingness score, preferring results supported by few sources, and a confidence score, preferring frequently encountered information. Unlike many other scoring schemes, our proposal is purely data-driven and does not require users to specify preferences among sources. Both scores take the concrete overlaps of data sources into account and do not presume statistical independence. We show how our schemes have been implemented efficiently using SQL.
- Published
- 2007
14. Protein Annotation by Secondary Structure Based Alignments (PASSTA)
- Author
-
Jens Stoye and Constantin Bannert
- Subjects
Protein Annotation, Computer science, Protein Data Bank (RCSB PDB), A protein, computer.file_format, Data mining, Protein Data Bank, computer.software_genre, computer, Protein secondary structure
- Abstract
Most software tools in homology recognition on proteins answer only a few specific questions, often leaving not much room for the interpretation of the results. We develop a software tool, Passta, that helps to decide whether a protein sequence is related to a protein with known structure. Our approach may indicate rearrangements and duplications, and it displays information from different sources in an integrated fashion. Our approach is to first break each sequence of the Protein Data Bank (PDB) into Secondary Structure Elements (SSEs). Given a query sequence, our goal is then to 'explain' it by SSE sequences as well as possible. Therefore, we use the Waterman-Eggert algorithm to compute pairwise alignments of SSE sequences with the query. In a graph-based approach, we then select those alignments that reproduce the query in an optimal way. We discuss two examples to illustrate the potential (and possible pitfalls) of the method.
- Published
- 2005
15. Computational Life Sciences
- Author
-
Kay Diederichs, Oliver Kohlbacher, Michael R. Berthold, Robert C. Glen, and Ingrid Fischer
- Subjects
Support vector machine, Multiple sequence alignment, Structural biology, Protein Annotation, Computer science, Systems biology, Context (language use), Genomics, Data mining, computer.software_genre, Proteomics, computer
- Abstract
Systems Biology.- Structural Protein Interactions Predict Kinase-Inhibitor Interactions in Upregulated Pancreas Tumour Genes Expression Data.- Biochemical Pathway Analysis via Signature Mining.- Recurrent Neuro-fuzzy Network Models for Reverse Engineering Gene Regulatory Interactions.- Data Analysis and Integration.- Some Applications of Dummy Point Scatterers for Phasing in Macromolecular X-Ray Crystallography.- BioRegistry: A Structured Metadata Repository for Bioinformatic Databases.- Robust Perron Cluster Analysis for Various Applications in Computational Life Science.- Structural Biology.- Multiple Alignment of Protein Structures in Three Dimensions.- Protein Annotation by Secondary Structure Based Alignments (PASSTA).- MAPPIS: Multiple 3D Alignment of Protein-Protein Interfaces.- Genomics.- Frequent Itemsets for Genomic Profiling.- Gene Selection Through Sensitivity Analysis of Support Vector Machines.- The Breakpoint Graph in Ciliates.- Computational Proteomics.- ProSpect: An R Package for Analyzing SELDI Measurements Identifying Protein Biomarkers.- Algorithms for the Automated Absolute Quantification of Diagnostic Markers in Complex Proteomics Samples.- Detection of Protein Assemblies in Crystals.- Molecular Informatics.- Molecular Similarity Searching Using COSMO Screening Charges (COSMO/3PP).- Increasing Diversity in In-silico Screening with Target Flexibility.- Multiple Semi-flexible 3D Superposition of Drug-Sized Molecules.- Molecular Structure Determination and Simulation.- Efficiency Considerations in Solving Smoluchowski Equations for Rough Potentials.- Fast and Accurate Structural RNA Alignment by Progressive Lagrangian Optimization.- Visual Analysis of Molecular Conformations by Means of a Dynamic Density Mixture Model.- Distributed Data Mining.- Distributed BLAST in a Grid Computing Context.- Parallel Tuning of Support Vector Machine Learning Parameters for Large and Unbalanced Data Sets.- The Architecture of a Proteomic Network in the Yeast.
- Published
- 2005
16. Columba: Multidimensional Data Integration of Protein Annotations
- Author
-
Thomas Steinke, Silke Trissl, Cornelius Frömmel, Heiko Müller, Ulf Leser, Ina Koch, Kristian Rother, and Robert Preissner
- Subjects
Information retrieval, business.industry, Computer science, computer.file_format, Protein Data Bank, computer.software_genre, Software, Protein Annotation, Data quality, Integrated database, Semantic integration, Data mining, Software architecture, business, computer, Information integration
- Abstract
We present COLUMBA, an integrated database of protein annotations. COLUMBA is centered around proteins whose structure has been resolved and adds as many annotations as possible to those proteins, describing their properties such as function, sequence, classification, textual description, participation in pathways, etc. Annotations are extracted from seven (soon eleven) external data sources. In this paper we describe the motivation for building COLUMBA, its integrational architecture and the software tools we developed for the integrated data sources and keeping COLUMBA up-to-date. We put special focus on two aspects: First, COLUMBA does not try to remove redundancies and overlaps in data sources, but views each data source as a proper dimension describing a protein. We explain the advantages of this approach compared to a tighter semantic integration as pursued in many other projects. Second, we highlight our current investigations regarding the quality of data in COLUMBA by identification of hot spots of poor data quality.
- Published
- 2004
Discovery Service for Jio Institute Digital Library