175 results on '"Andy Brass"'
Search Results
102. maxdLoad2 and maxdBrowse: standards-compliant tools for microarray experimental annotation, data management and dissemination
- Author
-
Helen Hulme, A. Joseph Wood, Norman Morrison, Andy Brass, Douglas B. Kell, Michael Wilson, David Hancock, Andrew Hayes, Giles Velarde, and Karim Nashar
- Subjects
Microarray; Computer science; Data management; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; World Wide Web; User-Computer Interface; Annotation; Structural Biology; Microarray databases; lcsh:QH301-705.5; Molecular Biology; Internet; Information Dissemination; business.industry; Microarray analysis techniques; Applied Mathematics; Microarray Analysis; Computer Science Applications; lcsh:Biology (General); Data exchange; Data Interpretation, Statistical; lcsh:R858-859.7; DNA microarray; business; Software
- Abstract
Background: maxdLoad2 is a relational database schema and Java® application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture, including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBrowse is a PHP web-application that makes contents of maxdLoad2 databases accessible via web-browser, the command-line and web-service environments. It thus acts as both a dissemination and data-mining tool. Results: maxdLoad2 presents an easy-to-use interface to an underlying relational database and provides a full complement of facilities for browsing, searching and editing. There is a tree-based visualization of data connectivity and the ability to explore the links between any pair of data elements, irrespective of how many intermediate links lie between them. Its principal novel features are: • the flexibility of the meta-data that can be captured, • the tools provided for importing data from spreadsheets and other tabular representations, • the tools provided for the automatic creation of structured documents, • the ability to browse and access the data via web and web-services interfaces. Within maxdLoad2 it is very straightforward to customise the meta-data that is being captured or change the definitions of the meta-data. These meta-data definitions are stored within the database itself, allowing client software to connect properly to a modified database without having to be specially configured. The meta-data definitions (configuration file) can also be centralized, allowing changes made in response to revisions of standards or terminologies to be propagated to clients without user intervention.
maxdBrowse is hosted on a web-server and presents multiple interfaces to the contents of maxd databases. maxdBrowse emulates many of the browse and search features available in the maxdLoad2 application via a web-browser. This allows users who are not familiar with maxdLoad2 to browse and export microarray data from the database for their own analysis. The same browse and search features are also available via command-line and SOAP server interfaces. This both enables scripting of data export for use embedded in data repositories and analysis environments, and allows access to the maxd databases via web-service architectures. Conclusion: maxdLoad2 (http://www.bioinf.man.ac.uk/microarray/maxd/) and maxdBrowse (http://dbk.ch.umist.ac.uk/maxdBrowse) are portable and compatible with all common operating systems and major database servers. They provide a powerful, flexible package for annotation of microarray experiments and a convenient dissemination environment. They are available for download and open sourced under the Artistic License.
- Published
- 2005
103. Analysis of the transcriptome of the protozoan Theileria parva using MPSS reveals that the majority of genes are transcriptionally active in the schizont stage
- Author
-
Lee R. Haines, David C. Hoyle, Trushar Shah, Roger Pelle, Evans L. N. Taracha, Charles Lu, Simon P. Graham, Malcolm J. Gardner, Terry W. Pearson, Richard P. Bishop, Andy Brass, Helen Hulme, Simon Kang’a, Brian Hass, Owen White, Jennifer R. Wortman, Vishvanath Nene, and Etienne P. de Villiers
- Subjects
Transcriptional Activation; Theileria parva; 030231 tropical medicine; Protozoan Proteins; Genomics; Genome; Article; Massively parallel signature sequencing; Transcriptome; 03 medical and health sciences; Open Reading Frames; 0302 clinical medicine; Complementary DNA; parasitic diseases; Genetics; Animals; RNA, Antisense; Gene; 030304 developmental biology; 0303 health sciences; biology; Sequence Analysis, RNA; Telomere; biology.organism_classification; Molecular biology; Open reading frame; Genome, Protozoan; RNA, Protozoan
- Abstract
Massively parallel signature sequencing (MPSS) was used to analyze the transcriptome of the intracellular protozoan Theileria parva. In total 1,095,000, 20 bp sequences representing 4371 different signatures were generated from T. parva schizonts. Reproducible signatures were identified within 73% of potentially detectable predicted genes and 83% had signatures in at least one MPSS cycle. A predicted leader peptide was detected on 405 expressed genes. The quantitative range of signatures was 4-52,256 transcripts per million (t.p.m.). Rare transcripts (
- Published
- 2005
104. TAMBIS: transparent access to multiple bioinformatics services
- Author
-
Norman W. Paton, Sean Bechhofer, Carole Goble, Andy Brass, Martin Peim, Gary Ng, Patricia G. Baker, and Robert Stevens
- Subjects
World Wide Web; Computer science; Interface (Java); Informatics; Mediation; Query formulation; Transparency (human–computer interaction); Ontology (information science); Bioinformatics; Turing; computer; Semantic heterogeneity; computer.programming_language
- Abstract
Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS) addresses the perennial problem of heterogeneity and distribution of bioinformatics resources in performing bioinformatics analyses. Asking questions of these resources usually requires multiple resources to be used and data transferred between those resources. A biologist using these resources needs much knowledge of which resources to use, where they are to be found, in which order they should be used, and how to overcome the heterogeneity between those resources. TAMBIS seeks to make this knowledge burden transparent by capturing knowledge about molecular biology and bioinformatics tasks in an ontology. The TAMBIS ontology acts as a global schema over diverse resources and drives a query formulation interface offering a common language over those resources. High-level, conceptual, source-independent queries are rewritten to concrete query plans. As a result of its transparency, TAMBIS frees a biologist from needing informatics knowledge to concentrate upon the biological question. Keywords: transparent access; ontology; mediation; semantic heterogeneity; distribution
- Published
- 2005
105. A Little Semantic Web Goes a Long Way in Biology
- Author
-
Ian Horrocks, Katherine Wolstencroft, Ulrike Sattler, Daniele Turi, Phillip Lord, Andy Brass, and Robert Stevens
- Subjects
Information retrieval; Computer science; business.industry; media_common.quotation_subject; Social Semantic Web; World Wide Web; Identification (information); ComputingMethodologies_PATTERNRECOGNITION; Description logic; e-Science; Quality (business); business; Semantic Web; media_common
- Abstract
We show how state-of-the-art Semantic Web technology can be used in e-Science, in particular, to automate the classification of proteins in biology. We show that the resulting classification was of comparable quality to that performed by a human expert, and how investigations using the classified data even resulted in the discovery of significant information that had previously been overlooked, leading to the identification of a possible drug-target.
- Published
- 2005
106. Constructing ontology-driven protein family databases
- Author
-
Robert Stevens, Andy Brass, R. Mcentire, Katy Wolstencroft, and Lydia Tabernero
- Subjects
Statistics and Probability; Protein family; Information Storage and Retrieval; Documentation; Ontology (information science); Biology; computer.software_genre; Biochemistry; User-Computer Interface; Resource (project management); Description logic; Computer Graphics; Databases, Protein; Molecular Biology; Phylogeny; computer.programming_language; Natural Language Processing; Structure (mathematical logic); Biological data; Database; business.industry; Proteins; DAML+OIL; Phosphoric Monoester Hydrolases; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Database Management Systems; The Internet; ATP-Binding Cassette Transporters; Periodicals as Topic; business; computer
- Abstract
Motivation: Protein family databases provide a central focus for scientific communities as well as providing useful resources to aid research. However, such resources require constant curation and often become outdated and discontinued. We have developed an ontology-driven system for capturing and managing protein family data that addresses the problems of maintenance and sustainability. Results: Using protein phosphatases and ABC transporters as model protein families, we constructed two protein family database resources around a central DAML+OIL ontology. Each resource contains specialist information about each protein family, providing specialized domain-specific resources based on the same template structure. The formal structure, combined with the extraction of biological data using GO terms, allows for automated update strategies. Despite the functional differences between the two protein families, the ontology model was equally applicable to both, demonstrating the generic nature of the system. Availability: The protein phosphatase resource, PhosphaBase, is freely available on the internet (http://www.bioinf.man.ac.uk/phosphabase). The DAML+OIL ontology for the protein phosphatases and the ABC transporters is available on request from the authors. Contact: kwolstencroft@cs.man.ac.uk
- Published
- 2004
107. PhosphaBase: an ontology-driven database resource for protein phosphatases
- Author
-
Katherine Wolstencroft, Lydia Tabernero, Robert Stevens, and Andy Brass
- Subjects
Proteomics; Protein family; Phosphatase; Information Storage and Retrieval; Ontology (information science); Biology; Information repository; computer.software_genre; Biochemistry; Domain (software engineering); Automation; Resource (project management); Structural Biology; Databases, Genetic; Phosphoprotein Phosphatases; Animals; Humans; Phosphorylation; Databases, Protein; Molecular Biology; Natural Language Processing; Database; Kinase; Computational Biology; Protein Structure, Tertiary; Databases as Topic; Database Management Systems; Programming Languages; computer; Protein Kinases
- Abstract
PhosphaBase is an ontology-driven database resource containing information on the protein phosphatase family. It is the first public resource dedicated to protein phosphatases, which are enzymes that perform dephosphorylation reactions. In conjunction with the phosphorylation action of protein kinases, phosphatases are involved in important control and communication mechanisms in the cell. They have also been implicated in many human diseases, including diabetes and obesity, cancers, and neurodegenerative conditions. PhosphaBase aims to centralize the growing base of knowledge in the phosphatase research domain. The resource is built around a formal, domain-specific DAML+OIL ontology, and the data are collected from heterogeneous biological sources using Gene Ontology terms as a means of data extraction. The overall ontology-driven architecture provides a robust structure with distinct advantages for sustainability and provides the potential for the development of diagnostic tools, as well as a data repository.
- Published
- 2004
108. Coping with cold: An integrative, multitissue analysis of the transcriptome of a poikilothermic vertebrate
- Author
-
Andrew Y. Gracey, Andrew R. Cossins, E. Jane Fraser, Jane Rogers, Weizhong Li, Ruth R. Taylor, Yongxiang Fang, and Andy Brass
- Subjects
Genetics; Candidate gene; Multidisciplinary; Carps; DNA, Complementary; Microarray; Gene Expression Profiling; Molecular Sequence Data; Computational biology; Biology; Biological Sciences; Phenotype; Adaptation, Physiological; Transcriptome; Gene expression profiling; Cold Temperature; Metagenomics; Complementary DNA; Animals; RNA, Messenger; DNA Probes; Gene; Oligonucleotide Array Sequence Analysis
- Abstract
How do organisms respond adaptively to environmental stress? Although some gene-specific responses have been explored, others remain to be identified, and there is a very poor understanding of the system-wide integration of response, particularly in complex, multitissue animals. Here, we adopt a transcript screening approach to explore the mechanisms underpinning a major, whole-body phenotypic transition in a vertebrate animal that naturally experiences extreme environmental stress. Carp were exposed to increasing levels of cold, and responses across seven tissues were assessed by using a microarray composed of 13,440 cDNA probes. A large set of unique cDNAs (≈3,400) were affected by cold. These cDNAs included an expression signature common to all tissues of 252 up-regulated genes involved in RNA processing, translation initiation, mitochondrial metabolism, proteasomal function, and modification of higher-order structures of lipid membranes and chromosomes. Also identified were large numbers of transcripts with highly tissue-specific patterns of regulation. By unbiased profiling of gene ontologies, we have identified the distinctive functional features of each tissue's response and integrate them into a comprehensive view of the whole-body transition from one strongly adaptive phenotype to another. This approach revealed an expression signature suggestive of atrophy in cooled skeletal muscle. This environmental genomics approach by using a well studied but nongenomic species has identified a range of candidate genes endowing thermotolerance and reveals a previously unrecognized scale and complexity of responses that impacts at the level of cellular and tissue function.
- Published
- 2004
109. Comparison of TFII-I gene family members deleted in Williams-Beuren syndrome
- Author
-
Timothy A. Hinsley, Pamela Cunliffe, May Tassabehji, Andy Brass, and Hannah Tipney
- Subjects
Gene isoform; Williams Syndrome; Leucine zipper; Amino Acid Motifs; Molecular Sequence Data; SUMO protein; Muscle Proteins; Sequence alignment; Chromosomal rearrangement; Biology; Biochemistry; Homology (biology); Article; Transcription Factors, TFII; Transcription Factors, TFIII; Gene family; Humans; Protein Isoforms; Amino Acid Sequence; Molecular Biology; Gene; Phylogeny; Genetics; Leucine Zippers; Sequence Homology, Amino Acid; Computational Biology; Nuclear Proteins; Exons; Trans-Activators; Sequence Alignment; Chromosomes, Human, Pair 7; Gene Deletion
- Abstract
Williams-Beuren syndrome (WBS) is a neurological disorder resulting from a microdeletion, typically 1.5 megabases in size, at 7q11.23. Atypical patients implicate genes at the telomeric end of this multigene deletion as the main candidates for the pathology of WBS, in particular the unequal cognitive profile associated with the condition. We recently identified a gene (GTF2IRD2) that shares homology with other members of a unique family of transcription factors (TFII-I family), which reside in the critical telomeric region. Using bioinformatics tools, this study focuses on the detailed assessment of this gene family, concentrating on their characteristic structural components such as the leucine zipper (LZ) and I-repeat elements, in an attempt to identify features that could aid functional predictions. Phylogenetic analysis identified distinct I-repeat clades shared between family members. Linking functional data to one such clade has implicated them in DNA binding. The identification of PEST, synergy control motifs, and sumoylation sites common to all family members suggests a shared mechanism regulating the stability and transcriptional activity of these factors. In addition, the identification/isolation of short truncated isoforms for each TFII-I family member implies a mode of self-regulation. The exceptionally high identity shared between GTF2I and GTF2IRD2 suggests that heterodimers as well as homodimers are possible, and indicates overlapping functions between their respective short isoforms. Such cross-reactivity between GTF2I and GTF2IRD2 short isoforms might have been the evolutionary driving force for the 7q11.23 chromosomal rearrangement not present in the syntenic region in mice.
- Published
- 2004
110. PEDRo: A database for storing, searching and disseminating experimental proteomics data
- Author
-
Simon J. Gaskell, Zhikang Yin, Kathleen M. Carroll, Julie Howard, Keith F. Chater, Chris F. Taylor, Sarah R. Hart, Alistair J. P. Brown, Caroline A. Evans, Norman Morrison, Lena Hansson, Norman W. Paton, Stephen G. Oliver, David Stead, Kathryn S. Lilley, Andy Brass, Muriel Mewissen, Thomas McLaughlin, Peter Ghazal, Kevin Garwood, Andrew Hesketh, Anthony D. Whetton, Chris Garwood, Scott Joens, and Simon J. Hubbard
- Subjects
Proteomics; Saccharomyces cerevisiae Proteins; lcsh:QH426-470; GeneralLiterature_INTRODUCTORYANDSURVEY; lcsh:Biotechnology; Data management; Candida glabrata; Streptomyces coelicolor; Biology; computer.software_genre; Database; Fungal Proteins; Mice; 03 medical and health sciences; Upload; 0302 clinical medicine; Bacterial Proteins; Software Design; lcsh:TP248.13-248.65; Candida albicans; Genetics; Human proteome project; Animals; Databases, Protein; Trichinella spiralis; 030304 developmental biology; 0303 health sciences; Fungal protein; Arabidopsis Proteins; business.industry; QH; Computational Biology; Proteins; Experimental data; Trichinellosis; Helminth Proteins; Jejunal Diseases; lcsh:Genetics; ComputingMethodologies_PATTERNRECOGNITION; Data model; 030220 oncology & carcinogenesis; Proteome; Database Management Systems; business; computer; Biotechnology
- Abstract
Background: Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets. Results: This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available. Conclusions: The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.
- Published
- 2004
111. Exploring Williams-Beuren syndrome using myGrid
- Author
-
Carole Goble, Hannah Tipney, Andy Brass, Chris Wroe, Tom Oinn, Martin Senger, Robert Stevens, Phillip Lord, and May Tassabehji
- Subjects
Statistics and Probability; Williams Syndrome; Exploit; Virtual organization; In silico; Context (language use); Biology; computer.software_genre; Biochemistry; World Wide Web; User-Computer Interface; Computer Graphics; Genetic Predisposition to Disease; Molecular Biology; Internet; business.industry; Chromosome Mapping; Sequence Analysis, DNA; Data science; Computer Science Applications; Computational Mathematics; Workflow; Semantic grid; Computational Theory and Mathematics; Middleware (distributed applications); The Internet; business; computer; Algorithms; Software
- Abstract
Motivation: In silico experiments necessitate the virtual organization of people, data, tools and machines. The scientific process also necessitates an awareness of the experience base, both of personal data as well as the wider context of work. The management of all these data and the co-ordination of resources to manage such virtual organizations and the data surrounding them needs significant computational infrastructure support. Results: In this paper, we show that myGrid, middleware for the Semantic Grid, enables biologists to perform and manage in silico experiments, then explore and exploit the results of their experiments. We demonstrate myGrid in the context of a series of bioinformatics experiments focused on a 1.5 Mb region on chromosome 7 which is deleted in Williams-Beuren syndrome (WBS). Due to the highly repetitive nature of sequence flanking/in the WBS critical region (WBSCR), sequencing of the region is incomplete, leaving documented gaps in the released sequence. myGrid was used in a series of experiments to find newly sequenced human genomic DNA clones that extended into these 'gap' regions in order to produce a complete and accurate map of the WBSCR. Once placed in this region, these DNA sequences were analysed with a battery of prediction tools in order to locate putative genes and regulatory elements possibly implicated in the disorder. Finally, any genes discovered were submitted to a range of standard bioinformatics tools for their characterization. We report how myGrid has been used to create workflows for these in silico experiments, run those workflows regularly and notify the biologist when new DNA and genes are discovered. The myGrid services collect and co-ordinate data inputs and outputs for the experiment, as well as much provenance information about the performance of experiments on WBS. Availability: The myGrid software is available via http://www.mygrid.org.uk
- Published
- 2004
112. Isolation and characterisation of GTF2IRD2, a novel fusion gene and member of the TFII-I family of transcription factors, deleted in Williams-Beuren syndrome
- Author
-
May Tassabehji, Hannah Tipney, Andy Brass, Timothy A. Hinsley, Dian Donnai, and Kay Metcalfe
- Subjects
Transposable element; Williams Syndrome; congenital, hereditary, and neonatal diseases and abnormalities; Transcription, Genetic; Molecular Sequence Data; Muscle Proteins; Biology; Fusion gene; Mice; Transcription Factors, TFII; Transcription Factors, TFIII; Gene Duplication; Gene duplication; Genetics; Animals; Humans; Deletion mapping; Amino Acid Sequence; Gene; Genetics (clinical); Transposase; Base Sequence; Helix-Loop-Helix Motifs; Chromosome; Chromosome Mapping; Nuclear Proteins; Artificial Gene Fusion; DNA-Binding Proteins; Trans-Activators; Sequence Alignment; Chromosomes, Human, Pair 7; Gene Deletion
- Abstract
Williams-Beuren syndrome (WBS) is a developmental disorder with characteristic physical, cognitive and behavioural traits caused by a microdeletion of approximately 1.5 Mb on chromosome 7q11.23. In total, 24 genes have been described within the deleted region to date. We have isolated and characterised a novel human gene, GTF2IRD2, mapping to the WBS critical region thought to harbour genes important for the cognitive aspects of the disorder. GTF2IRD2 is the third member of the novel TFII-I family of genes clustered on 7q11.23. The GTF2IRD2 protein contains two putative helix-loop-helix regions (I-repeats) and an unusual C-terminal CHARLIE8 transposon-like domain, thought to have arisen as a consequence of the random insertion of a transposable element generating a functional fusion gene. The retention of a number of conserved transposase-associated motifs within the protein suggests that the CHARLIE8-like region may still have some degree of transposase functionality that could influence the stability of the region in a mechanism similar to that proposed for Charcot-Marie-Tooth neuropathy type 1A. GTF2IRD2 is highly conserved in mammals and the mouse orthologue (Gtf2ird2) has also been isolated and maps to the syntenic WBS region on mouse chromosome 5G. Deletion mapping studies using somatic cell hybrids show that some WBS patients are hemizygous for this gene, suggesting that it could play a role in the pathogenesis of the disorder.
- Published
- 2004
113. Pedro: a configurable data entry tool for XML
- Author
-
Chris F. Taylor, Kevin Garwood, Stephen G. Oliver, Norman W. Paton, Andy Brass, and Kai J. Runte
- Subjects
Statistics and Probability; Document Structure Description; computer.internet_protocol; Computer science; Efficient XML Interchange; Data field; Information Storage and Retrieval; Documentation; computer.software_genre; Biochemistry; Data modeling; User-Computer Interface; XML Schema Editor; Schema (psychology); Streaming XML; Databases, Genetic; Computer Graphics; XML schema; Molecular Biology; computer.programming_language; Natural Language Processing; Database; XML validation; computer.file_format; Computer Science Applications; XML framework; Computational Mathematics; XML Schema (W3C); XML database; Computational Theory and Mathematics; Data exchange; Database Management Systems; Programming Languages; computer; XML; Software
- Abstract
Summary: Pedro is a Java™ application that dynamically generates data entry forms for data models expressed in XML Schema, producing XML data files that validate against this schema. The software uses an intuitive tree-based navigation system, can supply context-sensitive help to users and features a sophisticated interface for populating data fields with terms from controlled vocabularies. The software also has the ability to import records from tab-delimited text files and features various validation routines. Availability: The application, source code, example models from several domains and tutorials can be downloaded from http://pedro.man.ac.uk/
- Published
- 2004
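The abstract of entry 113 mentions importing records from tab-delimited text files into XML data files. The general idea can be sketched in Python as below. This is only an illustration and not Pedro's own code (Pedro is a Java application): the function name `tsv_to_xml`, the element names `records` and `record`, and the sample fields are all hypothetical, and the sketch assumes that each column header is a valid XML element name. It performs no XML Schema validation.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tsv_to_xml(tsv_text, root_tag="records", record_tag="record"):
    """Convert tab-delimited text (first row = field names) to an XML string.

    Hypothetical helper: tag names are illustrative, and column headers
    are assumed to be valid XML element names.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    root = ET.Element(root_tag)
    for row in reader:
        # One <record> element per data row, one child element per column.
        record = ET.SubElement(root, record_tag)
        for field, value in row.items():
            ET.SubElement(record, field).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample data with two fields and two records.
sample = "name\tmass\nP1\t12.5\nP2\t30.1\n"
xml_text = tsv_to_xml(sample)
print(xml_text)
```

A real import step would additionally check the generated document against the target schema before accepting it, which the Python standard library does not provide.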
114. A systematic approach to modeling, capturing, and disseminating proteomics experimental data
- Author
-
Alistair J. P. Brown, Kevin Garwood, Ruedi Aebersold, Laura Selway, Norman W. Paton, Eric W. Deutsch, Stephen G. Oliver, Janet Walker, John R. Yates, David Stead, Michael J. Deery, Chris F. Taylor, Phil Cash, Kathryn S. Lilley, Andy Brass, Isabel Riba-Garcia, Peter Roepstorff, P. Kirby, Douglas B. Kell, Julie Howard, Shabaz Mohammed, Tom Dunkley, Simon J. Hubbard, Simon J. Gaskell, and Zhikang Yin
- Subjects
Models, Molecular; Proteomics; SQL; Knowledge representation and reasoning; Protein Conformation; computer.internet_protocol; Computer science; GeneralLiterature_INTRODUCTORYANDSURVEY; Biomedical Engineering; Information Storage and Retrieval; Bioengineering; Documentation; Bioinformatics; Applied Microbiology and Biotechnology; User-Computer Interface; Unified Modeling Language; Sequence Analysis, Protein; Software Design; Databases, Protein; Dissemination; computer.programming_language; Information Dissemination; Minimum information about a microarray experiment; Proteins; Hypermedia; Data science; ComputingMethodologies_PATTERNRECOGNITION; Database Management Systems; Molecular Medicine; Object model; Software design; computer; Software; XML; Biotechnology
- Abstract
Both the generation and the analysis of proteome data are becoming increasingly widespread, and the field of proteomics is moving incrementally toward high-throughput approaches. Techniques are also increasing in complexity as the relevant technologies evolve. A standard representation of both the methods used and the data generated in proteomics experiments, analogous to that of the MIAME (minimum information about a microarray experiment) guidelines for transcriptomics, and the associated MAGE (microarray gene expression) object model and XML (extensible markup language) implementation, has yet to emerge. This hinders the handling, exchange, and dissemination of proteomics data. Here, we present a UML (unified modeling language) approach to proteomics experimental data, describe XML and SQL (structured query language) implementations of that model, and discuss capture, storage, and dissemination strategies. These make explicit what data might be most usefully captured about proteomics experiments and provide complementary routes toward the implementation of a proteome repository.
- Published
- 2003
115. Semantic similarity measures as tools for exploring the gene ontology
- Author
-
Carole Goble, Phillip Lord, Robert Stevens, and Andy Brass
- Subjects
Proteomics; Information retrieval; business.industry; Computer science; Ontology-based data integration; Suggested Upper Merged Ontology; Computational Biology; Genomics; Ontology (information science); computer.software_genre; Classification; Open Biomedical Ontologies; Semantic similarity; Semantic computing; Upper ontology; Humans; Artificial intelligence; business; Databases, Protein; computer; Ontology alignment; Sequence Alignment; Natural language processing
- Abstract
Many bioinformatics resources hold data in the form of sequences. Often this sequence data is associated with a large amount of annotation. In many cases this data has been hard to model, and has been represented as scientific natural language, which is not readily computationally amenable. The development of the Gene Ontology provides us with a more accessible representation of some of this data. However, it is not clear how this data can best be searched, or queried. Recently we have adapted information-content-based measures for use with the Gene Ontology (GO). In this paper we present a detailed investigation of the properties of these measures, and examine various properties of GO, which may have implications for its future design.
- Published
- 2003
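The abstract of entry 115 refers to information-content-based similarity measures over the Gene Ontology without naming them in this listing. One widely used measure of that kind is Resnik similarity, in which the information content of a term t is IC(t) = -log p(t) and the similarity of two terms is the largest IC among their shared ancestors. The Python sketch below assumes that measure for illustration only. The toy ontology, term names and annotation counts are all made up and are not real GO data.

```python
import math

# Hypothetical toy ontology: term -> set of direct parents (not real GO terms).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 8,
    "dna_binding": 6,
    "rna_binding": 4,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probabilities():
    """p(t): fraction of all annotations made to t or to any descendant of t."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += count
    grand_total = sum(DIRECT_COUNTS.values())
    return {term: n / grand_total for term, n in totals.items()}


def resnik_similarity(t1, t2, probs):
    """Largest information content, -log p(t), over the shared ancestors."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(probs[t]) for t in common)


probs = term_probabilities()
print(resnik_similarity("dna_binding", "rna_binding", probs))
print(resnik_similarity("dna_binding", "catalysis", probs))
```

In this toy data the two binding terms share the fairly specific ancestor `binding`, so they score higher than a pair whose only shared ancestor is the root, which carries no information.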
116. TAMBIS Online: a bioinformatics source integration tool
- Author
-
Andy Brass, Sean Bechhofer, Patricia G. Baker, Norman W. Paton, Robert Stevens, Gary Ng, and Carole Goble
- Subjects
Public domain software; ComputingMethodologies_PATTERNRECOGNITION; Computer science; Information analysis; Heterogeneous information; Bioinformatics; Data structure; Data science
- Abstract
Conducting bioinformatic analyses involves biologists in expressing requests over a range of heterogeneous information sources. The TAMBIS (Transparent Access to Multiple Bioinformatics Information Sources) project seeks to make the diversity in data structures, call interfaces and locations of bioinformatics sources transparent to users. TAMBIS is available at .
- Published
- 2003
117. Complex Query Formulation Over Diverse Information Sources in TAMBIS
- Author
-
Patricia G. Baker, Robert Stevens, Norman W. Paton, Carole Goble, Gary Ng, Andy Brass, and Sean Bechhofer
- Subjects
Query expansion; Information retrieval; Web search query; Process (engineering); Computer science; business.industry; Ontology; The Internet; Layer (object-oriented design); Ontology (information science); Query language; business
- Abstract
Biologists increasingly need to ask complex questions over the large number of data and analysis tools that are available on the Internet. To do this, the individual resources need to be made to work together. The knowledge needed to accomplish this, for example about the locations of the sources and their capabilities, places barriers between biologists and the questions they would like to ask. The TAMBIS project (Transparent Access to Multiple Bioinformatics Information Sources) has sought to remove some of these barriers, thereby making the process of asking questions against multiple sources more straightforward. Central to the TAMBIS system is an ontology of bioinformatics and biological terms. Users express retrieval requests in terms of the concepts and relationships described in the ontology, rather than by making direct reference to individual sources. This allows TAMBIS to be used to formulate rich, declarative queries over multiple sources. The ontology is constructed in a manner that ensures only biologically meaningful queries can be posed. Users’ queries are constructed using an interactive ontology browsing and query construction tool, and are rewritten by a query planner for evaluation using a wrapper layer. This paper provides an overview of the TAMBIS approach to source integration, focusing on the way the ontology is used to support query formulation and refinement.
- Published
- 2003
118. Contributors
- Author
-
Patricia Baker, Simon Beaulah, Sean Bechhofer, Andy Brass, John Campbell, I-Min A. Chen, Jing Chen, Su Yun Chung, Terence Critchlow, Susan B. Davidson, Barbara A. Eckman, Thure Etzold, Carole Goble, Peter M.D. Gray, Amarnath Gupta, Laura M. Haas, Howard Harris, Scott Harker, Graham J.L. Kemp, Prasad Kodali, Anthony Kosky, Zoé Lacroix, Eileen T. Lin, Bertram Ludäscher, Maryann E. Martone, Victor M. Markowitz, Gary Ng, Krishna Palaniappan, Norman W. Paton, Julia E. Rice, Peter M. Schwarz, Robert Stevens, Val Tannen, Thodoros Topaloglou, Limsoon Wong, and John C. Wooley
- Published
- 2003
119. Making sense of microarray data distributions
- Author
-
David C. Hoyle, Ray Jupp, Andy Brass, and Magnus Rattray
- Subjects
Statistics and Probability ,Expected value ,Biology ,Biochemistry ,Power law ,Sensitivity and Specificity ,Pattern Recognition, Automated ,Benford's law ,Statistics ,Databases, Genetic ,Animals ,Humans ,Statistical physics ,RNA, Messenger ,Molecular Biology ,Genome size ,Oligonucleotide Array Sequence Analysis ,Analysis of Variance ,Chi-Square Distribution ,Genome ,Models, Statistical ,Zipf's law ,Models, Genetic ,Microarray analysis techniques ,Spot intensity ,Mixed cell ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Algorithms - Abstract
Motivation: Typical analysis of microarray data has focused on spot by spot comparisons within a single organism. Less analysis has been done on the comparison of the entire distribution of spot intensities between experiments and between organisms. Results: Here we show that mRNA transcription data from a wide range of organisms and measured with a range of experimental platforms show close agreement with Benford’s law (Benford, Proc. Am. Phil. Soc., 78, 551–572, 1938) and Zipf’s law (Zipf, The Psycho-biology of Language: an Introduction to Dynamic Philology, 1936 and Human Behaviour and the Principle of Least Effort, 1949). The distribution of the bulk of microarray spot intensities is well approximated by a log-normal with the tail of the distribution being closer to power law. The variance, σ², of log spot intensity shows a positive correlation with genome size (in terms of number of genes) and is therefore relatively fixed within some range for a given organism. The measured value of σ² can be significantly smaller than the expected value if the mRNA is extracted from a sample of mixed cell types. Our research demonstrates that useful biological findings may result from analyzing microarray data at the level of entire intensity distributions. Contact: david.c.hoyle@man.ac.uk
- Published
- 2002
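The Benford's-law comparison mentioned in the abstract of record 119 can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the input is assumed to be a plain list of positive spot intensities. It tabulates observed leading-digit frequencies so they can be set beside the Benford expectation P(d) = log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digit d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of a positive number.

    Scientific notation puts the leading digit first, e.g. '1.2345e+03'.
    """
    return int(f"{x:.15e}"[0])


def leading_digit_frequencies(values):
    """Fraction of positive values starting with each digit 1..9.

    Non-positive values are skipped (hypothetical handling; a real
    pipeline would decide how to treat background-subtracted zeros).
    """
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


if __name__ == "__main__":
    # Toy intensities, purely illustrative (not real microarray data).
    observed = leading_digit_frequencies([1, 10, 100, 2, 35.5, 1200.0])
    expected = benford_expected()
    for d in range(1, 10):
        print(d, round(observed[d], 3), round(expected[d], 3))
```

With real data, the two columns printed by the demo could be compared with a goodness-of-fit statistic; the choice of statistic is left open here.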
120. LB-006 Quality Of Methods Reporting In Colitis Experiments And The Subsequent Impact On The Development Of A Gut Knowledge Base
- Author
-
Sheena M. Cruickshank, James M. Wilson, Oscar Florez-Vargas, Robert Stevens, Andy Brass, and Michael Bramhall
- Subjects
Annotation ,Knowledge base ,business.industry ,Gene ontology ,media_common.quotation_subject ,Gastroenterology ,Medicine ,Domain knowledge ,Quality (business) ,business ,Data science ,Checklist ,media_common - Abstract
Introduction Current gastroenterological research generates vast quantities of clinical or animal model-derived data that is necessary to the study of IBD, yet much of it lies unused in repositories after publication. There is a growing need to develop more appropriate, structured storage systems that can be used by researchers to query this wealth of information with emerging questions. A domain-specific knowledge base combining clinical and laboratory datasets, annotated and connected via ontological terms, would be a valuable tool for gastroenterologists. However, development of a gut domain knowledge base may be severely impeded by incomplete methods reporting in the literature. This is further compounded by two factors. Firstly, non-domain experts, who may not be well suited to identifying missing methods, usually undertake construction of a knowledge base. Secondly, variations in experimental protocols may make it difficult for investigators to compare results from similar, but not identical, experiments. Methods In order to address these issues we have systematically collated published papers that employed the widely used DSS colitis model. The papers were assessed against a checklist of essential parameters that should be reported for the experiment to be accurately described to allow for correct annotation and entry into a knowledge base. Results We provide a comprehensive review of the quality of methods reporting in experiments using the DSS colitis model. We also report on the heterogeneity of the DSS colitis model currently in use. Conclusion We provide a number of recommendations in order for researchers to standardise their methods and ensure that all relevant factors are reported during the publication of their research. Building from this, we will use what we have learnt to better inform the construction of a gut domain knowledge base. Disclosure of Interest M. Bramhall Grant/research support from: Epistem Ltd., O. Florez-Vargas: None Declared, R. Stevens: None Declared, J. Wilson: None Declared, S. Cruickshank: None Declared, A. Brass: None Declared.
- Published
- 2014
121. GIMS-a data warehouse for storage and analysis of genome sequence and functional data
- Author
-
Karen Eilbeck, Stephen G. Oliver, Norman W. Paton, P. Kirby, Andy Brass, Crispin J. Miller, Shengli Wu, Andrew Hayes, Mike Cornell, and Carole Goble
- Subjects
Whole genome sequencing ,Management information systems ,Genomic data ,education ,Upstream (networking) ,Data mining ,Biology ,Object (computer science) ,computer.software_genre ,Gene ,Genome ,computer ,Data warehouse - Abstract
Effective analysis of genome sequences and associated functional data requires access to many different kinds of biological information. For example, when analysing gene expression data, it may be useful to have access to the sequences upstream of the genes, or to the cellular location of their protein products. Such information is currently stored in different formats at different sites in a way that does not readily allow integrated analyses. The Genome Information Management System (GIMS) is an object database that integrates genome sequence data with functional data on the transcriptome and on protein-protein interactions in a single data warehouse. We have used GIMS to store the Saccharomyces cerevisiae (yeast) genome and to demonstrate how the integrated storage of diverse kinds of genomic data can be beneficial for analysing data using context-rich queries and analyses. GIMS allows data to be stored in a way that reflects the underlying mechanisms in the organism, and permits complex questions to be asked of the data. This paper provides an overview of the GIMS system and describes some analyses that illustrate its use for analysing functional data sets for S. cerevisiae.
- Published
- 2001
122. The fibrillar collagens, collagen VIII, collagen X and the C1q complement proteins share a similar domain in their C‐terminal non‐collagenous regions
- Author
-
Raymond P. Boot-Handford, Andy Brass, Karl E. Kadler, M E Grant, and J. T. Thomas
- Subjects
chemistry.chemical_classification ,Biophysics ,Sequence alignment ,Cell Biology ,Biology ,medicine.disease ,Biochemistry ,Protein tertiary structure ,Schmid metaphyseal chondrodysplasia ,Conserved sequence ,Amino acid ,chemistry ,Structural Biology ,Genetics ,medicine ,Protein folding ,Glycoprotein ,Molecular Biology ,Complement C1q - Abstract
A sequence comparison of the C-termini of collagens X, VIII, the collagen-like complement factor C1q, and the fibrillar collagens showed a conserved cluster of aromatic residues. This conserved cluster was in a domain of approximately 130 amino acids that exhibited marked similarities in hydrophilicity profiles between the different collagens, despite a low level of sequence similarity. These data suggest that the ‘collagen X-like family’ and the fibrillar collagens contain a domain within their C-termini that adopts a common tertiary structure, and that a conserved cluster of aromatic residues in this domain may be involved in C-terminal trimerization.
- Published
- 1992
123. TAMBIS: transparent access to multiple bioinformatics information sources
- Author
-
Robert Stevens, Patricia Baker, Sean Bechhofer, Gary Ng, Alex Jacoby, Norman W. Paton, Carole A. Goble, and Andy Brass
- Subjects
Statistics and Probability ,Computer science ,business.industry ,Biological database ,Computational Biology ,Information Storage and Retrieval ,Construct (python library) ,computer.file_format ,Transparency (human–computer interaction) ,Bioinformatics ,Biochemistry ,Computer Science Applications ,Terminology ,Computational Mathematics ,Text mining ,Computational Theory and Mathematics ,Description logic ,Knowledge base ,Executable ,User interface ,business ,Molecular Biology ,computer ,Software - Abstract
The TAMBIS project aims to provide transparent access to disparate biological databases and analysis tools, enabling users to utilize a wide range of resources with the minimum of effort. A prototype system has been developed that includes a knowledge base of biological terminology (the biological Concept Model), a model of the underlying data sources (the Source Model) and a 'knowledge-driven' user interface. Biological concepts are captured in the knowledge base using a description logic called GRAIL. The Concept Model provides the user with the concepts necessary to construct a wide range of multiple-source queries, and the user interface provides a flexible means of constructing and manipulating those queries. The Source Model provides a description of the underlying sources and mappings between terms used in the sources and terms in the biological Concept Model. The Concept Model and Source Model provide a level of indirection that shields the user from source details, providing a high level of source transparency. Source independent, declarative queries formed from terms in the Concept Model are transformed into a set of source dependent, executable procedures. Query formulation, translation and execution is demonstrated using a working example.
- Published
- 2000
124. A RAPID algorithm for sequence database comparisons: application to the identification of vector contamination in the EMBL databases
- Author
-
Crispin J. Miller, Andy Brass, and John R. Gurd
- Subjects
Statistics and Probability ,Databases, Factual ,Computer science ,Nearest neighbor search ,Genetic Vectors ,Molecular Sequence Data ,Word error rate ,Sequence alignment ,Word search ,computer.software_genre ,Biochemistry ,chemistry.chemical_compound ,Similarity (network science) ,Molecular Biology ,Expressed Sequence Tags ,Sequence ,Expressed sequence tag ,Sequence database ,Base Sequence ,Nucleic acid sequence ,DNA ,Bacteriophage lambda ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,chemistry ,ROC Curve ,DNA Sequence Databases ,Evaluation Studies as Topic ,Test set ,DNA, Viral ,Data mining ,Algorithm ,computer ,Sequence Alignment ,Algorithms - Abstract
MOTIVATION: Word-matching algorithms such as BLAST are routinely used for sequence comparison. These algorithms typically use areas of matching words to seed alignments which are then used to assess the degree of sequence similarity. In this paper, we show that by formally separating the word-matching and sequence-alignment process, and using information about word frequencies to generate alignments and similarity scores, we can create a new sequence-comparison algorithm which is both fast and sensitive. The formal split between word searching and alignment allows users to select an appropriate alignment method without affecting the underlying similarity search. The algorithm has been used to develop software for identifying entries in DNA sequence databases which are contaminated with vector sequence. RESULTS: We present three algorithms, RAPID, PHAT and SPLAT, which together allow vector contaminations to be found and assessed extremely rapidly. RAPID is a word search algorithm which uses probabilities to modify the significance attached to different words; PHAT and SPLAT are alignment algorithms. An initial implementation has been shown to be approximately an order of magnitude faster than BLAST. The formal split between word searching and alignment not only offers considerable gains in performance, but also allows alignment generation to be viewed as a user interface problem, allowing the most useful output method to be selected without affecting the underlying similarity search. Receiver Operator Characteristic (ROC) analysis of an artificial test set allows the optimal score threshold for identifying vector contamination to be determined. ROC curves were also used to determine the optimum word size (nine) for finding vector contamination. An analysis of the entire expressed sequence tag (EST) subset of EMBL found a contamination rate of 0.27%. 
A more detailed analysis of the 50 000 ESTs in est10.dat (an EST subset of EMBL) finds an error rate of 0.86%, principally due to two large-scale projects. AVAILABILITY: A Web page for the software exists at http://bioinf.man.ac.uk/rapid, or it can be downloaded from ftp://ftp.bioinf.man.ac.uk/RAPID CONTACT: crispin@cs.man.ac.uk
- Published
- 1999
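The abstract of record 124 describes a word search that weights words by their probability before any alignment is attempted. The sketch below is a generic illustration of that idea, not the RAPID algorithm itself: the function names are hypothetical, and the scoring rule (summing -log frequency over shared words) is an assumption chosen for simplicity. The default word size of nine follows the optimum reported in the abstract.

```python
import math
from collections import Counter


def words(seq, k=9):
    """Return all overlapping words (k-mers) of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_frequencies(database, k=9):
    """Estimate the background frequency of each word over a list of sequences."""
    counts = Counter(w for seq in database for w in words(seq, k))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def weighted_score(query, target, freqs, k=9):
    """Score a query/target pair by the distinct words they share.

    Each shared word contributes -log(frequency), so a rare shared word
    counts for more than a common one (assumed weighting, for illustration).
    """
    shared = set(words(query, k)) & set(words(target, k))
    return sum(-math.log(freqs[w]) for w in shared if w in freqs)


if __name__ == "__main__":
    # Toy sequences with a short word size so the example stays readable.
    database = ["ACGTACGT", "TTTTTTTT"]
    freqs = word_frequencies(database, k=3)
    print(weighted_score("GTAC", "ACGTACGT", freqs, k=3))
    print(weighted_score("TTTT", "TTTTTTTT", freqs, k=3))
```

Keeping the scoring step separate from any alignment step mirrors the split the abstract emphasises: an alignment routine could be swapped in afterwards without touching the word search.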
125. Metaphyseal chondrodysplasia type Schmid mutations are predicted to occur in two distinct three-dimensional clusters within type X collagen NC1 domains that retain the ability to trimerize
- Author
-
Karl E. Kadler, Gillian A. Wallis, Raymond P. Boot-Handford, Carl A. Gregory, Debora S. Marks, and Andy Brass
- Subjects
Models, Molecular ,Molecular model ,Stereochemistry ,Mutant ,Molecular Sequence Data ,Trimer ,Osteochondrodysplasias ,Biochemistry ,chemistry.chemical_compound ,Biopolymers ,medicine ,Animals ,Humans ,Amino Acid Sequence ,Molecular Biology ,Peptide sequence ,DNA Primers ,Base Sequence ,Sequence Homology, Amino Acid ,Wild type ,Cell Biology ,medicine.disease ,Schmid metaphyseal chondrodysplasia ,Recombinant Proteins ,Monomer ,chemistry ,Domain (ring theory) ,Mutation ,Collagen - Abstract
Metaphyseal chondrodysplasia type Schmid (MCDS) is caused by mutations in COL10A1 that are clustered in the carboxyl-terminal non-collagenous (NC1) encoding domain. This domain is responsible for initiating trimerization of type X collagen during biosynthesis. We have built a molecular model of the NC1 domain trimer based on the crystal structure coordinates of the highly homologous trimeric domain of ACRP30 (adipocyte complement-related protein of 30 kDa or AdipoQ). Mapping of the MCDS mutations onto the structure reveals two specific clusters of residues as follows: one on the surface of the monomer which forms a tunnel through the center of the assembled trimer and the other on a patch exposed to solvent on the exterior surface of each monomeric unit within the assembled trimer. Biochemical studies on recombinant trimeric NC1 domain show that the trimer has an unusually high stability not exhibited by the closely related ACRP30. The high thermal stability of the trimeric NC1 domain, in comparison with ACRP30, appears to be the result of a number of factors including the 17% greater total buried solvent-accessible surface and the increased numbers of hydrophobic contacts formed upon trimerization. The 27 amino acid sequence present at the amino terminus of the NC1 domain, which has no counterpart in ACRP30, also contributes to the stability of the trimer. We have also shown that NC1 domains containing the MCDS mutations Y598D and S600P retain the ability to homotrimerize and heterotrimerize with wild type NC1 domain, although the trimeric complexes formed are less stable than those of the wild type molecule. These studies suggest strongly that the predominant mechanism causing MCDS involves a dominant interference of mutant chains on wild type chain assembly.
- Published
- 1999
126. Database Challenges for Genome Information in the Post Sequencing Phase
- Author
-
Steve Oliver, Norman W. Paton, Carole Goble, Andy Brass, Fouzia Moussouni, and Andrew Hayes
- Subjects
Cancer genome sequencing ,Data set ,Management information systems ,Computer science ,Genomic data ,Bioinformatics ,Phase (combat) ,Data science ,Genome ,DNA sequencing ,Personal genomics ,Variety (cybernetics) - Abstract
Genome sequencing projects are making available to scientists complete records of the genetic make-up of organisms. The resulting data sets, along with the results of experiments that seek systematically to find new information on the functions of genes, will present numerous opportunities and challenges to biologists. However, the complexity and variety of both the data and the analyses required over such data sets also pose significant challenges to computer scientists charged with providing effective information management systems for use with genome data. This paper presents models for the sorts of information that are being produced on genomes and genome-wide experiments, and outlines a project developing an information management system aimed at supporting analyses over genomic data. This information management system replicates data from other sources, with a view to providing an integrated environment for performing complex analyses.
- Published
- 1999
127. Dynamic exchange between stabilized conformations predicted for hyaluronan tetrasaccharides: comparison of molecular dynamics simulations with available NMR data
- Author
-
Andy Brass, Andrew Almond, and John K. Sheehan
- Subjects
chemistry.chemical_classification ,Aqueous solution ,Magnetic Resonance Spectroscopy ,Hydrogen bond ,Molecular Sequence Data ,Hydrogen Bonding ,Polymer ,Biochemistry ,Molecular dynamics ,chemistry.chemical_compound ,chemistry ,Carbohydrate Sequence ,Computational chemistry ,Intramolecular force ,Carbohydrate Conformation ,Monosaccharide ,Hydroxymethyl ,Hyaluronic Acid ,Vicinal - Abstract
Studies of the hyaluronan (HA) tetrasaccharides are important for understanding hydrogen-bonding in the HA polymer, as they are probably the smallest oligomers in which characteristics of the constituent monosaccharides and the polymer are simultaneously exhibited. Here we present extensive molecular dynamics simulations of the two tetrasaccharides of HA in dilute aqueous solution. These simulations have confirmed the existence of intramolecular hydrogen-bonds between the neighboring sugar residues of HA in solution, as proposed by Scott (1989). However, our simulations predict that these intramolecular hydrogen-bonds are not static as previously proposed, but are in constant dynamic exchange on the sub-nanosecond time-scale. This process results in discrete internal motion of the HA tetrasaccharides where they rapidly move between low energy conformations. Specific interactions between water and intramolecular hydrogen-bonds involving the hydroxymethyl group were found to result in differing conformations and dynamics for the two alternative tetrasaccharides of HA. This new observation suggests that this residue may play a key role in the entropy and stability of HA in solution, allowing it to stay soluble up to high concentration. The vicinal coupling constants ³J(NHCH) of the acetamido groups have been calculated from our aqueous simulations of HA. We found that high values of ³J(NHCH) (approximately 8 Hz), as experimentally measured for HA, are consistent with mixtures of both trans and cis conformations, and thus ³J(NHCH) cannot be used to imply a purely trans conformation of the acetamido. The rapid exchange of intramolecular hydrogen-bonds indicates that although the structure is at any moment stabilized by these hydrogen-bonds, no one hydrogen-bond exists for an extended period of time. This could explain why NMR often fails to provide evidence for intramolecular hydrogen-bonds in HA and other aqueous carbohydrate structures.
- Published
- 1998
128. Searching DNA databases for similarities to DNA sequences: when is a match significant?
- Author
-
Isobel Anderson and Andy Brass
- Subjects
Statistics and Probability ,Databases, Factual ,Sequence analysis ,Information Storage and Retrieval ,Sequence alignment ,Biology ,computer.software_genre ,Biochemistry ,Sensitivity and Specificity ,DNA sequencing ,Search algorithm ,Predictive Value of Tests ,DNA database ,Molecular Biology ,Database ,Sequence database ,Base Sequence ,Nucleic acid sequence ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,ROC Curve ,Test set ,computer ,Algorithms ,Software - Abstract
MOTIVATION: Searching DNA sequences against a DNA database is an essential element of sequence analysis. However, few systematic studies have been carried out to determine when a match between two DNA sequences has biological significance and this is limiting the use that can be made of DNA searching algorithms. RESULTS: A test set of DNA sequences has been constructed consisting of artificially evolved and real sequences. This set has been used to test various database searching algorithms (BLAST, BLAST2, FASTA and Smith-Waterman) on a subset of the EMBL database. The results of this analysis have been used to determine the sensitivity and coverage of all of the algorithms. Guidelines have been produced which can be used to assess the significance of DNA database search results. The Smith-Waterman algorithm was shown to have the best coverage, but the worst sensitivity, whereas the default BLASTN algorithm (word length set to 11) was shown to have good sensitivity, but poor coverage. A sensible compromise between speed, sensitivity and coverage can be obtained using either the FASTA or BLAST (word length set to 6) algorithms. However, analysis of the results also showed that no algorithm works well when the length of the probe sequence is 35% sequence identity between the corresponding proteins. Searching a DNA sequence against a DNA sequence database can, therefore, be a useful tool in sequence analysis. AVAILABILITY: The test sets used are available via anonymous ftp from mbisg2.sbc.man.ac.uk in the directory /pub/cabios/testdata/ CONTACT: I.Anderson@stud.man.ac.uk; abrass@man.ac.uk
- Published
- 1998
129. Molecular dynamics simulations of the two disaccharides of hyaluronan in aqueous solution
- Author
-
Andrew Almond, John K. Sheehan, and Andy Brass
- Subjects
Models, Molecular ,Conformational change ,Glucuronates ,Disaccharides ,Biochemistry ,Acetylglucosamine ,Molecular dynamics ,Glucuronic Acid ,Computational chemistry ,Carbohydrate Conformation ,Molecule ,Computer Simulation ,Hyaluronic Acid ,Protein secondary structure ,chemistry.chemical_classification ,Aqueous solution ,Molecular Structure ,Chemistry ,Hydrogen bond ,Water ,Glycosidic bond ,Hydrogen Bonding ,Solutions ,Carbohydrate conformation ,Algorithms ,Software - Abstract
Hyaluronan is an unusually stiff polymer when in aqueous solution, which has important consequences for its biological function. Molecular dynamics simulations of hyaluronan disaccharides have been performed, with explicit inclusion of water, to determine the molecular basis of this stiffness, and to investigate the dynamics of the glycosidic linkages. Our simulations reveal that stable sets of hydrogen bonds frequently connect the neighboring residues of hyaluronan. Water caging around the glycosidic linkage was observed to increase the connectivity between sugars, and further constrain them. This, we propose, explains the unusual stiffness of polymeric hyaluronan. It would allow the polysaccharide to maintain local secondary structure, and occupy large solution domains consistent with the visco-elastic nature of hyaluronan. Simulations in water showed no significant changes on inclusion of the exoanomeric effect. This, we deduced, was due to hyaluronan disaccharides ordering first shell water molecules. In some cases these waters were observed to transiently induce conformational change, by breaking intramolecular hydrogen bonds.
- Published
- 1997
130. Clustering techniques in biological sequence analysis
- Author
-
Carole Goble, Andy Brass, John A. Keane, and Anna M. Manning
- Subjects
Structure (mathematical logic) ,Sequence analysis ,Computer science ,Genetic algorithm ,Domain knowledge ,Data mining ,Computational biology ,Cluster analysis ,computer.software_genre ,Focus (optics) ,computer - Abstract
In biological sequence analysis many DNA and RNA sequences discovered in laboratory experiments are not properly identified. Here the focus is on using clustering algorithms to provide a structure to the data. The approach is inter-disciplinary using domain knowledge to identify such sequences. The enormous volume and high dimensionality of unidentified biological sequence data presents a challenge. Nonetheless useful and interesting results have been obtained, both directly and indirectly, by applying clustering to the data.
- Published
- 1997
131. PWE-098 Mechanisms Underlying the Development of Chronic Inflammation in Inflammatory Bowel Disease: Defining the Role of the RAGE Pathway Using Computational and Biological Analysis Strategies
- Author
-
James M. Wilson, R Haggart, Namshik Han, Sheena M. Cruickshank, Andy Brass, and Michael Bramhall
- Subjects
business.industry ,Microarray analysis techniques ,Gastroenterology ,Inflammation ,Disease ,Dendritic cell ,medicine.disease ,Inflammatory bowel disease ,RAGE (receptor) ,Immune system ,Immunology ,Medicine ,medicine.symptom ,Colitis ,business - Abstract
Introduction Inflammatory bowel disease (IBD) is a chronic inflammatory disease with an estimated annual cost to the NHS of £720 million. Patients typically present with established disease and this makes it difficult to determine the underlying aetiology: knowledge that would aid early diagnosis and treatment. Methods To better define factors underlying the development of IBD that might be used as diagnostic aids for treatment/prevention of IBD we have analysed the early immune response in mice that will develop colitis using a validated infection model of colitis. Microarray analyses of colon tissue were conducted using the Puma and Tigre packages for Bioconductor to determine gene expression and investigate the transcription factor pathways involved. Results Microarray analysis identified an early and rapid increase in expression of the receptor for advanced glycation end-products (RAGE) in colitic prone (susceptible) mice. In contrast, mice that clear the infection (resistant mice) had no increase in RAGE. In addition, the transcription factor analysis revealed a downregulation of colitic protective factors in the RAGE signalling pathway. Immunohistochemistry data showed high RAGE expression in the gut epithelium prior to the onset of colitis. Previous work from our group has shown that epithelial cells promote dendritic cell recruitment associated with resistance and clearance of parasite infection. In the colitis model, infection also induced the rapid recruitment of macrophages and dendritic cells (DC) into the gut and lymph nodes in resistant mice but not susceptible mice. By day 31 post-infection, resistant mice had resolved the infection and inflammation whereas susceptible mice had significantly higher immune cell accumulation and colitis. Current work is addressing the expression of RAGE blocking ligands and the regulation of RAGE in the guts of resistant and susceptible mice. 
Conclusion RAGE has been associated with chronic IBD in patients; however, our data implicate RAGE in the development and propagation of IBD. We propose that the RAGE pathway is an early indicator of IBD and may be useful therapeutically and in determining efficacy of IBD therapy. Disclosure of Interest M. Bramhall Grant/Research Support from: Epistem Ltd., N. Han: None Declared, R. Haggart: None Declared, J. Wilson: None Declared, A. Brass: None Declared, S. Cruickshank: None Declared.
- Published
- 2013
132. Novel SNP Discovery in African Buffalo, Syncerus caffer, Using High-Throughput Sequencing
- Author
-
Steven J. Kemp, Nikki le Roex, Paul D. van Helden, Daniel G. Bradley, Eileen G. Hoal, Andy Brass, Harry Noyes, and Suzanne Kay
- Subjects
0106 biological sciences ,Buffaloes ,Animal Types ,Gene Identification and Analysis ,lcsh:Medicine ,Sequence assembly ,Genomics ,Single-nucleotide polymorphism ,Wildlife ,Biology ,Polymorphism, Single Nucleotide ,010603 evolutionary biology ,01 natural sciences ,Molecular Genetics ,03 medical and health sciences ,Genetics ,Animals ,Genome Sequencing ,Selection, Genetic ,lcsh:Science ,Genotyping ,Cape buffalo ,Conservation Science ,030304 developmental biology ,0303 health sciences ,Multidisciplinary ,Ecology ,lcsh:R ,Computational Biology ,Chromosome Mapping ,High-Throughput Nucleotide Sequencing ,Sequence Analysis, DNA ,Genome project ,biology.organism_classification ,3. Good health ,SNP genotyping ,Mutation ,Genetic Polymorphism ,Veterinary Science ,lcsh:Q ,Gene pool ,Sequence Analysis ,Animal Genetics ,Population Genetics ,Research Article - Abstract
The African buffalo, Syncerus caffer, is one of the most abundant and ecologically important species of megafauna in the savannah ecosystem. It is an important prey species, as well as a host for a vast array of nematodes, pathogens and infectious diseases, such as bovine tuberculosis and corridor disease. Large-scale SNP discovery in this species would greatly facilitate further research into the area of host genetics and disease susceptibility, as well as provide a wealth of sequence information for other conservation and genomics studies. We sequenced pools of Cape buffalo DNA from a total of 9 animals, on an ABI SOLiD4 sequencer. The resulting short reads were mapped to the UMD3.1 Bos taurus genome assembly using both BWA and Bowtie software packages. A mean depth of 2.7× coverage over the mapped regions was obtained. Btau4 gene annotation was added to all SNPs identified within gene regions. Bowtie and BWA identified a maximum of 2,222,665 and 276,847 SNPs within the buffalo respectively, depending on analysis method. A panel of 173 SNPs was validated by fluorescent genotyping in 87 individuals. 27 SNPs failed to amplify, and of the remaining 146 SNPs, 43-54% of the Bowtie SNPs and 57-58% of the BWA SNPs were confirmed as polymorphic. dN/dS ratios found no evidence of positive selection, and although there were genes that appeared to be under negative selection, these were more likely to be slowly evolving house-keeping genes.
- Published
- 2012
133. A secondary structure model of the integrin alpha subunit N-terminal domain based on analysis of multiple alignments
- Author
-
Martin J. Humphries, Andy Brass, and Danny S. Tuckwell
- Subjects
Models, Molecular ,Integrins ,Macromolecular Substances ,Integrin ,Molecular Sequence Data ,Computational biology ,Biology ,Protein Structure, Secondary ,Turn (biochemistry) ,Mice ,Cricetinae ,Consensus Sequence ,Consensus sequence ,Animals ,Humans ,Amino Acid Sequence ,Repeated sequence ,Protein secondary structure ,Peptide sequence ,G alpha subunit ,Genetics ,Multiple sequence alignment ,Sequence Homology, Amino Acid ,Genetic Variation ,General Medicine ,Rats ,biology.protein ,Drosophila ,Software - Abstract
The integrins are alpha/beta heterodimeric proteins which mediate cell-matrix and cell-cell interactions. Current data indicate that the N-terminal moiety of the alpha subunit is involved in ligand binding. This region of the receptor is made up of a seven-fold repeated sequence of unknown structure which contains EF-hand-like putative divalent cation-binding sites. Recent studies have shown that multiple sequence alignments can be analysed to yield secondary structure predictions. Therefore, to obtain a model structure for the integrin alpha subunit N-terminal domain repeat, a large alignment of the seven repeats from sixteen integrin sequences was generated. Two methods of analysis were used: First, Chou and Fasman and Garnier, Osguthorpe and Robson predictions were carried out for individual sequences and the consensus predictions derived. Consensus hydrophobicity and chain flexibility data were also used to provide additional data. Second, sites of conservation and variation were analysed by a computer program STAMA (STructure After Multiple Alignment) to yield a secondary structure prediction. The two analyses gave essentially the same predicted structure: undefined region, loop, alpha-helix, beta-strand, divalent cation-binding loop, beta-strand, putative turn, loop, beta-strand. This is the first model structure to be presented for an integrin domain. Its implications for integrin function are discussed.
- Published
- 1994
134. Self-assembly of rodlike particles in two dimensions: A simple model for collagen fibrillogenesis
- Author
-
John Parkinson, Karl E. Kadler, and Andy Brass
- Subjects
Chemistry ,Simple (abstract algebra) ,Nanotechnology ,Fibrillogenesis ,Self-assembly - Published
- 1994
135. Dynamical and critical behavior of a simple discrete model of the cellular immune system
- Author
-
Richard K. Grencis, Kathryn J. Else, A. J. Bancroft, M. E. Clamp, and Andy Brass
- Subjects
Immune system ,Simple (abstract algebra) ,Computer science ,Biological system - Published
- 1994
136. A cellular automata model for helper T cell subset polarization in chronic and acute infection
- Author
-
Richard K. Grencis, Andy Brass, and Kathryn J. Else
- Subjects
Statistics and Probability ,Cell type ,Lymphoid Tissue ,Biology ,Infections ,Models, Biological ,General Biochemistry, Genetics and Molecular Biology ,Immune system ,Cell–cell interaction ,Antigen ,medicine ,Humans ,Autocrine signalling ,Lymph node ,General Immunology and Microbiology ,Applied Mathematics ,General Medicine ,T lymphocyte ,T-Lymphocytes, Helper-Inducer ,Lymphatic system ,medicine.anatomical_structure ,Modeling and Simulation ,Immunology ,Acute Disease ,Chronic Disease ,General Agricultural and Biological Sciences - Abstract
A cellular automata (CA) model has been built to study the interaction between T-helper subset cells in a secondary lymphoid organ during chronic and acute infection. The TH subset cells interacted via short range cytokine-like factors, each cell type producing an autocrine factor and another factor which suppressed the development and proliferation of the other TH cell type. A cell death term was also included such that T cells not restimulated by antigen within a certain time died to be replaced with new naive cells. The important parameters in the model were the antigen density entering the lymph node and the propensity of the antigens to induce naive T cells down a specific TH subset pathway. Many features of the response of the CA were found to match those seen in infections known to induce TH subset polarization. For example, it could be seen that TH cell subset polarization arose as a natural consequence of the dynamic competition between TH1 and TH2 cytokines to induce or suppress proliferation and was driven by the antigen produced by the pathogen.
- Published
- 1994
137. A cDNA encoding repeating units of the ABA-1 allergen of Ascaris
- Author
-
Joyce Moore, Heather J. Spence, Malcolm W. Kennedy, and Andy Brass
- Subjects
Base Sequence ,Ascaris ,Molecular Sequence Data ,Nucleic acid sequence ,Sequence alignment ,DNA ,Helminth Proteins ,Biology ,Allergens ,medicine.disease_cause ,biology.organism_classification ,Molecular biology ,Homology (biology) ,Allergen ,Complementary DNA ,Antigens, Helminth ,medicine ,Animals ,Parasitology ,Amino Acid Sequence ,Molecular Biology ,Gene ,Ascaris suum ,Repetitive Sequences, Nucleic Acid - Published
- 1993
138. Secondary and tertiary structures involving chondroitin and chondroitin sulphates in solution, investigated by rotary shadowing/electron microscopy and computer simulation
- Author
-
John E. Scott, Andy Brass, and Yuan Chen
- Subjects
Stereochemistry ,Molecular Sequence Data ,Biochemistry ,law.invention ,chemistry.chemical_compound ,law ,Carbohydrate Conformation ,Molecule ,Chondroitin ,Animals ,Computer Simulation ,Protein secondary structure ,chemistry.chemical_classification ,Chondroitin Sulfates ,Whales ,Charge density ,Water ,Polymer ,Protein tertiary structure ,Models, Structural ,Solutions ,Crystallography ,Microscopy, Electron ,Cartilage ,chemistry ,Carbohydrate Sequence ,Helix ,Sharks ,Thermodynamics ,Electron microscope - Abstract
Rotary shadowing/electron microscopy of chondroitin 6-sulphate (CS6) and 4-sulphate (CS4) showed that the former, but not the latter, aggregated to mesh works. Preparations made from salt (ammonium acetate) solutions showed enhanced aggregation. Computer modelling, using molecular mechanics and dynamics, was applied to secondary structures (twofold helices) derived from NMR studies, to determine geometric and energetic constraints on duplex and higher-aggregate formation. The calculations suggested that chondroitin, CS6 and undersulphated CS4 could form duplexes, while CS4 could not, thus bridging the gap between atomic dimensions (NMR) and high polymer scale (electron microscopy). Calculations suggested that water structure helped to stabilise the twofold helix. It is proposed that the twofold helical, flat, tape-like molecules aggregate via hydrophobic bonding between the very extensive hydrophobic patches (9 CH units) repeated on alternating sides of the polymers. The negative charge of the polyanions opposes aggregate formation. Calculations showed that duplexes were formed with decreasing stability as the charge density increased, and as the charge was concentrated towards the centre line of the polymer (i.e. in CS4). The unsulphated polymer chondroitin could form duplexes and higher aggregates as readily as hyaluronan. Hyaluronan was calculated to form stable heteroduplexes with CS6 and CS4. The frequency and positioning of the sulphate-ester group within the polymer thus determines whether the molecule participates in duplex formation.
- Published
- 1992
139. Homology modelling of integrin EF-hands. Evidence for widespread use of a conserved cation-binding site
- Author
-
Andy Brass, Martin J. Humphries, and Danny S. Tuckwell
- Subjects
Models, Molecular ,Cation binding ,Integrins ,Calmodulin ,Monosaccharide Transport Proteins ,Integrin ,Molecular Sequence Data ,Receptors, Cell Surface ,Pregnancy Proteins ,Biochemistry ,CD49c ,Cations ,Animals ,Humans ,Amino Acid Sequence ,Binding site ,Annexin A5 ,Molecular Biology ,Peptide sequence ,Heat-Shock Proteins ,Binding Sites ,biology ,Calcium-Binding Proteins ,Galactose ,Membrane Proteins ,Cell Biology ,Parvalbumins ,Periplasmic Binding Proteins ,biology.protein ,Integrin, beta 6 ,Calcium ,Carrier Proteins ,ITGA6 ,Receptors, Atrial Natriuretic Factor ,Atrial Natriuretic Factor ,Research Article - Abstract
Integrin alpha-subunits contain three or four peptide sequences that are similar to the EF-hand, a 13-residue bivalent cation-binding motif found in calmodulin and parvalbumin. The integrin sequences differ from classical EF-hands in that they lack a co-ordinating residue at position 12. One hypothesis to explain integrin-ligand binding is that aspartate-containing recognition sequences in integrin ligands, which bind at or near to the EF-hand-like sequences, may take the place of the missing residue and co-ordinate directly to the bound cation. In this report, homology modelling of integrin EF-hand-like sequences has been performed using the X-ray structure of calmodulin as a template in order to assess the functional activity of the integrin sequences. In the calmodulin-integrin hybrid structures, integrin EF-hand-like sequences were able to retain cations whereas control sequences did not. Structural analyses demonstrated that the integrin sequences in the hybrid proteins closely resembled conventional EF-hands. The integrin sequences are therefore highly likely to bind Ca2+ ions in vivo, a prerequisite for the ligand-binding model. Database searching with a matrix derived from known integrin EF-hand-like sequences has been used to identify other proteins containing the integrin EF-hand-like motif. Annexin V (anchorin CII), atrial natriuretic peptide receptors and the 70 kDa heat-shock protein were identified by the matrix; the functions of these proteins are known from previous studies to be bivalent cation-dependent. These findings suggest that the integrin EF-hand-like sequence may be a more common motif than originally thought.
- Published
- 1992
140. Bioinformatics Education—A UK perspective
- Author
-
Andy Brass
- Subjects
Statistics and Probability ,Computational Mathematics ,Computational Theory and Mathematics ,Perspective (graphical) ,Computational Biology ,Humans ,Education, Graduate ,Sociology ,Bioinformatics ,Molecular Biology ,Biochemistry ,United Kingdom ,Computer Science Applications - Published
- 2000
141. Inhibition of glucose transport in human erythrocytes by ubiquinone Q0
- Author
-
Allan G. Lowe, Andy Brass, and A.J. Critchley
- Subjects
Blood Glucose ,Erythrocytes ,Monosaccharide Transport Proteins ,Ubiquinone ,Molecular Sequence Data ,Biophysics ,Biology ,Biochemistry ,Binding, Competitive ,Oxidoreductase ,Animals ,Humans ,Amino Acid Sequence ,Binding site ,Ubiquinone binding ,chemistry.chemical_classification ,Tryptophan ,Glucose transporter ,Transporter ,Cell Biology ,Membrane transport ,Glucose binding ,Enzyme Activation ,Kinetics ,chemistry - Abstract
Searches of the protein data bases revealed limited homologies between several regions of the human erythrocyte glucose transporter containing a relative abundance of hydrogen-bonding amino-acid side chains, and proteins of the NADH-ubiquinone oxidoreductase family. This raised the possibility that the binding sites for glucose and ubiquinone may be similar in the respective proteins. Experimental studies demonstrated that ubiquinone Q0 does in fact inhibit both glucose entry and glucose exit in human erythrocytes with kinetics consistent with the existence of ubiquinone binding sites at both the exofacial and endofacial sides of the transporter. Glucose transport was also inhibited by the water-soluble tryptophan-inactivating agent, dimethyl(2-hydroxy-5-nitrobenzyl)sulphonium bromide, and this is consistent with the presence of tryptophan residues in two of the exofacial amino-acid sequences proposed as candidates for involvement in glucose binding sites.
- Published
- 1991
142. Identification of antibody epitopes within the CB-11 peptide of type II collagen. I: Detection of antibody binding sites by epitope scanning
- Author
-
K Morgan, Jane Worthington, and Andy Brass
- Subjects
Immunology ,Molecular Sequence Data ,Type II collagen ,Peptide ,Epitope ,Arthritis, Rheumatoid ,chemistry.chemical_compound ,Epitopes ,Immunology and Allergy ,Animals ,Amino Acid Sequence ,Cyanogen Bromide ,Peptide sequence ,chemistry.chemical_classification ,Linear epitope ,biology ,Rats, Inbred Strains ,Molecular biology ,Peptide Fragments ,Amino acid ,Rats ,Disease Models, Animal ,chemistry ,Biochemistry ,biology.protein ,Cyanogen bromide ,Cattle ,Female ,Binding Sites, Antibody ,Collagen ,Antibody - Abstract
Using epitope scanning, the precise locations of antibody binding sites on the CB-11 peptide of bovine type II collagen have been identified for the first time. Two hundred and seventy two peptides (8 amino acids in length and overlapping by seven amino acids), representing the complete CB-11 sequence, were synthesised on solid phase supports, in duplicate, and were screened with sera from arthritic and non-arthritic, bovine type II collagen-immunised rats. A total of twenty one different antibody binding sites were identified with no epitope being uniquely recognised by sera from arthritic, as compared to non-arthritic, rats although differences in the relative amount of antibody binding were seen. Individual sera identified between two and thirteen epitopes with one epitope being recognised by all sera. Some of the amino acid sequences, of the CB-11 region of bovine type II collagen, recognised by the rat sera are identical to the sequences in human type II collagen and thus these epitopes may be relevant to autoimmunity to type II collagen in patients with rheumatoid arthritis.
- Published
- 1991
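The peptide set described in the abstract of record 142 (8-residue peptides, each overlapping the next by seven residues) amounts to a sliding window that advances one residue at a time. The sketch below is illustrative only: the function name and the toy sequence are assumptions, not taken from the paper. Under these settings a sequence of n residues yields n - 7 peptides, so 272 peptides would correspond to 279 residues; that length is inferred from the arithmetic and is not stated in the abstract.

```python
def overlapping_peptides(sequence, length=8, overlap=7):
    """Return every window of `length` residues, where consecutive
    windows share `overlap` residues (hypothetical helper)."""
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    last_start = len(sequence) - length
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


# Toy 12-residue sequence (made up for illustration, not the CB-11 sequence).
toy = "GPAGFPGERGAP"
peptides = overlapping_peptides(toy)
print(len(peptides), peptides[0], peptides[-1])
```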
143. Identification of antibody epitopes within the CB-11 peptide of type II collagen. II. Computer modelling studies of peptides and the interpretation of epitope scanning results
- Author
-
Andy Brass, Y Chen, K Morgan, and Jane Worthington
- Subjects
Models, Molecular ,Sequence analysis ,Immunology ,Molecular Sequence Data ,Type II collagen ,Peptide ,Epitope ,Arthritis, Rheumatoid ,chemistry.chemical_compound ,Epitopes ,Immunology and Allergy ,Animals ,Computer Simulation ,Amino Acid Sequence ,Cyanogen Bromide ,Peptide sequence ,chemistry.chemical_classification ,biology ,Protein primary structure ,Molecular biology ,Peptide Fragments ,chemistry ,Biochemistry ,biology.protein ,Cyanogen bromide ,Cattle ,Binding Sites, Antibody ,Collagen ,Antibody - Abstract
Computer modelling techniques were used to investigate the structure of 8-mers from the CB-11 peptide of bovine type II collagen which were recognised by sera from rats which had previously been injected with bovine type II collagen. It was discovered that all the hydrophobic peptides recognised by the rat sera were predicted to have collagenous-like secondary structures. The primary structure of the 8-mers which were recognised was also compared against the sequences in the OWL protein sequence database. The combined results of the computer modelling and sequence analysis suggested that the sequence Gly-Pro-Gly-Phe-Pro is a minimal B cell epitope of the CB-11 fragment of bovine type II collagen.
- Published
- 1991
144. Evidence for Pairing of Semions in Finite Systems
- Author
-
Andy Brass, W. C. Wu, and Catherine Kallin
- Subjects
Superfluidity ,Physics ,Physics::General Physics ,Flux tube ,Quantum mechanics ,Pairing ,Lattice (order) ,Thermodynamic limit ,Finite system ,Ground state ,Magnetic field - Abstract
We present the results of numerical studies of semions on a lattice. From our studies of the pairing energies, flux quantization and effects of external magnetic fields, we find strong evidence in support of the theory that semions will pair due to their statistical interaction and form a coherent or superfluid ground state.
- Published
- 1990
145. A systematic strategy for the discovery of candidate genes responsible for phenotypic variation
- Author
-
Harry Noyes, Katherine Wolstencroft, Stephen J. Kemp, Paul R. Fisher, Andy Brass, Cornelia Hedeler, Robert Stevens, and Helen Hulme
- Subjects
QA75 ,Candidate gene ,Microarray ,Computer science ,Process (engineering) ,Computational biology ,Quantitative trait locus ,computer.software_genre ,Candidate Gene Identification ,Biochemistry ,Genome ,Biological pathway ,Genotype-phenotype distinction ,Structural Biology ,Genotype ,QH426 ,Molecular Biology ,Gene ,Applied Mathematics ,Phenotype ,Computer Science Applications ,Identification (information) ,Poster Presentation ,Data mining ,DNA microarray ,computer - Abstract
Introduction Quantitative Trait Loci (QTL) data are increasingly used to aid in the discovery of candidate genes involved in phenotypic variation. Tens to hundreds of genes, however, may lie within even well defined QTL. It is therefore vital that the identification, selection and functional testing of candidate Quantitative Trait genes (QTg) are carried out systematically, and without bias [1]. With the advent of microarrays, researchers are able to directly examine the expression of all genes on a genome wide scale, including those underlying QTL regions. The scale of data being generated by such high-throughput experiments has led some investigators to follow a hypothesis-driven approach [2]. Although these techniques for candidate gene identification are valid, they run the risk of overlooking genes that have less obvious associations with the phenotype. By making selections based on prior assumptions of what processes may be involved, the genes that may actually be involved in the phenotype can be overlooked. A further complication is that ad hoc methods for candidate gene identification are inherently difficult to replicate, a problem compounded by poor documentation in the published literature of the methods used to generate and capture the data from such investigations. With an ever increasing number of institutes offering programmatic access to their resources in the form of web services, however, experiments previously conducted manually can now be replaced by automated experiments, capable of processing a far greater volume of data. By reconstructing the original investigation methods in the form of workflows, we are now able to pass data directly from one service to the next.
This enables us to process the data in a much more systematic, un-biased, and explicit manner. Methods We propose a data-driven methodology that identifies the known pathways that intersect a QTL and those derived from a set of differentially expressed genes from a microarray study. This methodology is implemented systematically through the use of web services and workflows. For the purpose of implementing this systematic pathway-driven approach, we have chosen to use the Taverna workbench [3]. Results and Discussion Preliminary studies into the modes of resistance to African Trypanosomiasis were carried out for the mouse model organism. These studies illustrated how the large-scale analysis of microarray gene expression and QTL data, investigated at the level of biological pathways, enables links between genotype and phenotype to be successfully established [4]. This approach was implemented systematically through the use of explicitly defined workflows.
- Published
- 2007
146. A model-based analysis of microarray experimental error and normalisation
- Author
-
Magnus Rattray, Stephen G. Oliver, Andrew Hayes, David Waddington, Yongxiang Fang, Abdulla Bashein, Andy Brass, and David C. Hoyle
- Subjects
Systematic error ,Systematic difference ,Models, Statistical ,business.industry ,Pattern recognition ,Regression analysis ,Statistical model ,Reference Standards ,Biology ,Bioinformatics ,Regression ,Identification (information) ,Genetics ,Regression Analysis ,Artificial intelligence ,Artifacts ,business ,Reference standards ,Algorithms ,NAR Methods Online ,Oligonucleotide Array Sequence Analysis ,Block (data storage) - Abstract
A statistical model is proposed for the analysis of errors in microarray experiments and is employed in the analysis and development of a combined normalisation regime. Through analysis of the model and two-dye microarray data sets, this study found the following. The systematic error introduced by microarray experiments mainly involves spot intensity-dependent, feature-specific and spot position-dependent contributions. It is difficult to remove all these errors effectively without a suitable combined normalisation operation. Adaptive normalisation using a suitable regression technique is more effective in removing spot intensity-related dye bias than self-normalisation, while regional normalisation (block normalisation) is an effective way to correct spot position-dependent errors. However, dye-flip replicates are necessary to remove feature-specific errors, and also allow the analyst to identify the experimentally introduced dye bias contained in non-self-self data sets. In this case, the bias present in the data sets may include both experimentally introduced dye bias and the biological difference between two samples. Self-normalisation is capable of removing dye bias without identifying the nature of that bias. The performance of adaptive normalisation, on the other hand, depends on its ability to correctly identify the dye bias. If adaptive normalisation is combined with an effective dye bias identification method then there is no systematic difference between the outcomes of the two methods.
- Published
- 2003
147. Robust normalization of microarray data over multiple experiments
- Author
-
Norman Morrison, Ray Jupp, Sudha Rao, Ian Hayes, Christopher J. Penkett, Nianshu Zhang, Stephen G. Oliver, Andy Brass, Jacqui Lockey, Magnus Rattray, Martin Brutsche, and Andrew Hayes
- Subjects
Genetics ,Normalization (statistics) ,Gene expression profiling ,Microarray analysis techniques ,Gene expression ,Computational biology ,Replicate ,Biology ,DNA microarray ,Gene ,Housekeeping gene - Abstract
Microarrays have set the stage for an explosion of large-scale expression data, driven by a diversity of genome sequencing projects. The technology has already demonstrated its applications in analysis of model systems, such as the response of mammalian fibroblasts to serum and sporulation in yeast. The comparison of data between multiple experiments run as a time series or under different conditions is not a trivial task. Although the analysis is challenging, it has the potential to answer some of the most interesting questions regarding information mining on gene expression patterns or function. To address these questions we have investigated standardization methods over multiple expression analysis experiments covering systems from high-density microarrays (40,000 individual gene transcripts) to membrane applications (500 individual gene transcripts). By making the assumption that global changes in gene activity are negligible, we show that normalization over the entire set of gene expression values in a given profile (provided that profile is not biased for examination of a particular system) provides a more statistically robust method than using housekeeping gene expression values. We also show that there is no significant reason for normalizing with a reduced subset of genes over a given range of expression. We have compared expression data derived from two different technological systems (glass slide and filter based); both of these systems have an intra-experimental distribution close to log-normal. We therefore normalize by mapping logged expression values within each experiment to a standard distribution with zero mean and unit variance. This transformation can be seen to effectively reduce to a minimum intra- and extra-experimental variances when analysing replicate experiment data. 
These methods are currently being applied in the statistical analysis of differential expression among patient groups and in the analysis of model organisms subject to certain conditions.
- Published
- 1999
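The normalization described in the abstract of record 147 (log-transform the expression values within one experiment, then map them to zero mean and unit variance) can be sketched as a per-experiment z-score. This is a minimal illustration under stated assumptions: the abstract does not specify the logarithm base or the variance estimator, so natural log and the population standard deviation are choices made here, and the function name and input values are hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize_profile(expression_values):
    """Log-transform one experiment's positive expression values and
    rescale them to zero mean and unit variance (hypothetical helper)."""
    logged = [math.log(v) for v in expression_values]  # assumes natural log
    mu = mean(logged)
    sigma = pstdev(logged)  # assumes population standard deviation
    if sigma == 0:
        raise ValueError("all values identical; cannot scale to unit variance")
    return [(x - mu) / sigma for x in logged]


# Made-up intensities for a single experiment.
profile = [120.0, 450.0, 30.0, 980.0]
z = normalize_profile(profile)
print([round(v, 3) for v in z])
```

Applying the same function to each experiment separately puts all profiles on a common scale, which is the property the abstract relies on when comparing replicate experiments.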
148. 103 Secondary structure of a fibrillin-1 eight-cysteine motif
- Author
-
Tim J Wess, Andy Brass, Charlotte Dyer, Cay M. Kielty, and C. Adrian Shuttleworth
- Subjects
Circular dichroism ,Affinity chromatography ,Stereochemistry ,Chemistry ,Complementary DNA ,Sequence motif ,Structural motif ,Biochemistry ,Fibrillin ,Protein secondary structure ,Cysteine - Abstract
University of Manchester, Wellcome Center For Cell/Matrix Research, 2.205 Stopford, Manchester, M13 9PT *University of Stirling, Scotland Eight-cysteine motifs occur within the fibrillin family of extracellular proteins which includes fibrillin-1 and fibrillin-2, and the latent transforming growth factor-β binding proteins (LTBPs) 1-3 [1,2]. The importance of these motifs is highlighted by identification of mutations in eight-cysteine motifs resulting in Marfan syndrome [3], and the fact that one such module in LTBP-1 binds TGF-β latency associated peptide via a disulphide linkage [4]. The signature of the eight-cysteine motifs is the distribution of cysteine residues, three of which are contiguous. While biochemical analysis suggests that all cysteine residues are involved in intramolecular disulphide bonds [5], the structure of the eight-cysteine motifs of fibrillin has not been defined. This report compares computer assisted modelling based on the primary sequence with the secondary structure determined by circular dichroism on recombinant fibrillin-1 eight-cysteine motifs. Fibrillin-1 cDNA encoding three domains, an eight-cysteine motif flanked by two epidermal growth factor (EGF) motifs, was generated by RT-PCR. The sequence was expressed in COS-1 cells using the mammalian expression vector signal pIg tail (Invitrogen), into which an enterokinase site was introduced. The fusion protein contained an Fc tail which allowed purification by protein A-Sepharose affinity chromatography. Enterokinase cleavage allowed release of the recombinant protein. Protein purity was assessed by SDS-PAGE on 15% gels and by mass spectrum determination by MALDI-TOF. Circular dichroism was conducted using a JASCO J-600 with scan rates at 10 nm min⁻¹ with a time constant of 2 seconds. Data were collected from 280-195 nm. Fitting of the data was conducted using the CONTIN procedure, which fits α-helix and β-sheet quantities to spectral data when compared to a structural database.
- Published
- 1998
149. Mechanism of integrin α4β1-VCAM-1 interaction
- Author
-
Martin J. Humphries, Peter Newham, Danny S. Tuckwell, and Andy Brass
- Subjects
Integrin α4β1 ,chemistry.chemical_compound ,Integrin alpha M ,biology ,Chemistry ,Mechanism (biology) ,biology.protein ,Integrin, beta 6 ,VCAM-1 ,Biochemistry ,Cell biology - Published
- 1993
150. The aromatic zipper: A model for the initial trimerization event in collagen folding
- Author
-
Michael E. Grant, Raymond P. Boot-Handford, Karl E. Kadler, J. Terrig Thomas, and Andy Brass
- Subjects
Zipper ,Protein Conformation ,Chemistry ,Stereochemistry ,Complement C1q ,Event (relativity) ,Molecular Sequence Data ,Biochemistry ,Folding (chemistry) ,Models, Chemical ,Sequence Homology, Nucleic Acid ,Animals ,Humans ,Amino Acid Sequence ,Collagen - Published
- 1991