16 results on '"Stærfeldt HH"'
Search Results
2. A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.
- Author
- Westergaard D, Stærfeldt HH, Tønsberg C, Jensen LJ, and Brunak S
- Subjects
- Area Under Curve, Computational Biology methods, False Positive Reactions, Genes, Periodicals as Topic, Proteins genetics, ROC Curve, Software, Terminology as Topic, Abstracting and Indexing, Data Mining methods, Information Storage and Retrieval, MEDLINE
- Abstract
Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 250 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
- Published
- 2018
- Full Text
- View/download PDF
3. A scored human protein-protein interaction network to catalyze genomic interpretation.
- Author
- Li T, Wernersson R, Hansen RB, Horn H, Mercer J, Slodkowicz G, Workman CT, Rigina O, Rapacki K, Stærfeldt HH, Brunak S, Jensen TS, and Lage K
- Subjects
- Databases, Protein, Genome, Human, Humans, User-Computer Interface, Computational Biology methods, Data Interpretation, Statistical, Gene Regulatory Networks, Genomics methods, Neoplasms genetics, Neoplasms metabolism, Protein Interaction Maps genetics
- Abstract
Genome-scale human protein-protein interaction networks are critical to understanding cell biology and interpreting genomic data, but challenging to produce experimentally. Through data integration and quality control, we provide a scored human protein-protein interaction network (InWeb_InBioMap, or InWeb_IM) with severalfold more interactions (>500,000) and better functional biological relevance than comparable resources. We illustrate that InWeb_InBioMap enables functional interpretation of >4,700 cancer genomes and genes involved in autism.
- Published
- 2017
- Full Text
- View/download PDF
4. Proteome analysis of pod and seed development in the model legume Lotus japonicus.
- Author
- Nautrup-Pedersen G, Dam S, Laursen BS, Siegumfeldt AL, Nielsen K, Goffard N, Stærfeldt HH, Friis C, Sato S, Tabata S, Lorentzen A, Roepstorff P, and Stougaard J
- Subjects
- Fabaceae, Fruit growth & development, Lotus growth & development, Metabolic Networks and Pathways, Seeds growth & development, Tandem Mass Spectrometry, Fruit chemistry, Lotus chemistry, Plant Proteins analysis, Proteome analysis, Seeds chemistry
- Abstract
Legume pods serve important functions during seed development and are themselves sources of food and feed. Compared to seeds, the metabolism and development of pods are not well-defined. The present characterization of pods from the model legume Lotus japonicus, together with the detailed analyses of the pod and seed proteomes in five developmental stages, paves the way for comparative pathway analysis and provides new metabolic information. Proteins were analyzed by two-dimensional gel electrophoresis and tandem-mass spectrometry. These analyses led to the identification of 604 pod proteins and 965 seed proteins, including 263 proteins distinguishing the pod. The complete data set is publicly available at http://www.cbs.dtu.dk/cgi-bin/lotus/db.cgi, where spots in a reference map are linked to experimental data, such as matched peptides, quantification values, and gene accessions. Identified pod proteins represented enzymes from 85 different metabolic pathways, including storage globulins and a late embryogenesis abundant protein. In contrast to seed maturation, pod maturation was associated with decreasing total protein content, especially proteins involved in protein biosynthesis and photosynthesis. Proteins detected only in pods included three enzymes participating in the urea cycle and four in nitrogen and amino group metabolism, highlighting the importance of nitrogen metabolism during pod development. Additionally, five legume seed proteins previously unassigned in the glutamate metabolism pathway were identified.
- Published
- 2010
- Full Text
- View/download PDF
5. GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes.
- Author
- Hallin PF, Stærfeldt HH, Rotenberg E, Binnewies TT, Benham CJ, and Ussery DW
- Abstract
We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.
- Published
- 2009
- Full Text
- View/download PDF
6. The proteome of seed development in the model legume Lotus japonicus.
- Author
- Dam S, Laursen BS, Ornfelt JH, Jochimsen B, Staerfeldt HH, Friis C, Nielsen K, Goffard N, Besenbacher S, Krusell L, Sato S, Tabata S, Thøgersen IB, Enghild JJ, and Stougaard J
- Subjects
- Biomass, Chromatography, Liquid, Databases, Protein, Electrophoresis, Gel, Two-Dimensional, Fatty Acids analysis, Globulins genetics, Globulins metabolism, Internet, Seed Storage Proteins metabolism, Seeds cytology, Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization, Starch metabolism, Water, Lotus embryology, Lotus metabolism, Models, Biological, Proteome metabolism, Seeds growth & development, Seeds metabolism
- Abstract
We have characterized the development of seeds in the model legume Lotus japonicus. Like soybean (Glycine max) and pea (Pisum sativum), Lotus develops straight seed pods and each pod contains approximately 20 seeds that reach maturity within 40 days. Histological sections show the characteristic three developmental phases of legume seeds and the presence of embryo, endosperm, and seed coat in desiccated seeds. Furthermore, protein, oil, starch, phytic acid, and ash contents were determined, and this indicates that the composition of mature Lotus seed is more similar to soybean than to pea. In a first attempt to determine the seed proteome, both a two-dimensional polyacrylamide gel electrophoresis approach and a gel-based liquid chromatography-mass spectrometry approach were used. Globulins were analyzed by two-dimensional polyacrylamide gel electrophoresis, and five legumins, LLP1 to LLP5, and two convicilins, LCP1 and LCP2, were identified by matrix-assisted laser desorption ionization quadrupole/time-of-flight mass spectrometry. For two distinct developmental phases, seed filling and desiccation, a gel-based liquid chromatography-mass spectrometry approach was used, and 665 and 181 unique proteins corresponding to gene accession numbers were identified for the two phases, respectively. All of the proteome data, including the experimental data and mass spectrometry spectra peaks, were collected in a database that is available to the scientific community via a Web interface (http://www.cbs.dtu.dk/cgi-bin/lotus/db.cgi). This database establishes the basis for relating physiology, biochemistry, and regulation of seed development in Lotus. Together with a new Web interface (http://bioinfoserver.rsbs.anu.edu.au/utils/PathExpress4legumes/) collecting all protein identifications for Lotus, Medicago, and soybean seed proteomes, this database is a valuable resource for comparative seed proteomics and pathway analysis within and beyond the legume family.
- Published
- 2009
- Full Text
- View/download PDF
7. RNAmmer: consistent and rapid annotation of ribosomal RNA genes.
- Author
- Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, and Ussery DW
- Subjects
- Computational Biology methods, Genome, Bacterial, Genomics methods, Markov Chains, Genes, rRNA, Software
- Abstract
The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.
- Published
- 2007
- Full Text
- View/download PDF
8. FeatureMap3D--a tool to map protein features and sequence conservation onto homologous structures in the PDB.
- Author
- Wernersson R, Rapacki K, Staerfeldt HH, Sackett PW, and Mølgaard A
- Subjects
- Amino Acid Sequence, Amino Acids chemistry, Computer Graphics, Conserved Sequence, Exons, Internet, Models, Molecular, Proteins chemistry, Sequence Alignment, Databases, Protein, Protein Conformation, Sequence Homology, Amino Acid, Software, Structural Homology, Protein
- Abstract
FeatureMap3D is a web-based tool that maps protein features onto 3D structures. The user provides sequences annotated with any feature of interest, such as post-translational modifications, protease cleavage sites or exonic structure and FeatureMap3D will then search the Protein Data Bank (PDB) for structures of homologous proteins. The results are displayed both as an annotated sequence alignment, where the user-provided annotations as well as the sequence conservation between the query and the target sequence are displayed, and also as a publication-quality image of the 3D protein structure with the selected features and sequence conservation enhanced. The results are also returned in a readily parsable text format as well as a PyMol (http://pymol.sourceforge.net/) script file, which allows the user to easily modify the protein structure image to suit a specific purpose. FeatureMap3D can also be used without sequence annotation, to evaluate the quality of the alignment of the input sequences to the most homologous structures in the PDB, through the sequence conservation colored 3D structure visualization tool. FeatureMap3D is available at: http://www.cbs.dtu.dk/services/FeatureMap3D/.
- Published
- 2006
- Full Text
- View/download PDF
9. Origin of replication in circular prokaryotic chromosomes.
- Author
- Worning P, Jensen LJ, Hallin PF, Staerfeldt HH, and Ussery DW
- Subjects
- DNA Replication genetics, DNA, Circular genetics, Phylogeny, Archaea genetics, Bacteria genetics, Chromosomes, Archaeal genetics, Chromosomes, Bacterial genetics, Replication Origin
- Abstract
To predict origins of replication in prokaryotic chromosomes, we analyse the leading and lagging strands of 200 chromosomes for differences in oligomer composition and show that these correlate strongly with taxonomic grouping, lifestyle and molecular details of the replication process. While all bacteria have a preference for Gs over Cs on the leading strand, we discover that the direction of the A/T skew is determined by the polymerase-alpha subunit that replicates the leading strand. The strength of the strand bias varies greatly between both phyla and environments and appears to correlate with growth rate. Finally we observe much greater diversity of skew among archaea than among bacteria. We have developed a program that accurately locates the origins of replication by measuring the differences between leading and lagging strand of all oligonucleotides up to 8 bp in length. The program and results for all publicly available genomes are available from http://www.cbs.dtu.dk/services/GenomeAtlas/suppl/origin.
- Published
- 2006
- Full Text
- View/download PDF
10. Pigs in sequence space: a 0.66X coverage pig genome survey based on shotgun sequencing.
- Author
- Wernersson R, Schierup MH, Jørgensen FG, Gorodkin J, Panitz F, Staerfeldt HH, Christensen OF, Mailund T, Hornshøj H, Klein A, Wang J, Liu B, Hu S, Dong W, Li W, Wong GK, Yu J, Wang J, Bendixen C, Fredholm M, Brunak S, Yang H, and Bolund L
- Subjects
- Animals, Computational Biology methods, Evolution, Molecular, Exons, Genome, Human, Humans, Mice, Phylogeny, RNA, Messenger metabolism, Repetitive Sequences, Nucleic Acid, Species Specificity, Swine, Genome, Genomics methods, Sequence Analysis, DNA methods
- Abstract
Background: Comparative whole genome analysis of Mammalia can benefit from the addition of more species. The pig is an obvious choice due to its economic and medical importance as well as its evolutionary position in the artiodactyls. Results: We have generated approximately 3.84 million shotgun sequences (0.66X coverage) from the pig genome. The data are hereby released (NCBI Trace repository with center name "SDJVP", and project name "Sino-Danish Pig Genome Project") together with an initial evolutionary analysis. The non-repetitive fraction of the sequences was aligned to the UCSC human-mouse alignment and the resulting three-species alignments were annotated using the human genome annotation. Ultra-conserved elements and miRNAs were identified. The results show that for each of these types of orthologous data, pig is much closer to human than mouse is. Purifying selection has been more efficient in pig compared to human, but not as efficient as in mouse, and pig seems to have an isochore structure most similar to the structure in human. Conclusion: The addition of the pig to the set of species sequenced at low coverage adds to the understanding of selective pressures that have acted on the human genome by bisecting the evolutionary branch between human and mouse, with the mouse branch being approximately 3 times as long as the human branch. Additionally, the joint alignment of the shotgun sequences to the human-mouse alignment offers the investigator a rapid way to define specific regions for analysis and resequencing.
- Published
- 2005
- Full Text
- View/download PDF
11. Genome Update: proteome comparisons.
- Author
- Binnewies TT, Hallin PF, Stærfeldt HH, and Ussery DW
- Subjects
- Animals, Genome, Archaeal, Genome, Bacterial, Molecular Sequence Data, Cryptosporidium genetics, Genome, Haloarcula genetics, Mycoplasma hyopneumoniae genetics, Photobacterium genetics, Proteome
- Published
- 2005
- Full Text
- View/download PDF
12. Prediction of human protein function according to Gene Ontology categories.
- Author
- Jensen LJ, Gupta R, Staerfeldt HH, and Brunak S
- Subjects
- Databases, Protein, Gene Expression Profiling methods, Humans, Information Storage and Retrieval methods, Pattern Recognition, Automated, Proteins classification, Proteins genetics, Sequence Homology, Structure-Activity Relationship, Algorithms, Database Management Systems, Neural Networks, Computer, Proteins chemistry, Sequence Alignment methods, Sequence Analysis, Protein methods
- Abstract
Motivation: The human genome project has led to the discovery of many human protein coding genes which were previously unknown. As a large fraction of these are functionally uncharacterized, it is of interest to develop methods for predicting their molecular function from sequence. Results: We have developed a method for prediction of protein function for a subset of classes from the Gene Ontology classification scheme. This subset includes several pharmaceutically interesting categories; transcription factors, receptors, ion channels, stress and immune response proteins, hormones and growth factors can all be predicted. Although the method relies on protein sequences as the sole input, it does not rely on sequence similarity, but instead on sequence-derived protein features such as predicted post-translational modifications (PTMs), protein sorting signals and physical/chemical properties calculated from the amino acid composition. This allows for prediction of the function for orphan proteins where no homologs can be found. Using this method we propose two novel receptors in the human genome, and further demonstrate chromosomal clustering of related proteins.
- Published
- 2003
- Full Text
- View/download PDF
13. Bias of purine stretches in sequenced chromosomes.
- Author
- Ussery D, Soumpasis DM, Brunak S, Staerfeldt HH, Worning P, and Krogh A
- Subjects
- Animals, Base Sequence, Bias, Chromosomes, Archaeal genetics, Chromosomes, Bacterial genetics, Eukaryota genetics, Eukaryotic Cells, Genome, Humans, Molecular Sequence Data, Nucleic Acid Conformation, Plasmids genetics, Pyrimidines analysis, Chromosomes genetics, Databases, Genetic, Purines analysis
- Abstract
We examined more than 700 DNA sequences (full length chromosomes and plasmids) for stretches of purines (R) or pyrimidines (Y) and alternating YR stretches; such regions will likely adopt structures which are different from the canonical B-form. Since one turn of the DNA helix is roughly 10 bp, we measured the fraction of each genome which contains purine (or pyrimidine) tracts of lengths of 10 bp or longer (hereafter referred to as 'purine tracts'), as well as stretches of alternating pyrimidines/purines ('pyr/pur tracts') of the same length. Using these criteria, a random sequence would be expected to contain 1.0% of purine tracts and also 1.0% of the alternating pyr/pur tracts. In the vast majority of cases, there are more purine tracts than would be expected from a random sequence, with an average of 3.5%, significantly larger than the expectation value. The fraction of the chromosomes containing pyr/pur tracts was slightly less than expected, with an average of 0.8%. One of the most surprising findings is a clear difference in the length distributions of the regions studied between prokaryotes and eukaryotes. Whereas short-range correlations can explain the length distributions in prokaryotes, in eukaryotes there is an abundance of long stretches of purines or alternating purine/pyrimidine tracts, which cannot be explained in this way; these sequences are likely to play an important role in eukaryotic chromosome organisation.
- Published
- 2002
- Full Text
- View/download PDF
14. Prediction of human protein function from post-translational modifications and localization features.
- Author
- Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CA, Knudsen S, Krogh A, Valencia A, and Brunak S
- Subjects
- Databases, Protein, Enzymes chemistry, Enzymes classification, Enzymes metabolism, Genome, Human, Glycosylation, Humans, Isoelectric Point, Linguistics, Neural Networks, Computer, Phosphorylation, Physical Chromosome Mapping, Protein Binding, Protein Transport, Proteins metabolism, Software, Computational Biology methods, Protein Processing, Post-Translational, Protein Sorting Signals, Proteins chemistry, Proteins classification
- Abstract
We have developed an entirely sequence-based method that identifies and integrates relevant features that can be used to assign proteins of unknown function to functional classes, and enzyme categories for enzymes. We show that strategies for the elucidation of protein function may benefit from a number of functional attributes that are more directly related to the linear sequence of amino acids, and hence easier to predict, than protein structure. These attributes include features associated with post-translational modifications and protein sorting, but also much simpler aspects such as the length, isoelectric point and composition of the polypeptide chain. (Copyright 2002 Elsevier Science Ltd.)
- Published
- 2002
- Full Text
- View/download PDF
15. A DNA structural atlas for Escherichia coli.
- Author
- Pedersen AG, Jensen LJ, Brunak S, Staerfeldt HH, and Ussery DW
- Subjects
- Bacterial Proteins genetics, Base Pairing genetics, Color, Computational Biology, Computer Simulation, Crystallography, X-Ray, DNA, Superhelical chemistry, DNA, Superhelical genetics, Deoxyribonuclease I metabolism, Genes, Bacterial genetics, Models, Molecular, Nucleosomes chemistry, Nucleosomes genetics, Pattern Recognition, Automated, Phylogeny, Pliability, Promoter Regions, Genetic genetics, RNA, Bacterial genetics, Software, Statistics as Topic, Thermodynamics, DNA, Bacterial chemistry, DNA, Bacterial genetics, Escherichia coli genetics, Genome, Bacterial, Nucleic Acid Conformation
- Abstract
We have performed a computational analysis of DNA structural features in 18 fully sequenced prokaryotic genomes using models for DNA curvature, DNA flexibility, and DNA stability. The structural values that are computed for the Escherichia coli chromosome are significantly different from (and generally more extreme than) those expected from the nucleotide composition. To aid this analysis, we have constructed tools that plot structural measures for all positions in a long DNA sequence (e.g. an entire chromosome) in the form of color-coded wheels (http://www.cbs.dtu.dk/services/GenomeAtlas/). We find that these "structural atlases" are useful for the discovery of interesting features that may then be investigated in more depth using statistical methods. From investigation of the E. coli structural atlas, we discovered a genome-wide trend, where an extended region encompassing the terminus displays a high level of curvature, a low level of flexibility, and a low degree of helix stability. The same situation is found in the distantly related Gram-positive bacterium Bacillus subtilis, suggesting that the phenomenon is biologically relevant. Based on a search for long DNA segments where all the independent structural measures agree, we have found a set of 20 regions with identical and very extreme structural properties. Due to their strong inherent curvature, we suggest that these may function as topological domain boundaries by efficiently organizing plectonemically supercoiled DNA. Interestingly, we find that in practically all the investigated eubacterial and archaeal genomes, there is a trend for promoter DNA being more curved, less flexible, and less stable than DNA in coding regions and in intergenic DNA without promoters. This trend is present regardless of the absolute levels of the structural parameters, and we suggest that this may be related to the requirement for helix unwinding during initiation of transcription, or perhaps to the previously observed location of promoters at the apex of plectonemically supercoiled DNA. We have also analyzed the structural similarities between groups of genes by clustering all RNA and protein-encoding genes in E. coli, based on the average structural parameters. We find that most ribosomal genes (protein-encoding as well as rRNA genes) cluster together, and we suggest that DNA structure may play a role in the transcription of these highly expressed genes. (Copyright 2000 Academic Press.)
- Published
- 2000
- Full Text
- View/download PDF
16. MatrixPlot: visualizing sequence constraints.
- Author
- Gorodkin J, Staerfeldt HH, Lund O, and Brunak S
- Subjects
- Nucleic Acids chemistry, Sequence Analysis, Protein, Proteins chemistry, Sequence Alignment, Software
- Abstract
MatrixPlot is a program for making high-quality matrix plots, such as mutual information plots of sequence alignments and distance matrices of sequences with known three-dimensional coordinates. The user can add information about the sequences (e.g. a sequence logo profile) along the edges of the plot, as well as zoom in on any region in the plot. Availability: MatrixPlot can be obtained on request, and can also be accessed online at http://www.cbs.dtu.dk/services/MatrixPlot. Contact: gorodkin@cbs.dtu.dk
- Published
- 1999
- Full Text
- View/download PDF