27 results on '"Reinders, Marcel J T"'
2. Determining epitope specificity of T-cell receptors with transformers.
- Author
Khan, Abdul Rehman, Reinders, Marcel J T, and Khatri, Indu
- Subjects
*DEEP learning, *TRANSFORMER models, *MAJOR histocompatibility complex, *DATABASES, *MONOCLONAL antibodies, *AMINO acid sequence
- Abstract
Summary T-cell receptors (TCRs) on T cells recognize and bind to epitopes presented by the major histocompatibility complex in the case of infection or cancer. However, the high diversity of TCRs, as well as their unique and complex binding mechanisms underlying epitope recognition, make it difficult to predict the binding between TCRs and epitopes. Here, we demonstrate the utility of transformers, a deep learning strategy whose attention mechanism learns informative features, and show that these models, pre-trained on a large set of protein sequences, outperform current strategies. We compared three pre-trained auto-encoder transformer models (ProtBERT, ProtAlbert, and ProtElectra) and one pre-trained auto-regressive transformer model (ProtXLNet) to predict the binding specificity of TCRs to 25 epitopes from the VDJdb database (human and murine). Two additional modifications were performed to incorporate gene usage of the TCRs in the four transformer models. Of all 12 transformer implementations (four models, each in a basic and two modified versions), a modified version of the ProtXLNet model predicted TCR–epitope pairs with the highest accuracy (weighted F1 score of 0.55, considering all 25 epitopes simultaneously). The modification included additional features representing the gene names for the TCRs. We also showed that the basic implementation of transformers outperformed the previously available methods developed for the same biological problem, i.e., TCRGP, TCRdist, and DeepTCR, especially for the hard-to-classify labels. We show that the proficiency of transformers in attention learning can be made operational in a complex biological setting like TCR binding prediction. Further ingenuity in utilizing the full potential of transformers, either through attention head visualization or by introducing additional features, can extend T-cell research avenues.
Availability and implementation Data and code are available on https://github.com/InduKhatri/tcrformer. [ABSTRACT FROM AUTHOR]
- Published
- 2023
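The weighted F1 score quoted above is, under its usual definition, the average of per-class F1 scores weighted by how many true instances each class has (here, each epitope is a class). A minimal pure-Python sketch of that standard definition, assuming exactly one predicted epitope per TCR; the function name `weighted_f1` and the labels `epiA`, `epiB` and `epiC` are hypothetical placeholders, not data or code from the study:

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label classification).

    Hypothetical helper: classes that occur only in y_pred get zero weight.
    """
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight each class by its share of true labels
    return score


y_true = ["epiA", "epiA", "epiB", "epiC"]
y_pred = ["epiA", "epiB", "epiB", "epiC"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.75
```

Because the weights follow class support, frequent epitopes dominate this score; a macro average would instead weight all epitopes equally.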
3. Epigenetic and Metabolomic Biomarkers for Biological Age: A Comparative Analysis of Mortality and Frailty Risk.
- Author
Kuiper, Lieke M, Polinder-Bos, Harmke A, Bizzarri, Daniele, Vojinovic, Dina, Vallerga, Costanza L, Beekman, Marian, Dollé, E T, Ghanbari, Mohsen, Voortman, Trudy, Reinders, Marcel J T, Verschuren, W M Monique, Slagboom, P Eline, Akker, Erik B van den, and Meurs, Joyce B J van
- Subjects
FRAILTY, METABOLOMICS, BIOMARKERS, EPIGENETICS, GERIATRIC assessment
- Abstract
Biological age captures a person's age-related risk of unfavorable outcomes using biophysiological information. Multivariate biological age measures include frailty scores and molecular biomarkers. These measures are often studied in isolation, but here we present a large-scale study comparing them. In 2 prospective cohorts (n = 3 222), we compared epigenetic (DNAm Horvath, DNAm Hannum, DNAm Lin, DNAm epiTOC, DNAm PhenoAge, DNAm DunedinPoAm, DNAm GrimAge, and DNAm Zhang) and metabolomic-based (MetaboAge and MetaboHealth) biomarkers as reflections of biological age, as represented by 5 frailty measures and overall mortality. Biomarkers trained on outcomes with biophysiological and/or mortality information outperformed age-trained biomarkers in frailty reflection and mortality prediction. DNAm GrimAge and MetaboHealth, trained on mortality, showed the strongest association with these outcomes. The associations of DNAm GrimAge and MetaboHealth with frailty and mortality were independent of each other and of the frailty score mimicking clinical geriatric assessment. Epigenetic, metabolomic, and clinical biological age markers seem to capture different aspects of aging. These findings suggest that mortality-trained molecular markers may provide a novel phenotype reflecting biological age and strengthen current clinical geriatric assessment of health and well-being. [ABSTRACT FROM AUTHOR]
- Published
- 2023
4. Single-cell reference mapping to construct and extend cell-type hierarchies.
- Author
Michielsen, Lieke, Lotfollahi, Mohammad, Strobl, Daniel, Sikkema, Lisa, Reinders, Marcel J T, Theis, Fabian J, and Mahfouz, Ahmed
- Published
- 2023
5. Cell type matching across species using protein embeddings and transfer learning.
- Author
Biharie, Kirti, Michielsen, Lieke, Reinders, Marcel J T, and Mahfouz, Ahmed
- Subjects
SPECIES, MOTOR cortex, AMINO acid sequence
- Abstract
Motivation Knowing the relation between cell types is crucial for translating experimental results from mice to humans. Establishing cell type matches, however, is hindered by the biological differences between the species. A substantial amount of evolutionary information between genes that could be used to align the species is discarded by most current methods, since they only use one-to-one orthologous genes. Some methods try to retain this information by explicitly including the relations between genes, though not without caveats. Results In this work, we present a model to transfer and align cell types in cross-species analysis (TACTiCS). First, TACTiCS uses a natural language processing model to match genes using their protein sequences. Next, TACTiCS employs a neural network to classify cell types within a species. Afterward, TACTiCS uses transfer learning to propagate cell type labels between species. We applied TACTiCS to scRNA-seq data of the primary motor cortex of human, mouse, and marmoset. Our model can accurately match and align cell types on these datasets. Moreover, our model outperforms Seurat and the state-of-the-art method SAMap. Finally, we show that, within our model, our gene matching method results in better cell type matches than BLAST. Availability and implementation The implementation is available on GitHub (https://github.com/kbiharie/TACTiCS). The preprocessed datasets and trained models can be downloaded from Zenodo (https://doi.org/10.5281/zenodo.7582460). [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. scTopoGAN: unsupervised manifold alignment of single-cell data.
- Author
Singh, Akash, Biharie, Kirti, Reinders, Marcel J T, Mahfouz, Ahmed, and Abdelaal, Tamim
- Subjects
MOLECULAR biology, GENERATIVE adversarial networks, MULTIOMICS, BIOINFORMATICS, COMPUTATIONAL biology
- Abstract
Motivation Single-cell technologies allow deep characterization of different molecular aspects of cells. Integrating these modalities provides a comprehensive view of cellular identity. Current integration methods rely on overlapping features or cells to link datasets measuring different modalities, limiting their application to experiments where different molecular layers are profiled in different subsets of cells. Results We present scTopoGAN, a method for unsupervised manifold alignment of single-cell datasets with non-overlapping cells or features. We use topological autoencoders (topoAE) to obtain latent representations of each modality separately. A topology-guided Generative Adversarial Network then aligns these latent representations into a common space. We show that scTopoGAN outperforms state-of-the-art manifold alignment methods in completely unsupervised settings. Interestingly, the topoAE for individual modalities also showed better performance in preserving the original structure of the data in the low-dimensional representations when compared to other manifold projection methods. Taken together, we show that the concept of topology preservation might be a powerful tool to align multiple single modality datasets, unleashing the potential of multi-omic interpretations of cells. Availability and implementation Implementation available on GitHub (https://github.com/AkashCiel/scTopoGAN). All datasets used in this study are publicly available. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history.
- Author
Maurits, Marc P, Korsunsky, Ilya, Raychaudhuri, Soumya, Murphy, Shawn N, Smoller, Jordan W, Weiss, Scott T, Huizinga, Thomas W J, Reinders, Marcel J T, Karlson, Elizabeth W, Akker, Erik B van den, and Knevel, Rachel
- Abstract
Objective: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. Material and Methods: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. Results: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. Discussion: Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. Conclusion: We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes. [ABSTRACT FROM AUTHOR]
- Published
- 2022
8. Effect of Phenotype and Genotype on the Plasma Proteome in Patients with Inflammatory Bowel Disease.
- Author
Bourgonje, Arno R, Hu, Shixian, Spekhorst, Lieke M, Zhernakova, Daria V, Vila, Arnau Vich, Li, Yanni, Voskuil, Michiel D, Berkel, Lisette A van, Folly, Brenda Bley, Charrout, Mohammed, Mahfouz, Ahmed, Reinders, Marcel J T, Heck, Julia I P van, Joosten, Leo A B, Visschedijk, Marijn C, Dullemen, Hendrik M van, Faber, Klaas Nico, Samsom, Janneke N, Festen, Eleonora A M, and Dijkstra, Gerard
- Abstract
Background and Aims Protein profiling in patients with inflammatory bowel diseases [IBD] for diagnostic and therapeutic purposes is underexplored. This study analysed the association between phenotype, genotype, and the plasma proteome in IBD. Methods A total of 92 inflammation-related proteins were quantified in plasma of 1028 patients with IBD (567 Crohn's disease [CD]; 461 ulcerative colitis [UC]) and 148 healthy individuals to assess protein-phenotype associations. Corresponding whole-exome sequencing and global screening array data of 919 patients with IBD were included to analyse the effect of genetics on protein levels (protein quantitative trait loci [pQTL] analysis). Intestinal mucosal RNA sequencing and faecal metagenomic data were used for complementary analyses. Results Thirty-two proteins were differentially abundant between IBD and healthy individuals, of which 22 proteins were independent of active inflammation; 69 proteins were associated with 15 demographic and clinical factors. Fibroblast growth factor-19 levels were decreased in CD patients with ileal disease or a history of ileocecal resection. Thirteen novel cis-pQTLs were identified and 10 replicated from previous studies. One trans-pQTL of the fucosyltransferase 2 [FUT2] gene [rs602662] and two independent cis-pQTLs of C-C motif chemokine 25 [CCL25] affected plasma CCL25 levels. Intestinal gene expression data revealed an overlapping cis-expression [e]QTL variant [rs3745387] of the CCL25 gene. The FUT2 rs602662 trans-pQTL was associated with reduced abundances of faecal butyrate-producing bacteria. Conclusions This study shows that genotype and multiple disease phenotypes strongly associate with the plasma inflammatory proteome in IBD, and identifies disease-associated pathways that may help to improve disease management in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2022
9. scMoC: single-cell multi-omics clustering.
- Author
Eltager, Mostafa, Abdelaal, Tamim, Mahfouz, Ahmed, and Reinders, Marcel J T
- Subjects
MULTIOMICS, NUCLEOTIDE sequence, BIOINFORMATICS, INFORMATION retrieval, DATA analysis
- Abstract
Motivation Single-cell multi-omics assays simultaneously measure different molecular features from the same cell. A key question is how to benefit from the complementary data available and perform cross-modal clustering of cells. Results We propose Single-Cell Multi-omics Clustering (scMoC), an approach to identify cell clusters from data with comeasurements of scRNA-seq and scATAC-seq from the same cell. We overcome the high sparsity of the scATAC-seq data by using an imputation strategy that exploits the less-sparse scRNA-seq data available from the same cell. Subsequently, scMoC identifies clusters of cells by merging clusterings derived from both data domains individually. We tested scMoC on datasets generated using different protocols with variable data sparsity levels. We show that scMoC (i) is able to generate informative scATAC-seq data due to its RNA-guided imputation strategy and (ii) results in integrated clusters based on both RNA and ATAC information that are biologically meaningful either from the RNA or from the ATAC perspective. Availability and implementation The data used in this manuscript are publicly available, and we refer to the original publications for their description and availability. For convenience, sci-CAR data are available at NCBI GEO under accession number GSE117089. SNARE-seq data are available at NCBI GEO under accession number GSE126074. The 10X multiome data are available at https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-no-cell-sorting-3-k-1-standard-2-0-0. Supplementary information Supplementary data are available at Bioinformatics Advances online. [ABSTRACT FROM AUTHOR]
- Published
- 2022
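The abstract does not detail scMoC's RNA-guided imputation. One common way to let a denser modality guide imputation of a sparser one is to average each cell's ATAC profile with those of its nearest neighbours in RNA space. The sketch below assumes exactly that (Euclidean distance on RNA profiles, unweighted averaging, a hypothetical function `impute_atac_from_rna`); it illustrates the idea and is not scMoC's actual procedure:

```python
import math


def impute_atac_from_rna(rna, atac, k=2):
    """Smooth each cell's ATAC profile over the cell itself and its k-1 nearest
    neighbours in RNA space (hypothetical illustration, not scMoC's algorithm).

    rna:  list of per-cell RNA expression vectors
    atac: list of per-cell ATAC accessibility vectors (same cell order)
    """
    n_cells = len(rna)
    imputed = []
    for i in range(n_cells):
        # Rank all other cells by Euclidean distance in RNA space.
        others = sorted(
            (j for j in range(n_cells) if j != i),
            key=lambda j: math.dist(rna[i], rna[j]),
        )
        neighbours = [i] + others[: k - 1]  # the cell itself plus its k-1 nearest
        imputed.append([
            sum(atac[j][p] for j in neighbours) / len(neighbours)
            for p in range(len(atac[i]))
        ])
    return imputed


rna = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]  # cells 0 and 1 are similar in RNA space
atac = [[1, 0], [0, 0], [0, 1]]             # sparse accessibility profiles
print(impute_atac_from_rna(rna, atac, k=2))  # [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5]]
```

Smoothing over RNA neighbours trades some cell-specific ATAC signal for reduced sparsity; the choice of k controls that trade-off.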
10. Differential analysis of binarized single-cell RNA sequencing data captures biological variation.
- Author
Bouland, Gerard A, Mahfouz, Ahmed, and Reinders, Marcel J T
- Published
- 2021
11. snpXplorer: a web application to explore human SNP-associations and annotate SNP-sets.
- Author
Tesi, Niccolo, van der Lee, Sven, Hulsman, Marc, Holstege, Henne, and Reinders, Marcel J. T.
- Published
- 2021
12. Polygenic Risk Score of Longevity Predicts Longer Survival Across an Age Continuum.
- Author
Tesi, Niccoló, van der Lee, Sven J., Hulsman, Marc, Jansen, Iris E., Stringa, Najada, van Schoor, Natasja M., Scheltens, Philip, van der Flier, Wiesje M., Huisman, Martijn, Reinders, Marcel J. T., and Holstege, Henne
- Abstract
Studying the genome of centenarians may give insights into the molecular mechanisms underlying extreme human longevity and the escape from age-related diseases. Here, we set out to construct polygenic risk scores (PRSs) for longevity and to investigate the functions of longevity-associated variants. Using a cohort of centenarians with maintained cognitive health (N = 343), population-matched older adults from 5 cohorts (N = 2905), and summary statistics data from genome-wide association studies on parental longevity, we constructed a PRS including 330 variants that significantly discriminated between centenarians and older adults. This PRS was also associated with longer survival in an independent sample of younger individuals (p = .02), amounting to up to a 4-year difference in survival based on common genetic factors only. We show that this PRS was, in part, able to compensate for the deleterious effect of the APOE-ε4 allele. Using an integrative framework, we annotated the 330 variants included in this PRS with the genes they are associated with. We find that they are enriched for genes associated with cellular differentiation, developmental processes, and cellular response to stress. Together, our results indicate that an extended human life span is, in part, the result of a constellation of variants each exerting small advantageous effects on aging-related biological mechanisms that maintain overall health and decrease the risk of age-related diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2021
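A polygenic risk score is conventionally computed as a weighted sum of a person's effect-allele dosages, using per-variant weights from GWAS summary statistics. A minimal sketch of that generic definition; the function name, variant IDs, weights and genotypes are hypothetical placeholders and do not come from the 330-variant score described above:

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies) over scored variants.

    dosages: variant ID -> effect-allele count for one person
    weights: variant ID -> per-allele effect size (e.g. a GWAS beta)
    Variants without a genotype for this person are simply skipped here.
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}  # hypothetical effect sizes
person = {"rsA": 2, "rsB": 1, "rsC": 0}
print(round(polygenic_score(person, weights), 3))  # 0.15
```

Skipping missing genotypes is only one option; practical pipelines often impute them instead.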
13. Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function.
- Author
Villegas-Morcillo, Amelia, Makrodimitris, Stavros, Ham, Roeland C H J van, Gomez, Angel M, Sanchez, Victoria, and Reinders, Marcel J T
- Subjects
AMINO acid sequence, PROTEIN structure, PROTEINS, PREDICTION models, FUNCTIONAL assessment
- Abstract
Motivation Protein function prediction is a difficult bioinformatics problem. Many recent methods use deep neural networks to learn complex sequence representations and predict function from these. Deep supervised models require large amounts of labeled training data, which are not available for this task. However, a very large number of protein sequences without functional labels is available. Results We applied an existing deep sequence model, pretrained in an unsupervised setting, to the supervised task of protein molecular function prediction. We found that this complex feature representation is effective for this task, outperforming hand-crafted features such as one-hot encoding of amino acids, k-mer counts, secondary structure and backbone angles. Also, it partly negates the need for complex prediction models, as a two-layer perceptron was enough to achieve competitive performance in the third Critical Assessment of Functional Annotation benchmark. We also show that combining this sequence representation with protein 3D structure information does not improve performance, hinting that 3D structure is also potentially learned during the unsupervised pretraining. Availability and implementation Implementations of all used models can be found at https://github.com/stamakro/GCN-for-Structure-and-Function. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2021
14. SCHNEL: scalable clustering of high dimensional single-cell data.
- Author
Abdelaal, Tamim, Raadt, Paul de, Lelieveldt, Boudewijn P F, Reinders, Marcel J T, and Mahfouz, Ahmed
- Subjects
REPRESENTATIONS of graphs, RNA sequencing, CELL populations, DIMENSION reduction (Statistics), MACHINE learning, CYTOMETRY, HIERARCHICAL clustering (Cluster analysis)
- Abstract
Motivation Single-cell data measure multiple cellular markers in thousands to millions of individual cells. Identification of distinct cell populations is a key step for further biological understanding, usually performed by clustering these data. Dimensionality-reduction-based clustering tools are either not scalable to large datasets containing millions of cells or not fully automated, requiring an initial manual estimate of the number of clusters. Graph clustering tools provide automated and reliable clustering for single-cell data but scale poorly to large datasets. Results We developed SCHNEL, a scalable, reliable and automated clustering tool for high-dimensional single-cell data. SCHNEL transforms large high-dimensional data into a hierarchy of datasets containing subsets of data points following the original data manifold. The novel approach of SCHNEL combines this hierarchical representation of the data with graph clustering, making graph clustering scalable to millions of cells. Using seven different cytometry datasets, SCHNEL outperformed three popular clustering tools for cytometry data, and was able to produce meaningful clustering results for datasets of 3.5 and 17.2 million cells within workable time frames. In addition, we show that SCHNEL is a general clustering tool by applying it to single-cell RNA sequencing data, as well as to the popular machine learning benchmark dataset MNIST. Availability and implementation Implementation is available on GitHub (https://github.com/biovault/SCHNELpy). All datasets used in this study are publicly available. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
15. Metric learning on expression data for gene function prediction.
- Author
Makrodimitris, Stavros, Reinders, Marcel J T, and Ham, Roeland C H J van
- Subjects
*ALGORITHMS, *GENE expression, *ARABIDOPSIS thaliana, *GENE ontology, *PSEUDOMONAS aeruginosa
- Abstract
Motivation Co-expression of two genes across different conditions is indicative of their involvement in the same biological process. However, when using RNA-Seq datasets with many experimental conditions from diverse sources, only a subset of the experimental conditions is expected to be relevant for finding genes related to a particular Gene Ontology (GO) term. Therefore, we hypothesize that when the purpose is to find similarly functioning genes, the co-expression of genes should not be determined on all samples but only on those samples informative for the GO term of interest. Results To address this, we developed Metric Learning for Co-expression (MLC), a fast algorithm that assigns a GO-term-specific weight to each expression sample. The goal is to obtain a weighted co-expression measure that is more suitable than the unweighted Pearson correlation for applying Guilt-By-Association-based function predictions. More specifically, if two genes are both annotated with a given GO term, MLC tries to maximize their weighted co-expression; if only one of them is annotated with that term, MLC tries to minimize it. Our experiments on publicly available Arabidopsis thaliana RNA-Seq data demonstrate that MLC outperforms standard Pearson correlation in term-centric performance. Moreover, our method is particularly good at predicting more specific terms, which are the most interesting. Finally, by observing the sample weights for a particular GO term, one can identify which experiments are important for learning that term and potentially identify novel conditions that are relevant, as demonstrated by experiments in both A. thaliana and Pseudomonas aeruginosa. Availability and implementation MLC is available as a Python package at www.github.com/stamakro/MLC. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
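MLC replaces unweighted Pearson correlation with a co-expression measure that weights each expression sample. The weighted Pearson correlation below shows what such a per-sample weighting does; the weights are set by hand for illustration, whereas MLC learns GO-term-specific weights with its own objective, which this sketch does not implement. The function name `weighted_pearson` and the toy expression values are hypothetical:

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of expression vectors x and y with non-negative
    per-sample weights w; equal weights give the ordinary Pearson correlation."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total  # weighted means
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


x = [1, 2, 3, 10]  # expression of gene 1 in four samples
y = [1, 2, 3, 0]   # expression of gene 2; sample 4 disagrees
print(round(weighted_pearson(x, y, [1, 1, 1, 1]), 3))  # -0.632
print(weighted_pearson(x, y, [1, 1, 1, 0]))            # 1.0
```

Down-weighting the single discordant sample turns an anti-correlation into a perfect correlation, which is the kind of effect a GO-term-specific sample weighting can exploit.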
16. CyTOFmerge: integrating mass cytometry data across multiple panels.
- Author
Abdelaal, Tamim, Höllt, Thomas, Unen, Vincent van, Lelieveldt, Boudewijn P F, Koning, Frits, Reinders, Marcel J T, and Mahfouz, Ahmed
- Subjects
CYTOMETRY, NEIGHBORHOODS, DATA, HETEROGENEITY, BIOINFORMATICS
- Abstract
Motivation High-dimensional mass cytometry (CyTOF) allows the simultaneous measurement of multiple cellular markers at single-cell level, providing a comprehensive view of cell compositions. However, the power of CyTOF to explore the full heterogeneity of a biological sample at the single-cell level is currently limited by the number of markers measured simultaneously on a single panel. Results To extend the number of markers per cell, we propose an in silico method to integrate CyTOF datasets measured using multiple panels that share a set of markers. Additionally, we present an approach to select the most informative markers from an existing CyTOF dataset to be used as a shared marker set between panels. We demonstrate the feasibility of our methods by evaluating the quality of clustering and neighborhood preservation of the integrated dataset, on two public CyTOF datasets. We illustrate that by computationally extending the number of markers we can further untangle the heterogeneity of mass cytometry data, including rare cell-population detection. Availability and implementation Implementation is available on GitHub (https://github.com/tabdelaal/CyTOFmerge). Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2019
17. PRECISE: a domain adaptation approach to transfer predictors of drug response from pre-clinical models to tumors.
- Author
Mourragui, Soufiane, Loog, Marco, Wiel, Mark A van de, Reinders, Marcel J T, and Wessels, Lodewyk F A
- Subjects
TUMORS, CELL lines, INFORMATION commons, PHYSIOLOGICAL adaptation, STOCKS (Finance)
- Abstract
Motivation Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. Results We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied to tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. Availability and implementation PRECISE and the scripts for running our experiments are available on our GitHub page (https://github.com/NKI-CCB/PRECISE). Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2019
18. Improving protein function prediction using protein sequence and GO-term similarities.
- Author
Makrodimitris, Stavros, Ham, Roeland C H J van, and Reinders, Marcel J T
- Subjects
PROTEIN analysis, PEPTIDE analysis, GENE ontology, PHILOSOPHY, BIOLOGICAL apparatus & supplies
- Abstract
Motivation Most automatic functional annotation methods assign Gene Ontology (GO) terms to proteins based on annotations of highly similar proteins. We advocate that proteins that are less similar are still informative. Also, despite their simplicity and structure, GO terms seem to be hard for computers to learn, in particular the Biological Process ontology, which has the most terms (>29 000). We propose to use Label-Space Dimensionality Reduction (LSDR) techniques to exploit the redundancy of GO terms and transform them into a more compact latent representation that is easier to predict. Results We compare proteins using a sequence similarity profile (SSP) to a set of annotated training proteins. We introduce two new LSDR methods, one based on the structure of the GO, and one based on semantic similarity of terms. We show that these LSDR methods, as well as three existing ones, improve the Critical Assessment of Functional Annotation performance of several function prediction algorithms. Cross-validation experiments on Arabidopsis thaliana proteins pinpoint the superiority of our GO-aware LSDR over generic LSDR. Our experiments on A. thaliana proteins show that the SSP representation in combination with a kNN classifier outperforms state-of-the-art and baseline methods in terms of cross-validated F-measure. Availability and implementation Source code for the experiments is available at https://github.com/stamakro/SSP-LSDR. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2019
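The abstract pairs a sequence similarity profile (SSP) with a kNN classifier. The sketch below illustrates that combination under stated assumptions: each protein is represented by a vector of precomputed similarity scores to a fixed set of annotated reference proteins, neighbours are found by Euclidean distance, and each GO term is scored by the fraction of the k nearest training proteins that carry it. The function name `knn_go_scores` and the toy numbers are hypothetical, and the paper's exact similarity measure and scoring are not reproduced:

```python
import math


def knn_go_scores(query_profile, train_profiles, train_labels, k=3):
    """Score GO terms for a query protein by averaging the binary GO annotation
    vectors of its k nearest training proteins in similarity-profile space."""
    ranked = sorted(
        range(len(train_profiles)),
        key=lambda i: math.dist(query_profile, train_profiles[i]),
    )
    nearest = ranked[:k]
    n_terms = len(train_labels[0])
    return [sum(train_labels[i][t] for i in nearest) / len(nearest) for t in range(n_terms)]


# Similarity of each training protein to three reference proteins (toy values).
train_profiles = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0], [0.1, 0.1, 1.0]]
# Binary annotations for two GO terms per training protein.
train_labels = [[1, 0], [1, 1], [0, 1]]
print(knn_go_scores([0.95, 0.25, 0.05], train_profiles, train_labels, k=2))  # [1.0, 0.5]
```

The scores are per-term fractions rather than hard labels, so a threshold (or a ranking) is still needed to turn them into annotations.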
19. ImSpectR: R package to quantify immune repertoire diversity in spectratype and repertoire sequencing data.
- Author
Cordes, Martijn, Pike-Overzet, Karin, Eggermond, Marja van, Vloemans, Sandra, Baert, Miranda R, Garcia-Perez, Laura, Staal, Frank J T, Reinders, Marcel J T, and Akker, Erik B van den
- Subjects
INSPECTION & review, MICROARRAY technology, RNA sequencing, IMMUNODEFICIENCY, QUANTITATIVE research, IMMUNE system, PROGRESSION-free survival
- Abstract
Summary An effective immune system is characterized by a diverse immune repertoire. There is a strong demand for accurate and quantitative methods to assess the diversity of the immune repertoire for various (pre-)clinical applications, including the diagnosis and prognosis of primary immune deficiencies, or to assess the response to therapy. Current strategies for immune diversity assessment generally comprise the visual inspection of the length distribution of rearranged T- and B-cell receptors. Visual inspections, however, are prone to subjective assessments and thus lead to biases. Here, we introduce ImSpectR, a unified approach to quantify immunodiversity using either spectratype, repertoire sequencing or single cell RNA sequencing data. ImSpectR scores various types of deviations from the expected length distribution and integrates these into one measure, allowing for robust quantitative comparisons of immune diversity across individuals or conditions. Availability and implementation R-package is available for download on GitHub at https://github.com/martijn-cordes/ImSpectR. Supplementary information Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
20. BrainScope: interactive visual exploration of the spatial and temporal human brain transcriptome.
- Author
Huisman, Sjoerd M. H., van Lew, Baldur, Mahfouz, Ahmed, Pezzotti, Nicola, Höllt, Thomas, Michielsen, Lieke, Vilanova, Anna, Reinders, Marcel J. T., and Lelieveldt, Boudewijn P. F.
- Published
- 2017
21. Proteny: discovering and visualizing statistically significant syntenic clusters at the proteome level.
- Author
Gehrmann, Thies and Reinders, Marcel J. T.
- Subjects
*GENOMES, *PROTEOMICS, *EXONS (Genetics), *GENE expression, *BIG data, *MANAGEMENT
- Abstract
Background: With more and more genomes being sequenced, detecting synteny between genomes becomes more and more important. However, for microorganisms the genomic divergence quickly becomes large, resulting in different codon usage and shuffling of gene order and gene elements such as exons. Results: We present Proteny, a methodology to detect synteny between diverged genomes. It operates on the amino acid sequence level to be insensitive to codon usage adaptations and clusters groups of exons disregarding order to handle diversity in genomic ordering between genomes. Furthermore, Proteny assigns significance levels to the syntenic clusters such that they can be selected on statistical grounds. Finally, Proteny provides novel ways to visualize results at different scales, facilitating the exploration and interpretation of syntenic regions. We test the performance of Proteny on a standard ground truth dataset, and we illustrate the use of Proteny on two closely related genomes (two different strains of Aspergillus niger) and on two distant genomes (two species of Basidiomycota). In comparison to other tools, we find that Proteny finds clusters with more true homologies in fewer clusters that contain more genes, i.e. Proteny is able to identify a more consistent synteny. Further, we show how genome rearrangements, assembly errors, gene duplications and the conservation of specific genes can be easily studied with Proteny. Availability and implementation: Proteny is freely available at the Delft Bioinformatics Lab website http://bioinformatics.tudelft.nl/dbl/software. [ABSTRACT FROM AUTHOR]
- Published
- 2015
22. Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold.
- Author
Nijkamp, Jurgen F., Pop, Mihai, Reinders, Marcel J. T., and de Ridder, Dick
- Subjects
COMPARATIVE genomics, METAGENOMICS, GENETIC algorithms, ESCHERICHIA coli, BIOINFORMATICS, COMPARATIVE studies
- Abstract
Motivation: Although many tools are available to study variation and its impact in single genomes, there is a lack of algorithms for finding such variation in metagenomes. This hampers the interpretation of metagenomics sequencing datasets, which are increasingly acquired in research on the (human) microbiome, in environmental studies and in the study of processes in the production of foods and beverages. Existing algorithms often depend on the use of reference genomes, which pose a problem when a metagenome of a priori unknown strain composition is studied. In this article, we develop a method to perform reference-free detection and visual exploration of genomic variation, both within a single metagenome and between metagenomes. Results: We present the MaryGold algorithm and its implementation, which efficiently detects bubble structures in contig graphs using graph decomposition. These bubbles represent variable genomic regions in closely related strains in metagenomic samples. The variation found is presented in a condensed Circos-based visualization, which allows for easy exploration and interpretation of the found variation. We validated the algorithm on two simulated datasets containing three and seven Escherichia coli genomes, respectively, and showed that finding allelic variation in these genomes improves assemblies. Additionally, we applied MaryGold to publicly available real metagenomic datasets, enabling us to find within-sample genomic variation in the metagenomes of a kimchi fermentation process, the microbiome of a premature infant and microbial communities living on acid mine drainage. Moreover, we used MaryGold for between-sample variation detection and exploration by comparing sequencing data sampled at different time points for both of these datasets. Availability: MaryGold has been written in C++ and Python and can be downloaded from http://bioinformatics.tudelft.nl/software Contact: d.deridder@tudelft.nl [ABSTRACT FROM PUBLISHER]
- Published
- 2013
23. Pattern recognition in bioinformatics.
- Author
de Ridder, Dick, de Ridder, Jeroen, and Reinders, Marcel J. T.
- Subjects
PATTERN perception, BIOINFORMATICS, DIMENSION reduction (Statistics), DIMENSIONAL reduction algorithms, MICROARRAY technology, MASS spectrometry
- Abstract
Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained. [ABSTRACT FROM AUTHOR]
- Published
- 2013
24. De novo detection of copy number variation by co-assembly.
- Author
Nijkamp, Jurgen F., van den Broek, Marcel A., Geertman, Jan-Maarten A., Reinders, Marcel J. T., Daran, Jean-Marc G., and de Ridder, Dick
- Subjects
GENOMES, NUCLEOTIDE sequence, GENETIC algorithms, YEAST, BREWING
- Abstract
Motivation: Comparing genomes of individual organisms using next-generation sequencing data has, until now, mostly been performed using a reference genome. This is challenging when the reference is distant and introduces bias towards the exact sequence present in the reference. Recent improvements in both sequencing read length and efficiency of assembly algorithms have brought direct comparison of individual genomes by de novo assembly, rather than through a reference genome, within reach. Results: Here, we develop and test an algorithm, named Magnolya, that uses a Poisson mixture model for copy number estimation of contigs assembled from sequencing data. We combine this with co-assembly to allow de novo detection of copy number variation (CNV) between two individual genomes, without mapping reads to a reference genome. In co-assembly, multiple sequencing samples are combined, generating a single contig graph with different traversal counts for the nodes and edges between the samples. In the resulting ‘coloured’ graph, the contigs have integer copy numbers; this negates the need to segment genomic regions based on depth of coverage, as required for mapping-based detection methods. Magnolya is then used to assign integer copy numbers to contigs, after which CNV probabilities are easily inferred. The copy number estimator and CNV detector perform well on simulated data. Application of the algorithms to hybrid yeast genomes showed allotriploid content of different origin in the wine yeast Y12, and extensive CNV in aneuploid brewing yeast genomes. Integer CNV was also accurately detected in a short-term laboratory-evolved yeast strain. Availability: Magnolya is implemented in Python and available at: http://bioinformatics.tudelft.nl/ Contact: d.deridder@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2012
25. Generic and specific transcriptional responses to different weak organic acids in anaerobic chemostat cultures of Saccharomyces cerevisiae.
- Author
Abbott, Derek A., Knijnenburg, Theo A., De Poorter, Linda M. I., Reinders, Marcel J. T., Pronk, Jack T., and Van Maris, Antonius J. A.
- Subjects
ORGANIC acids, SACCHAROMYCES cerevisiae, CHEMOSTAT, PROPIONATES, ACETATES
- Abstract
Transcriptional responses to four weak organic acids (benzoate, sorbate, acetate and propionate) were investigated in anaerobic, glucose-limited chemostat cultures of Saccharomyces cerevisiae. To enable quantitative comparison of the responses to the acids, their concentrations were chosen such that they caused a 50% decrease of the biomass yield on glucose. The concentration of each acid required to achieve this yield was negatively correlated with membrane affinity. Microarray analysis revealed that each acid caused hundreds of transcripts to change by over twofold relative to reference cultures without added organic acids. However, only 14 genes were consistently upregulated in response to all acids. The moderately lipophilic compounds benzoate and sorbate and, to a lesser extent, the less lipophilic acids acetate and propionate showed overlapping transcriptional responses. Statistical analysis for overrepresented functional categories and upstream regulatory elements indicated that responses to the strongly lipophilic acids were focused on genes related to the cell wall, while acetate and propionate had a stronger impact on membrane-associated transport processes. The fact that S. cerevisiae exhibits a minimal generic transcriptional response to weak organic acids along with extensive specific responses is relevant for interpreting and controlling weak acid toxicity in food products and in industrial fermentation processes. [ABSTRACT FROM AUTHOR]
- Published
- 2007
26. CytoscapeRPC: a plugin to create, modify and query Cytoscape networks from scripting languages.
- Author
Bot, Jan J. and Reinders, Marcel J. T.
- Subjects
*SCRIPTING languages (Computer science), *BIOINFORMATICS, *PROGRAMMING languages, *XML (Extensible Markup Language), *COMPUTERS in biology, *COMPUTER networks, *INSTALL programs (Computer programs)
- Abstract
Summary: CytoscapeRPC is a plugin for Cytoscape that allows users to create, query and modify Cytoscape networks from any programming language that supports XML-RPC. This enables them to access Cytoscape functionality and visualize their data interactively without leaving the programming environment with which they are familiar. Availability: Install through the Cytoscape plugin manager or visit the web page: http://wiki.nbic.nl/index.php/CytoscapeRPC for the user tutorial and download. Contact: j.j.bot@tudelft.nl [ABSTRACT FROM AUTHOR]
- Published
- 2011
27. WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme.
- Author
Straver, Roy, Sistermans, Erik A., Holstege, Henne, Visser, Allerdien, Oudejans, Cees B. M., and Reinders, Marcel J. T.
- Published
- 2014
Discovery Service for Jio Institute Digital Library