481,899 results on "Bioinformatics"
Search Results
2. Unveiling the pathophysiology of restless legs syndrome through transcriptome analysis.
- Author
-
Mogavero, Maria, Salemi, Michele, Lanza, Giuseppe, Rinaldi, Antonio, Marchese, Giovanna, Ravo, Maria, Salluzzo, Maria, Antoci, Amedeo, DelRosso, Lourdes, Bruni, Oliviero, Ferini-Strambi, Luigi, and Ferri, Raffaele
- Subjects
Bioinformatics, Neurology, Transcriptomics
- Abstract
The aim of this study was to analyze signaling pathways associated with differentially expressed messenger RNAs in people with restless legs syndrome (RLS). Seventeen RLS patients and 18 controls were enrolled. Coding RNA expression profiling of 12,857 gene transcripts by next-generation sequencing was performed. Enrichment analysis was carried out with the pathfindR tool, with p-adjusted ≤0.001 and fold-change ≥2.5. Nine main network groups were significantly dysregulated in RLS: infections, inflammation, immunology, neurodegeneration, cancer, neurotransmission, and biological, blood, and metabolic mechanisms. Genetic predisposition plays a key role in RLS and evidence indicates its inflammatory nature; the high involvement of mainly neurotropic viruses and the TORCH complex might trigger inflammatory/immune reactions in genetically predisposed subjects and activate a series of biological pathways (especially IL-17, receptor potential channels, nuclear factor kappa-light-chain-enhancer of activated B cells, NOD-like receptor, mitogen-activated protein kinase, p53, mitophagy, and ferroptosis) involved in neurotransmitter mechanisms, synaptic plasticity, axon guidance, neurodegeneration, carcinogenesis, and metabolism.
- Published
- 2024
3. Community recommendations on cryoEM data archiving and validation.
- Author
-
Kleywegt, Gerard, Butcher, Sarah, Lawson, Catherine, Rohou, Alexis, Rosenthal, Peter, Subramaniam, Sriram, Topf, Maya, Abbott, Sanja, Baldwin, Philip, Berrisford, John, Bricogne, Gérard, Choudhary, Preeti, Croll, Tristan, Danev, Radostin, Ganesan, Sai, Grant, Timothy, Gutmanas, Aleksandras, Henderson, Richard, Heymann, J, Huiskonen, Juha, Istrate, Andrei, Kato, Takayuki, Lander, Gabriel, Lok, Shee, Ludtke, Steven, Murshudov, Garib, Pye, Ryan, Pintilie, Grigore, Richardson, Jane, Sachse, Carsten, Salih, Osman, Scheres, Sjors, Schroeder, Gunnar, Sorzano, Carlos, Stagg, Scott, Wang, Zhe, Warshamanage, Rangana, Westbrook, John, Winn, Martyn, Young, Jasmine, Burley, Stephen, Hoch, Jeffrey, Kurisu, Genji, Morris, Kyle, Patwardhan, Ardan, Velankar, Sameer, and Adams, Paul
- Subjects
Electron Microscopy Data Bank, Protein Data Bank, bioinformatics, cryogenic-specimen electron microscopy, data archiving, databases, quality control, single-particle cryoEM, structure determination, validation
- Abstract
In January 2020, a workshop was held at EMBL-EBI (Hinxton, UK) to discuss data requirements for the deposition and validation of cryoEM structures, with a focus on single-particle analysis. The meeting was attended by 47 experts in data processing, model building and refinement, validation, and archiving of such structures. This report describes the workshop's motivation and history, the topics discussed, and the resulting consensus recommendations. Some challenges for future methods-development efforts in this area are also highlighted, as is the implementation to date of some of the recommendations.
- Published
- 2024
4. Comprehensive assessment of 11 de novo HiFi assemblers on complex eukaryotic genomes and metagenomes.
- Author
-
Yu, Wenjuan, Luo, Haohui, Yang, Jinbao, Zhang, Shengchen, Jiang, Heling, Zhao, Xianjia, Hui, Xingqi, Sun, Da, Li, Liang, Wei, Xiu-Qing, Lonardi, Stefano, and Pan, Weihua
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Medical and Health Sciences, Bioinformatics, Genetics
- Abstract
Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (
- Published
- 2024
5. HERC6 regulates STING activity in a sex-biased manner through modulation of LATS2/VGLL3 Hippo signaling.
- Author
-
Uppala, Ranjitha, Sarkar, Mrinal, Young, Kelly, Ma, Feiyang, Vemulapalli, Pritika, Wasikowski, Rachael, Plazyo, Olesya, Swindell, William, Maverakis, Emanual, Gharaee-Kermani, Mehrnaz, Billi, Allison, Tsoi, Lam, Kahlenberg, J, and Gudjonsson, Johann
- Subjects
Bioinformatics, Cell biology, Molecular biology, Omics
- Abstract
Interferon (IFN) activity exhibits a gender bias in human skin, skewed toward females. We show that HERC6, an IFN-induced E3 ubiquitin ligase, is induced in human keratinocytes through the epidermal type I IFN, IFN-κ. HERC6 knockdown in human keratinocytes results in enhanced induction of interferon-stimulated genes (ISGs) upon treatment with a double-stranded (ds) DNA STING activator, cGAMP, but not in response to the RNA-sensing TLR3 agonist. Keratinocytes lacking HERC6 exhibit sustained STING-TBK1 signaling following cGAMP stimulation through modulation of LATS2 and TBK1 activity, unmasking more robust ISG responses in female keratinocytes. This enhanced female-biased immune response with loss of HERC6 depends on VGLL3, a regulator of type I IFN signature. These data identify HERC6 as a previously unrecognized negative regulator of ISG expression specific to dsDNA sensing and establish it as a regulator of female-biased immune responses through modulation of STING signaling.
- Published
- 2024
6. PIEZO1 regulates leader cell formation and cellular coordination during collective keratinocyte migration
- Author
-
Chen, Jinghao, Holt, Jesse R, Evans, Elizabeth L, Lowengrub, John S, and Pathak, Medha M
- Subjects
Biochemistry and Cell Biology, Engineering, Biological Sciences, Biomedical Engineering, Bioengineering, Underpinning research, 1.1 Normal biological development and functioning, Cell Movement, Keratinocytes, Ion Channels, Mathematical Sciences, Information and Computing Sciences, Bioinformatics
- Abstract
The collective migration of keratinocytes during wound healing requires both the generation and transmission of mechanical forces for individual cellular locomotion and the coordination of movement across cells. Leader cells along the wound edge transmit mechanical and biochemical cues to ensuing follower cells, ensuring their coordinated direction of migration across multiple cells. Despite the observed importance of mechanical cues in leader cell formation and in controlling coordinated directionality of cell migration, the underlying biophysical mechanisms remain elusive. The mechanically-activated ion channel PIEZO1 was recently identified to play an inhibitory role during the reepithelialization of wounds. Here, through an integrative experimental and mathematical modeling approach, we elucidate PIEZO1's contributions to collective migration. Time-lapse microscopy reveals that PIEZO1 activity inhibits leader cell formation at the wound edge. To probe the relationship between PIEZO1 activity, leader cell formation and inhibition of reepithelialization, we developed an integrative 2D continuum model of wound closure that links observations at the single cell and collective cell migration scales. Through numerical simulations and subsequent experimental validation, we found that coordinated directionality plays a key role during wound closure and is inhibited by upregulated PIEZO1 activity. We propose that PIEZO1-mediated retraction suppresses leader cell formation which inhibits coordinated directionality between cells during collective migration.
- Published
- 2024
7. PyPop: a mature open-source software pipeline for population genomics
- Author
-
Lancaster, Alexander K, Single, Richard M, Mack, Steven J, Sochat, Vanessa, Mariani, Michael P, and Webster, Gordon D
- Subjects
Biological Sciences, Genetics, Human Genome, Generic health relevance, Metagenomics, Software, Genetics, Population, Haplotypes, Genotype, HLA, MHC, population genomics, software, bioinformatics, Immunology, Medical Microbiology, Biochemistry and cell biology
- Abstract
Python for Population Genomics (PyPop) is a software package that processes genotype and allele data and performs large-scale population genetic analyses on highly polymorphic multi-locus genotype data. In particular, PyPop tests data conformity to Hardy-Weinberg equilibrium expectations, performs Ewens-Watterson tests for selection, estimates haplotype frequencies, measures linkage disequilibrium, and tests significance. Standardized means of performing these tests is key for contemporary studies of evolutionary biology and population genetics, and these tests are central to genetic studies of disease association as well. Here, we present PyPop 1.0.0, a new major release of the package, which implements new features using the more robust infrastructure of GitHub, and is distributed via the industry-standard Python Package Index. New features include implementation of the asymmetric linkage disequilibrium measures and, of particular interest to the immunogenetics research communities, support for modern nomenclature, including colon-delimited allele names, and improvements to meta-analysis features for aggregating outputs for multiple populations. Code available at: https://zenodo.org/records/10080668 and https://github.com/alexlancaster/pypop.
- Published
- 2024
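As an illustration of the Hardy-Weinberg conformity test mentioned in the PyPop abstract above, the following is a minimal sketch of a chi-square goodness-of-fit test for a single biallelic locus. It is not PyPop's implementation or API; the function name `hardy_weinberg_chi_square` and the genotype counts are hypothetical, and PyPop itself targets highly polymorphic multi-locus data, which this toy does not handle.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for Hardy-Weinberg proportions at a biallelic locus.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi_square, expected_counts).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)

    # Sum of (observed - expected)^2 / expected, skipping empty classes.
    chi_square = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )
    return chi_square, expected


if __name__ == "__main__":
    # A sample in perfect HWE proportions gives a statistic of zero.
    print(hardy_weinberg_chi_square(25, 50, 25)[0])  # 0.0
    # A sample with no heterozygotes deviates strongly.
    print(hardy_weinberg_chi_square(50, 0, 50)[0])   # 100.0
```

With one degree of freedom, a statistic above about 3.84 would reject equilibrium at the 5% level; for small samples or many alleles an exact test is generally preferred over this approximation.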
8. Biogeographic distribution of five Antarctic cyanobacteria using large-scale k-mer searching with sourmash branchwater.
- Author
-
Jungblut, Anne, Irber, Luiz, Pierce-Ward, N, Lumian, Jessica, Sumner, Dawn, Brown, Titus, and Grettenberger, Christen
- Subjects
biogeography, bioinformatics, cryosphere, metagenomics, polar cyanobacteria
- Abstract
Cyanobacteria form diverse communities and are important primary producers in Antarctic freshwater environments, but their geographic distribution patterns in Antarctica and globally are still unresolved. There are, however, few genomes of cultured cyanobacteria from Antarctica available, and therefore metagenome-assembled genomes (MAGs) from Antarctic cyanobacteria microbial mats provide an opportunity to explore distribution of uncultured taxa. These MAGs also allow comparison with metagenomes of cyanobacteria-enriched communities from a range of habitats, geographic locations, and climates. However, most MAGs do not contain 16S rRNA gene sequences, making a 16S rRNA gene-based biogeography comparison difficult. An alternative technique is to use large-scale k-mer searching to find genomes of interest in public metagenomes. This paper presents the results of k-mer based searches for 5 Antarctic cyanobacteria MAGs from Lake Fryxell and Lake Vanda, assigned the names Phormidium pseudopriestleyi FRX01, Microcoleus sp. MP8IB2.171, Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, and Leptolyngbyaceae cyanobacterium MP9P1.79, in 498,942 unassembled metagenomes from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). The Microcoleus sp. MP8IB2.171 MAG was found in a wide variety of environments, the P. pseudopriestleyi MAG was found in environments with challenging conditions, the Leptolyngbyaceae cyanobacterium MP9P1.79 MAG was only found in Antarctica, and the Leptolyngbya sp. BulkMat.35 and Pseudanabaenaceae cyanobacterium MP8IB2.15 MAGs were found in Antarctic and other cold environments. The findings based on metagenome matches and global comparisons suggest that these Antarctic cyanobacteria have distinct distribution patterns ranging from locally restricted to global distribution across the cold biosphere and other climatic zones.
- Published
- 2024
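The k-mer searching idea in the abstract above can be illustrated with a toy containment calculation: the fraction of a query genome's k-mers that also occur in a target dataset. This is only a sketch under simplifying assumptions and is not how sourmash branchwater is implemented; large-scale tools work on subsampled hash sketches rather than exact k-mer sets, and they treat reverse-complement strands, which this example ignores. The names `kmers` and `containment` and the sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, target, k):
    """Fraction of the query's distinct k-mers that also occur in the target.

    Returns 0.0 when the query is shorter than k and so has no k-mers.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    shared = query_kmers & kmers(target, k)
    return len(shared) / len(query_kmers)


if __name__ == "__main__":
    # Every 3-mer of the query occurs in the first target.
    print(containment("ACGTAC", "TTACGTACGG", 3))  # 1.0
    # Only one of the four query 3-mers (ACG) occurs in the second target.
    print(containment("ACGTAC", "ACGGGG", 3))      # 0.25
```

A high containment of a MAG in a metagenome suggests the organism (or a close relative) is present in that sample; the threshold for calling a match is a choice left to the analyst.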
9. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods
- Author
-
Jain, Shantanu, Bakolitsa, Constantina, Brenner, Steven E, Radivojac, Predrag, Moult, John, Repo, Susanna, Hoskins, Roger A, Andreoletti, Gaia, Barsky, Daniel, Chellapan, Ajithavalli, Chu, Hoyin, Dabbiru, Navya, Kollipara, Naveen K, Ly, Melissa, Neumann, Andrew J, Pal, Lipika R, Odell, Eric, Pandey, Gaurav, Peters-Petrulewicz, Robin C, Srinivasan, Rajgopal, Yee, Stephen F, Yeleswarapu, Sri Jyothsna, Zuhl, Maya, Adebali, Ogun, Patra, Ayoti, Beer, Michael A, Hosur, Raghavendra, Peng, Jian, Bernard, Brady M, Berry, Michael, Dong, Shengcheng, Boyle, Alan P, Adhikari, Aashish, Chen, Jingqi, Hu, Zhiqiang, Wang, Robert, Wang, Yaqiong, Miller, Maximilian, Wang, Yanran, Bromberg, Yana, Turina, Paola, Capriotti, Emidio, Han, James J, Ozturk, Kivilcim, Carter, Hannah, Babbi, Giulia, Bovo, Samuele, Di Lena, Pietro, Martelli, Pier Luigi, Savojardo, Castrense, Casadio, Rita, Cline, Melissa S, De Baets, Greet, Bonache, Sandra, Diez, Orland, Gutierrez-Enriquez, Sara, Fernandez, Alejandro, Montalban, Gemma, Ootes, Lars, Ozkan, Selen, Padilla, Natalia, Riera, Casandra, De la Cruz, Xavier, Diekhans, Mark, Huwe, Peter J, Wei, Qiong, Xu, Qifang, Dunbrack, Roland L, Gotea, Valer, Elnitski, Laura, Margolin, Gennady, Fariselli, Piero, Kulakovskiy, Ivan V, Makeev, Vsevolod J, Penzar, Dmitry D, Vorontsov, Ilya E, Favorov, Alexander V, Forman, Julia R, Hasenahuer, Marcia, Fornasari, Maria S, Parisi, Gustavo, Avsec, Ziga, Celik, Muhammed H, Thi, Yen Duong Nguyen, Gagneur, Julien, Shi, Fang-Yuan, Edwards, Matthew D, Guo, Yuchun, Tian, Kevin, Zeng, Haoyang, Gifford, David K, Goke, Jonathan, Zaucha, Jan, Gough, Julian, Ritchie, Graham RS, Frankish, Adam, Mudge, Jonathan M, Harrow, Jennifer, Young, Erin L, and Yu, Yao
- Subjects
Biological Sciences, Genetics, Genetic Testing, Human Genome, Aetiology, 2.1 Biological and endogenous factors, Good Health and Well Being, Humans, Computational Biology, Mutation, Missense, Phenotype, Critical Assessment of Genome Interpretation Consortium, Environmental Sciences, Information and Computing Sciences, Bioinformatics
- Abstract
Background: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. Results: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. Conclusions: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
- Published
- 2024
10. SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine-learning method.
- Author
-
de Bernardi Schneider, Adriano, Su, Michelle, Hinrichs, Angie, Wang, Jade, Amin, Helly, Bell, John, Wadford, Debra, O'Toole, Áine, Scher, Emily, Perry, Marc, De Maio, Nicola, Hughes, Scott, Corbett-Detig, Russ, and Turakhia, Yatish
- Subjects
Bioinformatics, COVID-19, Phylogenetics, variants
- Abstract
With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.
- Published
- 2024
11. Torch-eCpG: a fast and scalable eQTM mapper for thousands of molecular phenotypes with graphical processing units
- Author
-
Kober, Kord M, Berger, Liam, Roy, Ritu, and Olshen, Adam
- Subjects
Biological Sciences, Genetics, Human Genome, DNA Methylation, Phenotype, Quantitative Trait Loci, Regulatory Sequences, Nucleic Acid, Software, DNA methylation, Gene expression, Transcriptional regulation, Expression quantitative trait methylation, eQTM, eCpG, GPU, Tensor, Mathematical Sciences, Information and Computing Sciences, Bioinformatics, Biological sciences, Information and computing sciences, Mathematical sciences
- Abstract
Background: Gene expression may be regulated by the DNA methylation of regulatory elements in cis, distal, and trans regions. One method to evaluate the relationship between DNA methylation and gene expression is the mapping of expression quantitative trait methylation (eQTM) loci (also called expression associated CpG loci, eCpG). However, no open-source tools are available to provide eQTM mapping. In addition, eQTM mapping can involve a large number of comparisons which may prevent the analyses due to limitations of computational resources. Here, we describe Torch-eCpG, an open-source tool to perform eQTM mapping that includes an optimized implementation that can use the graphical processing unit (GPU) to reduce runtime. Results: We demonstrate the analyses using the tool are reproducible, up to 18× faster using the GPU, and scale linearly with increasing methylation loci. Conclusions: Torch-eCpG is a fast, reliable, and scalable tool to perform eQTM mapping. Source code for Torch-eCpG is available at https://github.com/kordk/torch-ecpg.
- Published
- 2024
12. deMULTIplex2: robust sample demultiplexing for scRNA-seq
- Author
-
Zhu, Qin, Conrad, Daniel N, and Gartner, Zev J
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Genetics, Human Genome, Biotechnology, Single-Cell Gene Expression Analysis, Single-Cell Analysis, Algorithms, Sequence Analysis, RNA, High-Throughput Nucleotide Sequencing, scRNA-seq, Sample multiplexing, Demultiplex, Generalized linear models, Expectation-maximization, Expectation–maximization, Environmental Sciences, Information and Computing Sciences, Bioinformatics
- Abstract
Sample multiplexing enables pooled analysis during single-cell RNA sequencing workflows, thereby increasing throughput and reducing batch effects. A challenge for all multiplexing techniques is to link sample-specific barcodes with cell-specific barcodes, then demultiplex sample identity post-sequencing. However, existing demultiplexing tools fail under many real-world conditions where barcode cross-contamination is an issue. We therefore developed deMULTIplex2, an algorithm inspired by a mechanistic model of barcode cross-contamination. deMULTIplex2 employs generalized linear models and expectation-maximization to probabilistically determine the sample identity of each cell. Benchmarking reveals superior performance across various experimental conditions, particularly on large or noisy datasets with unbalanced sample compositions.
- Published
- 2024
13. Genomic profiles of four novel cyanobacteria MAGs from Lake Vanda, Antarctica: insights into photosynthesis, cold tolerance, and the circadian clock
- Author
-
Lumian, Jessica, Grettenberger, Christen, Jungblut, Anne D, Mackey, Tyler J, Hawes, Ian, Alatorre-Acevedo, Eduardo, and Sumner, Dawn Y
- Subjects
Microbiology, Biological Sciences, Genetics, Sleep Research, polar cyanobacteria, cryosphere, bioinformatics, genomics, photosynthesis, circadian clock, Environmental Science and Management, Soil Sciences, Medical microbiology
- Abstract
Cyanobacteria in polar environments face environmental challenges, including cold temperatures and extreme light seasonality with small diurnal variation, which has implications for polar circadian clocks. However, polar cyanobacteria remain underrepresented in available genomic data, and there are limited opportunities to study their genetic adaptations to these challenges. This paper presents four new Antarctic cyanobacteria metagenome-assembled genomes (MAGs) from microbial mats in Lake Vanda in the McMurdo Dry Valleys in Antarctica. The four MAGs were classified as Leptolyngbya sp. BulkMat.35, Pseudanabaenaceae cyanobacterium MP8IB2.15, Microcoleus sp. MP8IB2.171, and Leptolyngbyaceae cyanobacterium MP9P1.79. The MAGs contain 2.76 to 6.07 Mbp, and bin completion ranges from 74.2% to 92.57%. Furthermore, the four cyanobacteria MAGs have average nucleotide identities (ANIs) under 90% with each other and under 77% with six existing polar cyanobacteria MAGs and genomes. This suggests that they are novel cyanobacteria and demonstrates that polar cyanobacteria genomes are underrepresented in reference databases and that there is a continued need for genome sequencing of polar cyanobacteria. Analyses of the four novel and six existing polar cyanobacteria MAGs and genomes demonstrate that they have genes coding for various cold tolerance mechanisms and most standard circadian rhythm genes, and that the Leptolyngbya sp. BulkMat.35 and Leptolyngbyaceae cyanobacterium MP9P1.79 MAGs contain kaiB3, a divergent homolog of kaiB.
- Published
- 2024
14. Evaluating E. coli genome‐scale metabolic model accuracy with high‐throughput mutant fitness data
- Author
-
Bernstein, David B, Akkas, Batu, Price, Morgan N, and Arkin, Adam P
- Subjects
Biological Sciences, Industrial Biotechnology, Biotechnology, Genetics, Escherichia coli, Genome, Chromosome Mapping, Carbon, Models, Biological, flux balance analysis, genome-scale metabolic model, RB-TnSeq, Biochemistry and Cell Biology, Other Biological Sciences, Bioinformatics, Biochemistry and cell biology
- Abstract
The Escherichia coli genome-scale metabolic model (GEM) is an exemplar systems biology model for the simulation of cellular metabolism. Experimental validation of model predictions is essential to pinpoint uncertainty and ensure continued development of accurate models. Here, we quantified the accuracy of four subsequent E. coli GEMs using published mutant fitness data across thousands of genes and 25 different carbon sources. This evaluation demonstrated the utility of the area under a precision-recall curve relative to alternative accuracy metrics. An analysis of errors in the latest (iML1515) model identified several vitamins/cofactors that are likely available to mutants despite being absent from the experimental growth medium and highlighted isoenzyme gene-protein-reaction mapping as a key source of inaccurate predictions. A machine learning approach further identified metabolic fluxes through hydrogen ion exchange and specific central metabolism branch points as important determinants of model accuracy. This work outlines improved practices for the assessment of GEM accuracy with high-throughput mutant fitness data and highlights promising areas for future model refinement in E. coli and beyond.
- Published
- 2023
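The abstract above evaluates model accuracy with the area under a precision-recall curve. One common way to estimate that area from ranked predictions is average precision, sketched below. This is an illustrative stand-in, not the authors' evaluation code; the function name `average_precision`, the scores, and the labels are hypothetical, and tied scores are simply kept in input order.

```python
def average_precision(scores, labels):
    """Average precision: mean of the precision values at each true positive.

    scores: predicted confidence per item (higher means more likely positive).
    labels: truthy for positives, falsy for negatives.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Rank items from most to least confident (stable for tied scores).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(1 for _, label in ranked if label)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision among the top `rank` predictions.
            precision_sum += hits / rank
    return precision_sum / total_positives


if __name__ == "__main__":
    # Positives ranked 1st and 3rd: (1/1 + 2/3) / 2
    print(round(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]), 3))  # 0.833
```

Unlike overall accuracy, this metric depends on how well the positives are ranked, so it tends to remain informative when one class is much rarer than the other.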
15. CASP15 cryo‐EM protein and RNA targets: Refinement and analysis using experimental maps
- Author
-
Mulvaney, Thomas, Kretsch, Rachael C, Elliott, Luc, Beton, Joseph G, Kryshtafovych, Andriy, Rigden, Daniel J, Das, Rhiju, and Topf, Maya
- Subjects
Biochemistry and Cell Biology, Biological Sciences, Bioengineering, Cryoelectron Microscopy, Models, Molecular, Proteins, Protein Conformation, 3D structure prediction, AlphaFold, CASP, CASP15, cryoEM, protein structure, refinement, RNA, RNA structure, Mathematical Sciences, Information and Computing Sciences, Bioinformatics, Biological sciences, Mathematical sciences
- Abstract
CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, experimental structures by their nature are only models themselves: their construction involves a certain degree of subjectivity in interpreting density maps and translating them to atomic coordinates. Here, we directly utilized density maps to evaluate the predictions by employing a method for ranking the quality of protein chain predictions based on their fit into the experimental density. The fit-based ranking was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy, and occasionally even better than the reference structure in some regions of the model. Local assessment of predicted side chains in a 1.52 Å resolution map showed that side chains are sometimes poorly positioned. Additionally, the top 118 predictions associated with 9 protein target reference structures were selected for automated refinement, in addition to the top 40 predictions for 11 RNA targets. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure. This refinement was successful despite large conformational changes often being required, showing that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryo-EM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors, and together with the lack of consensus amongst models in these regions suggests that modeling, in combination with model-fit to the density, holds the potential for identifying more flexible regions within the structure.
- Published
- 2023
16. Critical assessment of methods of protein structure prediction (CASP)—Round XV
- Author
-
Kryshtafovych, Andriy, Schwede, Torsten, Topf, Maya, Fidelis, Krzysztof, and Moult, John
- Subjects
Biochemistry and Cell Biology, Bioinformatics and Computational Biology, Biological Sciences, Generic health relevance, Protein Conformation, Models, Molecular, Proteins, Amino Acid Sequence, Computational Biology, CASP, community wide experiment, protein structure prediction, Mathematical Sciences, Information and Computing Sciences, Bioinformatics, Biological sciences, Mathematical sciences
- Abstract
Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures.
Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.
- Published
- 2023
17. Breaking the conformational ensemble barrier: Ensemble structure modeling challenges in CASP15
- Author
-
Kryshtafovych, Andriy, Montelione, Gaetano T, Rigden, Daniel J, Mesdaghi, Shahram, Karaca, Ezgi, and Moult, John
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Bioengineering, Generic health relevance, Protein Conformation, Proteins, Mutation, RNA, AlphaFold, CASP, conformational ensemble, protein structure, RNA structure, Mathematical Sciences, Information and Computing Sciences, Bioinformatics, Biological sciences, Mathematical sciences
- Abstract
For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.
- Published
- 2023
18. The impact of AI‐based modeling on the accuracy of protein assembly prediction: Insights from CASP15
- Author
-
Ozden, Burcu, Kryshtafovych, Andriy, and Karaca, Ezgi
- Subjects
Biological Sciences, Bioinformatics and Computational Biology, Vaccine Related, Emerging Infectious Diseases, Furylfuramide, Algorithms, Models, Molecular, Proteins, Artificial Intelligence, Protein Conformation, Computational Biology, AF2-Multimer, CASP, deep-learning based modeling, domain-domain interactions, protein assembly, protein-protein interaction, quaternary structure prediction, Mathematical Sciences, Information and Computing Sciences, Bioinformatics, Biological sciences, Mathematical sciences
- Abstract
In CASP15, 87 predictors submitted around 11 000 models on 41 assembly targets. The community demonstrated exceptional performance in overall fold and interface contact predictions, achieving an impressive success rate of 90% (compared to 31% in CASP14). This remarkable accomplishment is largely due to the incorporation of DeepMind's AF2-Multimer approach into custom-built prediction pipelines. To evaluate the added value of participating methods, we compared the community models to the baseline AF2-Multimer predictor. In over 1/3 of cases, the community models were superior to the baseline predictor. The main reasons for this improved performance were the use of custom-built multiple sequence alignments, optimized AF2-Multimer sampling, and the manual assembly of AF2-Multimer-built subcomplexes. The best three groups, in order, are Zheng, Venclovas, and Wallner. Zheng and Venclovas reached a 73.2% success rate over all (41) cases, while Wallner attained 69.4% success rate over 36 cases. Nonetheless, challenges remain in predicting structures with weak evolutionary signals, such as nanobody-antigen, antibody-antigen, and viral complexes. Expectedly, modeling large complexes also remains challenging due to their high memory compute demands. In addition to the assembly category, we assessed the accuracy of modeling interdomain interfaces in the tertiary structure prediction targets. Models on seven targets featuring 17 unique interfaces were analyzed. Best predictors achieved a 76.5% success rate, with the UM-TBM group being the leader. In the interdomain category, we observed that the predictors faced challenges, as in the case of the assembly category, when the evolutionary signal for a given domain pair was weak or the structure was large. Overall, CASP15 witnessed unprecedented improvement in interface modeling, reflecting the AI revolution seen in CASP14.
- Published
- 2023
19. Protein target highlights in CASP15: Analysis of models by structure providers
- Author
-
Alexander, Leila T, Durairaj, Janani, Kryshtafovych, Andriy, Abriata, Luciano A, Bayo, Yusupha, Bhabha, Gira, Breyton, Cécile, Caulton, Simon G, Chen, James, Degroux, Séraphine, Ekiert, Damian C, Erlandsen, Benedikte S, Freddolino, Peter L, Gilzer, Dominic, Greening, Chris, Grimes, Jonathan M, Grinter, Rhys, Gurusaran, Manickam, Hartmann, Marcus D, Hitchman, Charlie J, Keown, Jeremy R, Kropp, Ashleigh, Kursula, Petri, Lovering, Andrew L, Lemaitre, Bruno, Lia, Andrea, Liu, Shiheng, Logotheti, Maria, Lu, Shuze, Markússon, Sigurbjörn, Miller, Mitchell D, Minasov, George, Niemann, Hartmut H, Opazo, Felipe, Phillips, George N, Davies, Owen R, Rommelaere, Samuel, Rosas‐Lemus, Monica, Roversi, Pietro, Satchell, Karla, Smith, Nathan, Wilson, Mark A, Wu, Kuan‐Lin, Xia, Xian, Xiao, Han, Zhang, Wenhua, Zhou, Z Hong, Fidelis, Krzysztof, Topf, Maya, Moult, John, and Schwede, Torsten
- Subjects
Biochemistry and Cell Biology ,Bioinformatics and Computational Biology ,Biological Sciences ,Protein Conformation ,Models ,Molecular ,Computational Biology ,Proteins ,CASP ,cryo-EM ,protein structure prediction ,X-ray crystallography ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
We present an in-depth analysis of selected CASP15 targets, focusing on their biological and functional significance. The authors of the structures identify and discuss key protein features and evaluate how effectively these aspects were captured in the submitted predictions. While the overall ability to predict three-dimensional protein structures continues to impress, reproducing uncommon features not previously observed in experimental structures is still a challenge. Furthermore, instances with conformational flexibility and large multimeric complexes highlight the need for novel scoring strategies to better emphasize biologically relevant structural regions. Looking ahead, closer integration of computational and experimental techniques will play a key role in determining the next challenges to be unraveled in the field of structural molecular biology.
- Published
- 2023
20. Tertiary structure assessment at CASP15
- Author
-
Simpkin, Adam J, Mesdaghi, Shahram, Rodríguez, Filomeno Sánchez, Elliott, Luc, Murphy, David L, Kryshtafovych, Andriy, Keegan, Ronan M, and Rigden, Daniel J
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,Genetics ,Generic health relevance ,Furylfuramide ,Computational Biology ,Models ,Molecular ,Proteins ,Sequence Alignment ,CASP15 ,machine learning ,molecular replacement ,protein modelling ,protein structure prediction ,structural bioinformatics ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic atlas: the confidence estimates of the former were also notably accurate.
- Published
- 2023
21. RNA target highlights in CASP15: Evaluation of predicted models by structure providers
- Author
-
Kretsch, Rachael C, Andersen, Ebbe S, Bujnicki, Janusz M, Chiu, Wah, Das, Rhiju, Luo, Bingnan, Masquida, Benoît, McRae, Ewan KS, Schroeder, Griffin M, Su, Zhaoming, Wedekind, Joseph E, Xu, Lily, Zhang, Kaiming, Zheludev, Ivan N, Moult, John, and Kryshtafovych, Andriy
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Generic health relevance ,Protein Conformation ,Proteins ,Models ,Molecular ,Computational Biology ,X-Ray Diffraction ,CASP ,community-wide experiment ,cryo-EM ,RNA folding ,RNA structure prediction ,x-ray crystallography ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include: enhancing the accuracy in local interaction predictions and increased consideration of the experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.
- Published
- 2023
22. New prediction categories in CASP15
- Author
-
Kryshtafovych, Andriy, Antczak, Maciej, Szachniuk, Marta, Zok, Tomasz, Kretsch, Rachael C, Rangan, Ramya, Pham, Phillip, Das, Rhiju, Robin, Xavier, Studer, Gabriel, Durairaj, Janani, Eberhardt, Jerome, Sweeney, Aaron, Topf, Maya, Schwede, Torsten, Fidelis, Krzysztof, and Moult, John
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Protein Conformation ,Proteins ,Models ,Molecular ,Computational Biology ,Ligands ,3D structure prediction ,CASP15 ,protein structure ,protein-ligand complexes ,RNA structure ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.
- Published
- 2023
23. To split or not to split: CASP15 targets and their processing into tertiary structure evaluation units
- Author
-
Kryshtafovych, Andriy and Rigden, Daniel J
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,Protein Folding ,Models ,Molecular ,Computational Biology ,Databases ,Protein ,Proteins ,CASP15 ,evaluation units ,protein domains ,protein structure ,protein structure prediction ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Mathematical sciences - Abstract
Processing of CASP15 targets into evaluation units (EUs) and assigning them to evolutionary-based prediction classes is presented in this study. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. The domains were joined into larger EUs if predictors' performance on the combined units was similar to that on individual domains. Alternatively, if most predictors performed better on the individual domains, then they were retained as EUs. As a result, 112 evaluation units were created from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes roughly corresponding to target difficulty categories in previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs were attributed to the historically most challenging FM class, where homology or structural analogy to proteins of known fold cannot be detected.
- Published
- 2023
24. A Narrative Review of Personalized Musculoskeletal Modeling Using the Physiome and Musculoskeletal Atlas Projects.
- Author
-
Fernandez, Justin, Shim, Vickie, Schneider, Marco, Choisne, Julie, Handsfield, Geoff, Yeung, Ted, Zhang, Ju, Hunter, Peter, and Besier, Thor
- Subjects
BIOLOGICAL models ,COMPUTER simulation ,REFERENCE books ,KNEE joint ,ACHILLES tendon ,SKELETAL muscle ,COMPUTER-aided design ,BIOINFORMATICS ,CALF muscles ,STATISTICAL models ,CARPOMETACARPAL joints - Abstract
In this narrative review, we explore developments in the field of computational musculoskeletal model personalization using the Physiome and Musculoskeletal Atlas Projects. Model geometry personalization; statistical shape modeling; and its impact on segmentation, classification, and model creation are explored. Examples include the trapeziometacarpal and tibiofemoral joints, Achilles tendon, gastrocnemius muscle, and pediatric lower limb bones. Finally, a more general approach to model personalization is discussed based on the idea of multiscale personalization called scaffolds. [ABSTRACT FROM AUTHOR]
- Published
- 2023
25. Host translation machinery is not a barrier to phages that interact with both CPR and non-CPR bacteria.
- Author
-
Liu, Jett, Jaffe, Alexander, Chen, LinXing, Bor, Batbileg, and Banfield, Jill
- Subjects
CPR bacteria ,CRISPR-Cas systems ,bacteriophage evolution ,bacteriophage genetics ,bioinformatics - Abstract
Here, we profiled putative phages of Saccharibacteria, which are of particular importance as Saccharibacteria influence some human oral diseases. We additionally profiled putative phages of Gracilibacteria and Absconditabacteria, two Candidate Phyla Radiation (CPR) lineages of interest given their use of an alternative genetic code. Among the phages identified in this study, some are targeted by spacers from both CPR and non-CPR bacteria and others by both bacteria that use the standard genetic code as well as bacteria that use an alternative genetic code. These findings represent new insights into possible phage replication strategies and have relevance for phage therapies that seek to manipulate microbiomes containing CPR bacteria.
- Published
- 2023
26. Protocol for the prediction, interpretation, and mutation evaluation of post-translational modification using MIND-S.
- Author
-
Yan, Yu, Wang, Dean, Xin, Ruiqi, Soriano, Raine, Ng, Dominic, Wang, Wei, and Ping, Peipei
- Subjects
Bioinformatics ,Computer Sciences ,Protein Biochemistry ,Sequence Analysis - Abstract
Post-translational modifications (PTMs) serve as key regulatory mechanisms in various cellular processes; altered PTMs can potentially lead to human diseases. We present a protocol for using MIND-S (multi-label interpretable deep-learning approach for PTM prediction-structure version) to study PTMs. This protocol consists of a step-by-step guide and includes three key applications of MIND-S: PTM predictions based on protein sequences, identification of important amino acids, and elucidation of the altered PTM landscape resulting from molecular mutations. For complete details on the use and execution of this protocol, please refer to Yan et al. (2023).
- Published
- 2023
27. Amnion responses to intrauterine inflammation and effects of inhibition of TNF signaling in preterm Rhesus macaque.
- Author
-
Presicce, Pietro, Cappelletti, Monica, Morselli, Marco, Ma, Feiyang, Senthamaraikannan, Paranthaman, Protti, Giulia, Nadel, Brian, Aryan, Laila, Eghbali, Mansoureh, Salwinski, Lukasz, Pithia, Neema, De Franco, Emily, Pellegrini, Matteo, Jobe, Alan, Chougnet, Claire, Miller, Lisa, and Kallapur, Suhas
- Subjects
Bioinformatics ,Immunology ,Omics - Abstract
Intrauterine infection/inflammation (IUI) is a frequent complication of pregnancy leading to preterm labor and fetal inflammation. How inflammation is modulated at the maternal-fetal interface is unresolved. We compared transcriptomics of amnion (a fetal tissue in contact with amniotic fluid) in a preterm Rhesus macaque model of IUI induced by lipopolysaccharide with human cohorts of chorioamnionitis. Bulk RNA sequencing (RNA-seq) amnion transcriptomic profiles were remarkably similar in both Rhesus and human subjects and revealed that induction of key labor-mediating genes such as IL1 and IL6 was dependent on nuclear factor κB (NF-κB) signaling and reversed by the anti-tumor necrosis factor (TNF) antibody Adalimumab. Inhibition of collagen biosynthesis by IUI was partially restored by Adalimumab. Interestingly, single-cell transcriptomics, flow cytometry, and immunohistology demonstrated that a subset of amnion mesenchymal cells (AMCs) increase CD14 and other myeloid cell markers during IUI both in the human and Rhesus macaque. Our data suggest that CD14+ AMCs represent activated AMCs at the maternal-fetal interface.
- Published
- 2023
28. Establishment of a consensus protocol to explore the brain pathobiome in patients with mild cognitive impairment and Alzheimer's disease: Research outline and call for collaboration.
- Author
-
Lathe, Richard, Schultek, Nikki, Balin, Brian, Ehrlich, Garth, Auber, Lavinia, Perry, George, Breitschwerdt, Edward, Corry, David, Doty, Richard, Nara, Peter, Itzhaki, Ruth, Eimer, William, Tanzi, Rudolph, and Rissman, Robert
- Subjects
Alzheimer's disease ,antimicrobial ,antiviral ,bioinformatics ,blood ,cerebrospinal fluid ,collaboration ,dementia ,diagnosis ,methodology ,microbiome ,mild cognitive impairment ,olfactory neuroepithelium ,pathobiome ,polymerase chain reaction ,protocol ,sequencing ,Humans ,Alzheimer Disease ,Consensus ,Cognitive Dysfunction ,Brain - Abstract
Microbial infections of the brain can lead to dementia, and for many decades microbial infections have been implicated in Alzheimer's disease (AD) pathology. However, a causal role for infection in AD remains contentious, and the lack of standardized detection methodologies has led to inconsistent detection/identification of microbes in AD brains. There is a need for a consensus methodology; the Alzheimer's Pathobiome Initiative aims to perform comparative molecular analyses of microbes in post mortem brains versus cerebrospinal fluid, blood, olfactory neuroepithelium, oral/nasopharyngeal tissue, bronchoalveolar, urinary, and gut/stool samples. Diverse extraction methodologies, polymerase chain reaction and sequencing techniques, and bioinformatic tools will be evaluated, in addition to direct microbial culture and metabolomic techniques. The goal is to provide a roadmap for detecting infectious agents in patients with mild cognitive impairment or AD. Positive findings would then prompt tailoring of antimicrobial treatments that might attenuate or remit mounting clinical deficits in a subset of patients.
- Published
- 2023
29. HIV-1 subtypes maintain distinctive physicochemical signatures in Nef domains associated with immunoregulation.
- Author
-
Lamers, Susanna, Fogel, Gary, Liu, Enoch, Nolan, David, Rose, Rebecca, and Mcgrath, Michael
- Subjects
Bioinformatics ,HIV ,HIV subtypes ,Machine-learning ,Nef protein ,Sequence analysis ,Humans ,HIV-1 ,Amino Acid Sequence ,nef Gene Products ,Human Immunodeficiency Virus ,HIV Infections ,Amino Acids ,Disease Progression - Abstract
BACKGROUND: HIV subtype is associated with varied rates of disease progression. The HIV accessory protein, Nef, continues to be present during antiretroviral therapy (ART) where it has numerous immunoregulatory effects. In this study, we analyzed Nef sequences from HIV subtypes A1, B, C, and D using a machine learning approach that integrates functional amino acid information to identify if unique physicochemical features are associated with Nef functional/structural domains in a subtype-specific manner. METHODS: 2253 sequences representing subtypes A1, B, C, and D were aligned and domains with known functional properties were scored based on amino acid physicochemical properties. Following feature generation, we used statistical pruning and evolved neural networks (ENNs) to determine if we could successfully classify subtypes. Next, we used ENNs to identify the top five key Nef physicochemical features applied to specific immunoregulatory domains that differentiated subtypes. A signature pattern analysis was performed to assess the amino acid diversity in sub-domains that differentiated each subtype. RESULTS: In validation studies, ENNs successfully differentiated subtype A1 (87.2%), subtype B (89.5%), subtype C (91.7%), and subtype D (85.1%). Our feature-based domain scoring, followed by t-tests, and a similar ENN identified subtype-specific domain-associated features. Subtype A1 was associated with alterations in the Nef CD4 binding domain; subtype B was associated with alterations in the AP-2 Binding domain; subtype C was associated with alterations in a structural Alpha Helix domain; and subtype D was associated with alterations in a Beta-Sheet domain. CONCLUSIONS: Recent studies have focused on HIV Nef as a driver of immunoregulatory disease in those HIV infected and on ART. Nef acts through a complex mixture of interactions that are directly linked to the key features of the subtype-specific domains we identified with the ENN. The study supports the hypothesis that varied Nef subtypes contribute to subtype-specific disease progression.
- Published
- 2023
30. Precise surface functionalization of PLGA particles for human T cell modulation
- Author
-
Hadley, Pierce, Chen, Yuanzhou, Cline, Lariana, Han, Zhiyuan, Tang, Qizhi, Huang, Xiao, and Desai, Tejal
- Subjects
Engineering ,Biomedical Engineering ,Biotechnology ,1.3 Chemical and physical sciences ,Underpinning research ,Generic health relevance ,Humans ,T-Lymphocytes ,Biocompatible Materials ,DNA ,Polymers ,Chemical Sciences ,Biological Sciences ,Medical and Health Sciences ,Bioinformatics - Abstract
The biofunctionalization of synthetic materials has extensive utility for biomedical applications, but approaches to bioconjugation typically show insufficient efficiency and controllability. We recently developed an approach by building synthetic DNA scaffolds on biomaterial surfaces that enables the precise control of cargo density and ratio, thus improving the assembly and organization of functional cargos. We used this approach to show that the modulation and phenotypic adaptation of immune cells can be regulated using our precisely functionalized biomaterials. Here, we describe the three key procedures, including the fabrication of polymeric particles engrafted with short DNA scaffolds, the attachment of functional cargos with complementary DNA strands, and the surface assembly control and quantification. We also explain the critical checkpoints needed to ensure the overall quality and expected characteristics of the biological product. We provide additional experimental design considerations for modifying the approach by varying the material composition, size or cargo types. As an example, we cover the use of the protocol for human primary T cell activation and for the identification of parameters that affect ex vivo T cell manufacturing. The protocol requires users with diverse expertise ranging from synthetic materials to bioconjugation chemistry to immunology. The fabrication procedures and validation assays to design high-fidelity DNA-scaffolded biomaterials typically require 8 d.
- Published
- 2023
31. dsRID: in silico identification of dsRNA regions using long-read RNA-seq data
- Author
-
Yamamoto, Ryo, Liu, Zhiheng, Choudhury, Mudra, and Xiao, Xinshu
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Alzheimer's Disease including Alzheimer's Disease Related Dementias (AD/ADRD) ,Human Genome ,Alzheimer's Disease ,Aging ,Brain Disorders ,Dementia ,Neurosciences ,Neurodegenerative ,Acquired Cognitive Impairment ,1.1 Normal biological development and functioning ,Underpinning research ,Humans ,RNA ,Double-Stranded ,RNA-Seq ,Sequence Analysis ,RNA ,Base Sequence ,Genome ,Software ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Information and computing sciences ,Mathematical sciences - Abstract
Motivation: Double-stranded RNAs (dsRNAs) are potent triggers of innate immune responses upon recognition by cytosolic dsRNA sensor proteins. Identification of endogenous dsRNAs helps to better understand the dsRNAome and its relevance to innate immunity related to human diseases. Results: Here, we report dsRID (double-stranded RNA identifier), a machine-learning-based method to predict dsRNA regions in silico, leveraging the power of long-read RNA-sequencing (RNA-seq) and molecular traits of dsRNAs. Using models trained with PacBio long-read RNA-seq data derived from Alzheimer's disease (AD) brain, we show that our approach is highly accurate in predicting dsRNA regions in multiple datasets. Applied to an AD cohort sequenced by the ENCODE consortium, we characterize the global dsRNA profile with potentially distinct expression patterns between AD and controls. Together, we show that dsRID provides an effective approach to capture global dsRNA profiles using long-read RNA-seq data. Availability and implementation: Software implementation of dsRID, and genomic coordinates of regions predicted by dsRID in all samples are available at the GitHub repository: https://github.com/gxiaolab/dsRID.
- Published
- 2023
32. Re-Evaluating Human Cytomegalovirus Vaccine Design: Prediction of T Cell Epitopes.
- Author
-
Barry, Peter, Iyer, Smita, and Gibson, Laura
- Subjects
bioinformatics ,cytomegalovirus ,unconventional T cell antigen candidates ,vaccine - Abstract
HCMV vaccine development has traditionally focused on viral antigens identified as key targets of neutralizing antibody (NAb) and/or T cell responses in healthy adults with chronic HCMV infection, such as glycoprotein B (gB), the glycoprotein H-anchored pentamer complex (PC), and the unique long 83 (UL83)-encoded phosphoprotein 65 (pp65). However, the protracted absence of a licensed HCMV vaccine that reduces the risk of infection in pregnancy regardless of serostatus warrants a systematic reassessment of assumptions informing vaccine design. To illustrate this imperative, we considered the hypothesis that HCMV proteins infrequently detected as targets of T cell responses may contain important vaccine antigens. Using an extant dataset from a T cell profiling study, we tested whether HCMV proteins recognized by only a small minority of participants encompass any T cell epitopes. Our analyses demonstrate a prominent skewing of T cell responses away from most viral proteins (although they contain robust predicted CD8 T cell epitopes) in favor of a more restricted set of proteins. Our findings raise the possibility that HCMV may benefit from evading the T cell recognition of certain key proteins and that, contrary to current vaccine design approaches, including them as vaccine antigens could effectively take advantage of this vulnerability.
- Published
- 2023
33. Integrated workflow for discovery of microprotein-coding small open reading frames.
- Author
-
Cao, Kevin, Hajy Heydary, Yasamin, Tong, Gregory, and Martinez, Thomas
- Subjects
Bioinformatics ,Cell Biology ,Cell Culture ,Gene Expression ,Genomics ,Molecular Biology ,RNAseq ,Sequence Analysis ,Sequencing - Abstract
Small open reading frame (smORF)-encoded microproteins, proteins containing fewer than 100-150 amino acids, are an emerging class of functional biomolecules. Here, we present a protocol for identifying translated smORFs in mammalian systems genome-wide. We describe steps for generation of ribosome profiling (Ribo-seq) data, in silico translation of a transcriptome assembly to create an ORF database, and computational analysis of Ribo-seq to score individual smORFs for translation. Identification of translated smORFs is the first step to studying the functions of microproteins. For complete details on the use and execution of this protocol, please refer to Martinez et al.
- Published
- 2023
34. Decoding the transcriptomic expression and genomic methylation patterns in the tendon proper and its peritenon region in the aging horse.
- Author
-
Pechanec, Monica and Mienaltowski, Michael
- Subjects
Bioinformatics ,Equine ,Peritenon ,RNASeq ,RRBS ,Tendon - Abstract
OBJECTIVES: Equine tendinopathies are challenging because of the poor healing capacity of tendons commonly resulting in high re-injury rates. Within the tendon, different regions - tendon proper (TP) and peritenon (PERI) - contribute to the tendon matrix in differing capacities during injury and aging. Aged tendons have decreased repair potential; the underlying transcriptional and epigenetic changes that occur in the TP and PERI regions are not well understood. The objective of this study was to assess TP and PERI regional differences in adolescent, midlife, and geriatric horses using RNA sequencing and DNA methylation techniques. RESULTS: Differences existed between TP and PERI regions of equine superficial digital flexor tendons by age as evidenced by RNASeq and DNA methylation. Cluster analysis indicated that regional distinctions existed regardless of age. Genes such as DCN, COMP, FN1, and LOX maintained elevated TP expression while genes such as GSN and AHNAK were abundant in PERI. Increased gene activity was present in adolescent and geriatric populations but decreased during midlife. Regional differences in DNA methylation were also noted. Notably, when evaluating all ages of TP against PERI, five genes (HAND2, CHD9, RASL11B, ADGRD1, and COL14A1) had regions of differential methylation as well as differential gene expression.
- Published
- 2023
35. Reanalysis of primate brain circadian transcriptomics reveals connectivity-related oscillations
- Author
-
Lee, Justine, Chen, Siwei, Monfared, Roudabeh Vakil, Derdeyn, Pieter, Leong, Kenneth, Chang, Tiffany, Beier, Kevin, Baldi, Pierre, and Alachkar, Amal
- Subjects
Biological Psychology ,Psychology ,Brain Disorders ,Neurosciences ,Genetics ,Sleep Research ,Underpinning research ,1.1 Normal biological development and functioning ,Bioinformatics ,Expression study ,Neuroscience - Abstract
Research shows that brain circuits controlling vital physiological processes are closely linked with endogenous time-keeping systems. In this study, we aimed to examine oscillatory gene expression patterns of well-characterized neuronal circuits by reanalyzing publicly available transcriptomic data from a spatiotemporal gene expression atlas of a non-human primate. Unexpectedly, brain structures known for regulating circadian processes (e.g., hypothalamic nuclei) did not exhibit robust cycling expression. In contrast, basal ganglia nuclei, not typically associated with circadian physiology, displayed the most dynamic cycling behavior of its genes marked by sharp temporally defined expression peaks. Intriguingly, the mammillary bodies, considered hypothalamic nuclei, exhibited gene expression patterns resembling the basal ganglia, prompting reevaluation of their classification. Our results emphasize the potential for high throughput circadian gene expression analysis to deepen our understanding of the functional synchronization across brain structures that influence physiological processes and resulting complex behaviors.
- Published
- 2023
36. cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory
- Author
-
Johnson, Brian, Shuai, Yubo, Schweinsberg, Jason, and Curtius, Kit
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Cancer ,Good Health and Well Being ,Humans ,Software ,Neoplasms ,Sequence Analysis ,DNA ,Phylogeny ,Clone Cells ,Mutation ,Clonal Evolution ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Information and computing sciences ,Mathematical sciences - Abstract
Motivation: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease. There is an unmet need for an accurate, fast, and easy-to-use method to quantify clone growth dynamics from these datasets. Results: We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated our analytical methods for estimating the net growth rate of clones, eliminating the need for complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications to improve mechanistic understanding and prognostic ability. Compared to clones with a single or unknown driver mutation, clones with multiple drivers have significantly increased growth rates (median 0.94 versus 0.25 per year; P = 1.6×10⁻⁶). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 versus 26.4 months; P = 0.0026). Availability and implementation: We developed a publicly available R package, cloneRate, to implement our methods (Package website: https://bdj34.github.io/cloneRate/). Source code: https://github.com/bdj34/cloneRate/.
- Published
- 2023
37. Extensive introgression among strongylocentrotid sea urchins revealed by phylogenomics.
- Author
-
Glasenapp, Matthew and Pogson, Grant
- Subjects
bioinformatics ,echinoderms ,gamete recognition proteins ,hybridization ,molecular evolution ,phyloinformatics - Abstract
Gametic isolation is thought to play an important role in the evolution of reproductive isolation in broadcast-spawning marine invertebrates. However, it is unclear whether gametic isolation commonly evolves early in the speciation process or only accumulates after other reproductive barriers are already in place. It is also unknown whether gametic isolation is an effective barrier to introgression following speciation. Here, we used whole-genome sequencing data and multiple complementary phylogenomic approaches to test whether the well-documented gametic incompatibilities among the strongylocentrotid sea urchins have limited introgression. We quantified phylogenetic discordance, inferred reticulate phylogenetic networks, and applied the Δ statistic using gene tree topologies reconstructed from multiple sequence alignments of protein-coding single-copy orthologs. In addition, we conducted ABBA-BABA tests on genome-wide single nucleotide variants and reconstructed a phylogeny of mitochondrial genomes. Our results revealed strong mito-nuclear discordance and considerable nonrandom gene tree discordance that cannot be explained by incomplete lineage sorting alone. Eight of the nine species examined demonstrated a history of introgression with at least one other species or ancestral lineage, indicating that introgression was common during the diversification of the strongylocentrotid urchins. There was strong support for introgression between four extant species pairs (Strongylocentrotus pallidus ⇔ S. droebachiensis, S. intermedius ⇔ S. pallidus, S. purpuratus ⇔ S. fragilis, and Mesocentrotus franciscanus ⇔ Pseudocentrotus depressus) and additional evidence for introgression on internal branches of the phylogeny. Our results suggest that the existing gametic incompatibilities among the strongylocentrotid urchin species have not been a complete barrier to hybridization and introgression following speciation. Their continued divergence in the face of widespread introgression indicates that other reproductive isolating barriers likely exist and may have been more critical in establishing reproductive isolation early in speciation.
- Published
- 2023
38. PyWGCNA: a Python package for weighted gene co-expression network analysis
- Author
-
Rezaie, Narges, Reese, Farilie, and Mortazavi, Ali
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Genetics ,Neurosciences ,Human Genome ,Biotechnology ,Gene Expression Profiling ,RNA-Seq ,Gene Regulatory Networks ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Information and computing sciences ,Mathematical sciences - Abstract
Motivation: Weighted gene co-expression network analysis (WGCNA) is frequently used to identify modules of genes that are co-expressed across many RNA-seq samples. However, the current R implementation is slow, is not designed to compare modules between multiple WGCNA networks, and its results can be hard to interpret as well as to visualize. We introduce the PyWGCNA Python package, which is designed to identify co-expression modules from large RNA-seq datasets. PyWGCNA has a faster implementation than the R version of WGCNA and several additional downstream analysis modules for functional enrichment analysis using GO, KEGG, and REACTOME, inter-module analysis of protein-protein interactions, as well as comparison of multiple co-expression modules to each other and/or external lists of genes such as marker genes from single cell. Results: We apply PyWGCNA to two distinct datasets of brain bulk RNA-seq from MODEL-AD to identify modules associated with the genotypes. We compare the resulting modules to each other to find shared co-expression signatures in the form of modules with significant overlap across the datasets. Availability and implementation: The PyWGCNA library for Python 3 is available on PyPi at pypi.org/project/PyWGCNA and on GitHub at github.com/mortazavilab/PyWGCNA. The data underlying this article are available in GitHub at github.com/mortazavilab/PyWGCNA/tutorials/5xFAD_paper.
- Published
- 2023
39. KG-Hub—building and exchanging biological knowledge graphs
- Author
-
Caufield, J Harry, Putman, Tim, Schaper, Kevin, Unni, Deepak R, Hegde, Harshad, Callahan, Tiffany J, Cappelletti, Luca, Moxon, Sierra AT, Ravanmehr, Vida, Carbon, Seth, Chan, Lauren E, Cortes, Katherina, Shefchek, Kent A, Elsarboukh, Glass, Balhoff, Jim, Fontana, Tommaso, Matentzoglu, Nicolas, Bruskiewich, Richard M, Thessen, Anne E, Harris, Nomi L, Munoz-Torres, Monica C, Haendel, Melissa A, Robinson, Peter N, Joachimiak, Marcin P, Mungall, Christopher J, and Reese, Justin T
- Subjects
Biological Sciences ,Bioinformatics and Computational Biology ,Networking and Information Technology R&D (NITRD) ,Humans ,Pattern Recognition ,Automated ,COVID-19 ,Biological Ontologies ,Rare Diseases ,Machine Learning ,Mathematical Sciences ,Information and Computing Sciences ,Bioinformatics ,Biological sciences ,Information and computing sciences ,Mathematical sciences - Abstract
Motivation: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. Results: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. Availability and implementation: https://kghub.org.
- Published
- 2023
40. Accelerating open modification spectral library searching on tensor core in high-dimensional space
- Author
-
Kang, Jaeyoung, Xu, Weihong, Bittremieux, Wout, Moshiri, Niema, and Rosing, Tajana
- Subjects
Information and Computing Sciences ,Biotechnology ,Tandem Mass Spectrometry ,Databases ,Protein ,Software ,Peptides ,Search Engine ,Algorithms ,Peptide Library ,Mathematical Sciences ,Biological Sciences ,Bioinformatics ,Biological sciences ,Information and computing sciences ,Mathematical sciences - Abstract
Motivation: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. Results: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes two stages of existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA's tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. Availability and implementation: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.
- Published
- 2023
41. Purification and functional characterization of novel human skeletal stem cell lineages
- Author
-
Hoover, Malachia Y, Ambrosi, Thomas H, Steininger, Holly M, Koepke, Lauren S, Wang, Yuting, Zhao, Liming, Murphy, Matthew P, Alam, Alina A, Arouge, Elizabeth J, Butler, M Gohazrua K, Takematsu, Eri, Stavitsky, Suzan P, Hu, Serena, Sahoo, Debashis, Sinha, Rahul, Morri, Maurizio, Neff, Norma, Bishop, Julius, Gardner, Michael, Goodman, Stuart, Longaker, Michael, and Chan, Charles KF
- Subjects
Medical Biotechnology ,Biomedical and Clinical Sciences ,Engineering ,Biomedical Engineering ,Regenerative Medicine ,Stem Cell Research ,Stem Cell Research - Nonembryonic - Human ,Stem Cell Research - Nonembryonic - Non-Human ,Clinical Research ,1.1 Normal biological development and functioning ,Underpinning research ,Musculoskeletal ,Humans ,Mice ,Animals ,Cell Lineage ,Reproducibility of Results ,Mesenchymal Stem Cells ,Cell Differentiation ,Bone and Bones ,Bone Marrow Cells ,Cells ,Cultured ,Chemical Sciences ,Biological Sciences ,Medical and Health Sciences ,Bioinformatics - Abstract
Human skeletal stem cells (hSSCs) hold tremendous therapeutic potential for developing new clinical strategies to effectively combat congenital and age-related musculoskeletal disorders. Unfortunately, refined methodologies for the proper isolation of bona fide hSSCs and the development of functional assays that accurately recapitulate their physiology within the skeleton have been lacking. Bone marrow-derived mesenchymal stromal cells (BMSCs), commonly used to describe the source of precursors for osteoblasts, chondrocytes, adipocytes and stroma, have held great promise as the basis of various approaches for cell therapy. However, the reproducibility and clinical efficacy of these attempts have been obscured by the heterogeneous nature of BMSCs due to their isolation by plastic adherence techniques. To address these limitations, our group has refined the purity of individual progenitor populations that are encompassed by BMSCs by identifying defined populations of bona fide hSSCs and their downstream progenitors that strictly give rise to skeletally restricted cell lineages. Here, we describe an advanced flow cytometric approach that utilizes an extensive panel of eight cell surface markers to define hSSCs; bone, cartilage and stromal progenitors; and more differentiated unipotent subtypes, including an osteogenic subset and three chondroprogenitors. We provide detailed instructions for the FACS-based isolation of hSSCs from various tissue sources, in vitro and in vivo skeletogenic functional assays, human xenograft mouse models and single-cell RNA sequencing analysis. This application of hSSC isolation can be performed by any researcher with basic skills in biology and flow cytometry within 1-2 days. The downstream functional assays can be performed within a range of 1-2 months.
- Published
- 2023
42. Topology and adenocarcinoma cell localization dataset on the labyrinthin diapeutic biomarker.
- Author
-
Sharma, Ankit, Babich, Michael, Li, Tianhong, and Radosevich, James A
- Subjects
Humans ,Adenocarcinoma ,Lung Neoplasms ,Calcium-Binding Proteins ,Biomarkers ,ASPH ,Biomarker ,Labyrinthin ,Neo-antigen ,Pan-tumor target ,Tumor associated antigen ,Tumor specific antigen ,Lung Cancer ,Rare Diseases ,Lung ,Cancer ,Underpinning research ,1.1 Normal biological development and functioning ,Biochemistry and Cell Biology ,Other Medical and Health Sciences ,Bioinformatics - Abstract
Objective: The discovery and characterization of tumor associated antigens is increasingly important to advance the field of immuno-oncology. In this regard, labyrinthin has been implicated as a neoantigen found on the cell surface of adenocarcinomas. Data on the (1) topology, (2) amino acid (a.a.) homology analyses and (3) cell surface localization of labyrinthin by fluorescence-activated cell sorter (FACS) are studied in support of labyrinthin as a novel, pan-adenocarcinoma marker. Results: Bioinformatics analyses predict labyrinthin as a type II protein with calcium binding domain(s), N-myristoylation sites, and kinase II phosphorylation sites. Sequence homologies for labyrinthin (255 a.a.) were found vs. the intracellular aspartyl/asparaginyl beta-hydroxylase (ASPH; 758 a.a.) and the ASPH-gene related protein junctate (299 a.a.), which are both type II proteins. Labyrinthin was detected by FACS on only non-permeabilized A549 human lung adenocarcinoma cells, but not on normal WI-38 human lung fibroblasts nor primary cultures of normal human glandular-related cells. Microscopic images of immunofluorescent labelled MCA 44-3A6 binding to A549 cells at random cell cycle stages complement the FACS results by further showing that labyrinthin persisted on the cell surfaces along with some cell internalization for greater than 20 min.
- Published
- 2023
43. Whole‐genome RNA sequencing identifies distinct transcriptomic profiles in impingement cartilage between patients with femoroacetabular impingement and hip osteoarthritis
- Author
-
Kuhns, Benjamin D, Reuter, John M, Hansen, Victoria L, Soles, Gillian L, Jonason, Jennifer H, Ackert‐Bicknell, Cheryl L, Wu, Chia‐Lung, and Giordano, Brian D
- Subjects
Control Engineering ,Mechatronics and Robotics ,Engineering ,Biomedical Engineering ,Clinical Research ,Human Genome ,Arthritis ,Biotechnology ,Pain Research ,Aging ,Genetics ,Chronic Pain ,Osteoarthritis ,2.1 Biological and endogenous factors ,Aetiology ,Musculoskeletal ,Humans ,Osteoarthritis ,Hip ,Femoracetabular Impingement ,Hip Joint ,RNA ,Transcriptome ,Cartilage ,Articular ,Disease Progression ,Sequence Analysis ,RNA ,bioinformatics ,femoroacetabular impingement ,hip osteoarthritis ,mRNA sequencing ,Clinical Sciences ,Human Movement and Sports Sciences ,Orthopedics ,Biomedical engineering ,Sports science and exercise - Abstract
Femoroacetabular impingement (FAI) has a strong clinical association with the development of hip osteoarthritis (OA); however, the pathobiological mechanisms underlying the transition from focal impingement to global joint degeneration remain poorly understood. The purpose of this study is to use whole-genome RNA sequencing to identify and subsequently validate differentially expressed genes (DEGs) in femoral head articular cartilage samples from patients with FAI and hip OA secondary to FAI. Thirty-seven patients were included in the study with whole-genome RNA sequencing performed on 10 gender-matched patients in the FAI and OA cohorts and the remaining specimens were used for validation analyses. We identified a total of 3531 DEGs between the FAI and OA cohorts with multiple targets for genes implicated in canonical OA pathways. Quantitative reverse transcription-polymerase chain reaction validation confirmed increased expression of FGF18 and WNT16 in the FAI samples, while there was increased expression of MMP13 and ADAMTS4 in the OA samples. Expression levels of FGF18 and WNT16 were also higher in FAI samples with mild cartilage damage compared to FAI samples with severe cartilage damage or OA cartilage. Our study further expands the knowledge regarding distinct genetic reprogramming in the cartilage between FAI and hip OA patients. We independently validated the results of the sequencing analysis and found increased expression of anabolic markers in patients with FAI and minimal histologic cartilage damage, suggesting that anabolic signaling may be increased in early FAI with a transition to catabolic and inflammatory gene expression as FAI progresses towards more severe hip OA. Clinical significance:Cam-type FAI has a strong clinical association with hip OA; however, the cellular pathophysiology of disease progression remains poorly understood. 
Several previous studies have demonstrated increased expression of inflammatory markers in FAI cartilage samples, suggesting the involvement of these inflammatory pathways in the disease progression. Our study further expands the knowledge regarding distinct genetic reprogramming in the cartilage between FAI and hip OA patients. In addition to differences in inflammatory gene expression, we also identified differential expression in multiple pathways involved in hip OA progression.
- Published
- 2023
44. Sensitivity Analysis of Genome-Scale Metabolic Flux Prediction.
- Author
-
Niu, Puhua, Soto, Maria J, Huang, Shuai, Yoon, Byung-Jun, Dougherty, Edward R, Alexander, Francis J, Blaby, Ian, and Qian, Xiaoning
- Subjects
Bayes Theorem ,Models ,Biological ,Gene Regulatory Networks ,Metabolic Networks and Pathways ,Metabolic Flux Analysis ,Bayesian network structure learning ,metabolic engineering ,optimal experimental design ,regulated metabolic network modeling ,uncertainty quantification ,Genetics ,Human Genome ,Mathematical Sciences ,Biological Sciences ,Information and Computing Sciences ,Bioinformatics - Abstract
TRIMER, Transcription Regulation Integrated with MEtabolic Regulation, is a genome-scale modeling pipeline targeting metabolic engineering applications. Using TRIMER, regulated metabolic reactions can be effectively predicted by integrative modeling of metabolic reactions with a transcription factor-gene regulatory network (TRN), which is modeled through a Bayesian network (BN). In this article, we focus on sensitivity analysis of metabolic flux prediction for uncertainty quantification of BN structures for TRN modeling in TRIMER. We propose a computational strategy to construct the uncertainty class of TRN models based on the inferred regulatory order uncertainty given transcriptomic expression data. With that, we analyze the prediction sensitivity of the TRIMER pipeline for the metabolite yields of interest. The obtained sensitivity analyses can guide optimal experimental design (OED) to help acquire new data that can enhance TRN modeling and achieve specific metabolic engineering objectives, including metabolite yield alterations. We have performed small- and large-scale simulated experiments, demonstrating the effectiveness of our developed sensitivity analysis strategy for BN structure learning to quantify the edge importance in terms of metabolic flux prediction uncertainty reduction and its potential to effectively guide OED.
- Published
- 2023
45. A multilocus approach for accurate variant calling in low-copy repeats using whole-genome sequencing
- Author
-
Prodanov, Timofey and Bansal, Vikas
- Subjects
Human Genome ,Genetics ,2.1 Biological and endogenous factors ,Aetiology ,Generic health relevance ,Good Health and Well Being ,Humans ,Segmental Duplications ,Genomic ,DNA Copy Number Variations ,Whole Genome Sequencing ,Benchmarking ,Genome ,Human ,Mathematical Sciences ,Biological Sciences ,Information and Computing Sciences ,Bioinformatics - Abstract
Motivation: Low-copy repeats (LCRs) or segmental duplications are long segments of duplicated DNA that cover > 5% of the human genome. Existing tools for variant calling using short reads exhibit low accuracy in LCRs due to ambiguity in read mapping and extensive copy number variation. Variants in more than 150 genes overlapping LCRs are associated with risk for human diseases. Methods: We describe a short-read variant calling method, ParascopyVC, that performs variant calling jointly across all repeat copies and utilizes reads independent of mapping quality in LCRs. To identify candidate variants, ParascopyVC aggregates reads mapped to different repeat copies and performs polyploid variant calling. Subsequently, paralogous sequence variants that can differentiate repeat copies are identified using population data and used for estimating the genotype of variants for each repeat copy. Results: On simulated whole-genome sequence data, ParascopyVC achieved higher precision (0.997) and recall (0.807) than three state-of-the-art variant callers (best precision = 0.956 for DeepVariant and best recall = 0.738 for GATK) in 167 LCR regions. Benchmarking of ParascopyVC using the genome-in-a-bottle high-confidence variant calls for HG002 genome showed that it achieved a very high precision of 0.991 and a high recall of 0.909 across LCR regions, significantly better than FreeBayes (precision = 0.954 and recall = 0.822), GATK (precision = 0.888 and recall = 0.873) and DeepVariant (precision = 0.983 and recall = 0.861). ParascopyVC demonstrated a consistently higher accuracy (mean F1 = 0.947) than other callers (best F1 = 0.908) across seven human genomes. Availability and implementation: ParascopyVC is implemented in Python and is freely available at https://github.com/tprodanov/ParascopyVC.
- Published
- 2023
46. Predicting cellular responses to complex perturbations in high‐throughput screens
- Author
-
Lotfollahi, Mohammad, Susmelj, Anna Klimovskaia, De Donno, Carlo, Hetzel, Leon, Ji, Yuge, Ibarra, Ignacio L, Srivatsan, Sanjay R, Naghipourfar, Mohsen, Daza, Riza M, Martin, Beth, Shendure, Jay, McFaline‐Figueroa, Jose L, Boyeau, Pierre, Wolf, F Alexander, Yakubova, Nafissa, Günnemann, Stephan, Trapnell, Cole, Lopez‐Paz, David, and Theis, Fabian J
- Subjects
Biochemistry and Cell Biology ,Biological Sciences ,Genetics ,Bioengineering ,Underpinning research ,1.1 Normal biological development and functioning ,Generic health relevance ,Gene Expression Profiling ,High-Throughput Screening Assays ,Computational Biology ,Single-Cell Gene Expression Analysis ,generative modeling ,high-throughput screening ,machine learning ,perturbation prediction ,single-cell transcriptomics ,Other Biological Sciences ,Bioinformatics ,Biochemistry and cell biology - Abstract
Recent advances in multiplexed single-cell transcriptomics experiments facilitate the high-throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep-learning approaches for single-cell response modeling. CPA learns to in silico predict transcriptional perturbation response at the single-cell level for unseen dosages, cell types, time points, and species. Using newly generated single-cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single-cell Perturb-seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single-cell level and thus accelerate therapeutic applications using single-cell technologies.
- Published
- 2023
47. In vivo imaging of the human retina using a two-photon excited fluorescence ophthalmoscope
- Author
-
Bogusławski, Jakub, Tomczewski, Sławomir, Dąbrowski, Michał, Komar, Katarzyna, Milkiewicz, Jadwiga, Palczewska, Grażyna, Palczewski, Krzysztof, and Wojtkowski, Maciej
- Subjects
Biomedical and Clinical Sciences ,Physical Sciences ,Ophthalmology and Optometry ,Neurosciences ,Eye Disease and Disorders of Vision ,Eye ,Bioinformatics ,Biotechnology and Bioengineering ,Clinical Protocol ,Health Sciences ,Molecular/Chemical Probes ,Neuroscience ,Physics - Abstract
Noninvasive imaging of endogenous retinal fluorophores, including vitamin A derivatives, is vital to developing new treatments for retinal diseases. Here, we present a protocol for obtaining in vivo two-photon excited fluorescence images of the fundus in the human eye. We describe steps for laser characterization, system alignment, positioning human subjects, and data registration. We detail data processing and demonstrate analysis with example datasets. This technique allays safety concerns by allowing for the acquisition of informative images at low laser exposure. For complete details on the use and execution of this protocol, please refer to Bogusławski et al. (2022).
- Published
- 2023
48. Determination of Effect Sizes for Power Analysis for Microbiome Studies Using Large Microbiome Databases.
- Author
-
Rahman, Gibraan, McDonald, Daniel, Gonzalez, Antonio, Vázquez-Baeza, Yoshiki, Jiang, Lingjing, Casals-Pascual, Climent, Hakim, Daniel, Dilmore, Amanda Hazel, Nowinski, Brent, Peddada, Shyamal, and Knight, Rob
- Subjects
Software ,Databases ,Factual ,Microbiota ,Gastrointestinal Microbiome ,bioinformatics ,effect size ,microbiome ,statistics ,Human Genome ,Complementary and Integrative Health ,Genetics ,Networking and Information Technology R&D (NITRD) - Abstract
Herein, we present a tool called Evident that can be used for deriving effect sizes for a broad spectrum of metadata variables, such as mode of birth, antibiotics, socioeconomics, etc., to provide power calculations for a new study. Evident can be used to mine existing databases of large microbiome studies (such as the American Gut Project, FINRISK, and TEDDY) to analyze the effect sizes for planning future microbiome studies via power analysis. For each metavariable, the Evident software is flexible to compute effect sizes for many commonly used measures of microbiome analyses, including α diversity, β diversity, and log-ratio analysis. In this work, we describe why effect size and power analysis are necessary for computational microbiome analysis and show how Evident can help researchers perform these procedures. Additionally, we describe how Evident is easy for researchers to use and provide an example of efficient analyses using a dataset of thousands of samples and dozens of metadata categories.
- Published
- 2023
49. Genomic signatures of local adaptation in recent invasive Aedes aegypti populations in California.
- Author
-
Soudi, Shaghayegh, Crepeau, Marc, Collier, Travis C, Lee, Yoosook, Cornel, Anthony J, and Lanzaro, Gregory C
- Subjects
Animals ,Aedes ,Genomics ,Adaptation ,Physiological ,California ,Mosquito Vectors ,Adaptive loci ,Aedes mosquitoes ,Genome scan ,Landscape genomics ,Selection ,Human Genome ,Genetics ,Biotechnology ,Prevention ,2.2 Factors relating to the physical environment ,Aetiology ,Climate Action ,Biological Sciences ,Information and Computing Sciences ,Medical and Health Sciences ,Bioinformatics - Abstract
Background: Rapid adaptation to new environments can facilitate species invasions and range expansions. Understanding the mechanisms of adaptation used by invasive disease vectors in new regions has key implications for mitigating the prevalence and spread of vector-borne disease, although these mechanisms remain relatively unexplored. Results: Here, we integrate whole-genome sequencing data from 96 Aedes aegypti mosquitoes collected from various sites in southern and central California with 25 annual topo-climate variables to investigate genome-wide signals of local adaptation among populations. Patterns of population structure, as inferred using principal components and admixture analysis, were consistent with three genetic clusters. Using various landscape genomics approaches, which all remove the confounding effects of shared ancestry on correlations between genetic and environmental variation, we identified 112 genes showing strong signals of local environmental adaptation associated with one or more topo-climate factors. Some of these genes, such as those encoding heat-shock proteins, have known effects in climate adaptation and show signs of selective sweeps and recent positive selection acting on these genomic regions. Conclusions: Our results provide a genome-wide perspective on the distribution of adaptive loci and lay the foundation for future work to understand how environmental adaptation in Ae. aegypti impacts the arboviral disease landscape and how such adaptation could help or hinder efforts at population control.
- Published
- 2023
50. Expanding the Utility of Bioinformatic Data for the Full Stereostructural Assignments of Marinolides A and B, 24- and 26-Membered Macrolactones Produced by a Chemically Exceptional Marine-Derived Bacterium.
- Author
-
Kim, Min Cheol, Winter, Jaclyn M, Cullum, Reiko, Smith, Alexander J, and Fenical, William
- Subjects
Bacteria ,Macrolides ,Polyketide Synthases ,Biological Products ,Computational Biology ,bioinformatics ,genome mining ,macrolactone ,modular type I polyketide synthase ,natural product structure elucidation ,Physical Chemistry (incl. Structural) ,Pharmacology and Pharmaceutical Sciences ,Medicinal & Biomolecular Chemistry - Abstract
Marinolides A and B, two new 24- and 26-membered bacterial macrolactones, were isolated from the marine-derived actinobacterium AJS-327 and their stereostructures initially assigned by bioinformatic data analysis. Macrolactones typically possess complex stereochemistry, the assignments of which have been one of the most difficult undertakings in natural products chemistry, and in most cases, the use of X-ray diffraction methods and total synthesis have been the major methods of assigning their absolute configurations. More recently, however, it has become apparent that the integration of bioinformatic data is growing in utility to assign absolute configurations. Genome mining and bioinformatic analysis identified the 97 kb mld biosynthetic cluster harboring seven type I polyketide synthases. A detailed bioinformatic investigation of the ketoreductase and enoylreductase domains within the multimodular polyketide synthases, coupled with NMR and X-ray diffraction data, allowed for the absolute configurations of marinolides A and B to be determined. While using bioinformatics to assign the relative and absolute configurations of natural products has high potential, this method must be coupled with full NMR-based analysis to both confirm bioinformatic assignments as well as any additional modifications that occur during biosynthesis.
- Published
- 2023