685 results on '"Sequence logo"'
Search Results
2. plotnineSeqSuite: a Python package for visualizing sequence data using ggplot2 style
- Author
- Tianze Cao, Qian Li, Yuexia Huang, and Anshui Li
- Subjects
- ggplot2; plotnine; Bioinformatics tool; Sequence logo; Multiple sequence alignment; Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Background: The visual sequence logo has been an active area in the development of bioinformatics tools. ggseqlogo, written in R, has been the most popular API since it was published. With the rise of artificial intelligence and deep learning, Python has become the most popular programming language, and bioinformaticians have begun to shift to it. Providing Python APIs similar to those in R reduces the cost of learning a new programming language. Compared with ggplot2 in R, however, plotting frameworks in Python are not as easy to use. The appearance of plotnine (a Python implementation of ggplot2) makes it possible to unify the programming methods of bioinformatics visualization tools between R and Python. Results: Here, we introduce plotnineSeqSuite, a new plotnine-based Python package that provides a ggseqlogo-like API for programmatic drawing of sequence logos, sequence alignment diagrams, and sequence histograms. It supports custom letters, color themes, and fonts. Moreover, the classes for drawing layers follow an object-oriented design, so users can easily encapsulate and extend them. Conclusions: plotnineSeqSuite is the first ggplot2-style package to implement visualization of sequence-related graphs in Python. It enhances the uniformity of programmatic plotting between R and Python. Compared with existing tools, the categories supported by plotnineSeqSuite are much more complete. The source code of plotnineSeqSuite can be obtained on GitHub (https://github.com/caotianze/plotnineseqsuite) and PyPI (https://pypi.org/project/plotnineseqsuite), and the documentation homepage is freely available on GitHub at https://caotianze.github.io/plotnineseqsuite/.
- Published
- 2023
- Full Text
- View/download PDF
3. Basal Transcriptional Machinery
- Author
- Carlberg, Carsten, Velleuer, Eunike, and Molnár, Ferdinand
- Published
- 2023
- Full Text
- View/download PDF
4. plotnineSeqSuite: a Python package for visualizing sequence data using ggplot2 style.
- Author
- Cao, Tianze, Li, Qian, Huang, Yuexia, and Li, Anshui
- Subjects
- *PYTHON programming language; *DEEP learning; *PROGRAMMING languages; *ARTIFICIAL intelligence; *SEQUENCE alignment; *SOURCE code; *DATA visualization
- Abstract
Background: The visual sequence logo has been an active area in the development of bioinformatics tools. ggseqlogo, written in R, has been the most popular API since it was published. With the rise of artificial intelligence and deep learning, Python has become the most popular programming language, and bioinformaticians have begun to shift to it. Providing Python APIs similar to those in R reduces the cost of learning a new programming language. Compared with ggplot2 in R, however, plotting frameworks in Python are not as easy to use. The appearance of plotnine (a Python implementation of ggplot2) makes it possible to unify the programming methods of bioinformatics visualization tools between R and Python. Results: Here, we introduce plotnineSeqSuite, a new plotnine-based Python package that provides a ggseqlogo-like API for programmatic drawing of sequence logos, sequence alignment diagrams, and sequence histograms. It supports custom letters, color themes, and fonts. Moreover, the classes for drawing layers follow an object-oriented design, so users can easily encapsulate and extend them. Conclusions: plotnineSeqSuite is the first ggplot2-style package to implement visualization of sequence-related graphs in Python. It enhances the uniformity of programmatic plotting between R and Python. Compared with existing tools, the categories supported by plotnineSeqSuite are much more complete. The source code of plotnineSeqSuite can be obtained on GitHub (https://github.com/caotianze/plotnineseqsuite) and PyPI (https://pypi.org/project/plotnineseqsuite), and the documentation homepage is freely available on GitHub at https://caotianze.github.io/plotnineseqsuite/. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Basal Transcriptional Machinery
- Author
- Carlberg, Carsten, and Molnár, Ferdinand
- Published
- 2020
- Full Text
- View/download PDF
6. Structure memes: Intuitive visualization of sequence logo and subfamily logo information in a 3D protein‐structural context.
- Abstract
The number of available protein sequences covering virtually all known species is tremendous and ever growing due to the feasibility of the underlying nucleotide sequencing. The speed at which protein structures are being determined is increasing, and as a result of refined cryo‐electron microscopy the proportion of solved membrane protein folds is expanding. Sequence data are used to illustrate evolution and to group proteins into families with various levels of subfamilies. Structure data of prototypical proteins provide insight into function brought about by an interplay of specific amino acid residues that are dispersed throughout the sequence. Visually combining rich sequence information with structure data in an intuitively comprehensible way would enhance the process of elucidating key protein aspects regarding evolution, sequence relations, and function. Here, a method is described that projects the information contained in sequence logos and subfamily logos onto protein structures. The amino acid composition at a site is encoded by a mix color in the red‐yellow‐blue space and the information content is presented by the radius of a sphere at the α‐carbon position. The resulting display is termed "structure meme." The underlying sequence and atom coordinate data are retained in the file for simple retrieval on demand using a molecular structure visualization program. Structure memes are recognizable and convey extensive information in a human‐discernable way that requires little training. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. Interpretable prioritization of splice variants in diagnostic next-generation sequencing.
- Author
- Danis, Daniel, Jacobsen, Julius O.B., Carmody, Leigh C., Gargano, Michael A., McMurry, Julie A., Hegde, Ayushi, Haendel, Melissa A., Valentini, Giorgio, Smedley, Damian, and Robinson, Peter N.
- Subjects
- *NUCLEOTIDE sequencing; *RNA splicing; *KNOTS & splices; *MACHINE learning; *DINUCLEOTIDES; *LOGISTIC regression analysis; *EXOMES
- Abstract
A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5′ and 3′ ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
8. Predicting Subcellular Localization of Proteins by Bioinformatic Algorithms
- Author
- Nielsen, Henrik, Compans, Richard W, Series editor, Honjo, Tasuku, Series editor, Oldstone, Michael B. A., Series editor, Vogt, Peter K., Series editor, Malissen, Bernard, Series editor, Aktories, Klaus, Series editor, Kawaoka, Yoshihiro, Series editor, Rappuoli, Rino, Series editor, Galan, Jorge E., Series editor, Ahmed, Rafi, Series editor, Palme, Klaus, Series editor, Casadevall, Arturo, Series editor, Garcia-Sastre, Adolfo, Series editor, and Bagnoli, Fabio, editor
- Published
- 2017
- Full Text
- View/download PDF
9. RaacLogo: a new sequence logo generator by using reduced amino acid clusters.
- Author
- Zheng, Lei, Liu, Dongyang, Yang, Wuritu, Yang, Lei, and Zuo, Yongchun
- Subjects
- *AMINO acids; *PROTEIN engineering; *INTERNET servers; *AMINO acid sequence; *SEQUENCE alignment
- Abstract
Sequence logos give a fast and concise display of consensus sequences. Proteins exhibit greater complexity and diversity than DNA, which usually affects the graphical representation of the logo. Reduced amino acid alphabets are a powerful way to simplify the complexity of sequence alignments, which motivated us to establish RaacLogo. As a new sequence logo generator using reduced amino acid alphabets, RaacLogo can easily generate many different simplified logos tailored to users, who select among various reduced amino acid alphabets derived from more than 40 clustering algorithms. The current web server provides 74 types of reduced amino acid alphabet, which were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with protein alignment. A two-dimensional selector is provided for easily selecting the desired RAACs with underlying biological knowledge. It is anticipated that the RaacLogo web server will play a useful role in protein sequence alignment, topological estimation, and protein design experiments. RaacLogo is freely available at http://bioinfor.imu.edu.cn/raaclogo. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. Motto: Representing Motifs in Consensus Sequences with Minimum Information Loss.
- Author
- Mengchi Wang, David Wang, Kai Zhang, Vu Ngo, Shicai Fan, and Wei Wang
- Subjects
- *ALGORITHMS; *AMINO acids; *CONSENSUS (Social sciences); *GENETICS; *HUMAN genome; *INFORMATION science; *MATHEMATICAL models; *NUCLEOTIDES; *TRANSCRIPTION factors; *INFORMATION literacy; *THEORY; *DESCRIPTIVE statistics; *SEQUENCE analysis
- Abstract
Sequence analysis frequently requires intuitive understanding and convenient representation of motifs. Typically, motifs are represented as position weight matrices (PWMs) and visualized using sequence logos. However, in many scenarios, in order to interpret the motif information or search for motif matches, it is compact and sufficient to represent motifs by wildcard-style consensus sequences (such as [GC][AT]GATAAG[GAC]). Based on mutual information theory and Jensen-Shannon divergence, we propose a mathematical framework to minimize the information loss in converting PWMs to consensus sequences. We name this representation sequence Motto and have implemented an efficient algorithm with flexible options for converting motif PWMs into Motto from nucleotides, amino acids, and customized characters. We show that this representation provides a simple and efficient way to identify the binding sites of 1156 common transcription factors (TFs) in the human genome. The effectiveness of the method was benchmarked by comparing sequence matches found by Motto with PWM scanning results found by FIMO. On average, our method achieves a 0.81 area under the precision-recall curve, significantly (P-value < 0.01) outperforming all existing methods, including maximal positional weight, Cavener's method, and minimal mean square error. We believe this representation provides a distilled summary of a motif, as well as the statistical justification. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
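The wildcard-style consensus quoted in the abstract above can be read directly as a regular-expression character class. The following is a minimal sketch of searching a sequence with such a consensus; it is not the Motto algorithm (which derives the consensus from a PWM), and the function name `find_consensus_matches` and the example DNA sequence are hypothetical.

```python
import re

def find_consensus_matches(consensus, sequence):
    """Return (start, matched_text) for each match of a bracket-style consensus.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(f"(?=({consensus}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]

# Consensus string quoted in the abstract; the DNA sequence is made up.
consensus = "[GC][AT]GATAAG[GAC]"
sequence = "TTGAGATAAGGCCCTGATAAGATT"
matches = find_consensus_matches(consensus, sequence)
print(matches)  # [(2, 'GAGATAAGG'), (13, 'CTGATAAGA')]
```

Positions are 0-based; a real scan would typically also search the reverse-complement strand.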
11. The Basal Transcriptional Machinery
- Author
- Carlberg, Carsten, and Molnár, Ferdinand
- Published
- 2016
- Full Text
- View/download PDF
12. VoQs: A Web Application for Visualization of Questionnaire Surveys
- Author
- Zhang, Xiaowei, Klawonn, Frank, Grigull, Lorenz, Lechner, Werner, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Fromont, Elisa, editor, De Bie, Tijl, editor, and van Leeuwen, Matthijs, editor
- Published
- 2015
- Full Text
- View/download PDF
13. Sequences
- Author
- Pazos, Florencio, and Chagoyen, Mónica
- Published
- 2015
- Full Text
- View/download PDF
14. Logo2PWM: a tool to convert sequence logo to position weight matrix
- Author
- Zhen Gao, Lu Liu, and Jianhua Ruan
- Subjects
- Sequence logo; Position weight matrix; Convert; Motif finding; Transcription; Binding site; Biotechnology; TP248.13-248.65; Genetics; QH426-470
- Abstract
Background: Position weight matrices (PWMs) and sequence logos are the most widely used representations of transcription factor binding sites (TFBSs) in biological sequences. The sequence logo, a graphical representation of a PWM, has been widely used in scientific publications and reports because it is easy to read, information-rich, and simple in format. In contrast, a PWM is a precise and compact digital form that can be easily used by a variety of motif analysis software. There are a few available tools to generate sequence logos from PWMs; however, no tool does the reverse. A tool to convert a sequence logo back to a PWM is needed to scan for a TFBS that is represented in logo format in a publication where the PWM is not provided or is hard to obtain. A major difficulty in developing such a tool is dealing with the diversity of sequence logo images. Results: We propose logo2PWM for reconstructing PWMs from a large variety of sequence logo images. Evaluation results on over one thousand logos from three sources with different logo formats show that the correlation between the reconstructed PWMs and the original PWMs is consistently high, with a median correlation greater than 0.97. Conclusion: Because of its high recognition accuracy, ease of use, and availability as both a web-based service and a stand-alone application, we believe that logo2PWM can readily benefit the study of transcription by filling the gap between sequence logos and PWMs.
- Published
- 2017
- Full Text
- View/download PDF
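To make the relationship between a PWM column and its logo letters concrete, here is a minimal sketch using the standard information-content definition (log2 of the alphabet size minus the Shannon entropy, without small-sample correction). It is not logo2PWM's image-recognition method; the function names and the example column are hypothetical, and treating a zero-height column as uniform is an assumption of this sketch.

```python
import math

def column_information(probs, alphabet_size=4):
    """Information content in bits: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return math.log2(alphabet_size) - entropy

def letter_heights(probs, alphabet_size=4):
    """Logo letter heights: each probability scaled by the column's information."""
    info = column_information(probs, alphabet_size)
    return {base: p * info for base, p in probs.items()}

def heights_to_probs(heights):
    """Recover probabilities from letter heights by normalising them to sum to 1.

    A column with zero total height carries no information, so it is
    treated as uniform here (an assumption of this sketch).
    """
    total = sum(heights.values())
    if total == 0:
        return {base: 1 / len(heights) for base in heights}
    return {base: h / total for base, h in heights.items()}

# Hypothetical DNA column: A and C equally likely, G and T never observed.
column = {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}
heights = letter_heights(column)       # A and C each 0.5 bits tall
recovered = heights_to_probs(heights)  # back to the original probabilities
```

Because each height is the probability times a shared per-column factor, the relative heights within a column are enough to recover the probabilities whenever the column has nonzero information.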
15. The Basal Transcriptional Machinery
- Author
- Carlberg, Carsten, and Molnár, Ferdinand
- Published
- 2014
- Full Text
- View/download PDF
16. Re-evaluation of protein kinase CK2 pleiotropy: new insights provided by a phosphoproteomics analysis of CK2 knockout cells.
- Author
- Franchin, Cinzia, Borgo, Christian, Cesaro, Luca, Zaramella, Silvia, Vilardell, Jordi, Salvi, Mauro, Arrigoni, Giorgio, and Pinna, Lorenzo A.
- Subjects
- *PROTEIN kinase CK2; *PROTEOMICS; *GENETIC pleiotropy; *CRISPRS; *PROTEIN kinases
- Abstract
CK2 denotes a ubiquitous and pleiotropic protein kinase whose holoenzyme is composed of two catalytic (α and/or α′) and two regulatory β subunits. The CK2 consensus sequence, S/T-x-x-D/E/pS/pT, is present in numerous phosphosites, but it is not clear how many of these are really generated by CK2. To address this issue, we took advantage of C2C12 cells entirely deprived of both CK2 catalytic subunits by the CRISPR/Cas9 methodology. A comparative SILAC phosphoproteomics analysis reveals that, although about 30% of the quantified phosphosites do conform to the CK2 consensus, only one-third of these are substantially reduced in the CK2α/α′(−/−) cells, consistent with their generation by CK2. A parallel study with C2C12 cells deprived of the regulatory β subunit discloses a role of this subunit in determining CK2 targeting. We also find that phosphosites known to be generated by CK2 are not fully abrogated in CK2α/α′(−/−) cells, while some phosphosites unrelated to CK2 are significantly altered. Taken together, our data allow us to conclude that the phosphoproteome generated by CK2 is not as ample and rigidly predetermined as was previously believed. They also show that the lack of CK2 promotes phosphoproteomic perturbations attributable to kinases other than CK2. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
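The consensus S/T-x-x-D/E named in the abstract above can be scanned for with a short regular expression. This is a minimal sketch only: the function name `ck2_candidate_sites` and the peptide are hypothetical, and the pS/pT alternatives at position +3 are ignored because a plain one-letter sequence does not record phosphorylation.

```python
import re

# S/T-x-x-D/E: phosphoacceptor S or T, two arbitrary residues, then D or E.
# A lookahead keeps the match zero-width so overlapping sites are all found.
CK2_MOTIF = re.compile(r"(?=[ST]..[DE])")

def ck2_candidate_sites(protein):
    """Return 0-based positions of S/T residues that have D or E at +3."""
    return [m.start() for m in CK2_MOTIF.finditer(protein)]

# Made-up peptide: S at index 1 has E at +3, T at index 6 has D at +3.
sites = ck2_candidate_sites("MSAAEGTPPDKS")
print(sites)  # [1, 6]
```

As the abstract stresses, matching the consensus only marks a candidate site; it does not show that CK2 actually generates that phosphosite.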
17. Expression of Genes and Proteins
- Author
- Ussery, David W., Wassenaar, Trudy M., Borini, Stefano, Grippen, Gordon, editor, Felsenstein, Joe, editor, Gusfield, Dan, editor, Istrail, Sorin, editor, Karlin, Samuel, editor, Lengauer, Thomas, editor, McClure, Marcella, editor, Nowak, Martin, editor, Sankoff, David, editor, Shamir, Ron, editor, Steel, Mike, editor, Stormo, Gary, editor, Tavaré, Simon, editor, Warnow, Tandy, editor, Myers, Gene, editor, Giegerich, Robert, editor, Fitch, Walter M., editor, Pevzner, Pavel, editor, Vingron, Martin, editor, Ussery, David W., editor, Wassenaar, Trudy M., editor, and Borini, Stefano, editor
- Published
- 2009
- Full Text
- View/download PDF
18. Predicting SUMOylation Sites
- Author
- Bauer, Denis C., Buske, Fabian A., Bodén, Mikael, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael S., editor, Chetty, Madhu, editor, Ngom, Alioune, editor, and Ahmad, Shandar, editor
- Published
- 2008
- Full Text
- View/download PDF
19. Proteolytic profiling of streptococcal pyrogenic exotoxin b (Speb) by complementary hplc-ms approaches
- Author
- Austrian Science Fund, Blöchl, Constantin, Holzner, Christoph, Luciano, Michela, Bauer, Renate, Horejs-Hoeck, Jutta, Eckhard, Ulrich, Brandstetter, Hans, and Huber, Christian G.
- Abstract
Streptococcal pyrogenic exotoxin B (SpeB) is a cysteine protease expressed during group A streptococcal infection that represents a major virulence factor. Although subject to several studies, its role during infection is still under debate, and its proteolytic properties remain insufficiently characterized. Here, we revisited this protease through a set of complementary approaches relying on state-of-the-art HPLC-MS methods. After conceiving an efficient protocol to recombinantly express SpeB, the zymogen of the protease and its activation were characterized. Employing proteome-derived peptide libraries, a strong preference for hydrophobic and aromatic residues at P2 alongside negatively charged amino acids at P3′ to P6′ was revealed. To identify relevant in vivo substrates, native proteins were obtained from monocytic secretome and plasma to assess their cleavage under physiological conditions. Besides corroborating our findings concerning specificity, more than 200 cleaved proteins were identified, including proteins of the extracellular matrix, proteins of the immune system, and proteins involved in inflammation. Finally, the cleavage of IgG subclasses was studied in detail. This study precisely depicts the proteolytic properties of SpeB and provides a library of potential host substrates, including their exact cleavage positions, as a valuable source for further research to unravel the role of SpeB during streptococcal infection.
- Published
- 2022
20. Biomedical Information Visualization
- Author
- Lungu, Mircea, Xu, Kai, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Kerren, Andreas, editor, Ebert, Achim, editor, and Meyer, Jörg, editor
- Published
- 2007
- Full Text
- View/download PDF
21. Gene Prediction
- Author
- Haubold, Bernhard and Wiehe, Thomas
- Published
- 2006
- Full Text
- View/download PDF
22. Signals in DNA
- Author
- Deonier, Richard C., Waterman, Michael S., and Tavaré, Simon
- Published
- 2005
- Full Text
- View/download PDF
23. MHC Class I Epitope Binding Prediction Trained on Small Data Sets
- Author
- Lundegaard, Claus, Nielsen, Morten, Lamberth, Kasper, Worning, Peder, Sylvester-Hvid, Christina, Buus, Søren, Brunak, Søren, Lund, Ole, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Nicosia, Giuseppe, editor, Cutello, Vincenzo, editor, Bentley, Peter J., editor, and Timmis, Jon, editor
- Published
- 2004
- Full Text
- View/download PDF
24. Some Lessons for Molecular Biology from Information Theory
- Author
- Schneider, Thomas D., Kacprzyk, Janusz, editor, and Karmeshu, editor
- Published
- 2003
- Full Text
- View/download PDF
25. Expression Profiler
- Author
- Vilo, Jaak, Kapushesky, Misha, Kemmeren, Patrick, Sarkans, Ugis, Brazma, Alvis, Dietz, K., editor, Gail, M., editor, Krickeberg, K., editor, Samet, J., editor, Tsiatis, A., editor, Parmigiani, Giovanni, editor, Garrett, Elizabeth S., editor, Irizarry, Rafael A., editor, and Zeger, Scott L., editor
- Published
- 2003
- Full Text
- View/download PDF
26. Pattern Discovery Allowing Wild-Cards, Substitution Matrices, and Multiple Score Functions
- Author
- Mancheron, Alban, Rusu, Irena, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael, editor, Benson, Gary, editor, and Page, Roderic D. M., editor
- Published
- 2003
- Full Text
- View/download PDF
27. SigmoID: a user-friendly tool for improving bacterial genome annotation through analysis of transcription control signals
- Author
- Yevgeny Nikolaichik and Aliaksandr U. Damienikan
- Subjects
- Transcription factor binding site; Promoter; Terminator; Genome browser; Genome annotation; Sequence logo; Medicine; Biology (General); QH301-705.5
- Abstract
The majority of bacterial genome annotations are currently automated and based on a ‘gene by gene’ approach. Regulatory signals and operon structures are rarely taken into account which often results in incomplete and even incorrect gene function assignments. Here we present SigmoID, a cross-platform (OS X, Linux and Windows) open-source application aiming at simplifying the identification of transcription regulatory sites (promoters, transcription factor binding sites and terminators) in bacterial genomes and providing assistance in correcting annotations in accordance with regulatory information. SigmoID combines a user-friendly graphical interface to well known command line tools with a genome browser for visualising regulatory elements in genomic context. Integrated access to online databases with regulatory information (RegPrecise and RegulonDB) and web-based search engines speeds up genome analysis and simplifies correction of genome annotation. We demonstrate some features of SigmoID by constructing a series of regulatory protein binding site profiles for two groups of bacteria: Soft Rot Enterobacteriaceae (Pectobacterium and Dickeya spp.) and Pseudomonas spp. Furthermore, we inferred over 900 transcription factor binding sites and alternative sigma factor promoters in the annotated genome of Pectobacterium atrosepticum. These regulatory signals control putative transcription units covering about 40% of the P. atrosepticum chromosome. Reviewing the annotation in cases where it didn’t fit with regulatory information allowed us to correct product and gene names for over 300 loci.
- Published
- 2016
- Full Text
- View/download PDF
28. Motto: Representing Motifs in Consensus Sequences with Minimum Information Loss
- Author
- Wei Wang, Shicai Fan, Kai Zhang, Vu Ngo, Mengchi Wang, and David Y. Wang
- Subjects
- Mean squared error; Computer science; Sequence analysis; 0206 medical engineering; 02 engineering and technology; Information loss; Investigations; Biology; sequence logo; Information theory; 03 medical and health sciences; 0302 clinical medicine; Methods, Technology, & Resources; Genetics; Consensus sequence; Humans; Position-Specific Scoring Matrices; Nucleotide; Binding site; information theory; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; motif; Genome, Human; business.industry; Pattern recognition; Sequence Analysis, DNA; Mutual information; Amino acid; Sequence logo; transcription factor binding; chemistry; consensus; Human genome; Motif (music); Artificial intelligence; business; Algorithm; Algorithms; 020602 bioinformatics; 030217 neurology & neurosurgery; Transcription Factors
- Abstract
Sequence analysis frequently requires intuitive understanding and convenient representation of motifs. Typically, motifs are represented as position weight matrices (PWMs) and visualized using sequence logos. However, in many scenarios, representing motifs by wildcard-style consensus sequences is compact and sufficient for interpreting the motif information and searching for motif matches. Based on mutual information theory and Jensen-Shannon divergence, we propose a mathematical framework to minimize the information loss in converting PWMs to consensus sequences. We name this representation sequence Motto and have implemented an efficient algorithm with flexible options for converting motif PWMs into Motto from nucleotides, amino acids, and customized alphabets. Here we show that this representation provides a simple and efficient way to identify the binding sites of 1156 common TFs in the human genome. The effectiveness of the method was benchmarked by comparing sequence matches found by Motto with PWM scanning results found by FIMO. On average, our method achieves 0.81 area under the precision-recall curve, significantly (p-value < 0.01) outperforming all existing methods, including maximal positional weight, Cavener's method, and minimal mean square error. We believe this representation provides a distilled summary of a motif, as well as the statistical justification. Availability: Motto is freely available at http://wanglab.ucsd.edu/star/motto.
- Published
- 2020
- Full Text
- View/download PDF
29. MetaLogo: a heterogeneity-aware sequence logo generator and aligner
- Author
- Yahui Men, Xiaomin Ying, Guohua Dong, Zhen He, Yaowen Chen, and Shuofeng Hu
- Subjects
- Internet; Web server; Generator (computer programming); Theoretical computer science; Multiple sequence alignment; Phylogenetic tree; Computer science; Sequence Analysis, DNA; Python (programming language); computer.software_genre; Sequence logo; Humans; Position-Specific Scoring Matrices; Sequence motif; Sequence Alignment; computer; Molecular Biology; Phylogeny; Software; Sequence (medicine); computer.programming_language; Information Systems
- Abstract
Sequence logos are used to visually display conservation and variation in short sequences. They can indicate the fixed patterns or conserved motifs in a batch of DNA or protein sequences. However, most of the popular sequence logo generators assume that all the input sequences are from the same homologous group, which leads them to overlook the heterogeneity among the sequences during the sequence logo making process. Heterogeneous groups of sequences may represent clades of different evolutionary origins, or gene families with different functions. Therefore, it is essential to divide the sequences into different phylogenetic or functional groups to reveal their specific sequence motifs and conservation patterns. To solve these problems, we developed MetaLogo, which can automatically cluster the input sequences after multiple sequence alignment and phylogenetic tree construction, and then output sequence logos for multiple groups and align them in one figure. User-defined grouping is also supported by MetaLogo to allow users to investigate functional motifs from a more fine-grained and dynamic perspective. MetaLogo can highlight both the homologous and nonhomologous sites among sequences. MetaLogo can also be used to annotate the evolutionary positions and gene functions of unknown sequences, together with their local sequence characteristics. We provide users with a public MetaLogo web server (http://metalogo.omicsnet.org), a standalone Python package (https://github.com/labomics/MetaLogo), and also a built-in web server available for local deployment. Using MetaLogo, users can draw informative, customized, and publishable sequence logos without any programming experience to present and investigate new knowledge on specific sequence sets.
- Published
- 2022
- Full Text
- View/download PDF
30. Proteolytic Profiling of Streptococcal Pyrogenic Exotoxin B (SpeB) by Complementary HPLC-MS Approaches
- Author
- Constantin Blöchl, Christoph Holzner, Michela Luciano, Renate Bauer, Jutta Horejs-Hoeck, Ulrich Eckhard, Hans Brandstetter, and Christian G. Huber
- Subjects
- streptococcal cysteine protease; SCP; streptopain; protease degradomics; HUNTER; N-terminomics; positional proteomics; sequence specificity; sequence logo; IgG subclasses
- Abstract
Streptococcal pyrogenic exotoxin B (SpeB) is a cysteine protease expressed during group A streptococcal infection that represents a major virulence factor. Although subject to several studies, its role during infection is still under debate, and its proteolytic properties remain insufficiently characterized. Here, we revisited this protease through a set of complementary approaches relying on state-of-the-art HPLC-MS methods. After conceiving an efficient protocol to recombinantly express SpeB, the zymogen of the protease and its activation were characterized. Employing proteome-derived peptide libraries, a strong preference for hydrophobic and aromatic residues at P2 alongside negatively charged amino acids at P3′ to P6′ was revealed. To identify relevant in vivo substrates, native proteins were obtained from monocytic secretome and plasma to assess their cleavage under physiological conditions. Besides corroborating our findings concerning specificity, more than 200 cleaved proteins were identified, including proteins of the extracellular matrix, proteins of the immune system, and proteins involved in inflammation. Finally, the cleavage of IgG subclasses was studied in detail. This study precisely depicts the proteolytic properties of SpeB and provides a library of potential host substrates, including their exact cleavage positions, as a valuable source for further research to unravel the role of SpeB during streptococcal infection.
- Published
- 2021
- Full Text
- View/download PDF
31. Proteolytic Profiling of Streptococcal Pyrogenic Exotoxin B (SpeB) by Complementary HPLC-MS Approaches
- Author
-
Blöchl, Constantin, Holzner, Christoph, Luciano, Michela, Bauer, Renate, Horejs-Hoeck, Jutta, Eckhard, Ulrich, Brandstetter, Hans, Huber, Christian G., and Austrian Science Fund
- Subjects
Positional proteomics ,streptopain ,Proteome ,protease degradomics ,QH301-705.5 ,Streptococcus pyogenes ,Exotoxins ,sequence logo ,Sequence logo ,Mass Spectrometry ,Article ,Substrate Specificity ,HUNTER ,Bacterial Proteins ,Streptopain ,sequence specificity ,Escherichia coli ,Humans ,Amino Acid Sequence ,IgG subclasses ,Biology (General) ,QD1-999 ,Chromatography, High Pressure Liquid ,Protease degradomics ,streptococcal cysteine protease ,N-terminomics ,Recombinant Proteins ,Chemistry ,SCP ,Immunoglobulin G ,Proteolysis ,Streptococcal cysteine protease ,Peptides ,Sequence specificity ,Peptide Hydrolases ,positional proteomics - Abstract
© 2021 by the authors. Streptococcal pyrogenic exotoxin B (SpeB) is a cysteine protease expressed during group A streptococcal infection that represents a major virulence factor. Although subject to several studies, its role during infection is still under debate, and its proteolytic properties remain insufficiently characterized. Here, we revisited this protease through a set of complementary approaches relying on state-of-the-art HPLC-MS methods. After conceiving an efficient protocol to recombinantly express SpeB, the zymogen of the protease and its activation were characterized. Employing proteome-derived peptide libraries, a strong preference for hydrophobic and aromatic residues at P2 alongside negatively charged amino acids at P3′ to P6′ was revealed. To identify relevant in vivo substrates, native proteins were obtained from monocytic secretome and plasma to assess their cleavage under physiological conditions. Besides corroborating our findings concerning specificity, more than 200 cleaved proteins were identified, including proteins of the extracellular matrix, proteins of the immune system, and proteins involved in inflammation. Finally, the cleavage of IgG subclasses was studied in detail. This study precisely depicts the proteolytic properties of SpeB and provides a library of potential host substrates, including their exact cleavage positions, as a valuable source for further research to unravel the role of SpeB during streptococcal infection. C.B., C.H., M.L., R.B., J.H.H., H.B. and C.G.H. acknowledge funding by the Austrian Science Fund (FWF, project number W1213); M.L., J.H.H., H.B., and C.G.H. by the Land Salzburg 20102-F2001080-FPR “Cancer Cluster II”; and U.E. by the Beatriu de Pinós COFUND program (2018 BP 00163).
- Published
- 2021
32. Structural analysis of key gap junction domains—Lessons from genome data and disease-linked mutants.
- Author
-
Bai, Donglin
- Subjects
-
*CHEMICAL structure , *GAP junctions (Cell biology) , *DATA analysis , *GENETIC mutation , *STRUCTURE-activity relationships - Abstract
A gap junction (GJ) channel is formed by docking of two GJ hemichannels and each of these hemichannels is a hexamer of connexins. All connexin genes have been identified in human, mouse, and rat genomes and their homologous genes in many other vertebrates are available in public databases. The protein sequences of these connexins align well with high sequence identity in the same connexin across different species. Domains in closely related connexins and several residues in all known connexins are also well-conserved. These conserved residues form signatures (also known as sequence logos) in these domains and are likely to play important biological functions. In this review, the sequence logos of individual connexins, groups of connexins with common ancestors, and all connexins are analyzed to visualize natural evolutionary variations and the hot spots for human disease-linked mutations. Several gap junction domains are homologous, likely forming similar structures essential for their function. The availability of a high resolution Cx26 GJ structure and the subsequently-derived homology structure models for other connexin GJ channels elevated our understanding of sequence logos at the three-dimensional GJ structure level, thus facilitating the understanding of how disease-linked connexin mutants might impair GJ structure and function. This knowledge will enable the design of complementary variants to rescue disease-linked mutants. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
33. Author response for 'Structure memes: Intuitive visualization of sequence logo and subfamily logo information in a 3D protein‐structural context'
- Author
-
Eric Beitz
- Subjects
Structure (mathematical logic) ,Sequence logo ,Subfamily ,Information retrieval ,Computer science ,Structural context ,Logo ,Visualization - Published
- 2021
- Full Text
- View/download PDF
34. Structure memes: Intuitive visualization of sequence logo and subfamily logo information in a 3D protein-structural context
- Author
-
Eric Beitz
- Subjects
Structure (mathematical logic) ,0303 health sciences ,Subfamily ,Computer science ,030302 biochemistry & molecular biology ,Proteins ,Computational biology ,Biochemistry ,Visualization ,03 medical and health sciences ,A-site ,Sequence logo ,Protein structure ,Structural Biology ,Simple (abstract algebra) ,Amino Acid Sequence ,Molecular Biology ,Sequence Alignment ,Software ,030304 developmental biology ,Sequence (medicine) - Abstract
The number of available protein sequences covering virtually all known species is tremendous and ever growing due to the feasibility of the underlying nucleotide sequencing. The speed at which protein structures are being determined is increasing, and as a result of refined cryo-electron microscopy the proportion of solved membrane protein folds is expanding. Sequence data are used to illustrate evolution and to group proteins into families with various levels of subfamilies. Structure data of prototypical proteins provide insight into function brought about by an interplay of specific amino acid residues that are dispersed throughout the sequence. Visually combining rich sequence information with structure data in an intuitively comprehensible way would enhance the process of elucidating key protein aspects regarding evolution, sequence relations, and function. Here, a method is described that projects the information contained in sequence logos and subfamily logos onto protein structures. The amino acid composition at a site is encoded by a mix color in the red-yellow-blue space and the information content is presented by the radius of a sphere at the α-carbon position. The resulting display is termed "structure meme." The underlying sequence and atom coordinate data are retained in the file for simple retrieval on demand using a molecular structure visualization program. Structure memes are recognizable and convey extensive information in a human-discernable way that requires little training.
- Published
- 2021
35. DiffLogo: a comparative visualization of sequence motifs.
- Author
-
Nettling, Martin, Treutler, Hendrik, Grau, Jan, Keilwagen, Jens, Posch, Stefan, and Grosse, Ivo
- Subjects
-
*HELIX-loop-helix motifs , *VISUALIZATION , *TRANSCRIPTION factors , *CELL lines , *GENETIC algorithms - Abstract
Background: For three decades, sequence logos are the de facto standard for the visualization of sequence motifs in biology and bioinformatics. Reasons for this success story are their simplicity and clarity. The number of inferred and published motifs grows with the number of data sets and motif extraction algorithms. Hence, it becomes more and more important to perceive differences between motifs. However, motif differences are hard to detect from individual sequence logos in case of multiple motifs for one transcription factor, highly similar binding motifs of different transcription factors, or multiple motifs for one protein domain. Results: Here, we present DiffLogo, a freely available, extensible, and user-friendly R package for visualizing motif differences. DiffLogo is capable of showing differences between DNA motifs as well as protein motifs in a pair-wise manner resulting in publication-ready figures. In case of more than two motifs, DiffLogo is capable of visualizing pair-wise differences in a tabular form. Here, the motifs are ordered by similarity, and the difference logos are colored for clarity. We demonstrate the benefit of DiffLogo on CTCF motifs from different human cell lines, on E-box motifs of three basic helix-loop-helix transcription factors as examples for comparison of DNA motifs, and on F-box domains from three different families as example for comparison of protein motifs. Conclusions: DiffLogo provides an intuitive visualization of motif differences. It enables the illustration and investigation of differences between highly similar motifs such as binding patterns of transcription factors for different cell types, treatments, and algorithmic approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
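A per-position comparison of two motifs, as described in the DiffLogo abstract, can be sketched roughly as follows. This is an illustration only, not DiffLogo's R code: it assumes Jensen-Shannon divergence as the difference measure, and the function names `js_divergence` and `position_differences` are hypothetical.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence, in bits, between two probability lists."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y is never 0 where x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def position_differences(pwm1, pwm2):
    """One divergence value per motif position (columns ordered A, C, G, T)."""
    return [js_divergence(c1, c2) for c1, c2 in zip(pwm1, pwm2)]

# Two made-up 2-position motifs: identical at position 1, opposite at position 2.
motif_a = [[0.7, 0.1, 0.1, 0.1], [1.0, 0.0, 0.0, 0.0]]
motif_b = [[0.7, 0.1, 0.1, 0.1], [0.0, 1.0, 0.0, 0.0]]
diffs = position_differences(motif_a, motif_b)
```

Positions where the two motifs agree yield a divergence near zero, so only the differing positions would stand out in a difference logo.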
36. Software for the analysis and visualization of deep mutational scanning data.
- Author
-
Bloom, Jesse D.
- Subjects
-
*BIOINFORMATICS , *AMINO acids , *GENETIC mutation , *SOFTWARE sequencers , *INFORMATION science - Abstract
Background: Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. Results: I describe a software package, dms_tools, to infer the impacts of mutations from deep mutational scanning data using a likelihood-based treatment of the mutation counts. I show that dms_tools yields more accurate inferences on simulated data than simply calculating ratios of counts pre- and post-selection. Using dms_tools, one can infer the preference of each site for each amino acid given a single selection pressure, or assess the extent to which these preferences change under different selection pressures. The preferences and their changes can be intuitively visualized with sequence-logo-style plots created using an extension to weblogo. Conclusions: dms_tools implements a statistically principled approach for the analysis and subsequent visualization of deep mutational scanning data. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
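The simple baseline that the dms_tools abstract compares against, ratios of counts before and after selection, can be sketched as below. This is not the likelihood-based method of dms_tools; the function name `ratio_preferences`, the pseudocount, and the example counts are assumptions made for illustration.

```python
def ratio_preferences(pre, post, pseudocount=1.0):
    """Naive site preferences: post/pre frequency ratios, normalized to sum to 1.

    `pre` and `post` map each amino acid (or codon) at one site to its count.
    """
    letters = sorted(set(pre) | set(post))
    pre_total = sum(pre.get(a, 0) + pseudocount for a in letters)
    post_total = sum(post.get(a, 0) + pseudocount for a in letters)
    ratios = {}
    for a in letters:
        pre_freq = (pre.get(a, 0) + pseudocount) / pre_total
        post_freq = (post.get(a, 0) + pseudocount) / post_total
        ratios[a] = post_freq / pre_freq
    norm = sum(ratios.values())
    return {a: r / norm for a, r in ratios.items()}

# Made-up counts for a single site with two possible amino acids.
prefs = ratio_preferences({"A": 100, "G": 100}, {"A": 300, "G": 100}, pseudocount=0.0)
```

With these counts, A is enriched threefold relative to G after selection, so the normalized preferences come out as 0.75 for A and 0.25 for G.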
37. Interpretable prioritization of splice variants in diagnostic next-generation sequencing
- Author
-
Daniel Danis, Peter N. Robinson, Melissa A. Haendel, Michael A. Gargano, Julie A. McMurry, Damian Smedley, Leigh C. Carmody, Ayushi Hegde, Giorgio Valentini, and Julius O.B. Jacobsen
- Subjects
Prioritization ,Computer science ,RNA Splicing ,Computational biology ,Biology ,Genome ,Article ,DNA sequencing ,Exon ,Exome Sequencing ,Genetics ,Cryptic Splice Sites ,Humans ,Exome ,splice ,Nucleotide ,Data Curation ,Genetics (clinical) ,Exome sequencing ,chemistry.chemical_classification ,Base Sequence ,Alternative splicing ,Genetic Diseases, Inborn ,Wild type ,Intron ,Computational Biology ,High-Throughput Nucleotide Sequencing ,Correction ,Exons ,Introns ,Human genetics ,Sequence logo ,chemistry ,Regulatory sequence ,Mutation ,RNA splicing ,RNA Splice Sites ,Algorithms ,Software - Abstract
A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5′ and 3′ ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content (IC) of wildtype and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state of the art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.
- Published
- 2021
- Full Text
- View/download PDF
38. A New Lineage of Artificial Luciferases for Mammalian Cell Imaging
- Author
-
Rika Fujii and Sung Bae Kim
- Subjects
0301 basic medicine ,Luciferases ,Phylogenetic tree ,010405 organic chemistry ,Lineage (evolution) ,Computational biology ,Biology ,01 natural sciences ,0104 chemical sciences ,03 medical and health sciences ,Sequence logo ,chemistry.chemical_compound ,030104 developmental biology ,chemistry ,Coelenterazine ,Bioluminescence imaging ,Bioluminescence ,Luciferase - Abstract
The present protocol introduces a new lineage of artificial luciferases (ALucs) with unique optical properties for mammalian cell imaging. The primary candidate sequence was first created with a sequence logo generator, resulting in a total of 11 sibling sequences by extracting consensus amino acids from the alignment of 25 copepod luciferase sequences available in natural luciferase pools in public databases. Phylogenetic analysis shows that the newly fabricated ALucs form an independent branch, genetically isolated from the natural luciferases and from a prior series of ALucs produced by our laboratory using a smaller basis set. The protocol also exemplifies that the new lineage of ALucs was strongly luminescent in living mammalian cells with specific substrate selectivity to native coelenterazine. The success of this approach guides on how to engineer and functionalize marine luciferases for bioluminescence imaging and assays.
- Published
- 2021
- Full Text
- View/download PDF
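Extracting consensus amino acids from an alignment, the step mentioned in the abstract above, can be sketched in its simplest form as a per-column majority vote. The authors' actual procedure (a sequence logo generator yielding 11 sibling sequences) is not reproduced here; the function name `consensus` and the toy alignment are hypothetical.

```python
from collections import Counter

def consensus(alignment, gap="-"):
    """Most frequent non-gap residue in each column of an aligned set."""
    result = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        # An all-gap column stays a gap in the consensus.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)

# Made-up toy alignment, not real luciferase sequences.
toy_alignment = ["MKTL", "MKSL", "MRTL", "M-TL"]
toy_consensus = consensus(toy_alignment)
```

Columns with ties would need an explicit tie-breaking rule; this sketch simply keeps whichever tied residue appears first in the column.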
39. Sequence Logo
- Author
-
Rédei, George P.
- Published
- 2008
- Full Text
- View/download PDF
40. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models.
- Author
-
Wheeler, Travis J., Clements, Jody, and Finn, Robert D.
- Subjects
-
*LOGOS (Symbols) , *MOLECULAR biology , *INTERACTIVE computer graphics , *GRAPHICAL modeling (Statistics) , *MARKOV processes , *WEBSITES - Abstract
Background Logos are commonly used in molecular biology to provide a compact graphical representation of the conservation pattern of a set of sequences. They render the information contained in sequence alignments or profile hidden Markov models by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position. Results We present a new tool and web server, called Skylign, which provides a unified framework for creating logos for both sequence alignments and profile hidden Markov models. In addition to static image files, Skylign creates a novel interactive logo plot for inclusion in web pages. These interactive logos enable scrolling, zooming, and inspection of underlying values. Skylign can avoid sampling bias in sequence alignments by down-weighting redundant sequences and by combining observed counts with informed priors. It also simplifies the representation of gap parameters, and can optionally scale letter heights based on alternate calculations of the conservation of a position. Conclusion Skylign is available as a website, a scriptable web service with a RESTful interface, and as a software package for download. Skylign's interactive logos are easily incorporated into a web page with just a few lines of HTML markup. Skylign may be found at http://skylign.org. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
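The stack-and-letter height rule described in the Skylign abstract can be sketched as follows. This minimal illustration assumes the classic information-content definition (maximum entropy minus observed entropy, in bits) with no small-sample correction, priors, or sequence weighting; it is not Skylign's own implementation, and the function name `logo_heights` is hypothetical.

```python
import math
from collections import Counter

def logo_heights(column, alphabet_size=4):
    """Return {letter: height in bits} for one alignment column.

    Stack height = log2(alphabet_size) - Shannon entropy of the column;
    each letter's height = its frequency * stack height.
    """
    counts = Counter(column)
    total = sum(counts.values())
    freqs = {letter: n / total for letter, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    stack = math.log2(alphabet_size) - entropy
    return {letter: p * stack for letter, p in freqs.items()}

conserved = logo_heights("AAAAAAAA")  # fully conserved DNA column
mixed = logo_heights("AACCGGTT")      # all four bases equally frequent
```

A fully conserved DNA column reaches the 2-bit maximum, while a uniform column collapses to a stack of height zero.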
41. RaacLogo: a new sequence logo generator by using reduced amino acid clusters
- Author
-
Lei Yang, Wuritu Yang, Yongchun Zuo, Dongyang Liu, and Lei Zheng
- Subjects
Web server ,Protein design ,Sequence alignment ,computer.software_genre ,03 medical and health sciences ,Sequence Analysis, Protein ,Consensus sequence ,Position-Specific Scoring Matrices ,Amino Acid Sequence ,Databases, Protein ,Cluster analysis ,Molecular Biology ,030304 developmental biology ,Sequence (medicine) ,0303 health sciences ,Generator (computer programming) ,business.industry ,030302 biochemistry & molecular biology ,Pattern recognition ,Sequence logo ,Artificial intelligence ,business ,Sequence Alignment ,computer ,Algorithms ,Software ,Information Systems - Abstract
Sequence logos give a fast and concise display in visualizing consensus sequence. Protein exhibits greater complexity and diversity than DNA, which usually affects the graphical representation of the logo. Reduced amino acids perform powerful ability for simplifying complexity of sequence alignment, which motivated us to establish RaacLogo. As a new sequence logo generator by using reduced amino acid alphabets, RaacLogo can easily generate many different simplified logos tailored to users by selecting various reduced amino acid alphabets that consisted of more than 40 clustering algorithms. This current web server provides 74 types of reduced amino acid alphabet, which were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with protein alignment. A two-dimensional selector was proposed for easily selecting desired RAACs with underlying biology knowledge. It is anticipated that the RaacLogo web server will play more high-potential roles for protein sequence alignment, topological estimation and protein design experiments. RaacLogo is freely available at http://bioinfor.imu.edu.cn/raaclogo.
- Published
- 2020
- Full Text
- View/download PDF
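The idea of a reduced amino acid alphabet can be sketched by mapping each residue to a representative of its cluster before a logo is built. The six-group clustering below is an illustrative assumption based on rough physicochemical classes, not one of the 74 alphabets offered by RaacLogo; `GROUPS`, `REDUCE`, and `reduce_sequence` are hypothetical names.

```python
# Hypothetical clustering: hydrophobic, aromatic, polar, positive, negative, special.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Each residue maps to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet; unknown symbols pass through."""
    return "".join(REDUCE.get(aa, aa) for aa in seq)

reduced = reduce_sequence("MKTLDE")
```

After reduction, alignment columns that mix similar residues (for example D and E) collapse onto a single letter, which is what simplifies the resulting logo.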
42. Escherichia coli σ38 promoters use two UP elements instead of a −35 element: resolution of a paradox and discovery that σ38 transcribes ribosomal promoters
- Author
-
Yuhong Zuo, Kevin S. Franco, Cedric Cagliero, Yan Ning Zhou, Mikhail Kashlev, Zhe Sun, Thomas D. Schneider, Chen Y, and Ding Jun Jin
- Subjects
Physics ,0303 health sciences ,biology ,Stereochemistry ,030302 biochemistry & molecular biology ,RNA ,Sigma ,Promoter ,Ribosomal RNA ,03 medical and health sciences ,Sequence logo ,chemistry.chemical_compound ,chemistry ,RNA polymerase ,biology.protein ,Dyad symmetry ,Polymerase ,030304 developmental biology - Abstract
In E. coli, one RNA polymerase (RNAP) transcribes all RNA species, and different regulons are transcribed by employing different sigma (σ) factors. RNAP containing σ38 (σS) activates genes responding to stress conditions such as stationary phase. The structure of σ38 promoters has been controversial for more than two decades. To construct a model of σ38 promoters using information theory, we aligned proven transcriptional start sites to maximize the sequence information, in bits, and identified a −10 element similar to σ70 promoters. We could not align any −35 sequence logo; instead we found two patterns upstream of the −35 region. These patterns have dyad symmetry sequences and correspond to the location of UP elements in ribosomal RNA (rRNA) promoters. Additionally the UP element dyad symmetry suggests that the two polymerase α subunits, which bind to the UPs, should have two-fold dyad axis of symmetry on the polymerase and this is indeed observed in an X-ray crystal structure. Curiously the αCTDs should compete for overlapping UP elements. In vitro experiments confirm that σ38 recognizes the rrnB P1 promoter, requires a −10, UP elements and no −35. This clarifies the long-standing paradox of how σ38 promoters differ from those of σ70.
- Published
- 2020
- Full Text
- View/download PDF
43. Mapping the Structural Topology of IRS Family Cascades Through Computational Biology.
- Author
-
Chakraborty, Chiranjib, George Priya Doss, C., Bandyopadhyay, Sanghamitra, Sarkar, Bimal Kumar, and Syed Haneef, S. A.
- Abstract
Structural topologies of proteins play significant roles in analyzing their biological functions. Converting the amino acid data in a protein sequence into structural information to outline the function of a protein is a major challenge in post-genome research which can add an extra room in understanding the protein sequence–structure–function relationships. In this study, we performed a comprehensive bioinformatics analysis of structural topology of the IRS family members such as IRS-1, IRS-2, IRS-3, IRS-4, IRS-5 and IRS-6. Based on this assessment, we found that IRS-2 encloses the highest number of α helices, β sheets and β turns in the secondary structure topology compared to IRS-1 and IRS-6. IRS family members are rich in serine or leucine residues. Among the IRS family members, the highest percentage of serine and leucine was observed in IRS-1 (15 %) and IRS-5 (10 %), respectively. Notably, the highest number of disulphide bonds was observed in IRS-1 (10) which is responsible for structural stability of the protein. Hydrogen bond pattern in α helices and β sheet was recorded in IRS-1, IRS-2 and IRS-6. By conservation analysis, the longest protein IRS-3 was found to be highly conserved among the IRS family members. The cluster of sequence logo present in the N terminus of these cascades was noted, and highly conserved residues in N-terminal region help in the formation of the two highly conserved domains such as PH domain and PTB domain. Results generated from this analysis will be more beneficial to researchers in understanding more about insulin signalling mechanism(s) as well as insulin resistance pathway. We discuss here that bioinformatics tools utilized in this study can play a vital role in addressing the complexity of structural topology to understand structure–function relationships in insulin signalling cascades. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
44. Oxysterol binding proteins (OSBPs) and their encoding genes in Arabidopsis and rice
- Author
-
Umate, Pavan
- Subjects
-
*OXYSTEROLS , *CARRIER proteins , *ARABIDOPSIS , *RICE , *BIOSYNTHESIS , *STEROID hormones , *STEROLS , *ENDOPLASMIC reticulum - Abstract
Abstract: Cell wall deposition, biosynthesis of steroid hormones, and maintenance of membrane composition and integrity, are some of the crucial functions of sterols in plants. Followed by their synthesis in the endoplasmic reticulum, the sterols accumulate in the plasma membrane. The concept of sterol trafficking in plant cell is not well understood. The oxysterol binding proteins are implicated in sterol transport in non-plant systems. In the study, the oxysterol binding proteins in Arabidopsis and rice are described and classified. The Arabidopsis genome encodes 12 oxysterol binding proteins-related proteins (ORPs) as compared to 6 oxysterol binding proteins (OSBPs/ORPs) in rice. The protein alignment studies reveal that amino acid sequences for oxysterol binding proteins are relatively well conserved in Arabidopsis and rice. The rice OSBPs are classified based on their phylogenetic relationship with Arabidopsis ORPs. The sequence LOGO built on LOC_Os03g16690 indicated presence of fingerprint region of amino acids “EQVSHHPP” for Arabidopsis and rice OSBPs/ORPs. The organization of pleckstrin homology domain is identified in several OSBPs/ORPs in Arabidopsis and rice. The Arabidopsis oligonucleotide array data is explored to understand the expression patterns of ORPs under 17 different experimental conditions. The analysis showed the expression of ORPs in Arabidopsis is necessarily under the control of biotic stress, chemical, elicitor, hormone, light intensity, abiotic stress, and temperature conditions. The linear mean signal values for Arabidopsis ORPs revealed their relative expression patterns in different developmental stages. The genes for ORP3C and ORP3B are highly expressed in all developmental stages that were analyzed. The present study thus indicates crucial functional role of the individual members of this gene family in different environmental stress conditions. [Copyright Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
45. A brief review of molecular information theory.
- Author
-
Schneider, Thomas D.
- Subjects
INFORMATION theory ,HIGH technology ,NANOTECHNOLOGY ,SYSTEMS theory - Abstract
Abstract: The idea that we could build molecular communications systems can be advanced by investigating how actual molecules from living organisms function. Information theory provides tools for such an investigation. This review describes how we can compute the average information in the DNA binding sites of any genetic control protein and how this can be extended to analyze its individual sites. A formula equivalent to Claude Shannon’s channel capacity can be applied to molecular systems and used to compute the efficiency of protein binding. This efficiency is often 70% and a brief explanation for that is given. The results imply that biological systems have evolved to function at channel capacity, which means that we should be able to build molecular communications that are just as robust as our macroscopic ones. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
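The two quantities named in the abstract above, the average information of a set of binding sites and the information of an individual site, can be sketched as follows. The sketch assumes the standard forms Rsequence = Σ_l (2 − H_l) and Ri = Σ_l (2 + log2 f(b, l)) for DNA, omits the small-sample correction, and uses made-up sites; all function names are hypothetical.

```python
import math
from collections import Counter

def frequency_matrix(sites):
    """Per-position base frequencies for equal-length aligned sites."""
    return [{b: n / len(sites) for b, n in Counter(col).items()}
            for col in zip(*sites)]

def r_sequence(sites):
    """Average information of the site set, in bits (no sampling correction)."""
    total = 0.0
    for freqs in frequency_matrix(sites):
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        total += 2.0 - entropy
    return total

def individual_information(site, matrix):
    """Information of one site, in bits, scored against the frequency matrix."""
    return sum(2.0 + math.log2(freqs[b]) for b, freqs in zip(site, matrix))

# Made-up example sites, loosely resembling a -10 element.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT"]
matrix = frequency_matrix(sites)
ri_values = [individual_information(s, matrix) for s in sites]
```

The mean of the individual values equals the average information of the whole set, which is the sense in which the individual-site measure extends the average one.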
46. Genome-wide analysis of helicase gene family from rice and Arabidopsis: a comparison with yeast and human.
- Author
-
Umate, Pavan, Tuteja, Renu, and Tuteja, Narendra
- Abstract
Helicases are motor proteins which can catalyze the unwinding of stable RNA or DNA duplex utilizing mainly ATP as source of energy. In this study we have identified complete sets of helicases from rice and Arabidopsis. The helicase gene family in rice and Arabidopsis contains 115 and 113 genes respectively. These helicases were validated based on their annotations and supported with organization of conserved helicase signature motifs. We have also identified homologs of 64 rice RNA and DNA helicases in Arabidopsis, yeast and human. We explored Arabidopsis oligonucleotide array data to gain functional insights into the transcriptome of helicase family members under ten different stress conditions. Our results revealed that expression of helicase genes is profoundly regulated under various stress conditions. The helicases identified in this study lay a foundation for the in depth characterization of each helicase type. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
47. Proteolytic Profiling of Streptococcal Pyrogenic Exotoxin B (SpeB) by Complementary HPLC-MS Approaches.
- Author
-
Blöchl, Constantin, Holzner, Christoph, Luciano, Michela, Bauer, Renate, Horejs-Hoeck, Jutta, Eckhard, Ulrich, Brandstetter, Hans, and Huber, Christian G.
- Subjects
-
*ZYMOGENS , *EXTRACELLULAR matrix proteins , *EXOTOXIN , *PEPTIDES , *STREPTOCOCCAL diseases , *AMINO acids - Abstract
Streptococcal pyrogenic exotoxin B (SpeB) is a cysteine protease expressed during group A streptococcal infection that represents a major virulence factor. Although subject to several studies, its role during infection is still under debate, and its proteolytic properties remain insufficiently characterized. Here, we revisited this protease through a set of complementary approaches relying on state-of-the-art HPLC-MS methods. After conceiving an efficient protocol to recombinantly express SpeB, the zymogen of the protease and its activation were characterized. Employing proteome-derived peptide libraries, a strong preference for hydrophobic and aromatic residues at P2 alongside negatively charged amino acids at P3′ to P6′ was revealed. To identify relevant in vivo substrates, native proteins were obtained from monocytic secretome and plasma to assess their cleavage under physiological conditions. Besides corroborating our findings concerning specificity, more than 200 cleaved proteins were identified, including proteins of the extracellular matrix, proteins of the immune system, and proteins involved in inflammation. Finally, the cleavage of IgG subclasses was studied in detail. This study precisely depicts the proteolytic properties of SpeB and provides a library of potential host substrates, including their exact cleavage positions, as a valuable source for further research to unravel the role of SpeB during streptococcal infection. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
48. Re-evaluation of protein kinase CK2 pleiotropy: new insights provided by a phosphoproteomics analysis of CK2 knockout cells
- Author
-
Silvia Zaramella, Jordi Vilardell, Christian Borgo, Cinzia Franchin, Lorenzo A. Pinna, Mauro Salvi, Giorgio Arrigoni, and Luca Cesaro
- Subjects
Phosphopeptides ,Proteomics ,0301 basic medicine ,Cell signaling ,animal structures ,Protein subunit ,Biology ,Sequence logo ,Cell Line ,Gene Knockout Techniques ,Mice ,03 medical and health sciences ,Cellular and Molecular Neuroscience ,Protein phosphorylation ,Stable isotope labeling by amino acids in cell culture ,Quantitative proteomics ,Consensus sequence ,Animals ,Phosphorylation ,Casein Kinase II ,Protein kinase A ,Molecular Biology ,Pharmacology ,Kinase ,fungi ,Phosphoproteomics ,Cell Biology ,Cell biology ,030104 developmental biology ,Kinase inhibitors ,embryonic structures ,Cell signalling ,Molecular Medicine ,Gene Deletion - Abstract
CK2 denotes a ubiquitous and pleiotropic protein kinase whose holoenzyme is composed of two catalytic (α and/or α') and two regulatory β subunits. The CK2 consensus sequence, S/T-x-x-D/E/pS/pT is present in numerous phosphosites, but it is not clear how many of these are really generated by CK2. To gain information about this issue, advantage has been taken of C2C12 cells entirely deprived of both CK2 catalytic subunits by the CRISPR/Cas9 methodology. A comparative SILAC phosphoproteomics analysis reveals that, although about 30% of the quantified phosphosites do conform to the CK2 consensus, only one-third of these are substantially reduced in the CK2α/α'(-/-) cells, consistent with their generation by CK2. A parallel study with C2C12 cells deprived of the regulatory β subunit discloses a role of this subunit in determining CK2 targeting. We also find that phosphosites notoriously generated by CK2 are not fully abrogated in CK2α/α'(-/-) cells, while some phosphosites unrelated to CK2 are significantly altered. Collectively taken our data allow to conclude that the phosphoproteome generated by CK2 is not as ample and rigidly pre-determined as it was believed before. They also show that the lack of CK2 promotes phosphoproteomics perturbations attributable to kinases other than CK2.
- Published
- 2017
- Full Text
- View/download PDF
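Scanning a protein sequence for the CK2 consensus quoted in the abstract, S/T-x-x-D/E/pS/pT, can be sketched with a regular expression. A plain sequence carries no phosphorylation state, so this sketch assumes only the acidic D/E variants at position +3; the names `CK2_MOTIF` and `find_ck2_sites` and the test peptide are hypothetical.

```python
import re

# Lookahead so that overlapping candidate sites are all reported.
CK2_MOTIF = re.compile(r"(?=([ST]..[DE]))")

def find_ck2_sites(sequence):
    """Return (1-based position of the S/T residue, matched 4-mer) pairs."""
    return [(m.start() + 1, m.group(1)) for m in CK2_MOTIF.finditer(sequence)]

# Made-up peptide for illustration.
hits = find_ck2_sites("AASDDEKTPPEL")
```

As the abstract stresses, matching the consensus does not show that a site is actually phosphorylated by CK2; a scan like this only lists candidates.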
49. Logo2PWM: a tool to convert sequence logo to position weight matrix
- Author
-
Lu Liu, Zhen Gao, and Jianhua Ruan
- Subjects
0301 basic medicine ,lcsh:QH426-470 ,Median correlation ,lcsh:Biotechnology ,Position weight matrix ,Biology ,Sequence logo ,Binding site ,03 medical and health sciences ,lcsh:TP248.13-248.65 ,Genetics ,Analysis software ,Binding Sites ,030102 biochemistry & molecular biology ,business.industry ,Research ,Computational Biology ,Convert ,Pattern recognition ,Sequence Analysis, DNA ,DNA binding site ,Motif finding ,lcsh:Genetics ,030104 developmental biology ,Artificial intelligence ,business ,Transcription ,Pulse-width modulation ,Software ,Biotechnology ,Transcription Factors - Abstract
Background Position weight matrix (PWM) and sequence logo are the most widely used representations of transcription factor binding site (TFBS) in biological sequences. Sequence logo - a graphical representation of PWM, has been widely used in scientific publications and reports, due to its easiness of human perception, rich information, and simple format. Different from sequence logo, PWM works great as a precise and compact digitalized form, which can be easily used by a variety of motif analysis software. There are a few available tools to generate sequence logos from PWM; however, no tool does the reverse. Such a tool to convert sequence logo back to PWM is needed to scan a TFBS represented in logo format in a publication where the PWM is not provided or hard to be acquired. A major difficulty in developing such a tool to convert sequence logo to PWM is to deal with the diversity of sequence logo images. Results We propose logo2PWM for reconstructing PWM from a large variety of sequence logo images. Evaluation results on over one thousand logos from three sources of different logo formats show that the correlations between the reconstructed PWMs and the original PWMs are consistently high, where median correlation is greater than 0.97. Conclusion Because of the high recognition accuracy, the easiness of usage, and the availability of both web-based service and stand-alone application, we believe that logo2PWM can readily benefit the study of transcription by filling the gap between sequence logo and PWM.
- Published
- 2017
- Full Text
- View/download PDF
50. Fabrication of a New Lineage of Artificial Luciferases from Natural Luciferase Pools
- Author
-
Koji Suzuki, Ryo Nishihara, Sung Bae Kim, and Daniel Citterio
- Subjects
0301 basic medicine ,Lineage (evolution) ,Computational biology ,010402 general chemistry ,01 natural sciences ,Cell Line ,Substrate Specificity ,Copepoda ,03 medical and health sciences ,chemistry.chemical_compound ,Coelenterazine ,Chlorocebus aethiops ,Consensus Sequence ,Animals ,Bioluminescence ,Luciferase ,Amino Acid Sequence ,Luciferases ,Gene ,Phylogeny ,Phylogenetic tree ,Chemistry ,General Chemistry ,General Medicine ,Molecular biology ,0104 chemical sciences ,Sequence logo ,030104 developmental biology ,COS Cells ,Luminescent Measurements ,Sequence Alignment - Abstract
The fabrication of artificial luciferases (ALucs) with unique optical properties has a fundamental impact on bioassays and molecular imaging. In this study, we developed a new lineage of ALucs with unique substrate preferences by extracting consensus amino acids from the alignment of 25 copepod luciferase sequences available in natural luciferase pools. The primary sequence was first created with a sequence logo generator, resulting in a total of 11 sibling sequences. Phylogenetic analysis shows that the newly fabricated ALucs form an independent branch, genetically isolated from the natural luciferases and from a prior series of ALucs produced by our laboratory using a smaller basis set. The new lineage of ALucs was strongly luminescent in living mammalian cells, with specific substrate selectivity for native coelenterazine. A single-residue-level comparison of the C-terminal sequences of the new ALucs reveals that some amino acids at the C-terminal ends strongly influence optical intensity but have limited effect on color variance. The success of this approach provides guidance on how to engineer and functionalize marine luciferases for bioluminescence imaging and assays.
- Published
- 2017
- Full Text
- View/download PDF