Author: "Marcin Tatjewski" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Marcin Tatjewski" returned 9 results.

1. Multi-level machine learning prediction of protein–protein interactions in Saccharomyces cerevisiae

Author: Julian Zubek, Marcin Tatjewski, Adam Boniecki, Maciej Mnich, Subhadip Basu, and Dariusz Plewczynski
Subjects: Protein-protein interactions, Protein interaction networks, Multi-scale models, Protein sequence, Machine learning, Physico-chemical indices, Medicine, Biology (General), QH301-705.5
Abstract: Accurate identification of protein–protein interactions (PPI) is the key step in understanding proteins’ biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein–protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein–protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Published: 2015

2. Multi-label Classification of Biomedical Articles

Author: Karol Kurach, Krzysztof Pawlowski, Lukasz Romaszko, Marcin Tatjewski, Andrzej Janusz, and Hung Son Nguyen
Published: 2013

3. An Ensemble Approach to Multi-label Classification of Textual Data

Author: Karol Kurach, Krzysztof Pawlowski, Lukasz Romaszko, Marcin Tatjewski, Andrzej Janusz, and Hung Son Nguyen
Published: 2012

4. The proline-rich region of glyceraldehyde-3-phosphate dehydrogenase from human sperm may bind SH3 domains, as revealed by a bioinformatic study of low-complexity protein segments

Author: Aleksandra Gruca, Dariusz Plewczynski, Marcin Tatjewski, and Marcin Grynberg
Subjects: Gene isoform, Genetics, biology, Sperm flagellum, Cell Biology, Sperm, SH3 domain, stomatognathic system, Biochemistry, Tyrosine, Protein kinase A, Sperm motility, Glyceraldehyde 3-phosphate dehydrogenase, Developmental Biology
Abstract: Glyceraldehyde-3-phosphate dehydrogenase from human sperm (GAPDHS) provides energy to the sperm flagellum, and is therefore essential for sperm motility and male fertility. This isoform is distinct from somatic GAPDH, not only in being specific for the testis but also because it contains an additional amino-terminal region that encodes a proline-rich motif that is known to bind to the fibrous sheath of the sperm tail. By conducting a large-scale sequence comparison on low-complexity sequences available in databases, we identified a strong similarity between the proline-rich motif from GAPDHS and the proline-rich sequence from Ena/vasodilator-stimulated phosphoprotein-like (EVL), which is known to bind an SH3 domain of dynamin-binding protein (DNMBP). The putative binding partners of the proline-rich GAPDHS motif include SH3 domain-binding protein 4 (SH3BP4) and the IL2-inducible T-cell kinase/tyrosine-protein kinase ITK/TSK (ITK). This result implies that GAPDHS participates in specific signal-transduction pathways. Gene Ontology category-enrichment analysis showed several functional classes shared by both proteins, of which the most interesting ones are related to signal transduction and regulation of hydrolysis. Furthermore, a mutation of one EVL proline to leucine is known to cause colorectal cancer, suggesting that mutation of homologous amino acid residue in the GAPDHS motif may be functionally deleterious.
Published: 2016

5. Predicting Post-Translational Modifications from Local Sequence Fragments Using Machine Learning Algorithms: Overview and Best Practices

Author: Marcin Tatjewski, Marcin Kierczak, and Dariusz Plewczynski
Subjects: Machine Learning; Sequence Analysis, Protein; Computational Biology; Proteins; Amino Acid Sequence; Protein Processing, Post-Translational; Algorithms; Software
Abstract: Here, we present two perspectives on the task of predicting post translational modifications (PTMs) from local sequence fragments using machine learning algorithms. The first is the description of the fundamental steps required to construct a PTM predictor from the very beginning. These steps include data gathering, feature extraction, or machine-learning classifier selection. The second part of our work contains the detailed discussion of more advanced problems which are encountered in PTM prediction task. Probably the most challenging issues which we have covered here are: (1) how to address the training data class imbalance problem (we also present statistics describing the problem); (2) how to properly set up cross-validation folds with an approach which takes into account the homology of protein data records, to address this problem we present our folds-over-clusters algorithm; and (3) how to efficiently reach for new sources of learning features. Presented techniques and notes resulted from intense studies in the field, performed by our and other groups, and can be useful both for researchers beginning in the field of PTM prediction and for those who want to extend the repertoire of their research techniques.
Published: 2016

6. Predicting Post-Translational Modifications from Local Sequence Fragments Using Machine Learning Algorithms: Overview and Best Practices

Author: Dariusz Plewczynski, Marcin Tatjewski, and Marcin Kierczak
Subjects: Training set, Computer science, Best practice, Feature extraction, Feature selection, Machine learning, Cross-validation, Local sequence, Posttranslational modification, Artificial intelligence, Algorithm, Classifier (UML)
Abstract: Here, we present two perspectives on the task of predicting post translational modifications (PTMs) from local sequence fragments using machine learning algorithms. The first is the description of the fundamental steps required to construct a PTM predictor from the very beginning. These steps include data gathering, feature extraction, or machine-learning classifier selection. The second part of our work contains the detailed discussion of more advanced problems which are encountered in PTM prediction task. Probably the most challenging issues which we have covered here are: (1) how to address the training data class imbalance problem (we also present statistics describing the problem); (2) how to properly set up cross-validation folds with an approach which takes into account the homology of protein data records, to address this problem we present our folds-over-clusters algorithm; and (3) how to efficiently reach for new sources of learning features. Presented techniques and notes resulted from intense studies in the field, performed by our and other groups, and can be useful both for researchers beginning in the field of PTM prediction and for those who want to extend the repertoire of their research techniques.
Published: 2016

7. Nie całkiem obce. Zapożyczenia wyrazowe w języku polskim i czeskim [Not Quite Foreign: Lexical Borrowings in Polish and Czech]

Author: Joanna Rączaszek-Leonardi, Diana Svobodová, Marcin Tatjewski, and Mirosław Bańko
Subjects: Psychology
Published: 2016

8. The proline-rich region of glyceraldehyde-3-phosphate dehydrogenase from human sperm may bind SH3 domains, as revealed by a bioinformatic study of low-complexity protein segments

Author: Marcin Tatjewski, Aleksandra Gruca, Dariusz Plewczynski, and Marcin Grynberg
Subjects: Male; src Homology Domains; Amino Acid Substitution; Proline; Leucine; Sperm Tail; Mutation, Missense; Humans; Glyceraldehyde-3-Phosphate Dehydrogenase (Phosphorylating); Protein-Tyrosine Kinases; Cell Adhesion Molecules; Adaptor Proteins, Signal Transducing; Signal Transduction
Abstract: Glyceraldehyde-3-phosphate dehydrogenase from human sperm (GAPDHS) provides energy to the sperm flagellum, and is therefore essential for sperm motility and male fertility. This isoform is distinct from somatic GAPDH, not only in being specific for the testis but also because it contains an additional amino-terminal region that encodes a proline-rich motif that is known to bind to the fibrous sheath of the sperm tail. By conducting a large-scale sequence comparison on low-complexity sequences available in databases, we identified a strong similarity between the proline-rich motif from GAPDHS and the proline-rich sequence from Ena/vasodilator-stimulated phosphoprotein-like (EVL), which is known to bind an SH3 domain of dynamin-binding protein (DNMBP). The putative binding partners of the proline-rich GAPDHS motif include SH3 domain-binding protein 4 (SH3BP4) and the IL2-inducible T-cell kinase/tyrosine-protein kinase ITK/TSK (ITK). This result implies that GAPDHS participates in specific signal-transduction pathways. Gene Ontology category-enrichment analysis showed several functional classes shared by both proteins, of which the most interesting ones are related to signal transduction and regulation of hydrolysis. Furthermore, a mutation of one EVL proline to leucine is known to cause colorectal cancer, suggesting that mutation of homologous amino acid residue in the GAPDHS motif may be functionally deleterious.
Published: 2015

9. Multi-label Classification of Biomedical Articles

Author: Marcin Tatjewski, Hung Son Nguyen, Andrzej Janusz, Krzysztof Pawłowski, Łukasz Romaszko, and Karol Kurach
Subjects: Multi-label classification, Computer science, Search engine indexing, Object (computer science), Machine learning, Ensemble learning, Set (abstract data type), Pattern recognition, Binary classification, Explicit semantic analysis, Artificial intelligence, Special case
Abstract: In this paper we investigate a special case of classification problem, called multi-label learning, where each instance (or object) is associated with a set of target labels (or simple decisions). Multi-label classification is one of the most important issues in semantic indexing and text categorization systems. Most of multi-label classification methods are based on combination of binary classifiers, which are trained separately for each label. In this paper we concentrate on the application of ensemble technique to multi-label classification problem. We present the most recent ensemble methods for both the binary classifier training phase as well as the combination learning phase. The proposed methods have been implemented within the SONCA system which is a part of SYNAT project. We present some experiment results performed on PubMed Central biomedical articles database.
Published: 2013
