10 results for "Özgür, Arzucan"
Search Results
2. PEAK: Explainable Privacy Assistant through Automated Knowledge Extraction
- Author
- Ayci, Gonul; Özgür, Arzucan; Şensoy, Murat; Yolum, Pınar
- Subjects
FOS: Computer and information sciences, Computer Science - Cryptography and Security, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Cryptography and Security (cs.CR), Human-Computer Interaction (cs.HC)
- Abstract
In the realm of online privacy, privacy assistants play a pivotal role in empowering users to manage their privacy effectively. Although recent studies have shown promising progress in tackling tasks such as privacy violation detection and personalized privacy recommendations, a crucial aspect for widespread user adoption is the capability of these systems to provide explanations for their decision-making processes. This paper presents a privacy assistant for generating explanations for privacy decisions. The privacy assistant focuses on discovering latent topics, identifying explanation categories, establishing explanation schemes, and generating automated explanations. The generated explanations can be used by users to understand the recommendations of the privacy assistant. Our user study on a real-world privacy dataset of images shows that users find the generated explanations useful and easy to understand. Additionally, the generated explanations can be used by privacy assistants themselves to improve their decision-making. We show how this can be realized by incorporating the generated explanations into a state-of-the-art privacy assistant. Comment: 43 pages, 14 figures.
- Published
- 2023
3. Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text
- Author
- Rehana, Hasin; Çam, Nur Bengisu; Basmaci, Mert; He, Yongqun; Özgür, Arzucan; Hur, Junguk
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computation and Language (cs.CL)
- Abstract
Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. However, with the fast-paced growth of biomedical literature, there is a growing need for automated and accurate extraction of PPIs to facilitate scientific knowledge discovery. Pre-trained language models, such as generative pre-trained transformer (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks. We evaluated the PPI identification performance of various GPT and BERT models using a manually curated benchmark corpus of 164 PPIs in 77 sentences from learning language in logic (LLL). BERT-based models achieved the best overall performance, with PubMedBERT achieving the highest precision (85.17%) and F1-score (86.47%) and BioM-ALBERT achieving the highest recall (93.83%). Despite not being explicitly trained for biomedical texts, GPT-4 achieved comparable performance to the best BERT models with 83.34% precision, 76.57% recall, and 79.18% F1-score. These findings suggest that GPT models can effectively detect PPIs from text data and have the potential for use in biomedical literature mining tasks.
- Published
- 2023
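Note on the scores in entry 3: F1 is conventionally the harmonic mean of precision and recall. The sketch below recomputes it from the GPT-4 precision and recall quoted in the abstract. The function and variable names are illustrative and do not come from the paper. The recomputed value differs slightly from the reported F1-score. That would be expected if the reported figures are averages over several runs or folds, but this is an assumption, as the abstract does not say how the scores were aggregated.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# GPT-4 figures quoted in the abstract, in percent.
gpt4_precision, gpt4_recall, gpt4_reported_f1 = 83.34, 76.57, 79.18

# F1 implied by the single quoted precision/recall pair.
gpt4_f1_from_pr = f1_score(gpt4_precision, gpt4_recall)

print(f"F1 recomputed from P and R: {gpt4_f1_from_pr:.2f}")
print(f"F1 reported in the abstract: {gpt4_reported_f1:.2f}")
```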
4. Interpreting Chemical Words of a Data-driven Segmentation Method as Protein Family Pharmacophores and Functional Groups
- Author
- Temizer, Asu Büşra; Özçelik, Rıza; Koulani, Taha; Ozkirimli, Elif; Ulgen, Kutlu O.; Karalı, Nilgün; Özgür, Arzucan
- Subjects
Quantitative Biology - Biomolecules
- Abstract
Machine learning models have found numerous successful applications in computational drug discovery. A large body of these models represents molecules as sequences, since molecular sequences are easily available, simple, and informative. The sequence-based models often segment molecular sequences into pieces called chemical words and then apply advanced natural language processing techniques for tasks such as de novo drug design, property prediction, and binding affinity prediction. However, the fundamental building blocks of these models, chemical words, have not yet been studied from a chemical perspective, and it is unknown whether they capture chemical information. This raises the question: do chemical-word-based drug discovery models rely on chemically meaningful building blocks or arbitrary chemical subsequences? To answer this question, we first propose a novel pipeline to highlight the key chemical words for strong binding to a protein family and then study the substructures designated by the key chemical words for three protein families. For all three families, we find extensive evidence in the literature that chemical words can designate pharmacophores and functional groups, and thus that chemical-word-based models do rely on chemically meaningful building blocks. Our findings will help shed light on the chemistry captured by the chemical words, and by machine learning models for drug discovery at large.
- Published
- 2022
5. Enhancements to the BOUN Treebank Reflecting the Agglutinative Nature of Turkish
- Author
- Marşan, Büşra; Akkurt, Salih Furkan; Şen, Muhammet; Gürbüz, Merve; Güngör, Onur; Özateş, Şaziye Betül; Üsküdarlı, Suzan; Özgür, Arzucan; Güngör, Tunga; Öztürk, Balkız
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
- Abstract
In this study, we aim to offer linguistically motivated solutions to the lack of representation of null morphemes, highly productive derivational processes, and syncretic morphemes of Turkish in the BOUN Treebank, without diverging from the Universal Dependencies (UD) framework. To tackle these issues, new annotation conventions were introduced by splitting certain lemmas and employing the MISC (miscellaneous) tab in the UD framework to denote derivation. The representational capabilities of the re-annotated treebank were tested on an LSTM-based dependency parser, and an updated version of the BoAT Tool is introduced. This is a peer-reviewed article that was presented at the International Conference on Agglutinative Language Technologies as a challenge of Natural Language Processing (ALTNLP) 2022.
- Published
- 2022
6. A Dataset and BERT-based Models for Targeted Sentiment Analysis on Turkish Texts
- Author
- Mutlu, M. Melih; Özgür, Arzucan
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, InformationSystems_MISCELLANEOUS, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
Targeted Sentiment Analysis aims to extract sentiment towards a particular target from a given text. It is a field that is attracting attention due to the increasing accessibility of the Internet, which leads people to generate an enormous amount of data. Sentiment analysis, which in general requires annotated data for training, is a well-researched area for widely studied languages such as English. For low-resource languages such as Turkish, there is a lack of such annotated data. We present an annotated Turkish dataset suitable for targeted sentiment analysis. We also propose BERT-based models with different architectures to accomplish the task of targeted sentiment analysis. The results demonstrate that the proposed models outperform the traditional sentiment analysis models for the targeted sentiment analysis task.
- Published
- 2022
7. Editorial: Machine Learning Methodologies to Study Molecular Interactions
- Author
- Yakimovich, Artur; Özgür, Arzucan; Doğan, Tunca; Ozkirimli, Elif
- Subjects
Editorial, machine learning, Molecular Biosciences, DNA, protein, biomolecule, molecular interactions, interaction prediction
- Published
- 2021
8. DebiasedDTA: A Framework for Improving the Generalizability of Drug-Target Affinity Prediction Models
- Author
- Özçelik, Rıza; Bağ, Alperen; Atıl, Berk; Barsbey, Melih; Özgür, Arzucan; Özkırımlı, Elif
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, FOS: Biological sciences, Quantitative Biology - Quantitative Methods, Quantitative Methods (q-bio.QM), Machine Learning (cs.LG)
- Abstract
Computational models that accurately predict the binding affinity of an input protein-chemical pair can accelerate drug discovery studies. These models are trained on available protein-chemical interaction datasets, which may contain dataset biases that lead the model to learn dataset-specific patterns instead of generalizable relationships. As a result, the prediction performance of models drops for previously unseen biomolecules, i.e., the prediction models cannot generalize to biomolecules outside of the dataset. The latest approaches that aim to improve model generalizability either have limited applicability or introduce the risk of degrading prediction performance. Here, we present DebiasedDTA, a novel drug-target affinity (DTA) prediction model training framework that addresses dataset biases to improve the generalizability of affinity prediction models. DebiasedDTA reweights the training samples to mitigate the effect of dataset biases and is applicable to most DTA prediction models. The results suggest that models trained in the DebiasedDTA framework can achieve improved generalizability in predicting the interactions of previously unseen biomolecules, as well as performance improvements on those previously seen. Extensive experiments with different biomolecule representations, model architectures, and datasets demonstrate that DebiasedDTA can upgrade DTA prediction models irrespective of the biomolecule representation, model architecture, and training dataset. Last but not least, we release DebiasedDTA as an open-source Python library to enable other researchers to debias their own predictors and/or develop their own debiasing methods. We believe that this Python library will corroborate and foster research to develop more generalizable DTA prediction models.
- Published
- 2021
9. A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning
- Author
- Özateş, Şaziye Betül; Özgür, Arzucan; Güngör, Tunga; Öztürk, Balkız
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, I.2.7, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
Fully data-driven, deep learning-based models are usually designed to be language-independent and have been shown to be successful for many natural language processing tasks. However, when the studied language is low-resourced and the amount of training data is insufficient, these models can benefit from the integration of natural language grammar-based information. We propose two approaches to dependency parsing, especially for languages with a restricted amount of training data. Our first approach combines a state-of-the-art deep learning-based parser with a rule-based approach, and the second one incorporates morphological information into the parser. In the rule-based approach, the parsing decisions made by the rules are encoded and concatenated with the vector representations of the input words as additional information to the deep network. The morphology-based approach proposes different methods to include the morphological structure of words in the parser network. Experiments are conducted on the IMST-UD Treebank, and the results suggest that integrating explicit knowledge about the target language into a neural parser through a rule-based parsing system and morphological analysis leads to more accurate annotations and hence increases the parsing performance in terms of attachment scores. The proposed methods are developed for Turkish, but can be adapted to other languages as well. 25 pages, 7 figures.
- Published
- 2020
10. Supervised and unsupervised machine learning techniques for text document categorization
- Author
- Özgür, Arzucan; Alpaydın, Ahmet İbrahim Ethem; Diğer (Other)
- Subjects
Computer Engineering and Computer Science and Control
- Abstract
Summary (translated from the Turkish ÖZET): Supervised and Unsupervised Learning Algorithms for Document Classification. With the development of computer and electronic technologies and the spread of the Internet and the Web, the number of electronic documents grows every day. Automatic classification of documents has become important for reaching the relevant data in these electronic databases more quickly, easily, and accurately. There are two basic machine learning approaches to automatic classification: supervised learning and unsupervised learning. Supervised learning requires that the classes be known in advance and that a training set of documents belonging to these classes be available. Unsupervised learning requires neither prior knowledge of the classes nor human help at any stage. In this study we consider the basic supervised and unsupervised methods for automatic document classification. We examine the performance of these methods on five standard datasets according to different criteria, and compare the supervised and unsupervised learning approaches with each other. We found that, among the unsupervised methods, k-means and bisecting k-means are more suitable for document clustering. Among the supervised methods, support vector machines achieve the best performance. Although they are unsupervised methods, k-means and bisecting k-means produce higher-quality clusters than naive Bayes, a supervised method. The overall similarity of the clusters produced by the unsupervised methods is generally higher than that of the supervised methods. This result may be due to some erroneous documents in the training set. We therefore recommend using unsupervised methods when determining the classes and constructing the training set.
Abstract: Supervised and Unsupervised Machine Learning Techniques for Text Document Categorization. Automatic organization of documents has become an important research issue since the explosion of digital and online text information. There are mainly two machine learning approaches to this task: the supervised approach, where pre-defined category labels are assigned to documents based on the likelihood suggested by a training set of labelled documents; and the unsupervised approach, where there is no need for human intervention or labelled documents at any point in the whole process. In this study we compare and evaluate the performance of the leading supervised and unsupervised techniques for document organization by using different standard performance measures and five standard document corpora. We conclude that among the unsupervised techniques we have evaluated, k-means and bisecting k-means perform the best in terms of time complexity and the quality of the clusters produced. On the other hand, among the supervised techniques, support vector machines achieve the highest performance while naive Bayes performs the worst. Finally, we compare the supervised and the unsupervised techniques in terms of the quality of the clusters they produce. Contrary to our expectations, we observe that although k-means and bisecting k-means are unsupervised, they produce clusters of higher quality than the naive Bayes supervised technique. Furthermore, the overall similarities of the clustering solutions obtained by the unsupervised techniques are higher than those of the supervised ones. We suggest that this may be due to outliers in the training set, and we propose using unsupervised techniques to support the task of pre-defining the categories and labelling the documents in the training set.
- Published
- 2004
Discovery Service for Jio Institute Digital Library