Author: "Hägglund, Marcus" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hägglund, Marcus"' showing total 5 results


5 results on '"Hägglund, Marcus"'

1. Djupinlärningsmetoder för gruppering av källkod efter funktionalitet

Author: Hägglund, Marcus
Subjects: Cluster Analysis, Source Code, cuBERT, Code2Vec, Deep Learning, Probability Theory and Statistics
Abstract: With the rise of artificial intelligence, applications for machine learning can be found in nearly every aspect of modern life, from healthcare and transportation to software services like recommendation systems. Consequently, there are now more developers engaged in the field than ever, with the number of implementations rapidly increasing by the day. In order to meet the new demands, it would be useful to provide services that allow for an easy orchestration of a large number of repositories. Enabling users to easily share, access and search for source code would be beneficial for both research and industry alike. A first step towards this is to find methods for clustering source code by functionality. The problem of clustering source code has previously been studied in the literature. However, the proposed methods have so far not leveraged the capabilities of deep neural networks (DNN). In this work, we investigate the possibility of using DNNs to learn embeddings of source code for the purpose of clustering by functionality. In particular, we evaluate embeddings from Code2Vec and cuBERT models for this specific purpose. From the results of our work we conclude that both Code2Vec and cuBERT are capable of learning such embeddings. Among the different frameworks that we used to fine-tune cuBERT, we found the best performance for this task when fine-tuning the model under the triplet loss criterion. With this framework, the model was capable of learning embeddings that yielded the most compact and well-separated clusters. We found that a majority of the cluster assignments were semantically coherent with respect to the functionalities implemented by the methods. With these results, we have found evidence indicating that it is possible to learn embeddings of source code that encode the functional similarities among the methods. Future research could therefore aim to further investigate the possible applications of the embeddings learned by the different frameworks.
Published: 2021

2. Deep Learning Approaches for Clustering Source Code by Functionality

Author: Hägglund, Marcus
Abstract: With the rise of artificial intelligence, applications for machine learning can be found in nearly every aspect of modern life, from healthcare and transportation to software services like recommendation systems. Consequently, there are now more developers engaged in the field than ever, with the number of implementations rapidly increasing by the day. In order to meet the new demands, it would be useful to provide services that allow for an easy orchestration of a large number of repositories. Enabling users to easily share, access and search for source code would be beneficial for both research and industry alike. A first step towards this is to find methods for clustering source code by functionality. The problem of clustering source code has previously been studied in the literature. However, the proposed methods have so far not leveraged the capabilities of deep neural networks (DNN). In this work, we investigate the possibility of using DNNs to learn embeddings of source code for the purpose of clustering by functionality. In particular, we evaluate embeddings from Code2Vec and cuBERT models for this specific purpose. From the results of our work we conclude that both Code2Vec and cuBERT are capable of learning such embeddings. Among the different frameworks that we used to fine-tune cuBERT, we found the best performance for this task when fine-tuning the model under the triplet loss criterion. With this framework, the model was capable of learning embeddings that yielded the most compact and well-separated clusters. We found that a majority of the cluster assignments were semantically coherent with respect to the functionalities implemented by the methods. With these results, we have found evidence indicating that it is possible to learn embeddings of source code that encode the functional similarities among the methods. Future research could therefore aim to further investigate the possible applications of the embeddings learned by the different frameworks.
Published: 2021

3. COCLUBERT : Clustering Machine Learning Source Code

Author: Hägglund, Marcus, Pena, Francisco J., Pashami, Sepideh, Al-Shishtawy, Ahmad, and Payberah, Amir H.
Abstract: Nowadays, we can find machine learning (ML) applications in nearly every aspect of modern life, and we see that more developers are engaged in the field than ever. In order to facilitate the development of new ML applications, it would be beneficial to provide services that enable developers to share, access, and search for source code easily. A step towards making such a service is to cluster source code by functionality. In this work, we present COCLUBERT, a BERT-based model that embeds source code based on its functionality and clusters it accordingly. We build COCLUBERT using CuBERT, a variant of BERT pre-trained on source code, and present three ways to fine-tune it for the clustering task. In the experiments, we compare COCLUBERT with a baseline model, where we cluster source code using CuBERT embeddings without fine-tuning. We show that COCLUBERT significantly outperforms the baseline model by increasing the Dunn Index metric by a factor of 141, the Silhouette Score metric by a factor of two, and the Adjusted Rand Index metric by a factor of 11.
Notes: QC 20220530; part of proceedings ISBN 978-1-6654-4337-1
Published: 2021

4. Machine Learning Methods for Classification of G Protein-Coupled Receptors

Author: Hägglund, Marcus
Subjects: Engineering and Technology
Abstract: G protein-coupled receptors (GPCRs) are one of the biggest targets for pharmaceutical drugs today, with approximately one third of current medicines binding to them. The aim of this project was to use different machine learning algorithms to classify the protein into different functional states and to compare the results obtained by the different algorithms. Both the supervised and unsupervised methods implemented in this project identified similar regions of the protein as important for classifying its functional state. More specifically, the supervised methods Random Forest and Multilayer Perceptron were able to predict the functional state of a protein with great accuracy. The methods investigated will be useful for designing and analyzing molecular dynamics simulations of GPCRs. Ultimately, this will further our understanding of the drug-binding mechanism, a critical step in the rational development of new drugs to treat various diseases.
Published: 2019

5. Maskininlärningsmetoder för klassificering av G-proteinkopplade receptorer

Author: Hägglund, Marcus
Abstract: G protein-coupled receptors (GPCRs) are one of the biggest targets for pharmaceutical drugs today, with approximately one third of current medicines binding to them. The aim of this project was to use different machine learning algorithms to classify the protein into different functional states and to compare the results obtained by the different algorithms. Both the supervised and unsupervised methods implemented in this project identified similar regions of the protein as important for classifying its functional state. More specifically, the supervised methods Random Forest and Multilayer Perceptron were able to predict the functional state of a protein with great accuracy. The methods investigated will be useful for designing and analyzing molecular dynamics simulations of GPCRs. Ultimately, this will further our understanding of the drug-binding mechanism, a critical step in the rational development of new drugs to treat various diseases.
Published: 2019
