Author: "Marina Evangelou" / Topic: computer.software_genre - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Marina Evangelou"' showing total 11 results

1. Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study

Author: Vahid Shahrezaei, Theodoulos Rodosthenous, Marina Evangelou, and Engineering & Physical Science Research Council (EPSRC)
Subjects: Statistics and Probability, Multifactorial Inheritance, Multivariate analysis, AcademicSubjects/SCI01060, Bioinformatics, Iterative method, Computer science, Latent variable, Machine learning, computer.software_genre, 01 natural sciences, Biochemistry, Matrix decomposition, 010104 statistics & probability, 03 medical and health sciences, Humans, 0101 mathematics, Molecular Biology, 01 Mathematical Sciences, 030304 developmental biology, 0303 health sciences, business.industry, Systems Biology, Supervised learning, 06 Biological Sciences, Original Papers, Computer Science Applications, Computational Mathematics, ComputingMethodologies_PATTERNRECOGNITION, Phenotype, Computational Theory and Mathematics, Multivariate Analysis, Unsupervised learning, 08 Information and Computing Sciences, Artificial intelligence, Canonical correlation, business, computer, Algorithms
Abstract: Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) approach is a promising one that seeks to penalize the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. Over the last years, a number of approaches for implementing sparse CCA (sCCA) have been proposed, where they differ on their objective functions, iterative algorithm for obtaining the sparse latent variables and make different assumptions about the original datasets. Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding of the in-between relationships, we have twisted the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement over conventional predictive models that include one or multiple datasets. Availability and implementation: https://github.com/theorod93/sCCA. Supplementary information: Supplementary data are available at Bioinformatics online.
Published: 2020

2. Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining

Author: Sarah Filippi, Marina Evangelou, Yoann Pitarch, Clément Frainay, Adnan Custovic, Imperial College London, Métabolisme et Xénobiotiques (ToxAlim-MeX), ToxAlim (ToxAlim), Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Ecole Nationale Vétérinaire de Toulouse (ENVT), Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Ecole d'Ingénieurs de Purpan (INPT - EI Purpan), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Recherche d’Information et Synthèse d’Information (IRIT-IRIS), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, National Heart and Lung Institute [London] (NHLI), Imperial College London-Royal Brompton and Harefield NHS Foundation Trust, UK Research & Innovation (UKRI), Medical Research Council UK (MRC), and European Commission
Subjects: Medical terminology, GENES, Allergy, media_common.quotation_subject, [SDV]Life Sciences [q-bio], Immunology, Decision tree, Eczema, text mining, computer.software_genre, Terminology, TEXT, Dermatitis, Atopic, 1117 Public Health and Health Services, 03 medical and health sciences, 0302 clinical medicine, Terminology as Topic, medical terminology, medicine, Immunology and Allergy, Data Mining, Humans, information retrieval, REVISED NOMENCLATURE, 030304 developmental biology, media_common, 0303 health sciences, Science & Technology, atopic dermatitis, business.industry, Decision tree learning, Subject (documents), Atopic dermatitis, Ambiguity, medicine.disease, 3. Good health, Systematic review, 030228 respiratory system, 1107 Immunology, 1111 Nutrition and Dietetics, Artificial intelligence, business, Psychology, computer, Life Sciences & Biomedicine, Natural language processing
Abstract: Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. Objective: To investigate the consequence of the ambiguity between the use of terms "Eczema" and "Atopic Dermatitis" (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Methods: Articles were retrieved by querying PubMed using terms "eczema" (D003876) and "dermatitis, atopic" (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained a model to predict if an article would be indexed with eczema or AD tags. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy regarding the retrieval of key findings according to the terminology used. Results: Atopic dermatitis query yielded more articles related to veterinary science, biochemistry, cellular and molecular biology; the eczema query linked to public health, infectious disease and respiratory system. Medical Subject Headings terms associated with "AD" or "Eczema" differed, with an agreement between the top 40 lists of 52%. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized AD literature. The metabolites mentioned more frequently than expected in articles with AD tag differed from those indexed with eczema. Fewer enriched genes were retrieved when using eczema compared to AD query. Conclusions and clinical relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning as a tool to spot and characterize ambiguity, and provide the source code for disambiguation at https://github.com/cfrainay/ResearchCodeBase.
Published: 2021

3. Multi-Type relational clustering for enterprise cyber-security networks

Author: Marina Evangelou, Niall M. Adams, and Elizabeth Riddle-Workman
Subjects: Relational database, Computer science, 1702 Cognitive Sciences, 02 engineering and technology, computer.software_genre, 01 natural sciences, Non-negative matrix factorization, Matrix (mathematics), Artificial Intelligence, 0103 physical sciences, Singular value decomposition, 0202 electrical engineering, electronic engineering, information engineering, 0801 Artificial Intelligence and Image Processing, Artificial Intelligence & Image Processing, Adjacency matrix, 010306 general physics, Cluster analysis, Measure (data warehouse), Matrix multiplication, 0906 Electrical and Electronic Engineering, Signal Processing, 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Data mining, computer, Software
Abstract: Several cyber-security data sources are collected in enterprise networks providing relational information between different types of nodes in the network, namely computers, users and ports. This relational data can be expressed as adjacency matrices detailing inter-type relationships corresponding to relations between nodes of different types and intra-type relationships showing relationships between nodes of the same type. In this paper, we propose an extension of Non-Negative Matrix Tri-Factorisation (NMTF) to simultaneously cluster nodes based on their intra and inter-type relationships. Existing NMTF based clustering methods suffer from long computational times due to large matrix multiplications. In our approach, we enforce stricter cluster indicator constraints on the factor matrices to circumvent these issues. Additionally, to make our proposed approach less susceptible to variation in results due to random initialisation, we propose a novel initialisation procedure based on Non-Negative Double Singular Value Decomposition for multi-type relational clustering. Finally, a new performance measure suitable for assessing clustering performance on unlabelled multi-type relational data sets is presented. Our algorithm is assessed on both a simulated and real computer network against standard approaches showing its strong performance.
Published: 2021

4. Integrating multi-OMICS data through sparse Canonical Correlation Analysis for predicting complex traits: A comparative study

Author: Marina Evangelou, Theodoulos Rodosthenous, and Vahid Shahrezaei
Subjects: Computer science, business.industry, Iterative method, Supervised learning, Latent variable, Machine learning, computer.software_genre, Matrix decomposition, ComputingMethodologies_PATTERNRECOGNITION, Unsupervised learning, Multi omics, Artificial intelligence, Canonical correlation, business, computer
Abstract: Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. As OMICS datasets are heterogeneous and high-dimensional (p >> n), integrating them can be done through Sparse Canonical Correlation Analysis (sCCA) that penalises the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. Over the last years, a number of approaches for implementing sCCA have been proposed, where they differ on their objective functions, iterative algorithm for obtaining the sparse latent variables and make different assumptions about the original datasets. Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al. [2009], penalised matrix decomposition CCA proposed by Witten and Tibshirani [2009] and its extension proposed by Suo et al. [2017]. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding of the in-between relationships, we have twisted the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement over conventional predictive models that include one or multiple datasets. Contact: tr1915@ic.ac.uk
Published: 2019

5. Adaptive Anomaly Detection on Network Data Streams

Author: Niall M. Adams, Elizabeth Riddle-Workman, and Marina Evangelou
Subjects: Technology, Authentication events, Computer science, Anomaly detection, 02 engineering and technology, computer.software_genre, Data modeling, Engineering, Computer Science, Theory & Methods, Netflow Data, 0202 electrical engineering, electronic engineering, information engineering, Structure (mathematical logic), Authentication, Science & Technology, Emphasis (telecommunications), Engineering, Electrical & Electronic, 020206 networking & telecommunications, Computer Science, Software Engineering, Signature (logic), Regression, Data set, Forgetting factor, Computer Science, 020201 artificial intelligence & image processing, Data mining, computer
Abstract: As the number of cyber-attacks increases, there has been increasing emphasis on developing complementary methods of detection to the existing signature-based approaches. This work builds upon a previously discovered persistent structure within the Los Alamos National Laboratory network data sources, to develop a regression based streaming anomaly detection mechanism that can adapt to the network behaviour over time. The methodology has also been applied to a new data set of the same network to assess the extent of its pertinence in time.
Published: 2018

6. An anomaly detection framework for cyber-security data

Author: Niall M. Adams and Marina Evangelou
Subjects: General Computer Science, Series (mathematics), Strategic, Defence & Security Studies, Computer science, 020206 networking & telecommunications, 02 engineering and technology, computer.software_genre, Signature (logic), Quantile regression, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Anomaly detection, 08 Information and Computing Sciences, Data mining, Law, computer
Abstract: Data-driven anomaly detection systems have unrivalled potential as complementary defence systems to existing signature-based tools as the number of cyber attacks increases. In this manuscript an anomaly detection system is presented that detects any abnormal deviations from the normal behaviour of an individual device. Device behaviour is defined as the number of network traffic events involving the device of interest observed within a pre-specified time period. The behaviour of each device at normal state is modelled to depend on its observed historic behaviour. A number of statistical and machine learning approaches are explored for modelling this relationship and through a comparative study, the Quantile Regression Forests approach is found to have the best predictive power. Based on the prediction intervals of the Quantile Regression Forests an anomaly detection system is proposed that characterises as abnormal, any observed behaviour outside of these intervals. A series of experiments for contaminating normal device behaviour are presented for examining the performance of the anomaly detection system. Through the conducted analysis the proposed anomaly detection system is found to outperform two other detection systems. The presented work has been conducted on two enterprise networks.
Published: 2020
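
Note on result 6: the abstract's decision rule (flag any observed device behaviour that falls outside a prediction interval) can be illustrated with a minimal Python sketch. This is not the authors' implementation: as a loudly stated assumption, a plain empirical percentile interval over past event counts stands in for the prediction interval that a Quantile Regression Forest would produce, and the function names, percentile levels and sample counts are all hypothetical.

```python
from statistics import quantiles


def prediction_interval(history, lower_pct=5, upper_pct=95):
    """Return (low, high) empirical percentiles of historical event counts.

    Assumption: this empirical interval is a simple stand-in for the
    prediction interval of a Quantile Regression Forest.
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(history, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def is_anomalous(observed, interval):
    """Flag an observed count that lies outside the interval."""
    low, high = interval
    return observed < low or observed > high


# Hypothetical event counts per time period for one device.
history = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
interval = prediction_interval(history)

# One typical count and one extreme count.
flags = [is_anomalous(count, interval) for count in (14, 90)]
print(interval, flags)
```

In the paper's setting the interval would be conditional on the device's recent history rather than a fixed summary of all past counts; the flagging step itself stays the same.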

7. Clustering and monitoring edge behaviour in enterprise network traffic

Author: Marina Evangelou, Christopher Schon, and Niall M. Adams
Subjects: Technology, Computer science, 0211 other engineering and technologies, 02 engineering and technology, Cyber-security, computer.software_genre, Engineering, Computer Science, Theory & Methods, NetFlow, 0202 electrical engineering, electronic engineering, information engineering, Cluster (physics), Enterprise private network, Cluster analysis, Structure (mathematical logic), 021110 strategic, defence & security studies, Science & Technology, Process (computing), Engineering, Electrical & Electronic, Computer Science, Unsupervised learning, 020201 artificial intelligence & image processing, Enhanced Data Rates for GSM Evolution, Data mining, computer, clustering
Abstract: This paper takes an unsupervised learning approach for monitoring edge activity within an enterprise computer network. Using NetFlow records, features are gathered across the active connections (edges) in 15-minute time windows. Then, edges are grouped into clusters using the k-means algorithm. This process is repeated over contiguous windows. A series of informative indicators are derived by examining the relationship of edges with the observed cluster structure. This leads to an intuitive method for monitoring network behaviour and a temporal description of edge behaviour at global and local levels.
Published: 2017
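
Note on result 7: the pipeline in the abstract (gather per-edge features in 15-minute windows, then group edges with k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the record layout `(timestamp, src, dst, bytes)`, the two features (flow count and total bytes) and the tiny pure-Python k-means are all hypothetical choices, and features are left unscaled for brevity.

```python
import random
from collections import defaultdict

WINDOW_SECONDS = 15 * 60  # 15-minute windows, as in the abstract


def window_features(records):
    """Aggregate (timestamp, src, dst, bytes) records per window and edge.

    Returns {window_index: {(src, dst): [flow_count, total_bytes]}}.
    The two features are illustrative assumptions.
    """
    windows = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for ts, src, dst, nbytes in records:
        feats = windows[int(ts // WINDOW_SECONDS)][(src, dst)]
        feats[0] += 1
        feats[1] += nbytes
    return windows


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical NetFlow-like records: two light edges and two heavy edges.
records = [
    (0, "a", "b", 100), (10, "a", "b", 120), (20, "a", "c", 90),
    (30, "d", "e", 1_000_000), (40, "d", "f", 1_200_000),
    (900, "a", "b", 100),  # falls into the next 15-minute window
]
windows = window_features(records)
edges = sorted(windows[0])
labels = kmeans([windows[0][e] for e in edges], k=2)
print(dict(zip(edges, labels)))
```

Repeating the clustering for each window index in turn gives the sequence of cluster structures over contiguous windows that the abstract describes.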

8. Two novel pathway analysis methods based on a hierarchical model

Author: Frank Dudbridge, Marina Evangelou, and Lorenz Wernisch
Subjects: Blood Platelets, Statistics and Probability, Computational complexity theory, Computer science, Genome-wide association study, computer.software_genre, Polymorphism, Single Nucleotide, Biochemistry, Hierarchical database model, Body Mass Index, Bayes' theorem, Lasso (statistics), Humans, Computer Simulation, Molecular Biology, Genetics and Population Analysis, Bayes Theorem, Bayes factor, Pathway analysis, Original Papers, Computer Science Applications, Computational Mathematics, Phenotype, Computational Theory and Mathematics, Data mining, computer, Algorithms, Genome-Wide Association Study
Abstract: Motivation: Over the past few years several pathway analysis methods have been proposed for exploring and enhancing the analysis of genome-wide association data. Hierarchical models have been advocated as a way to integrate SNP and pathway effects in the same model, but their computational complexity has prevented them being applied on a genome-wide scale to date. Methods: We present two novel methods for identifying associated pathways. In the proposed hierarchical model, the SNP effects are analytically integrated out of the analysis, allowing computationally tractable model fitting to genome-wide data. The first method uses Bayes factors for calculating the effect of the pathways, whereas the second method uses a machine learning algorithm and adaptive lasso for finding a sparse solution of associated pathways. Results: The performance of the proposed methods was explored on both simulated and real data. The results of the simulation study showed that the methods outperformed some well-established association methods: the commonly used Fisher’s method for combining P-values and also the recently published BGSA. The methods were applied to two genome-wide association study datasets that aimed to find the genetic structure of platelet function and body mass index, respectively. The results of the analyses replicated the results of previously published pathway analysis of these phenotypes but also identified novel pathways that are potentially involved. Availability: An R package is under preparation. In the meantime, the scripts of the methods are available on request from the authors. Contact: marina.evangelou@cimr.cam.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
Published: 2013

9. The time-varying dependency patterns of NetFlow statistics

Author: Alex Gibberd, Marina Evangelou, and James D. B. Nelson
Subjects: Technology, Dependency (UML), Computer science, Feature extraction, 02 engineering and technology, Intrusion detection system, Machine learning, computer.software_genre, 01 natural sciences, Electronic mail, Data modeling, 010104 statistics & probability, Computer Science, Theory & Methods, NetFlow, 0202 electrical engineering, electronic engineering, information engineering, 0101 mathematics, Science & Technology, Computer Science, Information Systems, business.industry, Probabilistic logic, Identification (information), Computer Science, 020201 artificial intelligence & image processing, Data mining, Artificial intelligence, business, computer
Abstract: We investigate where and how key dependency structures between measures of network activity change throughout the course of daily activity. Our approach to data-mining is probabilistic in nature: we formulate the identification of dependency patterns as a regularised statistical estimation problem. The resulting model can be interpreted as a set of time-varying graphs and provides a useful visual interpretation of network activity. We believe this is the first application of dynamic graphical modelling to network traffic of this kind. Investigations are performed on 9 days of real-world network traffic across a subset of IPs. We demonstrate that dependency between features may change across time and discuss how these change at an intra and inter-day level. Such variation in feature dependency may have important consequences for the design and implementation of probabilistic intrusion detection systems.
Published: 2016

10. Predictability of NetFlow data

Author: Niall M. Adams and Marina Evangelou
Subjects: Technology, Computer science, Regression trees, Feature extraction, Principal component analysis, 02 engineering and technology, computer.software_genre, 01 natural sciences, Electronic mail, 010104 statistics & probability, Engineering, Computer Science, Theory & Methods, Predictive regression, NetFlow, 0202 electrical engineering, electronic engineering, information engineering, Enterprise private network, 0101 mathematics, Predictability, Science & Technology, Engineering, Electrical & Electronic, Computer Science, Cyber-attack, 020201 artificial intelligence & image processing, Anomaly detection, Data mining, computer
Abstract: The behaviour of individual devices connected to an enterprise network can vary dramatically, as a device’s activity depends on the user operating the device as well as on all behind the scenes operations between the device and the network. Being able to understand and predict a device’s behaviour in a network can work as the foundation of an anomaly detection framework, as devices may show abnormal activity as part of a cyber attack. The aim of this work is the construction of a predictive regression model for a device’s behaviour at normal state. The behaviour of a device is presented by a quantitative response and modelled to depend on historic data recorded by NetFlow.
Published: 2016

11. Activity-based temporal anomaly detection in enterprise-cyber security

Author: Marina Evangelou, Mark Whitehouse, and Niall M. Adams
Subjects: Technology, Authentication events, Computer science, Netflow data, 02 engineering and technology, computer.software_genre, Computer security, Electronic mail, Data modeling, Engineering, Computer Science, Theory & Methods, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Enterprise private network, NETWORK, Complement (set theory), Structure (mathematical logic), Authentication, Science & Technology, Engineering, Electrical & Electronic, Data set, Computer Science, 020201 artificial intelligence & image processing, Anomaly detection, Data mining, computer
Abstract: Statistical anomaly detection is emerging as an important complement to signature-based methods for enterprise network defence. In this paper, we isolate a persistent structure in two different enterprise network data sources. This structure provides the basis of a regression-based anomaly detection method. The procedure is demonstrated on a large public domain data set.
Published: 2016
