169 results on "Kantarcioglu AS"
Search Results
2. A Game Theoretic Perspective on Adversarial Machine Learning and Related Cybersecurity Applications
- Author
- Yan Zhou, Murat Kantarcioglu, and Bowei Xi
- Subjects
- Sequential game, Adversarial machine learning, Minimax, Computer security, Support vector machine, Attack model, Strategy, Zero-sum game, Game theory
- Abstract
In cybersecurity applications where machine learning algorithms are increasingly used to detect vulnerabilities, a somewhat unique challenge arises: exploits targeting the machine learning models themselves are constantly devised by attackers. Traditional machine learning models are no longer robust and reliable when they are under attack. The action and reaction between machine learning systems and the adversary can be modeled as a game between two or more players. Under well-defined attack models, game theory can provide robustness guarantees for machine learning models that are otherwise vulnerable to application-time data corruption. We review two cases of game theory-based machine learning techniques: in one case, players play a zero-sum game by following a minimax strategy, while in the other case, players play a sequential game with one player as the leader and the rest as the followers. Experimental results on e-mail spam and web spam datasets are presented. In the zero-sum game, we demonstrate that an adversarial SVM model built upon the minimax strategy is much more resilient to adversarial attacks than standard SVM and one-class SVM models. We also show that optimal learning strategies derived to counter overly pessimistic attack models can produce unsatisfactory results when the real attacks are much weaker. In the sequential game, we demonstrate that the mixed strategy, which allows a player to randomize over available strategies, is in general the best solution when it is unknown what types of adversaries machine learning applications face in the wild. We also discuss scenarios where players' behavior may derail rational decision making, and models that consider such decision risks.
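To make the minimax idea concrete, below is a minimal numpy sketch of a robust linear SVM trained against an L∞-bounded attacker. The closed-form worst-case hinge loss used here is standard; the exact game formulations and experimental models in the paper differ.

```python
# Minimax sketch: the attacker perturbs each feature by at most eps, and for a
# linear scorer w.x + b the attacker's best response has a closed form, since
# min over ||d||_inf <= eps of y*(w.(x+d)+b) = y*(w.x+b) - eps*||w||_1.
import numpy as np

def robust_hinge_grad(w, b, X, y, eps, lam):
    """Subgradient of the worst-case hinge loss, labels y in {-1, +1}."""
    margins = y * (X @ w + b) - eps * np.abs(w).sum()   # attacker's best response
    active = margins < 1                                 # samples violating margin
    gw = lam * w - (y[active, None] * X[active]).sum(axis=0) / len(y)
    gw += eps * np.sign(w) * active.mean()               # subgradient of attack term
    gb = -y[active].sum() / len(y)
    return gw, gb

def train_robust_svm(X, y, eps=0.1, lam=1e-2, lr=0.1, iters=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        gw, gb = robust_hinge_grad(w, b, X, y, eps, lam)
        w, b = w - lr * gw, b - lr * gb
    return w, b
```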
- Published
- 2021
3. GraphBoot: Quantifying Uncertainty in Node Feature Learning on Large Networks
- Author
- Vyacheslav Lyubchich, Yulia R. Gel, Murat Kantarcioglu, Cuneyt Gurcan Akcora, and Bhavani Thuraisingham
- Subjects
- Propagation of uncertainty, Social network, Reliability, Node (networking), Sampling (statistics), Estimator, Data mining, Uncertainty quantification, Feature learning, Uncertainty analysis, Information Systems
- Abstract
In recent years, as online social networks continue to grow in size, estimating node features, such as sociodemographics, preferences, and health status, in a scalable and reliable way has become a primary research direction in social network mining. Although many techniques have been developed for estimating various node features, quantifying uncertainty in such estimations has received little attention. Furthermore, most existing methods study networks parametrically, which limits insights about the necessary quantity of queried data, reliable feature estimation, and estimator uncertainty. Uncertainty quantification is critical for answering key questions such as: given limited availability of social network data, how much data should be queried from the network, and which node features can be learned reliably? More importantly, how can we evaluate the uncertainty of our estimators? Uncertainty quantification is not equivalent to network sampling but constitutes a key complementary concept to sampling and the associated reliability analysis. To our knowledge, this paper is the first work that sheds light on uncertainty quantification and uncertainty propagation in social network feature mining. We propose a novel non-parametric bootstrap method for uncertainty analysis of node features in social network mining, derive its asymptotic properties, and demonstrate its effectiveness with extensive experiments. Furthermore, we develop a new metric based on the dispersion of estimations, enabling analysts to assess how much more information is needed to increase prediction reliability, based on the estimated uncertainty. We demonstrate the effectiveness of our new uncertainty quantification methodology with extensive experiments on real-life social networks and a case study of mental health on Twitter.
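As a rough illustration of the bootstrap idea (not GraphBoot's actual estimator, which expands seed samples over the network), the sketch below resamples a queried node-feature sample to get a confidence interval and a dispersion score of the kind the abstract's metric builds on.

```python
# Non-parametric bootstrap over a flat sample of queried node features.
import numpy as np

def bootstrap_ci(values, stat=np.mean, B=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    stats = np.array([stat(rng.choice(values, size=len(values), replace=True))
                      for _ in range(B)])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stat(values), (lo, hi), stats.std()   # estimate, CI, dispersion

ages = np.random.default_rng(1).normal(34, 9, size=300)  # queried feature values
est, (lo, hi), dispersion = bootstrap_ci(ages)
# A wide interval / large dispersion signals that more nodes should be queried.
```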
- Published
- 2021
4. Trailblazing the Artificial Intelligence for Cybersecurity Discipline
- Author
- Hsinchun Chen, Murat Kantarcioglu, and Sagar Samtani
- Subjects
- Prioritization, Vulnerability management, Funding mechanisms, Adversarial machine learning, Asset (computer security), Computer security, Management Information Systems, Analytics, Disinformation, Artificial intelligence
- Abstract
Cybersecurity has rapidly emerged as a grand societal challenge of the 21st century. Innovative solutions to proactively tackle emerging cybersecurity challenges are essential to ensuring a safe and secure society. Artificial Intelligence (AI) has rapidly emerged as a viable approach for sifting through terabytes of heterogeneous cybersecurity data to execute fundamental cybersecurity tasks, such as asset prioritization, control allocation, vulnerability management, and threat detection, with unprecedented efficiency and effectiveness. Despite this initial promise, AI and cybersecurity have traditionally been siloed disciplines that relied on disparate knowledge and methodologies. Consequently, the AI for Cybersecurity discipline is in its nascency. In this article, we aim to provide an important step in progressing the AI for Cybersecurity discipline. We first provide an overview of prevailing cybersecurity data, summarize extant AI for Cybersecurity application areas, and identify key limitations in the prevailing landscape. Based on these key issues, we offer a multi-disciplinary AI for Cybersecurity roadmap that centers on major themes such as cybersecurity applications and data, advanced AI methodologies for cybersecurity, and AI-enabled decision making. To help scholars and practitioners make significant headway in tackling these grand AI for Cybersecurity issues, we summarize promising funding mechanisms from the National Science Foundation (NSF) that can support long-term, systematic research programs. We conclude this article with an introduction of the articles included in this special issue.
- Published
- 2020
5. Robust Transparency Against Model Inversion Attacks
- Author
- Yasmeen Alufaisan, Yan Zhou, and Murat Kantarcioglu
- Subjects
- Privacy preserving, Credibility, Differential privacy, Transparency, Model inversion, Computer security
- Abstract
Transparency has become a critical need in machine learning (ML) applications. Designing transparent ML models helps increase trust, ensure accountability, and enable the scrutiny of fairness. Some organizations may opt out of transparency to protect individuals' privacy. Therefore, there is a great demand for transparency models that consider both privacy and security risks. Such transparency models can motivate organizations to improve their credibility by making the ML-based decision-making process comprehensible to end users. Differential privacy (DP) provides an important technique for disclosing information while protecting individual privacy. However, it has been shown that DP alone cannot prevent certain types of privacy attacks against disclosed ML models. DP with low ε values can provide strong privacy guarantees but may result in significantly less accurate ML models. On the other hand, setting the ε value too high may lead to successful privacy attacks. This raises the question of whether we can disclose accurate transparent ML models while preserving privacy. In this article we introduce a novel technique that complements DP to ensure model transparency and accuracy while remaining robust against model inversion attacks. We show that combining the proposed technique with DP provides highly transparent and accurate ML models while preserving privacy against model inversion attacks.
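For intuition about the ε trade-off described above, here is a small sketch using the standard Laplace mechanism. It illustrates plain DP, not the paper's complementary transparency technique.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    # Standard Laplace mechanism: noise scale = sensitivity / epsilon.
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(0)
true_count = 842   # e.g., a count query over training data, sensitivity 1
for eps in (0.01, 0.1, 1.0, 10.0):
    noisy = np.array([laplace_mechanism(true_count, 1.0, eps, rng)
                      for _ in range(1000)])
    print(f"eps={eps:5}: mean abs error ~ {np.abs(noisy - true_count).mean():.1f}")
# Low eps: strong privacy, large error (weaker models); high eps: accurate but
# easier to attack -- the gap the paper's technique is meant to close.
```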
- Published
- 2022
6. Data Science on Blockchains
- Author
- Yulia R. Gel, Cuneyt Gurcan Akcora, and Murat Kantarcioglu
- Subjects
- Blockchain, Parsing, Data science, Information science, Scalability, Data analysis, Graph
- Abstract
Blockchain technology garners ever-increasing interest from researchers in various domains that benefit from scalable cooperation among trustless parties. As blockchains and their applications proliferate, so do the complexity and volume of the data they store. Analyzing this data has emerged as an important research topic, already leading to methodological advancements in the information sciences. In this tutorial, we offer a holistic view of applied data science on blockchains. Starting with the core components of blockchain, we detail the state of the art in blockchain data analytics for the graph, security, and finance domains. Our examples answer questions such as: how can the data stored in blockchains be parsed, extracted, and cleaned; how can blockchain data be stored and queried; and what features can be computed from blockchains? We share tutorial notes, collected meta-information, and further reading pointers on our tutorial website at https://blockchaintutorial.github.io/
- Published
- 2021
7. AlphaCore
- Author
- Yulia R. Gel, Cuneyt Gurcan Akcora, Murat Kantarcioglu, and Friedhelm Victor
- Subjects
- Node (networking), Flow network, Core (graph theory), Decomposition, Data mining, Precision and recall, Centrality
- Abstract
Core decomposition in networks has proven useful for evaluating the importance of nodes and communities in a variety of application domains, ranging from biology to social networks and finance. However, existing core decomposition algorithms have limitations in simultaneously handling multiple node and edge attributes. We propose a novel unsupervised core decomposition method that can be easily applied to directed and weighted networks. Our algorithm, AlphaCore, allows us to combine multiple node properties systematically and in a mathematically rigorous manner by using the notion of data depth. In addition, it can be used as both a centrality measure and a core decomposition. Compared to existing approaches, AlphaCore avoids the need to specify numerous thresholds or coefficients and yields meaningful quantitative and qualitative insights into the structural organization of the network. We evaluate AlphaCore's performance with a focus on financial, blockchain-based token networks, the social network Reddit, and a transportation network of international flight routes. We compare our results with existing core decomposition and centrality algorithms. Using ground truth about node importance, we show that AlphaCore yields the best precision and recall results among core decomposition methods using the same input features. An implementation is available at https://github.com/friedhelmvictor/alphacore.
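The toy sketch below conveys the flavor of depth-based peeling on a directed, weighted graph: each node's feature vector gets a Mahalanobis-style depth, and the most typical nodes are removed level by level, so nodes with extreme features survive to the inner core. AlphaCore's exact features, depth function, and alpha schedule differ; see the linked implementation.

```python
import numpy as np
import networkx as nx

def mahalanobis_depth(X):
    mu = X.mean(axis=0)
    cov = np.cov(X.T) + 1e-9 * np.eye(X.shape[1])     # regularized covariance
    inv = np.linalg.inv(cov)
    d2 = np.einsum('ij,jk,ik->i', X - mu, inv, X - mu)
    return 1.0 / (1.0 + d2)          # in (0, 1]; larger = more central/typical

def depth_core(G, levels=5):
    """Toy peeling on a directed, weighted graph; returns node -> core level."""
    core, H = {}, G.copy()
    for level in range(levels):
        if H.number_of_nodes() < 3:
            break
        nodes = list(H)
        X = np.array([[H.in_degree(n, weight='weight'),
                       H.out_degree(n, weight='weight')] for n in nodes])
        depth = mahalanobis_depth(X)
        cut = np.quantile(depth, 0.5)                 # peel the most typical half
        peeled = [n for n, d in zip(nodes, depth) if d >= cut]
        core.update((n, level) for n in peeled)
        H.remove_nodes_from(peeled)
    core.update((n, levels) for n in H)               # innermost survivors
    return core

G = nx.DiGraph()
G.add_weighted_edges_from([(0, 1, 3.0), (1, 2, 1.0), (2, 0, 2.0), (0, 2, 5.0)])
print(depth_core(G))
```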
- Published
- 2021
8. AI for Security and Security for AI
- Author
- Sagar Samtani, Sudip Mittal, Maanak Gupta, Elisa Bertino, Cuneyt Gurcan Akcora, and Murat Kantarcioglu
- Subjects
- Homeland security, Security industry, Access control, Denial-of-service attack, Intrusion detection system, Computer security, Information sensitivity, Adversarial system, Malware
- Abstract
On one side, the security industry has successfully adopted some AI-based techniques, with uses ranging from mitigating denial-of-service attacks, forensics, intrusion detection, homeland security, and critical infrastructure protection to detecting sensitive information leakage, access control, and malware detection. On the other side, we see the rise of Adversarial AI, where the core idea is to subvert AI systems for fun and profit. The methods used to build AI systems systematically expose a new class of vulnerabilities, which adversaries exploit to alter AI system behavior to serve a malicious end goal. This panel discusses some of these aspects.
- Published
- 2021
9. Investigation of a differential cryptanalysis inspired approach for Trojan AI detection
- Author
- Yan Zhou, Murat Kantarcioglu, and Aref Asvadishirehjini
- Subjects
- Differential cryptanalysis, Deep learning, Machine learning, Decision systems, Trojan, Artificial intelligence
- Abstract
Deep Learning (DL) is becoming a popular paradigm in a broad category of decision systems that are crucial to the well-being of our society. Self-driving vehicles, online dating, social network content recommendation, and chest X-ray screening are all examples of how the quality of our lives is tied to the decisions of these systems. We must take into account that these systems may be gamed by malicious actors to make favorable decisions for unqualified instances. For instance, a self-driving car's traffic-sign detection model may classify a stop sign as a speed-limit sign whenever a pattern that triggers the faulty behavior is present. Our initial investigation results show that, given we can generate or access a rich and high-quality dataset of random images, we may be able to build meta-models that distinguish poisoned models from clean ones with acceptable performance.
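A minimal sketch of the meta-model idea, with illustrative names and shapes (the paper's probing and feature construction may differ): probe every candidate model with the same batch of random images and classify its output signature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def probe_signature(model_predict, probes):
    """model_predict: callable mapping an image batch to class probabilities."""
    out = model_predict(probes)                 # (n_probes, n_classes)
    return np.concatenate([out.mean(axis=0),    # average confidence per class
                           out.std(axis=0)])    # and its spread

rng = np.random.default_rng(0)
probes = rng.uniform(0, 1, size=(256, 32, 32, 3))   # random probe images

# With hypothetical labeled candidate models (1 = poisoned, 0 = clean):
#   X = np.stack([probe_signature(m, probes) for m in candidate_models])
#   meta = LogisticRegression(max_iter=1000).fit(X, y)
```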
- Published
- 2021
10. Session details: Session 2A: ML and Information Leakage
- Author
- Murat Kantarcioglu
- Subjects
- Information leakage, Session (computer science)
- Published
- 2020
11. Secure IoT Data Analytics in Cloud via Intel SGX
- Author
- Shihabul Islam, Latifur Khan, Murat Kantarcioglu, and Mustafa Safa Ozdayi
- Subjects
- Information privacy, Cloud computing, Computer security, Encryption, Data integrity, Computer data storage, Data analysis, Private information retrieval
- Abstract
The growing adoption of IoT devices in our daily life is engendering a data deluge, mostly private information that needs careful maintenance and a secure storage system to ensure data integrity and protection. The prodigious IoT ecosystem has also provided users with opportunities to automate systems by interconnecting their devices and other services with rule-based programs. However, the cloud services used to store and process sensitive IoT data turn out to be vulnerable to outside threats. Hence, sensitive IoT data and rule-based programs need to be protected against cyberattacks. To address this important challenge, in this paper we propose a framework to maintain the confidentiality and integrity of IoT data and rule-based program execution. We design the framework to preserve data privacy by utilizing a Trusted Execution Environment (TEE), such as Intel SGX, and an end-to-end data encryption mechanism. We evaluate the framework by securely executing rule-based programs inside SGX with both simulated and real IoT device data.
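The sketch below covers only the end-to-end encryption leg of such a framework (enclave attestation and the SGX rule engine are out of scope); device IDs and field names are illustrative.

```python
# An IoT device seals each reading with AES-GCM so the cloud stores only
# ciphertext, and the party holding the key (e.g., the enclave) can both
# decrypt and verify integrity.
import os, json
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)     # provisioned to device and enclave
aead = AESGCM(key)

def seal(reading: dict, device_id: str) -> bytes:
    nonce = os.urandom(12)                    # unique per message
    aad = device_id.encode()                  # authenticated, stored in the clear
    ct = aead.encrypt(nonce, json.dumps(reading).encode(), aad)
    return nonce + ct

def unseal(blob: bytes, device_id: str) -> dict:
    nonce, ct = blob[:12], blob[12:]
    return json.loads(aead.decrypt(nonce, ct, device_id.encode()))

blob = seal({"temp_c": 21.4, "ts": 1590000000}, "sensor-17")
assert unseal(blob, "sensor-17")["temp_c"] == 21.4   # tampering raises InvalidTag
```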
- Published
- 2020
12. Leveraging blockchain for immutable logging and querying across multiple sites
- Author
- Murat Kantarcioglu, Mustafa Safa Ozdayi, and Bradley A. Malin
- Subjects
- Blockchain, Relational database, Information storage and retrieval, Query-response, Cross-site data sharing, Multichain, Data structure, Data deduplication, Genetics, Algorithms
- Abstract
Background: Blockchain has emerged as a decentralized and distributed framework that enables tamper-resilience and, thus, practical immutability for stored data. This immutability property is important in scenarios where auditability is desired, such as in maintaining access logs for sensitive healthcare and biomedical data. However, the underlying data structure of blockchain, by default, does not provide capabilities to efficiently query the stored data. In this investigation, we show that it is possible to efficiently run complex audit queries over the access log data stored on blockchains by using additional key-value stores. This paper specifically reports on the approach we designed for the blockchain track of the iDASH Privacy & Security Workshop 2018 competition. In this track, participants were asked to devise an efficient way to run conjunctive equality and range queries on a genomic dataset access log trail after storing it in a permissioned blockchain network consisting of 4 identical nodes, each representing a different site, created with the Multichain platform.
Methods: Multichain duplicates and indexes blockchain data locally at each node in a key-value store to support retrieval requests at a later point in time. To efficiently leverage the key-value storage mechanism, we applied various techniques and optimizations, such as bucketization, simple data duplication, and batch loading, by accounting for the required query types of the competition and the interface provided by Multichain. In particular, we implemented our solution and compared its loading and query-response performance with SQLite, a commonly used relational database, using the data provided by the iDASH 2018 organizers.
Results: Depending on the query type and the data size, the run-time difference between blockchain-based query-response and SQLite-based query-response ranged from 0.2 seconds to 6 seconds. A deeper inspection revealed that range queries were the bottleneck of our solution, which, nevertheless, scales up linearly.
Conclusions: This investigation demonstrates that blockchain-based systems can provide reasonable query-response times for complex queries even if they only use simple key-value stores to manage their data. Consequently, we show that blockchains may be useful for maintaining data with auditability and immutability requirements across multiple sites.
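As a toy illustration of the bucketization optimization mentioned above (not the competition code), a range query over an exact-match key-value store can be served by writing each record under a bucket key and scanning only the overlapping buckets.

```python
from collections import defaultdict

BUCKET = 100  # e.g., timestamps grouped into width-100 buckets; a tuning knob

kv = defaultdict(list)             # stand-in for the chain's local key-value store

def put(record):
    kv[f"ts:{record['ts'] // BUCKET}"].append(record)

def range_query(lo, hi):
    out = []
    for b in range(lo // BUCKET, hi // BUCKET + 1):   # only overlapping buckets
        out.extend(r for r in kv[f"ts:{b}"] if lo <= r["ts"] <= hi)
    return out

for t in (10, 120, 130, 405, 990):
    put({"ts": t, "user": f"u{t}"})
print([r["ts"] for r in range_query(100, 500)])   # -> [120, 130, 405]
```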
- Published
- 2020
13. Efficacy of defending deep neural networks against adversarial attacks with randomization
- Author
- Yan Zhou, Bowei Xi, and Murat Kantarcioglu
- Subjects
- Independent and identically distributed random variables, Adversarial machine learning, Adversary, Adversarial system, Empirical research, Robustness, Test data
- Abstract
Adversarial machine learning is concerned with the study of the vulnerabilities of machine learning techniques to adversarial attacks, and of potential defenses against such attacks. Both the intrinsic vulnerabilities and the incongruous, often suboptimal defenses are rooted in the standard assumption upon which machine learning methods have been developed. The assumption that data are independent and identically distributed (i.i.d.) samples implies that training data are representative of the general population; thus, learning models that fit the training data accurately would perform well on test data from the rest of the population. Violations of the i.i.d. assumption characterize the challenges of detecting and defending against adversarial attacks. For an informed adversary, the most effective attack strategy is to transform malicious data so that they appear indistinguishable from legitimate data to the target model. Current developments in adversarial machine learning suggest that the adversary can easily gain the upper hand in this arms race, since the adversary only needs to make a local breakthrough against a stationary target, while the target model struggles to extend its predictive power to the general population, including the corrupted data. This fundamental cause of stagnation in effective defense against adversarial attacks suggests developing a moving-target defense for machine learning models for greater robustness. We investigate the feasibility and effectiveness of employing randomization to create a moving-target defense for deep neural network learning models. Randomness is introduced by randomizing the input and adding small random noise to the learned parameters. An extensive empirical study is performed, covering different attack strategies and defense/detection techniques against adversarial attacks.
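A minimal sketch of the two randomization knobs studied here, written for a generic linear scorer (the study itself targets deep networks and a broader set of attack and detection techniques):

```python
import numpy as np

rng = np.random.default_rng()

def randomized_predict(x, W, b, n_draws=10, in_sigma=0.05, w_sigma=0.01):
    """Moving-target inference: fresh noise on input and parameters per draw."""
    votes = []
    for _ in range(n_draws):
        x_r = x + rng.normal(0, in_sigma, size=x.shape)    # randomize the input
        W_r = W + rng.normal(0, w_sigma, size=W.shape)     # jitter learned weights
        votes.append(int(np.argmax(W_r @ x_r + b)))
    return np.bincount(votes).argmax()                     # majority vote
```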
- Published
- 2020
14. Session details: Session 6: System Security
- Author
- Murat Kantarcioglu
- Subjects
- Session (computer science), Computer security
- Published
- 2020
15. Deployment-quality and Accessible Solutions for Cryptography Code Development
- Author
- Sazzadur Rahaman, Ya Xiao, Miles Frantz, Murat Kantarcioglu, Ke Tian, Fahad Shaon, Barton P. Miller, Na Meng, Sharmin Afrose, and Danfeng Yao
- Subjects
- Static program analysis, Software, Software security assurance, Software deployment, Program slicing, Android, Software engineering, Software assurance
- Abstract
Cryptographic API misuses seriously threaten software security. Automatic screening for cryptographic misuse vulnerabilities has been a popular and important line of research over the years. However, the vision of producing a scalable detection tool that developers can routinely use to screen millions of lines of code has not been achieved yet. Our main technical goal is to attain a high-precision and high-throughput approach based on specialized program analysis. Specifically, we design inter-procedural program slicing on top of a new on-demand flow-, context-, and field-sensitive data flow analysis. Our current prototype, named CryptoGuard, can detect a wide range of Java cryptographic API misuses with a precision of 98.61% when evaluated on 46 complex Apache Software Foundation projects (including Spark, Ranger, and Ofbiz). Our evaluation on 6,181 Android apps also generated many security insights. We created a comprehensive benchmark named CryptoAPI-Bench, with 40 basic unit cases and 131 advanced unit cases, for in-depth comparison with leading solutions (e.g., SpotBugs, CrySL, Coverity). To make CryptoGuard widely accessible, we are in the process of integrating CryptoGuard with the Software Assurance Marketplace (SWAMP). SWAMP is a popular no-cost service for continuous software assurance and static code analysis.
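For intuition only, here is a toy misuse check in the spirit of such screeners. CryptoGuard itself performs inter-procedural, flow-, context-, and field-sensitive analysis of Java; this sketch merely scans Python source for string or byte literals flowing into illustrative crypto sinks.

```python
import ast

SINKS = {"SecretKeySpec", "PBEKeySpec", "AESGCM"}   # illustrative sink names

def find_hardcoded_keys(source: str):
    """Flag constant string/bytes arguments passed directly to a crypto sink."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = getattr(node.func, "id", getattr(node.func, "attr", ""))
            if name in SINKS and any(isinstance(a, ast.Constant) and
                                     isinstance(a.value, (str, bytes))
                                     for a in node.args):
                hits.append((node.lineno, name))
    return hits

print(find_hardcoded_keys('key = AESGCM(b"0123456789abcdef")'))  # [(1, 'AESGCM')]
```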
- Published
- 2020
16. Attacking Machine Learning Models for Social Good
- Author
- Yan Zhou, Bhavani M. Thuraisingham, Vibha Belavadi, and Murat Kantarcioglu
- Subjects
- Social accounting, Information privacy, Ethical issues, Adversarial machine learning, Machine learning, Automation, Adversarial system, Training phase
- Abstract
As machine learning (ML) techniques become widely used, awareness of the harmful effects of automation is growing. Especially in problem domains where critical decisions are made, machine learning-based applications may raise ethical issues with respect to fairness and privacy. Existing research on fairness and privacy in the ML community mainly focuses on providing remedies during the ML model training phase. Unfortunately, such remedies may not be voluntarily adopted by an industry concerned about its profits. In this paper, we propose to apply, from the user's end, a fair and legitimate technique to "game" the ML system and thereby ameliorate its social accountability issues. We show that although adversarial attacks can be exploited to tamper with ML systems, they can also be used for social good. We demonstrate the effectiveness of our proposed technique on real-world image and credit data.
- Published
- 2020
17. BlockFLA: Accountable Federated Learning via Hybrid Blockchain Architecture
- Author
- Mustafa Safa Ozdayi, Harsh Bimal Desai, and Murat Kantarcioglu
- Subjects
- Information privacy, Computer security, Federated learning, Backdoor, Architecture
- Abstract
Federated Learning (FL) is a distributed and decentralized machine learning protocol. By executing FL, a set of agents can jointly train a model without sharing their datasets with each other or with a third party. This makes FL particularly suitable for settings where data privacy is desired. At the same time, concealing training data gives attackers an opportunity to inject backdoors into the trained model. It has been shown that an attacker can inject backdoors into the trained model during FL and can later leverage the backdoor to make the model misclassify. Several works have tried to alleviate this threat by designing robust aggregation functions. However, given that more sophisticated attacks which bypass the existing defenses are developed over time, we approach this problem from a complementary angle in this work. In particular, we aim to discourage backdoor attacks by detecting and punishing the attackers, possibly after the end of the training phase. To this end, we develop a hybrid blockchain-based FL framework that uses smart contracts to automatically detect and punish the attackers via monetary penalties. Our framework is general in the sense that any aggregation function and any attacker detection algorithm can be plugged into it. We conduct experiments to demonstrate that our framework preserves the communication-efficient nature of FL, and we provide empirical results to illustrate that it can successfully penalize attackers by leveraging our novel attacker detection algorithm.
- Published
- 2020
18. PREDICT
- Author
- C. I. Ugwuoke, Z. Erkin, M. J. T. Reinders, R. L. Lagendijk, Barbara Carminati, and Murat Kantarcioglu
- Subjects
- Cryptography, Privacy-preserving, Encryption, Genome, Obfuscation, Homomorphic encryption, SNP, Disease susceptibility testing, Direct-to-consumer
- Abstract
Genome sequencing has advanced rapidly in the last decade, making it easier for anyone to obtain digital genomes at low cost from companies such as Helix, MyHeritage, and 23andMe. Companies now offer their services in a direct-to-consumer (DTC) model without the intervention of a medical institution, thereby providing people with direct services for paternity testing, ancestry testing, and disease susceptibility testing (DST) to infer predisposition to diseases. Genome analyses are partly motivated by curiosity, and people often want to partake without fear of privacy invasion. Existing privacy protection solutions for DST adopt cryptographic techniques to protect the genome of a patient from the party responsible for computing the analysis. These techniques include homomorphic encryption, which can be computationally expensive and can take minutes for only a few single-nucleotide polymorphisms (SNPs). A predominant approach is a solution that computes DST over encrypted data, but its design depends on a medical unit and exposes patients' test results to that unit, making it uncomfortable for privacy-aware individuals. Hence, it is pertinent to have an efficient privacy-preserving DST solution with a DTC service. We propose a novel DTC model that protects the privacy of SNPs and prevents leakage of test results to any party other than the genome owner. Conversely, we also protect the privacy of the algorithms, or trade secrets, used by the genome-analyzing companies. Our work utilizes a secure obfuscation technique for computing DST, eliminating expensive computations over encrypted data. Our approach significantly outperforms existing state-of-the-art solutions in runtime and scales linearly for equivalent levels of security. As an example, computing DST for 10,000 SNPs requires approximately 96 milliseconds on commodity hardware. With this efficient and privacy-preserving solution, which is also simulation-based secure, we open possibilities for performing genome analyses on collectively shared data resources.
- Published
- 2020
19. Adversarial Classification Under Differential Privacy
- Author
- Murat Kantarcioglu, Jonathan Katz, Alvaro A. Cardenas, and Jairo Giraldo
- Subjects
- Adversarial system, Differential privacy, Computer security
- Published
- 2020
20. Bitcoin risk modeling with blockchain graphs
- Author
- Yulia R. Gel, Matthew Dixon, Cuneyt Gurcan Akcora, and Murat Kantarcioglu
- Subjects
- Economics and Econometrics, Cryptocurrency, Blockchain, Financial economics, Financial risk, Currency, Risk management, Volatility (finance), Foreign exchange risk, Database transaction, Finance
- Abstract
A key challenge for Bitcoin cryptocurrency holders, such as startups using ICOs to raise funding, is managing their FX risk. Specifically, a misinformed decision to convert Bitcoin to fiat currency could, by itself, cost millions of USD. In contrast to financial exchanges, blockchain-based cryptocurrencies expose the entire transaction history to the public. By processing all transactions, we model the network with a high-fidelity graph so that it is possible to characterize how the flow of information in the network evolves over time. We demonstrate how this data representation permits a new form of microstructure modeling, with an emphasis on topological network structures, to study the role of users, entities, and their interactions in the formation and dynamics of cryptocurrency investment risk. In particular, we identify certain subgraphs ('chainlets') that exhibit predictive influence on Bitcoin price and volatility, and we characterize the types of chainlets that signify extreme losses. (JEL classification: C58, C63, G18.)
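A minimal sketch of the chainlet idea: type each transaction by its capped (inputs, outputs) counts and aggregate the counts as graph features, e.g., per day. Field names are illustrative; real pipelines parse full Bitcoin blocks.

```python
from collections import Counter

N = 20  # cap, so rare very large transactions share one bucket

def chainlet_counts(transactions):
    """Count transactions by (number of inputs, number of outputs), capped at N."""
    c = Counter()
    for tx in transactions:
        i = min(len(tx["inputs"]), N)
        o = min(len(tx["outputs"]), N)
        c[(i, o)] += 1
    return c

txs = [{"inputs": ["a"], "outputs": ["b", "c"]},
       {"inputs": ["d", "e"], "outputs": ["f"]},
       {"inputs": ["g"], "outputs": ["h", "i"]}]
print(chainlet_counts(txs))   # Counter({(1, 2): 2, (2, 1): 1})
```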
- Published
- 2018
21. Secure logical schema and decomposition algorithm for proactive context dependent attribute based inference control
- Author
- Ismail Hakki Toroslu, Ugur Turan, and Murat Kantarcioglu
- Subjects
- Information privacy, Information Systems and Management, Relational database, Probabilistic logic, Inference, Indirect inference, Information sensitivity, Relational database management system, Algorithm
- Abstract
The inference problem has always been an important and challenging topic in database privacy. In relational databases, the traditional solution to this problem has been to define views on relational schemas that restrict the subset of attributes and operations available to users, in order to prevent unwanted inferences. This method is a form of decomposition strategy, which mainly concentrates on the granularity of the fields accessible to users so as to prevent the inference of sensitive information. Nowadays, due to increasing data sharing among parties, the possibility of constructing complex indirect methods to obtain sensitive data has also increased. Therefore, while creating security views we need to consider not only security threats due to direct access to sensitive data but also indirect inference channels arising from functional and probabilistic dependencies (e.g., deducing the gender of an individual from his/her name). In this paper, we propose a proactive, decomposition-based inference control strategy for relational databases that prevents direct or indirect inference of private data. We introduce a new kind of context-dependent attribute policy rule, named a security dependent set: a set of attributes whose association should not be inferable. We then define a logical schema decomposition algorithm that prevents inference among the attributes of a security dependent set. The decomposition algorithm takes both functional and probabilistic dependencies into consideration in order to prevent all known kinds of inference of relations among the attributes of security dependent sets. We prove that our proposed decomposition algorithm generates a secure logical schema that complies with the given security dependent set constraints. Since our proposed technique is purely proactive, it does not require any prior knowledge about executed queries and does not need to modify any submitted queries. It can also be embedded into any relational database management system without changing anything in the underlying system. We empirically compare our proposed method with state-of-the-art reactive methods. Our extensive experimental analysis, conducted using the TPC-H benchmark schema, shows the effectiveness of our proposed approach.
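The safety condition behind such a decomposition can be sketched with the classical attribute-closure algorithm: no fragment of the decomposed schema should be able to derive the complete security dependent set. The toy check below handles only functional dependencies; the paper additionally accounts for probabilistic ones.

```python
def closure(attrs, fds):
    """Classical attribute closure; fds: list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fragment_is_safe(fragment, fds, security_dependent_set):
    # Safe iff the fragment cannot derive the whole protected association.
    return not security_dependent_set <= closure(fragment, fds)

fds = [({"name"}, {"gender"}), ({"ssn"}, {"name"})]
sds = {"name", "salary"}                    # association that must stay hidden
print(fragment_is_safe({"ssn", "salary"}, fds, sds))     # False: ssn -> name
print(fragment_is_safe({"gender", "salary"}, fds, sds))  # True
```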
- Published
- 2017
22. Security and Privacy in Cyber-Physical Systems: A Survey of Surveys
- Author
- Michail Maniatakos, Jairo Giraldo, Alvaro A. Cardenas, Murat Kantarcioglu, and Esha Sarkar
- Subjects
- Internet privacy, Cyber-physical system, Computer security model, Computer security, Hardware and Architecture, Security level, Software
- Abstract
The following is a survey of surveys and may help the interested reader find a way through the jungle of literature already out there on security and CPS topics. In order to ease the search, the authors have provided a classification into CPS domains, attacks, defenses, research trends, network security, security-level implementation, and computational strategies, which makes this survey a unique and, I believe, very helpful article. —Jorg Henkel, Karlsruhe Institute of Technology
- Published
- 2017
23. Secure Real-Time Heterogeneous IoT Data Management System
- Author
- Harsh Verma, Shihabul Islam, Latifur Khan, and Murat Kantarcioglu
- Subjects
- Information privacy, Edge device, Data management, Data security, Cloud computing, Computer security, Data processing system, Analytics, Edge computing
- Abstract
The growing adoption of IoT devices in our daily life has engendered a need for secure systems that safely store and analyze sensitive data, as well as for real-time data processing systems that are as fast as possible. The cloud services used to store and process sensitive data often turn out to be vulnerable to outside threats, and analyzing streaming IoT data swiftly requires a fast and efficient system. This paper envisions the aspects of complexity in dealing with real-time data from various devices in parallel: building a solution to ingest data from different IoT devices, forming a secure platform to process data in a short time, and using various IoT edge computing techniques to provide meaningful, intuitive results to users. The paper envisions a real-time data analytics system built from two modules. In the first module, we propose to maintain the confidentiality and integrity of IoT data, which is of paramount importance, and to manage large-scale data analytics with real-time data collection from various IoT devices in parallel. We envision a framework that preserves data privacy by utilizing a Trusted Execution Environment (TEE) such as Intel SGX, an end-to-end data encryption mechanism, and strong access control policies. Moreover, we design a generic framework to simplify the process of collecting and storing heterogeneous data coming from diverse IoT devices. In the second module, we envision a drone-based data processing system that operates in real time using edge computing and on-device computing. The use of drones is growing rapidly across many application domains, including real-time monitoring, remote sensing, search and rescue, delivery of goods, security and surveillance, and civil infrastructure inspection. This paper demonstrates potential drone applications and their challenges, discusses current research trends, and provides future insights for potential use cases using edge and on-device computing.
- Published
- 2019
24. Securing Big Data in the Age of AI
- Author
- Murat Kantarcioglu and Fahad Shaon
- Subjects
- Database, Data management, Big data, Data security, Unstructured data, NoSQL, Data type, Data governance, Analytics
- Abstract
Organizations are increasingly collecting ever larger amounts of data to build complex data analytics, machine learning, and AI models. Furthermore, the data needed for building such models may be unstructured (e.g., text, image, and video). Hence, such data may be stored in different data management systems, ranging from relational databases to newer NoSQL databases tailored for storing unstructured data. Furthermore, data scientists are increasingly using programming languages such as Python and R to process data using many existing libraries. In some cases, the developed code will be automatically executed by the NoSQL system on the stored data. These developments indicate the need for a data security and privacy solution that can uniformly protect data stored in many different data management systems and enforce security policies even if sensitive data is processed using a complex program submitted by a data scientist. In this paper, we introduce our vision for building such a solution for protecting big data. Specifically, our proposed system allows organizations to 1) enforce policies that control access to sensitive data, 2) automatically keep the audit logs necessary for data governance and regulatory compliance, 3) sanitize and redact sensitive data on-the-fly based on data sensitivity and AI model needs, 4) detect potentially unauthorized or anomalous access to sensitive data, and 5) automatically create attribute-based access control policies based on data sensitivity and data type.
- Published
- 2019
25. CryptoGuard
- Author
- Fahad Shaon, Danfeng Yao, Sazzadur Rahaman, Ke Tian, Murat Kantarcioglu, Sharmin Afrose, Ya Xiao, and Miles Frantz
- Subjects
- Java, Vulnerability, Cryptography, Static program analysis, Computer security, Software security assurance, False positives, Android
- Abstract
Cryptographic API misuses, such as exposed secrets, predictable random numbers, and vulnerable certificate verification, seriously threaten software security. The vision of automatically screening cryptographic API calls in massive-sized (e.g., millions of LoC) programs is not new. However, hindered by the practical difficulty of reducing false positives without compromising analysis quality, this goal has not been accomplished. CryptoGuard is a set of detection algorithms that refine program slices by identifying language-specific irrelevant elements. The refinements reduce false alerts by 76% to 80% in our experiments. Running our tool, CryptoGuard, on 46 high-impact large-scale Apache projects and 6,181 Android apps generated many security insights. Our findings helped multiple popular Apache projects to harden their code, including Spark, Ranger, and Ofbiz. We have also made progress toward the science of analysis in this space, including manually analyzing 1,295 Apache alerts, confirming 1,277 true positives (98.61% precision), and performing an in-depth comparison with leading solutions, including CrySL, SpotBugs, and Coverity.
- Published
- 2019
26. Poster
- Author
- Fahad Shaon, Sharmin Afrose, Ya Xiao, Na Meng, Miles Frantz, Danfeng Yao, Barton P. Miller, Sazzadur Rahaman, Ke Tian, and Murat Kantarcioglu
- Subjects
- Static program analysis, Software, Software security assurance, Software deployment, Program slicing, Android, Software engineering, Software assurance
- Abstract
Cryptographic API misuses seriously threaten software security. Automatic screening for cryptographic misuse vulnerabilities has been a popular and important line of research over the years. However, the vision of producing a scalable detection tool that developers can routinely use to screen millions of lines of code has not been achieved yet. Our main technical goal is to attain a high-precision and high-throughput approach based on specialized program analysis. Specifically, we design inter-procedural program slicing on top of a new on-demand flow-, context-, and field-sensitive data flow analysis. Our current prototype, named CryptoGuard, can detect a wide range of Java cryptographic API misuses with a precision of 98.61% when evaluated on 46 complex Apache Software Foundation projects (including Spark, Ranger, and Ofbiz). Our evaluation on 6,181 Android apps also generated many security insights. We created a comprehensive benchmark named CryptoAPI-Bench, with 40 basic unit cases and 131 advanced unit cases, for in-depth comparison with leading solutions (e.g., SpotBugs, CrySL, Coverity). To make CryptoGuard widely accessible, we are in the process of integrating CryptoGuard with the Software Assurance Marketplace (SWAMP). SWAMP is a popular no-cost service for continuous software assurance and static code analysis.
- Published
- 2019
27. A Hybrid Blockchain Architecture for Privacy-Enabled and Accountable Auctions
- Author
- Lalana Kagal, Harsh Bimal Desai, and Murat Kantarcioglu
- Subjects
- Cryptocurrency, Blockchain, Cryptography, Payment, Data sharing, Information sensitivity, Common value auction
- Abstract
Blockchain has recently emerged as an important tool that can enable critical distributed applications without requiring centralized trust. For example, public blockchains have been used to enable many different cryptocurrencies. Unfortunately, existing public blockchains, and the smart contracts deployed on them, may disclose sensitive information. Although there is some ongoing work that leverages advanced cryptography to address some of these leakage issues, these approaches require significant changes to existing and popular blockchains such as Ethereum and are usually computationally expensive. On the other hand, private blockchains have been proposed to allow more efficient and privacy-preserving data sharing among a pre-approved group of nodes/participants. Although private blockchains address some of the privacy challenges by allowing sensitive data to be seen only by a select group of participants, they do not allow public accountability of transactions, since transactions are approved by a known set of users and cannot be accessed publicly. Given these observations, one natural question that arises is: can we leverage both public and private blockchain infrastructures to enable efficient, privacy-enhancing, and accountable applications? In this work, we address this challenge in the context of digital auctions. Specifically, we propose a novel hybrid blockchain architecture that combines private and public blockchains so that sensitive bids are opened on a private blockchain, where only the auctioneer, and no one else, can learn the bids. At the same time, we leverage public blockchains to make the auction winner announcement and payments accountable. Furthermore, using smart contracts deployed on the public blockchain, we show how to incentivize truthful behavior among the auction participants. Our extensive empirical results show that this architecture is more efficient in terms of run time and monetary cost than pure public blockchain-based auction implementations.
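One way to picture the public/private split is a commit-reveal scheme: salted bid commitments go to the public chain, openings stay on the private chain with the auctioneer, and anyone can later check the announced winner against the commitments. A simplified sketch, not the paper's contract code:

```python
import hashlib, os

def commit(bid_amount: int, salt: bytes = None):
    """Return (public commitment, salt); (bid, salt) stays on the private side."""
    salt = salt or os.urandom(16)
    digest = hashlib.sha256(salt + bid_amount.to_bytes(8, "big")).hexdigest()
    return digest, salt

def verify(digest: str, bid_amount: int, salt: bytes) -> bool:
    # Anyone can audit a revealed bid against its public commitment.
    return hashlib.sha256(salt + bid_amount.to_bytes(8, "big")).hexdigest() == digest

d, s = commit(5000)
assert verify(d, 5000, s) and not verify(d, 4999, s)
```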
- Published
- 2019
28. Securing Big Data
- Author
- Murat Kantarcioglu
- Subjects
- Privacy policy, Big data, Access control, Encryption, NoSQL, Computer security, Data access, Identity theft, Privacy law
- Abstract
Recent cyber attacks have shown that the leakage or theft of big data may result in enormous monetary losses, damage to organizational reputation, and increased identity-theft risks for individuals. Furthermore, in the age of big data, protecting the security and privacy of stored data is paramount for maintaining public trust and getting the full value from the collected data. In this talk, we first discuss the unique security and privacy challenges that arise from big data and from the NoSQL systems designed to analyze it. We also discuss our proposed SecureDL system, which is built on top of existing NoSQL databases such as Hadoop and Spark and is designed as a data access broker in which each request submitted by a user app is automatically captured. These captured requests are logged, analyzed, and then modified (if needed) to conform to security and privacy policies (e.g., [5]) before being submitted to the underlying NoSQL database. Furthermore, SecureDL can allow organizations to audit their big data usage to prevent data misuse and to comply with various privacy regulations [2]. SecureDL is totally transparent from the user's point of view and does not require any change to the user's code or the underlying NoSQL database system; therefore, it can be deployed on existing NoSQL databases. Later, we discuss how to add an additional security layer for protecting big data using encryption techniques (e.g., [1, 3, 4]). In particular, we discuss our work on leveraging modern hardware-based trusted execution environments (TEEs), such as Intel SGX, for secure processing of encrypted data. We also discuss how to provide a simple, secure, high-level-language-based framework suitable for enabling generic data analytics for non-security experts who are not familiar with security concepts such as "oblivious execution". Our proposed framework allows data scientists to perform data analytics tasks with TEEs using a Python/MATLAB-like high-level language, and it automatically compiles programs written in our language into optimal execution code by managing issues such as optimal data block sizes for I/O, vectorized computations that simplify much of the data processing, and optimal ordering of operations for certain tasks. Using these design choices, we show how to provide guarantees for efficient and secure big data analytics over encrypted data.
- Published
- 2019
29. Research Challenges at the Intersection of Big Data, Security and Privacy
- Author
- Murat Kantarcioglu and Elena Ferrari
- Subjects
- Big data, Cybersecurity, Security, Privacy, Sharing, Machine learning, Big data security, Information Systems
- Published
- 2019
30. Adversarial Active Learning in the Presence of Weak and Malicious Oracles
- Author
- Yan Zhou, Bowei Xi, and Murat Kantarcioglu
- Subjects
- Active learning, Adversarial system, Crowdsourcing, Machine learning, Oracle, Data modeling, Artificial intelligence
- Abstract
We present a robust active learning technique for situations involving weak and adversarial oracles. Our work falls under the general umbrella of active learning, in which training data is insufficient and oracles are queried to supply labels for the most informative samples in order to expand the training set. On top of that, we consider problems where a large percentage of oracles may be strategically lying, as in adversarial settings. We present an adversarial active learning technique that explores the duality between oracle modeling and data modeling. We demonstrate on real datasets that our adversarial active learning technique is superior not only to the heuristic majority-voting technique but also to a state-of-the-art adversarial crowdsourcing technique, the Generative model of Labels, Abilities, and Difficulties (GLAD), when genuine oracles are outnumbered by weak and malicious oracles, and even in the extreme cases where all the oracles are either weak or malicious. To put our technique under more rigorous tests, we compare our adversarial active learner to the ideal active learner that always receives correct labels. We demonstrate that our technique is as effective as the ideal active learner when only one third of the oracles are genuine.
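A bare-bones sketch of the oracle-modeling half of that duality for binary labels: estimate each oracle's reliability from its agreement with the weighted consensus and iterate, EM-style. The paper's actual model is considerably richer.

```python
import numpy as np

def weighted_labels(votes, iters=20):
    """votes: (n_samples, n_oracles) matrix with entries in {-1, +1}."""
    w = np.ones(votes.shape[1])                    # start by trusting everyone
    consensus = np.sign(votes @ w)                 # weighted majority per sample
    for _ in range(iters):
        agree = (votes == consensus[:, None]).mean(axis=0)
        w = np.clip(2.0 * agree - 1.0, 1e-3, 1.0)  # reliability; liars drop to ~0
        consensus = np.sign(votes @ w)
    return consensus, w
```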
- Published
- 2019
31. Determining the Impact of Missing Values on Blocking in Record Linkage
- Author
- Murat Kantarcioglu, Imrul Chowdhury Anindya, and Bradley A. Malin
- Subjects
- Matching (statistics), Blocking (statistics), Missing data, Data corruption, Data deduplication, Record linkage
- Abstract
Record linkage is the process of integrating information about the same underlying entity across disparate data sets. This process, which is increasingly utilized to build accurate representations of individuals and organizations for a variety of applications, ranging from creditworthiness assessment to continuity of medical care, can be computationally intensive because it requires comparing large quantities of records over a range of attributes. To reduce the amount of computation in record linkage in big data settings, blocking methods, which are designed to limit the number of record pair comparisons that need to be performed, are critical for scaling up the record linkage process. These methods group potential matches together into blocks, often using a subset of attributes, before a final comparator function predicts which record pairs within the blocks correspond to matches. Yet data corruption and missing values adversely influence the performance of blocking methods (e.g., they may cause some matching records not to be placed in the same block). While there has been some investigation into the impact of missing values on general record linkage techniques (e.g., the comparator function), no study has addressed the impact of missing values on blocking methods. To address this issue, in this work we systematically perform a detailed empirical analysis of the individual and joint impact of missing values and data corruption on different blocking methods using realistic data sets. Our results show that blocking approaches that do not depend on one type of blocking attribute are more robust against missing values. In addition, our results indicate that blocking parameters must be chosen carefully for different blocking techniques.
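To illustrate the failure mode under study: single-key blocking silently drops a record from its block when the blocking attribute is missing. The toy multi-pass blocker below shows one common mitigation, where a missing value only skips one pass; details are illustrative, not the paper's experimental setup.

```python
from collections import defaultdict

def block(records, keys):
    """Multi-pass blocking: one pass per attribute, 3-char prefix as block key."""
    blocks = defaultdict(set)
    for rid, rec in records.items():
        for key in keys:
            v = rec.get(key)
            if v:                             # missing value: skip this pass only
                blocks[(key, str(v)[:3].lower())].add(rid)
    return blocks

records = {1: {"surname": "Smith", "zip": "75080"},
           2: {"surname": None,    "zip": "75080"},   # missing surname
           3: {"surname": "Smyth", "zip": "10001"}}
blocks = block(records, ["surname", "zip"])
# Records 1 and 2 still share block ('zip', '750') despite the missing surname.
print({k: v for k, v in blocks.items() if len(v) > 1})
```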
- Published
- 2019
- Full Text
- View/download PDF
32. Attacklets: Modeling High Dimensionality in Real World Cyberattacks
- Author
-
Cuneyt Gurcan Akcora, Bhavani Thuraisingham, Yulia R. Gel, Laura R. Marusich, Jonathan Z. Bakdash, and Murat Kantarcioglu
- Subjects
021110 strategic, defence & security studies ,Exploit ,Computer science ,0211 other engineering and technologies ,020206 networking & telecommunications ,02 engineering and technology ,Data breach ,High dimensional ,computer.software_genre ,Visualization ,Data modeling ,Range (mathematics) ,Server ,0202 electrical engineering, electronic engineering, information engineering ,Data mining ,High dimensionality ,computer - Abstract
We introduce attacklets, a novel approach to modeling the high dimensional interactions in cyberattacks. Attacklets are implemented using a real-world dataset of cyberattacks from the Verizon Data Breach Investigation Report. Whereas the commonly used attack graphs model the action sequences of attackers for specific exploits, attacklets model the general attributes and states of each attack separately. Attacklets can characterize the number and types of attributes observed across a wide range of cyberattacks. These structural properties can then be used in machine learning models to classify and predict future cyberattacks.
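The contrast with attack graphs can be made concrete: instead of chaining actions, each incident is encoded as a sparse vector of attribute-value indicators. The sketch below shows one such encoding; the field names are illustrative and not the paper's (or the Verizon report's) actual schema.

```python
import numpy as np

# Hypothetical incident records; field names are placeholders.
incidents = [
    {"actor": "external", "action": "hacking", "asset": "server"},
    {"actor": "internal", "action": "misuse",  "asset": "database"},
    {"actor": "external", "action": "malware", "asset": "server"},
]

# One binary column per (attribute, value) pair, so each attack becomes
# a sparse high-dimensional vector: attributes and states of the attack,
# rather than a sequence of attacker actions.
vocab = sorted({(k, v) for inc in incidents for k, v in inc.items()})
index = {pair: j for j, pair in enumerate(vocab)}
X = np.zeros((len(incidents), len(vocab)), dtype=int)
for i, inc in enumerate(incidents):
    for pair in inc.items():
        X[i, index[pair]] = 1
print(X)   # rows are attacks, columns are attribute-value indicators
```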
- Published
- 2018
- Full Text
- View/download PDF
33. Adjudicating Violations in Data Sharing Agreements Using Smart Contracts
- Author
-
Harsh Bimal Desai, Murat Kantarcioglu, Kevin Liu, and Lalana Kagal
- Subjects
Value (ethics) ,business.industry ,Computer science ,Process (engineering) ,media_common.quotation_subject ,020206 networking & telecommunications ,Cryptography ,02 engineering and technology ,Computer security ,computer.software_genre ,External auditor ,Data sharing ,Work (electrical) ,Voting ,Scale (social sciences) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,computer ,media_common - Abstract
As more and more data is collected for various reasons, sharing such data becomes paramount to increasing its value. Many applications, ranging from smart cities to personalized health care, require individuals and organizations to share data at an unprecedented scale. Data sharing is crucial in today's world, but due to privacy, security, and regulatory concerns, the conditions under which the sharing occurs need to be carefully specified. Currently, this process is done by lawyers and requires the costly signing of legal agreements. In many cases, these data sharing agreements are hard to track, manage, or enforce. In this work, we propose a novel alternative for tracking, managing, and especially adjudicating such data sharing agreements using smart contracts and blockchain technology. We design a framework that generates smart contracts from parameters based on legal data sharing agreements. The terms in these agreements are automatically adjudicated by the system. Monetary punishment can be imposed through secure voting by external auditors to hold violators accountable. Our experimental evaluation shows that our proposed framework is efficient and low-cost.
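Although the paper's contracts run on a blockchain, the core of adjudication is checking an access log against machine-readable terms. The Python stand-in below illustrates that check under assumed, simplified clause types (a volume cap, a purpose restriction, an expiry date); it is not the generated contract code.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class SharingTerm:
    """One machine-checkable clause of a data sharing agreement.
    Field names are illustrative placeholders, not the paper's schema."""
    max_rows_per_day: int
    allowed_purpose: str
    expires: datetime

def adjudicate(term, access_log):
    """Return the violations found in an access log; this is the kind of
    check a generated smart contract would perform automatically."""
    violations = []
    if sum(e["rows"] for e in access_log) > term.max_rows_per_day:
        violations.append("volume limit exceeded")
    if any(e["purpose"] != term.allowed_purpose for e in access_log):
        violations.append("purpose restriction violated")
    if any(e["time"] > term.expires for e in access_log):
        violations.append("access after expiry")
    return violations

term = SharingTerm(1000, "research", datetime(2019, 1, 1))
log = [{"rows": 600, "purpose": "research", "time": datetime(2018, 6, 1)},
       {"rows": 600, "purpose": "marketing", "time": datetime(2018, 6, 2)}]
print(adjudicate(term, log))   # volume and purpose violations
```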
- Published
- 2018
- Full Text
- View/download PDF
34. Integrating Cyber Security and Data Science for Social Media: A Position Paper
- Author
-
Bhavani Thuraisingham, Murat Kantarcioglu, and Latifur Khan
- Subjects
Computer science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Malware ,Position paper ,020201 artificial intelligence & image processing ,Social media ,02 engineering and technology ,computer.software_genre ,Computer security ,computer ,Data science - Abstract
Cyber security and data science are two of the fastest growing fields in Computer Science, and more recently they have been integrated for various applications. This position paper reviews the developments in applying data science to cyber security and cyber security to data science, and then discusses their applications in social media.
- Published
- 2018
- Full Text
- View/download PDF
35. Sensitive Task Assignments in Crowdsourcing Markets with Colluding Workers
- Author
-
Wendy Hui Wang, Bo Zhang, Haipei Sun, Murat Kantarcioglu, and Boxiang Dong
- Subjects
Computer science ,business.industry ,Heuristic ,02 engineering and technology ,Machine learning ,computer.software_genre ,Crowdsourcing ,Task (project management) ,Information sensitivity ,020204 information systems ,Collusion ,0202 electrical engineering, electronic engineering, information engineering ,Selection (linguistics) ,020201 artificial intelligence & image processing ,Pairwise comparison ,Artificial intelligence ,business ,Set (psychology) ,computer - Abstract
Crowdsourcing has raised several security concerns. One concern is how to assign sensitive tasks in a crowdsourcing market, especially when some participants collude. In this paper, we consider adversarial colluding participants who intend to extract sensitive data by exchanging information. We design a three-step sensitive task assignment method: (1) a collusion estimation step that quantifies the workers' pairwise collusion probabilities by estimating answer truth from their responses; (2) a worker selection step that executes a heuristic sampling-based approach to select the fewest workers whose collusion probabilities satisfy the given security requirement; and (3) a task partitioning step that splits the sensitive information among the selected workers. We perform an extensive set of experiments on both real-world and synthetic datasets. The results demonstrate the accuracy and efficiency of our method.
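Step (2) can be pictured as a sampling problem over subsets of workers. The sketch below randomly samples k-subsets and accepts the first whose estimated pairwise collusion probabilities all fall below a threshold; it is a simplified stand-in for the paper's heuristic, with made-up probabilities.

```python
import itertools
import random

def select_workers(collusion, k, threshold, n_trials=1000, seed=0):
    """Sample k-subsets of workers; keep the first whose pairwise
    collusion probabilities all satisfy the security threshold.

    collusion: dict mapping frozenset({i, j}) -> estimated probability.
    A simplified stand-in for the paper's sampling-based selection.
    """
    rng = random.Random(seed)
    workers = sorted({w for pair in collusion for w in pair})
    for _ in range(n_trials):
        cand = rng.sample(workers, k)
        if all(collusion.get(frozenset(p), 0.0) <= threshold
               for p in itertools.combinations(cand, 2)):
            return cand
    return None   # no admissible subset found within the trial budget

collusion = {frozenset({0, 1}): 0.9,    # workers 0 and 1 likely collude
             frozenset({0, 2}): 0.1,
             frozenset({1, 2}): 0.2,
             frozenset({2, 3}): 0.05}
print(select_workers(collusion, k=2, threshold=0.3))
```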
- Published
- 2018
- Full Text
- View/download PDF
36. Data Mining with Algorithmic Transparency
- Author
-
Murat Kantarcioglu, Yan Zhou, and Yasmeen Alufaisan
- Subjects
Reverse engineering ,Training set ,Computer science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Decision tree ,020201 artificial intelligence & image processing ,02 engineering and technology ,Data mining ,computer.software_genre ,Classifier (UML) ,computer - Abstract
In this paper, we investigate whether decision trees can be used to interpret a black-box classifier without knowing the learning algorithm or the training data. Decision trees are known for their transparency and high expressivity. However, they are also notorious for their instability and tendency to grow excessively large. We present a classifier reverse-engineering model that outputs a decision tree to interpret the black-box classifier. There are two major challenges: one is to build such a decision tree with controlled stability and size, and the other is that probing the black-box classifier may be limited for security and economic reasons. Our model addresses both issues by simultaneously minimizing sampling cost and classifier complexity. We present empirical results on four real datasets and demonstrate that our reverse-engineering model can effectively approximate and simplify the black-box classifier.
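The following sketch shows the general surrogate idea under a fixed probing budget: query the black box on unlabeled samples, fit a small tree to its predictions, and measure fidelity (agreement with the black box) on held-out data. It uses scikit-learn and a stand-in black box; the paper's model additionally optimizes the cost/complexity trade-off, which this sketch does not.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in black box: in practice this would be an opaque prediction
# API whose training data and algorithm are unknown.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X[:1000], y[:1000])

# Probe the black box on a limited budget of unlabeled samples and fit
# a small, stable surrogate tree to its predictions, not the truth.
probes = X[1000:1500]                      # 500-query probing budget
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate.fit(probes, black_box.predict(probes))

# Fidelity: how often the surrogate agrees with the black box on
# held-out inputs neither model was fit on.
held_out = X[1500:]
fidelity = (surrogate.predict(held_out) == black_box.predict(held_out)).mean()
print(f"fidelity = {fidelity:.2f}, leaves = {surrogate.get_n_leaves()}")
```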
- Published
- 2018
- Full Text
- View/download PDF
37. Security Analytics: Essential Data Analytics Knowledge for Cybersecurity Professionals and Students
- Author
-
Thamar Solorio, Murat Kantarcioglu, Ernst L. Leiss, Rakesh M. Verma, and David J. Marchette
- Subjects
Critical security studies ,Cloud computing security ,Certified Information Security Manager ,Computer Networks and Communications ,business.industry ,Computer science ,Standard of Good Practice ,Internet privacy ,Information security ,Computer security ,computer.software_genre ,Security information and event management ,Threat ,Information security audit ,Security service ,Information security management ,Analytics ,ComputingMilieux_COMPUTERSANDEDUCATION ,Electrical and Electronic Engineering ,business ,Law ,computer - Abstract
At the 2015 Workshop on Security and Privacy Analytics, there was a well-attended and vigorous debate on educating and training professionals and students in security analytics. This article extends this debate by laying out essential security analytics concepts for professionals and students, sharing educational experiences, and identifying gaps in the field.
- Published
- 2015
- Full Text
- View/download PDF
38. A privacy preserving protocol for tracking participants in phase I clinical trials
- Author
-
Elizabeth Jonker, Hanna Farah, Murat Kantarcioglu, Saeed Samet, Aleksander Essex, Khaled El Emam, and Craig C. Earle
- Subjects
Matching (statistics) ,Databases, Factual ,Computer science ,Statistics as Topic ,Health Informatics ,02 engineering and technology ,computer.software_genre ,Phase 1 volunteer ,03 medical and health sciences ,0302 clinical medicine ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Humans ,030212 general & internal medicine ,Protocol (science) ,Clinical Trials as Topic ,Probabilistic logic ,Construct (python library) ,Data Accuracy ,Computer Science Applications ,Clinical trial ,Data quality ,Secure multi-party computation ,Data mining ,Personally identifiable information ,computer ,Confidentiality - Abstract
Highlights: We present a privacy-preserving protocol to detect concurrent trial participants. We present a name representation scheme resilient to frequency attacks. The accuracy of the protocol is similar to standard non-secure methods. For a database size of 20,000, the private query time is under 40s on 32 cores. Objective: Some phase 1 clinical trials offer strong financial incentives for healthy individuals to participate in their studies. There is evidence that some individuals enroll in multiple trials concurrently. This creates safety risks and introduces data quality problems into the trials. Our objective was to construct a privacy preserving protocol to track phase 1 participants and detect concurrent enrollment. Design: A protocol using secure probabilistic querying against a database of trial participants that allows for screening during telephone interviews and on-site enrollment was developed. The match variables consisted of demographic information. Measurement: The accuracy (sensitivity, precision, and negative predictive value) of the matching and its computational performance in seconds were measured under simulated environments. Accuracy was also compared to non-secure matching methods. Results: The protocol's performance scales linearly with database size. At the largest database size of 20,000 participants, a query takes under 20s on a 64-core machine. Sensitivity, precision, and negative predictive value of the queries were consistently at or above 0.9, and were very similar to non-secure versions of the protocol. Conclusion: The protocol provides a reasonable solution to the concurrent enrollment problem in phase 1 clinical trials, and is able to ensure that personal information about participants is kept secure.
- Published
- 2015
- Full Text
- View/download PDF
39. Building a Dossier on the Cheap
- Author
-
Harichandan Roy, Murat Kantarcioglu, Imrul Chowdhury Anindya, and Bradley A. Malin
- Subjects
Information privacy ,Database ,Computer science ,02 engineering and technology ,Linkage (mechanical) ,computer.software_genre ,Data science ,law.invention ,Voter registration ,Information sensitivity ,law ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,computer ,Record linkage ,Data integration - Abstract
A wide variety of personal data is routinely collected by numerous organizations that, in turn, share and sell their collections for analytic investigations (e.g., market research). To preserve privacy, certain identifiers are often redacted, perturbed, or even removed. A substantial number of attacks have shown that, if care is not taken, such data can be linked to external resources to determine the explicit identifiers (e.g., personal names) or infer sensitive attributes (e.g., income) for the individuals from whom the data was collected. As such, organizations increasingly rely upon record linkage methods to assess the risk such attacks pose and adopt countermeasures accordingly. Traditional linkage methods assume only two datasets will be linked (e.g., linking de-identified hospital discharge records to identified voter registration lists), but with the advent of a multi-billion dollar data broker industry, modern adversaries have access to a massive stash of multiple datasets that can be leveraged. Still, realistic adversaries have budget constraints that prevent them from obtaining and integrating all relevant datasets. Thus, in this work, we investigate a novel privacy risk assessment framework, based on adversaries who plan an integration of datasets to obtain the most accurate estimate of targeted sensitive attributes under a fixed budget. To solve this problem, we introduce a graph-based formulation of the problem and predictive modeling methods to prioritize data resources for linkage. We perform an empirical analysis using real world voter registration data from two different U.S. states and show that the methods can be used efficiently to accurately estimate potentially sensitive information disclosure risks, even under a non-trivial amount of noise.
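The budgeted-adversary view can be illustrated with a toy greedy prioritization: rank candidate datasets by estimated inference gain per unit cost and acquire them until the budget is exhausted. The paper's framework is graph-based and predictive; the dataset names, costs, and gains below are invented.

```python
def prioritize_datasets(datasets, budget):
    """Greedy budgeted selection of external datasets by estimated
    inference gain per unit cost: a simple stand-in for the paper's
    graph-based, predictive prioritization.

    datasets: list of (name, cost, gain) tuples, where gain is the
    adversary's estimated improvement in inferring the target attribute.
    """
    chosen, spent = [], 0.0
    for name, cost, gain in sorted(datasets, key=lambda d: d[2] / d[1],
                                   reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

datasets = [("voter_list_A", 100, 0.30),   # hypothetical resources
            ("marketing_db", 250, 0.45),
            ("breach_dump",   50, 0.20)]
print(prioritize_datasets(datasets, budget=200))
# -> (['breach_dump', 'voter_list_A'], 150.0)
```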
- Published
- 2017
- Full Text
- View/download PDF
40. SGX-BigMatrix
- Author
-
Fahad Shaon, Zhiqiang Lin, Murat Kantarcioglu, and Latifur Khan
- Subjects
SQL ,Theoretical computer science ,business.industry ,Computer science ,Distributed computing ,Big data ,Cloud computing ,02 engineering and technology ,Trusted Computing ,Python (programming language) ,Encryption ,Information sensitivity ,High-level programming language ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,business ,computer ,computer.programming_language - Abstract
Recently, using secure processors for trusted computing in the cloud has attracted a lot of attention. Over the past few years, efficient and secure data analytic tools (e.g., map-reduce frameworks, machine learning models, and SQL querying) that can be executed over encrypted data using trusted hardware have been developed. However, these prior efforts do not provide a simple, secure, high-level-language-based framework suitable for enabling generic data analytics for non-security experts who are unfamiliar with concepts such as "oblivious execution". In this paper, we thus provide such a framework that allows data scientists to perform data analytic tasks with secure processors using a Python/Matlab-like high level language. Our framework automatically compiles programs written in our language into optimal execution code by managing issues such as optimal data block sizes for I/O, vectorized computations to simplify much of the data processing, and optimal ordering of operations for certain tasks. Furthermore, many language constructs, such as if-statements, are removed so that a non-expert user is less likely to create code that reveals sensitive information, while still allowing oblivious data processing (i.e., hiding access patterns). Using these design choices, we provide guarantees for efficient and secure data analytics. We show that our framework can be used to run the existing big data benchmark queries over encrypted data using Intel SGX efficiently. Our empirical results indicate that our proposed framework is orders of magnitude faster than general oblivious execution alternatives.
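A flavor of the compilation target can be given with a block-wise matrix multiply whose block schedule depends only on matrix shapes, never on contents, so the sequence of block accesses leaks nothing about the data. This is a plain NumPy sketch of the idea, not SGX-BigMatrix itself.

```python
import numpy as np

def blocked_matmul(A, B, block=256):
    """Block-wise matrix multiply with a fixed, input-independent block
    schedule, in the spirit of compiling high-level matrix code into
    enclave-friendly execution (block sizes tuned for I/O, no
    data-dependent branching). A sketch, not the paper's system.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    # Loop bounds depend only on the matrix shapes, so the order of
    # block accesses reveals nothing about the matrix contents.
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i+block, j:j+block] += (
                    A[i:i+block, p:p+block] @ B[p:p+block, j:j+block])
    return C

A, B = np.random.rand(512, 300), np.random.rand(300, 400)
print(np.allclose(blocked_matmul(A, B, block=128), A @ B))   # True
```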
- Published
- 2017
- Full Text
- View/download PDF
41. From Myths to Norms: Demystifying Data Mining Models with Instance-Based Transparency
- Author
-
Yan Zhou, Murat Kantarcioglu, Yasmeen Alufaisan, and Bhavani Thuraisingham
- Subjects
Computer science ,media_common.quotation_subject ,Law enforcement ,02 engineering and technology ,computer.software_genre ,Transparency (behavior) ,Popularity ,Consistency (database systems) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Quality (business) ,Simplicity ,Data mining ,Set (psychology) ,computer ,media_common ,Test data - Abstract
The desire to move from data to intelligence has become a trend that pushes the world we live in today fast forward. Machine learning and data mining techniques are being used as important tools to unlock the wealth of voluminous amounts of data owned by organizations. Despite existing efforts to explain their underlying machinery in layman's terms, data mining models and their output remain esoteric, discipline-based black boxes, viable only to experts with years of training and development experience. As data mining techniques gain growing popularity in the real world, the ability to understand their decision-making artifacts has become increasingly important and critical, especially in areas such as criminal justice and law enforcement, where transparency of decision-making is vital for ensuring fairness, justice, and equality. In this paper, we present a transparency model that helps unmask the incomprehensible reasoning for which many data mining techniques are deservedly taking the blame. Our transparency model substitutes a comprehensible, rule-based counterpart for the complex, black-box output of any data mining technique using a novel rule selection technique. The rule-based substitute explains the decision made for each instance with a tiny set of rules, resulting in a significant reduction in model complexity. Besides model simplicity and comprehensibility, we also assess the quality of our rule set by measuring its similarity to the output of the original data mining algorithm. Furthermore, we compute its accuracy on unseen test data as a complementary assessment criterion. We empirically demonstrate the effectiveness of our transparency model by experimenting on eight real datasets that deal with predicting important personal attributes, ranging from credit worthiness to criminal recidivism. Our transparency model demonstrates a high degree of consistency with the original data mining algorithms in nearly all cases. We also compare our results to a state-of-the-art transparency model, LIME, and show that our transparency model outperforms LIME 84% of the time.
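The instance-level substitution can be pictured as follows: from a pool of candidate rules, keep only the simplest ones that cover the instance and agree with the black-box prediction for it. The rule pool and attributes below are invented, and the selection is far cruder than the paper's technique.

```python
def explain(instance, prediction, rule_pool, max_rules=3):
    """Pick up to max_rules simple rules that cover the instance and
    agree with the black-box prediction. rule_pool: list of
    (conditions, label) pairs, where conditions maps attribute -> value.
    A toy stand-in for the paper's rule selection technique."""
    covering = [(cond, label) for cond, label in rule_pool
                if label == prediction
                and all(instance.get(a) == v for a, v in cond.items())]
    covering.sort(key=lambda r: len(r[0]))       # prefer simpler rules
    return covering[:max_rules]

# Hypothetical rule pool for a recidivism-style prediction task.
rule_pool = [
    ({"priors": "none", "age": "<25"}, "low_risk"),
    ({"priors": "none"}, "low_risk"),
    ({"priors": ">3"}, "high_risk"),
]
instance = {"priors": "none", "age": "<25"}
print(explain(instance, "low_risk", rule_pool))
# -> the one-condition rule first, then the two-condition rule
```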
- Published
- 2017
- Full Text
- View/download PDF
42. Security and privacy trade-offs in CPS by leveraging inherent differential privacy
- Author
-
Jairo Giraldo, Alvaro A. Cardenas, and Murat Kantarcioglu
- Subjects
Information privacy ,Engineering ,Leverage (finance) ,business.industry ,Privacy software ,Trade offs ,External noise ,Computer security ,computer.software_genre ,Bilevel optimization ,Control system ,Differential privacy ,business ,computer - Abstract
Cyber-physical systems are subject to natural uncertainties and sensor noise that can be amplified or attenuated by feedback. In this work, we leverage these properties to define the inherent differential privacy of feedback-control systems without the addition of external differential privacy noise. If stronger privacy is required, we introduce a methodology for adding an external differential privacy mechanism that injects the minimum amount of noise needed. We also show how the combination of inherent and external noise affects system security, in terms of the impact that integrity attacks can impose on the system while remaining undetected. We formulate a bilevel optimization problem to redesign the control parameters so as to minimize the attack impact for a desired level of inherent privacy.
- Published
- 2017
- Full Text
- View/download PDF
43. A Cyber-Provenance Infrastructure for Sensor-Based Data-Intensive Applications
- Author
-
Elisa Bertino and Murat Kantarcioglu
- Subjects
021110 strategic, defence & security studies ,Provenance ,business.industry ,Computer science ,Interoperability ,0211 other engineering and technologies ,02 engineering and technology ,Encryption ,Computer security ,computer.software_genre ,Data modeling ,Public-key cryptography ,020204 information systems ,Management system ,0202 electrical engineering, electronic engineering, information engineering ,business ,Host (network) ,computer ,Wireless sensor network - Abstract
This paper focuses on cyber infrastructures composed of sensor networks and conventional host-based systems and applications. In such infrastructures, assuring data trustworthiness is critical and requires the acquisition and recording of provenance information. The paper discusses research directions to be addressed in developing effective provenance management systems, including: services for provenance queries, provenance compression and encryption in sensor-based networks, and provenance interoperability and security in host-based systems.
- Published
- 2017
- Full Text
- View/download PDF
44. Hacking social network data mining
- Author
-
Bhavani Thuraisingham, Yasmeen Alufaisan, Yan Zhou, and Murat Kantarcioglu
- Subjects
021110 strategic, defence & security studies ,Measure (data warehouse) ,Social network ,Computer science ,business.industry ,Reliability (computer networking) ,Internet privacy ,0211 other engineering and technologies ,02 engineering and technology ,computer.software_genre ,Data modeling ,Harm ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Sexual orientation ,Spite ,Data mining ,business ,computer ,Hacker - Abstract
Over the years, social network data has been mined to predict individual traits such as intelligence and sexual orientation. While mining social network data can provide many beneficial services to the user, such as personalized experiences, it can also harm the user when used in making critical decisions such as employment. In this work, we investigate the reliability of applying data mining techniques to social network data to predict various individual traits. Despite the preliminary success of such applications, we demonstrate the vulnerabilities of existing state-of-the-art social network data mining techniques when they face malicious attacks. Our results indicate that making critical decisions, such as employment or credit approval, based solely on social network data mining results is still premature at this stage. Specifically, we explore Facebook likes data for predicting the traits of a Facebook user, including their political views and sexual orientation. We perform several types of malicious attacks on the predictive models to measure and understand their potential vulnerabilities. We find that existing predictive models built on social network data can be easily manipulated, and we suggest countermeasures to prevent some of the proposed attacks.
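A minimal version of such a manipulation attack, under assumed synthetic data: train a linear model on a binary user-likes matrix, then greedily flip the likes that most push an individual's score toward the other class. This evasion toy is illustrative only; the paper's attacks operate under more realistic constraints.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 50)).astype(float)   # users x "likes"
w_true = rng.normal(size=50)
y = (X @ w_true > np.median(X @ w_true)).astype(int)   # synthetic trait
model = LogisticRegression(max_iter=1000).fit(X, y)

def evade(model, x, max_flips=5):
    """Greedily flip the likes that most move the linear score toward
    the opposite class; stop once the prediction changes."""
    x = x.copy()
    original = int(model.predict(x.reshape(1, -1))[0])
    w = model.coef_[0]
    sign = -1.0 if original == 1 else 1.0
    gain = sign * w * (1 - 2 * x)   # score change from flipping each bit
    for j in np.argsort(-gain)[:max_flips]:
        if gain[j] <= 0:
            break                   # no remaining helpful flip
        x[j] = 1 - x[j]
        if int(model.predict(x.reshape(1, -1))[0]) != original:
            break
    return x

x_adv = evade(model, X[0])
print(int(model.predict(X[0].reshape(1, -1))[0]), "->",
      int(model.predict(x_adv.reshape(1, -1))[0]))
```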
- Published
- 2017
- Full Text
- View/download PDF
45. Controlling the signal: Practical privacy protection of genomic data sharing through Beacon services
- Author
-
Zhiyu Wan, Yevgeniy Vorobeychik, Bradley A. Malin, and Murat Kantarcioglu
- Subjects
0301 basic medicine ,lcsh:Internal medicine ,Service (systems architecture) ,lcsh:QH426-470 ,Computer science ,iDASH challenge ,Population ,030105 genetics & heredity ,computer.software_genre ,03 medical and health sciences ,Gene Frequency ,Chromosome (genetic algorithm) ,Genetics ,Statistical inference ,Humans ,Beacon service ,lcsh:RC31-1245 ,education ,Computer Security ,Genetics (clinical) ,Vulnerability (computing) ,education.field_of_study ,Data custodian ,Information Dissemination ,Research ,Genomics ,Perturbation ,Data sharing ,lcsh:Genetics ,030104 developmental biology ,Privacy ,Data mining ,Web service ,computer ,Genomic databases - Abstract
Background: Genomic data is increasingly collected by a wide array of organizations. As such, there is a growing demand to make summary information about such collections more widely available. However, over the past decade, a series of investigations has shown that attacks, rooted in statistical inference methods, can be applied to discern the presence of a known individual's DNA sequence in the pool of subjects. Recently, it was shown that the Beacon Project of the Global Alliance for Genomics and Health, a web service for querying the presence (or absence) of a specific allele, was vulnerable. The Integrating Data for Analysis, Anonymization, and Sharing (iDASH) Center modeled a track in their third Privacy Protection Challenge on how to mitigate the Beacon vulnerability. We developed the winning solution for this track. Methods: This paper describes our computational method to optimize the tradeoff between the utility and the privacy of the Beacon service. We generalize the genomic data sharing problem beyond that which was introduced in the iDASH Challenge to be more representative of real-world scenarios, allowing for a more comprehensive evaluation. We then conduct a sensitivity analysis of our method with respect to several state-of-the-art methods using a dataset of 400,000 positions in Chromosome 10 for 500 individuals from Phase 3 of the 1000 Genomes Project. All methods are evaluated for utility, privacy, and efficiency. Results: Our method achieves better performance than all state-of-the-art methods, irrespective of how key factors (e.g., the allele frequency in the population, the size of the pool, and utility weights) change from the original parameters of the problem. We further illustrate that it is possible for our method to exhibit subpar performance under special cases of allele query sequences. However, we show our method can be extended to address this issue when the query sequence is fixed and known a priori to the data custodian, so that they may stage their responses accordingly. Conclusions: This research shows that it is possible to thwart the attack on Beacon services, without substantially altering the utility of the system, using computational methods. The method we initially developed is limited by the design of the scenario and evaluation protocol for the iDASH Challenge; however, it can be improved by allowing the data custodian to act in a staged manner.
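The flavor of the defense can be conveyed with a toy response policy that sometimes suppresses "yes" answers for rare alleles, which carry the most re-identification signal. The cutoff and flip probability below are arbitrary placeholders; the paper's contribution is precisely to optimize this utility/privacy trade-off rather than fix it by hand.

```python
import random

def beacon_answer(allele_present, allele_freq, freq_cutoff=0.05,
                  flip_prob=0.3, rng=None):
    """Answer a Beacon presence query, randomly suppressing 'yes'
    answers for rare alleles. Thresholds are illustrative only; the
    paper derives the response strategy by optimization."""
    rng = rng or random.Random()
    if (allele_present and allele_freq < freq_cutoff
            and rng.random() < flip_prob):
        return False        # deliberately hide a rare, high-signal allele
    return allele_present

# Common alleles are answered truthfully; rare ones are sometimes hidden.
print(beacon_answer(True, allele_freq=0.40))   # always True
print(beacon_answer(True, allele_freq=0.01))   # True or False
```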
- Published
- 2017
- Full Text
- View/download PDF
46. Towards a Framework for Developing Cyber Privacy Metrics: A Vision Paper
- Author
-
Chris Clifton, Murat Kantarcioglu, Bhavani Thuraisingham, and Elisa Bertino
- Subjects
Information privacy ,Privacy by Design ,Privacy software ,business.industry ,Computer science ,Internet privacy ,020206 networking & telecommunications ,Cryptography ,02 engineering and technology ,Computer security ,computer.software_genre ,0202 electrical engineering, electronic engineering, information engineering ,Data Protection Act 1998 ,020201 artificial intelligence & image processing ,business ,computer - Abstract
Despite the many techniques, tools, and systems for providing data privacy, there is no general methodology for determining the extent to which these techniques, tools, and systems reduce practical privacy risks. We need a comprehensive framework in which the privacy and utility of multiple privacy-preserving techniques can be measured. This vision paper provides directions for designing such a framework.
- Published
- 2017
- Full Text
- View/download PDF
47. Security vs. privacy: How integrity attacks can be masked by the noise of differential privacy
- Author
-
Jairo Giraldo, Alvaro A. Cardenas, and Murat Kantarcioglu
- Subjects
0209 industrial biotechnology ,Information privacy ,Computer science ,Privacy software ,business.industry ,Data_MISCELLANEOUS ,Internet privacy ,02 engineering and technology ,Computer security ,computer.software_genre ,Noise ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Differential privacy ,020201 artificial intelligence & image processing ,business ,computer - Abstract
Privacy concerns have increased in recent years due to the unprecedented scale of data collected about human activity. Differential privacy has emerged in the last decade as an important mechanism for ensuring privacy by adding random noise with a specific distribution to the information being collected (e.g., to smart meter and sensor readings). Differential privacy has mainly been used in private databases, but lately it has also been extended to applications such as estimation, consensus algorithms, and control of dynamical systems.
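For reference, the standard epsilon-differentially-private Laplace mechanism adds noise with scale sensitivity/epsilon to each released value; the abstract's point is that this expected perturbation also gives an attacker room to hide small integrity attacks. A minimal sketch, with an illustrative sensor reading:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Standard Laplace mechanism for epsilon-differential privacy:
    the noise scale is sensitivity / epsilon. Smaller epsilon means
    more privacy, more noise, and more room for an attacker to mask
    a small integrity attack inside the expected perturbation."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(0.0, sensitivity / epsilon)

reading = 230.0                  # e.g., a smart-meter measurement
private = laplace_mechanism(reading, sensitivity=1.0, epsilon=0.5)
print(private)                   # truth obscured by noise of scale 2.0
```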
- Published
- 2017
- Full Text
- View/download PDF
48. Securing Data Analytics on SGX with Randomization
- Author
-
Zhiqiang Lin, Bhavani Thuraisingham, Vishal Karande, Swarup Chandra, Murat Kantarcioglu, and Latifur Khan
- Subjects
Information privacy ,business.industry ,Computer science ,Data security ,020206 networking & telecommunications ,Cloud computing ,02 engineering and technology ,Computer security ,computer.software_genre ,Analytics ,0202 electrical engineering, electronic engineering, information engineering ,Data analysis ,Overhead (computing) ,Dummy data ,020201 artificial intelligence & image processing ,Data mining ,business ,Cluster analysis ,computer - Abstract
Protection of data privacy and prevention of unwarranted information disclosure are enduring challenges in cloud computing when data analytics is performed on an untrusted third-party resource. Recent advances in trusted processor technology, such as Intel SGX, have rejuvenated efforts to perform data analytics on a shared platform where data security and the trustworthiness of computations are ensured by the hardware. However, a powerful adversary may still be able to infer private information in this setting from side channels such as cache accesses, CPU usage, and other timing channels, thereby threatening data and user privacy. Though studies have proposed techniques to hide such information leaks through carefully designed data-independent access paths, these techniques can be prohibitively slow on models with large numbers of parameters, especially when employed in a real-time analytics application. In this paper, we introduce a defense strategy that achieves higher computational efficiency with a small trade-off in privacy protection. In particular, we study a strategy that adds noise to the traces of memory access observed by an adversary through the use of dummy data instances. We quantitatively measure the privacy guarantee and empirically demonstrate the effectiveness and limitations of this randomization strategy using classification and clustering algorithms. Our results show a significant reduction in execution time overhead on real-world data sets when compared to a defense strategy using only data-oblivious mechanisms.
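The randomization idea can be sketched as padding each real batch with dummy instances and shuffling, so the count and order of processed records observed through a side channel is independent of the real workload. The names and padding policy below are illustrative, not the paper's mechanism.

```python
import random

def pad_with_dummies(batch, dummy_factory, pad_to, rng=None):
    """Interleave real records with dummies and shuffle, so an observer
    of memory-access traces always sees pad_to records in random order.
    A toy version of the randomization strategy; the paper quantifies
    the privacy this buys against access-pattern side channels."""
    rng = rng or random.Random()
    dummies = [dummy_factory() for _ in range(pad_to - len(batch))]
    padded = [(rec, True) for rec in batch] + [(d, False) for d in dummies]
    rng.shuffle(padded)
    return padded   # process everything; discard results flagged False

real = [{"x": 1.0}, {"x": 2.5}]
padded = pad_with_dummies(real, lambda: {"x": 0.0}, pad_to=8)
print(len(padded), sum(is_real for _, is_real in padded))   # 8 2
```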
- Published
- 2017
- Full Text
- View/download PDF
49. Privacy and Security Challenges in GIS
- Author
-
Murat Kantarcioglu, Ashraful Alam, Latifur Khan, Bhavani Thuraisingham, and Ganesh Subbiah
- Subjects
Security analysis ,Information privacy ,Cloud computing security ,Computer science ,business.industry ,Privacy software ,Internet privacy ,Computer security ,computer.software_genre ,business ,computer
- Published
- 2017
- Full Text
- View/download PDF
50. Gaussian Mixture Models for Classification and Hypothesis Tests Under Differential Privacy
- Author
-
Bowei Xi, Ali Inan, Murat Kantarcioglu, and Xiaosu Tong
- Subjects
business.industry ,Computer science ,Statistical model ,Pattern recognition ,Mixture model ,computer.software_genre ,Statistical database ,Differential privacy ,Z-test ,Artificial intelligence ,Data mining ,business ,computer ,Computer Science::Databases ,Student's t-test ,Statistical hypothesis testing ,Type I and type II errors - Abstract
Many statistical models are constructed using very basic statistics: mean vectors, variances, and covariances. Gaussian mixture models are such models. When a data set contains sensitive information and cannot be directly released to users, such models can easily be constructed from noise-added query responses, and they nonetheless provide preliminary results to users. Although the queried basic statistics meet the differential privacy guarantee, the complex models constructed using these statistics may not. It is, however, up to users to decide how to query a database and how to further utilize the queried results. In this article, our goal is to understand the impact of the differential privacy mechanism on Gaussian mixture models. Our approach involves querying basic statistics from a database under differential privacy protection and using the noise-added responses to build classifiers and perform hypothesis tests. We discover that adding Laplace noise may have a non-negligible effect on model outputs; for example, the variance-covariance matrix after noise addition may no longer be positive definite. We propose a heuristic algorithm to repair the noise-added variance-covariance matrix. We then examine the classification error using the noise-added responses, through experiments with both simulated and real-life data, and demonstrate under which conditions the impact of the added noise can be reduced. We compute the exact type I and type II errors under differential privacy for the one-sample z test, the one-sample t test, and the two-sample t test with equal variances. We then show under which conditions a hypothesis test returns a reliable result given differentially private means, variances, and covariances.
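The repair problem can be illustrated end to end: add Laplace noise to the queried mean and covariance, re-symmetrize, and clip negative eigenvalues so the matrix is again usable in a Gaussian model. The noise scales below are simplified per-coordinate choices rather than a tight differential privacy accounting, and the eigenvalue clipping is a generic repair, not necessarily the paper's heuristic.

```python
import numpy as np

def dp_mean_cov(X, epsilon, bound=1.0, rng=None):
    """Laplace-noised mean and covariance for data assumed to lie in
    [-bound, bound], splitting the budget between the two queries.
    Noise scales are illustrative per-coordinate values, not a full
    accounting; the PSD repair clips negative eigenvalues."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    eps = epsilon / 2                            # half the budget per query
    mean = X.mean(axis=0) + rng.laplace(0, (2 * bound / n) / eps, size=d)
    cov = np.cov(X, rowvar=False) + rng.laplace(
        0, (2 * bound ** 2 / n) / eps, size=(d, d))
    cov = (cov + cov.T) / 2                      # re-symmetrize
    # Repair: after noise addition the matrix may have negative
    # eigenvalues; clip them so it is positive definite again.
    w, V = np.linalg.eigh(cov)
    cov_psd = V @ np.diag(np.clip(w, 1e-6, None)) @ V.T
    return mean, cov_psd

X = np.clip(np.random.default_rng(1).normal(0, 0.3, (500, 3)), -1, 1)
mean, cov = dp_mean_cov(X, epsilon=1.0)
print(np.linalg.eigvalsh(cov).min() > 0)   # True: usable in a Gaussian model
```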
- Published
- 2017
- Full Text
- View/download PDF