Author: "Farough Ashkouti" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Farough Ashkouti" returned 3 results.


1. A distributed computing model for big data anonymization in the networks.

Author: Farough Ashkouti and Keyhan Khamforoosh
Subjects: Medicine, Science
Abstract: Recently, big data and its applications have grown sharply in fields such as IoT, bioinformatics, e-commerce, and social media. The huge volume of data poses enormous challenges to the architecture, infrastructure, and computing capacity of IT systems. The scientific and industrial communities therefore have a compelling need for large-scale, robust computing systems. Since one of the characteristics of big data is value, data should be published so that analysts can extract useful patterns from it. However, data publishing may lead to the disclosure of individuals' private information. Among modern parallel computing platforms, Apache Spark is a fast, in-memory computing framework for large-scale data processing that provides high scalability through resilient distributed datasets (RDDs). Owing to in-memory computation, it can be up to 100 times faster than Hadoop. Apache Spark is therefore one of the essential frameworks for implementing distributed methods for privacy-preserving big data publishing (PPBDP). This paper uses RDD programming in Apache Spark to propose an efficient parallel implementation of a new computing model for big data anonymization. The model performs three phases of in-memory computation to address the runtime, scalability, and performance of large-scale data anonymization. It supports partition-based data clustering algorithms that preserve the ℓ-diversity privacy model by using transformations and actions on RDDs. The authors have thus investigated a Spark-based implementation that preserves the ℓ-diversity privacy model using two purpose-designed distance functions, City block and Pearson. The results of the paper provide a comprehensive guideline that allows researchers to apply Apache Spark in their own research.
Published: 2023
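The abstract above names City block and Pearson distance functions and the ℓ-diversity privacy model without defining them. The following is a minimal sketch of the textbook forms only, assuming numeric quasi-identifier vectors and the simplest "distinct ℓ-diversity" variant; the paper's own purpose-designed distance functions may differ, and the function names are hypothetical. In a Spark job, functions like these would typically be passed into RDD transformations such as `map`.

```python
import math


def city_block(a, b):
    """City block (Manhattan, L1) distance between two numeric records."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """Correlation distance 1 - r, where r is the Pearson coefficient.

    Ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if std_a == 0 or std_b == 0:
        raise ValueError("Pearson correlation is undefined for constant vectors")
    return 1.0 - cov / (std_a * std_b)


def is_l_diverse(sensitive_values, l):
    """Distinct l-diversity: the group holds at least l distinct sensitive values."""
    return len(set(sensitive_values)) >= l
```

For example, `city_block([30, 50000], [35, 52000])` returns 2005, which shows why attributes are usually normalized first: without scaling, the larger-valued attribute dominates the distance.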

2. DHkmeans-ℓdiversity: distributed hierarchical K-means for satisfaction of the ℓ-diversity privacy model using Apache Spark

Author: Amir Sheikhahmadi, Hana Khamfroush, Farough Ashkouti, and Keyhan Khamforoosh
Subjects: Data anonymization, Computer science, Big data, k-means clustering, Data publishing, Theoretical Computer Science, Hardware and Architecture, Spark (mathematics), Scalability, Data mining, Cluster analysis, Personally identifiable information, Software, Information Systems
Abstract: One of the main steps in the data lifecycle is publishing data so that analysts can discover hidden patterns. However, data publishing may lead to unwanted disclosure of personal information and cause privacy problems. Data anonymization techniques preserve privacy models to prevent the disclosure of individuals' private information in published data. In this paper, a distributed in-memory method is proposed on the Apache Spark framework to preserve the ℓ-diversity privacy model. The method anonymizes large-scale data in a three-phase process: seed selection, data clustering for ℓ-diversity, and a finalizing phase. A hierarchical k-means-based data clustering algorithm has been designed for data anonymization. One of the major challenges of anonymization methods is establishing a better trade-off between data utility and privacy. Therefore, to calculate the distance between records and form more cohesive ℓ-diverse clusters, the authors have designed two distance functions, one Manhattan-based and one Euclidean-based, that satisfy the requirements of the ℓ-diversity model. Given that Spark can be up to 100 times faster than MapReduce, the proposed method uses in-memory RDD programming in Apache Spark to address the runtime, scalability, and performance problems of large-scale data anonymization found in previous MapReduce-based algorithms. The method provides general knowledge on using Spark's parallel in-memory computation for big data anonymization. In experiments, it achieves lower information loss and loses about 1% to 2% in accuracy and F-measure; it therefore establishes a better trade-off than state-of-the-art MapReduce-based Mondrian methods.
Published: 2021
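To make the idea of hierarchical k-means under an ℓ-diversity constraint concrete, here is a minimal single-machine sketch using bisecting 2-means with Euclidean distance: a cluster is split in two only if both halves remain ℓ-diverse. This is a simplified stand-in, not the authors' DHkmeans algorithm; it omits their seed-selection and finalizing phases and their Manhattan-based distance. Records are assumed to be `(quasi_identifier_vector, sensitive_value)` pairs, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def two_means(records, iters=10):
    """Split records into two groups with 2-means on the quasi-identifiers."""
    pts = [qi for qi, _ in records]
    # Deterministic seeds: the first record and the record farthest from it.
    c1 = pts[0]
    c2 = max(pts, key=lambda p: euclidean(p, c1))
    left, right = records, []
    for _ in range(iters):
        left = [r for r in records if euclidean(r[0], c1) <= euclidean(r[0], c2)]
        right = [r for r in records if euclidean(r[0], c1) > euclidean(r[0], c2)]
        if not left or not right:
            break  # degenerate split (e.g. all points identical)
        c1 = centroid([qi for qi, _ in left])
        c2 = centroid([qi for qi, _ in right])
    return left, right


def distinct(records):
    """Number of distinct sensitive values in a group of records."""
    return len({s for _, s in records})


def hierarchical_l_diverse(records, l):
    """Recursively bisect while both halves stay l-diverse; return leaf clusters."""
    left, right = two_means(records)
    if left and right and distinct(left) >= l and distinct(right) >= l:
        return hierarchical_l_diverse(left, l) + hierarchical_l_diverse(right, l)
    return [records]
```

On six records forming two well-separated groups of three, each group with sensitive values `flu`, `HIV`, and `cold`, calling `hierarchical_l_diverse(records, 2)` yields two clusters of three records: any further split would leave a singleton that is not 2-diverse. Each leaf cluster could then be generalized to its value ranges for publication.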

3. DI-Mondrian: Distributed improved Mondrian for satisfaction of the L-diversity privacy model using Apache Spark

Author: Keyhan Khamforoosh, Farough Ashkouti, and Amir Sheikhahmadi
Subjects: Information privacy, Information Systems and Management, Data anonymization, Computer science, Social sciences, Education, Engineering and technology, Mondrian, Computer Science Applications, Theoretical Computer Science, Privacy model, Artificial Intelligence, Control and Systems Engineering, Spark (mathematics), Electrical engineering, electronic engineering, information engineering, Artificial intelligence & image processing, Data mining, Classifier (UML), Software
Abstract: To extract useful patterns, collected data must be distributed to and shared with analysts. This, however, creates problems and challenges for individuals with respect to their privacy and identity. In this paper, the Mondrian multidimensional anonymization method was extended and improved to satisfy the ℓ-diversity privacy model, and it is presented in a distributed fashion within the Apache Spark framework. Since one of the major challenges in data privacy is the trade-off between privacy and data utility, the presented method focuses on information loss and classifier evaluation criteria. The cut dimension is therefore selected using the coefficient of variation and information gain criteria, and the cut points are chosen dynamically. This decreases information loss and improves classifier evaluation criteria such as accuracy and F-measure compared with previous algorithms in the literature. Because Spark can process data up to 100 times faster than Hadoop, the proposed method is implemented in a distributed fashion using RDD programming within the Apache Spark framework. This resolves the speed problem in large-scale data anonymization that exists in previous Hadoop-based algorithms. Experiments on numerical datasets demonstrate the improvements made by the proposed method.
Published: 2021
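The core of a Mondrian-style method for ℓ-diversity can be sketched as top-down recursive partitioning: pick a cut dimension, split at a cut point, and accept the cut only if both halves remain ℓ-diverse. The minimal sketch below ranks dimensions by coefficient of variation (standard deviation divided by the absolute mean), as the abstract describes, but it is a simplification: it ignores the information gain criterion, uses a static median cut instead of the paper's dynamically chosen cut points, and runs on a single machine rather than on Spark RDDs. Records are assumed to be a non-empty list of `(quasi_identifier_tuple, sensitive_value)` pairs, and the function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean (0 if mean is 0)."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / abs(mean)


def is_l_diverse(records, l):
    """Distinct l-diversity over the sensitive values of a partition."""
    return len({s for _, s in records}) >= l


def mondrian(records, l):
    """Top-down Mondrian partitioning that keeps every partition l-diverse."""
    dims = range(len(records[0][0]))
    # Try dimensions in decreasing order of coefficient of variation.
    order = sorted(
        dims,
        key=lambda d: coefficient_of_variation([qi[d] for qi, _ in records]),
        reverse=True,
    )
    for d in order:
        values = sorted(qi[d] for qi, _ in records)
        cut = statistics.median_low(values)  # static median cut (simplification)
        left = [r for r in records if r[0][d] <= cut]
        right = [r for r in records if r[0][d] > cut]
        # An "allowable" cut leaves both halves non-empty and l-diverse.
        if left and right and is_l_diverse(left, l) and is_l_diverse(right, l):
            return mondrian(left, l) + mondrian(right, l)
    return [records]  # no allowable cut: this partition is final
```

With four records whose first attribute is 25, 27, 45, and 47 (sensitive values `flu`, `HIV`, `flu`, `cold`) and a constant second attribute, `mondrian(records, 2)` cuts once on the first attribute and returns two partitions, `{25, 27}` and `{45, 47}`. A constant attribute has a coefficient of variation of 0, so it is tried last.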
