26 results on '"Chen, Jinjun"'
Search Results
2. Integrating Collaborative Filtering and Association Rule Mining for Market Basket Recommendation
- Author
-
Wang, Feiran, Wen, Yiping, Chen, Jinjun, Cao, Buqing, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Hacid, Hakim, editor, Cellary, Wojciech, editor, Wang, Hua, editor, Paik, Hye-Young, editor, and Zhou, Rui, editor
- Published
- 2018
- Full Text
- View/download PDF
3. Personalized Recommendation System Based on Collaborative Filtering for IoT Scenarios.
- Author
-
Cui, Zhihua, Xu, Xianghua, Xue, Fei, Cai, Xingjuan, Cao, Yang, Zhang, Wensheng, and Chen, Jinjun
- Abstract
Recommendation technology is an important part of Internet of Things (IoT) services: it can provide better service for users and help them find information anytime, anywhere. However, traditional recommendation algorithms cannot meet users' requirements for fast and accurate recommendations in the IoT environment. In the face of large-volume data, finding a neighborhood by comparing whole user information results in low recommendation efficiency. In addition, traditional recommendation systems ignore the inherent connection between user preference and time; in reality, user interests vary over time, so a recommendation system should remain accurate and fast as time changes. To address this, we propose a novel recommendation model, called TCCF, based on a time correlation coefficient and an improved K-means with cuckoo search (CSK-means). The clustering method groups similar users together for quick and accurate recommendation. Moreover, an effective and personalized recommendation model based on preference pattern (PTCCF) is designed to improve the quality of TCCF: it provides higher-quality recommendations by analyzing user behaviors. Extensive experiments are conducted on two real datasets, MovieLens and Douban, and the precision of our model improves by about 5.2 percent compared with the MCoC model. Systematic experimental results demonstrate that our models TCCF and PTCCF are effective for IoT scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
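The TCCF model above combines a time correlation coefficient with clustering; the exact coefficient is not given in this listing, so the sketch below is only a generic illustration of time-weighted user similarity (the exponential half-life decay and the `{item: (rating, timestamp)}` format are assumptions of this illustration, not the paper's formula):

```python
import math

def time_weighted_similarity(ratings_a, ratings_b, now, half_life=30.0):
    """Cosine-style similarity over co-rated items, with each rating
    weighted by an exponential time decay so older ratings count less."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    num = den_a = den_b = 0.0
    for item in common:
        ra, ta = ratings_a[item]
        rb, tb = ratings_b[item]
        wa = math.exp(-math.log(2) * (now - ta) / half_life)
        wb = math.exp(-math.log(2) * (now - tb) / half_life)
        num += (ra * wa) * (rb * wb)
        den_a += (ra * wa) ** 2
        den_b += (rb * wb) ** 2
    return num / math.sqrt(den_a * den_b)
```

Users whose recent ratings agree score higher than users who only agreed long ago, which is the time-awareness the abstract argues traditional collaborative filtering lacks.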
4. A Scalable Multi-Data Sources Based Recursive Approximation Approach for Fast Error Recovery in Big Sensing Data on Cloud.
- Author
-
Yang, Chi, Xu, Xianghua, Ramamohanarao, Kotagiri, and Chen, Jinjun
- Subjects
BIG data ,EUCLIDEAN distance ,SAMPLING errors ,ALGORITHMS ,CLOUD storage ,TIME series analysis ,FALSIFICATION of data - Abstract
Big sensing data is commonly encountered in various surveillance and sensing systems. Sampling and transfer errors arise during each stage of sensing data processing, and recovering from these errors with accuracy and efficiency is quite challenging because of the high sensing data volume and the unrepeatable wireless communication environment. While Cloud provides a promising platform for processing big sensing data, scalable and accurate error recovery solutions are still needed. In this paper, we propose a novel approach to achieve fast error recovery in a scalable manner on cloud. The approach predicts a recovery replacement value through an approximation based on multiple data sources. The approximation process uses coverage information carried by data units to confine the algorithm to a small cluster of sensing data instead of the whole data spectrum. Specifically, within each sensing data cluster, a Euclidean distance based approximation is proposed to calculate a time series prediction, and a detected error is then recovered with the predicted data value. Through experiments with real-world meteorological data sets on cloud, we demonstrate that the proposed error recovery approach achieves high accuracy in data approximation to replace the original data error. At the same time, with a MapReduce based implementation for scalability, the experimental results also show significant time savings. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
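The recovery idea above confines a Euclidean distance based approximation to a small cluster of sensing data. A minimal sketch of that idea, assuming plain numeric series and a fixed look-back window (both choices of this illustration, not details from the paper):

```python
def recover_error(series_with_error, neighbor_series, error_idx, window=5):
    """Replace the erroneous reading at error_idx with a prediction taken
    from the neighbor series that is closest (Euclidean distance) over
    the preceding window of readings."""
    lo = max(0, error_idx - window)
    ref = series_with_error[lo:error_idx]
    best, best_dist = None, float("inf")
    for s in neighbor_series:
        seg = s[lo:error_idx]
        dist = sum((a - b) ** 2 for a, b in zip(ref, seg)) ** 0.5
        if dist < best_dist:
            best, best_dist = s, dist
    return best[error_idx]  # predicted replacement value
```

Because only the neighbors inside one data cluster are scanned, the cost stays proportional to the cluster size rather than the whole data spectrum.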
5. Multivariate Multi-Order Markov Multi-Modal Prediction With Its Applications in Network Traffic Management.
- Author
-
Liu, Huazhong, Yang, Laurence T., Chen, Jinjun, Ye, Minghao, Ding, Jihong, and Kuang, Liwei
- Abstract
Predicting future network traffic through big data analysis technologies has been one of the important preoccupations of network design and management. Combining Markov chains with tensors to implement predictions has received considerable attention in the era of big data. However, when dealing with multi-order Markov models, existing approaches, including the combination of states and Z-eigen decomposition, still face some shortcomings. This paper therefore proposes a novel multivariate multi-order Markov transition to realize multi-modal accurate predictions. First, we put forward two new tensor operations: tensor join and the unified product (UP). Then a general multivariate multi-order (2M) Markov model with its UP-based state transition is proposed. Afterwards, we develop a multi-step transition tensor for 2M Markov models to implement the multi-step state transition. Furthermore, a UP-based power method is proposed to calculate the stationary joint probability distribution tensor (i.e., the stationary joint eigentensor, SJE) and realize SJE-based multi-modal accurate predictions. Finally, a series of experiments under various Markov models on real-world network traffic datasets are conducted. Experimental results demonstrate that the proposed SJE-based approach improves the prediction accuracy for network traffic by up to 38.47 percentage points compared with the Z-eigen based approach. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
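The stationary joint eigentensor above is computed with a UP-based power method; the tensor operations themselves are not defined in this listing. As a hedged illustration of the underlying idea only, a plain power iteration for the stationary joint distribution of a second-order Markov chain:

```python
import numpy as np

def second_order_stationary(P, iters=200):
    """Power iteration for a second-order Markov chain.
    P[i, j, k] = Pr(next = k | prev = i, cur = j).  The joint
    distribution pi over consecutive state pairs is iterated until it
    stabilises, mimicking (very loosely) an SJE-style computation."""
    n = P.shape[0]
    pi = np.full((n, n), 1.0 / (n * n))   # joint dist over (prev, cur)
    for _ in range(iters):
        nxt = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    nxt[j, k] += pi[i, j] * P[i, j, k]
        pi = nxt
    return pi
```

The paper's UP operation replaces this explicit triple loop with a tensor product, which is what makes higher orders and multiple variables tractable.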
6. Secure Tensor Decomposition for Big Data Using Transparent Computing Paradigm.
- Author
-
Kuang, Liwei, Yang, Laurence T., Zhu, Qing, and Chen, Jinjun
- Subjects
SINGULAR value decomposition ,BIG data ,KNOWLEDGE gap theory - Abstract
The exponential growth of big data places a great burden on current computing environments. However, there remains a vast gap in approaches that can securely and efficiently process large-scale heterogeneous data. This paper, on the basis of the transparent computing paradigm, presents a unified approach that coordinates transparent servers and transparent clients to decompose a tensor, a mathematical model widely used in data-intensive applications, into a core tensor multiplied by a number of truncated orthogonal bases. Structured, semi-structured, as well as unstructured data are transformed into low-order sub-tensors, which are then encrypted using the Paillier homomorphic encryption scheme on the transparent clients. The cipher sub-tensors are transported to the transparent servers for the integration and decomposition operations. Three secure decomposition algorithms, namely a secure bidiagonalization algorithm, a secure singular value decomposition algorithm, and a secure mode product algorithm, are presented to generate the bidiagonal matrices, truncated orthogonal bases, and core tensor, respectively. The homomorphic operations of the three algorithms are carried out on the transparent servers, while the non-homomorphic operations, namely division and square root, are performed on the transparent clients. Experimental results indicate that the proposed method is promising for secure tensor decomposition over big data. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
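The scheme's server-side operations rely on the Paillier cryptosystem, whose additive property (the product of two ciphertexts decrypts to the sum of the plaintexts) is what lets the transparent servers combine encrypted sub-tensor entries without seeing them. A toy demonstration with deliberately tiny primes (for illustration only, never for real use):

```python
import math
import random

def paillier_demo(p=47, q=59):
    """Toy Paillier keypair showing the additive homomorphism:
    Enc(a) * Enc(b) mod n^2 decrypts to a + b."""
    n = p * q
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)      # Carmichael's lambda(n)
    g = n + 1

    def L(x):
        return (x - 1) // n

    mu = pow(L(pow(g, lam, n2)), -1, n)

    def enc(m):
        r = random.randrange(2, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(2, n)
        return (pow(g, m, n2) * pow(r, n, n2)) % n2

    def dec(c):
        return (L(pow(c, lam, n2)) * mu) % n

    a, b = 123, 456
    c_sum = (enc(a) * enc(b)) % n2    # homomorphic addition of ciphertexts
    return dec(c_sum)                 # equals a + b
```

Division and square root have no such homomorphic form, which is exactly why the abstract pushes those two operations back to the transparent clients.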
7. A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud.
- Author
-
Yang, Chi and Chen, Jinjun
- Subjects
- *
DATA analysis , *CLOUD computing , *DETECTORS , *INFORMATION retrieval , *DATA compression - Abstract
Big sensing data is prevalent in both industry and scientific research applications where data is generated with high volume and velocity. Cloud computing provides a promising platform for big sensing data processing and storage, as it offers a flexible stack of massive computing, storage, and software services in a scalable manner. Current big sensing data processing on Cloud has adopted some data compression techniques. However, due to the high volume and velocity of big sensing data, traditional data compression techniques lack sufficient efficiency and scalability for such processing. Based on specific on-Cloud data compression requirements, we propose a novel scalable compression approach based on calculating the similarity among partitioned data chunks. Instead of compressing basic data units, compression is conducted over partitioned data chunks; to restore the original data sets, restoration functions and predictions are designed. MapReduce is used for the algorithm implementation to achieve extra scalability on Cloud. With real-world meteorological big sensing data experiments on the U-Cloud platform, we demonstrate that the proposed scalable compression approach based on data chunk similarity can significantly improve data compression efficiency with affordable data accuracy loss. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
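The approach above compresses partitioned data chunks by similarity rather than compressing basic data units. A minimal sketch of the idea, assuming a max-deviation similarity test and fixed-size chunks (both illustration choices, not the paper's actual similarity or restoration functions):

```python
def compress_chunks(data, chunk_size=4, tol=1.0):
    """Partition the stream into fixed-size chunks; any chunk whose
    maximum deviation from an already-stored representative chunk is
    within tol is replaced by a reference to that representative."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    reps, encoded = [], []            # encoded: index into reps
    for ch in chunks:
        for idx, rep in enumerate(reps):
            if len(rep) == len(ch) and \
                    max(abs(a - b) for a, b in zip(rep, ch)) <= tol:
                encoded.append(idx)
                break
        else:
            reps.append(ch)
            encoded.append(len(reps) - 1)
    return reps, encoded

def decompress_chunks(reps, encoded):
    """Restore an approximation of the original stream from references."""
    return [x for idx in encoded for x in reps[idx]]
```

The restored stream deviates from the original by at most `tol` per reading, which is the "affordable data accuracy loss" the abstract trades for compression ratio.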
8. HKE-BC: hierarchical key exchange for secure scheduling and auditing of big data in cloud computing.
- Author
-
Liu, Chang, Beaugeard, Nick, Yang, Chi, Zhang, Xuyun, and Chen, Jinjun
- Subjects
DATA security ,BIG data ,CLOUD computing ,COMPUTER architecture ,INFORMATION & communication technologies - Abstract
Big data is one of the most referred-to key words in the recent information and communications technology industry. As the new-generation distributed computing platform, cloud environments offer high efficiency and low cost for data-intensive storage and computation in big data applications. Cloud resources and services are available in a pay-as-you-go mode, which brings extraordinary flexibility and cost-effectiveness with minimal investment in users' own computing infrastructure. However, these advantages come at a price: people no longer have direct control over their own data. From this view, data security becomes a major concern in the adoption of cloud computing. Authenticated key exchange is essential to any security system based on high-efficiency symmetric-key encryption. With virtualisation technology applied, existing key exchange schemes such as Internet key exchange become time-consuming when directly deployed into the cloud computing environment, especially for large-scale tasks that involve intensive user-cloud interactions, such as scheduling and data auditing. In this paper, we propose a novel hierarchical key exchange scheme, namely hierarchical key exchange for big data in cloud, which aims at providing efficient security-aware scheduling and auditing for cloud environments. In this scheme, we developed a two-phase, layer-by-layer iterative key exchange strategy to achieve more efficient authenticated key exchange without sacrificing the level of data security. Both theoretical analysis and experimental results demonstrate that when deployed in cloud environments with diverse server layouts, the efficiency of the proposed scheme is dramatically superior to its predecessors, the cloud computing background key exchange and Internet key exchange schemes. Copyright © 2014 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
9. MuR-DPA: Top-Down Levelled Multi-Replica Merkle Hash Tree Based Secure Public Auditing for Dynamic Big Data Storage on Cloud.
- Author
-
Liu, Chang, Ranjan, Rajiv, Yang, Chi, Zhang, Xuyun, Wang, Lizhe, and Chen, Jinjun
- Subjects
BACK up systems ,INFORMATION retrieval ,ROUTING (Computer network management) ,CLOUD computing ,HASHING - Abstract
Cloud computing, which provides elastic computing and storage resources on demand, has become increasingly important due to the emergence of "big data". Cloud computing resources are a natural fit for processing big data streams, as they allow big data applications to run at the scale required for handling their complexities (data volume, variety, and velocity). With data no longer under users' direct control, data security in cloud computing is becoming one of the major concerns in the adoption of cloud computing resources. To improve data reliability and availability, storing multiple replicas along with original datasets is a common strategy for cloud service providers. Public data auditing schemes allow users to verify their outsourced data storage without having to retrieve the whole dataset. However, existing data auditing techniques suffer from efficiency and security problems. First, for dynamic datasets with multiple replicas, the communication overhead for update verifications is very large, because each update requires updating all replicas, and verification for each update requires O(log n) communication complexity. Second, existing schemes cannot provide public auditing and authentication of block indices at the same time; without authentication of block indices, the server can build a valid proof based on data blocks other than the blocks the client requested to verify. To address these problems, in this paper we present a novel public auditing scheme named MuR-DPA. The new scheme incorporates a novel authenticated data structure (ADS) based on the Merkle hash tree (MHT), which we call MR-MHT. To support fully dynamic data updates and authentication of block indices, we include rank and level values in the computation of MHT nodes. In contrast to existing schemes, level values of nodes in MR-MHT are assigned in a top-down order, and all replica blocks for each data block are organized into the same replica sub-tree. Such a configuration allows efficient verification of updates for multiple replicas. Compared to existing integrity verification and public auditing schemes, theoretical analysis and experimental results show that the proposed MuR-DPA scheme not only incurs much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provides enhanced security against dishonest cloud service providers. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
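The MR-MHT structure extends the standard Merkle hash tree with rank and level values; this listing does not give that computation, so the sketch below shows only the plain MHT machinery the scheme builds on: building the tree, producing a sibling path for one block, and verifying the proof against the root.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Bottom-up Merkle hash tree; returns all levels, leaves first."""
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]          # duplicate odd tail
        level = [_h(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof_for(levels, index):
    """Sibling path for the block at `index`: each step records the
    sibling hash and whether it sits to the left of the current node."""
    path = []
    for level in levels[:-1]:
        lvl = level + [level[-1]] if len(level) % 2 else level
        sib = index ^ 1
        path.append((lvl[sib], sib < index))
        index //= 2
    return path

def verify(root, block, path):
    """Recompute the root from a block and its sibling path."""
    node = _h(block)
    for sib, sib_is_left in path:
        node = _h(sib + node) if sib_is_left else _h(node + sib)
    return node == root
```

Note the plain MHT does not authenticate block indices, which is precisely the gap the abstract's rank/level values close.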
10. Proximity-Aware Local-Recoding Anonymization with MapReduce for Scalable Big Data Privacy Preservation in Cloud.
- Author
-
Zhang, Xuyun, Dou, Wanchun, Pei, Jian, Nepal, Surya, Yang, Chi, Liu, Chang, and Chen, Jinjun
- Subjects
BIG data ,SCALABILITY ,DATA security ,INFORMATION storage & retrieval systems ,CLOUD computing ,ELECTRONIC health records - Abstract
Cloud computing provides a promising, scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets like electronic health records in such applications often contain privacy-sensitive information, which raises privacy concerns if the information is released or shared with third parties in cloud. A practical and widely-adopted technique for data privacy preservation is to anonymize data via generalization to satisfy a given privacy model. However, most existing privacy-preserving approaches tailored to small-scale data sets often fall short when encountering big data, due to their insufficiency or poor scalability. In this paper, we investigate the local-recoding problem for big data anonymization against proximity privacy breaches and attempt to identify a scalable solution to this problem. Specifically, we present a proximity privacy model allowing semantic proximity of sensitive values and multiple sensitive attributes, and model the problem of local recoding as a proximity-aware clustering problem. A scalable two-phase clustering approach, consisting of a t-ancestors clustering (similar to k-means) algorithm and a proximity-aware agglomerative clustering algorithm, is proposed to address the problem. We design the algorithms with MapReduce to gain high scalability by performing data-parallel computation in cloud. Extensive experiments on real-life data sets demonstrate that our approach significantly improves the capability of defending against proximity privacy breaches, as well as the scalability and time-efficiency of local-recoding anonymization, over existing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
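Local recoding generalises each record within a small cluster rather than applying one generalisation to the whole domain. A greatly simplified single-attribute sketch of that idea (the paper's t-ancestors and proximity-aware agglomerative algorithms are far more involved; clusters of size at least k here are an illustration of the general clustering-then-generalising pattern):

```python
def local_recode(records, k):
    """Greedy local recoding over one numeric quasi-identifier: sort,
    split into clusters of at least k records, and generalise each
    cluster to its [min, max] range so every record is indistinguishable
    from at least k - 1 others within its group."""
    ordered = sorted(records)
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:   # merge small tail
        clusters[-2].extend(clusters.pop())
    return [[(min(c), max(c))] * len(c) for c in clusters]
```

Because each group is generalised only as far as its own spread, information loss is lower than with a single global recoding of the attribute.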
11. HireSome-II: Towards Privacy-Aware Cross-Cloud Service Composition for Big Data Applications.
- Author
-
Dou, Wanchun, Zhang, Xuyun, Liu, Jianxun, and Chen, Jinjun
- Subjects
CLOUD computing ,QUALITY of service ,BIG data ,QUALITY control of information storage & retrieval systems ,PRIVACY - Abstract
Cloud computing promises a scalable infrastructure for processing big data applications such as medical data analysis. Cross-cloud service composition provides a concrete approach capable of large-scale big data processing. However, the complexity of potential compositions of cloud services calls for new composition and aggregation methods, especially when some private clouds refuse to disclose all details of their service transaction records due to business privacy concerns in cross-cloud scenarios. Moreover, the credibility of cross-cloud, on-line service compositions becomes questionable if a cloud fails to deliver its services according to its "promised" quality. In view of these challenges, we propose a privacy-aware cross-cloud service composition method, named HireSome-II (History record-based Service optimization method), based on its previous basic version, HireSome-I. In our method, to enhance the credibility of a composition plan, a service is evaluated by some of its QoS history records rather than by its advertised QoS values. Besides, the k-means algorithm is introduced into our method as a data filtering tool to select representative history records. As a result, HireSome-II can protect cloud privacy, as a cloud is not required to unveil all its transaction records. Furthermore, it significantly reduces the time complexity of developing a cross-cloud service composition plan, as only representative records are recruited, which is demanded for big data processing. Simulation and analytical results demonstrate the validity of our method compared to a benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
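HireSome-II filters each service's QoS history down to representative records with k-means. A one-dimensional sketch over hypothetical response-time records (the paper's actual record format, k, and distance measure are not given in this listing):

```python
def kmeans_representatives(values, k=2, iters=20):
    """1-D k-means over a service's QoS history (e.g. response times);
    the final centroids serve as the representative records used in
    place of the full history."""
    # spread the initial centroids across the sorted values
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)),
                      key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)
```

Only these few centroids need to be disclosed and compared during composition planning, which is how the method cuts time complexity while keeping most transaction records private.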
12. KASR: A Keyword-Aware Service Recommendation Method on MapReduce for Big Data Applications.
- Author
-
Meng, Shunmei, Dou, Wanchun, Zhang, Xuyun, and Chen, Jinjun
- Subjects
KEYWORD searching ,BIG data ,ONLINE information services ,INFORMATION storage & retrieval systems ,CLOUD computing - Abstract
Service recommender systems have been shown to be valuable tools for providing appropriate recommendations to users. In the last decade, the number of customers and services and the volume of online information have grown rapidly, yielding a big data analysis problem for service recommender systems. Consequently, traditional service recommender systems often suffer from scalability and inefficiency problems when processing or analysing such large-scale data. Moreover, most existing service recommender systems present the same ratings and rankings of services to different users without considering diverse users' preferences, and therefore fail to meet users' personalized requirements. In this paper, we propose a Keyword-Aware Service Recommendation method, named KASR, to address the above challenges. It aims at presenting a personalized service recommendation list and recommending the most appropriate services to users effectively. Specifically, keywords are used to indicate users' preferences, and a user-based Collaborative Filtering algorithm is adopted to generate appropriate recommendations. To improve its scalability and efficiency in a big data environment, KASR is implemented on Hadoop, a widely-adopted distributed computing platform, using the MapReduce parallel processing paradigm. Finally, extensive experiments are conducted on real-world data sets, and the results demonstrate that KASR significantly improves the accuracy and scalability of service recommender systems over existing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
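KASR indicates preferences with keywords and applies user-based collaborative filtering. A minimal sketch using Jaccard similarity over keyword sets as the user-similarity measure (an illustration choice; the paper's exact similarity computation and the hotel-style data below are assumptions, not from the abstract):

```python
def jaccard(a, b):
    """Similarity of two keyword sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(active_keywords, history, top_n=2):
    """Keyword-aware user-based CF sketch: past users are weighted by
    the Jaccard similarity of their preference keywords with the active
    user's, and services are ranked by similarity-weighted ratings."""
    scores, weights = {}, {}
    for keywords, ratings in history:
        w = jaccard(active_keywords, keywords)
        for service, rating in ratings.items():
            scores[service] = scores.get(service, 0.0) + w * rating
            weights[service] = weights.get(service, 0.0) + w
    ranked = sorted(((s, scores[s] / weights[s])
                     for s in scores if weights[s] > 0),
                    key=lambda kv: -kv[1])
    return [s for s, _ in ranked[:top_n]]
```

Each (keywords, ratings) pair is independent of the others, so the per-user similarity and partial sums map naturally onto MapReduce, which is how the abstract scales the method on Hadoop.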
13. Privacy-Preserving Layer over MapReduce on Cloud.
- Author
-
Zhang, Xuyun, Liu, Chang, Nepal, Surya, Dou, Wanchun, and Chen, Jinjun
- Abstract
Cloud computing provides powerful and economical infrastructural resources for cloud users to handle ever-increasing big data with data-processing frameworks such as MapReduce. Based on cloud computing, the MapReduce framework has been widely adopted by various companies and organizations to process huge-volume data sets due to its salient features. Nevertheless, privacy concerns in MapReduce are aggravated because privacy-sensitive information scattered among various data sets can be recovered more easily when data and computational power are considerably abundant. Existing approaches employ techniques like access control or encryption to protect the privacy of data processed by MapReduce. However, such techniques fail to preserve data privacy cost-effectively in some common scenarios where data are processed for analytics, mining, and sharing on cloud. As such, we propose a flexible, scalable, dynamic, and cost-effective privacy-preserving layer over the MapReduce framework in this paper. The layer ensures data privacy preservation and data utility under the given privacy requirements before data are further processed by subsequent MapReduce tasks. A corresponding prototype system is developed for the privacy-preserving layer as well. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
14. Authorized Public Auditing of Dynamic Big Data Storage on Cloud with Efficient Verifiable Fine-Grained Updates.
- Author
-
Liu, Chang, Chen, Jinjun, Yang, Laurence T., Zhang, Xuyun, Yang, Chi, Ranjan, Rajiv, and Rao, Kotagiri
- Subjects
- *
CLOUD computing , *BACK up systems , *INFORMATION storage & retrieval systems , *CASCADING style sheets , *CAPITAL investments , *SOCIAL media - Abstract
Cloud computing opens a new era in IT, as it can provide various elastic and scalable IT services in a pay-as-you-go fashion, where users can avoid huge capital investments in their own IT infrastructure. In this philosophy, users of cloud storage services no longer physically maintain direct control over their data, which makes data security one of the major concerns of using cloud. Existing research already allows data integrity to be verified without possession of the actual data file. When the verification is done by a trusted third party, this verification process is called data auditing, and the third party is called an auditor. However, existing schemes suffer from several common drawbacks. First, a necessary authorization/authentication process is missing between the auditor and the cloud service provider, i.e., anyone can challenge the cloud service provider for a proof of integrity of a certain file, which potentially puts the quality of the so-called 'auditing-as-a-service' at risk. Second, although some recent work based on BLS signatures can already support fully dynamic data updates, it only supports updates with fixed-size blocks as the basic unit, which we call coarse-grained updates. As a result, every small update causes re-computation and updating of the authenticator for an entire file block, which in turn causes higher storage and communication overheads. In this paper, we provide a formal analysis of possible types of fine-grained data updates and propose a scheme that can fully support authorized auditing and fine-grained update requests. Based on our scheme, we also propose an enhancement that can dramatically reduce communication overheads for verifying small updates. Theoretical analysis and experimental results demonstrate that our scheme offers not only enhanced security and flexibility, but also significantly lower overhead for big data applications with a large number of frequent small updates, such as applications in social media and business transactions. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
15. SaC-FRAPP: a scalable and cost-effective framework for privacy preservation over big data on cloud.
- Author
-
Zhang, Xuyun, Liu, Chang, Nepal, Surya, Yang, Chi, Dou, Wanchun, and Chen, Jinjun
- Subjects
COMPUTER networks ,SCALABILITY ,DATA privacy ,CLOUD computing ,INFORMATION technology industry ,DATA encryption ,INTERNET ,EMPIRICAL research - Abstract
Big data and cloud computing are two disruptive trends nowadays, providing numerous opportunities to the current information technology industry and research communities while also posing significant challenges. Cloud computing provides powerful and economical infrastructural resources for cloud users to handle ever-increasing data sets in big data applications. However, processing or sharing privacy-sensitive data sets on cloud can engender severe privacy concerns because of multi-tenancy. Data encryption and anonymization are two widely-adopted ways to combat privacy breaches. However, encryption is not suitable for data that are processed and shared frequently, and anonymizing big data and managing numerous anonymized data sets remain challenges for traditional anonymization approaches. As such, we propose a scalable and cost-effective framework for privacy preservation over big data on cloud in this paper. The key idea of the framework is to leverage cloud-based MapReduce to conduct data anonymization and manage anonymized data sets before releasing data to others. The framework provides a holistic conceptual foundation for privacy preservation over big data, and a corresponding proof-of-concept prototype system is implemented. Empirical evaluations demonstrate that the framework can anonymize large-scale data sets and manage anonymized data sets in a highly flexible, scalable, efficient, and cost-effective fashion. Copyright © 2013 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
16. An Efficient Framework for the Analysis of Big Brain Signals Data
- Author
-
Supriya, Siuly, Wang, Hua, Zhang, Yanchun, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Wang, Junhu, editor, Cong, Gao, editor, Chen, Jinjun, editor, and Qi, Jianzhong, editor
- Published
- 2018
- Full Text
- View/download PDF
17. Uncertainty Evaluation for Big Data of Mass Standards in a Key Comparison
- Author
-
Ren, Xiaoping, Nan, Fang, Wang, Jian, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Wang, Guojun, editor, Chen, Jinjun, editor, and Yang, Laurence T., editor
- Published
- 2018
- Full Text
- View/download PDF
18. BDCP: A Framework for Big Data Copyright Protection Based on Digital Watermarking
- Author
-
Yang, Jingyue, Wang, Haiquan, Wang, Zhaoyi, Long, Jieyi, Du, Bowen, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Wang, Guojun, editor, Chen, Jinjun, editor, and Yang, Laurence T., editor
- Published
- 2018
- Full Text
- View/download PDF
19. Risk Identification-Based Association Rule Mining for Supply Chain Big Data
- Author
-
Salamai, Abdullah, Saberi, Morteza, Hussain, Omar, Chang, Elizabeth, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Wang, Guojun, editor, Chen, Jinjun, editor, and Yang, Laurence T., editor
- Published
- 2018
- Full Text
- View/download PDF
20. Balanced Iterative Reducing and Clustering Using Hierarchies with Principal Component Analysis (PBirch) for Intrusion Detection over Big Data in Mobile Cloud Environment
- Author
-
Peng, Kai, Zheng, Lixin, Xu, Xiaolong, Lin, Tao, Leung, Victor C. M., Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Wang, Guojun, editor, Chen, Jinjun, editor, and Yang, Laurence T., editor
- Published
- 2018
- Full Text
- View/download PDF
21. Dual incremental fuzzy schemes for frequent itemsets discovery in streaming numeric data.
- Author
-
Zheng, Hui, Li, Peng, Liu, Qing, Chen, Jinjun, Huang, Guangli, Wu, Junfeng, Xue, Yun, and He, Jing
- Subjects
- *
INTEGER approximations , *DATA distribution , *DISCRETIZATION methods , *ASSOCIATION rule mining , *APPROXIMATE reasoning , *BIG data , *CACHE memory , *SEQUENTIAL pattern mining - Abstract
• There is no need to re-visit previous batches of numeric data. • The consumed time and the estimated error of the proposed two schemes, which is stable with the number of data increasing, is much less than traditional method. • Approximate support values of item-sets are proved in this paper, and also testified by synthetic and real datasets to converge when the number of streaming data increase. • Errors of approximate support values of item-sets are also testified to converge at zero when the number of streaming data increase, which means approximate support values converge to their corresponding real support value. Discovering frequent itemsets is essential for finding association rules, yet too computational expensive using existing algorithms. It is even more challenging to find frequent itemsets upon streaming numeric data. The streaming characteristic leads to a challenge that streaming numeric data cannot be scanned repetitively. The numeric characteristic requires that streaming numeric data should be pre-processed into itemsets, e.g., fuzzy-set methods can transform numeric data into itemsets with non-integer membership values. This leads to a challenge that the frequency of itemsets are usually not integer. To overcome such challenges, fast methods and stream processing methods have been applied. However, the existing algorithms usually either still need to re-visit some previous data multiple times, or cannot count non-integer frequencies. Those existing algorithms re-visiting some previous data have to sacrifice large memory spaces to cache those previous data to avoid repetitive scanning. When dealing with big streaming data nowadays, such large-memory requirement often goes beyond the capacity of many computers. Those existing algorithms unable to count non-integer frequencies would be very inaccurate in estimating the non-integer frequencies of frequent itemsets if used with integer approximation of frequency-counting. 
To address these issues, in this paper we propose two incremental schemes for frequent-itemset discovery that work efficiently with streaming numeric data. In particular, they can count non-integer frequencies without re-visiting any previous data. The key to their efficiency is extracting essential statistics from the ongoing stream that occupy much less memory than the raw data. This gives our schemes three advantages: 1) they allow non-integer counting, and thus integrate naturally with a fuzzy-set discretization method to boost robustness and anti-noise capability for numeric data; 2) they enable the design of a decay ratio for different data distributions, adaptable to three general stream models: landmark, damped, and sliding windows; and 3) they achieve highly accurate fuzzy-itemset discovery with efficient stream processing. Experimental studies on both synthetic and real-world datasets demonstrate the efficiency and effectiveness of our dual schemes. [ABSTRACT FROM AUTHOR]
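The idea of one-pass, non-integer frequency counting described in this abstract can be illustrated with a minimal sketch. The class below is a hypothetical toy, not the paper's actual schemes: it applies a triangular fuzzy membership to discretize numeric values and accumulates decayed fuzzy counts so that no previous record is ever revisited (with `decay=1.0` this reduces to the landmark model; `decay<1` gives a damped window).

```python
def fuzzy_membership(value, low, high):
    """Triangular membership of a numeric value over the band [low, high]."""
    mid = (low + high) / 2.0
    half = (high - low) / 2.0
    return max(0.0, 1.0 - abs(value - mid) / half)

class DampedFuzzyCounter:
    """One-pass counter of non-integer (fuzzy) item frequencies over a stream.

    Each incoming record arrives already discretized into fuzzy items; old
    counts are scaled by `decay` so recent data weighs more. No previous
    record is cached or revisited.
    """
    def __init__(self, decay=0.99):
        self.decay = decay
        self.counts = {}   # item -> accumulated fuzzy count
        self.n = 0.0       # decayed number of records seen

    def update(self, fuzzy_items):
        # fuzzy_items: dict mapping item -> membership degree in [0, 1]
        for item in self.counts:
            self.counts[item] *= self.decay
        for item, mu in fuzzy_items.items():
            self.counts[item] = self.counts.get(item, 0.0) + mu
        self.n = self.n * self.decay + 1

    def support(self, item):
        """Approximate (possibly non-integer-based) support of an item."""
        return self.counts.get(item, 0.0) / self.n if self.n else 0.0
```

For example, two records with memberships 0.8 and 0.4 for the same item yield a support of (0.8 + 0.4) / 2 = 0.6 under the landmark model.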
- Published
- 2020
- Full Text
- View/download PDF
22. Optimal LEACH protocol with modified bat algorithm for big data sensing systems in Internet of Things.
- Author
-
Cui, Zhihua, Cao, Yang, Cai, Xingjuan, Cai, Jianghui, and Chen, Jinjun
- Subjects
- *INTERNET of things, *BIG data, *METAHEURISTIC algorithms, *INTERNET protocols, *ALGORITHMS, *BATS, *CENTROID - Abstract
The big data sensing system (BDSS) plays an important role in the Internet of Things, where reducing power consumption is a crucial problem. The low-energy adaptive clustering hierarchy (LEACH) protocol is a well-known low-energy-cost algorithm used in BDSSs. In this paper, a new variant of the bat algorithm combined with a centroid strategy is introduced: three different centroid strategies with six different designs, together with a velocity-inertia-free update equation. The optimization performance of these designs is verified against the standard bat algorithm (BA) on the CEC2013 benchmarks. Simulation results show that the bat algorithm with the weighted harmonic centroid (WHCBA) strategy is superior to the other algorithms. By integrating WHCBA into the LEACH protocol, we develop a two-stage cluster-head node selection strategy that saves more energy than the standard LEACH protocol. • A modified bat algorithm with a centroid strategy is designed. • Six different centroid strategies are designed and compared. • A velocity-inertia-free update equation is designed. • The bat algorithm with the weighted harmonic centroid strategy is used to optimize the performance of the LEACH protocol, and is compared with four other algorithms. [ABSTRACT FROM AUTHOR]
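The centroid-guided, velocity-inertia-free update mentioned in this abstract can be sketched in a few lines. This is purely illustrative: the abstract does not give the formulas for the weighted harmonic centroid or the paper's six designs, so the weighting below (inverse fitness on a minimization problem) and both function names are assumptions.

```python
import random

def weighted_centroid(population, fitnesses):
    """Fitness-weighted centroid of a population (minimization: lower is better).

    Illustrative only -- the paper's weighted *harmonic* centroid is one of
    six designs whose exact formulas are not given in the abstract.
    """
    weights = [1.0 / (f + 1e-12) for f in fitnesses]  # smaller fitness -> larger weight
    total = sum(weights)
    dim = len(population[0])
    return [sum(w * x[d] for w, x in zip(weights, population)) / total
            for d in range(dim)]

def centroid_step(bat, centroid, step=0.5):
    """Velocity-inertia-free move: pull a bat directly toward the centroid,
    with no velocity term carried over from the previous iteration."""
    return [x + step * random.random() * (c - x) for x, c in zip(bat, centroid)]
```

The design point worth noting is that dropping the velocity term removes the inertia of the classic BA update, so each bat's move depends only on its current position and the population centroid.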
- Published
- 2019
- Full Text
- View/download PDF
23. A dynamic prime number based efficient security mechanism for big sensing data streams.
- Author
-
Puthal, Deepak, Nepal, Surya, Ranjan, Rajiv, and Chen, Jinjun
- Subjects
- *DATA security, *BIG data, *STREAMING technology, *PRIME numbers, *DATA quality, *SENSOR networks - Abstract
Big data streaming has become an important paradigm for real-time processing of massive continuous data flows in large-scale sensing networks. While dealing with big sensing data streams, a Data Stream Manager (DSM) must always verify security (i.e., authenticity, integrity, and confidentiality) to ensure end-to-end protection and maintain data quality. Existing technologies are not suitable because their verification overhead introduces delay into real-time data streams. In this paper, we propose a Dynamic Prime Number Based Security Verification (DPBSV) scheme for big data streams. Our scheme is based on a common shared key that is updated dynamically by generating synchronized prime numbers. The common shared key is updated at both ends, i.e., at the source sensing devices and at the DSM, without any further communication after the initial handshake. Theoretical analyses and experimental results show that our DPBSV scheme significantly improves the efficiency of the verification process, reducing verification time and requiring a smaller buffer size at the DSM. [ABSTRACT FROM AUTHOR]
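The core idea, both ends deriving the same sequence of primes from a seed exchanged once at handshake, so keys stay synchronized with no further communication, can be sketched as follows. This is a hypothetical illustration of the principle, not the actual DPBSV construction; the function name, hash choice, and key width are all assumptions.

```python
import hashlib

def _next_prime(n):
    """Smallest prime strictly greater than n (trial division; toy scale)."""
    def is_prime(k):
        if k < 2:
            return False
        if k % 2 == 0:
            return k == 2
        i = 3
        while i * i <= k:
            if k % i == 0:
                return False
            i += 2
        return True
    n += 1
    while not is_prime(n):
        n += 1
    return n

def synchronized_key(shared_seed: bytes, interval: int, bits: int = 32) -> int:
    """Derive the shared prime key for a given key-update interval.

    Both the sensing device and the DSM run this with the same seed, so
    each interval's key matches on both ends without any key-exchange
    messages after the handshake.
    """
    digest = hashlib.sha256(shared_seed + interval.to_bytes(8, "big")).digest()
    candidate = int.from_bytes(digest, "big") % (1 << bits)
    return _next_prime(candidate)
```

Because the derivation is deterministic in `(seed, interval)`, a device and the DSM that agree on the handshake seed will compute identical keys for every interval independently.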
- Published
- 2017
- Full Text
- View/download PDF
24. A hybrid approach for scalable sub-tree anonymization over big data using MapReduce on cloud.
- Author
-
Zhang, Xuyun, Liu, Chang, Nepal, Surya, Yang, Chi, Dou, Wanchun, and Chen, Jinjun
- Subjects
- *DATA analysis, *CLOUD computing, *COMPUTER networks, *SCALABILITY, *HYBRID computers (Computer architecture), *DISTRIBUTED computing, *COMPUTER systems - Abstract
In big data applications, data privacy is one of the most pressing concerns because processing large-scale privacy-sensitive data sets often requires computation resources provisioned by public cloud services. Sub-tree data anonymization is a widely adopted scheme for anonymizing data sets for privacy preservation. Top-Down Specialization (TDS) and Bottom-Up Generalization (BUG) are two ways to fulfill sub-tree anonymization. However, existing approaches to sub-tree anonymization fall short of parallelization capability, and thereby lack the scalability to handle big data in the cloud. Moreover, both TDS and BUG individually suffer from poor performance for certain values of the k-anonymity parameter. In this paper, we propose a hybrid approach that combines TDS and BUG for efficient sub-tree anonymization over big data. Further, we design MapReduce algorithms for the two components (TDS and BUG) to gain high scalability. Experimental evaluation demonstrates that the hybrid approach significantly improves the scalability and efficiency of sub-tree anonymization over existing approaches. [Copyright Elsevier]
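The Bottom-Up Generalization component mentioned here can be illustrated with a single-attribute toy: climb a value taxonomy one level at a time until every equivalence class holds at least k records. The taxonomy and function below are hypothetical and far simpler than the paper's scheme, whose MapReduce algorithms parallelize the counting passes over large data sets.

```python
from collections import Counter

# Hypothetical sub-tree taxonomy for one quasi-identifier: child -> parent.
TAXONOMY = {
    "engineer": "professional", "lawyer": "professional",
    "professional": "any",
    "clerk": "non-professional", "cashier": "non-professional",
    "non-professional": "any",
}

def generalize(value):
    """Replace a value with its parent in the taxonomy (root maps to itself)."""
    return TAXONOMY.get(value, value)

def bottom_up_generalize(records, k):
    """Generalize all records one taxonomy level at a time (bottom-up)
    until every equivalence class contains at least k records,
    i.e. the attribute satisfies k-anonymity."""
    while True:
        groups = Counter(records)
        if all(count >= k for count in groups.values()):
            return records
        if all(v == "any" for v in records):
            return records  # fully generalized; cannot go further
        records = [generalize(v) for v in records]
```

With k = 2, four distinct job titles each appearing once are lifted one level, leaving two classes of size two; TDS would instead start from the root and specialize downward.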
- Published
- 2014
- Full Text
- View/download PDF
25. Benchmarking and Performance Modelling of MapReduce Communication Pattern
- Author
-
Sheriffo Ceesay, Adam Barker, Yuhui Lin, Chen, Jinjun, Yang, Laurence T., EPSRC, and University of St Andrews. School of Computer Science
- Subjects
Big Data ,QA75 ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,0209 industrial biotechnology ,Computer science ,QA75 Electronic computers. Computer science ,Big data ,Cloud computing ,02 engineering and technology ,Modelling ,3rd-NDAS ,Machine Learning (cs.LG) ,Set (abstract data type) ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,MapReduce ,Computer Science - Performance ,Communication Pattern ,business.industry ,020208 electrical & electronic engineering ,Workload ,Benchmarking ,Performance (cs.PF) ,Range (mathematics) ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer engineering ,Distributed, Parallel, and Cluster Computing (cs.DC) ,business - Abstract
Understanding and predicting the performance of big data applications running in the cloud or on-premises can help minimise the overall cost of operations and help identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the sheer number of application and workload configuration parameters make comprehensive performance-modelling solutions challenging and expensive to develop. In this paper, instead of focusing on a wide range of configurable parameters, we study the low-level internals of the MapReduce communication pattern and use a minimal set of performance drivers to develop a set of phase-level parametric models for approximating the execution time of a given application on a given cluster. The models can be used to infer the performance of unseen applications and to approximate their performance when an arbitrary dataset is used as input. Our approach is validated by empirical experiments in two setups; on average, the error rate in both setups is within ±10% of the measured values., Comment: 8 pages, 10 figures
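A phase-level parametric model of the kind this abstract describes can be sketched minimally: fit one simple model per MapReduce phase against a performance driver such as input size, then sum the phases to approximate total job time. The linear form, the phase names, and both function names below are assumptions for illustration; the paper's actual drivers and model forms may differ.

```python
def fit_linear(sizes, times):
    """Ordinary least squares fit of t = a + b * s for one phase,
    from (input size, measured time) benchmark pairs."""
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    b = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times)) / \
        sum((s - mean_s) ** 2 for s in sizes)
    a = mean_t - b * mean_s
    return a, b

def predict_job_time(phase_models, input_size):
    """Sum the per-phase predictions (e.g. map, shuffle, reduce)
    for one input size to estimate total execution time."""
    return sum(a + b * input_size for a, b in phase_models.values())
```

Fitting each phase separately is what lets the model extrapolate to an arbitrary dataset size: only the driver changes, while the per-phase coefficients stay fixed for a given cluster.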
- Published
- 2020
- Full Text
- View/download PDF
26. A Datalog Engine for Iterative Graph Algorithms on Large Clusters
- Author
-
Marek Rogala, Michał Adamczyk, Jacek Sroka, Jan Hidders, Yang, Laurence T., Chen, Jinjun, Informatics and Applied Informatics, and Web and Information System Engineering
- Subjects
Big Data ,Theoretical computer science ,Shuffling ,spark ,business.industry ,Iterative method ,Computer science ,Computer Networks and Communications ,Big data ,datalog ,Skew ,distributed computation ,Query language ,graph processing ,Datalog ,Theoretical Computer Science ,Computational Theory and Mathematics ,Analytics ,business ,Cluster analysis ,computer ,computer.programming_language - Abstract
Distributed computations on graphs have gained importance with the emergence of large graphs, e.g., in the web or in social networks. Frameworks like Hadoop, Giraph, and Spark are used for processing them, yet they require advanced programming techniques to minimize skew and data shuffling. Declarative, query-like, but at the same time efficient solutions, like Pig for general-purpose analytics, are lacking for graph processing. In this paper we promote the use of declarative Datalog with aggregation for large-scale graph processing. We present an implementation that extends Apache Spark with the capability of executing Datalog queries. This approach makes it possible to express graph algorithms in a well-studied declarative query language and execute them on an existing and mature infrastructure for distributed computation. At the same time, the data processed with Datalog queries is fully integrated with the caching mechanism of Spark and can be part of a larger iterative algorithm.
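The kind of iterative Datalog evaluation this abstract refers to can be illustrated with the classic transitive-closure program, evaluated semi-naively. The sketch below uses plain Python sets as a stand-in; on a Spark-based engine the same delta-join loop would run over cached RDDs, which is not shown here.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the Datalog program
        tc(X, Y) :- edge(X, Y).
        tc(X, Z) :- tc(X, Y), edge(Y, Z).
    Only the facts derived in the previous round (`delta`) are joined
    with edge/2 each iteration, not the whole tc relation."""
    tc = set(edges)
    delta = set(edges)
    while delta:
        # join newly derived tc facts with edge/2; keep only unseen pairs
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - tc
        tc |= delta
    return tc
```

The semi-naive restriction to `delta` is exactly what a distributed engine needs to keep per-iteration join and shuffle volume proportional to the new facts rather than the full relation.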
- Published
- 2015