2,291 results for "Data deduplication"
Search Results
2. The Design of Fast Delta Encoding for Delta Compression Based Storage Systems.
- Author
-
Tan, Haoliang, Xia, Wen, Zou, Xiangyu, Deng, Cai, Liao, Qing, and Gu, Zhaoquan
- Subjects
DATA reduction ,CONSTRUCTION cost estimates ,ENCODING ,SYNCHRONIZATION ,SPEED - Abstract
Delta encoding is a data reduction technique capable of calculating the differences (i.e., the delta) among very similar files and chunks. It is widely used in various applications, such as synchronization, replication, backup/archival storage, cache compression, and so on. However, delta encoding is computationally costly due to its time-consuming word-matching operations for delta calculation. Existing delta encoding approaches either run at a slow encoding speed, such as Xdelta and Zdelta, or at a low compression ratio, such as Ddelta and Edelta. In this article, we propose Gdelta, a fast delta encoding approach with a high compression ratio. The key idea behind Gdelta is the combined use of five techniques: (1) employing an improved Gear-based rolling hash to replace the Adler32 hash for fast scanning of overlapping words of similar chunks, (2) adopting quick array-based indexing for word-matching, (3) applying a sampling indexing scheme to reduce the cost of building traditional full indexes over base chunks' words, (4) skipping unmatched words to accelerate delta encoding through non-redundant areas, and (5) last but not least, after word-matching, batch compressing the remainder to further improve the compression ratio. Our evaluation results driven by seven real-world datasets suggest that Gdelta achieves encoding/decoding speedups of 3.5X∼25X over the classic Xdelta and Zdelta approaches while increasing the compression ratio by about 10%∼240%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
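The Gdelta entry above relies on a Gear-based rolling hash to scan overlapping words of similar chunks. The sketch below is a minimal, generic Gear rolling hash in Python, assuming a randomly generated 256-entry gear table and a 64-bit fingerprint; the paper's actual table, word size, and indexing scheme are not reproduced.

```python
import random

# Hypothetical parameters; the paper's actual gear table and mask differ.
random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]   # one random 64-bit value per byte
MASK = (1 << 64) - 1

def gear_hashes(data: bytes):
    """Yield a rolling Gear hash at every byte position of `data`.

    Each step shifts the previous fingerprint left by one bit and adds the
    gear value of the incoming byte, so the hash is updated in O(1) per byte.
    """
    fp = 0
    for b in data:
        fp = ((fp << 1) + GEAR[b]) & MASK
        yield fp

# Example: fingerprints of overlapping words can serve as cheap keys into an
# array-based index when matching words between a base chunk and a similar chunk.
if __name__ == "__main__":
    for i, h in enumerate(gear_hashes(b"example data block")):
        if i < 3:
            print(i, hex(h))
```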
3. Modelling a Request and Response-Based Cryptographic Model For Executing Data Deduplication in the Cloud.
- Author
-
Kumar, Doddi Suresh and Srinivasu, Nulaka
- Subjects
DATA privacy ,ELLIPTIC curve cryptography ,INTERSTELLAR communication ,ACCESS control ,DATA warehousing ,CLOUD storage - Abstract
Cloud storage is one of the most crucial components of cloud computing because it makes it simpler for users to share and manage their data on the cloud with authorized users. Secure deduplication has attracted much attention in cloud storage because it can remove redundancy from encrypted data to save storage space and communication overhead. Most current secure deduplication systems focus on achieving the following security and privacy characteristics: access control, tag consistency, data privacy, and defence against various attacks. However, as far as we know, none can simultaneously fulfil all four conditions. In this research, we offer an effective secure deduplication method with user-defined access control to address this gap. Because it allows only the cloud service provider to grant data access on behalf of data owners, our proposed solution (Request-response-based Elliptic Curve Cryptography) can effectively delete duplicates without compromising the security and privacy of cloud users. A thorough security analysis reveals that our secure deduplication solution successfully thwarts brute-force attacks while dependably maintaining tag consistency and data confidentiality. Comprehensive simulations show that our solution outperforms the compared schemes in computation, communication, and storage overheads, as well as deduplication efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Secure Deduplication Method with Blockchain-based Smart Contract for Heterogeneous Cloud Servers.
- Author
-
JIANG Lin, LI Jiaxing, and WU Jigang
- Abstract
In the era of big data, to resolve the conflict between enhancing the reliability of user data stored on cloud servers and the strategy of removing duplicate data, a secure data deduplication method for heterogeneous servers was proposed based on blockchain smart contracts. Leveraging the decentralized, tamper-proof, and publicly transparent characteristics of blockchain, as well as the automation capabilities of smart contracts, this method could achieve data storage security, reliability, and privacy protection. Specifically, the method combined secret sharing and blockchain smart contract technology to design a secure and efficient cloud storage data deduplication service. Moreover, by replacing the role of centralized third-party entities with blockchain and mitigating server heterogeneity through smart contract scripts, potential security risks were eliminated. Experimental results demonstrated that, under various scenarios with different file sizes and block quantities, the average computational overhead of this method was 65.42% to 115.77% lower than the comparative solutions, and the average storage overhead was 7.94% to 19.50% lower. Additionally, for varying numbers of heterogeneous storage servers, this method exhibited significantly lower average computational and storage overheads, with reductions of 67.27% to 177.89% and 34.01% to 72.89%, respectively. Therefore, the proposed approach could outperform two existing blockchain-based deduplication methods in terms of security, computational efficiency, and storage efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Graph Deep Active Learning Framework for Data Deduplication
- Author
-
Huan Cao, Shengdong Du, Jie Hu, Yan Yang, Shi-Jinn Horng, and Tianrui Li
- Subjects
data deduplication ,active learning ,similarity calculation ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
With the advent of the era of big data, an increasing amount of duplicate data is expressed in different forms. In order to reduce redundant data storage and improve data quality, data deduplication technology has never been more significant than it is today. It is usually necessary to join multiple data tables and identify different records pointing to the same entity, especially in the case of multi-source data deduplication. Active learning trains the model by selecting the data items with the maximum information divergence, reducing the amount of data to be annotated, which gives it unique advantages in dealing with big data annotation. However, most current active learning methods only employ classical entity matching and are rarely applied to data deduplication tasks. To fill this research gap, we propose a novel graph deep active learning framework for data deduplication. It is based on similarity algorithms combined with the bidirectional encoder representations from transformers (BERT) model to extract deep similarity features of multi-source data records, and it introduces, for the first time, a graph active learning strategy that builds a clean graph to filter the data that needs to be labeled, which is then used to delete duplicate data while retaining the records with the most information. Experimental results on real-world datasets demonstrate that the proposed method outperforms state-of-the-art active learning models on data deduplication tasks.
- Published
- 2024
- Full Text
- View/download PDF
6. A novel three-factor authentication and optimal mapreduce frameworks for secure medical big data transmission over the cloud with shaxecc.
- Author
-
Rajeshkumar, K., Dhanasekaran, S., and Vasudevan, V.
- Subjects
BIG data ,OPTIMIZATION algorithms ,DATA transmission systems ,CLOUD storage ,DATA privacy ,BIOMETRIC identification ,DATA protection ,CLOUD computing - Abstract
Big Data (BD) is a concept that deals with enormous amounts of data storage, processing, and analysis. With the exponential advancement of cloud computing in healthcare (HC), the security and confidentiality of medical records have become a primary consideration for HC services and applications. Present-day cryptosystems are insufficient to address these issues. Therefore, this paper introduces a novel Three-Factor Authentication (3FA) and optimal Map-Reduce (MR) framework for secure BD transmission over the cloud with Secure Hashing Authentication XOR-ed Elliptical Curve Cryptography (SHAXECC). The authentication procedure is initially carried out with the SHA-512 algorithm, which protects the network from unauthorized access. Next, data deduplication is performed using the SHA-512 algorithm to eliminate duplicate files. After that, an optimal MR design is introduced to handle large amounts of BD. In the optimal MR design, the mapper uses the Modified Fuzzy C-means (MFCM) clustering approach to initially form the BD clusters. Then, the reducer uses the Levy Flight and Scoring Mutation-based Chimp Optimization Algorithm (LSCOA) to form the final BD clusters. Finally, SHAXECC is used to transmit the data securely. Experiments are performed to compare the proposed technique with existing techniques in terms of several performance measures. The proposed approach outperformed other existing models concerning clustering and security measures, making it well suited for data protection and privacy in cloud-enabled HC data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. A smart hybrid content-defined chunking algorithm for data deduplication in cloud storage.
- Author
-
Ellappan, Manogar and Murugappan, Abirami
- Subjects
- *
CLOUD storage , *PRIME numbers , *ALGORITHMS - Abstract
The enormous growth of data requires a large amount of storage space in the cloud server, much of which is occupied by redundant data. The deduplication technique avoids redundancy to utilize cloud storage effectively. Chunking is the process of breaking data into chunks to determine duplicates. Although many algorithms exist, handling deduplication efficiently, reducing chunk-size variance, and lowering computational overhead continue to be challenging tasks. To address this challenge, we propose a smart chunker (SC) algorithm, which performs hybrid chunking based on file size, combining file-level chunking and content-defined chunking (CDC). File-level chunking is assigned only to files smaller than 2 KB, and larger files fall to CDC. Within CDC, we introduce optimus prime chunking (OPC) to break the chunks using prime numbers, which involves a minimum number of comparisons with fewer conditions and low computational overhead. This work reduces processing time by avoiding hash and window computations. Thus, it provides a constant average chunk size that distributes chunk variance evenly in cloud storage, with a 17% reduction in chunking time. Our OPC result attains a throughput more than 3.3× higher than other CDC algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
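The smart chunker described above dispatches files below 2 KB to whole-file (file-level) chunking and larger files to content-defined chunking. Below is a minimal sketch of that dispatch logic; the simple rolling-sum boundary test stands in for the paper's optimus prime chunking (OPC), and all size parameters other than the 2 KB threshold are assumptions.

```python
import hashlib

FILE_LEVEL_LIMIT = 2 * 1024                              # 2 KB threshold from the abstract
MIN_CHUNK, AVG_MASK, MAX_CHUNK = 2048, 0x1FFF, 16384     # hypothetical CDC parameters

def cdc_chunks(data: bytes):
    """Generic content-defined chunking with a rolling byte test (stand-in for OPC)."""
    start, rolling = 0, 0
    for i, b in enumerate(data):
        rolling = (rolling * 31 + b) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (rolling & AVG_MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]

def chunk_file(data: bytes):
    """File-level chunking for small files, content-defined chunking for everything else."""
    if len(data) < FILE_LEVEL_LIMIT:
        return [data]                  # the whole file is a single chunk
    return list(cdc_chunks(data))

def fingerprints(chunks):
    """Hash every chunk so duplicates can be detected by fingerprint lookup."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]
```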
8. Provisioning an efficient data deduplication model for cloud storage and integrity
- Author
-
Kumar, Doddi Suresh and Srinivasu, Nulaka
- Published
- 2024
- Full Text
- View/download PDF
9. CADC-FPRLE: Content-aware deduplication clustering analysis using file partitioning and a running length encoder for cloud storage optimization.
- Author
-
Shakkeera, L., Dhiyanesh, B., Asha, A., and Kiruthiga, G.
- Subjects
- *
CLOUD storage , *CLUSTER analysis (Statistics) , *DOCUMENT clustering , *VECTOR spaces - Abstract
To address this storage issue, we propose Content-Aware Deduplication Clustering Analysis for Cloud Storage Optimization (CADC-FPRLE), based on a file-partitioning running-length encoder. First, preprocessing is done by indexing, counting terms, cleansing, and tokenizing. Multi-objective clustering points are then analysed based on the bisecting divisible partition block, which divides a set of documents. The count terms are filtered from the divisible blocks and make up the count-term content block. Content-Aware Multi-Hash Ensemble Clustering (CAMH-EC) is used to group similar blocks into clusters; it creates a high-dimensional Euclidean space to determine the number of clusters, and initial points are chosen randomly to set the starting collection. Then, the Magnitude Vector Space Rate (MVSR) estimates the similarity distance between the groups to select the content with the highest scatter value for indexing. Finally, the Running Block Parity Encoder (RBPE) generates similarity parity to reduce redundant content to a single deduplicated file and optimise storage. This implementation achieves a higher level of storage optimization than previous systems and other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Secure cloud storage auditing with deduplication and efficient data transfer.
- Author
-
Yu, Jingze and Shen, Wenting
- Subjects
- *
CLOUD storage , *OPTICAL disks , *AUDITING , *MERGERS & acquisitions - Abstract
To guarantee the integrity of cloud data, plenty of cloud storage auditing schemes have been proposed. In cloud storage, when a company is purchased by another company, the corresponding data of the acquired company will be transferred to the acquiring company. In addition, there may be duplicate files between the acquiring company and the acquired company. To solve these problems, we propose a secure cloud storage auditing scheme with deduplication and efficient data transfer. We design a novel signature transformation method, in which the signatures of the acquired company can be efficiently transformed into the signatures of the acquiring company with the assistance of the cloud. Using this method, the acquiring company does not need to recompute the data signatures for the transferred data when cloud data is transferred. Furthermore, to improve the efficiency of cloud storage, we use a data deduplication method, in which the cloud stores only a single copy of the duplicate files. The security proof and performance analysis show that the proposed scheme is secure and efficient. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. A novel model for enhancing cloud security and data deduplication using fuzzy and refraction learning based chimp optimization.
- Author
-
Thottipalayam Andavan, Mohanaprakash, Parameswari, M., Subramanian, Nalini, and Vairaperumal, Nirmalrani
- Abstract
Recently, the digitalization process has generated an enormous amount of multimedia data that is becoming increasingly difficult to manage. Current developments in big data technology and the cloud computing (CC) field produce massive growth in cloud data. Replicated data consumes the available memory space and generates high computation cost, which is the most important problem in constrained cloud storage. Therefore, in this study, we introduce a novel secure cloud data deduplication (SCDD) approach to improve data security and data storage in the cloud environment by generating optimal keys and deduplicating files, respectively. The proposed approach mainly focuses on reducing the computational cost and memory utilization of the cloud application. Here, the data files are encrypted using the proxy re-encryption approach, and the refraction learning-based chimp optimization (RL-CO) algorithm is utilized for the optimal key generation process, which guarantees better cloud security to end users who store data on the cloud. Subsequently, the optimal verified fuzzy keyword search (OVFKS) approach is proposed to eliminate duplicate files or copies of actual data, thereby enhancing the available cloud storage space considerably. The proposed secure cloud data deduplication-based optimal verified fuzzy keyword search (SCDD-OVFKS) approach takes three different types of data files as input, namely Android application data, audio files, and mixed application data files, together with other relevant files. Furthermore, the proposed approach's performance is validated using different performance measures, namely computational time, computational cost, search time cost, memory utilization, and deduplication rate, in comparison with other state-of-the-art approaches. As a result, the proposed SCDD-OVFKS approach achieves a maximum deduplication rate of about 28.6% for 8 MB, along with minimum computational cost and reduced memory utilization compared with other state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Efficient Data Aggregation and Duplicate Removal Using Grid-Based Hashing in Cloud-Assisted Industrial IoT
- Author
-
Saleh M. Altowaijri
- Subjects
Industrial Internet of Things (IIoT) ,extended Merkle grid ,data deduplication ,data aggregation ,hash grid ,cloud computing ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The Industrial Internet of Things (IIoT) involves the incorporation of sensors, devices, and equipment with internet connectivity and data processing abilities. This connectivity allows sensors to collect and exchange data and communicate with each other. The proliferation of sensors and data-producing devices in the industrial sector has led to high volumes of data, causing data duplication and other issues in storage, processing, and resource consumption. Addressing data duplication becomes vital to guarantee effective operation and resource optimization. Existing data deduplication techniques in IIoT environments often struggle with efficiency and scalability. Current deduplication approaches may involve frequent and unnecessary hash generation for minor variations in sensor readings, resulting in excessive computational overhead, higher energy consumption, and reduced network lifetime. These limitations motivate the proposed Grid Hashing-based Efficient Data Aggregation (GH-EDA) scheme, a comprehensive solution that uses effective data aggregation, preprocessing, and region splitting, and employs an Extended Merkle Grid for efficient deduplication. The scheme begins with the aggregation of sensor data, followed by preprocessing steps to filter out irrelevant or noisy data. Subsequently, the data is partitioned into regions and refined to improve resource utilization, thereby enabling fast duplicate detection while minimizing the number of comparisons. Key features of the proposed scheme include a threshold-based approach to hash generation, guaranteeing that only substantial changes produce new hash values. Extensive simulations are conducted using Network Simulator-3. The performance of the proposed scheme is evaluated using metrics such as space reduction, search time, network lifetime, computation time, average latency, and energy utilization. Comparisons with existing techniques demonstrate the superior performance of GH-EDA.
- Published
- 2024
- Full Text
- View/download PDF
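A key feature named above is threshold-based hash generation: a new hash is produced only when a sensor reading changes substantially. The sketch below illustrates only that idea; the class name, key format, and 0.5-unit threshold are illustrative assumptions, and the Extended Merkle Grid and region splitting of GH-EDA are omitted.

```python
import hashlib

THRESHOLD = 0.5   # hypothetical: minimum change that counts as "substantial"

class ThresholdHasher:
    """Generate a new hash only when a reading differs enough from the last hashed one."""

    def __init__(self, threshold: float = THRESHOLD):
        self.threshold = threshold
        self.last_value = None
        self.last_hash = None

    def observe(self, sensor_id: str, value: float):
        """Return (hash, is_new). Minor jitter reuses the previous hash, i.e. a duplicate."""
        if self.last_value is not None and abs(value - self.last_value) < self.threshold:
            return self.last_hash, False
        digest = hashlib.sha256(f"{sensor_id}:{value:.3f}".encode()).hexdigest()
        self.last_value, self.last_hash = value, digest
        return digest, True

# Example: only the jump from 20.3 to 21.2 produces a new hash; 20.1 and 20.3 reuse the old one.
h = ThresholdHasher()
for reading in (20.0, 20.1, 20.3, 21.2):
    print(reading, h.observe("temp-sensor-7", reading))
```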
13. Chunk2vec: A novel resemblance detection scheme based on Sentence‐BERT for post‐deduplication delta compression in network transmission
- Author
-
Chunzhi Wang, Keguan Wang, Min Li, Feifei Wei, and Neal Xiong
- Subjects
data deduplication ,deep learning ,delta compression ,Natural Language Processing ,network transmission ,resemblance detection ,Telecommunication ,TK5101-6720 - Abstract
Abstract Delta compression, as a complementary technique for data deduplication, has gained widespread attention in network storage systems. It can eliminate redundant data between non‐duplicate but similar chunks that cannot be identified by data deduplication. The network transmission overhead between servers and clients can be greatly reduced by using data deduplication and delta compression techniques. Resemblance detection is a technique that identifies similar chunks for post‐deduplication delta compression in network storage systems. The existing resemblance detection approaches fail to detect similar chunks with arbitrary similarity by setting a similarity threshold, which can be suboptimal. In this paper, the authors propose Chunk2vec, a resemblance detection scheme for delta compression that utilizes deep learning techniques and Approximate Nearest Neighbour Search technique to detect similar chunks with any given similarity range. Chunk2vec uses a deep neural network, Sentence‐BERT, to extract an approximate feature vector for each chunk while preserving its similarity with other chunks. The experimental results on five real‐world datasets indicate that Chunk2vec improves the accuracy of resemblance detection for delta compression and achieves higher compression ratio than the state‐of‐the‐art resemblance detection technique.
- Published
- 2024
- Full Text
- View/download PDF
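Chunk2vec, as described above, embeds chunks with Sentence-BERT and finds similar chunks via approximate nearest neighbour search. The sketch below approximates that pipeline with the sentence-transformers library and scikit-learn's NearestNeighbors; the model name, the byte-to-text decoding of chunks, and the neighbour count are assumptions, and the delta-compression step itself is omitted.

```python
# pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

# Assumed model; the paper fine-tunes Sentence-BERT on chunk data rather than using it off the shelf.
model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(chunks):
    """Embed each chunk and build a cosine-distance neighbour index over the vectors."""
    texts = [c.decode("latin-1") for c in chunks]          # crude byte-to-text mapping (assumption)
    vectors = model.encode(texts, normalize_embeddings=True)
    index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(vectors)
    return index, vectors

def most_similar(index, vectors, query_pos):
    """Return the stored chunk most similar to chunk `query_pos` and its similarity score."""
    distances, neighbors = index.kneighbors(vectors[query_pos:query_pos + 1])
    # neighbors[0][0] is the chunk itself; neighbors[0][1] is its closest distinct chunk,
    # which would then serve as the base chunk for post-deduplication delta compression.
    return neighbors[0][1], 1.0 - distances[0][1]
```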
14. Chunk2vec: A novel resemblance detection scheme based on Sentence‐BERT for post‐deduplication delta compression in network transmission.
- Author
-
Wang, Chunzhi, Wang, Keguan, Li, Min, Wei, Feifei, and Xiong, Neal
- Subjects
- *
IMAGE compression , *DEEP learning , *NATURAL language processing - Abstract
Delta compression, as a complementary technique for data deduplication, has gained widespread attention in network storage systems. It can eliminate redundant data between non‐duplicate but similar chunks that cannot be identified by data deduplication. The network transmission overhead between servers and clients can be greatly reduced by using data deduplication and delta compression techniques. Resemblance detection is a technique that identifies similar chunks for post‐deduplication delta compression in network storage systems. The existing resemblance detection approaches fail to detect similar chunks with arbitrary similarity by setting a similarity threshold, which can be suboptimal. In this paper, the authors propose Chunk2vec, a resemblance detection scheme for delta compression that utilizes deep learning techniques and Approximate Nearest Neighbour Search technique to detect similar chunks with any given similarity range. Chunk2vec uses a deep neural network, Sentence‐BERT, to extract an approximate feature vector for each chunk while preserving its similarity with other chunks. The experimental results on five real‐world datasets indicate that Chunk2vec improves the accuracy of resemblance detection for delta compression and achieves higher compression ratio than the state‐of‐the‐art resemblance detection technique. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Time-conserving deduplicated data retrieval framework for the cloud computing environment.
- Author
-
Swathika, P. and Sekar, J. Raja
- Subjects
INFORMATION retrieval ,CLOUD storage ,CLOUD computing ,DATA warehousing ,JOB performance ,PERFORMANCE standards - Abstract
Cloud computing technology is quite inevitable in today's smart world. The excessive utilization of data mandates increased storage space, which is highly expensive, and cloud storage is the best solution to this. As charges are levied for the utilized space, data redundancy must be avoided for the effective exploitation of cloud space. Data deduplication is a technique that removes redundant data and conserves storage, bandwidth, and charges. However, data retrieval over deduplicated data is not well explored in the existing literature. This work presents an effective retrieval framework for deduplicated data in a cloud environment through two protocols, namely the Data Outsourcing Protocol (DOP) and the Data Retrieval Protocol (DRP). The retrieval performance of the proposed approach is tested and compared with existing approaches in terms of standard performance measures. The proposed Deduplicated Data Retrieval (DDR) framework performs better in terms of retrieval precision, recall, and time-conservation rates. The average precision and recall rates attained by the proposed work are 97.9% and 95.75%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Time-conserving deduplicated data retrieval framework for the cloud computing environment
- Author
-
P. Swathika and J. Raja Sekar
- Subjects
Cloud ,data retrieval ,data deduplication ,data outsourcing ,deduplicated data retrieval ,Control engineering systems. Automatic machinery (General) ,TJ212-225 ,Automation ,T59.5 - Abstract
Cloud computing technology is quite inevitable in today's smart world. The excessive utilization of data mandates increased storage space, which is highly expensive, and cloud storage is the best solution to this. As charges are levied for the utilized space, data redundancy must be avoided for the effective exploitation of cloud space. Data deduplication is a technique that removes redundant data and conserves storage, bandwidth, and charges. However, data retrieval over deduplicated data is not well explored in the existing literature. This work presents an effective retrieval framework for deduplicated data in a cloud environment through two protocols, namely the Data Outsourcing Protocol (DOP) and the Data Retrieval Protocol (DRP). The retrieval performance of the proposed approach is tested and compared with existing approaches in terms of standard performance measures. The proposed Deduplicated Data Retrieval (DDR) framework performs better in terms of retrieval precision, recall, and time-conservation rates. The average precision and recall rates attained by the proposed work are 97.9% and 95.75%, respectively.
- Published
- 2023
- Full Text
- View/download PDF
17. Fuzzy Data Deduplication at Edge Nodes in Connected Environments
- Author
-
Yakhni, Sylvana, Tekli, Joe, Mansour, Elio, Chbeir, Richard, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Younas, Muhammad, editor, Awan, Irfan, editor, and Grønli, Tor-Morten, editor
- Published
- 2023
- Full Text
- View/download PDF
18. Elimination and Restoring Deduplicated Storage for Multilevel Integrated Approach with Cost Estimation
- Author
-
Antony Xavier Bronson, Francis, Francis Jency, Xavier, Sai Shanmugaraja, Vairamani, Elumalai, Saravanan, Senthil Velan, Gowrishankar, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Marriwala, Nikhil, editor, Tripathi, C.C., editor, Jain, Shruti, editor, and Kumar, Dinesh, editor
- Published
- 2023
- Full Text
- View/download PDF
19. Blockchain‐based data deduplication using novel content‐defined chunking algorithm in cloud environment.
- Author
-
Prakash J, Jabin, K, Ramesh, K, Saravanan, and Prabha G, Lakshmi
- Subjects
DATA protection ,CLOUD storage ,ALGORITHMS ,DATA warehousing ,DATA integrity ,BLOCKCHAINS ,INFORMATION storage & retrieval systems - Abstract
The cloud environment is inherently dynamic, as users are added in large numbers within a short duration. It is indeed difficult to manage such user profiles and associated data. Meanwhile, cloud data expands at a twofold-to-threefold rate on average, making storage space management and data integrity maintenance a mandatory but also risky task. The main approaches for addressing these data maintenance challenges in a cloud context are deduplication and data protection. In order to manage storage space, identical copies of the same data can be found and removed from the cloud, resulting in a reduction in the amount of storage space needed. Furthermore, duplicate copies are considerably reduced in cloud storage owing to data deduplication. Here, a decentralized-ledger public blockchain network is introduced to protect the integrity of data stored in cloud storage. This research proposes data deduplication using the speedy content-defined chunking (SpeedyCDC) algorithm in the public blockchain. Many people and businesses outsource sensitive data to remote cloud servers because it considerably eliminates the hassle of managing software and infrastructure. However, the ownership and control rights over users' data are nonetheless divided because the data is outsourced to cloud storage providers (CSPs) and kept on a distant cloud. As a result, users have a great deal of difficulty in verifying the integrity of sensitive data. Analysis using datasets from Geospatial Information Systems (GIS) revealed that the throughput increased by 5%–6% over that of the FastCDC technique, while integrity was ensured by the securing blockchain network. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. LaDy: Enabling Locality-aware Deduplication Technology on Shingled Magnetic Recording Drives.
- Author
-
JUNG-HSIU CHANG, TZU-YU CHANG, YI-CHAO SHIH, and TSENG-YI CHEN
- Subjects
OPTICAL disks ,TRAFFIC violations ,REFUSE collection ,DATA warehousing - Abstract
The continuous increase in data volume has led to the adoption of shingled-magnetic recording (SMR) as the primary technology for modern storage drives. This technology offers high storage density and low unit cost but introduces significant performance overheads due to the read-update-write operation and garbage collection (GC) process. To reduce these overheads, data deduplication has been identified as an effective solution as it reduces the amount of written data to an SMR-based storage device. However, deduplication can result in poor data locality, leading to decreased read performance. To tackle this problem, this study proposes a data locality-aware deduplication technology, LaDy, that considers both the overheads of writing duplicate data and the impact on data locality to determine whether the duplicate data should be written. LaDy integrates with DiskSim, an open-source project, and modifies it to simulate an SMR-based drive. The experimental results demonstrate that LaDy can significantly reduce the response time in the best-case scenario by 87.3% compared with CAFTL on the SMR drive. LaDy achieves this by selectively writing duplicate data, which preserves data locality, resulting in improved read performance. The proposed solution provides an effective and efficient method for mitigating the performance overheads associated with data deduplication in SMR-based storage devices. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Using fuzzy reasoning to improve redundancy elimination for data deduplication in connected environments.
- Author
-
Yakhni, Sylvana, Tekli, Joe, Mansour, Elio, and Chbeir, Richard
- Subjects
- *
INTERNET of things , *CLOUD storage , *DATA warehousing , *ENERGY consumption , *WAREHOUSES , *CYBER physical systems , *WIRELESS sensor networks - Abstract
The Internet of Things is ushering in the era of connected environments where the number and diversity of data sources (devices and sensors) are inevitably increasing the size of the data that need to be stored locally (at the edge device level) and transmitted to base storages (at the sink level) of the network. This huge amount of data highlights several challenges including network bandwidth, consumption of network energy, cloud storage, and I/O throughput. These call for data pre-processing and filtering solutions to reduce the amount of data being handled and transmitted over the network. In this study, we investigate data deduplication as a prominent pre-processing method that can be used and adapted to address such challenges. Data deduplication techniques have been traditionally developed for data storage and data warehousing applications and aim at identifying and eliminating redundant data items. Few recent approaches have been designed for connected environments, yet they share various limitations, including: (i) detecting duplicates at one level only of the network (either edge or sink exclusively), (ii) overlooking the context and dynamicity of the network (disregarding device mobility and overlooking boundary separations and sensor coverage areas), (iii) relying on crisp thresholds and providing minimum-to-no expert control over the deduplication process (disregarding the domain expert's needs in defining redundancy). In this study, we propose FREDD, a new approach for Fuzzy Redundancy Elimination for Data Deduplication in a connected environment. FREDD uses simple natural language rules to represent domain knowledge and expert preferences regarding data duplication boundaries. It then applies pattern codes and fuzzy reasoning to detect duplicates at both the edge level and the sink level of the network. This reduces the time required to hard-code the deduplication process, while adapting to the domain expert's needs for different data sources and applications. Moreover, FREDD is adapted for multiple scenarios, considering both static and mobile devices, with different configurations of hard-separated and soft-separated zones, and different sensor coverage areas in the connected environment. Experiments on a real-world dataset highlight FREDD's potential and improvement compared with existing solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
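FREDD, summarized above, expresses duplication boundaries as simple natural-language fuzzy rules evaluated by fuzzy reasoning. The sketch below shows one such rule ("if the value difference is small and the time gap is short, the readings are duplicates") with linear membership functions; the variable names, cut-off points, and 0.5 decision threshold are illustrative assumptions, not FREDD's actual rule base or pattern codes.

```python
def falling(x: float, zero_at: float) -> float:
    """Membership that is 1 at x = 0 and falls linearly to 0 at `zero_at`."""
    return max(0.0, 1.0 - x / zero_at)

def duplicate_degree(value_diff: float, seconds_apart: float) -> float:
    """Fuzzy rule: duplicate-ness = min(small value difference, short time gap)."""
    small_diff = falling(abs(value_diff), zero_at=1.0)    # assumed: 1.0 unit = clearly different
    short_gap = falling(seconds_apart, zero_at=60.0)      # assumed: 60 s = clearly separate readings
    return min(small_diff, short_gap)                     # Mamdani-style AND

# Example: two temperature readings 0.2 degrees and 10 seconds apart.
score = duplicate_degree(0.2, 10.0)
print(score, "-> drop as duplicate" if score >= 0.5 else "-> keep both")
```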
22. SACRO : Solid state drive‐assisted chunk caching for restore optimization.
- Author
-
Dagnaw, Girum, Zhou, Ke, and Wang, Hua
- Subjects
SOLID state drives ,CACHE memory - Abstract
Better duplicate elimination performance causes higher fragmentation, which leads to degraded restore performance. As a result, restore performance needs to be optimized either by strengthening locality through selective rewriting and/or by making the best use of the limited available memory through cache optimization. In this paper, we explore SACRO, SSD-Assisted Chunk Caching for Restore Optimization. It avoids the need to repeatedly access chunk containers on disk by using an SSD (Solid State Drive) as a secondary chunk cache. An FRT (Future Reference Table) is constructed from the recipe of a backup stream, and the Future Reference Count entry in the FRT is utilized to assign priorities to chunks as they are accessed during restoration. These priority values, coupled with access distances, are used to decide whether to cache a chunk in memory or on the SSD. The restore performance of SACRO is shown to significantly outperform other restore approaches that utilize chunk caching. For one dataset, at a 4 MB cache size and 4 MB Look Ahead Window size, its restore factor is 3.5 MB/container_access, while ALACC and LRU record 3.4 MB/container_access and 0.9 MB/container_access, respectively. Moreover, SACRO can be integrated into existing deduplication systems with a very limited amount of modification to the deduplication system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
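SACRO, above, prioritizes chunks in its memory/SSD cache by their Future Reference Count taken from the backup recipe. The sketch below shows only that selection idea: chunks with many remaining references go to the faster tier, the rest to the SSD tier. Container I/O, access distances, eviction, and the Look Ahead Window are left out, and all names and thresholds are assumptions.

```python
from collections import Counter

def future_reference_counts(recipe):
    """Build the FRT: how many times each chunk fingerprint appears in the restore recipe."""
    return Counter(recipe)

def place_chunk(fingerprint, frt, memory_cache, ssd_cache,
                mem_capacity=4, hot_threshold=2):
    """Cache a restored chunk in memory while it stays hot, otherwise keep it on the SSD tier."""
    frt[fingerprint] -= 1                      # one pending reference has just been served
    if fingerprint in memory_cache or fingerprint in ssd_cache:
        return                                 # already cached somewhere
    if frt[fingerprint] >= hot_threshold and len(memory_cache) < mem_capacity:
        memory_cache[fingerprint] = b"<chunk payload>"
    else:
        ssd_cache[fingerprint] = b"<chunk payload>"

recipe = ["a", "b", "a", "c", "a", "b", "d"]
frt = future_reference_counts(recipe)
memory, ssd = {}, {}
for fp in recipe:
    place_chunk(fp, frt, memory, ssd)
print("memory tier:", sorted(memory), "ssd tier:", sorted(ssd))
```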
23. EDocDeDup: Electronic Document Data Deduplication Towards Storage Optimization.
- Author
-
Me Me Khaing and Jeyanthi, N.
- Subjects
ELECTRONIC records ,PDF (Computer file format) ,TEXT files ,DATA warehousing ,MATHEMATICAL optimization ,STORAGE - Abstract
Understanding data deduplication in storage is essential for investigating the optimization of various data storage issues. Data deduplication has become an important and cost-effective optimization technique for detecting and removing duplicate data. Organizations face storage issues in their storage areas, and if these are not addressed and optimized promptly, a shortage of storage capacity is to be expected. The proposed system (EdocDedup) addresses this issue by applying a data deduplication technique, implementing SHA-256 for hash value calculation, and keeping only the unique hash values for an electronic-document dataset containing Word files, text files, HTML files, Excel files, ZIP files, PDF files, and PowerPoint presentation files. By applying the proposed technique, storage is saved and a variety of duplicate files are identified efficiently. EdocDedup's performance is evaluated using user-uploaded files. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
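EdocDedup, as described above, computes SHA-256 hashes of uploaded documents and keeps only the unique ones. The sketch below is a minimal file-level version of that idea; the directory layout, function names, and reference bookkeeping are assumptions rather than the system's actual design.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large documents do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

def deduplicate(upload_dir: str):
    """Keep one stored copy per unique hash; later duplicates only record a reference."""
    store = {}        # hash -> first file seen with that content
    references = {}   # duplicate file -> hash of the stored copy
    for path in sorted(Path(upload_dir).glob("*")):
        if not path.is_file():
            continue
        digest = sha256_of(path)
        if digest in store:
            references[path] = digest    # duplicate: no extra storage consumed
        else:
            store[digest] = path
    return store, references
```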
24. Double Sliding Window Chunking Algorithm for Data Deduplication in Ocean Observation
- Author
-
Shuai Guo, Xiaodong Mao, Meng Sun, and Shuang Wang
- Subjects
Double sliding window ,ocean observation ,data deduplication ,content defined chunking ,markov chain ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
As an essential means of eliminating redundant data, data deduplication technology is highly significant in today's era of massive data growth. In recent years, due to the rapid development of related industries such as marine monitoring, marine monitoring data has exploded, leading to higher storage costs for marine observation stations. In the face of this surge in data size, we first consider using data deduplication technology to reduce the stored data and save storage costs. However, there are many choices of data deduplication technology. Because block-level data deduplication can better complete the task, and its core technology is how to cut data into blocks, this paper proposes a double-sliding-window-based chunking technique. The double-sliding-window structure makes the divided data block sizes more uniform, reducing the memory consumption of the fingerprint table. At the same time, we add a prediction algorithm to the data deduplication system to predict the cutting points of data blocks and improve cutting efficiency. In addition, we propose a more accurate calculation method for the deduplication ratio, which allows the algorithms' performance to be compared more precisely; the final experimental results of this paper are obtained using this calculation method. Moreover, we propose a model based on Markov prediction to store massive ocean data, which can save more resources. At the end of the article, we compare the commonly used chunking algorithms through careful experiments on a public dataset using this deduplication-ratio metric.
- Published
- 2023
- Full Text
- View/download PDF
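The entry above evaluates chunking algorithms by their deduplication ratio. The sketch below shows the conventional way such a ratio is computed from chunk fingerprints (total bytes divided by bytes actually stored); the paper's "more accurate" variant is not reproduced, so treat this only as the baseline metric.

```python
import hashlib

def dedup_ratio(chunks) -> float:
    """Classic metric: total bytes before deduplication divided by bytes actually stored."""
    seen = set()
    total = stored = 0
    for chunk in chunks:
        total += len(chunk)
        fp = hashlib.sha256(chunk).digest()
        if fp not in seen:
            seen.add(fp)
            stored += len(chunk)
    return total / stored if stored else 1.0

# Example: three chunks, one exact repeat -> ratio 1.5
print(dedup_ratio([b"A" * 1000, b"B" * 1000, b"A" * 1000]))
```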
25. Blockchain-Based Integrity Auditing with Secure Deduplication in Cloud Storage
- Author
-
Wang, Yuhua, Tang, Xin, Zhou, Yiteng, Chen, Xiguang, Zhu, Yudan, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Tan, Ying, editor, and Shi, Yuhui, editor
- Published
- 2022
- Full Text
- View/download PDF
26. Data Integration, Cleaning, and Deduplication: Research Versus Industrial Projects
- Author
-
Wrembel, Robert, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Pardede, Eric, editor, Delir Haghighi, Pari, editor, Khalil, Ismail, editor, and Kotsis, Gabriele, editor
- Published
- 2022
- Full Text
- View/download PDF
27. Novel Modeling of Efficient Data Deduplication for Effective Redundancy Management in Cloud Environment
- Author
-
Anil Kumar, G., Shantala, C. P., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Jacob, I. Jeena, editor, Kolandapalayam Shanmugam, Selvanayaki, editor, and Bestak, Robert, editor
- Published
- 2022
- Full Text
- View/download PDF
28. The Analysis and Implication of Data Deduplication in Digital Forensics
- Author
-
Savić, Izabela, Lin, Xiaodong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Meng, Weizhi, editor, and Conti, Mauro, editor
- Published
- 2022
- Full Text
- View/download PDF
29. Image and Text Encrypted Data with Authorized Deduplication in Cloud.
- Author
-
Borade, Shubham, Khan, Abdulrehman, Khan, Abdullah, Sayyed, Afridi, and Kedar, Ranjana M.
- Subjects
CLOUD computing ,DATA encryption ,BANDWIDTHS ,DATA transmission systems ,CIPHERS ,DATA integrity - Abstract
With the advent of cloud computing, secured data deduplication has gained a lot of popularity. Many techniques have been proposed in the literature of this ongoing research area. Among these techniques, the Message Locked Encryption (MLE) scheme is often mentioned. Researchers have introduced MLE-based protocols which provide secured deduplication of data, where the data is generally in text form. As a result, multimedia data such as images and video, which are larger in size compared to text files, have not been given much attention. Applying secured data deduplication to such data files could significantly reduce the cost and space required for their storage. In this paper we present a secure deduplication scheme for near-identical (NI) images using the Dual Integrity Convergent Encryption (DICE) protocol, which is a variant of the MLE-based scheme. In the proposed scheme, an image is decomposed into blocks and the DICE protocol is applied to each block separately rather than to the entire image. As a result, the blocks that are common between two or more NI images are stored only once in the cloud. We provide detailed analyses of the theoretical, experimental and security aspects of the proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
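The scheme above applies the DICE variant of message-locked encryption to each image block so that identical blocks of near-identical images encrypt to identical ciphertexts and deduplicate. The sketch below shows plain block-wise convergent encryption with the cryptography library's AESGCM; the block size, key and nonce derivation, and the absence of DICE's dual-integrity checks are all simplifying assumptions.

```python
# pip install cryptography
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

BLOCK_SIZE = 4096   # assumed image block size

def encrypt_block(block: bytes):
    """Convergent encryption: key and nonce are derived from the block content itself,
    so two clients holding the same block produce byte-identical ciphertexts."""
    key = hashlib.sha256(block).digest()                    # 32-byte content-derived key
    nonce = hashlib.sha256(b"nonce" + block).digest()[:12]  # deterministic 96-bit nonce
    ciphertext = AESGCM(key).encrypt(nonce, block, None)
    tag = hashlib.sha256(ciphertext).hexdigest()            # dedup tag the cloud can compare
    return ciphertext, tag

def encrypt_image(data: bytes):
    """Split an image into blocks and encrypt each independently; blocks shared between
    near-identical images deduplicate, while differing blocks are stored once each."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [encrypt_block(b) for b in blocks]
```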
30. InDe: An Inline Data Deduplication Approach via Adaptive Detection of Valid Container Utilization.
- Author
-
LIFANG LIN, YUHUI DENG, YI ZHOU, and YIFENG ZHU
- Subjects
GREEDY algorithms ,CONTAINERS - Abstract
Inline deduplication removes redundant data in real-time as data is being sent to the storage system. However, it causes data fragmentation: logically consecutive chunks are physically scattered across various containers after data deduplication. Many rewrite algorithms aim to alleviate the performance degradation due to fragmentation by rewriting fragmented duplicate chunks as unique chunks into new containers. Unfortunately, these algorithms determine whether a chunk is fragmented based on a simple pre-set fixed value, ignoring the variance of data characteristics between data segments. Accordingly, when backups are restored, they often fail to select an appropriate set of old containers for rewrite, generating a substantial number of invalid chunks in retrieved containers. To address this issue, we propose an inline deduplication approach for storage systems, called InDe, which uses a greedy algorithm to detect valid container utilization and dynamically adjusts the number of old container references in each segment. InDe fully leverages the distribution of duplicated chunks to improve the restore performance while maintaining high backup performance. We define an effectiveness metric, valid container referenced counts (VCRC), to identify appropriate containers for the rewrite. We design a rewrite algorithm F-greedy that detects valid container utilization to rewrite low-VCRC containers. According to the VCRC distribution of containers, F-greedy dynamically adjusts the number of old container references to only share duplicate chunks with high-utilization containers for each segment, thereby improving the restore speed. To take full advantage of the above features, we further propose another rewrite algorithm called F-greedy+ based on adaptive interval detection of valid container utilization. F-greedy+ makes a more accurate estimation of the valid utilization of old containers by detecting trends of VCRC's change in two directions and selecting referenced containers in the global scope. We quantitatively evaluate InDe using three real-world backup workloads. The experimental results show that compared with two state-of-the-art algorithms (Capping and SMR), our scheme improves the restore speed by 1.3×-2.4× while achieving almost the same backup performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
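InDe's rewrite decision above hinges on counting how many duplicate chunks of a segment each old container still usefully serves (its VCRC) and rewriting chunks held in low-utilization containers. The sketch below illustrates only that counting-and-threshold step; the greedy and adaptive threshold adjustments of F-greedy and F-greedy+ are not reproduced, and the fixed `min_vcrc` value and function names are assumptions.

```python
from collections import Counter

def container_vcrc(segment, chunk_to_container):
    """VCRC per container: how many duplicate chunks of this segment each old container serves."""
    counts = Counter()
    for fingerprint in segment:
        container = chunk_to_container.get(fingerprint)
        if container is not None:              # chunk is a duplicate already stored somewhere
            counts[container] += 1
    return counts

def choose_rewrites(segment, chunk_to_container, min_vcrc=3):
    """Duplicate chunks held in containers referenced fewer than `min_vcrc` times are
    rewritten into new containers to keep the restore stream sequential."""
    vcrc = container_vcrc(segment, chunk_to_container)
    rewrite, dedupe = [], []
    for fingerprint in segment:
        container = chunk_to_container.get(fingerprint)
        if container is None:
            continue                            # unique chunk: written fresh anyway
        (dedupe if vcrc[container] >= min_vcrc else rewrite).append(fingerprint)
    return rewrite, dedupe
```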
31. A secure and efficient data deduplication framework for the internet of things via edge computing and blockchain.
- Author
-
Wu, Zeng, Huang, Hui, Zhou, Yuping, and Wu, Chenhuang
- Subjects
- *
EDGE computing , *INTERNET of things , *LABEL design , *BLOCKCHAINS , *TRUST , *PROBLEM solving - Abstract
Data deduplication can solve the problem of resource wastage caused by duplicated data. However, due to the limited resources of Internet of Things (IoT) devices, applying data deduplication to IoT scenarios is challenging. Existing data deduplication frameworks for the IoT are prone to inefficiency or trust crises due to the random allocation of edge computing nodes. Furthermore, side-channel attacks remain a risk. In addition, after IoT devices store data in the cloud through data deduplication, they cannot share their data efficiently. In this paper, we propose a secure and efficient data deduplication framework for the IoT based on edge computing and blockchain technologies. In this scheme, we propose a model based on parallel use of three-layer and two-layer architectures and introduce the RAndom REsponse (RARE) scheme to resist side-channel attacks. We also design a label tree to realise one-to-many data-sharing, which improves efficiency and meets the needs of the IoT. In addition, we use blockchain to resist collusion attacks. Experiments were conducted to demonstrate that our framework has advantages over similar schemes in terms of communication cost, security and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Blockchain-based cross-user data shared auditing.
- Author
-
Li, Angtai, Tian, Guohua, Miao, Meixia, and Gong, Jianpeng
- Subjects
- *
DATA integrity , *AUDITING procedures , *AUDITING , *AUDITING fees , *INFORMATION sharing , *USER charges , *CLOUD storage - Abstract
In cloud storage, public auditing is a popular data integrity verification technique since it allows users to delegate auditing tasks to a fully trusted third-party auditor (TPA). However, it is difficult to find such a TPA in practical applications. Besides, the centralised auditing model forces the TPA to bear a burdensome workload, which limits the practicability of existing schemes. In this paper, we first propose a blockchain-based generalised shared auditing mechanism, BCSA, for the cross-user scenario, which aims at achieving usable public auditing with a non-fully trusted TPA and at reducing users' auditing fees and the TPA's workload by allowing data users to share their auditing procedure with others. Furthermore, we present a concrete construction, BCSAD, with the Diffie–Hellman protocol for the cross-user auditing scenario with different data. Likewise, we also propose a novel construction, BCSAI, for the cross-user auditing scenario with identical data, which utilises a password-authenticated key exchange (PAKE) protocol to achieve shared auditing and ciphertext deduplication, reducing data storage and auditing fees for data users and alleviating service pressure on the cloud server and TPA. Security and performance analyses confirm the practicability of the proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Improving Storage Efficiency with Multi-cluster Deduplication and Achieving High-Data Availability in Cloud
- Author
-
Bhavya, M., Prakash, M., Thriveni, J., Venugopal, K. R., Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Venugopal, K. R., editor, Shenoy, P. Deepa, editor, Buyya, Rajkumar, editor, Patnaik, L. M., editor, and Iyengar, Sitharama S., editor
- Published
- 2021
- Full Text
- View/download PDF
34. Graph-Based Data Deduplication in Mobile Edge Computing Environment
- Author
-
Luo, Ruikun, Jin, Hai, He, Qiang, Wu, Song, Zeng, Zilai, Xia, Xiaoyu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hacid, Hakim, editor, Kao, Odej, editor, Mecella, Massimo, editor, Moha, Naouel, editor, and Paik, Hye-young, editor
- Published
- 2021
- Full Text
- View/download PDF
35. Research on the Method of Eliminating Duplicated Encrypted Data in Cloud Storage Based on Generated Countermeasure Network
- Author
-
Tang, Lai-feng, Wang, Qiang, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Fu, Weina, editor, Xu, Yuan, editor, Wang, Shui-Hua, editor, and Zhang, Yudong, editor
- Published
- 2021
- Full Text
- View/download PDF
36. An Analysis and Comparative Study of Data Deduplication Scheme in Cloud Storage
- Author
-
Pronika, Tyagi, S. S., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Joshi, Amit, editor, Khosravy, Mahdi, editor, and Gupta, Neeraj, editor
- Published
- 2021
- Full Text
- View/download PDF
37. Security Enhancement and Deduplication Using Zeus Algorithm Cloud
- Author
-
Kumar, Abhishek, Ravishankar, S., Viji, D., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Hemanth, D. Jude, editor, Vadivu, G., editor, Sangeetha, M., editor, and Balas, Valentina Emilia, editor
- Published
- 2021
- Full Text
- View/download PDF
38. Big Data Analytics: Tools, Challenges, and Scope in Data-Driven Computing
- Author
-
Vijesh Joe, C., Raj, Jennifer S., Smys, S., Chlamtac, Imrich, Series Editor, and Raj, Jennifer S., editor
- Published
- 2021
- Full Text
- View/download PDF
39. Towards Optimizing Deduplication on Persistent Memory
- Author
-
Li, Yichen, He, Kewen, Wang, Gang, Liu, Xiaoguang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, He, Xin, editor, Shao, En, editor, and Tan, Guangming, editor
- Published
- 2021
- Full Text
- View/download PDF
40. FASR: An Efficient Feature-Aware Deduplication Method in Distributed Storage Systems
- Author
-
Wenbin Yao, Mengyao Hao, Yingying Hou, and Xiaoyong Li
- Subjects
Distributed system ,data deduplication ,feature aware ,routing strategy ,system overhead ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Deduplication technology can obtain higher space utilization by keeping only one copy of each duplicate. But in a distributed storage system, the overall deduplication ratio is limited because redundancy cannot be eliminated across nodes. Traditional deduplication methods usually exploit data similarity and data locality to improve the deduplication ratio. However, frequent similarity calculations cause high system overhead. To deal with this problem, this paper proposes a new Feature-Aware Stateful Routing method (FASR), aiming to reduce system overhead while keeping a high deduplication ratio in the distributed environment. Firstly, we design a feature-aware node selection strategy to choose similar nodes by extracting data features and data distribution characteristics. This strategy avoids similarity calculations with nodes that are not similar to the data. Then, we present a stateful routing algorithm to determine the target node using super-chunk and handprint technology. Meanwhile, the algorithm maintains the load balance of the entire distributed system. Finally, the data is deduplicated locally based on a similarity index and fingerprint cache. Extensive experiments demonstrate that FASR can reduce system overhead by around 30% at most while effectively obtaining a higher deduplication ratio.
- Published
- 2022
- Full Text
- View/download PDF
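FASR's stateful routing above picks a target node by comparing a super-chunk's handprint (a small set of representative fingerprints) against candidate nodes, falling back to load balance when no node looks similar. The sketch below shows that selection idea with minimum-hash-style handprints; the handprint size, the feature-aware pre-filtering of candidate nodes, and the tie-breaking rule are assumptions.

```python
import hashlib

HANDPRINT_SIZE = 8   # assumed number of representative fingerprints per super-chunk

def handprint(chunks):
    """The k smallest chunk fingerprints act as a compact similarity signature (min-hash style)."""
    fingerprints = sorted(hashlib.sha1(c).hexdigest() for c in chunks)
    return set(fingerprints[:HANDPRINT_SIZE])

def route_super_chunk(chunks, node_handprints, node_load):
    """Send the super-chunk to the node whose stored data overlaps its handprint most;
    if no node looks similar, fall back to the least-loaded node to preserve balance."""
    hp = handprint(chunks)
    best_node, best_overlap = None, 0
    for node, stored in node_handprints.items():
        overlap = len(hp & stored)
        if overlap > best_overlap:
            best_node, best_overlap = node, overlap
    if best_node is None:
        best_node = min(node_load, key=node_load.get)
    node_handprints.setdefault(best_node, set()).update(hp)
    node_load[best_node] = node_load.get(best_node, 0) + sum(len(c) for c in chunks)
    return best_node
```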
41. Deep CNN based online image deduplication technique for cloud storage system.
- Author
-
Kaur, Ravneet, Bhattacharya, Jhilik, and Chana, Inderveer
- Subjects
CLOUD storage ,COMPUTER vision ,DEEP learning ,IMAGE registration ,EXTRACTION techniques ,IMAGE compression - Abstract
Online image detection is one of the most critical components of an image deduplication technique for an efficient cloud storage system. Although extensive research has been conducted in this field, the problem still remains challenging. Deep learning techniques have achieved significant success in solving a variety of computer vision issues and have high potential in image deduplication techniques. Deduplication is an efficient method in a cloud storage system that minimizes redundant data at the file or sub-file level using cryptographic hash signatures. Although significant research on offline image deduplication techniques has been reported, limited research is available on online image deduplication techniques. Online image matching accuracy and performance have been major challenges for online image deduplication techniques, which detect exact or near-exact images using feature extraction: they first extract image features and then match these features to detect duplicate images. In this paper, we propose a deep CNN-based online image deduplication technique for a cloud storage system to detect exact and near-exact images across domains, even in the presence of perturbations in the form of blur, noise, compression, lighting variations and many more. The experimental results show that our proposed deep CNN for online image deduplication outperforms existing techniques in terms of image matching accuracy and performance. The paper also proposes a Hot Decomposition Vector (HDV) for image patch generation to efficiently store the dissimilar parts of near-exact images. The experimental results demonstrate that HDV exhibits higher and more stable image matching accuracy across all three types of image deformations with relatively small computation time. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
42. Data Deduplication With Random Substitutions.
- Author
-
Lou, Hao and Farnoud, Farzad
- Subjects
- *
ALGORITHMS , *LARGE scale systems , *IMAGE compression , *DATA compression - Abstract
Data deduplication saves storage space by identifying and removing repeats in the data stream. Compared with traditional compression methods, data deduplication schemes are more computationally efficient and are thus widely used in large scale storage systems. In this paper, we provide an information-theoretic analysis of the performance of deduplication algorithms on data streams in which repeats are not exact. We introduce a source model in which probabilistic substitutions are considered. More precisely, each symbol in a repeated string is substituted with a given edit probability. Deduplication algorithms in both the fixed-length scheme and the variable-length scheme are studied. The fixed-length deduplication algorithm is shown to be unsuitable for the proposed source model as it does not take into account the edit probability. Two modifications are proposed and shown to have performances within a constant factor of optimal for a specific class of source models with the knowledge of model parameters. We also study the conventional variable-length deduplication algorithm and show that as source entropy becomes smaller, the size of the compressed string vanishes relative to the length of the uncompressed string, leading to high compression ratios. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
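The analysis above contrasts fixed-length and variable-length deduplication when repeats contain random substitutions. The short sketch below demonstrates the weakness that motivates the paper's modifications: under fixed-length deduplication, a single substituted symbol changes one block's fingerprint and breaks its exact match, so that block of the repeat no longer deduplicates. The block length and data are illustrative only.

```python
import hashlib
import random

BLOCK = 64   # fixed block length (illustrative)

def fixed_length_dedup(data: bytes):
    """Store each distinct fixed-length block once; return (stored bytes, total bytes)."""
    seen, stored = set(), 0
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        fp = hashlib.sha256(block).digest()
        if fp not in seen:
            seen.add(fp)
            stored += len(block)
    return stored, len(data)

random.seed(0)
original = bytes(random.randrange(256) for _ in range(4 * BLOCK))
copy = bytearray(original)
copy[10] ^= 0xFF                       # one substituted symbol in the repeated string
stream = original + bytes(copy)

stored, total = fixed_length_dedup(stream)
print(f"stored {stored} of {total} bytes")   # only the edited block fails to deduplicate
```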
43. Performance Enhancement of SMR-Based Deduplication Systems.
- Author
-
Wu, Chun-Feng, Kuo, Martin, Yang, Ming-Chang, and Chang, Yuan-Hao
- Subjects
- *
HARD disks , *DATA warehousing , *OPTICAL disk drives - Abstract
Due to the fast-growing amount of data and cost consideration, shingled-magnetic-recording (SMR) drives are developed to provide low-cost and high-capacity data storage by enhancing the areal-density of hard disk drives, and (data) deduplication techniques are getting popular in data-centric applications to reduce the amount of data that need to be stored in storage devices by eliminating the duplicate data chunks. However, directly applying deduplication techniques on SMR drives could significantly decrease the runtime performance of the deduplication system because of the time-consuming SMR space reclamation caused by the sequential write constraint of SMR drives. In this article, an SMR-aware deduplication scheme is proposed to improve the runtime performance of SMR-based deduplication systems with the consideration of the sequential write constraint of SMR drives. Moreover, to bridge the information gap between the deduplication system and the SMR drive, the lifetime information of data chunks is extracted to separate data chunks of different lifetimes in different places of SMR drives, so as to further reduce the SMR space reclamation overhead. A series of experiments was conducted with a set of realistic deduplication workloads. The results show that the proposed scheme can significantly improve the runtime performance of the SMR-based deduplication system with limited system overheads. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
44. The changing scope of data quality and fit for purpose: evolution and adaption of a CRIS solution.
- Author
-
Gurney, Thomas
- Subjects
DATA quality ,INFORMATION resources management - Abstract
The effectiveness of a Current Research Information System (CRIS) is based on satisfying essential institutional needs, or purposes, regarding the capture, processing and reporting of research-related activities and outcomes. These needs, or purposes, exemplified in the Code of Good Practice and introduced in 1998, have remained relatively constant over time. However, the scope and nature of the underlying data supporting these needs have grown in complexity, thus necessitating a concurrent increase in sophistication for how data quality is addressed and improved. This publication aims to introduce and analyse the implications and solutions required to improve data quality within the scope of fit for purpose in a CRIS context. Drawing from, and building on, data and information quality foundations and descriptions, a data quality framework is introduced, and detailed product functionality is described. A discussion on the combination of framework and functionality highlights how data quality can be improved in CRIS products. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
45. The evolution of the Pure Community Module: How lessons learned from national, regional, and subject-focused use cases have been used to support inter-institutional collaboration.
- Author
-
Toon, James
- Subjects
WEB services ,CONSUMERS - Abstract
This abstract provides an update on development and implementation progress of the Pure community module in the 5 years since the initial launch of the service. The paper revisits the conclusions of our 2018 presentation [1], providing updates on the 'future developments' and encourages a discussion of the complexities in supporting multi-institutional collaborations. In addition, we will provide an overview of some key challenge areas we have been working on together with community customers and community owners (technical and operational). [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
46. Enhancing Restore Speed of In-line Deduplication Cloud-Based Backup Systems by Minimizing Fragmentation
- Author
-
Gayathri Devi, K., Raksha, S., Sooda, Kavitha, Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Satapathy, Suresh Chandra, editor, Bhateja, Vikrant, editor, Mohanty, J. R., editor, and Udgata, Siba K., editor
- Published
- 2020
- Full Text
- View/download PDF
47. Blockchain-Based Secure Cloud Data Deduplication with Traceability
- Author
-
Huang, Hui, Chen, Qunshan, Zhou, Yuping, Huang, Zhenjie, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Zheng, Zibin, editor, Dai, Hong-Ning, editor, Fu, Xiaodong, editor, and Chen, Benhui, editor
- Published
- 2020
- Full Text
- View/download PDF
48. Efficient Ciphertext Policy Attribute Based Encryption (ECP-ABE) for Data Deduplication in Cloud Storage
- Author
-
Kumar, Abhishek, Kumar, P. Syam, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Kotenko, Igor, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Thampi, Sabu M., editor, Martinez Perez, Gregorio, editor, Ko, Ryan, editor, and Rawat, Danda B., editor
- Published
- 2020
- Full Text
- View/download PDF
49. Securing the Data Deduplication to Improve the Performance of Systems in the Cloud Infrastructure
- Author
-
N. Pachpor, Nishant, S. Prasad, Prakash, Verma, Ajit Kumar, Series Editor, Kapur, P. K., Series Editor, Kumar, Uday, Series Editor, Pant, Millie, editor, Sharma, Tarun K., editor, Basterrech, Sebastián, editor, and Banerjee, Chitresh, editor
- Published
- 2020
- Full Text
- View/download PDF
50. Efficient Deduplication on Cloud Environment Using Bloom Filter and IND-CCA2 Secured Cramer Shoup Cryptosystem
- Author
-
Mohamed Sirajudeen, Y., Muralidharan, C., Anitha, R., Xhafa, Fatos, Series Editor, Hemanth, D. Jude, editor, Shakya, Subarna, editor, and Baig, Zubair, editor
- Published
- 2020
- Full Text
- View/download PDF