Descriptor: "hbase" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"hbase"' showing total 423 results

1. ALLI: A High-Performance Approach to Data Deduplication in Hadoop using Enhanced Hashing and Two-Level Indexing Techniques.

Author: Zakzouk, Ammar, Oumran, Bassim, and Hasan, Hasan
Abstract: There are many systems like Hadoop that have been developed to effectively handle big data. However, these systems face challenges related to duplicate files, which consume additional resources for both storage and processing. Several approaches have been developed to eliminate duplicate files using hash algorithms. However, these algorithms have struggled to achieve a balance between execution speed and collision probability. Furthermore, the methods employed for storing hash values lead to lengthy match times and an elevated risk of collisions. In this paper, we propose ALLI, an approach designed to accelerate execution time and reduce collision probability during both the hashing and matching stages. ALLI combines the Arithmetic Logic Hash Algorithm (ALHA) for generating 1024-bit hash values and Two-Level Indexing in HBase (TLI-HBase) for efficient storage of hash values. Experiments conducted on four different datasets demonstrate that ALLI outperforms existing file-level deduplication techniques, achieving execution times that are twice as fast as those of other approaches. Moreover, the results indicate that ALHA is 2 to 3 times faster than other hash algorithms while also reducing collision probability even further. Additionally, TLI-HBase improves performance during the matching stage by significantly reducing the number of hash value comparisons compared to other storage methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Distribution-Based Approach for Efficient Storage and Indexing of Massive Infrared Hyperspectral Sounding Data.

Author: Li, Han, Gu, Mingjian, Shi, Guang, Hu, Yong, and Xie, Mengzhen
Subjects: *INFRARED detectors, *ATMOSPHERIC acoustics, *DATA warehousing, *DISTRIBUTED computing, *CLOUD computing
Abstract: Hyperspectral infrared atmospheric sounding data, characterized by their high vertical resolution, play a crucial role in capturing three-dimensional atmospheric spatial information. The hyperspectral infrared atmospheric detectors HIRAS/HIRAS-II, mounted on the FY3D/EF satellite, have established an initial global coverage network for atmospheric sounding. The collaborative observation approach involving multiple satellites will improve both the coverage and responsiveness of data acquisition, thereby enhancing the overall quality and reliability of the data. In response to the increasing number of channels, the rapid growth of data volume, and the specific requirements of multi-satellite joint observation applications with infrared hyperspectral sounding data, this paper introduces an efficient storage and indexing method for infrared hyperspectral sounding data within a distributed architecture for the first time. The proposed approach, built on the Kubernetes cloud platform, utilizes the Google S2 discrete grid spatial indexing algorithm to establish a grid-based hierarchical model for unified metadata-embedded documents. Additionally, it optimizes the rowkey design using the BPDS model, thereby enabling the distributed storage of data in HBase. The experimental results demonstrate that the query efficiency of the Google S2 grid-based embedded document model is superior to that of the traditional flat model, achieving a query time that is only 35.6% of the latter for a dataset of 5 million records. Additionally, this method exhibits better data distribution characteristics within the global grid compared to the H3 algorithm. Leveraging the BPDS model, the HBase distributed storage system adeptly balances the node load and counteracts the detrimental effects caused by the accumulation of time-series remote sensing images. This architecture significantly enhances both storage and query efficiency, thus laying a robust foundation for forthcoming distributed computing. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Distribution-Based Approach for Efficient Storage and Indexing of Massive Infrared Hyperspectral Sounding Data

Author: Han Li, Mingjian Gu, Guang Shi, Yong Hu, and Mengzhen Xie
Subjects: infrared hyperspectral sounding data, HIRAS, distributed storage, kubernetes, HBase, Science
Abstract: Hyperspectral infrared atmospheric sounding data, characterized by their high vertical resolution, play a crucial role in capturing three-dimensional atmospheric spatial information. The hyperspectral infrared atmospheric detectors HIRAS/HIRAS-II, mounted on the FY3D/EF satellite, have established an initial global coverage network for atmospheric sounding. The collaborative observation approach involving multiple satellites will improve both the coverage and responsiveness of data acquisition, thereby enhancing the overall quality and reliability of the data. In response to the increasing number of channels, the rapid growth of data volume, and the specific requirements of multi-satellite joint observation applications with infrared hyperspectral sounding data, this paper introduces an efficient storage and indexing method for infrared hyperspectral sounding data within a distributed architecture for the first time. The proposed approach, built on the Kubernetes cloud platform, utilizes the Google S2 discrete grid spatial indexing algorithm to establish a grid-based hierarchical model for unified metadata-embedded documents. Additionally, it optimizes the rowkey design using the BPDS model, thereby enabling the distributed storage of data in HBase. The experimental results demonstrate that the query efficiency of the Google S2 grid-based embedded document model is superior to that of the traditional flat model, achieving a query time that is only 35.6% of the latter for a dataset of 5 million records. Additionally, this method exhibits better data distribution characteristics within the global grid compared to the H3 algorithm. Leveraging the BPDS model, the HBase distributed storage system adeptly balances the node load and counteracts the detrimental effects caused by the accumulation of time-series remote sensing images. This architecture significantly enhances both storage and query efficiency, thus laying a robust foundation for forthcoming distributed computing.
Published: 2024
Full Text: View/download PDF

4. Study on Spatio-Temporal Indexing Model of Geohazard Monitoring Data Based on Data Stream Clustering Algorithm.

Author: Li, Jiahao, Song, Weiwei, Chen, Jianglong, Wei, Qunlan, and Wang, Jinxia
Subjects: *ALGORITHMS, *MICROCLUSTERS, *COMMUNITY safety, *INDEXING
Abstract: Yunnan Province, residing in the eastern segment of the Qinghai–Tibet Plateau and the western part of the Yunnan–Guizhou Plateau, faces significant challenges due to its intricate geological structures and frequent geohazards. These pose monumental risks to community safety and infrastructure. Unfortunately, conventional spatial indexing methods struggle with the enormous influx of geohazard data, exhibiting inadequacies in efficient spatio-temporal querying and failing to meet the swift response imperatives for real-time geohazard monitoring and early warning mechanisms. In response to these challenges, this study proffers a cutting-edge spatio-temporal indexing model, the BCHR-index, undergirded by data stream clustering algorithms. The operational schema of the BCHR-index model is bifurcated into two stages: real-time and offline. The real-time phase proficiently uses micro-clusters shaped by the CluStream algorithm in unison with a B+ tree to construct indices in memory, thereby satisfying the exigent response necessities for geohazard data streams. Conversely, the offline stage employs the CluStream algorithm and the Hilbert curve to manage heterogeneously distributed spatial objects. Paired with a B+ tree, this framework promotes efficient spatio-temporal querying of geohazard data. The empirical results indicate that the indexing model implemented in this study affords millisecond-level responses when faced with query requests from real-time geohazard data streams. Moreover, in aspects of spatial query efficiency and data-insertion performance, it demonstrates superior results compared to the R-tree and Hilbert-R tree models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Optimizing segmented trajectory data storage with HBase for improved spatio-temporal query efficiency

Author: Yi Bao, Zhou Huang, Xuri Gong, Yuyang Zhang, Ganmin Yin, and Han Wang
Subjects: trajectory storage, hbase, trajectory segmentation, spatio-temporal query, Mathematical geography. Cartography, GA1-1776
Abstract: The surging accumulation of trajectory data has yielded invaluable insights into urban systems, but it has also presented challenges for data storage and management systems. In response, specialized storage systems based on non-relational databases have been developed to support large data quantities in distributed approaches. However, these systems often utilize storage by point or storage by trajectory methods, both of which have drawbacks. In this study, we evaluate the effectiveness of segmented trajectory data storage with HBase optimizations for spatio-temporal queries. We develop a prototype system that includes trajectory segmentation, serialization, and spatio-temporal indexing and apply it to taxi trajectory data in Beijing. Our findings indicate that the segmented system provides enhanced query speed and reduced memory usage compared to the Geomesa system.
Published: 2023
Full Text: View/download PDF

6. VA-HBase: An Adaptive Distributed Management Scheme for Vector Data [VA-HBase:一种面向矢量数据的自适应分布式管理方案].

Author: 谌诞楠, 关雪峰, 韩林栩, 向隆刚, and 吴华意
Subjects: *VECTOR data
Abstract: Objectives: With the rapid development of Earth observation networks, the size of the accumulated spatial data increases explosively. However, current distributed spatial data management systems focus on discrete point sets (e.g. point of interest) or point sequences (e.g. vehicle trajectory), but they cannot provide sufficient support for complex polyline or polygon objects. To address this problem, we propose a vector-oriented adaptive management method based on HBase, named VA-HBase. Methods: In this method, a novel two-level spatial index is firstly designed for complex vector objects. The primary index adaptively finds an appropriate storage level for each vector object according to its spatial characteristics, and encodes this object independently with a customized Z-curve encoding schema. This encoding schema interleaves the spatial coordinates into a bit-sequence following the Z-curve, and encodes the derived sequence into a byte code with a proposed simplest byte conversion schema. The secondary index adopts the idea of fixed-level grid partitioning and computes intermediate statistics on storage levels for later efficient spatial query. A middle level is defined for grid generation according to the level distribution of stored objects, and the minimum storage level of objects within each grid cell will be recorded. Second, with this two-level spatial index, an HBase storage schema is proposed which includes four tables: one meta-data table, one primary index table, one secondary index table and one raw object table. Finally, we design an efficient range query algorithm based on this method. Integrated with the adaptive-level primary index and the fixed-level secondary index, efficient parallel queries are implemented through HBase's filter mechanism. Results: Experiments on three real datasets show that: (1) VA-HBase can achieve about 2-10 times higher query efficiency compared with GeoMesa and other related methods. (2) For complex polyline or polygon objects, the adaptive indexing of VA-HBase can quickly filter out objects that are duplicated or not within the scope of the query rectangle, and the false positive proportion is much lower than other related methods. (3) With the increase of the input data size from 7 GB to 300 GB, the query time cost is kept at about 200 ms and VA-HBase shows very good scalability. (4) Facilitated by the simplest byte encoding schema, the indexing storage space of various vector objects is efficiently compressed. Conclusions: VA-HBase can well support the complex vector object management in the context of distributed environment, and can maintain efficient and stable query efficiency faced with large-volume datasets. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
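
Note on record 6: the abstract describes interleaving spatial coordinates into a bit sequence along the Z-curve and then converting that sequence to bytes. The following is a minimal Python sketch of generic Z-order (Morton) interleaving for two non-negative integer grid coordinates. It is not the paper's encoding: the function names `interleave_bits` and `to_rowkey`, the 16-bit default, and the big-endian byte conversion are assumptions made for illustration, and the paper's adaptive storage level and "simplest byte conversion schema" are not reproduced here.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bit i of x goes to even position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y goes to odd position 2i+1
    return code


def to_rowkey(code: int, bits: int = 16) -> bytes:
    """Hypothetical byte conversion: fixed-width big-endian bytes, so that
    byte-wise lexicographic order matches numeric order of the codes."""
    return code.to_bytes((2 * bits + 7) // 8, "big")
```

Because a store such as HBase sorts rows by the raw bytes of the row key, a fixed-width big-endian encoding keeps cells that are close on the Z-curve close together in key order, which is the usual reason for pairing the two steps.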

7. Performance Analysis of Healthcare Information in Big Data NoSql Platform

Author: Mondal, Sukhendu S., Mondal, Somen, Adhikari, Sudip Kumar, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Bhattacharyya, Siddhartha, editor, Das, Gautam, editor, De, Sourav, editor, and Mrsic, Leo, editor
Published: 2023
Full Text: View/download PDF

8. Design and Research of Secondary Index of Traffic Big Data Based on Hash

Author: Wang, Miao, Sun, Quanling, Liang, Xiangying, Gao, Li, Xhafa, Fatos, Series Editor, Abawajy, Jemal H., editor, Xu, Zheng, editor, Atiquzzaman, Mohammed, editor, and Zhang, Xiaolu, editor
Published: 2023
Full Text: View/download PDF

9. A Synchronous Secondary Index Framework Based on Elasticsearch for HBase

Author: Lin, Xiaohui, Guo, Wenzhong, Guo, Kun, Xhafa, Fatos, Series Editor, Xiong, Ning, editor, Li, Maozhen, editor, Li, Kenli, editor, Xiao, Zheng, editor, Liao, Longlong, editor, and Wang, Lipo, editor
Published: 2023
Full Text: View/download PDF

10. Evolution of Hadoop and Big Data Trends in Smart World

Author: Awasthy, Neeta, Valivarthi, Nikhila, Awasthi, Shashank, editor, Sanyal, Goutam, editor, Travieso-Gonzalez, Carlos M., editor, Kumar Srivastava, Pramod, editor, Singh, Dinesh Kumar, editor, and Kant, Rama, editor
Published: 2023
Full Text: View/download PDF

11. Development path of college labor education based on big data platform in the context of health psychology

Author: Li Congcong and Wang Xuehui
Subjects: hadoop technology, hbase, reinforcement learning, α-scattered recommendation, labor education platform, 62n01, Mathematics, QA1-939
Abstract: Labor education is an important content of quality education, and the promotion of higher education cannot be separated from the implementation and practice of labor education. The labor education platform and labor education resource base in colleges and universities are established using the big data platform, which combines Hadoop technology and the HBase database. In order to help students obtain labor education teaching resources more efficiently on the labor education platform, reinforcement learning is combined with a deep neural network to optimize the scheduling of teaching resources, and the α-dispersion recommendation algorithm is used to realize personalized recommendations of teaching resources. With regard to the effectiveness of the labor education platform, load testing and application effects were carried out, and the influencing factors of labor behavior and craftsmanship cultivation of college labor education students were analyzed in depth. The labor education platform’s transaction response time increases by 22.5 seconds when the number of users reaches 5000, according to the results. The personal motivation for craftsmanship in the process of labor education will be affected by the campus culture, and its correlation coefficient is 0.565, and the student’s satisfaction with the platform of labor education reaches 0.825. The innovation and development of labor education in colleges and universities need to fully rely on the big data platform to promote the optimization and sharing of labor education resources and to promote the development of labor education resources. The cultivation of student labor creation and craftsmanship can be ensured through the optimization and sharing of educational resources.
Published: 2024
Full Text: View/download PDF

12. Application of Big Data Storage Technology in Zhangheng-1 Satellite Data Services [大数据存储技术在张衡一号卫星数据服务中的应用].

Author: 杨旭明, 王志, 李忠, 黄建平, 杨百一, and 陈朝阳
Abstract: With the continuous increase of the monitoring data of Zhangheng-1 satellite, the current HDF5 file storage mode not only highlights its disadvantages in terms of performance scalability, read-write concurrency and other aspects, but also fails to achieve accurate spatial-temporal query of the required data, which brings great difficulties to users. In order to solve the problem of efficient storage and reading of massive satellite data, the advantages and disadvantages of HBase and the ElasticSearch engine were analyzed, a satellite big data storage scheme combining the two was proposed, a big data test platform based on the Hadoop architecture was built, and the storage test and comparative test of the ultra-low frequency (ULF) band data of the space electric field of Zhangheng-1 satellite were completed. The results show that the scheme greatly improves the read and write concurrent performance of massive satellite observation data, which is dozens of times higher than the current file storage method, and realizes accurate positioning and fast query of satellite data, which well meets user requirements. [ABSTRACT FROM AUTHOR]
Published: 2023

13. HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems.

Author: Wang, Xinyang, Sun, Yu, Sun, Qiao, Lin, Weiwei, Wang, James Z., and Li, Wei
Subjects: *CLOUD storage, *PARALLEL algorithms, *ELECTRONIC data processing, *DATA warehousing, *HILBERT space, *CLOUD computing
Abstract: With the rapid development of the Internet of Things and cloud computing, HBase has become a good choice for massive data storage, and is efficient in reading and writing data. However, HBase is not supportive for multi-dimensional query of non-rowkey data, unconducive to data analysis and processing. To address this issue, we first analyze the constitution principle and deficiency of secondary index and clustering index, and select clustering index as the basis of optimization. Then, we choose the Hilbert curve in the space filling curve as the linearization technology, design the pre-partition algorithm and subspace partition algorithm, and realize the Hilbert-curve-based clustering index (HCIndex) which supports multi-dimensional point query and range query. Finally, the performance of HCIndex is verified by comparison experiments with HBase Scan, HiBase and CCIndex. The experimental results show that the query efficiency of HCIndex has been greatly improved at the expense of very limited storage space, which is necessary for storing index data and only 1.7 times the size of the original data table of HBase. Compared with HBase scan, the query efficiency of HCIndex's multi-dimensional point query and range query has been increased to more than 4 times and more than 2 times, respectively. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries of massive data in cloud storage systems. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
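
Note on record 13: the abstract names the Hilbert curve as the linearization technique that turns multi-dimensional points into one-dimensional keys. The following is a minimal Python sketch of the widely used iterative conversion from a 2-D grid cell to its position along a Hilbert curve. It is a generic textbook routine, not HCIndex itself: the function name `hilbert_index` and the restriction to two dimensions on a square power-of-two grid are assumptions for illustration, and the paper's pre-partition and subspace partition algorithms are not shown.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    distance along the Hilbert curve, in the range 0 .. n*n - 1."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # offset of the quadrant entered at this scale
        if ry == 0:
            # Rotate/flip the quadrant so the sub-curve has the standard orientation.
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Cells that are consecutive along the curve are always neighbours in the grid, so a range scan over contiguous keys tends to touch spatially compact regions; that locality is the general motivation for using such a curve as a clustering key.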

14. Geological Hazard Data Storage Strategy Based on Big Data Applications [基于大数据应用的地质灾害数据存储策略].

Author: 石晓拢, 赵统永, 王耀忠, and 彭君
Abstract: Copyright of Computer Measurement & Control is the property of Magazine Agency of Computer Measurement & Control and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2023
Full Text: View/download PDF

15. Study on Spatio-Temporal Indexing Model of Geohazard Monitoring Data Based on Data Stream Clustering Algorithm

Author: Jiahao Li, Weiwei Song, Jianglong Chen, Qunlan Wei, and Jinxia Wang
Subjects: BCHR tree, CluStream, B+ tree, Hilbert curve, Hilbert-R tree, HBase, Geography (General), G1-922
Abstract: Yunnan Province, residing in the eastern segment of the Qinghai–Tibet Plateau and the western part of the Yunnan–Guizhou Plateau, faces significant challenges due to its intricate geological structures and frequent geohazards. These pose monumental risks to community safety and infrastructure. Unfortunately, conventional spatial indexing methods struggle with the enormous influx of geohazard data, exhibiting inadequacies in efficient spatio-temporal querying and failing to meet the swift response imperatives for real-time geohazard monitoring and early warning mechanisms. In response to these challenges, this study proffers a cutting-edge spatio-temporal indexing model, the BCHR-index, undergirded by data stream clustering algorithms. The operational schema of the BCHR-index model is bifurcated into two stages: real-time and offline. The real-time phase proficiently uses micro-clusters shaped by the CluStream algorithm in unison with a B+ tree to construct indices in memory, thereby satisfying the exigent response necessities for geohazard data streams. Conversely, the offline stage employs the CluStream algorithm and the Hilbert curve to manage heterogeneously distributed spatial objects. Paired with a B+ tree, this framework promotes efficient spatio-temporal querying of geohazard data. The empirical results indicate that the indexing model implemented in this study affords millisecond-level responses when faced with query requests from real-time geohazard data streams. Moreover, in aspects of spatial query efficiency and data-insertion performance, it demonstrates superior results compared to the R-tree and Hilbert-R tree models.
Published: 2024
Full Text: View/download PDF

16. Global Thematic Land Use Cover Datasets Characterizing Artificial Covers

Author: García-Álvarez, David, Lara Hinojosa, Javier, Jurado Pérez, Francisco José, García-Álvarez, David, editor, Camacho Olmedo, María Teresa, editor, Paegelow, Martin, editor, and Mas, Jean François, editor
Published: 2022
Full Text: View/download PDF

17. An Efficient Storage Architecture Based on Blockchain and Distributed Database for Public Security Big Data

Author: Liao, Duoyue, Dong, Xinhua, Xu, Zhigang, Han, Hongmu, Yan, Zhongzhen, Sun, Qing, Li, Qi, Xhafa, Fatos, Series Editor, Hu, Zhengbing, editor, Dychka, Ivan, editor, Petoukhov, Sergey, editor, and He, Matthew, editor
Published: 2022
Full Text: View/download PDF

18. An Approach for Implementing Online Analytical Processing Systems under Column-Family Databases.

Author: Khalil, Abdelhak and Belaissaoui, Mustapha
Subjects: *OLAP technology, *DATABASES, *DATA warehousing, *MULTIDIMENSIONAL databases, *CONCEPTUAL models, *BIG data, *CUBES, *SCALABILITY
Abstract: The exponential growth of business data coming from heterogeneous sources imposes the use of new generations of database management systems and new data storage architectures. The major players in the big data market have turned to NoSQL (Not only SQL) technology, which provides a flexible data model and high scalability. In this paper, we investigate OLAP (Online Analytical Processing) implementation using columnar databases (a type of NoSQL system). We provide a set of formal transformation rules in order to map the multidimensional conceptual model to a target model that is suitable for the column-oriented model. Then, we propose two OLAP cube operators called MRC-Cube and SC-Cube, which allow to build the OLAP cube using the MapReduce paradigm and Spark respectively. We conduct an experimental comparison of their performance to analogous relational implementation using Oracle OLAP, we focus particularly on read latency metric under different experimental configurations. The obtained results show a clear difference when performing the OLAP cube building between the relational implementation and the columnar one. [ABSTRACT FROM AUTHOR]
Published: 2023

19. An HBase-Based Optimization Model for Distributed Medical Data Storage and Retrieval.

Author: Zhu, Chengzhang, Liu, Zixi, Zou, Beiji, Xiao, Yalong, Zeng, Meng, Wang, Han, and Fan, Ziang
Subjects: INFORMATION retrieval, OPTICAL disks, MEDICAL equipment, DATA warehousing
Abstract: In medical services, the amount of data generated by medical devices is increasing explosively, and access to medical data is also put forward with higher requirements. Although HBase-based medical data storage solutions exist, they cannot meet the needs of fast locating and diversified access to medical data. In order to improve the retrieval speed, the recognition model S-TCR and the dynamic management algorithm SL-TCR, based on the behavior characteristics of access, were proposed to identify the frequently accessed hot data and dynamically manage the data storage medium as to maximize the system access performance. In order to improve the search performance of keys, an optimized secondary index strategy was proposed to reduce I/O overhead and optimize the search performance of non-primary key indexes. Comparative experiments were conducted on real medical data sets. The experimental results show that the optimized retrieval model can meet the needs of hot data access and diversified medical data retrieval. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. Improving NoSQL Spatial-Query Processing with Server-Side In-Memory R*-Tree Indexes for Spatial Vector Data.

Author: Sun, Lele and Jin, Baoxuan
Abstract: Geospatial databases are basic tools to collect, index, and manage georeferenced data indicators in sustainability research for efficient, long-term analysis. NoSQL databases are increasingly applied to manage the ever-growing massive spatial vector data (SVD) with their changeable data schemas, agile scalability, and fast query response time. Spatial queries are basic operations in geospatial databases. According to Green information technology, an efficient spatial index can accelerate query processing and save power consumption for ubiquitous spatial applications. Current solutions tend to pursue it by indexing spatial objects with space-filling curves or geohash on NoSQL databases. As for the performance-wise R-tree family, they are mainly used in slow disk-based spatial access methods on NoSQL databases that incur high loading and searching costs. Therefore, performing spatial queries efficiently with the R-tree family on NoSQL databases remains a challenge. In this paper, an in-memory balanced and distributed R*-tree index named the BDRST index is proposed and implemented on HBase for efficient spatial-query processing of massive SVD. The BDRST index stores and distributes serialized R*-trees to HBase regions in association with SVD partitions in the same table. Moreover, an efficient optimized server-side parallel processing framework is presented for real-time R*-tree instantiation and query processing. Through extensive experiments on real-world land-use data sets, the performance of our method is tested, including index building, index quality, spatial queries, and applications. Our proposed method outperforms other state-of-the-art solutions, saving between 27.36% and 95.94% on average execution time for the above operations. Experimental results show the capability of the BDRST index to support spatial queries over large-scale SVD, and our method provides a solution for efficient sustainability research that involves massive georeferenced data. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

21. An analysis of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

Author: Tripathi, Ramesh Chandra
Published: 2021
Full Text: View/download PDF

22. Optimizing segmented trajectory data storage with HBase for improved spatio-temporal query efficiency.

Author: Bao, Yi, Huang, Zhou, Gong, Xuri, Zhang, Yuyang, Yin, Ganmin, and Wang, Han
Subjects: *NONRELATIONAL databases, *TEMPORAL databases, *URBANIZATION, *RELATIONAL databases, *DATA warehousing
Abstract: The surging accumulation of trajectory data has yielded invaluable insights into urban systems, but it has also presented challenges for data storage and management systems. In response, specialized storage systems based on non-relational databases have been developed to support large data quantities in distributed approaches. However, these systems often utilize storage by point or storage by trajectory methods, both of which have drawbacks. In this study, we evaluate the effectiveness of segmented trajectory data storage with HBase optimizations for spatio-temporal queries. We develop a prototype system that includes trajectory segmentation, serialization, and spatio-temporal indexing and apply it to taxi trajectory data in Beijing. Our findings indicate that the segmented system provides enhanced query speed and reduced memory usage compared to the Geomesa system. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

23. A Distributed Storage Middleware Based on HBase and Redis

Author: Xu, Lingling, Chen, Yuzhong, Guo, Kun, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Sun, Yuqing, editor, Liu, Dongning, editor, Liao, Hao, editor, Fan, Hongfei, editor, and Gao, Liping, editor
Published: 2021
Full Text: View/download PDF

24. Big Data Analytics: A Review and Tools Comparison

Author: Dhivya, V., Bansal, Jagdish Chand, Series Editor, Deep, Kusum, Series Editor, Nagar, Atulya K., Series Editor, Goyal, Dinesh, editor, Bălaş, Valentina Emilia, editor, Mukherjee, Abhishek, editor, Hugo C. de Albuquerque, Victor, editor, and Gupta, Amit Kumar, editor
Published: 2021
Full Text: View/download PDF

25. Optimization of a Similarity Performance on Bounded Content of Motion Histogram by Using Distributed Model

Author: Saoudi, El Mehdi, Adoui El Ouadrhiri, Abderrahmane, Jai Andaloussi, Said, Ouchetto, Ouail, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Saeed, Faisal, editor, Al-Hadhrami, Tawfik, editor, Mohammed, Fathey, editor, and Mohammed, Errais, editor
Published: 2021
Full Text: View/download PDF

26. Analytical Design of the DIS Architecture: The Hybrid Model

Author: Prakash, B. R., Hanumanthappa, M., Dattasmita, H. V., Kavitha, Vasantha, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Rathore, Vijay Singh, editor, Dey, Nilanjan, editor, Piuri, Vincenzo, editor, Babo, Rosalina, editor, Polkowski, Zdzislaw, editor, and Tavares, João Manuel R. S., editor
Published: 2021
Full Text: View/download PDF

27. Introduction to Big Data Technology

Author: Abu-Salih, Bilal, Wongthongtham, Pornpit, Zhu, Dengya, Chan, Kit Yan, and Rudra, Amit
Published: 2021
Full Text: View/download PDF

28. A Novel IoT-Based Approach Towards Diabetes Prediction Using Big Data

Author: Biswas, Riya, Pal, Souvik, Cuong, Nguyen Ha Huy, Chakrabarty, Arindam, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Solanki, Vijender Kumar, editor, Hoang, Manh Kha, editor, Lu, Zhonghyu (Joan), editor, and Pattnaik, Prasant Kumar, editor
Published: 2020
Full Text: View/download PDF

29. Big Data for Context-Aware Computing

Author: Addakiri, Khaoula, Khallouki, Hajar, Bahaj, Mohamed, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, and Ezziyyani, Mostafa, editor
Published: 2020
Full Text: View/download PDF

30. Healthcare Data Storage Based on HBase

Author: Addakiri, Khaoula, Khallouki, Hajar, Bahaj, Mohamed, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, and Ezziyyani, Mostafa, editor
Published: 2020
Full Text: View/download PDF

31. Benchmarking Redis and HBase NoSQL Databases using Yahoo Cloud Service Benchmarking tool.

Author: Alzaidi, Mustafa and Vagner, Aniko
Subjects: *NONRELATIONAL databases, *DATABASES, *CLOUD storage
Abstract: The Not Structured Query Language (NoSQL) databases have become more relevant to applications developers as the need for scalable and flexible data storage for online applications has increased. Each NoSQL database system provides features that fit particular types of applications. Thus, the developer must carefully select according to the application's needs. Redis is a key-value NoSQL database that provides fast data access. On the other hand, the Apache HBase database is a column-oriented database that offers scalability and fast data access, and is a promising alternative to Redis in some types of applications. In this research paper, the goal is to use the Yahoo Cloud Serving Benchmark (YCSB) to compare the performance of two databases (Redis and HBase). The YCSB platform has been developed to determine the throughput of both databases against different workloads. This paper evaluates these NoSQL databases with six workloads and varying threads. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

32. Indexing and Query Method Based on the Spatiotemporal Distribution of Trajectory Big Data.

Author: 李征宇 and 赵卓峰
Subjects: *DATA distribution, *EMPIRICAL research, *ALGORITHMS
Abstract: Due to the diversified behavioral characteristics and regular pattern of moving objects, the trajectory data generated by these objects shows obvious uneven distribution feature in time and space, which may lead to worse performance for trajectory data indexing and querying. However, the existing trajectory data indexing methods rarely consider this problem. In this paper, a temporal-spatial distribution based indexing and querying method is proposed. In the method, Geohash code is introduced and pre-partitioned spatially by utilizing the temporal-spatial similarity of the trajectory data distribution. Then, we use the pre-partitioned Geohash code, partition number and the trajectory data timestamp to compose the index structure. With this index structure, a storage model based on HBase and a query algorithm based on Geohash partition are designed respectively. The empirical study using real trajectory dataset shows that the method improves the spatiotemporal query performance of trajectory data by comparing with the Extend_HGrid and ST-hash methods, and effectively reduces the number of sub-queries during query. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

33. IS-HBase: An In-Storage Computing Optimized HBase with I/O Offloading and Self-Adaptive Caching in Compute-Storage Disaggregated Infrastructure.

Author: ZHICHAO CAO, HUIBING DONG, YIXUN WEI, SHIYONG LIU, and DU, DAVID H. C.
Subjects: SELF-adaptive software, COMPACTING, BANDWIDTHS, SCANNING systems, STORAGE
Abstract: Active storage devices and in-storage computing are proposed and developed in recent years to effectively reduce the amount of required data traffic and to improve the overall application performance. They are especially preferred in the compute-storage disaggregated infrastructure. In both techniques, a simple computing module is added to storage devices/servers such that some stored data can be processed in the storage devices/servers before being transmitted to application servers. This can reduce the required network bandwidth and offload certain computing requirements from application servers to storage devices/servers. However, several challenges exist when designing an in-storage computing-based architecture for applications. These include what computing functions need to be offloaded, how to design the protocol between in-storage modules and application servers, and how to deal with the caching issue in application servers. HBase is an important and widely used distributed Key-Value Store. It stores and indexes key-value pairs in large files in a storage system like HDFS. However, its performance especially read performance, is impacted by the heavy traffics between HBase RegionServers and storage servers in the compute-storage disaggregated infrastructure when the available network bandwidth is limited. We propose an In-Storage-based HBase architecture, called IS-HBase, to improve the overall performance and to address the aforementioned challenges. First, IS-HBase executes a data pre-processing module (In-Storage ScanNer, called ISSN) for some read queries and returns the requested key-value pairs to RegionServers instead of returning data blocks in HFile. IS-HBase carries out compactions in storage servers to reduce the large amount of data being transmitted through the network and thus the compaction execution time is effectively reduced. Second, a set of new protocols is proposed to address the communication and coordination between HBase RegionServers at computing nodes and ISSNs at storage nodes. Third, a new self-adaptive caching scheme is proposed to better serve the read queries with fewer I/O operations and less network traffic. According to our experiments, the IS-HBase can reduce up to 97% network traffic for read queries and the throughput (queries per second) is significantly less affected by the fluctuation of available network bandwidth. The execution time of compaction in IS-HBase is only about 6.31% - 41.84% of the execution time of legacy HBase. In general, IS-HBase demonstrates the potential of adopting in-storage computing for other data-intensive distributed applications to significantly improve performance in compute-storage disaggregated infrastructure. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

34. Fast Access and Retrieval of Big Data Based on Unique Identification.

Author: Wenshun Sheng, Aiping Xu, and Shengli Wu
Subjects: INFORMATION retrieval, DATA conversion, PROBLEM solving, DATA mapping, BIG data
Abstract: In big data applications, the data are usually stored in data files, whose data file structures, field structures, data types and lengths are not uniform. Therefore, if these data are stored in the traditional relational database, it is difficult to meet the requirements of fast storage and access. To solve this problem, we propose the mapping model between the source data file and the target HBase file. Our method solves the heterogeneity of the file object and the universality of the storage conversion. Firstly, based on the mapping model, we design “RowKey”, generation rules and algorithm. Then according to the mapping rules of data file fields with the HBase table column, the data in the data file are transformed into HBase. Finally, the retrieved keywords in “RowKey” are stored and used to achieve fast data retrieval by prefix matching or keyword matching method. Our method has been applied to different projects, which shows these results can be applied to the data conversion from regular row store data file to HBase distributed large data storage and has strong commonality. The method can be widely used in HBase big data storage applications. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

35. The ATLAS EventIndex: A BigData Catalogue for All ATLAS Experiment Events

Author: Barberis, Dario, Alexandrov, Igor, Alexandrov, Evgeny, Baranowski, Zbigniew, Canali, Luca, Cherepanova, Elizaveta, Dimitrov, Gancho, Favareto, Andrea, Fernández Casaní, Álvaro, Gallas, Elizabeth J., Montoro, Carlos García, González de la Hoz, Santiago, Hřivnáč, Julius, Iakovlev, Alexander, Kazymov, Andrei, Mineev, Mikhail, Prokoshin, Fedor, Rybkin, Grigori, Salt, José, Sánchez, Javier, Sorokoletov, Roman, Többicke, Rainer, Vasileva, Petya, Villaplana Perez, Miguel, and Yuan, Ruijun
Published: 2023
Full Text: View/download PDF

36. The Time Machine in Columnar NoSQL Databases: The Case of Apache HBase.

Author: Tsai, Chia-Ping, Chang, Che-Wei, Hsiao, Hung-Chang, and Shen, Haiying
Subjects: NONRELATIONAL databases, DATA recovery, DEBUGGING, RELATIONAL databases, HISTORICAL analysis, SQL
Abstract: Not Only SQL (NoSQL) is a critical technology that is scalable and provides flexible schemas, thereby complementing existing relational database technologies. Although NoSQL is flourishing, present solutions lack the features required by enterprises for critical missions. In this paper, we explore solutions to the data recovery issue in NoSQL. Data recovery for any database table entails restoring the table to a prior state or replaying (insert/update) operations over the table given a time period in the past. Recovery of NoSQL database tables enables applications such as failure recovery, analysis for historical data, debugging, and auditing. Particularly, our study focuses on columnar NoSQL databases. We propose and evaluate two solutions to address the data recovery problem in columnar NoSQL and implement our solutions based on Apache HBase, a popular NoSQL database in the Hadoop ecosystem widely adopted across industries. Our implementations are extensively benchmarked with an industrial NoSQL benchmark under real environments. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

37. Using Big Data-machine learning models for diabetes prediction and flight delays analytics

Author: Thérence Nibareke and Jalal Laassiri
Subjects: Big Data, Hadoop, Spark, HBase, Machine learning, Data analytics, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
Abstract: Introduction: Nowadays large data volumes are generated daily at a high rate. Data from health systems, social networks, finance, government, marketing, and bank transactions, as well as from sensors and smart devices, are increasing. The tools and models have to be optimized. In this paper we applied and compared Machine Learning algorithms (Linear Regression, Naïve Bayes, Decision Tree) to predict diabetes. Furthermore, we performed analytics on flight delays. The main contribution of this paper is to give an overview of Big Data tools and machine learning models. We highlight some metrics that allow us to choose a more accurate model. We predict diabetes disease using three machine learning models and then compare their performance. Furthermore, we analyzed flight delays and produced a dashboard which can help managers of flight companies to have a 360° view of their flights and take strategic decisions. Case description: We applied three Machine Learning algorithms for predicting diabetes and we compared the performance to see which model gives the best results. We performed analytics on flight datasets to help decision making and predict flight delays. Discussion and evaluation: The experiment shows that the Linear Regression, Naive Bayesian and Decision Tree give the same accuracy (0.766) but Decision Tree outperforms the two other models with the greatest score (1) and the smallest error (0). For the flight delays analytics, the model could show for example the airport that recorded the most flight delays. Conclusions: Several tools and machine learning models to deal with big data analytics have been discussed in this paper. We concluded that for the same datasets, we have to carefully choose the model to use in prediction. In our future works, we will test different models in other fields (climate, banking, insurance).
Published: 2020
Full Text: View/download PDF

38. Evaluating Index Systems of High Energy Physics

Author: Dai, Shaopeng, Gao, Wanling, Xie, Biwei, Yu, Minghe, Chen, Jia’nan, Kong, Defei, Han, Rui, Li, Jinheng, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Ghosh, Ashish, Series Editor, Ren, Rui, editor, Zheng, Chen, editor, and Zhan, Jianfeng, editor
Published: 2019
Full Text: View/download PDF

39. The Performance Analysis of Distributed Storage Systems Used in Scalable Web Systems

Author: Oleś, Dominik, Nowak, Ziemowit, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Borzemski, Leszek, editor, Świątek, Jerzy, editor, and Wilimowska, Zofia, editor
Published: 2019
Full Text: View/download PDF

40. Naive Bayes and Decision Tree Classifier for Streaming Data Using HBase

Author: Mukherjee, Aradhita, Mondal, Sudip, Chaki, Nabendu, Khatua, Sunirmal, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Chaki, Rituparna, editor, Cortesi, Agostino, editor, Saeed, Khalid, editor, and Chaki, Nabendu, editor
Published: 2019
Full Text: View/download PDF

41. Storage and Analysis of Synchrophasor Data for Event Detection in Indian Power System Using Hadoop Ecosystem

Author: Singh, Akhilendra Pratap, Hemant Kumar, G., Paik, Subhendu Sekhar, Sinha Roy, Diptendu, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Jain, Lakhmi C., editor, E. Balas, Valentina, editor, and Johri, Prashant, editor
Published: 2019
Full Text: View/download PDF

42. NoSQL Overview and Performance Testing of HBase Over Multiple Nodes with MySQL

Author: Das, Nabanita, Paul, Swagata, Sarkar, Bidyut Biman, Chakrabarti, Satyajit, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Abraham, Ajith, editor, Dutta, Paramartha, editor, Mandal, Jyotsna Kumar, editor, Bhattacharya, Abhishek, editor, and Dutta, Soumi, editor
Published: 2019
Full Text: View/download PDF

43. EventDB: A Large-Scale Semi-structured Scientific Data Management System

Author: Zhao, Wenjia, Qi, Yong, Hou, Di, Wang, Peijian, Gao, Xin, Du, Zirong, Zhang, Yudong, Zong, Yongfang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Li, Jianhui, editor, Meng, Xiaofeng, editor, Zhang, Ying, editor, Cui, Wenjuan, editor, and Du, Zhihui, editor
Published: 2019
Full Text: View/download PDF

44. Hadoop: A Standard Framework for Computer Cluster

Author: Akhgarnush, Eljar, Broeckers, Lars, Jakoby, Thorsten, Liermann, Volker, editor, and Stegmann, Claus, editor
Published: 2019
Full Text: View/download PDF

45. HBASE Performance Analysis in Big Datasets Processing.

Author: Mladenova, Tsvetelina, Kalmukov, Yordan, Marinov, Milko, and Valova, Irena
Subjects: *DATA structures, *WORK structure, *BIG data
Abstract: The term Big Data has gained popularity in recent years due to technological developments and the accumulation of data from various sources, mobile devices and sensors. Hbase is a distributed open source environment that uses available disk space optimally and efficiently based on data. It organizes data in a very different way from standard relational databases and works with both structured and unstructured data. This article describes our experience and research on how the execution time for inserting datasets and selecting data depends on the size of the data volumes, the locations (nodes of the same or different networks) from which they send or retrieve and what is the effect of the selected data organization (especially RowKey design) on the execution time. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

46. An improved tile-based scalable distributed management model of massive high-resolution satellite images.

Author: Hajjaji, Yosra, Boulila, Wadii, and Farah, Imed Riadh
Subjects: REMOTE-sensing images, NONRELATIONAL databases, REMOTE sensing, DATA management, RELATIONAL databases, METADATA, DATA warehousing, PYRAMIDS
Abstract: The amount of remote sensing (RS) data has increased at an unexpected scale, due to the rapid progress of earth-observation and the growth of satellite RS and sensor technologies. Traditional relational databases attend their limit to meet the needs of high-resolution and large-scale RS Big Data management. As a result, massive RS data management is currently one of the most imperative topics. To address this problem, this paper describes a distributed architecture for big RS data storage based on a unified metadata file, pyramid model, and Hilbert curve for data composition and indexing using NoSQL databases (i.e, Apache Hbase). In this paper, a Hadoop-based framework in AzureInsight cloud platform is designed to manage massive RS data in a parallel and distributed way. Experimental results prove that our method has the potential to overcome the weakness of traditional methods. The proposed model is suitable for massive high-resolution image data management. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

47. A Blocky and Layered Management Schema for Remote Sensing Data

Author: Beibei Yang, Rui Wang, Wen Zhang, Chenhan Wu, Xujin Wang, and Lingkui Meng
Subjects: Data management, Google S2, HBase, remote sensing data, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: In the era of rapid data expansion and computer technology development, discrete storage, multiband push and fuzzy query remote sensing data management methods are no longer suitable for the data analysis needs of users, including the needs for long time series, global regions and multidata fusion. After analyzing traditional data management techniques, this paper discusses the existing achievements and development trends of current technologies. This paper aims to solve the problem of data sharing difficulties and organizational inconsistency caused by the use of different formats for the same spatial object. Based on a discrete global grid, this paper studies the blocky division method and coding specification of Google S2 and then accomplishes the layered storage of remote sensing data in HBase. Finally, Kylin is used to build a cube model to discuss the information mining analysis changes in the new data management model. Experiments show that the blocky and layered management schema (BLMS) can realize the unified management of global remote sensing data with multisource, heterogeneous, multiscale, and long-term characteristics and provide accurate data services on demand.
Published: 2020
Full Text: View/download PDF

48. Research on a Distributed Processing Model Based on Kafka for Large-Scale Seismic Waveform Data

Author: Xu-Chao Chai, Qing-Liang Wang, Wen-Sheng Chen, Wen-Qing Wang, Dan-Ning Wang, and Yue Li
Subjects: HBase, Kafka, key-value, spark streaming, seismic waveform data, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: For storage and recovery requirements on large-scale seismic waveform data of the National Earthquake Data Backup Center (NEDBC), a distributed cluster processing model based on Kafka message queues is designed to optimize the inbound efficiency of seismic waveform data stored in HBase at NEDBC. Firstly, compare the characteristics of big data storage architectures with that of traditional disk array storage architectures. Secondly, realize seismic waveform data analysis and periodic truncation, and write HBase in NoSQL record form through Spark Streaming cluster. Finally, compare and test the read/write performance of the data processing process of the proposed big data platform with that of traditional storage architectures. Results show that the seismic waveform data processing architecture based on Kafka designed and implemented in this paper has a higher read/write speed than the traditional architecture on the basis of the redundancy capability of NEDBC data backup, which verifies the validity and practicability of the proposed approach.
Published: 2020
Full Text: View/download PDF

49. A Storage Method for Remote Sensing Images Based on Google S2

Author: Xujin Wang, Rui Wang, Wen Zhan, Beibei Yang, Linyi Li, Fei Chen, and Lingkui Meng
Subjects: HBase, remote sensing images, Google S2, load balancing, tile storage mode, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: When using HBase to store tiles of remote sensing images, the spatial position of a tile is often used as the first part of the tile's rowkey so that tiles with high spatial correlations are stored close together to improve query efficiency. We refer to this storage method as the Geo-First model. However, Geo-First models have two problems: the load between nodes is unbalanced, and the accumulation of time-series remote sensing images has a negative impact on storage and query efficiency. Considering these two problems, we proposed a method for storing remote sensing images based on Google S2 and HBase. In our method, two strategies are adopted to eliminate these problems: the balanced placement strategy (BPS) and the periodic storage strategy (PSS). We evaluated our method by focusing on the effectiveness of BPS and PSS. The results show that our method achieves higher tile storage and query efficiency than three Geo-First models based on latitude and longitude, Geohash code, and Google S2 code. BPS effectively balances the load between nodes, while PSS alleviates the negative impact of the accumulation of time-series remote sensing images. Both BPS and PSS greatly improve tile storage and query efficiency.
Published: 2020
Full Text: View/download PDF

50. Research on Insurance Data Analysis Platform Based on the Hadoop Framework

Author: Xia, Mingze, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin (Sherman), Editorial Board Member, Stan, Mircea, Editorial Board Member, Xiaohua, Jia, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, Gu, Xuemai, editor, Liu, Gongliang, editor, and Li, Bo, editor
Published: 2018
Full Text: View/download PDF
