Descriptor: "data partition" - Searchworks@Jio Institute Digital Library Search Results

Your search for "data partition" returned 170 results.

1. Data set terminology of deep learning in medicine: a historical review and recommendation.

Author: Walston, Shannon L., Seki, Hiroshi, Takita, Hirotaka, Mitsuyama, Yasuhito, Sato, Shingo, Hagiwara, Akifumi, Ito, Rintaro, Hanaoka, Shouhei, Miki, Yukio, and Ueda, Daiju
Abstract: Medicine and deep learning-based artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. The current rapid convergence of deep learning and medicine has led to significant advancements, yet it has also introduced ambiguity regarding data set terms common to both fields, potentially leading to miscommunication and methodological discrepancies. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical deep learning contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. We then show that in the medical field as well, terms traditionally used in the deep learning domain are becoming more common, with the data for creating models referred to as the 'training set', the data for tuning of parameters referred to as the 'validation (or tuning) set', and the data for the evaluation of models as the 'test set'. Additionally, the test sets used for model evaluation are classified into internal (random splitting, cross-validation, and leave-one-out) sets and external (temporal and geographic) sets. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion in the field of deep learning in medicine. We support the accurate and standardized description of these data sets and the explicit definition of data set splitting terminologies in each publication. These are crucial methods for demonstrating the robustness and generalizability of deep learning applications in medicine. 
This review aspires to enhance the precision of communication, thereby fostering more effective and transparent research methodologies in this interdisciplinary field. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
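
Note on record 1: the three-way split that the abstract describes (training set, validation or tuning set, test set) can be illustrated with a minimal sketch of an internal random split. The function name `split_dataset`, the 70/15/15 fractions, and the fixed seed are illustrative assumptions and are not taken from the review.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly split items into training, validation (tuning), and test sets.

    The fractions and the seed are hypothetical defaults chosen for illustration.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]                 # data used to fit the model
    val = shuffled[n_train:n_train + n_val]    # data used to tune parameters
    test = shuffled[n_train + n_val:]          # data held out for evaluation
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_dataset(range(100))
    print(len(train), len(val), len(test))
```

The three subsets are disjoint and together cover every input item; an external test set, in the abstract's sense, would instead come from a different time period or site rather than from this random split.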

2. Simulating federated learning for steatosis detection using ultrasound images

Author: Yue Qi, Pedro Vianna, Alexandre Cadrin-Chênevert, Katleen Blanchet, Emmanuel Montagnon, Eugene Belilovsky, Guy Wolf, Louis-Antoine Mullie, Guy Cloutier, Michaël Chassé, and An Tang
Subjects: Steatosis, B-mode ultrasound image, Federated learning, Data partition, Class imbalance, Medicine, Science
Abstract: We aimed to implement four data partitioning strategies evaluated with four federated learning (FL) algorithms and investigate the impact of data distribution on FL model performance in detecting steatosis using B-mode US images. A private dataset (153 patients; 1530 images) and a public dataset (55 patients; 550 images) were included in this retrospective study. The datasets contained patients with metabolic dysfunction-associated fatty liver disease (MAFLD) with biopsy-proven steatosis grades and control individuals without steatosis. We employed four data partitioning strategies to simulate FL scenarios and we assessed four FL algorithms. We investigated the impact of class imbalance and the mismatch between the global and local data distributions on the learning outcome. Classification performance was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. AUCs were 0.93 (95% CI 0.92, 0.94) for the source-based partitioning scenario with FedAvg, 0.90 (95% CI 0.89, 0.91) for a centralized model, and 0.83 (95% CI 0.81, 0.85) for a model trained in a single-center scenario. When data was perfectly balanced on the global level and each site had an identical data distribution, the model yielded an AUC of 0.90 (95% CI 0.88, 0.92). When each site contained data exclusively from one single class, irrespective of the global data distribution, the AUC fell in the range of 0.34–0.70. FL applied to B-mode US images provides performance comparable to a centralized model and higher than a single-center scenario. Global data imbalance and local data heterogeneity influenced the learning outcome.
Published: 2024
Full Text: View/download PDF

3. Simulating federated learning for steatosis detection using ultrasound images

Author: Qi, Yue, Vianna, Pedro, Cadrin-Chênevert, Alexandre, Blanchet, Katleen, Montagnon, Emmanuel, Belilovsky, Eugene, Wolf, Guy, Mullie, Louis-Antoine, Cloutier, Guy, Chassé, Michaël, and Tang, An
Published: 2024
Full Text: View/download PDF

4. BIG DATA CLUSTER TENDENCY TECHNIQUES WITH SPECTRAL FEATURES FOR EFFICIENT DATA PARTITIONS ASSESSMENT.

Author: PINISETTY, RAJASEKHAR and V., RAVINDRANATH
Subjects: EVIDENCE gaps, EUCLIDEAN distance, BIG data, JOB performance, DATA quality
Abstract: Cluster tendency assessment in big data poses a challenge, particularly for non-compact separated (non-CS) datasets with irregular boundaries. This paper introduces a novel Spectral-Based Visual Technique (SVT) to address this limitation. Determining the similarity features for the data objects is a crucial computation in data clustering. Distance measures such as Euclidean and cosine are widely employed in clustering applications. By pre-determining cluster tendency, the quality of clusters is obtained using the algorithms of Visual Assessment of Cluster Tendency (VAT) and cosine-based VAT (cVAT). Both VAT and cVAT utilize Euclidean and cosine distance measures to identify the similarity features of objects. For extensive data cluster tendency assessment, an extended concept of VAT, Clustering using Improved Visual Assessment of Tendency (ClusiVAT), is employed to derive clusters with scalable amounts of time and memory loads. However, it operates efficiently for Compactly Separated (CS) datasets. The research gap lies in the need to deliver the quality of big data partitions (or clusters) for non-compact separated (non-CS) datasets. Thus, this paper proposes a spectral-based visual cluster tendency technique to address the challenge of significant data clustering for non-CS datasets. Experimental analysis employs benchmarked datasets to illustrate the performance of the proposed work compared to other techniques. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

5. Federated Data Integration for Heterogeneous Partitions Based on Differential Privacy

Author: Huang, Jinghao, Sang, Yingpeng, Cai, Chaoxin, Li, Weizheng, Zhang, Maliang, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Shen, Hong, editor, Sang, Yingpeng, editor, Zhang, Yong, editor, Xiao, Nong, editor, Arabnia, Hamid R., editor, Fox, Geoffrey, editor, Gupta, Ajay, editor, and Malek, Manu, editor
Published: 2022
Full Text: View/download PDF

6. Multi-objective Fuzzy-Swarm Optimizer for Data Partitioning

Author: Goyal, S. B., Bedi, Pradeep, Rajawat, Anand Singh, Shaw, Rabindra Nath, Ghosh, Ankush, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Bianchini, Monica, editor, Piuri, Vincenzo, editor, Das, Sanjoy, editor, and Shaw, Rabindra Nath, editor
Published: 2022
Full Text: View/download PDF

7. Parallel Implementation of Statistical DBSCAN Algorithm for Spark-based Clustering on Google Cloud Platform.

Author: Awaad, Ahmad M. and Hefny, Hesham
Subjects: CLOUD computing, ALGORITHMS, STATISTICS, PARALLEL algorithms
Abstract: We present a new parallel density-based spatial clustering of applications with noise (DBSCAN) algorithm for Spark on the Google Cloud Platform (GCP). Statistical analysis is applied to determine DBSCAN's optimal parameters to enhance clustering performance. For scalability, cost-based R-tree partitioning is selected to divide the dataset into balanced workloads according to its distribution. Parallel DBSCAN consists of three parts: local DBSCAN, partitioning, and merging. Optimizing the partitioning of parallel DBSCAN is important to save time and space compared to serial DBSCAN. This approach can improve the performance and time cost of large datasets. The modified statistical cost-based DBSCAN (SCbs-DBSCAN) is applied to the UCI (University of California, Irvine) standard datasets, basic clustering benchmarks, and data at several large scales. For clustering performance and time cost, the experimental results show that the proposed algorithm is 10 to 15% more efficient and runs about 1.5x to 3x faster than alternative parallel DBSCAN methods on Spark without sacrificing clustering quality. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

8. A hybrid imbalanced classification model based on data density.

Author: Shi, Shengnan, Li, Jie, Zhu, Dan, Yang, Fang, and Xu, Yong
Subjects: *PARALLEL algorithms, *DATA modeling, *DENSITY, *RECEIVER operating characteristic curves, *CLASSIFICATION algorithms, *CLASSIFICATION
Abstract: Imbalanced data are widely available in the real world, and it is difficult to effectively identify the minority class instances in imbalanced data. Various imbalanced classification models have been proposed. However, these models neglect the data density and the location of instances, which can be important factors affecting classification performance. To tackle this issue, this paper proposes a hybrid imbalanced classification model based on data density (HICD). At the data level, a density-based resampling method is presented. A data partition algorithm is given, which divides the data space into five regions based on the data density. The corresponding subsets are generated by sampling from the divided regions to improve the recognition of different classes of instances. At the algorithm level, we construct the corresponding ensemble models for different classes of instances. In addition, a model selection algorithm is presented. On this basis, an appropriate model is selected for each instance based on its distribution. The performance of the proposed HICD was evaluated on 18 real-world imbalanced datasets in terms of recall, the area under the ROC curve (AUC), and G-mean. The experimental results validate that our method has better performance than other competitive algorithms in imbalanced classification. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. Data Partition and Rate Control for Learning and Energy Efficient Edge Intelligence.

Author: Li, Xiaoyang, Wang, Shuai, Zhu, Guangxu, Zhou, Ziqin, Huang, Kaibin, and Gong, Yi
Abstract: The rapid development of artificial intelligence together with the powerful computation capabilities of the advanced edge servers make it possible to deploy learning tasks at the wireless network edge, which is dubbed as edge intelligence (EI). The communication bottleneck between the data resource and the server results in deteriorated learning performance as well as tremendous energy consumption. To tackle this challenge, we explore a new paradigm called learning-and-energy-efficient (LEE) EI, which simultaneously maximizes the learning accuracies and energy efficiencies of multiple tasks via data partition and rate control. Mathematically, this results in a multi-objective optimization problem. Moreover, the continuously varying communication rates introduce infinite variables, which further complicates the problem. To solve this complex problem, we consider the case with infinite server buffer capacity and one-shot data arrival at sensor. First, the number of variables is reduced to a finite level by exploiting the optimality of constant-rate transmission in each epoch. Second, the optimal solution of the multi-objective problem is found by applying the stratified sequencing or merging of objectives. By assuming higher priority of learning efficiency in stratified sequencing, the optimal data partition is derived in closed form by the Lagrange method, while the optimal rate control is proved to have the structure of directional water filling (DWF), based on which a string-pulling (SP) algorithm is proposed to obtain the numerical values. The DWF structure of rate control is also proved to be optimal in merging of objectives, which combines different objectives in a weighted manner. By exploiting the optimal rate changing properties, the SP algorithm is further extended to tackle the more challenging cases with limited server buffer capacity or bursty data arrival at sensor. 
The performance of the proposed joint data partition and rate control design is examined by extensive experiments based on public datasets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

10. A self-adaptive density-based clustering algorithm for varying densities datasets with strong disturbance factor.

Author: Cai, Zihao, Gu, Zhaodong, and He, Kejing
Subjects: *TEXT mining, *ENTROPY (Information theory), *DATA mining, *DATA distribution, *IMAGE analysis
Abstract: Clustering is a fundamental task in data mining, aiming to group similar objects together based on their features or attributes. With the rapid increase in data analysis volume and the growing complexity of high-dimensional data distribution, clustering has become increasingly important in numerous applications, including image analysis, text mining, and anomaly detection. DBSCAN is a powerful tool for clustering analysis and is widely used in density-based clustering algorithms. However, DBSCAN and its variants encounter challenges when confronted with datasets exhibiting clusters of varying densities in intricate high-dimensional spaces affected by significant disturbance factors. A typical example is multi-density clustering connected by a few data points with strong internal correlations, a scenario commonly encountered in the analysis of crowd mobility. To address these challenges, we propose a Self-adaptive Density-Based Clustering Algorithm for Varying Densities Datasets with Strong Disturbance Factor (SADBSCAN). This algorithm comprises a data block splitter, a local clustering module, a global clustering module, and a data block merger to obtain adaptive clustering results. We conduct extensive experiments on both artificial and real-world datasets to evaluate the effectiveness of SADBSCAN. The experimental results indicate that SADBSCAN significantly outperforms several strong baselines across different metrics, demonstrating the high adaptability and scalability of our algorithm. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. Cost-Based Lightweight Storage Automatic Decision for In-Database Machine Learning

Author: Cui, Shuangshuang, Wang, Hongzhi, Gu, Haiyao, Xie, Yuntian, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Zhang, Wenjie, editor, Zou, Lei, editor, Maamar, Zakaria, editor, and Chen, Lu, editor
Published: 2021
Full Text: View/download PDF

12. Research on Optimization of Data Balancing Partition Algorithm Based on Spark Platform

Author: Wang, Suzhen, Jia, Zhiting, Wang, Wenli, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Sun, Xingming, editor, Zhang, Xiaorui, editor, and Xia, Zhihua, editor
Published: 2021
Full Text: View/download PDF

13. Feasibility Study on the Influence of Data Partition Strategies on Ensemble Deep Learning: The Case of Forecasting Power Generation in South Korea.

Author: Chuluunsaikhan, Tserenpurev, Kim, Jeong-Hun, Shin, Yoonsung, Choi, Sanghyun, and Nasridinov, Aziz
Subjects: *DEEP learning, *SOLAR panels, *SOLAR energy, *FEASIBILITY studies, *FORECASTING
Abstract: Ensemble deep learning methods have demonstrated significant improvements in forecasting the solar panel power generation using historical time-series data. Although many studies have used ensemble deep learning methods with various data partitioning strategies, most have only focused on improving the predictive methods by associating several different models or combining hyperparameters and interactions. In this study, we contend that we can enhance the precision of power generation forecasting by identifying a suitable data partition strategy and establishing the ideal number of partitions and subset sizes. Thus, we propose a feasibility study of the influence of data partition strategies on ensemble deep learning. We selected five time-series data partitioning strategies—window, shuffle, pyramid, vertical, and seasonal—that allow us to identify different characteristics and features in the time-series data. We conducted various experiments on two sources of solar panel datasets collected in Seoul and Gyeongju, South Korea. Additionally, LSTM-based bagging ensemble models were applied to combine the advantages of several single LSTM models. The experimental results reveal that the data partition strategies positively influence the forecasting of power generation. Specifically, the results demonstrate that ensemble models with data partition strategies outperform single LSTM models by approximately 4–11% in terms of the coefficient of determination (R2) score. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

14. A Distributed Data Storage Strategy Based on LOPs.

Author: Wang, Qianqiu, Ye, Xiaoping, Luo, Xianlu, Li, Lunjie, and Chen, Hainan
Subjects: *LINEAR orderings, *TIMESTAMPS, *DATA management, *DISTRIBUTED computing, *DATA warehousing, *SCALABILITY
Abstract: Distributed data management requires data partitioning and deployment at the data storage level, and data querying requires the configuration and integration of query subresults at each site. The data partitioning strategy is closely related to the overhead of the distributed system. It is necessary to determine the appropriate data partitioning strategy and update strategy according to the application. This paper proposes a widely distributed storage and processing scheme for a distributed linear order partition (DLOP) based on time stamps. This scheme proposes two kinds of partition strategy based on the characteristics of an "equivalent division" of a linear order partition (LOP), namely, partitioning based on time interval equilibrium and partitioning based on query expectation. Each site in the distributed system is uniformly configured with an index-based data query mechanism to complete the distributed management of data. The corresponding experiments verify the practicability and efficiency of the proposed storage strategy and show that the proposed method is effective for the self-scalability of the data scale and reduces the cluster hardware configuration requirements. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

15. Predictive modeling of Enterococcus sp. removal with limited data from different advanced oxidation processes: a machine learning approach

Author: Universitat Politècnica de Catalunya. Departament d'Enginyeria Civil i Ambiental, Pascacio de los Santos, Pavel, Vicente González, David Jesús, Salazar González, Fernando, Guerra Rodríguez, Sonia, and Rodríguez Chueca, Jorge
Abstract: The removal of contaminants through Advanced Oxidation Processes (AOPs) is a complex task that demands the simultaneous consideration of multiple operating parameters, such as type and concentration of oxidant and catalyst, type and intensity of radiation, composition of aqueous matrix, etc. Designing efficient AOPs often requires expensive and time-consuming laboratory experiments. To improve this process, this study proposes a Machine Learning approach based on a Random Forest (RF) model, to predict Enterococcus sp. concentration in wastewater treated with various AOPs, even when dealing with limited data. To assess our approach under diverse conditions, a data partitioning methodology is used to categorize the different AOPs into three distinct study cases of increasing complexity, from Case I to Case III. The evaluation of the RF model’s performance, combined with the data partitioning methodology, demonstrated its usefulness in predicting missing or additional disinfection values at any instant during the AOPs. Specifically, in Case I, the model excels at generalizing predictions across various AOP treatments, followed by Case II and III, which achieve Root Mean Squared Error (RMSE) values below or comparable to the average RMSE of Case I (0.72) in 8 out of 15 and 2 out of 4 treatments, respectively. Moreover, the effects of imbalanced data on model performance are discussed. This highlights the potential of our approach to assess AOPs performance and facilitate the design of new experiments of the same treatment type without the need for additional laboratory trials, even in challenging conditions.
The publication is part of Projects TED2021-129969A-C32 and TED2021-129969B-C33 funded by MCIN/AEI/10.13039/501100011033 and by “European Union NextGenerationEU/PRTR”. Sonia Guerra-Rodríguez acknowledges the Universidad Politécnica de Madrid (UPM) for the financial support provided through the predoctoral contract granted within the “Programa Propio”. Jorge Rodríguez-Chueca acknowledges Comunidad de Madrid by the pluriannual agreement with the Polytechnic University of Madrid in the line of action Programme of Excellence for University Teaching Staff (M190020074BJJRC). This work was also funded by the Spanish Ministry of Economy and Competitiveness through the “Severo Ochoa Programme for Centres of Excellence in R&D” (CEX2018-000797-S) and the Generalitat de Catalonia through the CERCA Program. Peer Reviewed. Postprint (published version).
Published: 2024

16. A Novel Method of Data Partitioning Using Genetic Algorithm Work Load Driven Approach Utilizing Machine Learning

Author: Kaur, Kiranjit, Laxmi, Vijay, Tsihrintzis, George A., Series Editor, Virvou, Maria, Series Editor, Jain, Lakhmi C., Series Editor, Mallick, Pradeep Kumar, editor, Pattnaik, Prasant Kumar, editor, Panda, Amiya Ranjan, editor, and Balas, Valentina Emilia, editor
Published: 2020
Full Text: View/download PDF

17. Comparative Review of the Intrusion Detection Systems Based on Federated Learning: Advantages and Open Challenges.

Author: Fedorchenko, Elena, Novikova, Evgenia, and Shulepov, Anton
Subjects: *INTRUSION detection systems (Computer security), *ANOMALY detection (Computer security), *ARTIFICIAL intelligence
Abstract: In order to provide an accurate and timely response to different types of attacks, intrusion and anomaly detection systems collect and analyze a lot of data that may include personal and other sensitive data. These systems could be considered a source of privacy-related risks. Application of the federated learning paradigm for training attack and anomaly detection models may significantly decrease such risks, as the data generated locally are not transferred to any party, and training is performed mainly locally on data sources. Another benefit of using federated learning for intrusion detection is its ability to support collaboration between entities that cannot share their datasets for confidentiality or other reasons. While this approach is able to overcome the aforementioned challenges, it is rather new and not well researched. Challenges and research questions arise when using it to implement analytical systems. In this paper, the authors review existing solutions for intrusion and anomaly detection based on federated learning, and study their advantages as well as the open challenges still facing them. The paper analyzes the architecture of the proposed intrusion detection systems and the approaches used to model data partition across the clients. The paper ends with discussion and formulation of the open challenges. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

18. Predictive modeling of Enterococcus sp. removal with limited data from different advanced oxidation processes: A machine learning approach.

Author: Pascacio, Pavel, Vicente, David J., Salazar, Fernando, Guerra-Rodríguez, Sonia, and Rodríguez-Chueca, Jorge
Subjects: MACHINE learning, PREDICTION models, STANDARD deviations, ENTEROCOCCUS, RANDOM forest algorithms, WATER disinfection
Abstract: The removal of contaminants through Advanced Oxidation Processes (AOPs) is a complex task that demands the simultaneous consideration of multiple operating parameters, such as type and concentration of oxidant and catalyst, type and intensity of radiation, composition of aqueous matrix, etc. Designing efficient AOPs often requires expensive and time-consuming laboratory experiments. To improve this process, this study proposes a Machine Learning approach based on a Random Forest (RF) model, to predict Enterococcus sp. concentration in wastewater treated with various AOPs, even when dealing with limited data. To assess our approach under diverse conditions, a data partitioning methodology is used to categorize the different AOPs into three distinct study cases of increasing complexity, from Case I to Case III. The evaluation of the RF model's performance, combined with the data partitioning methodology, demonstrated its usefulness in predicting missing or additional disinfection values at any instant during the AOPs. Specifically, in Case I, the model excels at generalizing predictions across various AOP treatments, followed by Case II and III, which achieve Root Mean Squared Error (RMSE) values below or comparable to the average RMSE of Case I (0.72) in 8 out of 15 and 2 out of 4 treatments, respectively. Moreover, the effects of imbalanced data on model performance are discussed. This highlights the potential of our approach to assess AOPs performance and facilitate the design of new experiments of the same treatment type without the need for additional laboratory trials, even in challenging conditions. • Random Forest predicts Enterococcus sp. disinfection in Advanced Oxidation Processes. • Improve of Advanced Oxidation Processes design using Machine Learning model. • Effect of data sample size and variability of parameters in Random Forest performance. • Challenges in the design of Advanced Oxidation Processes using Random Forest models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Random Forest Algorithm Based on PCA and Hierarchical Selection on Spark.

Author: 雷晨 and 毛伊敏
Subjects: MATRIX decomposition, COVARIANCE matrices, PROBLEM solving, DECISION trees, BIG data, PARALLEL processing, RANDOM forest algorithms, ANT algorithms
Published: 2022
Full Text: View/download PDF

20. A Parallel Cellular Automaton Model For Adenocarcinomas in Situ with Java: Study of One Case

Author: Tomeu-Hardasmal, Antonio J., Salguero-Hidalgo, Alberto G., Capel, Manuel I., Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Mencagli, Gabriele, editor, B. Heras, Dora, editor, Cardellini, Valeria, editor, Casalicchio, Emiliano, editor, Jeannot, Emmanuel, editor, Wolf, Felix, editor, Salis, Antonio, editor, Schifanella, Claudio, editor, Manumachu, Ravi Reddy, editor, Ricci, Laura, editor, Beccuti, Marco, editor, Antonelli, Laura, editor, Garcia Sanchez, José Daniel, editor, and Scott, Stephen L., editor
Published: 2019
Full Text: View/download PDF

21. Low-cost data partitioning and encrypted backup scheme for defending against co-resident attacks

Author: Junfeng Tian, Zilong Wang, and Zhen Li
Subjects: Cloud computing, Co-resident attack, Data partition, Encrypted backup, Data theft, Data corruption, Computer engineering. Computer hardware, TK7885-7895, Electronic computers. Computer science, QA75.5-76.95
Abstract: Aiming at preventing user data leakage and the damage that is caused by co-resident attacks in the cloud environment, a data partitioning and encryption backup (P&XE) scheme is proposed. After the data have been divided into blocks, the data are backed up using the XOR operation between the blocks. Then, the backup data are encrypted using a random string. Compared with the existing scheme, the proposed scheme resolves the conflict between data security and survivability via encrypted backup. At the same time, because the XOR-encrypted backup causes multiple data blocks to share the same backup data, the storage overhead of the user is reduced. In this paper, existing probabilistic models are used to compare the performances of an existing scheme and the P&XE scheme in terms of data security, data survivability and user storage overhead, and the overall performances of the two schemes across these three aspects are compared using control variables. Finally, the experimental results demonstrate the effectiveness of the P&XE scheme at improving user data security and survivability and reducing user storage overhead.
Published: 2020
Full Text: View/download PDF
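
Note on record 21: the abstract's outline (divide the data into blocks, back them up by XOR, encrypt the backup with a random string) can be sketched roughly as follows. This is an illustration under stated assumptions and is not the authors' P&XE scheme: the pairing of exactly two equal-length blocks, the XOR-with-key encryption step, and the helper names `xor_bytes`, `make_backup`, and `recover_block` are all hypothetical.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_backup(block_a: bytes, block_b: bytes):
    """Build one shared backup for two blocks, then mask it with a random key.

    Assumption: encryption is modeled as XOR with a random string of the
    same length; the paper may use a different construction.
    """
    key = secrets.token_bytes(len(block_a))
    parity = xor_bytes(block_a, block_b)   # one backup serves both blocks
    return xor_bytes(parity, key), key


def recover_block(surviving: bytes, encrypted_backup: bytes, key: bytes) -> bytes:
    """Recover the lost block from the surviving block and the backup."""
    parity = xor_bytes(encrypted_backup, key)
    return xor_bytes(parity, surviving)


if __name__ == "__main__":
    a, b = b"block-01", b"block-02"
    backup, key = make_backup(a, b)
    print(recover_block(b, backup, key) == a)
```

Because a single XOR backup covers both blocks, the sketch stores one backup block instead of two full copies, which is the kind of storage saving the abstract attributes to shared backup data.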

22. Boosting Big Data Streaming Applications in Clouds With BurstFlow

Author: Paulo Ricardo Rodrigues De Souza, Kassiano J. Matteussi, Alexandre Da Silva Veith, Breno F. Zanchetta, Valderi R. Q. Leithardt, Alvaro L. Murciego, Edison Pignaton De Freitas, Julio C. S. Dos Anjos, and Claudio F. R. Geyer
Subjects: Big data, stream processing applications, multi cloud, micro-batches, data partition, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The rapid growth of stream applications in financial markets, health care, education, social media, and sensor networks represents a remarkable milestone for data processing and analytics in recent years, leading to new challenges to handle Big Data in real-time. Traditionally, a single cloud infrastructure often holds the deployment of Stream Processing applications because it has extensive and adaptive virtual computing resources. Hence, data sources send data from distant and different locations of the cloud infrastructure, increasing the application latency. The cloud infrastructure may be geographically distributed and requires running a set of frameworks to handle communication. These frameworks often comprise a Message Queue System and a Stream Processing Framework. The frameworks explore Multi-Cloud by deploying each service in a different cloud and communicating via high-latency network links. This creates challenges to meet real-time application requirements because the data streams have different and unpredictable latencies, forcing cloud providers' communication systems to adjust to the environment changes continually. Previous works explore static micro-batches, demonstrating their potential to overcome communication issues. This paper introduces BurstFlow, a tool for enhancing communication across data sources located at the edges of the Internet and Big Data Stream Processing applications located in cloud infrastructures. BurstFlow introduces a strategy for adjusting the micro-batch sizes dynamically according to the time required for communication and computation. BurstFlow also presents an adaptive data partition policy for distributing incoming streams across available machines by considering memory and CPU capacities. The experiments use a real-world multi-cloud deployment showing that BurstFlow can reduce the execution time up to 77% when compared to the state-of-the-art solutions, improving CPU efficiency by up to 49%.
Published: 2020
Full Text: View/download PDF

23. Understanding Data Partition for Applications on CPU-GPU Integrated Processors

Author: Fang, Juan, Chen, Huanhuan, Mao, Junjie, Barbosa, Simone Diniz Junqueira, Series Editor, Chen, Phoebe, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Zhu, Liehuang, editor, and Zhong, Sheng, editor
Published: 2018
Full Text: View/download PDF

24. Feasibility Study on the Influence of Data Partition Strategies on Ensemble Deep Learning: The Case of Forecasting Power Generation in South Korea

Author: Tserenpurev Chuluunsaikhan, Jeong-Hun Kim, Yoonsung Shin, Sanghyun Choi, and Aziz Nasridinov
Subjects: solar panels, power generation, solar panels with weather, long short-term memory, data partition, Technology
Abstract: Ensemble deep learning methods have demonstrated significant improvements in forecasting the solar panel power generation using historical time-series data. Although many studies have used ensemble deep learning methods with various data partitioning strategies, most have only focused on improving the predictive methods by associating several different models or combining hyperparameters and interactions. In this study, we contend that we can enhance the precision of power generation forecasting by identifying a suitable data partition strategy and establishing the ideal number of partitions and subset sizes. Thus, we propose a feasibility study of the influence of data partition strategies on ensemble deep learning. We selected five time-series data partitioning strategies—window, shuffle, pyramid, vertical, and seasonal—that allow us to identify different characteristics and features in the time-series data. We conducted various experiments on two sources of solar panel datasets collected in Seoul and Gyeongju, South Korea. Additionally, LSTM-based bagging ensemble models were applied to combine the advantages of several single LSTM models. The experimental results reveal that the data partition strategies positively influence the forecasting of power generation. Specifically, the results demonstrate that ensemble models with data partition strategies outperform single LSTM models by approximately 4–11% in terms of the coefficient of determination (R2) score.
Published: 2022
Full Text: View/download PDF

25. Comparative Review of the Intrusion Detection Systems Based on Federated Learning: Advantages and Open Challenges

Author: Elena Fedorchenko, Evgenia Novikova, and Anton Shulepov
Subjects: artificial intelligence, data partition, federated learning, Internet of Things, intrusion detection, machine learning, Industrial engineering. Management engineering, T55.4-60.8, Electronic computers. Computer science, QA75.5-76.95
Abstract: In order to provide an accurate and timely response to different types of the attacks, intrusion and anomaly detection systems collect and analyze a lot of data that may include personal and other sensitive data. These systems could be considered a source of privacy-aware risks. Application of the federated learning paradigm for training attack and anomaly detection models may significantly decrease such risks as the data generated locally are not transferred to any party, and training is performed mainly locally on data sources. Another benefit of the usage of federated learning for intrusion detection is its ability to support collaboration between entities that could not share their dataset for confidential or other reasons. While this approach is able to overcome the aforementioned challenges it is rather new and not well-researched. The challenges and research questions appear while using it to implement analytical systems. In this paper, the authors review existing solutions for intrusion and anomaly detection based on the federated learning, and study their advantages as well as open challenges still facing them. The paper analyzes the architecture of the proposed intrusion detection systems and the approaches used to model data partition across the clients. The paper ends with discussion and formulation of the open challenges.
Published: 2022
Full Text: View/download PDF

26. Cloud Data Security Based on Data Partitions and Multiple Encryptions

Author: Muthulakshmi, B., Venkatesulu, M., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Arumugam, S., editor, Bagga, Jay, editor, Beineke, Lowell W., editor, and Panda, B.S., editor
Published: 2017
Full Text: View/download PDF

27. AQUA+: Query Optimization for Hybrid Database-MapReduce System.

Author: Pang, Zhifei, Wu, Sai, Huang, Haichao, Hong, Zhouzhenyan, and Xie, Yuqing
Subjects: HYBRID systems, DISTRIBUTED databases, DATA warehousing, ELECTRONIC data processing, DATA analysis
Abstract: MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It achieves high performance by exploiting parallelism among processing nodes while providing a simple interface for upper-layer applications. However, there are many existing applications maintaining their data in a distributed database. It is costly to export those data into the storage system of MapReduce (normally a distributed file system). Moreover, compared to MapReduce, database is equipped with many state-of-the-art techniques, such as index and optimizer. Therefore, a hybrid Database-MapReduce system inheriting the advantages of both systems is preferred. In this paper, we propose AQUA+, a query optimizer tailored for the hybrid system. AQUA+ is an extension work of our previous system AQUA. It generates a plan that adaptively assigns the operators to the database engine and MapReduce engine to optimize the performance. The intuition is to exploit the index, co-partition and other features provided by the database as much as possible and reduce the data volume processed by the MapReduce. Due to the complexity of query optimization, in AQUA+, we introduce a novel tuning technique, learning to optimize. In particular, two neural networks are trained to predict cost and refine query plan, respectively. We train them based on our log of real query processing. Experiments carried out on our in-house cluster confirm the effectiveness of our query optimizer. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

28. Multi-Layered Multimodal Biometric Authentication for Smartphone Devices.

Author: Memon, Qurban A.
Subjects: BIOMETRIC identification, MULTIMODAL user interfaces, DATA security
Abstract: As technological advances in the smartphone domain increase, so do the issues that pertain to security and privacy. In the current literature, the multimodal biometric approach is addressed at length (for banks, as an example) to improve secure access to personal devices. However, personal devices currently do not support enforcing multilayered access to their different domains/regions of data. In this paper, a multilayered multimodal biometric approach using three biometric methods (such as fingerprint, face and voice) is proposed for smartphones. It is shown that fusion of biometric methods can be layered to enforce private data security on a smartphone. The experimental results are presented. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

29. Single-Objective/Multiobjective Cat Swarm Optimization Clustering Analysis for Data Partition.

Author: Yan, Dapeng, Cao, Hui, Yu, Yajie, Wang, Yanxia, and Yu, Xiang
Abstract: This article proposes single-objective/multiobjective cat swarm optimization clustering algorithms for data partition. The proposed methods use the cat swarm to search the optimal. The position of the cat tightly associates with the clustering centers and is updated by two submodes: the seeking mode and the tracing mode. The seeking mode uses the simulated annealing strategy to update the cat position at a probability. Inspired by the quantum theories, the tracing mode adopts the quantum model to update the cat position in the whole solution space. First, the single-objective method is proposed and adopts the cohesion of clustering as the objective function, in which the kernel method is applied. For considering more objective functions to reveal diverse aspects of data, the multiobjective method is proposed and adopts both the cohesion and the connectivity as the objective functions. The Pareto optimization method is applied to balance the objectives. In the experiments, three kinds of data sets are used to examine the effectiveness of the proposed methods, which are three synthetic data sets, four data sets from the UCI Machine Learning Repository, and a field data set. Experimental results verified that the proposed methods perform better than the traditional clustering algorithms, and the proposed multiobjective method has the highest accuracy. Note to Practitioners: This article presents single-objective/multiobjective cat swarm optimization clustering analysis methods for data partition. Through automatically extracting meaningful or useful classes, clustering analysis could help the practitioners or the intelligent devices find the specific meanings of data, natural data structure, the data relationships, or other characteristics. The proposed methods use the cat swarm to search the optimal clustering result. One or more criterion functions could be selected as the optimization objectives. The time complexity of the multiobjective type is higher than that of the single-objective type. Therefore, in the industrial field, engineers should choose the number of the optimization objectives based on the actual requirements. The proposed methods could be widely used into industrial applications to deal with complex data sets. Future research could consider some more progressive optimization schemes to improve the effectiveness. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

30. Low-cost data partitioning and encrypted backup scheme for defending against co-resident attacks.

Author: Tian, Junfeng, Wang, Zilong, and Li, Zhen
Subjects: DATA security, DATA encryption, DATA corruption, CLOUD storage, PROBABILISTIC databases, IMAGE encryption, DATA
Abstract: Aiming at preventing user data leakage and the damage that is caused by co-resident attacks in the cloud environment, a data partitioning and encryption backup (P&XE) scheme is proposed. After the data have been divided into blocks, the data are backed up using the XOR operation between the data. Then, the backup data are encrypted using a random string. Compared with the existing scheme, the proposed scheme resolves the conflict between data security and survivability via encrypted backup. At the same time, because the XOR-encrypted backup causes multiple data blocks to share the same backup data, the storage overhead of the user is reduced. In this paper, existing probabilistic models are used to compare the performances of an existing scheme and the P&XE scheme in terms of data security, data survivability and user storage overhead, and the overall performances of the two schemes in terms of these three aspects that are compared using control variables. Finally, the experimental results demonstrate the effectiveness of the P&XE scheme at improving user data security and survivability and reducing user storage overhead. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

31. Exploration advantages of data combination and partition: First chemometric analysis of liquid chromatography–mass spectrometry data in full scan mode with quadruple fragmentor voltages.

Author: Sun, Xiao-Dong, Wu, Hai-Long, Chen, Jun-Chen, Chen, An-Qi, Chen, Yue, Ouyang, Yang-Zi, Ding, Yu-Jie, and Yu, Ru-Qin
Subjects: *CHEMOMETRICS, *LIQUID analysis, *LIQUID chromatography-mass spectrometry, *MASS spectrometry, *MATRIX effect, *MASS spectrometers, *ELECTRIC potential
Abstract: A novel soft strategy for combination and partition of mass spectra data recorded at different fragmentor voltages in full scan mode of a mass spectrometer was developed to generate abundant multi-way data. It is the first time that non-linear four-way and combined three-way LC-MS data have been obtained simultaneously in a single chromatographic run. This strategy ensures that each analyte can be ionized and detected at the most appropriate MS conditions (ionization modes, fragmentor voltages) and avoids a hard chromatographic segmentation in subsequent chemometric analysis. Two different experimental datasets were analyzed to validate the feasibility and applicability of this strategy. Some simple pretreatments were carried out before LC-MS analysis to prevent potential matrix effects. Proper chemometric tools were used to resolve three-way (partitioned data) and enhanced three-way LC-MS (combined data) data, respectively. The method was assessed by comparing the analytical results obtained from the same chemometric algorithm with both combined and partitioned datasets: (1) the combined data provided the best global overall resolution, higher sensitivity and more reliable results, (2) the partitioned data provided higher selectivity for some specific analytes. The results showed that the developed method could be a soft and ingenious tool to handle the unordered but information-rich raw LC-MS data. Moreover, the proposed strategy could take extra analytical advantages in terms of higher sensitivity and more reliable quantitative results when compared with LC-MS (with single fragmentor voltage) strategy and showed nearly the same capability in analytical quality as classic LC-MS/MS method. [ABSTRACT FROM AUTHOR]
Highlights:
• A novel data combination and partition strategy for LC-MS in full scan mode was presented.
• The strategy efficiently avoids a hard chromatographic segmentation in traditional chemometric LC-MS analysis.
• The use of data combination allowed increasing analytical sensitivity and improving significantly the prediction performances.
• The quantitative results from the proposed strategy were compared with the LC-MS/MS method.
Published: 2020
Full Text: View/download PDF

32. GLDA: Parallel Gibbs Sampling for Latent Dirichlet Allocation on GPU

Author: Xue, Pei, Li, Tao, Zhao, Kezhao, Dong, Qiankun, Ma, Wenjing, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Wu, Junjie, editor, and Li, Lian, editor
Published: 2016
Full Text: View/download PDF

33. pFUTURES: A Parallel Framework for Cellular Automaton Based Urban Growth Models

Author: Shashidharan, Ashwin, van Berkel, Derek B., Vatsavai, Ranga Raju, Meentemeyer, Ross K., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Miller, Jennifer A., editor, O'Sullivan, David, editor, and Wiegand, Nancy, editor
Published: 2016
Full Text: View/download PDF

34. Applying Kolmogorov Complexity for High Load Balancing Between Distributed Computing System Nodes.

Author: STEPANOVA, Maria
Subjects: KOLMOGOROV complexity, DISTRIBUTED computing, HETEROGENEOUS distributed computing, COMPUTER systems, EDUCATIONAL innovations, HIGH performance computing
Abstract: Massive amounts of data are now generated by different sources in real time or soft real time. The data are generally heterogeneous in content and arise in every human sphere, such as education, government, finance, and medicine. This paper examines the possibility of using Kolmogorov complexity as a technological innovation in education. The article describes fundamental issues such as data storage, data security, and high-speed access to data, which are among the main parameters for the successful use of eLearning platforms in education. Different eLearning platforms and their functional principles are compared. Most eLearning platforms are based on supercomputers and distributed computing systems; however, the growing volumes of data, especially in the educational sphere, require other approaches to data storage and access. Educational platforms should also be scalable and cross-accessible. To increase the efficiency of data processing and storage, the idea of partitioning data and operations among heterogeneous computational nodes, which form the main infrastructure for eLearning platforms and methods, is considered. The Kolmogorov complexity approach is investigated for this purpose. It is also examined as a means of reducing the amount of data transmitted between nodes. The possibility of using Kolmogorov complexity to divide and process data optimally in distributed, heterogeneous computing systems, without limits on the number of nodes, is considered. The advantages and disadvantages of Kolmogorov complexity are considered with respect to eLearning aims. Experiments and results are given, and plans for further investigation are described. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

35. Query Execution Optimization Based on Incremental Update in Database Distributed Middleware

Author: Ye, Wei, Wang, Mei, Le, Jiajin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Wang, Guojun, editor, Zomaya, Albert, editor, Martinez, Gregorio, editor, and Li, Kenli, editor
Published: 2015
Full Text: View/download PDF

36. DistDL: A Distributed Deep Learning Service Schema with GPU Accelerating

Author: Wang, Jianzong, Cheng, Lianglun, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Cheng, Reynold, editor, Cui, Bin, editor, Zhang, Zhenjie, editor, Cai, Ruichu, editor, and Xu, Jia, editor
Published: 2015
Full Text: View/download PDF

37. Distributed Cache Strategies for Machine Learning Classification Tasks over Cluster Computing Resources

Author: Ovalle, John Edilson Arévalo, Ramos-Pollan, Raúl, González, Fabio A., Junqueira Barbosa, Simone Diniz, Series editor, Chen, Phoebe, Series editor, Cuzzocrea, Alfredo, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Sivalingam, Krishna M., Series editor, Ślęzak, Dominik, Series editor, Washio, Takashi, Series editor, Yang, Xiaokang, Series editor, Hernández, Gonzalo, editor, Barrios Hernández, Carlos Jaime, editor, Díaz, Gilberto, editor, García Garino, Carlos, editor, Nesmachnow, Sergio, editor, Pérez-Acle, Tomás, editor, Storti, Mario, editor, and Vázquez, Mariano, editor
Published: 2014
Full Text: View/download PDF

38. Cost-Based Optimization of Logical Partitions for a Query Workload in a Hadoop Data Warehouse

Author: Peng, Shu, Gu, Jun, Wang, X. Sean, Rao, Weixiong, Yang, Min, Cao, Yu, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chen, Lei, editor, Jia, Yan, editor, Sellis, Timos, editor, and Liu, Guanfeng, editor
Published: 2014
Full Text: View/download PDF

39. XML Multi-core Query Optimization Based on Task Preemption and Data Partition

Author: Tian, Pingfang, Luo, Dan, Li, Yaoyao, Gu, Jinguang, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Kim, Wooju, editor, Ding, Ying, editor, and Kim, Hong-Gee, editor
Published: 2014
Full Text: View/download PDF

40. An accelerator for support vector machines based on the local geometrical information and data partition.

Author: Song, Yunsheng, Liang, Jiye, and Wang, Feng
Abstract: Support vector machines (SVMs) struggle with large datasets because of their low training efficiency. One important solution divides the whole dataset into smaller subsets by data partition and combines the results of the classifiers trained on those subsets. However, traditional data partition approaches have difficulty preserving the class boundary of the dataset or controlling the size of the divided subsets, which greatly affects their performance. To overcome this difficulty, we propose an accelerator for the SVM algorithm based on local geometrical information. In this algorithm, the feature space is divided by linear projection into several regions with approximately equal numbers of training instances, and each SVM classifier, trained over the extended region, predicts only the unlabeled instances within the original region. The proposed algorithm not only preserves the decision boundary of the raw data but also saves considerable execution time when implemented in a parallel environment. Furthermore, the number of instances within each divided region can be effectively controlled, which helps in choosing the complexity of the execution on each processor. Experiments show that the classification performance of the proposed algorithm compares favorably with four state-of-the-art algorithms while requiring the least training time. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

41. Identification Algorithm for Unstable Data Partitions in Unstructured Cloud Data Management Systems.

Author: 郑美光, 杨姣, 常成龙, and 胡志刚
Abstract: Copyright of Journal of South China University of Technology (Natural Science Edition) is the property of South China University of Technology and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2019
Full Text: View/download PDF

42. Security of Separated Data in Cloud Systems with Competing Attack Detection and Data Theft Processes.

Author: Huang, Hong‐Zhong, Levitin, Gregory, and Xing, Liudong
Subjects: DATA, VIRTUAL machine systems, CLOUD computing, DATA security, CYBERTERRORISM
Abstract: Empowered by virtualization technology, service requests from cloud users can be honored through creating and running virtual machines. Virtual machines established for different users may be allocated to the same physical server, making the cloud vulnerable to co‐residence attacks where a malicious attacker can steal a user's data through co‐residing their virtual machines on the same server. For protecting data against the theft, the data partition technique is applied to divide the user's data into multiple blocks with each being handled by a separate virtual machine. Moreover, early warning agents (EWAs) are deployed to possibly detect and prevent co‐residence attacks at a nascent stage. This article models and analyzes the attack success probability (complement of data security) in cloud systems subject to competing attack detection process (by EWAs) and data theft process (by co‐residence attackers). Based on the suggested probabilistic model, the optimal data partition and protection policy is determined with the objective of minimizing the user's cost subject to providing a desired level of data security. Examples are presented to illustrate effects of different model parameters (attack rate, number of cloud servers, number of data blocks, attack detection time, and data theft time distribution parameters) on the attack success probability and optimization solutions. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

43. Dual buffer rotation four-stage pipeline for CPU-GPU cooperative computing.

Author: Li, Tao, Dong, Qiankun, Wang, Yifeng, Gong, Xiaoli, and Yang, Yulu
Subjects: *GRAPHICS processing units, *BIG data, *DATA transmission systems, *CLOUD computing, *MOTHERBOARDS
Abstract: Accelerators such as GPUs have become popular general-purpose computing device in the field of high-performance computing. With the boosting ability of storage and computation, it is very important to solve the complex scientific and engineering problems on CPU-GPU heterogeneous system in the big data era. Now the compute-intensive problems have been successfully solved using CPU-GPU cooperative computing. However, it is difficult to handle large-scale data-intensive problems, especially for those limited by GPU device memory. In this paper, the dual buffer rotation four-stage pipeline (DBFP) mechanism is proposed for CPU-GPU cooperative computation to efficiently handle data-intensive problems, which need larger memory than that of a single GPU. The data block partition-based pipeline computing strategy is designed on top of the DBFP mechanism. On the one hand, it breaks out the bottleneck of limited GPU device memory. On the other hand, it explores high-performance computing of CPU and GPU with data transfer and computation overlap. Furthermore, it is easy to extend the DBFP mechanism on the heterogeneous system equipped with multiple GPUs and achieve high resource utilization. The results show that it can achieve 99 and 90% of theoretical performance for dense general matrix multiplication on one GPU and two GPUs respectively with Nvidia GTX480 or K40 GPUs. It also enables K-means and T-nearest-neighbor algorithms to process larger datasets, which used to be limited by the GPU device memory. We achieve nearly 1.9-fold performance gains by dynamic task scheduling on two GPUs when the performance bottleneck is GPU computing or data transmission. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

44. Performance Analysis of Software Aging Prediction.

Author: Yongquan Yan
Subjects: COMPUTER software, COMPUTER storage devices, ROUNDING errors, CLASSIFICATION algorithms, DATA mining
Abstract: Software aging is a problem that was discovered two decades ago. Since then, many research studies have investigated how to manage aging problems caused by memory leakage and accumulated round-off error through resource consumption prediction or state forecasting. When applying state prediction, the performances of various aging classification algorithms are compared by the prediction error. Since forecasting error is not a precise measure and must be estimated, the forecast error variance needs to be analyzed. In this work, we carefully analyze the forecast error variance by three steps. In the first step, we propose a method to decompose the variance by considering the influence of the data sampling process and data partition procedure. In the second step, we use an enhanced Friedman test and the Nemenyi post hoc test to analyze the influence of the data sampling process on the data partitioning procedure. In the last step, a corrected t-test is proposed to compare the performance of two off-the-shelf classification algorithms. The software comparison experiment is based on a real-time web environment. We end this work by proposing a set of feasible suggestions. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

45. Adaptive Evidence Accumulation Clustering Using the Confidence of the Objects’ Assignments

Author: Duarte, João M. M., Fred, Ana L. N., Duarte, F. Jorge F., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Washio, Takashi, editor, and Luo, Jun, editor
Published: 2013
Full Text: View/download PDF

46. Analytic Functions

Author: Morton, Karen, Osborne, Kerry, Sands, Robyn, Shamsudeen, Riyaj, and Still, Jared
Published: 2013
Full Text: View/download PDF

47. PTL: Partitioned Logging for Database Storage on Flash Solid State Drives

Author: Yang, Robin Jun, Luo, Qiong, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Bao, Zhifeng, editor, Gao, Yunjun, editor, Gu, Yu, editor, Guo, Longjiang, editor, Li, Yingshu, editor, Lu, Jiaheng, editor, Ren, Zujie, editor, Wang, Chaokun, editor, and Zhang, Xiao, editor
Published: 2012
Full Text: View/download PDF

48. Efficient SPARQL Query Processing in MapReduce through Data Partitioning and Indexing

Author: Nie, Zhi, Du, Fang, Chen, Yueguo, Du, Xiaoyong, Xu, Linhao, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Sheng, Quan Z., editor, Wang, Guoren, editor, Jensen, Christian S., editor, and Xu, Guandong, editor
Published: 2012
Full Text: View/download PDF

49. Average Cluster Consistency for Cluster Ensemble Selection

Author: Duarte, F. Jorge F., Duarte, João M. M., Fred, Ana L. N., Rodrigues, M. Fátima C., Fred, Ana, editor, Dietz, Jan L. G., editor, Liu, Kecheng, editor, and Filipe, Joaquim, editor
Published: 2011
Full Text: View/download PDF

50. On Consensus Clustering Validation

Author: Duarte, João M. M., Fred, Ana L. N., Lourenço, André, Duarte, F. Jorge F., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Hancock, Edwin R., editor, Wilson, Richard C., editor, Windeatt, Terry, editor, Ulusoy, Ilkay, editor, and Escolano, Francisco, editor
Published: 2010
Full Text: View/download PDF
