166 results
Search Results
2. Special issue on deep learning and big data analytics for medical e-diagnosis/AI-based e-diagnosis.
- Author
-
Fong, Simon, Fortino, Giancarlo, Ghista, Dhanjoo, and Piccialli, Francesco
- Subjects
DEEP learning, ARTIFICIAL neural networks, MACHINE learning, ARTIFICIAL intelligence, BIG data, CONVOLUTIONAL neural networks
- Abstract
The model integrates artificial intelligence (AI) and big data analytics, utilizing IoMT devices for data acquisition and the Hadoop ecosystem for managing big data. The field of medical diagnosis is currently undergoing a remarkable transformation with the emergence of artificial intelligence (AI) techniques, particularly deep learning and big data analytics. By harnessing the power of deep learning and big data analytics, AI-based e-diagnosis has the potential to revolutionize healthcare delivery. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
3. Ontology construction and mapping of multi-source heterogeneous data based on hybrid neural network and autoencoder.
- Author
-
Zhao, Wenbin, Fu, Zijian, Fan, Tongrang, and Wang, Jiaqi
- Subjects
CONVOLUTIONAL neural networks, BIG data, ONTOLOGY, DEEP learning, ONTOLOGIES (Information retrieval)
- Abstract
In the big data era, multi-source heterogeneous data have become the biggest obstacle to data sharing because of their high dimensionality and inconsistent structure. Using text classification to solve the ontology construction and mapping problem for multi-source heterogeneous data not only reduces manual effort but also improves accuracy and efficiency. This paper proposes an ontology construction and mapping scheme based on a hybrid neural network and an autoencoder. First, the proposed text classification method uses a multi-kernel convolutional neural network to capture local features and an improved Bidirectional Long Short-Term Memory network to compensate for the convolutional network's inability to capture context-dependent information. Second, a similarity matching method that integrates an autoencoder to improve anti-interference ability is used for ontology mapping. We carried out several sets of experiments to test the validity of the proposed ontology construction and mapping scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
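The hybrid classifier described in entry 3 pairs a multi-kernel CNN with a BiLSTM. For orientation only, here is a minimal Keras sketch of that kind of architecture; the layer sizes, kernel widths, vocabulary size, and class count below are assumptions, not the authors' settings:

```python
from tensorflow.keras import layers, Model

def build_hybrid_classifier(vocab_size=20000, seq_len=128,
                            embed_dim=100, num_classes=10):
    inp = layers.Input(shape=(seq_len,))
    emb = layers.Embedding(vocab_size, embed_dim)(inp)
    # Multi-kernel CNN branch: one Conv1D per kernel size captures
    # local n-gram features at several granularities.
    convs = [layers.GlobalMaxPooling1D()(
                 layers.Conv1D(64, k, activation="relu", padding="same")(emb))
             for k in (3, 4, 5)]
    # BiLSTM branch supplies the context the CNN alone cannot see.
    lstm_feat = layers.Bidirectional(layers.LSTM(64))(emb)
    merged = layers.Concatenate()(convs + [lstm_feat])
    out = layers.Dense(num_classes, activation="softmax")(merged)
    return Model(inp, out)

model = build_hybrid_classifier()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```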
4. Towards design and implementation of Industry 4.0 for food manufacturing.
- Author
-
Konur, Savas, Lan, Yang, Thakker, Dhavalkumar, Morkyani, Geev, Polovina, Nereida, and Sharp, James
- Subjects
FOOD industry, INDUSTRY 4.0, DATA mining, MANUFACTURING processes, PRODUCTION control, CYBER physical systems, TEXTILE machinery
- Abstract
Today's factories are smart ecosystems in which humans, machines and devices interact for efficient manufacturing of products. Industry 4.0 is a suite of enabler technologies for such smart ecosystems that allow the transformation of industrial processes. When implemented, Industry 4.0 technologies have a huge impact on the efficiency, productivity and profitability of businesses. Adopting and implementing Industry 4.0, however, requires overcoming a number of practical challenges, in most cases due to the lack of modernisation and automation among traditional manufacturers. This paper presents a first-of-its-kind case study of moving a traditional food manufacturer, still using machinery more than one hundred years old (a common occurrence for small- and medium-sized businesses), to adopt Industry 4.0 technologies. The paper reports the challenges we encountered during the transformation process and in the development stage. It also presents a smart production control system that we developed utilising AI, machine learning, Internet of things, big data analytics, cyber-physical systems and cloud computing technologies. The system provides novel data collection, information extraction and intelligent monitoring services, enabling improved efficiency and consistency as well as reduced operational cost. The platform was developed in real-world settings offered by an Innovate UK-funded project and has been integrated into the company's existing production facilities. In this way, the company was not required to replace old machinery outright, but rather adapted the existing machinery to an entirely new way of operating. The proposed approach and the lessons outlined can benefit similar food manufacturers and other SME industries. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Three stage fusion for effective time series forecasting using Bi-LSTM-ARIMA and improved DE-ABC algorithm.
- Author
-
Kumar, Raghavendra, Kumar, Pardeep, and Kumar, Yugal
- Subjects
TIME series analysis, DOW Jones industrial average, STOCK exchanges, MARKET sentiment, MOVING average process, BIG data, FORECASTING
- Abstract
Fusion is a state-of-the-art technique for observing behavioral patterns in time series data. Fusion models efficiently and effectively interpret both linear and nonlinear patterns, which individual models cannot capture on their own due to feature limitations. In this paper, a three-stage fusion model is proposed to handle time series data and improve stock market forecasting accuracy. In the first phase of fusion, stock market inputs constituted of historical data and market sentiment for the targeted stock are pooled along with established technical indicators of the stock market. Market sentiment is examined through a sentiment polarity index computed on the big data platform Hadoop. In the second phase, Auto Regressive Integrated Moving Average (ARIMA) and Long Short-Term Memory (LSTM) are combined to capture the linear and nonlinear features of the final stock dataset. In the third phase, an improved Artificial Bee Colony (ABC) algorithm using differential evolution (DE) is employed for hyperparameter selection in the proposed DE-ABC-Bi-LSTM-ARIMA model for stock market prediction. Experiments are performed on established and diversified historical datasets: the Dow Jones Industrial Average index, Nikkei 225 (N225) index, S&P 500 index and NASDAQ GS index. The proposed fusion model DE-ABC-Bi-LSTM-ARIMA outperformed the benchmark models used in this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
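Entry 5's second fusion stage follows the classic ARIMA-plus-LSTM decomposition: ARIMA captures the linear component and an LSTM learns the nonlinear residuals, with the two forecasts summed. The sketch below shows that decomposition idea in its simplest one-step-ahead form; the ARIMA order, lag count, and network size are placeholders, and the paper's sentiment inputs, Bi-LSTM, and DE-ABC hyperparameter search are omitted:

```python
import numpy as np
import tensorflow as tf
from statsmodels.tsa.arima.model import ARIMA

def hybrid_forecast(series, order=(1, 1, 1), lags=5):
    """One-step-ahead forecast = ARIMA linear forecast + LSTM residual forecast."""
    arima = ARIMA(series, order=order).fit()
    linear_pred = arima.forecast(steps=1)
    resid = np.asarray(arima.resid)
    # Supervised lag windows over the ARIMA residuals.
    X = np.array([resid[i:i + lags] for i in range(len(resid) - lags)])
    y = resid[lags:]
    net = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(lags, 1)),
        tf.keras.layers.LSTM(16),
        tf.keras.layers.Dense(1),
    ])
    net.compile(optimizer="adam", loss="mse")
    net.fit(X[..., None], y, epochs=20, verbose=0)
    resid_pred = net.predict(resid[-lags:][None, :, None], verbose=0).ravel()
    return float(linear_pred[0] + resid_pred[0])
```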
6. The correlation between green finance and carbon emissions based on improved neural network.
- Author
-
Sun, Chenghao
- Subjects
CARBON emissions, CARBON offsetting, SUSTAINABLE development, MACHINE learning, BIG data
- Abstract
The development of green finance and the quantitative evaluation of its impact on the ecological environment provide empirical evidence for the construction of a carbon trading accounting system. Carbon trading is an important part of green finance, and the accounting of business related to carbon emission rights has promoted the development of regional green finance. To explore the relationship between green finance and carbon emissions, this paper builds an analysis model of that relationship based on big data and machine learning technologies. The paper then conducts simulation tests with the system and compares the simulated output with the actual situation to verify the model's effectiveness. The experimental results show that the proposed correlation analysis model performs well in analyzing the relationship between green finance and carbon emissions, and the model makes clear that a distinct correlation exists between the two. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Analysis of children's sub-health treatment effect based on multi-scale feature fusion network from the perspective of medical informatization.
- Author
-
Ma, Lingli, Hou, Jianghong, and Gui, Lingqin
- Subjects
TREATMENT effectiveness, EVALUATION methodology, MEDICAL innovations, BIG data, MEDICAL care
- Abstract
The sub-health state is a low-quality intermediate state between health and disease, and the theoretical basis of children's sub-health is to consider the child as a whole. Common clinical sub-health conditions cannot be explained by modern detection methods, but they can be screened and analyzed with the help of big data from medical informatization. Combining the "Internet +" model with health care is an innovation in the construction of medical informatization: it can provide considerate services to the public in a timely manner and alleviate anxiety about illness. It is therefore necessary to evaluate the efficacy of children's sub-health treatment from the perspective of medical informatization. To that end, this paper completes the following work with the help of AI neural networks: (1) It proposes an improved AlexNet evaluation method based on an attention mechanism. An attention mechanism is added to the original AlexNet model to weight each channel of the feature layer; at the same time, the large convolution kernels of the early layers of the original AlexNet are improved, and batch normalization replaces the local response normalization (LRN) layer of the original model. (2) It proposes an evaluation method based on an improved residual network, which widens the original residual block; the modified residual block effectively reduces the number of network parameters and improves training efficiency. (3) It proposes an evaluation method based on multi-scale feature fusion (MSFF): the features extracted by the improved AlexNet and the residual network are fused before evaluation, which greatly shortens training time and yields higher accuracy than either single model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Big data medical behavior analysis based on machine learning and wireless sensors.
- Author
-
Cui, Moyang
- Subjects
BEHAVIORAL assessment, BIG data, ASSOCIATION rule mining, STRUCTURAL health monitoring, DATA mining, MACHINE learning
- Abstract
To improve the scientific rigor and reliability of medical behavior analysis, this paper combines machine learning and wireless sensor technology to construct an intelligent data mining system for medical behavior analysis, using association rules to mine the implicit relationships between structural monitoring parameters. The paper establishes strong association rules between different monitoring variables, based on historical monitoring data under normal structural conditions, to predict whether the structural conditions are normal. In addition, it designs the system's function modules according to actual needs, derives the overall system architecture, and implements the function modules in combination with the algorithms. Finally, experiments are designed to verify the performance of the constructed system, and the results are discussed through graphical analysis. The research shows that the constructed system is effective. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
9. Research on diversity and accuracy of the recommendation system based on multi-objective optimization.
- Author
-
Ma, Tie-min, Wang, Xue, Zhou, Fu-cai, and Wang, Shuang
- Subjects
PROBABILITY density function, RECOMMENDER systems, BIG data, MATHEMATICAL optimization
- Abstract
As the information industry and the Internet develop rapidly, the use of big data has entered people's vision and attracted attention, and recommendation systems have emerged to quickly extract the desired information from this excess of information. Within recommendation systems, the user-based collaborative filtering algorithm has become a research hotspot. Existing research improves collaborative filtering recommendation algorithms using kernel methods, but still faces the cold start problem, the diversity problem, the data sparsity problem, the concept drift problem and others. To address these problems, this paper proposes user-based collaborative filtering based on a kernel method and multi-objective optimization (MO-KUCF), which introduces kernel density estimation and multi-objective optimization. It increases the diversity of the recommendation system, mitigates concept drift in dynamic data, and improves both accuracy and diversity. The dataset used in this article is the Netflix dataset. MO-KUCF is compared against user-based collaborative filtering (UCF) and user-based collaborative filtering based on a kernel method (KUCF) using the mean absolute error (MAE) together with the internal user diversity (I_u) index; the pre-processed data set is divided into a training set and a test set, which are provided to the recommendation system for recommendation and evaluation. The results show that the accuracy of MO-KUCF improves by 5.6%, and diversity also improves. Combining multi-objective optimization techniques with kernel density estimation can effectively improve the diversity of recommendation systems and address the concept drift problem, thereby improving system accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
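Entry 9 builds on plain user-based collaborative filtering evaluated by MAE. As a compact baseline sketch (the kernel-density weighting and multi-objective search that define MO-KUCF are not reproduced, and the neighborhood size k is an arbitrary choice):

```python
import numpy as np

def predict_ucf(R, k=2):
    """User-based CF. R: user-item matrix with 0 for unrated items."""
    mask = R > 0
    means = np.where(mask.any(1), R.sum(1) / np.maximum(mask.sum(1), 1), 0)
    centered = np.where(mask, R - means[:, None], 0)
    norms = np.linalg.norm(centered, axis=1) + 1e-9
    sim = centered @ centered.T / np.outer(norms, norms)  # Pearson-like similarity
    np.fill_diagonal(sim, 0)
    preds = np.empty_like(R, dtype=float)
    for u in range(R.shape[0]):
        nbrs = np.argsort(-sim[u])[:k]        # top-k most similar users
        w = sim[u, nbrs]
        preds[u] = means[u] + (w @ centered[nbrs]) / (np.abs(w).sum() + 1e-9)
    return preds

def mae(R_true, preds):
    m = R_true > 0
    return np.abs(R_true[m] - preds[m]).mean()
```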
10. Utilization of big data classification models in digitally enhanced optical coherence tomography for medical diagnostics.
- Author
-
Bansal, Priti, Harjai, Nipun, Saif, Mohammad, Mugloo, Saahil Hussain, and Kaur, Preeti
- Subjects
CONVOLUTIONAL neural networks, ARTIFICIAL intelligence, RETINAL imaging, OPTICAL coherence tomography, BIG data, MEDICAL sciences, RETINAL diseases
- Abstract
With the advancement of modern imaging techniques like CT, MRI and PET scans, a vast amount of data is generated every day in healthcare. Big data contains hidden information, which necessitates the development of intelligent systems to analyze it and extract relevant information, allowing for accurate and cost-effective decisions in the medical field. By utilizing the untapped potential of the big data available in the medical field, very precise models can be developed for the medical diagnosis of retinal diseases. Optical coherence tomography (OCT) is a non-invasive imaging test that captures the distinct layers of the retina and optic nerve in a living eye to map and measure their thickness, which helps diagnose various retinal disorders. With the growing application of deep learning techniques in the medical sciences, convolutional neural network (CNN)-based approaches to disease detection are gaining popularity. While the manual examination of 3D OCT images for the diagnosis of retinal disorders requires extensive time and expert intervention, CNNs provide an effective automated option that delivers results with higher accuracy while also reducing the time involved in the overall process. In this paper, we implement this idea by proposing OCT-CNN, a CNN architecture that automatically classifies retinal OCT images and identifies potential disorders in a living eye. Several techniques are employed to enhance the performance of the proposed approach, including digital enhancement of the images, dropout regularization, adaptive learning rates, and early stopping of training. The performance of the proposed OCT-CNN is evaluated on the UCSD dataset against several popular deep CNN architectures and existing state-of-the-art approaches to automatic retinal OCT classification. OCT-CNN attains the best performance on all evaluated metrics, pushing classification accuracies to 99.28% on CNV, 99.9% on DME, 99.38% on DRUSEN, and 100% on NORMAL images, indicating its superiority over existing state-of-the-art techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
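Entry 10 attributes part of OCT-CNN's performance to standard training safeguards: dropout regularization, adaptive learning rates, and early stopping. A generic Keras sketch of those three mechanisms follows; the network below is a stand-in, not the published OCT-CNN topology, and the input size is assumed:

```python
from tensorflow.keras import layers, models, callbacks

model = models.Sequential([
    layers.Input(shape=(224, 224, 1)),         # grayscale OCT slice (assumed size)
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                        # dropout regularization
    layers.Dense(4, activation="softmax"),      # CNV / DME / DRUSEN / NORMAL
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
cbs = [
    # Adaptive learning rate: halve LR when validation loss plateaus.
    callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
    # Early stopping: halt once validation loss stops improving.
    callbacks.EarlyStopping(monitor="val_loss", patience=5,
                            restore_best_weights=True),
]
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=cbs)
```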
11. HTwitt: a hadoop-based platform for analysis and visualization of streaming Twitter data.
- Author
-
Demirbaga, Umit
- Subjects
DATA visualization, ELECTRONIC data processing, BIG data, MACHINE learning, ECOSYSTEMS
- Abstract
Twitter's popularity means it produces a massive amount of data, which gives rise to big data problems. One of those problems is tweet classification: the sophisticated and complex language used in tweets makes current tools insufficient. We present our framework HTwitt, built on top of the Hadoop ecosystem, which consists of a MapReduce algorithm and a set of machine learning techniques embedded within a big data analytics platform to efficiently address the following problems: (1) traditional data processing techniques are inadequate for big data; (2) data preprocessing needs substantial manual effort; (3) domain knowledge is required before classification; (4) semantic explanation is ignored. These challenges are overcome by combining different algorithms with a Naïve Bayes classifier to ensure reliable and highly precise recommendations in virtualization and cloud environments. These features distinguish HTwitt through an effective and practical design for text classification in big data analytics. The main contribution of the paper is a framework for building landslide early warning systems by pinpointing useful tweets and visualizing them along with the processed information. We demonstrate experiments that quantify the level of overfitting during model training using different sizes of real-world datasets across the machine learning phases. Our results demonstrate that the proposed system provides high-quality results, with a score of nearly 95%, and meets the requirements of a Hadoop-based classification system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
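The classification core of HTwitt in entry 11 is a Naïve Bayes classifier over tweets. A single-machine sketch of that stage with toy data (the Hadoop/MapReduce pipeline, preprocessing, and visualization layers are out of scope here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled tweets: 1 = landslide-relevant, 0 = not.
tweets = ["heavy rain triggered a landslide on the hill road",
          "lovely sunny day at the beach"]
labels = [1, 0]

# TF-IDF features (unigrams + bigrams) feeding a multinomial Naïve Bayes.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(tweets, labels)
print(clf.predict(["mudslide blocked the highway after the storm"]))
```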
12. Information flow-based second-order cone programming model for big data using rough concept lattice.
- Author
-
Wang, Pin, Wu, Wei, Zeng, Lingyu, and Zhong, Hongmei
- Subjects
BIG data, CONES, ROUGH sets, DATA modeling, DATA mining, FUZZY sets
- Abstract
The purpose of this paper is to study the rough concept lattice and use information flow to construct a second-order cone programming model for big data. Through the construction of the model, attribute reduction is performed on noisy original data in the formal background. A concept lattice is then constructed from the reduced formal background, and the big data are analyzed in the form of information flow. Next, drawing on the advantages of the β-upper and β-lower distribution reduction algorithms of the variable-precision rough set and on the characteristics of the rough concept lattice's formal background, second-order cone theory is applied to construct a second-order cone computation model. The rough concept lattice is applied to big data processing and then analyzed through concrete examples. The time required in the traditional mode is between 118.3 and 123.6 min, while second-order cone and concept lattice fitting requires between 92.4 and 98.5 min. The experimental data show that using information flow to construct a second-order cone programming model over the rough concept lattice greatly reduces the number of nodes in the lattice and enhances the anti-noise capability of the system, saving data statistics and computation time. The traditional concept lattice algorithm can be traced back to the purification of the formal background, which simplifies the concept connotation and allows attribute reduction to be studied from the perspective of lattice isomorphism. The experiments further show that the proposed model improves the integrity and security of the data by about 15% and saves 20% of data processing time compared with traditional algorithms, which is instructive for the efficient and secure development of big data in the future. In this paper, data feature mining and information flow model construction are carried out, power spectral density features of big data are extracted from large amounts of noisy and fuzzy data, and the second-order cone programming model of the big data information flow is built using the rough concept lattice method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. Measuring the efficiency of banks using high-performance ensemble technique.
- Author
-
Thabet, Huda H., Darwish, Saad M., and Ali, Gihan M.
- Subjects
MACHINE learning, DATA envelopment analysis, SUPPORT vector machines, BANKING industry, K-nearest neighbor classification, BIG data
- Abstract
The importance of technological and managerial risk management in banks has increased since the financial crisis, and banks are among the most affected institutions, with many in poor financial standing. An unstable and inefficient financial system causes economic stagnation in both the banking sector and the overall economy. Data envelopment analysis (DEA) has been used to examine the performance of decision-making units (DMUs) to enhance efficiency. With the rapid growth of big data, however, adding more DMUs requires a large amount of memory and CPU time, which is the biggest challenge. As a result, machine learning (ML) approaches have been used to analyze financial institution performance, but many suffer from variance in predictions or model instability, making bank efficiency extremely difficult to measure. Ensemble learning is therefore commonly used to evaluate the performance of financial institutions in this context. This paper presents a robust super learner ensemble technique for assessing bank efficiency, with four machine learning models serving as base learners: the support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), and AdaBoost classifier (ADA), whose results are used to train the meta-learner. The super learner (SL) approach is an extension of the stacking technique that builds the ensemble via cross-validation; one important benefit of this cross-validation-based technique is that it can overcome the overfitting issue that plagues most other ensemble approaches. When the SL and the base learners were compared on forecasting ability using different statistical standards and different variable combinations, the SL proved superior: it achieved accuracy (ACC) of 0.8636-0.9545 and F1-scores (F1) of 0.9143-0.9714, while the base learners achieved ACC of 0.5909-0.8182 and F1 of 0.6897-0.9143. The SL is therefore highly recommended for improving the accuracy of financial forecasts, even with limited financial data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
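Entry 13's super learner generates the meta-learner's training data from cross-validated base-learner predictions, which is exactly what scikit-learn's StackingClassifier does. A minimal sketch with the four base learners named in the abstract (the meta-learner choice below is an assumption, as the abstract does not specify one):

```python
from sklearn.ensemble import (StackingClassifier, RandomForestClassifier,
                              AdaBoostClassifier)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

base_learners = [
    ("svm", SVC(probability=True)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier()),
    ("ada", AdaBoostClassifier()),
]
super_learner = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),  # meta-learner (assumed choice)
    cv=5,  # out-of-fold predictions guard against the overfitting noted above
)
# super_learner.fit(X_train, y_train); super_learner.score(X_test, y_test)
```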
14. An efficient big data classification using elastic collision seeker optimization based faster R-CNN.
- Author
-
Chidambaram, S., Cyril, C. Pretty Diana, and Ganesh, S. Sankar
- Subjects
ELASTIC scattering ,BIG data ,MACHINE learning ,CLASSIFICATION - Abstract
Big data refers to data sets so large that their analysis must draw on myriad sources, and processing such volumes raises numerous challenges. For issues arising from large-scale databases, the MapReduce framework provides robust and simple infrastructure for huge datasets. This paper proposes a novel Elastic Collision Seeker Optimization based Faster R-CNN (ECSO-FRCNN) classifier for efficient big data classification. The proposed ECSO-FRCNN classifier handles missing attributes, supports incremental learning, and effectively improves training performance. Because the proposed technique deals with large data samples, it necessitates the MapReduce framework; adopting the MapReduce design in big data classification protects the classification results from uncertainties such as data redundancy, misclassification, and storage issues. The proposed method is examined on three standard datasets, namely the skin segmentation, mushroom, and localization datasets, collected from the University of California, Irvine (UCI) machine learning repository. Finally, extensive experimental analysis over various parameters demonstrates the efficiency of the system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. Artificial intelligence with big data analytics-based brain intracranial hemorrhage e-diagnosis using CT images.
- Author
-
Mansour, Romany F., Escorcia-Gutierrez, José, Gamarra, Margarita, Díaz, Vicente García, Gupta, Deepak, and Kumar, Sachin
- Subjects
ARTIFICIAL intelligence, INTRACRANIAL hemorrhage, COMPUTED tomography, BIG data, FUZZY neural networks, IMAGE segmentation
- Abstract
Due to the fast development of medical imaging technologies, medical image analysis has entered the period of big data for proper disease diagnosis. At the same time, intracranial hemorrhage (ICH) is a serious condition caused by injury to blood vessels in the brain. This paper presents an artificial intelligence and big data analytics-based ICH e-diagnosis (AIBDA-ICH) model using CT images. The presented model utilizes IoMT devices for the data acquisition process and involves a graph cut-based segmentation model for identifying the affected regions in the CT images. To manage big data, the Hadoop ecosystem and its elements are used. In addition, a capsule network (CapsNet) model is applied as a feature extractor to derive a useful set of feature vectors. Finally, the presented AIBDA-ICH model uses a fuzzy deep neural network (FDNN) to carry out the classification process. To validate the superior performance of the AIBDA-ICH method, an extensive set of simulations was performed and the outcomes were examined from diverse aspects. The experimental values point to improved e-diagnostic performance of the AIBDA-ICH model over the other compared methods, with precision and accuracy of 94.96% and 98.59%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Research on large data set clustering method based on MapReduce.
- Author
-
Wei, Pengcheng, He, Fangcheng, Li, Li, Shang, Chuanfu, and Li, Jing
- Subjects
BIG data ,K-means clustering ,DATA analysis - Abstract
This paper describes in detail the similarities and differences between the K-means algorithm and the MapReduce implementation of the Canopy algorithm, and analyzes the possibility of combining the two to design a better algorithm for clustering analysis of large data sets. Departing from the improvement ideas for the K-means algorithm in the previous literature, the paper proposes new ideas for sampling and analyzes the selection of the relevant thresholds. It then introduces a MapReduce implementation framework for a K-means algorithm based on Canopy partitioning and filtering, analyzes its pseudocode, and briefly analyzes the algorithm's time complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
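The Canopy-then-K-means combination in entry 16 is easiest to see on a single machine: cheap, threshold-based canopies pick initial centers, which then seed K-means. A sketch under that reading (the threshold value is illustrative and the MapReduce distribution is omitted):

```python
import numpy as np
from sklearn.cluster import KMeans

def canopy_centers(X, t2=1.5):
    """Pick canopy centers: any point farther than T2 from every chosen
    center starts a new canopy. The full algorithm also uses a looser
    threshold T1 for overlapping canopy membership, omitted here."""
    pts = list(X)
    centers = []
    while pts:
        c = pts.pop(0)
        centers.append(c)
        # Points within T2 are bound firmly to this canopy and removed.
        pts = [p for p in pts if np.linalg.norm(p - c) > t2]
    return np.array(centers)

X = np.random.rand(1000, 2) * 10
centers = canopy_centers(X)
# Canopy centers both choose k and seed the K-means initialization.
km = KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(X)
```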
17. Large-scale cellular traffic prediction based on graph convolutional networks with transfer learning.
- Author
-
Zhou, Xu, Zhang, Yong, Li, Zhao, Wang, Xing, Zhao, Juan, and Zhang, Zhao
- Subjects
CONVOLUTIONAL neural networks ,KNOWLEDGE transfer ,LEARNING strategies - Abstract
Intelligent cellular traffic prediction is very important for mobile operators to achieve resource scheduling and allocation. In practice, one often needs to predict very large-scale cellular traffic involving thousands of cells. This paper proposes a transfer learning strategy based on graph convolutional neural networks to achieve large-scale traffic prediction. We design a novel spatial-temporal graph convolutional network based on an attention mechanism (STA-GCN), and to achieve large-scale traffic prediction we propose a regional transfer learning strategy based on STA-GCN to improve knowledge reuse. The effectiveness of STA-GCN is validated on two real-world traffic datasets. The results show that STA-GCN outperforms the state-of-the-art baselines, and the transfer learning strategy effectively reduces the number of training epochs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
18. Edge artificial intelligence for big data: a systematic review.
- Author
-
Hemmati, Atefeh, Raoufi, Parisa, and Rahmani, Amir Masoud
- Subjects
ARTIFICIAL intelligence, REAL-time computing, BIG data, ELECTRONIC data processing, EDGE computing, MACHINE learning
- Abstract
Edge computing, artificial intelligence (AI), and machine learning (ML) concepts have become increasingly prevalent in Internet of Things (IoT) applications. As the number of IoT devices continues to grow, relying solely on cloud computing for real-time data processing and analysis is proving ever more challenging. The synergy between edge computing and AI is particularly intriguing because AI relies on rapid data processing, a capability facilitated by edge computing. Edge AI represents a significant paradigm shift, leveraging AI within edge computing frameworks to reduce reliance on internet connections and mitigate data latency issues. This approach accelerates data processing, supporting use cases that demand real-time inference. Additionally, as cloud storage costs continue to rise, the feasibility of streaming and storing large volumes of data comes into question. Edge AI offers a compelling solution by performing big data analytics closer to the end device where edge computing is deployed. This paper presents a systematic literature review (SLR) of 85 articles published between 2018 and 2023 within Edge AI. The study comprehensively examines the measurement environments analyzed in these works and the factors applied to Edge AI for big data. It offers taxonomies specific to Edge AI within the big data domain, presents case studies, and outlines the challenges and open issues inherent in Edge AI for big data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Deep learning-enabled block scrambling algorithm for securing telemedicine data of table tennis players.
- Author
-
Yang, Bo, Cheng, Bojin, Liu, Yixuan, and Wang, Lijun
- Subjects
DEEP learning, TABLE tennis players, MACHINE learning, ALGORITHMS, DIAGNOSIS, NOSOLOGY, TELEMEDICINE
- Abstract
In sports, advanced sensing technologies generate massive amounts of unstructured telemedicine data that need to be refined for accurate diagnosis of underlying diseases. For accurate disease prediction and classification of athletes' data, deep learning algorithms are frequently run in the cloud. However, transmitting athletes' raw data to the cloud faces numerous challenges; among them, security and privacy are major concerns in view of the sensitive and personal information within the unstructured data. In this paper, we first present a data block scrambling algorithm (without key management) for secure transmission and storage of ECG (electrocardiogram) data of table tennis players in the cloud. A small piece of the original data stored in the cloud is used to scramble the massive amount of remaining ECG data. The secured telemedicine data is then imported into the Hadoop Distributed File System for data management and read by the Spark framework to form Resilient Distributed Datasets. Finally, a deep learning approach extracts useful features, learns the related information, and weights and sums the feature vectors at different layers for classification. Theoretical analysis proves that our proposed approach is highly robust and resilient to brute-force attacks while achieving much better accuracy, sensitivity, and specificity than existing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
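Entry 19's keyless scheme derives the scrambling order from a retained fragment of the original signal, so no separate key has to be managed. The toy sketch below conveys that idea with a CRC-seeded block permutation; it illustrates the concept only and is not the authors' algorithm:

```python
import zlib
import numpy as np

def scramble(blocks, seed_fragment):
    """Permute data blocks using a seed derived from a retained fragment
    of the original signal, standing in for explicit key management."""
    seed = zlib.crc32(np.asarray(seed_fragment).tobytes())
    perm = np.random.default_rng(seed).permutation(len(blocks))
    return [blocks[i] for i in perm], perm

def unscramble(scrambled, perm):
    restored = [None] * len(scrambled)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = scrambled[new_pos]
    return restored

ecg = np.arange(20.0).reshape(5, 4)        # five toy "ECG blocks"
mixed, perm = scramble(list(ecg), ecg[0])  # fragment = first block
assert np.allclose(unscramble(mixed, perm), ecg)
```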
20. Data science strategies leading to the development of data scientists' skills in organizations.
- Author
-
Sousa, Maria José, Melé, Pere Mercadé, Pesqueira, António Miguel, Rocha, Álvaro, Sousa, Miguel, and Noor, Salma
- Subjects
DATA science, DATA quality, SCIENTIFIC method, LITERATURE reviews, BIG data
- Abstract
The purpose of this paper is to compare companies' data science practices and methodologies and the data specificities/variables that can influence the definition of a data science strategy in pharma companies. The paper is an empirical study whose research approach consists of verifying, against a set of statistical tests, the differences between companies with and without a data science strategy. We designed a specific questionnaire and applied it to a sample of 280 pharma companies. The main findings are based on the analysis of these variables: overwhelming volume, managing unstructured data, data quality, availability of data, access rights to data, data ownership issues, cost of data, lack of pre-processing facilities, lack of technology, shortage of talent/skills, privacy concerns and regulatory risks, security, and difficulties of data portability, compared across companies with and without a data science strategy. The paper offers an in-depth comparative analysis between companies with and without a data science strategy; the key limitation concerns the literature review, since the novelty of the theme means there is a lack of scientific studies on this specific aspect of data science. In terms of practical business implications, an organization with a data science strategy will have better direction and management practices, as its decision-making is based on accurate and valuable data, but it needs data scientists' skills to fulfil those goals. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
21. DENA: display name embedding method for Chinese social network alignment.
- Author
-
Li, Yao and Liu, Huilin
- Subjects
SOCIAL networks, CHINESE characters, BIG data
- Abstract
Social network alignment, which aims at finding node correspondences between social networks, is the cornerstone of fusing big data from different social networks. Most social network alignment solutions are designed for English-language environments; hence, the existing attribute-based solutions, which rely on features unique to English, are not suitable for Chinese social networks. Although structure-based methods are general, they suffer from the sparsity problem. To solve the Chinese social network alignment problem, this paper proposes a novel display name embedding method called DENA. It utilizes the morphological and phonetic information of Chinese characters to enhance alignment accuracy. Specifically, in DENA, a hierarchical n-gram framework generates features from display names and their related morphological information (i.e., strokes) and phonetic information (i.e., pinyin). Then, an innovative graph called the display name graph is proposed to transform them into an undirected, unweighted graph. By learning this graph, all features are embedded into low-dimensional vectors, so that the closeness between the embedding vectors of two display names represents the probability of an alignment between them. Experiments based on real-world datasets show that DENA outperforms traditional classification-based methods and state-of-the-art word embedding methods in social network alignment. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. Toward a prediction approach based on deep learning in Big Data analytics.
- Author
-
Haddad, Omar, Fkih, Fethi, and Omri, Mohamed Nazih
- Subjects
DEEP learning, BIG data, INTERNET servers, ELECTRONIC data processing, SOCIAL networks, DATA reduction
- Abstract
Nowadays, cloud computing plays an important role in storing both structured and unstructured data. This has contributed to very large data growth on web servers, which has come to be called Big Data. Cloud computing technology is adopted in many applications, most notably social networking and e-mail, which are important sources of data generated through communication between web users. These data represent views and opinions on various topics, which can help businesses and other decision makers make decisions based on future predictions. Several methods have been proposed to this end; recent work relies on deep learning as a tool for processing large volumes of data, owing to its high performance in extracting predictions from the opinions of web users. This paper presents a new prediction approach based on Big Data analysis and deep learning for large-scale data, called PABIDDL. The proposed approach rests on three stages: first, Big Data reduction based on MapReduce using the Hadoop framework; second, initialization of these data using the GloVe technique; and third, classification of the text data into positive and negative polarity classes using a CNN deep learning model. We conducted an empirical study of PABIDDL and related models on two standard datasets, the IMDB and MR datasets. The results show that our approach performs best, recording an accuracy, recall, and F1-score of 0.93, 0.90, and 0.92, respectively, and it also achieved the fastest response time. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
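Entry 22's PABIDDL initializes its classifier with GloVe vectors before CNN classification. A minimal sketch of those two stages follows; the GloVe file path, dimensions, and network sizes are placeholders, and the MapReduce reduction stage is omitted:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def load_glove(path, word_index, dim=100):
    """Build an embedding matrix from a GloVe text file (path is a placeholder)."""
    vecs = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            vecs[parts[0]] = np.asarray(parts[1:], dtype="float32")
    M = np.random.normal(size=(len(word_index) + 1, dim)).astype("float32")
    for w, i in word_index.items():
        if w in vecs:
            M[i] = vecs[w]       # known words get their pretrained vector
    return M

def build_cnn(embedding_matrix, seq_len=200):
    vocab, dim = embedding_matrix.shape
    return models.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(
            vocab, dim, trainable=False,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)),
        layers.Conv1D(128, 5, activation="relu"),
        layers.GlobalMaxPooling1D(),
        layers.Dense(1, activation="sigmoid"),  # positive vs negative polarity
    ])
```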
23. Multiclass sentiment analysis on COVID-19-related tweets using deep learning models.
- Author
-
Vernikou, Sotiria, Lyras, Athanasios, and Kanavos, Andreas
- Subjects
DEEP learning, SENTIMENT analysis, NATURAL language processing, COVID-19, ELECTRONIC data processing, SOCIAL media, MACHINE learning
- Abstract
COVID-19 is an infectious disease with its first recorded cases identified in late 2019, while in March of 2020 it was declared as a pandemic. The outbreak of the disease has led to a sharp increase in posts and comments from social media users, with a plethora of sentiments being found therein. This paper addresses the subject of sentiment analysis, focusing on the classification of users' sentiment from posts related to COVID-19 that originate from Twitter. The period examined is from March until mid-April of 2020, when the pandemic had thus far affected the whole world. The data is processed and linguistically analyzed with the use of several natural language processing techniques. Sentiment analysis is implemented by utilizing seven different deep learning models based on LSTM neural networks, and a comparison with traditional machine learning classifiers is made. The models are trained in order to distinguish the tweets between three classes, namely negative, neutral and positive. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
24. Construction of the knowledge service model of a port supply chain enterprise in a big data environment.
- Author
-
Bo, Yang and Meifang, Yang
- Subjects
SUPPLY chains, BIG data, KNOWLEDGE representation (Information theory), NATURAL language processing, MODEL-based reasoning, HARBORS, SUPPLY & demand, ARTIFICIAL intelligence
- Abstract
In the context of the rapid development of big data and artificial intelligence, knowledge service theory and big data technology are applied to build a smart port supply chain knowledge service model. This model provides a personalized, intelligent, and diversified knowledge-based service system platform solution to port supply chain enterprises, helping to realize port supply chain transformation and upgrading and intelligent integrated operations. This paper analyses and summarizes the research status on knowledge service demand and port supply chain knowledge service during the development and operation of the port supply chain and applies big data and artificial intelligence technologies such as knowledge matching, knowledge fusion, and natural language processing. A port supply chain knowledge service model including knowledge acquisition, knowledge organization and knowledge service modules is constructed. The ontology method is used to construct the ontology knowledge base of the port supply chain, and based on this, computational reasoning experiments are performed. The experiments show that ontology technology demonstrates effectiveness and superiority in constructing a knowledge service system model for the port supply chain in terms of knowledge representation and knowledge reasoning. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
25. Prediction of corn price fluctuation based on multiple linear regression analysis model under big data.
- Author
-
Ge, Yan and Wu, Haixia
- Subjects
MULTIPLE regression analysis, CORN prices, PRICE fluctuations, AGRICULTURAL forecasts, REGRESSION analysis, FORECASTING, BIG data
- Abstract
This paper analyzes the changing trend of corn prices and the factors that affect them. Using historical data and regression analysis, univariate nonlinear and multivariate linear regression models are established to predict the corn price. First, by analyzing the trend in big data on Chinese corn prices from 2005 to 2016 with MATLAB, the paper establishes a univariate nonlinear regression model with time as the independent variable and corn price as the dependent variable, and fits the variation of the maize price over time. To a certain extent, this predicts the price trend of corn; however, the price of corn in 2017 estimated with this model deviates from the actual value. Analyzing this deviation in light of changes in relevant national policies shows that the relationship between supply and demand is the main underlying factor affecting the price of corn. The paper therefore selects maize-related big data from 2005 to 2016, sets production, consumption, and import and export volumes as the independent variables, keeps maize price as the dependent variable, and establishes a multiple linear regression model. Time series analysis of the independent variables yields forecast values for each of them in 2017, and the model is then used to predict the 2017 corn price more accurately. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
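Entry 25's second model is an ordinary multiple linear regression of corn price on supply-and-demand variables. A toy illustration with invented numbers (the paper's actual data and fitted coefficients are not reproduced):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: production, consumption, imports, exports (toy values).
X = np.array([[220, 210, 3, 1],
              [225, 215, 3, 2],
              [230, 222, 5, 2],
              [228, 230, 6, 1]], dtype=float)
price = np.array([1.9, 2.0, 2.2, 2.4])      # toy corn prices

model = LinearRegression().fit(X, price)
# Forecast values of the independent variables for the next year
# (in the paper these come from time series analysis).
next_year = np.array([[232, 235, 7, 1]])
print(model.predict(next_year))
```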
26. Cubic-RBF-ARX modeling and model-based optimal setting control in head and tail stages of cut tobacco drying process.
- Author
-
Zhou, Feng, Peng, Hui, Ruan, Wenjie, Wang, Dan, Liu, Mingyue, Gu, Yunfeng, and Li, Li
- Subjects
TOBACCO analysis, MATHEMATICAL optimization, OPTIMAL control theory, VECTOR error-correction models, BIG data
- Abstract
This paper presents a data-driven modeling technique used to build a multi-sampling-rate RBF-ARX (MSR-RBF-ARX) model for capturing and quantifying the global nonlinear characteristics of the head and tail stage drying process of a cylinder-type cut tobacco drier. To fully account for the influence of the input variables on outlet cut tobacco moisture content over the whole drying process while keeping the model orders from growing too large, the paper designs a special MSR-RBF-ARX model structure that incorporates the advantages of parametric and nonparametric models in describing the nonlinear dynamics of the process. For this industrial process identification problem, a hybrid optimization algorithm is proposed to identify the MSR-RBF-ARX model using multi-segment historical data sets from different seasons and working conditions. To obtain better long-term forecasting performance, a long-term forecasting performance index is introduced in the algorithm. To accelerate computational convergence, the hybrid algorithm first minimizes the model's one-step prediction errors to obtain a set of parameters used as initial values, and then further optimizes the parameters by minimizing the model's long-term forecasting errors. Based on the estimated model, a set of optimal setting curves for the input variables is obtained by optimizing the parameters of the designed input variable models. The effectiveness of the proposed modeling and setting control strategy is demonstrated by simulation studies. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
27. Time series forecasting by recurrent product unit neural networks.
- Author
-
Fernández-Navarro, F., de la Cruz, Maria Angeles, Gutiérrez, P. A., Castaño, A., and Hervás-Martínez, C.
- Subjects
TIME series analysis, ARTIFICIAL neural networks, AUTOREGRESSION (Statistics), BIG data, COMPUTATIONAL complexity
- Abstract
Time series forecasting (TSF) consists of estimating models to predict future values based on previously observed values of a time series, and it can be applied to solve many real-world problems. TSF has traditionally been tackled with autoregressive neural networks (ARNNs) or recurrent neural networks (RNNs), where hidden nodes are usually configured with additive activation functions such as sigmoids. ARNNs are based on a short-term memory of the time series in the form of lagged values used as inputs, while RNNs include a long-term memory structure. The objective of this paper is twofold. First, it explores the potential of multiplicative nodes for ARNNs by considering product unit (PU) activation functions, motivated by the fact that PUs are especially useful for modelling highly correlated features, such as the lagged time series values used as ARNN inputs. Second, it proposes a new hybrid RNN model based on PUs, estimating the PU outputs from the combination of a long-term reservoir and the short-term lagged time series values. A complete set of experiments with 29 data sets shows competitive performance for both proposals, and a set of statistical tests confirms that they achieve the state of the art in TSF, with especially promising results for the proposed hybrid RNN. The experiments show that the recurrent model is very competitive for relatively large time series, where longer forecast horizons are required, while the autoregressive model is a good choice if the data set is small or a low computational cost is needed. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
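The multiplicative node at the heart of entry 27 is the product unit: instead of a weighted sum, it computes a weighted product of its inputs, which suits highly correlated lagged values. A two-line sketch (positive inputs assumed so the log-space form is valid):

```python
import numpy as np

def product_unit(x, w):
    """PU output = prod_i x_i ** w_i, computed as exp(w . log(x)); x > 0."""
    return np.exp(w @ np.log(x))

# A PU with weights (1, 1) multiplies two lagged values together,
# something an additive sigmoidal node can only approximate.
lags = np.array([0.8, 1.2])
print(product_unit(lags, np.array([1.0, 1.0])))   # 0.96 = 0.8 * 1.2
```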
28. Pipeline risk big data intelligent decision-making system based on machine learning and situation awareness.
- Author
-
Zhong, Xiong, Zhang, Xinsheng, and Zhang, Ping
- Subjects
SITUATIONAL awareness ,UNDERGROUND pipelines ,DECISION making ,BIG data ,GAS leakage ,FAILURE analysis ,MACHINE learning ,PIPELINE failures - Abstract
Underground pipelines are an indispensable part of urban public facilities. However, the frequent pipeline accidents of recent years have not only brought great inconvenience to people's lives but also threatened lives and property. Timely detection and response are therefore very important, and preventing sudden underground pipeline accidents plays an important role in improving urban livability. This article studies a pipeline risk big data intelligent decision-making system based on machine learning and situational awareness. By analyzing the applicable scope of gas leakage and diffusion models under different modes, leakage, diffusion, fire and explosion models are determined, forming a combined model framework for system-level analysis of leakage accident consequences. The system uses a pipeline failure probability model and a pipeline failure consequence analysis model to determine the failure probability and the probability and consequences of each accident; it uses the spatial analysis capability of GIS technology to determine the accident impact area and displays it in graphic form. Verification on the test set shows that, for the SVR model tuned by grid search, the relative percentage error between each sample's predicted and true values fluctuates within 4%-36%, without large swings, and the MAPE is approximately 13.56%. The results show that grid-search parameter optimization yields better prediction performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
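The forecasting component in entry 28 is an SVR tuned by grid search and scored by MAPE. A sketch of that setup (the grid values below are assumptions, not the paper's; the fit/score lines are left commented because the pipeline data is not public):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def mape(y_true, y_pred):
    """Mean absolute percentage error, the metric reported in the abstract."""
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

param_grid = {"C": [0.1, 1, 10, 100],
              "gamma": [0.01, 0.1, 1],
              "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5,
                      scoring="neg_mean_absolute_percentage_error")
# search.fit(X_train, y_train)
# print(mape(y_test, search.predict(X_test)))
```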
29. Big data analytics for MOOC video watching behavior based on Spark.
- Author
-
Hu, Hui, Zhang, Guofeng, Gao, Wanlin, and Wang, Minjuan
- Subjects
BIG data ,MASSIVE open online courses - Abstract
The purpose of this study is to measure the effectiveness of courses delivered using MOOCs at China Agricultural University. Video watching is considered the most important way of disseminating knowledge in a Massive Open Online Course (MOOC), and analyzing it helps gauge students' learning engagement and provides suggestions for teachers constructing courses. This paper proposes methods for analyzing students' video watching behavior on MOOC platforms and verifies them with data from the cauX platform. Initially, a detailed statistical analysis of video watching data and behavior was performed. Then, data preprocessing algorithms based on the Spark platform were developed and used to count video watching behaviors per hour and per minute. Next, the entropy weight method was used to calculate the weights of pause, seek, and speed-change behaviors. Finally, we analyze and discuss the experimental results, which show that the proposed Spark-based method can quickly and accurately characterize video watching behavior. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
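Entry 29 weights the pause, seek, and speed-change behaviors with the entropy weight method: indicators whose values vary more across samples carry lower information entropy and therefore receive higher weight. A compact sketch with toy counts:

```python
import numpy as np

def entropy_weights(X):
    """X: samples x indicators, nonnegative. Returns one weight per indicator."""
    P = X / X.sum(axis=0)                        # normalize each column
    P = np.where(P == 0, 1e-12, P)               # avoid log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(len(X))   # entropy per indicator
    d = 1 - e                                    # degree of divergence
    return d / d.sum()

# Toy per-student counts of pause, seek, and speed-change events.
counts = np.array([[12, 3, 1], [8, 5, 0], [15, 2, 2], [9, 9, 1]], float)
print(entropy_weights(counts))
```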
30. Research on algorithm for solving maximum independent set of semi-external data of large graph data.
- Author
-
Wei, Pengcheng, He, Fangcheng, Shang, Chuanfu, and Li, Jing
- Subjects
BIG data ,INDEPENDENT sets ,REAL numbers ,GREEDY algorithms ,ALGORITHMS ,LIBRARY technical services - Abstract
This paper studies maximum independent set algorithms for large-scale semi-external graph data and analyzes methods for solving the maximum independent set problem on large graph data. The specific research contents are divided into a semi-external graph algorithm based on a greedy heuristic strategy, a swap-based semi-external graph algorithm, and the design and implementation of a semi-external graph algorithm processing function library. Experiments on a large number of real and artificially generated data sets show that the algorithm is very efficient in both time and space: for most of the data, the independent set obtained reaches more than 96% of its theoretical upper bound. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
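The greedy heuristic underlying entry 30's first algorithm is the standard minimum-degree strategy for building a large (maximal) independent set. An in-memory sketch using networkx; the semi-external I/O machinery that lets the paper's version scale beyond memory is omitted:

```python
import networkx as nx

def greedy_mis(G):
    """Repeatedly take a minimum-degree vertex and delete its neighborhood.
    Returns a maximal independent set approximating the maximum one."""
    G = G.copy()
    independent = set()
    while G.number_of_nodes():
        v = min(G.nodes, key=G.degree)   # lowest-degree vertex first
        independent.add(v)
        G.remove_nodes_from(list(G.neighbors(v)) + [v])
    return independent

G = nx.erdos_renyi_graph(50, 0.1, seed=1)
print(len(greedy_mis(G)))
```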
31. Special Issue on Machine Learning and Big Data Analytics for IoT Security and Privacy (SPIoT2022).
- Author
-
Zhao, Jinghua
- Subjects
MACHINE learning, BIG data, INTERNET of things, OPTIMIZATION algorithms, FEDERATED learning, DEEP learning
- Abstract
This document is a special issue of the journal "Neural Computing & Applications" focused on machine learning and big data analytics for IoT security and privacy. The Internet of Things (IoT) is a network of connected devices that presents challenges for security and analysis due to the large amounts of data involved. The issue includes selected papers from the 3rd International Conference on Machine Learning and Big Data Analytics for IoT Security and Privacy (SPIoT2022), as well as additional submissions. The papers cover a range of topics including prototype selection, scheduling optimization, deep learning, text reading, task-technology matching, financial incentives, state estimation for robots, image recognition, speech recognition, and term extraction algorithms. The authors express gratitude to the editor, staff, authors, and reviewers involved in the creation of this issue. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
32. NSOFS: a non-dominated sorting-based online feature selection algorithm.
- Author
-
Hashemi, Amin, Pajoohan, Mohammad-Reza, and Dowlatshahi, Mohammad Bagher
- Subjects
FEATURE selection, ALGORITHMS, COMPUTATIONAL complexity, BIG data
- Abstract
Online streaming feature selection (OSFS) methods are used to dynamically update the feature space as well as remove irrelevant and redundant features from the data. Since most Big Data in real-world applications are generated in the form of data streams, effective methods should be developed in this area. Further, methods with low computational complexity are required to make online decisions. In this paper, the OSFS process is modeled as a multi-objective optimization problem. To the best of our knowledge, this is the first time that the concept of Pareto dominance has been applied to find the optimal subset of features in OSFS. When a new feature arrives, it is evaluated in the multi-objective space. The non-dominated features are the optimal subset for each timestamp. We proposed an efficient and effective method which enhances the classification accuracy in OSFS by minimizing the number of features within a short time. In addition, the proposed method is insensitive to the feature streams. Experiments are conducted using two classifiers and seven OSFS methods, including OSFSMI, K-OFSD, OFS-A3M, OFS-Density, Alpha-Investing, SAOLA, and OFSS-FI. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
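Entry 32 evaluates each arriving feature in a multi-objective space and keeps the non-dominated ones. The Pareto filter at the core of that idea fits in a few lines (the objective scores below are toy values, with both objectives maximized):

```python
import numpy as np

def non_dominated(F):
    """F: n x m matrix of objective scores, larger is better.
    Returns indices of rows not dominated by any other row."""
    keep = []
    for i, fi in enumerate(F):
        dominated = any(np.all(fj >= fi) and np.any(fj > fi) for fj in F)
        if not dominated:
            keep.append(i)
    return keep

scores = np.array([[0.9, 0.2], [0.5, 0.5], [0.4, 0.4], [0.1, 0.95]])
print(non_dominated(scores))   # [0, 1, 3]; feature 2 is dominated by feature 1
```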
33. FCSF-TABS: two-stage abstractive summarization with fact-aware reinforced content selection and fusion.
- Author
-
Zhang, Mengli, Zhou, Gang, Yu, Wanting, Liu, Wenfen, Huang, Ningbo, and Yu, Ze
- Subjects
NATURAL language processing ,REINFORCEMENT learning ,BIG data - Abstract
In the era of big data, machine summarization models provide a new and efficient way to rapidly process massive text data. Whether the fact descriptions in generated summaries are consistent with the input text is a critical metric in real-world tasks. However, most existing approaches based on standard likelihood training ignore this problem and focus only on improving ROUGE scores. In this paper, we propose a two-stage Transformer-based abstractive summarization model to improve factual correctness, denoted FCSF-TABS. In the first stage, a fine-tuned BERT classifier performs content selection to choose summary-worthy single sentences or adjacent sentence pairs in the input document. In the second stage, the selected sentences are fed into the Transformer-based summarization model to generate summary sentences. Furthermore, during training we introduce the idea of reinforcement learning to jointly optimize a mixed-objective loss function. To train our model, we carefully constructed two training sets that jointly account for informativeness and factual consistency. Extensive experiments on the CNN/DailyMail and XSum datasets show that our FCSF-TABS model not only improves ROUGE scores but also produces fewer factual errors in the generated summaries than several popular summarization models. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
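A common instantiation of such a mixed objective is self-critical sequence training, which blends the likelihood loss with a reward-weighted policy-gradient term. The sketch below assumes a scalar reward (e.g., ROUGE plus a factual-consistency score) is computed elsewhere; the paper's exact reward and mixing weight are not given in the abstract.

```python
import torch

def mixed_objective_loss(log_probs_ml, log_probs_sample,
                         reward_sample, reward_greedy, gamma=0.9):
    """Mixed ML + RL loss in the style of self-critical sequence training.

    log_probs_ml     -- token log-probs of the reference summary (teacher forcing)
    log_probs_sample -- token log-probs of a sampled summary
    reward_sample    -- scalar reward of the sampled summary (hypothetical)
    reward_greedy    -- reward of the greedy decode, used as a baseline
    gamma            -- mixing weight between the RL and ML terms
    """
    loss_ml = -log_probs_ml.sum()                  # standard likelihood loss
    advantage = reward_sample - reward_greedy      # self-critical baseline
    loss_rl = -advantage * log_probs_sample.sum()  # REINFORCE-style term
    return gamma * loss_rl + (1.0 - gamma) * loss_ml

# Smoke test with fake log-probabilities and rewards.
lp_ref = torch.log(torch.tensor([0.6, 0.5, 0.7]))
lp_sam = torch.log(torch.tensor([0.4, 0.3, 0.6]))
print(mixed_objective_loss(lp_ref, lp_sam, reward_sample=0.42, reward_greedy=0.37))
```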
34. Short-term data-based spatial parallel autoreservoir computing on spatiotemporally chaotic system prediction.
- Author
-
Wang, Yin and Liu, Shutang
- Subjects
PARALLEL programming ,WEIGHT training ,BIG data ,PRIOR learning ,FORECASTING ,MACHINE learning - Abstract
This paper presents a novel machine learning method for predicting chaotic spatiotemporal systems using a parallel reservoir-like neural network. In contrast to previous machine learning methods, which require big data and incur high computing costs, the main advantage of this method is that it works with short-term data and only needs to train the weights of the output layer (a generic readout-training sketch follows this entry). Theoretically, the ratio of training data length to prediction data length can reach 2:1. First, a method for transforming an infinite-dimensional system into a finite-dimensional one is introduced. Then, based on this concept and spatiotemporal information transformation, a network training algorithm for the spatial parallel autoreservoir computing structure is proposed. Numerical experiments verify that, with short-term data, the proposed method outperforms several widely used data-driven methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
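The "train only the output layer" idea is standard in reservoir computing: the recurrent weights stay fixed and random, and a linear readout is fitted in closed form by ridge regression. This is a generic echo-state-network sketch, not the authors' parallel architecture; all sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy input signal; the target is the next value of the series.
T, n_res = 400, 100
u = np.sin(0.1 * np.arange(T)) + 0.1 * rng.normal(size=T)
target = np.roll(u, -1)

# Fixed random reservoir: only W_out below is ever trained.
W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

states = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W_in * u[t] + W @ x)              # reservoir update
    states[t] = x

# Train the linear readout with ridge regression (closed form).
lam = 1e-6
W_out = np.linalg.solve(states.T @ states + lam * np.eye(n_res),
                        states.T @ target)
pred = states @ W_out
print("train RMSE:", np.sqrt(np.mean((pred[:-1] - target[:-1]) ** 2)))
```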
35. Review on R&D task integrated management of intelligent manufacturing equipment.
- Author
-
Ren, Teng, Luo, Tianyu, Li, Shuxuan, Xing, Lining, and Xiang, Shang
- Subjects
PRODUCTION management (Manufacturing) ,TASK analysis ,SYSTEM integration ,BIG data ,DATA science ,SYSTEMS theory - Abstract
With the rapid development of various industrial big data technologies, and in the context of industrial big data and systems science, intelligent optimization algorithms and related techniques have been widely used in intelligent manufacturing. In recent years, these technologies have not only become an important engine for the transformation and upgrading of the smart manufacturing industry, but have also brought new opportunities and challenges to the integrated management of development tasks for intelligent manufacturing equipment. This paper reviews research on the integrated task management of intelligent manufacturing equipment development from four aspects: task analysis and management of intelligent manufacturing equipment in a big data environment; task decomposition and resource allocation; task network analysis and evaluation; and task integration analysis and verification evaluation progress. Prospects for further research are pointed out, including customized research into high-end equipment developed for the individual needs of users, data-driven optimal allocation of resources, multi-layer interaction in complex network modeling, intelligent systems integration, and verification evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. A parallel NAW-DBLSTM algorithm on Spark for traffic flow forecasting.
- Author
-
Xia, Dawen, Yang, Nan, Jiang, Shunying, Hu, Yang, Li, Yantao, Li, Huaqing, and Wang, Lin
- Subjects
INTELLIGENT transportation systems ,TRAFFIC flow ,TRAFFIC estimation ,PARALLEL algorithms ,TRAFFIC engineering ,GAUSSIAN distribution ,BIG data - Abstract
Traffic flow forecasting (TFF) is critical for constructing intelligent transportation systems and offering real-time traffic applications; in particular, accurate flow forecasting based on traffic big data can drive reliable strategies for traffic management and control. To weight the influence of spatial correlation among road segments and capture the nonlinear characteristics of traffic flow, this paper presents a parallel Normal Distribution and Attention Mechanism Weighted Deep Bidirectional Long Short-Term Memory (NAW-DBLSTM) algorithm on Spark. Specifically, we employ the resilient distributed dataset (RDD) to preprocess large-scale mobile trajectory data (e.g., taxi GPS trajectory data), and a Kalman Filter (KF) is utilized to smooth the taxi trajectory big data (a minimal 1-D KF smoothing sketch follows this entry). Next, the parallel NAW-DBLSTM algorithm, combining the attention mechanism and the normal distribution, is put forward on a Spark distributed computing platform to enhance the accuracy and scalability of TFF, and a time window is used for the forecasts. Finally, traffic flow is successfully forecasted on Spark by the NAW-DBLSTM algorithm using real-world GPS trajectories of taxicabs. The experimental results demonstrate that, compared with LSTM, BiLSTM, DBLSTM, DNN, SVR, KNN, SAEs, BP, CNN, GRU, and ANNs, NAW-DBLSTM produces better performance, with a MAPE value that is 85.1%, 80.1%, 85.8%, 73.1%, 78.2%, 77.9%, 78.8%, 84.6%, 96.4%, 86.2%, and 73.2% lower than that of the comparable algorithms, respectively. In particular, the MAPE value of NAW-DBLSTM is 28.3%, 20.1%, 71.1%, and 79.1% lower than that of LSTM weighted with the normal distribution and the attention mechanism (NAW-LSTM), BiLSTM weighted with the normal distribution and the attention mechanism (NAW-BiLSTM), DBLSTM weighted with the normal distribution (NW-DBLSTM), and NAW-DBLSTM without the time window (NT-NAW-DBLSTM), respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
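The KF smoothing step can be illustrated with a scalar random-walk filter. Real GPS smoothing would use a tuned multi-dimensional state model; the variances q and r below are assumptions for the demo.

```python
import numpy as np

def kalman_smooth_1d(z, q=1e-3, r=0.25):
    """Minimal 1-D Kalman filter for smoothing a noisy measurement stream.

    Random-walk model: x_t = x_{t-1} + w,  z_t = x_t + v,
    with process variance q and measurement variance r (both assumed)."""
    x, p = z[0], 1.0                  # state estimate and its variance
    out = np.empty_like(z)
    out[0] = x
    for t in range(1, len(z)):
        p = p + q                     # predict: variance grows by q
        k = p / (p + r)               # Kalman gain
        x = x + k * (z[t] - x)        # update with measurement z[t]
        p = (1.0 - k) * p
        out[t] = x
    return out

rng = np.random.default_rng(2)
truth = np.cumsum(rng.normal(scale=0.05, size=300))   # slow drift
noisy = truth + rng.normal(scale=0.5, size=300)       # noisy "GPS" readings
print(np.std(noisy - truth), np.std(kalman_smooth_1d(noisy) - truth))
```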
37. Big data classification using deep learning and apache spark architecture.
- Author
-
Brahmane, Anilkumar V. and Krishna, B. Chaitanya
- Subjects
BIG data ,DEEP learning ,MACHINE learning ,CLASSIFICATION ,MATHEMATICAL optimization ,BIOGRAPHY (Literary form) - Abstract
The volume of big data is rising day by day, to the point that existing software tools have difficulty managing it; moreover, the rate of imbalanced data in huge datasets is a key constraint for the research community. Accordingly, this paper proposes a novel technique for handling big data using the Spark framework. The technique comprises two stages for classifying the big data, feature selection and classification, which are performed in the initial nodes of the Spark architecture. The proposed optimization algorithm, named the Rider Chaotic Biogeography-based Optimization (RCBO) algorithm, integrates the Rider Optimization Algorithm (ROA) with standard chaotic biogeography-based optimization (CBBO). The proposed RCBO deep-stacked autoencoder using the Spark framework effectively handles big data and achieves effective classification: RCBO selects suitable features from the massive dataset, and the deep-stacked autoencoder (a generic sketch follows this entry) is trained with RCBO to classify the huge dataset. This research focuses on the supervision problem for the Covertype dataset from the UCI machine learning repository, which describes forest cover data used to predict the forest cover type from cartographic variables. The dataset is multivariate, with 263,361 web hits, 581,012 instances, and 54 attributes, and the associated task is classification. Evaluation of the proposed RCBO deep-stacked autoencoder-based Spark framework on the UCI machine learning datasets revealed that the proposed technique outperforms other strategies, achieving a maximal accuracy of 86.71%, a Dice coefficient of 92.7%, a sensitivity of 75.2%, and a specificity of 95.4%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
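A sketch of the generic deep-stacked-autoencoder classifier the paper builds on: greedy layer-wise reconstruction pretraining, then fine-tuning under a softmax head. The data here are random stand-ins for the 54 Covertype attributes and 7 classes; the RCBO-driven training itself is not reproduced.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(512, 54)                 # stand-in for Covertype features
y = torch.randint(0, 7, (512,))          # 7 forest cover classes

dims = [54, 32, 16]
encoders, inputs = [], X
for d_in, d_out in zip(dims[:-1], dims[1:]):
    enc, dec = nn.Linear(d_in, d_out), nn.Linear(d_out, d_in)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(100):                 # pretrain this layer to reconstruct its input
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(torch.tanh(enc(inputs))), inputs)
        loss.backward()
        opt.step()
    encoders.append(enc)
    inputs = torch.tanh(enc(inputs)).detach()   # codes feed the next layer

# Stack the pretrained encoders under a classification head and fine-tune.
model = nn.Sequential(encoders[0], nn.Tanh(), encoders[1], nn.Tanh(),
                      nn.Linear(dims[-1], 7))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.cross_entropy(model(X), y).backward()
    opt.step()
print("train accuracy:", (model(X).argmax(1) == y).float().mean().item())
```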
38. Deep Q-network-based multi-criteria decision-making framework for virtual simulation environment.
- Author
-
Jang, Hyeonjun, Hao, Shujia, Chu, Phuong Minh, Sharma, Pradip Kumar, Sung, Yunsick, and Cho, Kyungeun
- Subjects
ARTIFICIAL intelligence ,VIRTUAL reality ,GOAL (Psychology) ,DEEP learning ,DECISION making ,BIG data - Abstract
Deep learning improves the realism of virtual simulations, specifically in solving multi-criteria decision-making problems, which generally rely on high-performance artificial intelligence. This study was inspired by motivation theory and observations of natural life. Recently, motivation-based control has been actively studied for realistic expression, but several problems remain: for instance, it is hard to define the relations among multiple motivations and to select goals based on them, and behaviors must generally be planned with both motivations and goals in mind. This paper proposes a deep Q-network (DQN)-based multi-criteria decision-making framework that lets virtual agents automatically select goals based on motivations in real time within virtual simulation environments and plan the behaviors needed to achieve those goals. All motivations are classified according to the five levels of Maslow's hierarchy of needs; the virtual agents train a double DQN on big social data (the double-DQN target computation is sketched after this entry), select optimal goals depending on their motivations, and perform behaviors using predefined hierarchical task networks (HTNs). Compared with the state-of-the-art method, the proposed framework is efficient: it reduced the average loss from 0.1239 to 0.0491 and increased accuracy from 63.24% to 80.15%. For behavioral performance with the predefined HTNs, the number of methods increased from 35 in the Q-network to 1511 in the proposed framework, and the computation time for 10,000 behavior plans fell from 0.118 to 0.1079 s. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
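The defining detail of a double DQN is that the online network selects the next action while the target network evaluates it, which reduces overestimation bias. A minimal sketch with tiny linear stand-ins for the real networks:

```python
import torch

def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """Double-DQN targets: the online net picks the action,
    the target net scores it."""
    with torch.no_grad():
        best_actions = q_online(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q

# Smoke test: random linear "networks" mapping 4-dim states to 3 action values.
q_online = torch.nn.Linear(4, 3)
q_target = torch.nn.Linear(4, 3)
s = torch.randn(5, 4)
r = torch.randn(5)
d = torch.zeros(5)                      # no terminal transitions in this batch
print(double_dqn_targets(q_online, q_target, r, s, d))
```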
39. Multi-disease big data analysis using beetle swarm optimization and an adaptive neuro-fuzzy inference system.
- Author
-
Singh, Parminder, Kaur, Avinash, Batth, Ranbir Singh, Kaur, Sukhpreet, and Gianini, Gabriele
- Subjects
PARTICLE swarm optimization ,BIG data ,DATA analysis ,BEETLES ,MEDICAL decision making - Abstract
Healthcare organizations and health monitoring systems generate large volumes of complex data, which offer opportunities for innovative investigations in medical decision making. In this paper, we propose a beetle swarm optimization and adaptive neuro-fuzzy inference system (BSO-ANFIS) model for heart disease and multi-disease diagnosis. The main components of our analytics pipeline are a modified crow search algorithm, used for feature extraction, and an ANFIS classification model whose parameters are optimized by a BSO algorithm (the beetle-antennae primitive underlying BSO is sketched after this entry). The accuracy achieved in heart disease detection is 99.1% with 99.37% precision. In multi-disease classification, the accuracy achieved is 96.08% with 98.63% precision. The results of both tasks demonstrate the comparative advantage of the proposed BSO-ANFIS algorithm over the competitor models. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
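Beetle swarm optimization couples particle swarm updates with beetle antennae search (BAS). The BAS primitive alone is shown below: probe the objective at two "antennae" on either side of the current point along a random direction and step toward the better probe. Hyperparameters are illustrative; the full BSO-ANFIS pipeline is not reproduced.

```python
import numpy as np

def beetle_antennae_search(f, x0, n_iter=200, d0=1.0, step0=1.0, decay=0.97):
    """Minimize f by repeatedly probing two antenna points and stepping
    toward the side with the lower objective value."""
    rng = np.random.default_rng(3)
    x, d, step = np.asarray(x0, float), d0, step0
    for _ in range(n_iter):
        b = rng.normal(size=x.shape)
        b /= np.linalg.norm(b)                    # random unit direction
        left, right = x + d * b, x - d * b        # antenna probe points
        x = x - step * b * np.sign(f(left) - f(right))
        d, step = d * decay, step * decay         # shrink probes and step size
    return x

sphere = lambda v: float(np.sum(v ** 2))          # toy objective
print(beetle_antennae_search(sphere, x0=[3.0, -2.0, 1.5]))
```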
40. Constructing accuracy and diversity ensemble using Pareto-based multi-objective learning for evolving data streams.
- Author
-
Sun, Yange and Dai, Honghua
- Subjects
MATHEMATICAL optimization ,BIG data - Abstract
Ensemble learning is one of the most frequently used techniques for handling concept drift, the greatest challenge in learning high-performance models from big, evolving data streams. In this paper, a Pareto-based multi-objective optimization technique is introduced to learn high-performance base classifiers. Based on this technique, a multi-objective evolutionary ensemble learning scheme, named Pareto-optimal ensemble for better accuracy and diversity (PAD), is proposed. The approach aims to enhance the generalization ability of the ensemble in evolving data-stream environments by balancing the accuracy and diversity of ensemble members. In addition, an adaptive window change detection mechanism is designed to constantly track different kinds of drift (an ADWIN-style sketch of the idea follows this entry). Extensive experiments show that PAD adapts to dynamically changing environments effectively and efficiently while achieving better performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
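One classic way to build an adaptive-window change detector is the ADWIN idea: keep a window of recent outcomes, split it, and flag drift when the halves' means differ by more than a Hoeffding-style bound. The sketch below illustrates that idea only; PAD's exact mechanism is not specified in the abstract.

```python
import math
from collections import deque

class SimpleDriftDetector:
    """ADWIN-flavoured detector over a stream of 0/1 outcomes
    (e.g. per-example accuracy)."""

    def __init__(self, max_len=200, delta=0.002):
        self.window = deque(maxlen=max_len)
        self.delta = delta

    def update(self, value):
        self.window.append(value)
        n = len(self.window)
        if n < 20:
            return False
        half = n // 2
        w = list(self.window)
        m0 = sum(w[:half]) / half
        m1 = sum(w[half:]) / (n - half)
        eps = math.sqrt(math.log(2.0 / self.delta) / (2.0 * half))
        if abs(m0 - m1) > eps:          # means differ more than the bound
            self.window.clear()         # reset after a detected drift
            return True
        return False

det = SimpleDriftDetector()
stream = [1] * 100 + [0] * 60           # accuracy collapses at t=100
print([t for t, v in enumerate(stream) if det.update(v)])
```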
41. Research on trend analysis method of multi-series economic data based on correlation enhancement of deep learning.
- Author
-
Wang, Weihan and Li, Weiping
- Subjects
TREND analysis ,ARTIFICIAL neural networks ,DEEP learning ,TIME series analysis ,COMPUTER software reusability ,TASK analysis ,ARTIFICIAL intelligence ,BIG data - Abstract
The analysis of economic data as time series occupies an important position in the field of time-series data analysis and is also an important task in big data and artificial intelligence. Traditional time-series analysis methods are relatively weak at multi-series analysis. This research proposes an efficient handling method and model for the multi-series analysis of time-series economic data. Combining association rules with the trend correlation and self-trend correlation among multiple series, a trend and correlation deep neural network model (TC-DNM) is established and then tested and verified on the trend analysis task using three representative economic datasets. The results show that the proposed model is more effective than a number of baseline models, achieves a precision–recall balance, and is highly reusable. The two correlation models and the joint model in this paper are distinctive and innovative. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
42. SVM hyperparameters tuning for recursive multi-step-ahead prediction.
- Author
-
Liu, Jie and Zio, Enrico
- Subjects
MACHINE learning ,CLASSIFICATION algorithms ,ARTIFICIAL intelligence ,SUPPORT vector machines ,MATHEMATICAL optimization ,TIME series analysis ,REGRESSION analysis ,BIG data - Abstract
Prediction of time series data is relevant for many industrial applications, and predictions can be made one step or multiple steps ahead. For predictive maintenance, multi-step-ahead prediction is of interest for projecting the evolution of the future conditions of the equipment, computing the remaining useful life, and taking the corresponding maintenance decisions. Recursive prediction is one of the popular strategies for multi-step-ahead prediction (a recursive-SVR sketch follows this entry), and SVM is a popular data-driven approach that has been used with it. Tuning the SVM hyperparameters during training is challenging; normally they are tuned by solving an optimization problem. This paper analyses the possible objectives of that optimization. Through experiments on one synthetic dataset and two real time series, related to the prediction of wind speed in a region and of leakage from the reactor coolant pump in a nuclear power plant, a bi-objective optimization combining mean absolute derivatives and accuracy over all prediction steps is shown to be the best choice for tuning SVM hyperparameters for recursive multi-step-ahead prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
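The recursive strategy trains a one-step-ahead model and then feeds each prediction back as an input for the next step. A minimal sketch with scikit-learn's SVR; the hyperparameters are fixed here purely for illustration, whereas the paper's point is precisely how to tune them.

```python
import numpy as np
from sklearn.svm import SVR

# Toy series and lag-embedding for one-step-ahead training.
t = np.arange(500)
series = np.sin(0.06 * t) + 0.05 * np.random.default_rng(4).normal(size=500)
lags = 8
X = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

# Fixed hyperparameters for illustration only.
model = SVR(C=10.0, epsilon=0.01, gamma="scale").fit(X[:-50], y[:-50])

# Recursive multi-step-ahead prediction: predictions become inputs.
window = list(series[-50 - lags:-50])
preds = []
for _ in range(50):
    yhat = model.predict(np.array(window[-lags:])[None, :])[0]
    preds.append(yhat)
    window.append(yhat)              # the recursion step

print("50-step RMSE:", np.sqrt(np.mean((np.array(preds) - series[-50:]) ** 2)))
```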
43. Querying out-of-vocabulary words in lexicon-based keyword spotting.
- Author
-
Puigcerver, Joan, Toselli, Alejandro, and Vidal, Enrique
- Subjects
LEXICON ,LINGUISTICS ,VOCABULARY ,BIG data ,TERMS & phrases - Abstract
Lexicon-based handwritten text keyword spotting (KWS) has proven to be a faster and more accurate alternative to lexicon-free methods. Nevertheless, since lexicon-based KWS relies on a predefined vocabulary fixed in the training phase, it does not support queries involving out-of-vocabulary (OOV) keywords. In this paper, we outline previous work aimed at solving this problem and present a new approach based on smoothing the (null) scores of OOV keywords by means of the information provided by 'similar' in-vocabulary words (a toy version of such smoothing follows this entry). Good results achieved with this approach are compared with previously published alternatives on different data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
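One plausible toy reading of the smoothing idea: score an OOV query as a similarity-weighted average of the scores of its nearest in-vocabulary words under edit distance. This is an illustration only; the paper defines its own smoothing scheme, and the scores below are invented.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def smooth_oov_score(query, iv_scores, k=3):
    """Score an OOV query from its k most similar in-vocabulary words,
    weighted by 1 / (1 + edit distance)."""
    neighbours = sorted(iv_scores, key=lambda w: edit_distance(query, w))[:k]
    weights = [1.0 / (1 + edit_distance(query, w)) for w in neighbours]
    return sum(w * iv_scores[n] for w, n in zip(weights, neighbours)) / sum(weights)

iv_scores = {"letter": 0.82, "better": 0.40, "litter": 0.77, "matter": 0.15}
print(smooth_oov_score("lettter", iv_scores))   # OOV spelling variant
```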
44. Breadth search strategies for finding minimal reducts: towards hardware implementation.
- Author
-
Choromański, Mateusz, Grześ, Tomasz, and Hońko, Piotr
- Subjects
BIG data ,FIELD programmable gate arrays ,DATA mining - Abstract
Attribute reduction, a complex problem in data mining, has attracted many researchers, and its importance rises with the ever-growing volume of data to be mined. Together with data growth, the need to speed up computations increases. The contribution of this paper is twofold: (1) an investigation of breadth search strategies for finding minimal reducts, in order to identify the most promising method for processing large data sets; (2) the development and implementation of the first hardware approach to finding minimal reducts, in order to speed up time-consuming computations. Experimental research showed that, for the software implementation, the blind breadth search strategy (sketched after this entry) is in general faster than the frequency-based breadth search strategy, not only for finding all minimal reducts but also for finding one of them; the inverse was observed for the hardware implementation. In future work, the implemented tool is to be used as a fundamental module in a system for processing large data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
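A blind breadth search enumerates attribute subsets by increasing size and returns the first one that passes the rough-set discernibility test, which is therefore of minimal size. A toy software version on a tiny decision table (the paper targets large data sets and FPGAs; this only shows the search order):

```python
from itertools import combinations

def discerns(table, decisions, attrs):
    """True if the attribute subset separates every pair of objects
    that have different decision values (the reduct test)."""
    for i in range(len(table)):
        for j in range(i + 1, len(table)):
            if decisions[i] != decisions[j] and \
               all(table[i][a] == table[j][a] for a in attrs):
                return False
    return True

def minimal_reduct_blind_bfs(table, decisions):
    """Try all subsets of size 1, then 2, ...; the first subset passing
    the discernibility test is a minimal reduct."""
    n_attrs = len(table[0])
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            if discerns(table, decisions, attrs):
                return attrs
    return tuple(range(n_attrs))

# Toy decision table: 4 conditional attributes, binary decision.
table = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 0, 1)]
decisions = [0, 0, 1, 1]
print(minimal_reduct_blind_bfs(table, decisions))   # -> (1,)
```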
45. High-performance IoT streaming data prediction system using Spark: a case study of air pollution.
- Author
-
Jin, Ho-Yong, Jung, Eun-Sung, and Lee, Duckki
- Subjects
AIR pollution ,FORECASTING ,DEEP learning ,CELL size ,CASE studies ,BIG data - Abstract
Internet-of-Things (IoT) devices are becoming prevalent, and some of them, such as sensors, generate continuous time-series data, i.e., streaming data. These IoT streaming data are one source of Big Data, and they require careful consideration for efficient processing and analysis. Deep learning is emerging as a solution to IoT streaming data analytics, but it suffers from a persistent problem: training neural networks takes a long time. In this paper, we propose a high-performance IoT streaming data prediction system that improves learning speed and predicts in real time, and we show its efficacy through a case study of air pollution. The experimental results show that a modified LSTM autoencoder model gives the best performance compared with a generic LSTM model (a generic LSTM autoencoder skeleton follows this entry). We observed that achieving the best performance requires optimizing many parameters, including the learning rate, epochs, memory cell size, input timestep size, and the number of features/predictors. In that regard, high-performance data learning/prediction frameworks (e.g., Spark, Dist-Keras, and Hadoop) are essential for rapidly fine-tuning a model for training and testing before its real deployment as data accumulate. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
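A generic LSTM autoencoder skeleton in Keras: the encoder compresses a sensor window into a fixed-size code, and the decoder reconstructs the window from it. All sizes are illustrative, and the training data are synthetic stand-ins for sensor windows; the paper's modified model is not reproduced here.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 24, 1          # e.g. 24 hourly pollution readings

model = keras.Sequential([
    keras.Input(shape=(timesteps, n_features)),
    layers.LSTM(32),                   # encoder -> fixed-size latent code
    layers.RepeatVector(timesteps),    # repeat the code for each output step
    layers.LSTM(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(n_features)),  # per-step output
])
model.compile(optimizer="adam", loss="mse")

# Toy training data: noisy sinusoids standing in for sensor windows.
rng = np.random.default_rng(5)
phase = rng.uniform(0, 2 * np.pi, size=(256, 1, 1))
x = np.sin(np.linspace(0, 4 * np.pi, timesteps)[None, :, None] + phase)
x += 0.05 * rng.normal(size=x.shape)
model.fit(x, x, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(x, x, verbose=0))
```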
46. Unstructured big data analysis algorithm and simulation of Internet of Things based on machine learning.
- Author
-
Hou, Rui, Kong, YanQiang, Cai, Bing, and Liu, Huan
- Subjects
BIG data ,MACHINE learning ,INTERNET of things ,DATA analysis ,DATA mining - Abstract
Big data depends on effective data processing to add value to data. With the rapid development of the cloud era, the coverage of big data has gradually expanded and received wide attention from many sectors. In modern social development, big data analysis is increasingly applied to development planning, risk evaluation, and the integration of market development status. As many fields of society develop rapidly, information flows have grown and the Internet has developed even faster, prompting the application of big data in various fields. Machine learning is a multidisciplinary study of how computers use data or past experience: by independently improving specific algorithms, a computer acquires knowledge through learning and moves toward the goal of artificial intelligence. Big data and machine learning are major technological changes in the modern computing world and have had a huge impact on every industry. At present, the Internet, mobile communications, social networks, and the Internet of Things generate large amounts of data every day, and data have become today's most important information resource; studies have shown that, in many cases, the more data available, the better machine learning performs. On this basis, this paper proposes an online client algorithm based on machine learning for IoT unstructured big data analysis and applies it in other big data analysis scenarios. The algorithm uses the online data entered by clients to perform background data mining in parallel, and its efficiency is verified with machine learning algorithms such as the K-nearest neighbor algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
47. Toward cognitive support for automated defect detection.
- Author
-
Essa, Ehab, Hossain, M. Shamim, Tolba, A. S., Raafat, Hazem M., Elmogy, Samir, and Muahmmad, Ghulam
- Subjects
MACHINE learning ,INDUSTRIAL goods ,BIG data ,SPECTRUM analysis ,SURFACE defects ,STATISTICS ,INSPECTION & review ,COGNITIVE computing - Abstract
With the development of cognitive computing, machine learning techniques, and big data analytics, cognitive support is crucial for automated industrial production. Real-time automated visual inspection in industrial production is a challenging task in which speed and accuracy are crucial. Many statistical and spectrum analysis approaches have been introduced; however, they suffer from high computational cost and only average performance. This paper proposes a neighborhood-maintaining approach, based on the minimum ratio, for fast and reliable inspection of industrial products. The minimum ratio between local neighborhood sliding windows is used as a similarity measure for localizing defects (a toy implementation of this measure follows this entry), and an extreme learning machine is then adapted to classify surfaces as defective or normal. A defect detection accuracy of 98.07% was achieved on textile fabrics, with 91.29% sensitivity and 99.67% specificity. The minimum ratio shows highly discriminant power in distinguishing normal from abnormal surfaces: a defective region produces a smaller minimum ratio than a defect-free region. Experimental results show superior speed and accuracy over many existing defect detection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
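One plausible reading of the measure: for each sliding window and its neighbour, take the pixelwise ratio min(a, b) / max(a, b) and keep the minimum. Similar (defect-free) neighbourhoods score near 1, while a defect drags the value down. This interpretation, the window size, and the synthetic "fabric" are all assumptions for illustration.

```python
import numpy as np

def min_ratio_map(img, win=8):
    """Minimum ratio between each window and its right-hand neighbour
    (horizontal pairs only, for brevity)."""
    h, w = img.shape
    eps = 1e-6
    rows, cols = h // win, w // win - 1
    out = np.ones((rows, cols))
    for r in range(rows):
        for c in range(cols):
            a = img[r*win:(r+1)*win, c*win:(c+1)*win]
            b = img[r*win:(r+1)*win, (c+1)*win:(c+2)*win]
            ratio = np.minimum(a, b) / (np.maximum(a, b) + eps)
            out[r, c] = ratio.min()
    return out

rng = np.random.default_rng(6)
fabric = 0.5 + 0.05 * rng.normal(size=(64, 64))   # homogeneous texture
fabric[30:38, 30:38] += 0.4                        # synthetic defect patch
m = min_ratio_map(np.clip(fabric, 0.01, 1.0))
print("defect-free min-ratio ~", round(float(m[0, 0]), 2),
      "| defect region ~", round(float(m.min()), 2))
```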
48. Intelligent equipment design assisted by Cognitive Internet of Things and industrial big data.
- Author
-
Wan, Jiafu, Li, Jiapeng, Hua, Qingsong, Celesti, Antonio, and Wang, Zhongren
- Subjects
INTERNET of things ,BIG data ,BUSINESS process outsourcing ,INTELLIGENT transportation systems ,ELECTRONIC data processing ,INFORMATION technology - Abstract
In recent years, the development of emerging technologies has ushered in a new era of industrial reform. The current industrial revolution deeply integrates the new generation of information technology with modern manufacturing and production-service businesses to promote transformation and upgrading. As the foundation of the manufacturing industry, intelligent equipment plays an important role in this reform. In this paper, we propose an innovative method to assist the design of intelligent equipment. Firstly, referring to the architectures of the Cognitive Internet of Things (CIoT) and industrial big data, we propose the architecture of the method and define the layers that process the data. Then, for the acquired external data, we put forward an algorithm, combining CIoT and industrial big data technology, to help designers analyze and make decisions. Finally, we verify the validity and feasibility of the method through a case study. The results show that the method can effectively mine deep information about intelligent equipment and provide more valuable design information, assisting designers in creating better intelligent equipment. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
49. Applications of deep convolutional neural networks in prospecting prediction based on two-dimensional geological big data.
- Author
-
Li, Shi, Chen, Jianping, and Xiang, Jie
- Subjects
FORECASTING ,BIG data ,GEOLOGICAL modeling ,MINES & mineral resources ,ORE deposits ,MANGANESE ores ,FOOD recall ,ARTIFICIAL neural networks - Abstract
Predicting ore deposits from big data repositories poses many challenges: the data are inherently complex, and the spatial relevance among deposits is of great significance. These characteristics make it difficult to use machine learning algorithms for the quantitative prediction of mineral resources. There is considerable interest and value in extracting spatial distribution characteristics from two-dimensional (2-d) ore-controlling factor layers under different metallogenic conditions. In this paper we undertake such analysis using a deep convolutional neural network named AlexNet (a sketch of adapting AlexNet to multi-channel factor layers follows this entry). Training of the 2-d mineral prediction and classification model is performed using data from the Songtao–Huayuan sedimentary manganese deposit. The network mines the coupling correlations between the spatial distributions of chemical elements, sedimentary facies, the outcrop of the Datangpo Formation, faults, and the water system and the areas where manganese ore bodies are present, as well as the correlations among different ore-controlling factors. By comparing the training loss, training accuracy, verification accuracy, and recall of models trained with different grid scales and different combinations of ore-controlling factor layers, we further discuss the most appropriate scale division and the optimal combination of ore-controlling factors to maximize the model's robustness. The prediction performance of the AlexNet networks peaks when using a grid division of 200 pixels × 200 pixels (an actual distance of 10 km × 10 km) and inputting the distribution layers of 21 chemical element maps, the lithofacies–paleogeographic map, the formation and tectonic map, the outcrop map of the Datangpo Formation, and the water system map. The training loss, training accuracy, verification accuracy, and recall of the optimal model are 0.0000001, 100.00%, 86.21%, and 91.67%, respectively. The proposed method is successfully applied to 2-d metallogenic prediction in the Songtao–Huayuan study area, and five prospective metallogenic areas (A to E) are delineated with high probability of hosting potential ore bodies. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
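The optimal model stacks 25 factor layers (21 element maps plus 4 geological maps) per grid cell. One common way to feed such a stack to AlexNet in torchvision is to swap the first convolution for a 25-channel version and use a binary ore/no-ore head; a generic adaptation, not necessarily the authors' exact configuration, and the 224 × 224 input size is the stock AlexNet default rather than the paper's 200-pixel grid.

```python
import torch
from torchvision import models

# Stock AlexNet expects 3-channel RGB; accept 25 stacked factor layers
# instead, and predict two classes (ore-bearing vs. not).
net = models.alexnet(weights=None, num_classes=2)
net.features[0] = torch.nn.Conv2d(25, 64, kernel_size=11, stride=4, padding=2)

# One grid cell with 25 stacked ore-controlling factor layers.
cell = torch.randn(1, 25, 224, 224)
print(net(cell).shape)   # -> torch.Size([1, 2])
```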
50. Bioacoustic detection with wavelet-conditioned convolutional neural networks.
- Author
-
Kiskin, Ivan, Zilli, Davide, Li, Yunpeng, Sinka, Marianne, Willis, Kathy, and Roberts, Stephen
- Subjects
ARTIFICIAL neural networks ,BIG data ,BIRD classification ,DEEP learning ,ACOUSTIC signal processing ,AVIAN anatomy - Abstract
Many real-world time series analysis problems are characterized by low signal-to-noise ratios and compounded by scarce data. Solutions to these problems often rely on handcrafted features extracted in the time or frequency domain. Recent high-profile advances in deep learning have improved performance across many application domains; however, they typically rely on large data sets that may not always be available. This paper presents an application of deep learning to acoustic event detection in a challenging, data-scarce, real-world problem. We show that convolutional neural networks (CNNs), operating on wavelet transformations of audio recordings (a scalogram example follows this entry), outperform conventional classifiers that use handcrafted features. Our key result is that wavelet transformations offer a clear benefit over the more commonly used short-time Fourier transform. Furthermore, we show that features handcrafted for a particular dataset do not generalize well to other datasets, whereas CNNs trained on generic features achieve comparable results across multiple datasets and outperform human labellers. We present results on detecting the presence of mosquitoes and on the classification of bird species. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
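A wavelet scalogram of the kind such CNNs consume can be computed with PyWavelets. The wavelet choice, scales, and sample rate below are illustrative assumptions, not the paper's exact front-end.

```python
import numpy as np
import pywt

fs = 8000                                          # sample rate (Hz), assumed
t = np.arange(0, 0.5, 1 / fs)
signal = np.sin(2 * np.pi * 600 * t) * (t > 0.25)  # tone in the second half
signal += 0.3 * np.random.default_rng(7).normal(size=t.size)

# Continuous wavelet transform: |coefficients| forms a scales-by-time image.
scales = np.arange(1, 64)
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)
scalogram = np.abs(coeffs)
print(scalogram.shape, "frequency range:", freqs.min(), "-", freqs.max())
# `scalogram` can now be treated as a 1-channel image input for a 2-D CNN.
```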