28 results
Search Results
2. Machine learning and big data analytics in bipolar disorder: A position paper from the International Society for Bipolar Disorders Big Data Task Force
- Author
-
Passos, Ives C., Ballester, Pedro L., Barros, Rodrigo C., Librenza-Garcia, Diego, Mwangi, Benson, Birmaher, Boris, Brietzke, Elisa, Hajek, Tomas, Lopez Jaramillo, Carlos, Mansur, Rodrigo B., Alda, Martin, Haarman, Bartholomeus C. M., Isometsa, Erkki, Lam, Raymond W., McIntyre, Roger S., Minuzzi, Luciano, Kessing, Lars V., Yatham, Lakshmi N., Duffy, Anne, Kapczinski, Flavio, Department of Psychiatry, HUS Psychiatry, and University of Helsinki
- Subjects
bipolar disorder ,PREDICTING SUICIDALITY ,RISK ,MOOD DISORDERS ,SYMPTOMS ,predictive psychiatry ,education ,3112 Neurosciences ,deep learning ,data mining ,ASSOCIATION ,personalized psychiatry ,DEPRESSION ,CLASSIFICATION ,3124 Neurology and psychiatry ,risk prediction ,machine learning ,big data ,LITHIUM RESPONSE ,SCHIZOPHRENIA ,NEUROPROGRESSION - Abstract
Objectives The International Society for Bipolar Disorders Big Data Task Force assembled leading researchers in the fields of bipolar disorder (BD), machine learning, and big data with extensive experience to evaluate the rationale of machine learning and big data analytics strategies for BD. Method A task force was convened to examine and integrate findings from the scientific literature related to machine learning and big data-based studies, to clarify terminology, and to describe challenges and potential applications in the field of BD. We also systematically searched PubMed, Embase, and Web of Science for articles published up to January 2019 that used machine learning in BD. Results The results suggested that big data analytics has the potential to provide risk calculators to aid in treatment decisions and predict clinical prognosis, including suicidality, for individual patients. This approach can advance diagnosis by enabling discovery of more relevant data-driven phenotypes, as well as by predicting transition to the disorder in high-risk unaffected subjects. We also discuss the most frequent challenges that big data analytics applications can face, such as heterogeneity, lack of external validation and replication of some studies, cost and non-stationary distribution of the data, and lack of appropriate funding. Conclusion Machine learning-based studies, including atheoretical data-driven big data approaches, provide an opportunity to more accurately detect those who are at risk, parse relevant phenotypes, and inform treatment selection and prognosis. However, several methodological challenges need to be addressed in order to translate research findings to clinical settings.
- Published
- 2019
4. Review of Text Classification Methods on Deep Learning.
- Author
-
Hongping Wu, Yuling Liu, and Jingwen Wang
- Subjects
DEEP learning ,NATURAL language processing ,BIG data ,CLASSIFICATION ,FEATURE extraction ,MACHINE learning - Abstract
Text classification is an increasingly crucial topic in natural language processing. Traditional text classification methods based on machine learning have many disadvantages, such as dimension explosion, data sparsity, and limited generalization ability. Focusing on deep learning-based text classification, this paper presents an extensive study of text classification models, including Convolutional Neural Network-based (CNN-based), Recurrent Neural Network-based (RNN-based), and attention mechanism-based models. Many studies have shown that text classification methods based on deep learning outperform traditional methods when processing large-scale and complex datasets, mainly because deep learning methods avoid the cumbersome feature extraction process and achieve higher prediction accuracy on large sets of unstructured data. In this paper, we also summarize the shortcomings of traditional text classification methods and introduce the deep learning-based text classification process, comprising text preprocessing, distributed representation of text, construction of the classification model, and performance evaluation. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
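The deep learning text classification process summarized above (text preprocessing, distributed representation, model construction, evaluation) can be sketched end to end. The toy corpus, the random averaged word vectors, and the nearest-centroid classifier below are illustrative stand-ins for the learned embeddings and CNN/RNN models the review covers, not reproductions of them:

```python
import math
import random
from collections import defaultdict

random.seed(0)

# Toy labelled corpus (stand-in for a real text classification dataset).
corpus = [
    ("great phone love the battery", "pos"),
    ("terrible battery awful phone", "neg"),
    ("love this great screen", "pos"),
    ("awful screen terrible buy", "neg"),
]

# 1. Preprocessing: lowercase + whitespace tokenisation.
def tokenize(text):
    return text.lower().split()

# 2. Distributed representation: random word vectors averaged per document
#    (a crude stand-in for learned embeddings such as word2vec).
DIM = 8
vocab = {w for text, _ in corpus for w in tokenize(text)}
emb = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in vocab}

def doc_vector(text):
    toks = tokenize(text)
    return [sum(emb[w][i] for w in toks if w in emb) / max(len(toks), 1)
            for i in range(DIM)]

# 3. Model: nearest class centroid in the representation space.
sums = defaultdict(lambda: [0.0] * DIM)
counts = defaultdict(int)
for text, label in corpus:
    v = doc_vector(text)
    counts[label] += 1
    sums[label] = [c + x for c, x in zip(sums[label], v)]
centroids = {lbl: [c / counts[lbl] for c in cs] for lbl, cs in sums.items()}

def predict(text):
    v = doc_vector(text)
    return min(centroids, key=lambda lbl: math.dist(centroids[lbl], v))

# 4. Evaluation: training-set accuracy on the toy corpus.
accuracy = sum(predict(t) == y for t, y in corpus) / len(corpus)
```

Each of the four stages maps onto one step of the pipeline the review describes; swapping the centroid rule for a CNN or RNN changes only step 3.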
5. A machine learning-based approach for classifying tourists and locals using geotagged photos: the case of Tokyo.
- Author
-
Derdouri, Ahmed and Osaragi, Toshihiro
- Subjects
DEEP learning ,URBAN planning ,TOURISTS ,RANDOM forest algorithms ,MACHINE learning - Abstract
In tourism-dependent cities, investigating the spatiotemporal distribution and dynamics of tourist flows is crucial for better urban planning in both steady and perturbed states. In recent years, researchers have started relying more on photo-based, geotagged social data, which offer insights about tourists, popular hotspots, and mobility patterns. However, distinguishing between tourists and locals from this data is problematic since residence information is often not provided. While previous studies rely on heuristic (e.g., period of stay) and probabilistic (Shannon entropy) approaches, this paper proposes a method for classifying tourists and residents based on machine learning (ML) algorithms and considering parameters that could explain the variability between the two (e.g., weather, mobility, and photo content). This approach was applied to Flickr users' geotagged photos taken in Tokyo's 23 special wards from July 2008 to December 2019. The results show that stacked ensemble (SE) models are superior to models based on five supervised-learning algorithms, including gradient boosting machine (GBM), generalized linear model (GLM), distributed random forest (DRF), deep learning (DL), and extremely randomized trees (XRT). Temporal entropy (TEN), mobility on workdays, and frequent visits to amusement venues and crowded places influenced how users were classified. While temporal distribution showed similar monthly/hourly patterns, spatial distribution varied. The proposed approach might pave the way for scholars to carry out future tourism research on different topics and subsequently support policymakers in the decision-making process, specifically in urban settings. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
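The probabilistic approach this paper contrasts itself with, and its own temporal entropy (TEN) feature, both rest on Shannon entropy over a user's posting distribution. A minimal sketch with invented photo timestamps: a user whose photos spread over the whole year scores high temporal entropy (plausibly a local), while a single short visit scores low (plausibly a tourist):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical users: month index of each geotagged photo (invented data).
local_months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]   # photos all year round
tourist_months = [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7]    # one short visit

# A spread-out posting history yields high temporal entropy;
# a burst concentrated in one visit yields low entropy.
local_ten = shannon_entropy(local_months)
tourist_ten = shannon_entropy(tourist_months)
```

In the paper's setup such entropy values become one feature among several (weather, mobility, photo content) fed to the supervised learners.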
6. Big data classification using deep learning and apache spark architecture.
- Author
-
Brahmane, Anilkumar V. and Krishna, B. Chaitanya
- Subjects
BIG data ,DEEP learning ,MACHINE learning ,CLASSIFICATION ,MATHEMATICAL optimization - Abstract
The novelty in big data is rising day by day, such that existing software tools face difficulty in the management of huge data. Moreover, the rate of imbalanced data in huge datasets is a key constraint for the research community. Accordingly, this paper proposes a novel method for handling big data using the Spark framework. The proposed method proceeds in two stages for classifying the big data, namely feature selection and classification, which are performed in the initial nodes of the Spark architecture. The proposed optimization algorithm is named the Rider Chaotic Biogeography-based Optimization (RCBO) algorithm, which is an integration of the Rider Optimization Algorithm (ROA) and the standard chaotic biogeography-based optimization (CBBO). The proposed RCBO deep stacked auto-encoder using the Spark framework effectively handles the big data to attain effective big data classification. Here, the proposed RCBO is used for selecting suitable features from the massive dataset; the deep stacked auto-encoder then uses RCBO for training in order to classify huge datasets. This research focuses on the classification problem for the Covertype dataset in the UCI machine learning repository, which describes forest cover data used to predict the forest cover type from cartographic variables. The dataset is multivariate in nature, with 263,361 web hits, 581,012 instances, and 54 attributes, and the task associated with the dataset is classification. The evaluation of the proposed RCBO deep stacked auto-encoder-based Spark framework on the UCI machine learning dataset revealed that the proposed technique outperformed other strategies, attaining a maximal accuracy of 86.71%, dice coefficient of 92.7%, sensitivity of 75.2%, and specificity of 95.4%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
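The record above pairs metaheuristic feature selection (RCBO) with a deep stacked auto-encoder. As a rough stand-in for that search, the sketch below scores random feature subsets by a Fisher-like class-separation criterion on an invented dataset; the actual rider/biogeography update rules of RCBO are not reproduced here:

```python
import random

random.seed(1)

# Toy dataset: 6 features, but only features 0 and 1 separate the classes.
def make_row(label):
    informative = [label + random.gauss(0, 0.1), label * 2 + random.gauss(0, 0.1)]
    noise = [random.gauss(0, 1) for _ in range(4)]
    return informative + noise

X = [make_row(lbl) for lbl in (0, 1) for _ in range(30)]
y = [lbl for lbl in (0, 1) for _ in range(30)]

def separation_score(features):
    """Fisher-like fitness: mean squared distance between class means over
    the chosen features (a cheap stand-in for classifier-based fitness)."""
    if not features:
        return 0.0
    mean = lambda lbl, f: sum(X[i][f] for i in range(len(X)) if y[i] == lbl) / 30
    return sum((mean(0, f) - mean(1, f)) ** 2 for f in features) / len(features)

# Random-search "metaheuristic": keep the best random subset seen so far.
best_subset, best_score = None, -1.0
for _ in range(200):
    subset = sorted(random.sample(range(6), k=random.randint(1, 6)))
    score = separation_score(subset)
    if score > best_score:
        best_subset, best_score = subset, score
```

A real metaheuristic replaces the blind random sampling with guided update rules, but the fitness-driven subset search is the same shape.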
7. A survey of big data classification strategies.
- Author
-
Banchhor, Chitrakant and Srinivasu, N.
- Subjects
BIG data ,MACHINE learning ,ARTIFICIAL intelligence ,DEEP learning ,DATA mining - Abstract
Big data nowadays plays a major role in finance, industry, medicine, and various other fields. In this survey, 50 research papers are reviewed regarding the different big data classification techniques presented and/or used in the respective studies. The classification techniques are categorized into machine learning, evolutionary intelligence, fuzzy-based approaches, deep learning, and so on. The research gaps and challenges of big data classification faced by existing techniques are also listed and described, which should help researchers enhance the effectiveness of their future work. The research papers are analyzed for the different techniques with respect to software tools, datasets used, publication year, classification techniques, and performance metrics. It can be concluded from the survey presented here that the most frequently used big data classification methods are based on machine learning techniques, and the apparently most commonly used dataset for big data classification is the UCI repository dataset. The most frequently used performance metrics are accuracy and execution time. [ABSTRACT FROM AUTHOR]
- Published
- 2020
8. WOA + BRNN: An imbalanced big data classification framework using Whale optimization and deep neural network.
- Author
-
Hassib, Eslam. M., El-Desouky, Ali. I., Labib, Labib. M., and El-kenawy, El-Sayed M.
- Subjects
MULTILAYER perceptrons ,PARTICLE swarm optimization ,RECURRENT neural networks ,MACHINE learning ,FEATURE selection ,WHALES ,BIG data ,REGRESSION trees - Abstract
Nowadays, big data plays a substantial part in information and knowledge analysis, manipulation, and forecasting. Analyzing and extracting knowledge from such big datasets is a very challenging task due to the imbalance of data distribution, which could lead to biased classification results and wrong decisions; the standard classifiers are not capable of handling such datasets. Hence, a new technique for dealing with such datasets is required. This paper proposes a novel classification framework for big data that consists of three phases. The first phase is the feature selection phase, which uses the Whale Optimization Algorithm (WOA) for finding the best set of features. The second phase is the preprocessing phase, which uses the SMOTE algorithm and the LSH-SMOTE algorithm for solving the class imbalance problem. Lastly, the third phase is the WOA + BRNN algorithm, which uses the Whale Optimization Algorithm, for the first time, to train a deep learning approach called a bidirectional recurrent neural network (BRNN). Our proposed algorithm WOA + BRNN has been tested against nine highly imbalanced datasets, one of them a big dataset, in terms of area under the curve (AUC) against four of the most commonly used machine learning algorithms (Naïve Bayes, AdaBoostM1, decision table, random tree), in addition to GWO-MLP (training a multilayer perceptron using the Grey Wolf Optimizer); we then test our algorithm over four well-known datasets against GWO-MLP, particle swarm optimization (PSO-MLP), genetic algorithm (GA-MLP), ant colony optimization (ACO-MLP), evolution strategy (ES-MLP), and population-based incremental learning (PBIL-MLP) in terms of classification accuracy. Experimental results proved that our proposed algorithm WOA + BRNN achieved promising accuracy and high local-optima avoidance, and outperformed four of the most commonly used machine learning algorithms, as well as GWO-MLP, in terms of AUC. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
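The preprocessing phase above relies on SMOTE to rebalance classes. A minimal sketch of the core SMOTE idea, interpolating between a minority point and one of its nearest minority neighbours; the LSH variant and the WOA-chosen features are out of scope, and the data points are invented:

```python
import math
import random

random.seed(42)

def smote(minority, n_new, k=3):
    """Generate n_new synthetic minority samples by interpolating between
    a random minority point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = random.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = random.choice(neighbours)
        gap = random.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical imbalanced 2-D data: only 5 minority points near the origin.
minority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.3, 0.1], [0.0, 0.3]]
new_points = smote(minority, n_new=10)
```

Because each synthetic point lies on a segment between two real minority points, the oversampled set stays inside the minority region rather than duplicating points verbatim.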
9. An Efficient Intrusion Detection Method Based on LightGBM and Autoencoder.
- Author
-
Tang, Chaofei, Luktarhan, Nurbol, and Zhao, Yuxin
- Subjects
MACHINE learning ,ALGORITHMS ,INFORMATION commons ,BIG data ,DECISION trees ,DEEP learning - Abstract
Due to the insidious characteristics of network intrusion behaviors, developing an efficient intrusion detection system is still a big challenge, especially in the era of big data, where the volume of traffic and the dimension of each traffic feature are high. Because of the shortcomings of traditional common machine learning algorithms in network intrusion detection, such as insufficient accuracy, a network intrusion detection system based on LightGBM and an autoencoder (AE) is proposed. The LightGBM-AE model proposed in this paper includes three steps: data preprocessing, feature selection, and classification. The LightGBM-AE model adopts the LightGBM algorithm for feature selection and then uses an autoencoder for training and detection. When a set of data containing network intrusion behaviors is input into the autoencoder, there is a large reconstruction error between the original input data and the reconstructed data obtained by the autoencoder, which provides a basis for intrusion detection. According to the reconstruction error, an appropriate threshold is set to distinguish symmetrically between normal behavior and attack behavior. The experiment is carried out on the NSL-KDD dataset and implemented using PyTorch. In addition to the autoencoder, a variational autoencoder (VAE) and a denoising autoencoder (DAE) are also used for intrusion detection and are compared with existing machine learning algorithms such as Decision Tree, Random Forest, KNN, GBDT, and XGBoost. The evaluation is carried out through classification evaluation indexes such as accuracy, precision, recall, and F1-score. The experimental results show that the method can efficiently separate attack behavior from normal behavior according to the reconstruction error. Compared with other methods, the effectiveness and superiority of this method are verified. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
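The detection step above hinges on thresholding the autoencoder's reconstruction error. A sketch of that step alone, with invented error values standing in for a trained autoencoder's output; the mean-plus-k-sigma rule is one common heuristic, not necessarily the paper's exact threshold choice:

```python
import statistics

def fit_threshold(normal_errors, k=3.0):
    """Set the detection threshold from reconstruction errors of *normal*
    traffic only: mean + k standard deviations (a common heuristic)."""
    mu = statistics.mean(normal_errors)
    sigma = statistics.stdev(normal_errors)
    return mu + k * sigma

def classify(error, threshold):
    return "attack" if error > threshold else "normal"

# Hypothetical reconstruction errors from an autoencoder trained on normal
# traffic: normal flows reconstruct well, intrusions do not.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.011, 0.012]
threshold = fit_threshold(normal_errors)

labels = [classify(e, threshold) for e in [0.011, 0.010, 0.250]]
```

The autoencoder never sees attack traffic during training, which is exactly why attacks reconstruct badly and land above the threshold.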
10. Classification of Urban Surface Elements by Combining Multisource Data and Ontology.
- Author
-
Zhu, Ling, Lu, Yuzhen, and Fan, Yewen
- Subjects
DEEP learning ,ONTOLOGY ,MACHINE learning ,REMOTE sensing ,BIG data ,CLASSIFICATION - Abstract
The rapid pace of urbanization and increasing demands for urban functionalities have led to diversification and complexity in the types of urban surface elements. The conventional approach of relying solely on remote sensing imagery for urban surface element extraction faces emerging challenges. Data-driven techniques, including deep learning and machine learning, necessitate a substantial number of annotated samples as prerequisites. In response, our study proposes a knowledge-driven approach that integrates multisource data with ontology to achieve precise urban surface element extraction. Within this framework, components from the EIONET Action Group on Land Monitoring in Europe matrix serve as ontology primitives, forming a shared vocabulary. The semantics of surface elements are deconstructed using these primitives, enabling the creation of specific descriptions for various types of urban surface elements by combining these primitives. Our approach integrates multitemporal high-resolution remote sensing data, network big data, and other heterogeneous data sources. It segments high-resolution images into individual patches, and for each unit, urban surface element classification is accomplished through semantic rule-based inference. We conducted experiments in two regions with varying levels of urban scene complexity, achieving overall accuracies of 93.03% and 97.35%, respectively. Through this knowledge-driven approach, our proposed method significantly enhances the classification performance of urban surface elements in complex scenes, even in the absence of sample data, thereby presenting a novel approach to urban surface element extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
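The semantic rule-based inference described above can be illustrated with a toy rule base: each urban surface class is defined as a set of required primitives, and a patch is assigned the first class whose definition its primitives satisfy. The primitive and class names below are invented, not the actual EAGLE matrix vocabulary:

```python
# Each image patch is described by ontology primitives (hypothetical names
# loosely inspired by land-cover components, not the real shared vocabulary).
RULES = {
    "park":        {"vegetation", "permeable_surface"},
    "road":        {"artificial_surface", "linear_shape"},
    "residential": {"artificial_surface", "building"},
    "water_body":  {"water"},
}

def classify_patch(primitives):
    """Return the first class whose required primitives are all present;
    unresolved patches are reported rather than guessed."""
    for cls, required in RULES.items():
        if required <= primitives:   # subset test: all required present
            return cls
    return "unclassified"

patches = [
    {"vegetation", "permeable_surface", "tree_cover"},
    {"artificial_surface", "building", "shadow"},
    {"bare_soil"},
]
labels = [classify_patch(p) for p in patches]
```

This is the knowledge-driven part only; in the paper, the primitives themselves are derived per segmented patch from multisource data before this inference runs.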
11. An Improved Big Data Analytics Architecture for Intruder Classification Using Machine Learning.
- Author
-
Babar, Muhammad, Kaleem, Sarah, Sohail, Adnan, Asim, Muhammad, and Tariq, Muhammad Usman
- Subjects
BIG data ,MACHINE learning ,DEEP learning ,COMPUTER network security ,REPUTATION ,CLASSIFICATION - Abstract
The ease of retrieving information on the Internet gives rise to several network security issues. Intrusion detection is a critical area of study in network security, aimed at spotting unauthorized access or incidents on protected networks, and it has become well established in the current era. Research emphasizes several datasets to increase system precision and lessen the false-positive proportion. This article proposes a new intrusion detection system using big data analytics and deep learning to address some of the limitations of misuse and anomaly detection. The proposed method can identify any odd activities in a network in order to recognize malicious or unauthorized action and permit a response during a confidentiality breach. The proposed system utilizes a big data analytics platform based on parallel and distributed mechanisms, which improves the training time along with the accuracy. The experimentation appropriately classifies the information as either normal or abnormal. The proposed system has a detection rate of 96.11%, which markedly improves overall detection accuracy compared to existing strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. Deep Learning Model for Big Data Classification in Apache Spark Environment.
- Author
-
Nithya, T. M., Umanesan, R., Kalavathidevi, T., Selvarathi, C., and Kavitha, A.
- Subjects
DEEP learning ,BIG data ,METAHEURISTIC algorithms ,FEATURE selection ,MACHINE learning ,DATA modeling - Abstract
Big data analytics is a popular research topic due to its applicability in various real-time applications. Recent machine learning and deep learning models can be applied to analyze big data with better performance. Since big data involves numerous features and necessitates high computational time, feature selection methodologies using metaheuristic optimization algorithms can be adopted to choose an optimum set of features and thereby improve the overall classification performance. This study proposes a new sigmoid butterfly optimization algorithm with an optimum gated recurrent unit (SBOA-OGRU) model for big data classification in Apache Spark. The SBOA-OGRU technique involves the design of an SBOA-based feature selection technique to choose an optimum subset of features. In addition, an OGRU-based classification model is employed to classify the big data into appropriate classes. Besides, the hyperparameter tuning of the GRU model takes place using the Adam optimizer. Furthermore, the Apache Spark platform is applied for processing big data in an effective way. In order to ensure the betterment of the SBOA-OGRU technique, a wide range of experiments was performed, and the experimental results highlighted the supremacy of the SBOA-OGRU technique. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
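The OGRU classifier above builds on the standard GRU update equations. A scalar forward pass with hand-picked weights (in practice these are learned, and the paper tunes hyperparameters with Adam); this shows the gating arithmetic only, not the optimized model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One standard GRU update for scalar input x and hidden state h.
    p holds the weights (hand-picked here, trained in practice)."""
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])          # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])          # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_cand                          # new hidden state

params = {"wz": 0.5, "uz": 0.3, "bz": 0.0,
          "wr": 0.5, "ur": 0.3, "br": 0.0,
          "wh": 1.0, "uh": 0.8, "bh": 0.0}

h = 0.0
for x in [1.0, -0.5, 2.0]:   # a tiny input sequence
    h = gru_step(x, h, params)
```

Because the new state is a gate-weighted blend of the old state and a tanh candidate, the hidden state always stays in (-1, 1) for a state initialized there.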
13. Needs of Scientometry and Possibilities of Modern Machine Learning as a Field of Artificial Intelligence.
- Author
-
Melnikova, E. V.
- Abstract
A general description of modern scientometry, its main tasks, and its research methods is presented. The issues of the application of conventional machine learning and deep learning algorithms as tools of artificial intelligence in the thematic classification of scientific literature are considered. The problems and limitations of the classification of literature by sections of science in the systems of indexing and citing of scientific information are outlined. The author presents a specific example of a deep learning application for by-article thematic classification based on convolutional neural networks that was designed by scientists from the United Arab Emirates and Jordan. The article emphasizes the importance of the use of deep learning applications and models for creating correct classifications of the scientific literature that correspond to the realities of the development of science and that are capable of increasing the accuracy of calculating scientometric indicators. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. A Deep Learning-Based Decision Support System for Mobile Performance Marketing.
- Author
-
Matos, Luís Miguel, Cortez, Paulo, Mendes, Rui, and Moreau, Antoine
- Subjects
MOBILE learning ,DECISION support systems ,ARTIFICIAL neural networks ,MARKETING ,BIG data ,DEEP learning ,RANDOM forest algorithms ,MACHINE learning - Abstract
In Mobile Performance Marketing (MPM), monetary compensation only occurs when an advertisement results in a conversion (e.g., sale of a product or service). In this work, we propose an intelligent decision support system (IDSS) to automatically select mobile marketing campaigns for users. The IDSS is based on a computationally efficient mobile user conversion prediction model that assumes a novel Percentage Categorical Pruning (PCP) categorical preprocessing and an online deep multilayer perceptron (MLP) reuse model (MLPr). Using private (nonpublicly available) business MPM data provided by a marketing company, the MLPr model outperformed an offline multilayer perceptron and a logistic regression, obtaining a high quality class discrimination when applied to sampled (85% to 92%) and complete (90% to 94%) data. In addition, the MLPr compared favorably with other machine learning (ML) models (e.g., Random Forest, XGBoost), as well as with other deep neural networks (e.g., diamond shaped). Moreover, we designed two strategies (A — best campaign selection; and B — random selection among the top candidate campaigns) to build the IDSS, in which the predictive deep learning model is used to perform a real-time selection of advertisement campaigns for mobile users. Using recently collected big data (with millions of redirect events) from a worldwide MPM company, we performed a realistic IDSS evaluation that considered three criteria: response time, potential profit and advertiser diversity. Overall, competitive results were achieved by the IDSS B strategy when compared with the current marketing company ad assignment method. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
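The Percentage Categorical Pruning (PCP) preprocessing named above merges rare categorical levels before modelling. A sketch under assumed details: the 5% cut-off, the OTHER token, and the operator values are all invented for illustration, not the paper's actual settings:

```python
from collections import Counter

def pcp_transform(values, min_pct=5.0, other="OTHER"):
    """Percentage-based categorical pruning: levels seen in fewer than
    min_pct percent of rows collapse into a single catch-all level.
    (Threshold and token are illustrative, not the paper's values.)"""
    counts = Counter(values)
    total = len(values)
    keep = {lvl for lvl, c in counts.items() if 100.0 * c / total >= min_pct}
    return [v if v in keep else other for v in values]

# Hypothetical campaign attribute with a long tail of rare levels.
operators = (["vodafone"] * 60 + ["orange"] * 35
             + ["tiny_mvno1"] * 3 + ["tiny_mvno2"] * 2)
pruned = pcp_transform(operators)
```

Pruning the long tail keeps the one-hot/categorical input small and stable, which matters for the online model reuse the paper describes.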
15. Diabetes Monitoring System in Smart Health Cities Based on Big Data Intelligence.
- Author
-
AlZu'bi, Shadi, Elbes, Mohammad, Mughaid, Ala, Bdair, Noor, Abualigah, Laith, Forestiero, Agostino, and Zitar, Raed Abu
- Subjects
SMART cities ,BIG data ,URBAN health ,PEOPLE with diabetes ,DIABETES complications ,STRUCTURAL health monitoring ,DEEP learning ,DIABETIC neuropathies - Abstract
Diabetes is a metabolic disorder in which the body is unable to properly regulate blood sugar levels. It can occur when the body does not produce enough insulin or when cells become resistant to insulin's effects. There are two main types of diabetes, Type 1 and Type 2, which have different causes and risk factors. Early detection of diabetes allows for early intervention and management of the condition. This can help prevent or delay the development of serious complications associated with diabetes. Early diagnosis also allows for individuals to make lifestyle changes to prevent the progression of the disease. Healthcare systems play a vital role in the management and treatment of diabetes. They provide access to diabetes education, regular check-ups, and necessary medications for individuals with diabetes. They also provide monitoring and management of diabetes-related complications, such as heart disease, kidney failure, and neuropathy. Through early detection, prevention and management programs, healthcare systems can help improve the quality of life and outcomes for people with diabetes. Current initiatives in healthcare systems for diabetes may fail due to lack of access to education and resources for individuals with diabetes. There may also be inadequate follow-up and monitoring for those who have been diagnosed, leading to poor management of the disease and lack of prevention of complications. Additionally, current initiatives may not be tailored to specific cultural or demographic groups, resulting in a lack of effectiveness for certain populations. In this study, we developed a diabetes prediction system using a healthcare framework. The system employs various machine learning methods, such as K-nearest neighbors, decision tree, deep learning, SVM, random forest, AdaBoost and logistic regression. The performance of the system was evaluated using the PIMA Indians Diabetes dataset and achieved a training accuracy of 82% and validation accuracy of 80%. 
[ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Geometrically interpretable Variance Hyper Rectangle learning for pattern classification.
- Author
-
Sun, Jie, Gu, Huamao, Peng, Haoyu, Fang, Yili, and Wang, Xun
- Subjects
DEEP learning ,MACHINE learning ,TRUST ,BIG data ,CLASSIFICATION - Abstract
Many current intrinsically interpretable machine learning models can only handle data that are linear, low-dimensional, and composed of relatively independent attributes, often with discrete attribute values, while models capable of handling high-dimensional nonlinear data, like deep learning, have very poor interpretability. Based on geometric characteristics, a new idea of accurately wrapping the data region with a minimum-volume geometry is proposed for pattern classification. The Variance Hyper Rectangle (VHR) model presented in this paper is a realization of this idea. The VHR model uses minimum-volume hyper rectangles, obtained through projection variance calculation, to wrap the regions occupied by a category of data; hence it has strong and clear geometric interpretability. In addition, the VHR model is well suited to large data volumes, as it approaches linear complexity in both time and space. Extensive qualitative and quantitative experiments are performed on seven real-world data sets, demonstrating that VHR outperforms the state-of-the-art interpretable methods while running quickly. • VHR has strong geometric interpretability, and is reliable and trustworthy. • VHR can provide a clear range of values in each direction for a category of data. • VHR naturally supports incremental learning without any extra processing. • VHR has great performance and stability, and is able to handle big data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
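The wrapping idea above can be illustrated with per-class axis-aligned bounding boxes: fit the min/max of each feature over a class's points, then classify by box membership. VHR proper chooses its directions by projection variance to obtain minimum-volume rectangles; the axis-aligned version below is a deliberate simplification on invented data:

```python
def fit_boxes(X, y):
    """Fit one axis-aligned bounding box per class: the min/max of each
    feature over that class's training points. (VHR proper picks the box
    directions by projection variance; axis-aligned is a simplification.)"""
    boxes = {}
    for label in set(y):
        pts = [x for x, lbl in zip(X, y) if lbl == label]
        boxes[label] = [(min(col), max(col)) for col in zip(*pts)]
    return boxes

def inside(x, box):
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, box))

def predict(x, boxes):
    hits = [lbl for lbl, box in boxes.items() if inside(x, box)]
    return hits[0] if len(hits) == 1 else None   # ambiguous/outside -> None

# Two well-separated hypothetical classes in 2-D.
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [5.1, 5.0], [5.3, 5.4], [5.0, 5.2]]
y = ["a", "a", "a", "b", "b", "b"]
boxes = fit_boxes(X, y)
```

The interpretability claim is visible directly: each class's box is literally a readable range of values per direction, and adding a point only updates a min/max, which is why incremental learning comes for free.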
17. MRI Brain Images Compression and Classification Using Different Classes of Neural Networks
- Author
-
El Boustani, Abdelhakim, El Bachari, Essaid, Attiogbé, Christian, editor, Ferrarotti, Flavio, editor, and Maabout, Sofian, editor
- Published
- 2019
- Full Text
- View/download PDF
18. A Novel Deep Learning Approach of Convolutional Neural Network and Random Forest Classifier for Fine-grained Sentiment Classification.
- Author
-
C. G., Siji George and Sumathi, B.
- Subjects
CONVOLUTIONAL neural networks ,DEEP learning ,RANDOM forest algorithms ,MACHINE learning ,SENTIMENT analysis ,CLASSIFICATION - Abstract
Deep learning has become more popular in recent years and is widely used for different machine learning tasks. One such task is sentiment prediction on a text document. Fine-grained sentiment analysis deserves more attention, since most researchers focus on binary sentiment classification. In this work, a new model that combines the benefits of both a Convolutional Neural Network (CNN) and a Random Forest (RF) classifier is proposed for fine-grained sentiment classification. The main idea of the proposed model is to achieve maximum accuracy for sentiment classification on large volumes of data. The CBOW (Continuous Bag-of-Words) model is used for converting the text input into vector form, and a Convolutional Neural Network is used to extract features from the input vector. The fully connected layer in the Convolutional Neural Network is replaced by a Random Forest classifier, and the extracted features are then used by the Random Forest classifier for the classification process. A dropout strategy is applied to regularize the CNN-RF model to avoid overfitting. Sentiment analysis is performed on product review data using the CNN and RF models separately, and their results are compared with the result of the proposed CNN-RF model. The experimental results show that the combined CNN-RF model gave higher performance than the independent CNN and RF models. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
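The two-stage architecture described in entry 18 (a convolutional feature extractor whose fully connected head is replaced by a separate classifier) can be illustrated with a minimal NumPy sketch. Everything here is hypothetical: random, untrained 1-D filters stand in for the trained CNN, synthetic signals stand in for embedded text, and a nearest-centroid rule stands in for the Random Forest head.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(x, filters, pool=4):
    """Stage 1: 1-D convolution + ReLU + max-pooling, standing in for the CNN."""
    feats = []
    for w in filters:
        fm = np.maximum(np.convolve(x, w, mode="valid"), 0)  # feature map + ReLU
        n = len(fm) // pool * pool
        feats.append(fm[:n].reshape(-1, pool).max(axis=1))   # non-overlapping max-pool
    return np.concatenate(feats)

def make_signal(cls):
    """Synthetic two-class input (low- vs high-frequency signal)."""
    t = np.linspace(0, 1, 64)
    return np.sin(2 * np.pi * (2 if cls == 0 else 8) * t) + 0.1 * rng.normal(size=64)

filters = rng.normal(size=(4, 5))            # untrained stand-in filters
X = [make_signal(c) for c in (0, 1) for _ in range(20)]
y = np.array([c for c in (0, 1) for _ in range(20)])
F = np.array([conv_features(x, filters) for x in X])

# Stage 2: a non-neural classifier consumes the conv features (the paper uses a
# Random Forest here; a nearest-centroid rule keeps the sketch dependency-free).
centroids = {c: F[y == c].mean(axis=0) for c in (0, 1)}
pred = np.array([min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))
                 for f in F])
acc = (pred == y).mean()
```

The key design point the abstract describes is the interface between stages: the network's learned representation is handed to a classifier better suited to the final decision than a softmax layer.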
19. Machine learning classification of entrepreneurs in British historical census data
- Author
-
Piero Montebruno, Robert J. Bennett, Carry Van Lieshout, Harry Smith, Bennett, Robert [0000-0003-3940-1760], and Apollo - University of Cambridge Repository
- Subjects
Census ,Boosting (machine learning) ,Computer science ,QA75 Electronic computers. Computer science ,Logistic regression ,02 engineering and technology ,Library and Information Sciences ,Management Science and Operations Research ,Machine learning ,computer.software_genre ,Big data ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,AdaBoost ,Artificial neural network ,business.industry ,Deep learning ,Classification ,Ensemble learning ,Computer Science Applications ,Support vector machine ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,Binary classification ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Information Systems - Abstract
This paper presents a binary classification of entrepreneurs in British historical data based on the recent availability of big data from the I-CeM dataset. The main task of the paper is to attribute an employment status to individuals who did not fully report entrepreneur status in earlier censuses (1851-1881). The paper assesses the accuracy of different classifiers and machine learning algorithms, including Deep Learning, for this classification problem. We first adopt a ground-truth dataset from the later censuses to train the computer with a Logistic Regression (which is standard in the literature for this kind of binary classification) to distinguish entrepreneurs from non-entrepreneurs (i.e. workers). Our initial accuracy for this baseline method is 0.74. We compare the Logistic Regression with ten optimized machine learning algorithms: Nearest Neighbors, Linear and Radial Support Vector Machine, Gaussian Process, Decision Tree, Random Forest, Neural Network, AdaBoost, Naive Bayes, and Quadratic Discriminant Analysis. The best results come from boosting and ensemble methods: AdaBoost achieves an accuracy of 0.95, and Deep Learning, as a standalone category of algorithms, further improves accuracy to 0.96 without using the rich text data in the OccString feature, a string of up to 500 characters with the full occupational statement of each individual collected in the earlier censuses. Finally, using this OccString feature, we implement both shallow (bag-of-words) and Deep Learning (Recurrent Neural Network with a Long Short-Term Memory layer) algorithms. These methods all achieve accuracies above 0.99, with the Deep Learning Recurrent Neural Network as the best model at an accuracy of 0.9978. The results show that standard algorithms for classification can be outperformed by machine learning algorithms, confirming the value of extending the techniques traditionally used in the literature for this type of classification problem.
- Published
- 2020
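Entry 19's baseline model, a logistic regression separating entrepreneurs from workers, can be sketched in a few lines of NumPy. The two features and the data below are synthetic stand-ins, not the actual I-CeM census variables:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for census-derived features -- hypothetical, not the I-CeM variables.
n = 200
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -1.0])
y = (X @ true_w + 0.3 * rng.normal(size=n) > 0).astype(float)

# Logistic regression fitted by plain gradient descent (the paper's baseline
# class of model, before it is compared against boosting and deep learning).
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted probabilities
    w -= 0.5 * (X.T @ (p - y) / n)       # gradient step on weights
    b -= 0.5 * (p - y).mean()            # gradient step on bias

acc = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
```

The paper's point is that this kind of baseline (0.74 accuracy on its task) leaves substantial headroom for ensemble and deep models.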
20. Deep Bayesian network architecture for Big Data mining.
- Author
-
Njah, Hasna, Jamoussi, Salma, and Mahdi, Walid
- Subjects
BIG data ,MACHINE learning ,ARTIFICIAL intelligence ,BAYESIAN analysis ,DEEP learning - Abstract
Summary: Classical data-mining methods face various challenges in the era of Big Data. Caught between the need for fast knowledge extraction and the high flows of data acquired in small slots of time, these methods fall short. The variability and veracity of Big Data complicate the machine learning process, and its high volume overwhelms learning because classic methods are designed for small sets of features. Deep Learning has recently emerged with the aim of handling voluminous data: the "deep" concept converts the features into a new, abstracted representation in order to optimize an objective. Although Deep Learning methods are experimentally promising, their parameterization is exhaustive and empirical. To tackle these problems, we exploit the causality and uncertainty handling of the Bayesian Network in order to propose a new Deep Bayesian Network architecture. We provide a new learning algorithm for this multi-layered Bayesian Network with latent variables, and we evaluate the proposed architecture and learning algorithms on benchmark datasets. We used high-dimensional data in order to simulate the Big Data challenges imposed by the volume and veracity aspects, and we demonstrate the effectiveness of our contribution under these constraints. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
21. RSSI-Based for Device-Free Localization Using Deep Learning Technique
- Author
-
Norasmadi Abdul Rahim, Abdul Syafiq Abdull Sukor, Latifah Munirah Kamarudin, Sukhairi Sudin, Ammar Zakaria, and Hiromitsu Nishizaki
- Subjects
wireless networks ,Computer science ,Big data ,02 engineering and technology ,Machine learning ,computer.software_genre ,Discriminative model ,big data ,0202 electrical engineering, electronic engineering, information engineering ,Wireless ,Wearable technology ,Data processing ,Learning classifier system ,machine learning classifier ,Wireless network ,business.industry ,Deep learning ,deep learning ,020206 networking & telecommunications ,device-free localization ,received signal strength ,classification ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer - Abstract
Device-free localization (DFL) has become a hot topic in the paradigm of the Internet of Things. Traditional localization methods focus on locating users with attached wearable devices, which raises privacy concerns and physical discomfort, especially for users who need to wear and activate those devices daily. DFL makes use of the received signal strength indicator (RSSI) to characterize the user's location based on their influence on wireless signals. Existing work utilizes statistical features extracted from wireless signals; however, some features may not perform well in different environments and need to be manually designed for a specific application. Thus, data processing is an important step towards producing robust input data for the classification process. This paper presents experimental procedures using the deep learning approach to automatically learn discriminative features and classify the user's location. Extensive experiments performed in an indoor laboratory environment demonstrate that the approach can achieve 84.2% accuracy compared to other basic machine learning algorithms.
- Published
- 2020
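The RSSI fingerprinting idea behind entry 21 can be sketched with simulated data: each location zone has a characteristic vector of received signal strengths, and a new reading is classified against stored fingerprints. The zone means and noise model below are invented for illustration; the paper replaces this hand-crafted matching step with features learned automatically by a deep network:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical RSSI fingerprints: 3 zones x 4 receivers, values in dBm.
zone_means = np.array([[-40, -60, -70, -80],
                       [-70, -45, -60, -75],
                       [-80, -70, -50, -45]], dtype=float)

def sample(zone, n):
    """Simulate n RSSI readings for a zone (shadowing modeled as Gaussian noise)."""
    return zone_means[zone] + rng.normal(scale=3.0, size=(n, 4))

train = np.vstack([sample(z, 30) for z in range(3)])   # stored fingerprints
labels = np.repeat(np.arange(3), 30)

def classify(reading):
    """Nearest-fingerprint classification of a single RSSI reading."""
    d = np.linalg.norm(train - reading, axis=1)
    return labels[d.argmin()]

test_acc = np.mean([classify(sample(z, 1)[0]) == z
                    for z in range(3) for _ in range(20)])
```

The fragility the abstract notes (features tuned to one environment) shows up here as the hand-picked distance metric; a learned representation is meant to transfer better.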
22. Distributed Framework for Automating Opinion Discretization From Text Corpora on Facebook
- Author
-
Vu Tuan Nguyen, Van Huy Pham, Hiep Xuan Huynh, Cang Thuong Phan, and Nghia Duong-Trung
- Subjects
Text corpus ,General Computer Science ,Discretization ,TensorFlow ,Computer science ,Big data ,Machine learning ,computer.software_genre ,Convolutional neural network ,Apache spark ,03 medical and health sciences ,0302 clinical medicine ,Text mining ,Robustness (computer science) ,convolutional neural networks ,General Materials Science ,030212 general & internal medicine ,business.industry ,030503 health policy & services ,General Engineering ,deep learning ,classification ,opinion mining ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Artificial intelligence ,0305 other medical science ,business ,lcsh:TK1-9971 ,computer - Abstract
Nowadays, the continuous increase in the volume of text corpora and the countless research directions in classification have created a great opportunity, and an unprecedented demand, for a comprehensive evaluation of current achievements in natural language processing research. Unfortunately, few studies have applied the combination of convolutional neural networks (CNN) and Apache Spark to the task of automating opinion discretization. In this paper, the authors propose a new distributed structure for solving an opinion classification problem in text mining by utilizing CNN models and big data technologies on Vietnamese text sources. The proposed framework consists of the implementation concepts a researcher needs in order to perform experiments on text discretization problems. It covers all the steps and components that are usually part of a complete, practical text mining pipeline: acquiring input data, processing and tokenizing it into a vectorial representation, applying machine learning algorithms, applying the trained models to unseen data, and evaluating their accuracy. The development of the framework started with a specific focus on binary text discretization, but soon expanded toward many other text-categorization problems, distributed language modeling, and quantification. Several intensive assessments were conducted to prove the robustness and efficiency of the proposed framework. Given the high accuracy obtained in the experiments (72.99% ± 3.64), one can conclude that applying the proposed distributed framework to the task of opinion discretization on Facebook is feasible.
- Published
- 2019
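The early steps of the pipeline in entry 22 (acquiring text and tokenizing it into a vectorial representation) can be sketched with a plain bag-of-words encoder. The paper itself feeds learned embeddings to a CNN running on Spark, so this stdlib-only version is only a stand-in for that stage:

```python
import re
from collections import Counter

def tokenize(text):
    """Minimal tokenizer standing in for the pipeline's preprocessing step."""
    return re.findall(r"[a-z']+", text.lower())

def vectorize(docs):
    """Bag-of-words counts: the simplest vectorial representation of a corpus
    (the paper's framework would substitute embeddings consumed by a CNN)."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w, c in Counter(tokenize(d)).items():
            v[index[w]] = c
        vectors.append(v)
    return vocab, vectors

docs = ["great product, works well", "terrible product, broke fast"]
vocab, vecs = vectorize(docs)
```

Downstream steps of the pipeline (model training, applying the model to unseen data, evaluation) then consume `vecs` as fixed-length numeric input.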
23. Handling of uncertainty in medical data using machine learning and probability theory techniques: a review of 30 years (1991-2020)
- Author
-
Afsaneh Koohestani, U. Rajendra Acharya, Maryam Panahiazar, Dipti Srinivasan, Mohammad Hossein Zangooei, Adham Beykikhoshk, Moloud Abdar, Saeid Nahavandi, Sadiq Hussain, Mohamad Roshanzamir, Afshin Shoeibi, Roohallah Alizadehsani, Amir F. Atiya, Abbas Khosravi, and Assef Zare
- Subjects
Computer science ,Big data ,Bayesian inference ,0211 other engineering and technologies ,General Decision Sciences ,02 engineering and technology ,Management Science and Operations Research ,Machine learning ,computer.software_genre ,Probability theory ,Health care ,Monte Carlo simulation ,Original Research ,021103 operations research ,business.industry ,Deep learning ,Uncertainty ,Fuzzy systems ,Classification ,Noise ,Theory of computation ,Artificial intelligence ,business ,Raw data ,computer - Abstract
Understanding data and reaching accurate conclusions are of paramount importance in the present era of big data, and machine learning and probability theory methods have been widely used for this purpose in various fields. One critically important yet less explored aspect is capturing and analyzing uncertainties in the data and the model; proper quantification of uncertainty provides valuable information for obtaining an accurate diagnosis. This paper reviews related studies conducted in the last 30 years (from 1991 to 2020) on handling uncertainties in medical data using probability theory and machine learning techniques. Medical data is especially prone to uncertainty due to the presence of noise, so clean, noise-free data is essential for accurate diagnosis, and the sources of noise in medical data need to be identified to address this issue. Based on the medical data obtained by the physician, a diagnosis is made and a treatment plan prescribed; hence, uncertainty is growing in healthcare while knowledge of how to address it remains limited. Our findings indicate that several challenges remain in handling uncertainty in raw medical data and new models. In this work, we summarize the various methods employed to overcome this problem; various novel deep learning techniques have recently been proposed to deal with such uncertainties and improve performance in decision making.
- Published
- 2021
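One concrete way to quantify predictive uncertainty, of the kind entry 23 surveys, is a bootstrap ensemble: refit a model on resampled data and read the spread of its predictions as uncertainty. The "biomarker → outcome" data below is synthetic and the cubic model is an arbitrary choice; this is one technique among the many the review covers:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic noisy 1-D data standing in for a biomarker -> outcome relationship.
x = rng.uniform(-2, 2, size=100)
y = np.sin(x) + 0.2 * rng.normal(size=100)

# Bootstrap ensemble: refit a cubic on resampled data; the spread of the
# ensemble's predictions at a query point is the uncertainty estimate.
x_query = 0.5
preds = []
for _ in range(200):
    idx = rng.integers(0, len(x), len(x))            # resample with replacement
    coef = np.polyfit(x[idx], y[idx], 3)
    preds.append(np.polyval(coef, x_query))
preds = np.array(preds)

mean_pred, std_pred = preds.mean(), preds.std()      # prediction + uncertainty
```

A wide `std_pred` flags inputs where the model's conclusion should not be trusted, which is exactly the information the review argues clinical decision-making needs.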
24. An accurate and dynamic predictive model for a smart M-Health system using machine learning
- Author
-
Kashif Naseer Qureshi, Francesco Piccialli, Sadia Din, Gwanggil Jeon, Naseer Qureshi, K., Din, S., Jeon, G., and Piccialli, F.
- Subjects
Information Systems and Management ,Computer science ,SVM ,Big data ,Decision tree ,Cloud computing ,02 engineering and technology ,Machine learning ,computer.software_genre ,Predictive ,Theoretical Computer Science ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Android (operating system) ,Accuracy ,business.industry ,Deep learning ,05 social sciences ,050301 education ,Classification ,Computer Science Applications ,Support vector machine ,Control and Systems Engineering ,M-Health ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,0503 education ,Cloud storage ,computer ,Software ,Model - Abstract
Nowadays, new highly developed technologies are changing traditional processes in medical and healthcare systems. Emerging Mobile Health (M-Health) systems are examples of novel technologies based on advanced data communication, deep learning, artificial intelligence, cloud computing, big data, and other machine learning methods. Data are collected from sensor nodes and forwarded to local databases over cellular networks, then stored in cloud storage systems; from cloud computing services or medical centres, the data are collected for further analysis. Furthermore, machine learning techniques are used for accurate prediction in disease analysis and for classification. This paper presents a detailed overview of M-Health systems, their model and architecture, technologies and applications, and also discusses statistical and machine learning approaches. We also propose a secure Android-based architecture to collect patient data and a reliable cloud-based model for data storage. Finally, a predictive model able to classify cardiovascular diseases according to their seriousness is discussed, and the proposed prediction model is compared with existing models in terms of accuracy, sensitivity, and specificity. The experimental results are encouraging for the proposed predictive model for an M-Health system. Keywords: Machine Learning, Predictive, Models, M-Health, Classification, SVM, Decision Tree, Accuracy
- Published
- 2020
25. Using deep learning for mobile marketing user conversion prediction
- Author
-
Paulo Cortez, Antoine Moreau, Rui Mendes, Luís Miguel Matos, and Universidade do Minho
- Subjects
Big Data ,Computer science ,Big data ,02 engineering and technology ,Machine learning ,computer.software_genre ,Convolutional neural network ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Conversion Rate (CVR) ,Pruning (decision trees) ,Categorical variable ,Science & Technology ,business.industry ,Multilayer Perceptron ,Deep learning ,Categorical Transformation ,Ciências Naturais::Ciências da Computação e da Informação ,Perceptron ,Classification ,Mobile Performance Marketing ,Binary classification ,Multilayer perceptron ,020201 artificial intelligence & image processing ,Data pre-processing ,Artificial intelligence ,business ,computer - Abstract
Mobile performance marketing is a growing industry due to the massive adoption of smartphones and tablets. In this paper, we explore Deep Multilayer Perceptrons (MLP) to predict the Conversion Rate (CVR) of mobile users that are redirected to ad campaigns (i.e., if there will be a sale). We analyze recent real-world big data provided by a global mobile marketing company. Using a realistic rolling window validation, we conducted several experiments with different datasets (two sampling and two data traffic modes), in which we measure both the predictive binary classification performance and the computational effort. The modeling experiments include: two data preprocessing methods, the popular one-hot encoding and a proposed Percentage Categorical Pruning (PCP); and two MLP learning modes, offline (reset) and online (reuse). Overall, competitive classification results were achieved by the PCP transform and the two MLP learning modes, producing real-time predictions and comparing favorably against a Convolutional Neural Network and a Logistic Regression. This article is a result of the project NORTE-01-0247-FEDER-017497, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). This work was also supported by FCT Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2019.
- Published
- 2019
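Entry 25's Percentage Categorical Pruning (PCP) transform merges category levels rarer than a threshold into a single "other" column before one-hot encoding, shrinking the input dimensionality. A stdlib-only sketch of that idea follows; the exact thresholding details in the paper may differ:

```python
from collections import Counter

def pcp_one_hot(values, threshold=0.05):
    """Sketch of the PCP idea: levels rarer than `threshold` of the data are
    merged into one 'other' column before one-hot encoding."""
    n = len(values)
    freq = Counter(values)
    kept = sorted(v for v, c in freq.items() if c / n >= threshold)
    columns = kept + ["other"]
    rows = []
    for v in values:
        label = v if v in kept else "other"
        rows.append([1 if label == c else 0 for c in columns])
    return columns, rows

# Hypothetical country-code feature: two frequent levels, two rare ones.
cols, rows = pcp_one_hot(["us"] * 50 + ["uk"] * 40 + ["pt"] * 5 + ["de"] * 5,
                         threshold=0.10)
# cols -> ['uk', 'us', 'other']
```

Compared to plain one-hot encoding, which would emit one column per distinct level, the pruned encoding keeps the input width bounded even on high-cardinality categorical features.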
26. A Deep Learning Streaming Methodology for Trajectory Classification.
- Author
-
Kontopoulos, Ioannis, Makris, Antonios, and Tserpes, Konstantinos
- Subjects
MACHINE learning ,AUTOMATIC identification ,CLASSIFICATION ,COMPUTER vision ,STREAMING technology ,BIG data ,DEEP learning - Abstract
Due to the vast amount of available tracking sensors in recent years, high-frequency and high-volume streams of data are generated every day. The maritime domain is no different, as all larger vessels are obliged to be equipped with a vessel tracking system that transmits their location periodically. Consequently, automated methodologies able to extract meaningful information from high-frequency, large volumes of vessel tracking data need to be developed. The automatic identification of vessel mobility patterns from such data in real time is of utmost importance, since it can reveal abnormal or illegal vessel activities in due time. Therefore, in this work, we present a novel approach that transforms streaming vessel trajectory patterns into images and employs deep learning algorithms to accurately classify vessel activities in near real time, tackling the Big Data challenges of volume and velocity. Two real-world data sets collected from terrestrial, vessel-tracking receivers were used to evaluate the proposed methodology in terms of both classification and streaming execution performance. Experimental results demonstrated that the vessel activity classification performance can reach an accuracy of over 96% while achieving sub-second latencies in streaming execution performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
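The core transformation in entry 26, turning a streaming trajectory into an image so that a CNN can classify the activity, can be sketched as a simple rasterization. The grid size and binary occupancy encoding below are assumptions, not the paper's exact settings:

```python
import numpy as np

def trajectory_to_image(points, grid=16):
    """Rasterize a sequence of (lon, lat) points onto a grid x grid occupancy
    image -- the trajectory-to-image step that precedes the CNN classifier."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)          # avoid division by zero
    scaled = (pts - lo) / span * (grid - 1)
    img = np.zeros((grid, grid))
    for x, y in scaled.astype(int):
        img[y, x] = 1.0                             # mark visited cells
    return img

# A straight "transit-like" track (synthetic); a zig-zag "fishing-like" track
# would light up a visibly different pixel pattern.
straight = [(t, t) for t in np.linspace(0, 1, 50)]
img = trajectory_to_image(straight)
```

The appeal of this representation is that distinct mobility patterns (transit, fishing, loitering) become distinct shapes, so standard image classifiers apply directly to the streaming data.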
27. Deep learning for health informatics
- Author
-
Javier Andreu-Perez, Guang-Zhong Yang, Fani Deligianni, Melissa Berthelot, Daniele Ravi, Charence Wong, Benny Lo, and Engineering & Physical Science Research Council (EPSRC)
- Subjects
Technology ,Computer science ,Big data ,SEGMENTATION ,02 engineering and technology ,Health informatics ,030218 nuclear medicine & medical imaging ,0302 clinical medicine ,Health Administration Informatics ,wearable devices ,Health Information Management ,0202 electrical engineering, electronic engineering, information engineering ,Information system ,health informatics ,ARCHITECTURE ,Translational bioinformatics ,Computer Science, Information Systems ,Artificial neural network ,public health ,CONVOLUTIONAL NEURAL-NETWORKS ,Computer Science Applications ,machine learning ,020201 artificial intelligence & image processing ,Computer Science, Interdisciplinary Applications ,Life Sciences & Biomedicine ,Biotechnology ,MRI ,BIG DATA ,Bioinformatics ,medical imaging ,Monitoring, Ambulatory ,SEQUENCE ,CLASSIFICATION ,Multimodality ,03 medical and health sciences ,Humans ,Electrical and Electronic Engineering ,Science & Technology ,business.industry ,MEDICINE ,Deep learning ,RECOGNITION ,Computational Biology ,deep learning ,Data science ,MODEL ,Computer Science ,Artificial intelligence ,Mathematical & Computational Biology ,business ,Medical Informatics - Abstract
With a massive influx of multimodality data, the role of data analytics in health informatics has grown rapidly in the last decade. This has also prompted increasing interest in the generation of analytical, data-driven models based on machine learning in health informatics. Deep learning, a technique with its foundation in artificial neural networks, has emerged in recent years as a powerful tool for machine learning, promising to reshape the future of artificial intelligence. Rapid improvements in computational power, fast data storage, and parallelization have also contributed to the rapid uptake of the technology, in addition to its predictive power and ability to generate automatically optimized high-level features and semantic interpretation from the input data. This article presents a comprehensive, up-to-date review of research employing deep learning in health informatics, providing a critical analysis of the relative merits and potential pitfalls of the technique as well as its future outlook. The paper mainly focuses on key applications of deep learning in the fields of translational bioinformatics, medical imaging, pervasive sensing, medical informatics, and public health.
- Published
- 2017
28. Popularity Prediction of Instagram Posts.
- Author
-
Carta, Salvatore, Podda, Alessandro Sebastian, Recupero, Diego Reforgiato, Saia, Roberto, and Usai, Giovanni
- Subjects
FORECASTING ,NATURAL language processing ,POPULARITY ,BIG data ,DEEP learning ,SOCIAL networks - Abstract
Predicting the popularity of posts on social networks has taken on significant importance in recent years, and several social media management tools now offer solutions to improve and optimize the quality of published content and to enhance the attractiveness of companies and organizations. Scientific research has recently moved in this direction, with the aim of exploiting advanced techniques such as machine learning, deep learning, natural language processing, etc., to support such tools. In light of the above, in this work we aim to address the challenge of predicting the popularity of a future post on Instagram, by defining the problem as a classification task and by proposing an original approach based on Gradient Boosting and feature engineering, which led us to promising experimental results. The proposed approach exploits big data technologies for scalability and efficiency, and it is general enough to be applied to other social media as well. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
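Entry 28 pairs Gradient Boosting with feature engineering to predict post popularity. A from-scratch sketch of gradient boosting with regression stumps under squared loss shows the mechanics; the feature semantics and data are invented, and the paper would use a tuned library implementation rather than this toy loop:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical engineered post features (e.g. scaled follower count, posting
# hour); illustrative only, not the paper's actual feature set.
X = rng.uniform(0, 1, size=(200, 2))
y = (X[:, 0] + 0.2 * X[:, 1] > 0.6).astype(float)   # "popular" vs not

def best_stump(X, residual):
    """Fit a one-feature, one-threshold regression stump to the residuals."""
    best, best_err = (0, 0.5, 0.0, 0.0), np.inf
    for j in range(X.shape[1]):
        for t in np.linspace(0.1, 0.9, 9):
            left = X[:, j] <= t
            lv = residual[left].mean() if left.any() else 0.0
            rv = residual[~left].mean() if (~left).any() else 0.0
            err = ((residual - np.where(left, lv, rv)) ** 2).sum()
            if err < best_err:
                best, best_err = (j, t, lv, rv), err
    return best

# Gradient boosting for squared loss: each stump is fit to the current
# residuals and added with a shrinkage factor of 0.3.
pred = np.full(len(y), y.mean())
stumps = []
for _ in range(30):
    j, t, lv, rv = best_stump(X, y - pred)
    pred += 0.3 * np.where(X[:, j] <= t, lv, rv)
    stumps.append((j, t, lv, rv))

acc = ((pred > 0.5) == y).mean()
```

Each round corrects what the ensemble so far gets wrong, which is why boosting tends to reward exactly the kind of informative engineered features the paper emphasizes.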