Database: Academic Search Index / Journal: knowledge-based systems / Topic: deep learning and machine learning - Searchworks@Jio Institute Digital Library Search Results

Showing total 53 results

1. Understanding cheese ripeness: An artificial intelligence-based approach for hierarchical classification.

Author: Zedda, Luca, Perniciano, Alessandra, Loddo, Andrea, and Di Ruberto, Cecilia
Abstract: Within the contemporary dairy industry, the effective monitoring of cheese ripeness constitutes a critical yet challenging task. This paper proposes the first public dataset encompassing images of cheese wheels that depict various products at distinct stages of ripening and introduces an innovative hybrid approach, integrating machine learning and computer vision techniques to automate the detection of cheese ripeness. By leveraging deep learning and shallow learning techniques, the proposed method endeavors to overcome the limitations associated with conventional assessment methodologies. It aims to provide automation, precision, and consistency in the evaluation of cheese ripeness, delving into a hierarchical classification for the simultaneous classification of distinct cheese types and ripeness levels and presenting a comprehensive solution to enhance the efficiency of the cheese production process. By employing a lightweight hierarchical feature aggregation methodology, this investigation navigates the intricate landscape of preprocessing steps, feature selection, and diverse classifiers. We report a noteworthy achievement, attaining a best F-measure score of 0.991 through the merging of features extracted from EfficientNet and DarkNet-53, opening the field to concretely address the complexity inherent in cheese quality assessment. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Deep transfer learning for automatic speech recognition: Towards better generalization.

Author: Kheddar, Hamza, Himeur, Yassine, Al-Maadeed, Somaya, Amira, Abbes, and Bensaali, Faycal
Subjects: *DEEP learning, *AUTOMATIC speech recognition, *LANGUAGE models, *ARTIFICIAL intelligence, *MACHINE learning, *ACOUSTIC models
Abstract: Automatic speech recognition (ASR) has recently become an important challenge when using deep learning (DL). It requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data distribution characteristics. This assumption, however, is not applicable in some real-world artificial intelligence (AI) applications. Moreover, there are situations where gathering real data is challenging, expensive, or rarely occurring, which cannot meet the data requirements of DL models. Deep transfer learning (DTL) has been introduced to overcome these issues, which helps develop high-performing models using real datasets that are small or slightly different but related to the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to inform the state-of-the-art. A critical analysis is then conducted to identify the limitations and advantages of each framework. Moving on, a comparative study is introduced to highlight the current challenges before deriving opportunities for future research. • ASR can be improved through language model, acoustic model, and multi-task learning. • Fine-tuning and domain adaptation are effective transfer learning methods in ASR. • DTL can enhance ASR using other domains, such as cross-language and cross-corpus tasks. • DTL shows promise in improving ASR accuracy for medical diagnosis. • DTL plays a role in enhancing ASR security against both white and black box attacks. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

3. Multi-label learning with kernel extreme learning machine autoencoder.

Author: Cheng, Yusheng, Zhao, Dawei, Wang, Yibin, and Pei, Gensheng
Subjects: *MACHINE learning, *DEEP learning, *STATISTICAL hypothesis testing, *LABELS, *ITERATIVE learning control, *STATISTICAL learning
Abstract: In multi-label learning, in order to improve the accuracy of classification, many scholars have considered the relationship between features and features, features and labels or labels and labels, but how to combine the correlation among them is rarely studied. Based on this, this paper proposes a multi-label learning algorithm with kernel extreme learning machine autoencoder. Firstly, the label space is reconstructed by using the non-equilibrium labels completion method in the label space. Then, the non-equilibrium labels space information is added to the input node of the kernel extreme learning machine autoencoder network, and the input features are used as the output target. Finally, the kernel extreme learning machine is used for classification. Our method implements the information fusion between features and features, between labels and features, and between labels and labels. Compared with the traditional autoencoder network, the extreme learning machine autoencoder has no iterative process, which reduces the network training time and improves the classification accuracy. The experimental results of the proposed algorithm on open benchmark multi-label data sets show that the KELM-AE algorithm has some advantages over other comparative multi-label learning algorithms, and the statistical hypothesis testing and stability analysis further illustrate the effectiveness of the proposed algorithm. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

4. FPCANet: Fisher discrimination for Principal Component Analysis Network.

Author: Sun, Kai, Zhang, Jiangshe, Yong, Hongwei, and Liu, Junmin
Subjects: *FISHER discriminant analysis, *DEEP learning, *MACHINE learning, *PROBLEM solving, *KERNEL functions
Abstract: With the development of Deep Learning (DL) in recent years, integrating traditional machine learning methods with DL has received a lot of attention. One such representative work is the Principal Component Analysis Network (PCANet), which adopts Principal Component Analysis (PCA) to learn convolutional kernels (or filters) for image classification. Nevertheless, PCANet does not use discriminative information when learning filters. In this paper, based on PCA in the PCANet, we propose a new model called Fisher PCA (FPCA) which combines Fisher Linear Discriminant Analysis (LDA) with PCA. To facilitate the practical calculation, an approximate model of FPCA is given by introducing an intermediate variable. Theoretically, we analyze the relationship between the original FPCA model and its approximate model, and give a convergence analysis of the approximate model. Additionally, stacking the approximate model of FPCA, we also construct a deep network named FPCA Network (FPCANet). Extensive experiments are conducted to compare FPCANet with other state-of-the-art models for classification problems. The results show that the proposed FPCANet can learn features with more discriminative information, thus demonstrating its competitive performance on classification tasks. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

5. A novel adversarial learning framework in deep convolutional neural network for intelligent diagnosis of mechanical faults.

Author: Han, Te, Liu, Chao, Yang, Wenguang, and Jiang, Dongxiang
Subjects: *ARTIFICIAL neural networks, *MACHINE learning, *DEEP learning, *GENERALIZATION, *REGULARIZATION parameter, *MATHEMATICAL regularization
Abstract: In recent years, deep learning has become an emerging research orientation in the field of intelligent monitoring and fault diagnosis for industry equipment. Generally, the success of supervised deep models is largely attributed to a mass of typically labeled data, while it is often limited in real diagnosis tasks. In addition, the diagnostic model trained with data from limited conditions may generalize poorly for conditions not observed during training. To tackle these challenges, adversarial learning is introduced as a regularization into the convolutional neural network (CNN), and a novel deep adversarial convolutional neural network (DACNN) is accordingly proposed in this paper. By adding an additional discriminative classifier, an adversarial learning framework can be developed to train the convolutional blocks with the split data subsets, leading to a minimax two-player game. This process contributes to making the feature representation robust, boosting the generalization ability of the trained model as well as avoiding overfitting with a small size of labeled samples. The comparison studies with respect to conventional deep models on two fault datasets demonstrate the applicability and superiority of the proposed method. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

6. SympGAN: A systematic knowledge integration system for symptom–gene associations network.

Author: Lu, Kezhi, Yang, Kuo, Sun, Hailong, Zhang, Qian, Zheng, Qiguang, Xu, Kuan, Chen, Jianxin, and Zhou, Xuezhong
Subjects: *DEEP learning, *SYSTEM integration, *MACHINE learning, *MEDICAL care, *KNOWLEDGE graphs, *DATABASES
Abstract: Phenotypes (i.e., symptoms and clinical signs) are essential for clinical diagnosis and research related to symptom science and precision health. As clinical observational manifestations of a disease, symptoms are clinically significant because they act as direct causes for patients to seek medical care and the primary indicators for clinicians to provide diagnosis/treatments. However, a comprehensive phenotypic knowledge base and high-quality symptom–gene associations are lacking. Therefore, a thorough understanding of the relationships between symptoms and other entities is urgently needed to support scientific research and clinical health care. In this paper, we constructed a systematic, large-scale, and high-quality symptom–gene associations network system named SympGAN (accessible at http://www.sympgan.org/). We provide access to the database with millions of associations between symptoms, genes, diseases, and drugs, as well as the system for users to search, analyze, perform knowledge inference, and visualize data. We utilize state-of-the-art machine learning and deep learning algorithms as the backbone to form the final dataset. In addition, we utilize the RoBERTa-PubMed neural network for named entity recognition to assist in data screening. The knowledge graph is adopted to organize the relationships between different entities. We adopt ConvE, TuckER, and HypER methods for knowledge completion experiments to validate the quality of final knowledge graph triples. Based on the results, we provide online automatic knowledge inference interfaces. The system, SympGAN, has promising value for disease diagnosis, decision support in health care, precision health, and scientific research, as researchers and practitioners can easily access information about symptoms, diseases, targets, gene ontology, and drugs.
SympGAN is a comprehensive framework designed for the integration of symptom phenotypes, utilizing neural network embeddings and deep information extraction models. We have developed an integrative framework that establishes connections between symptoms and genes. This framework encompasses relationship inference through deep network embedding, literature mining using named entity recognition methods, and manual curation. Consequently, we have created a robust database and knowledge graph containing millions of associations between symptoms, genes, diseases, and drugs. SympGAN is readily accessible at http://www.sympgan.org/, providing users with the ability to search, analyze, perform knowledge inference, and visualize information pertaining to these terminologies. The construction of SympGAN has successfully filled knowledge gaps and established millions of high-quality associations between symptoms, genotypes, diseases, and drugs. It holds tremendous potential for advancing precision health and symptom science.
• We have developed SympGAN, a comprehensive, high-quality, and extensive knowledge graph-based system that encompasses the most comprehensive terminology set of 12,560 symptom phenotypes and their associations with genes, diseases, and drugs.
• SympGAN has made a significant breakthrough by acquiring a comprehensive dataset for the knowledge graph, comprising 401,126 symptom–gene triples. These triples, along with their accompanying data, have undergone meticulous collection procedures to ensure exceptional quality. Our methodology involves employing the RoBERTa-PubMed model for named entity recognition (NER) and conducting literature mining from biomedical studies to gather pertinent information. Furthermore, we utilize sophisticated, high-precision algorithms to infer phenotypic associations with genes.
• The website http://www.sympgan.org/ offers a comprehensive platform that enables users to conduct integrative searches and perform online knowledge inference and analysis. It serves as a centralized hub for exploring clinical knowledge associated with symptoms, as well as related diseases, genes, drugs, and molecular networks. This robust resource facilitates the interpretation and exploration of symptom phenotypes, particularly in understanding their genetic origins. By promoting precision health research and advancing the field of symptom science, http://www.sympgan.org/ significantly contributes to enhancing our understanding of symptoms and the underlying genetic factors involved. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

7. Multi-Head multimodal deep interest recommendation network.

Author: Yang, Mingbao, Zhou, Peng, Li, Shaobo, Zhang, Yuanmeng, Hu, Jianjun, and Zhang, Ansi
Subjects: *DEEP learning, *DISTRIBUTED computing, *REINFORCEMENT learning, *REAL-time computing, *MACHINE learning, *FEATURE extraction
Abstract: From machine learning recommendation to deep learning recommendation, reinforcement learning recommendation, and recommendation model compression, the network structure of the recommendation algorithm becomes more complex, more expressive, and more lightweight with higher real-time performance. However, most of these models are dedicated to recommendations by mining ratings and text information, and researchers have optimized the network structures of many recommendation models. By contrast, mining image information and enriching recommendation model features are less studied, and there is still some room for optimization of deep recommendation model structures and real-time performance. To this end, this paper proposes a model called MMDIN by adding multi-head and multi-modal modules to the DIN model. The multi-modal module extracts image features, which enriches the feature set that the model can use and strengthens the cross-combination and fitting expression capabilities of the model. Meanwhile, the multi-head mechanism extracts features from different dimensions, which further enhances the model prediction performance. Experimental results show that the proposed MMDIN model improves the recommendation prediction performance, and it outperforms state-of-the-art methods in comprehensive indicators. • Adding the Multimodal module allows the model to use richer features. • Add Multi-Head ResNet module to DIN to improve feature extraction capability. • Use Spark for distributed in-memory computing to improve real-time performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

8. Synthesizing credit data using autoencoders and generative adversarial networks.

Author: Oreski, Goran
Subjects: *GENERATIVE adversarial networks, *PROBABILISTIC generative models, *DEEP learning, *RECEIVER operating characteristic curves, *MACHINE learning, *CREDIT risk
Abstract: Data quality is an essential element necessary for the development of a successful machine-learning project. One of the biggest challenges in various real-world application domains is class imbalance. This paper proposes a new framework for oversampling credit data by combining two deep learning techniques: autoencoders and generative adversarial networks. A trivial autoencoder (TAE) is used to change data representation, and modified generative adversarial networks (GAN) are used to create new instances from random noise. The experiment on three different datasets demonstrates that the same classifier achieves a better area under the receiver operating characteristic curve (AUC) on datasets augmented by the proposed framework compared to datasets oversampled by other techniques. Additionally, the results show that datasets balanced by the new framework influence the classifier to change the prediction error types, significantly reducing false negatives, the more expensive misclassification case in imbalanced learning. The improvements are significant, and considering the change in error distribution, the proposed technique is an excellent complement to existing oversampling techniques. • Credit risk datasets are imbalanced and contain discrete and continuous variables. • A new framework for oversampling tabular data is proposed. • Deep learning techniques for generating new data are applied to balance credit datasets. • The classifier performs better on AUC and recall after adding new instances to the dataset. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. An athlete–referee dual learning system for real-time optimization with large-scale complex constraints.

Author: Zhang, Yuchen, Liu, Jizhe, Xu, Yan, and Dong, Zhao Yang
Subjects: *ELECTRIC power system control, *CONSTRAINT satisfaction, *MATHEMATICAL optimization, *INSTRUCTIONAL systems, *MACHINE learning, *CONSTRAINED optimization, *ATHLETE training
Abstract: Constrained optimization (CO) has made a profound impact in solving many real-world problems. Due to the high computation burden in exact solvers, data-driven CO based on machine learning techniques is recently receiving extensive research interests for its capability to solve CO problems in real time. The existing data-driven CO approaches only serve for optimization problems with rather simple constraints that can be directly incorporated into model training. However, constraints that are computationally infeasible or burdensome to evaluate are commonly experienced in realistic optimization applications, especially in the engineering sector. This paper proposes an athlete–referee dual learning system (ARDLS) for end-to-end CO with large-scale complex constraints, where an athlete model is trained as the main optimizer while a referee model is trained as a probabilistic constraint classifier to guide the athlete training. A risk-based constrained loss function is designed to fine-tune the athlete model for constraint satisfaction. A case study on electric power system emergency control application is conducted to validate the proposed ARDLS, where the testing results demonstrate the excellent capability of ARDLS to improve the likelihood of satisfying large-scale complex constraints in CO. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

10. Missing value estimation using clustering and deep learning within multiple imputation framework.

Author: Samad, Manar D., Abrar, Sakib, and Diawara, Norou
Subjects: *MISSING data (Statistics), *DEEP learning, *MACHINE performance, *MACHINE learning, *CLUSTER sampling
Abstract: Missing values in tabular data restrict the use and performance of machine learning, requiring the imputation of missing values. Arguably the most popular imputation algorithm is multiple imputation by chained equations (MICE), which estimates missing values from linear conditioning on observed values. This paper proposes methods to improve both the imputation accuracy of MICE and the classification accuracy of imputed data by replacing MICE's linear regressors with ensemble learning and deep neural networks (DNN). The imputation accuracy is further improved by characterizing individual samples with cluster labels (CISCL) obtained from the training data. Our extensive analyses of six tabular data sets with up to 80% missing values and three missing types (missing completely at random, missing at random, missing not at random) reveal that ensemble or deep learning within MICE is superior to the baseline MICE (b-MICE), both of which are consistently outperformed by CISCL. Results show that CISCL + b-MICE outperforms b-MICE for all percentages and types of missing values. In most experimental cases, our proposed DNN-based MICE and gradient boosting MICE plus CISCL (GB-MICE-CISCL) outperform seven state-of-the-art imputation algorithms. The classification accuracy of GB-MICE-imputed data is further improved by our proposed GB-MICE-CISCL imputation method across all percentages of missing values. Results also reveal a shortcoming of the MICE framework at high percentages of missing values (> 50%) and when the missing type is not random. This paper provides a generalized approach to identifying the best imputation model for a tabular data set based on the percentage and type of missing values. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
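
The chained-equations loop that the abstract above builds on can be sketched in a few lines. This is only an illustration under stated assumptions: it uses two columns and a plain least-squares line as the per-column regressor (the baseline case), it produces a single imputation rather than multiple draws, and it does not attempt the ensemble, DNN, or CISCL variants the paper proposes. The names `fit_line` and `chained_imputation` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def chained_imputation(rows, n_iter=20):
    """Fill None entries in two-column rows by alternating regressions."""
    n_cols = 2
    missing = [[r[c] is None for c in range(n_cols)] for r in rows]
    data = [list(r) for r in rows]

    # Step 1: initialise every missing entry with the observed column mean.
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        mean = sum(observed) / len(observed)
        for i in range(len(data)):
            if missing[i][c]:
                data[i][c] = mean

    # Step 2: cycle over the columns. Each column is regressed on the other
    # using only rows where the target was observed, and its missing entries
    # are replaced by the fitted predictions.
    for _ in range(n_iter):
        for target in range(n_cols):
            other = 1 - target
            train = [i for i in range(len(data)) if not missing[i][target]]
            slope, intercept = fit_line(
                [data[i][other] for i in train],
                [data[i][target] for i in train],
            )
            for i in range(len(data)):
                if missing[i][target]:
                    data[i][target] = slope * data[i][other] + intercept
    return data


# Toy data following y = 2x, with one missing x and one missing y.
rows = [(1.0, 2.0), (2.0, 4.0), (None, 6.0), (4.0, None), (5.0, 10.0)]
completed = chained_imputation(rows)
```

Swapping `fit_line` for a stronger regressor is the kind of change the abstract describes; the loop structure would stay the same.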

11. A Siamese Deep Forest.

Author: Utkin, Lev V. and Ryabinin, Mikhail A.
Subjects: *QUADRATIC programming, *MATHEMATICAL optimization, *MACHINE learning, *DECISION trees, *PROBLEM solving
Abstract: A Siamese Deep Forest (SDF) is proposed in the paper. It is based on the Deep Forest or gcForest proposed by Zhou and Feng and can be viewed as a gcForest modification. It can be also regarded as an alternative to the well-known Siamese neural networks. The SDF uses a modified training set consisting of concatenated pairs of vectors. Moreover, it defines the class distributions in the deep forest as the weighted sum of the tree class probabilities such that the weights are determined in order to reduce distances between similar pairs and to increase them between dissimilar points. We show that the weights can be obtained by solving a quadratic optimization problem. The SDF aims to prevent overfitting which takes place in neural networks when only limited training data are available. The numerical experiments illustrate the proposed distance metric method. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

12. Deep learning based face beauty prediction via dynamic robust losses and ensemble regression.

Author: Bougourzi, F., Dornaika, F., and Taleb-Ahmed, A.
Subjects: *DEEP learning, *CONVOLUTIONAL neural networks, *MACHINE learning, *PERSONAL beauty
Abstract: In the last decade, several studies have shown that facial attractiveness can be learned by machines. In this paper, we address Facial Beauty Prediction from static images. The paper contains three main contributions. First, we propose a two-branch architecture (REX-INCEP) based on merging the architecture of two already trained networks to deal with the complicated high-level features associated with the FBP problem. Second, we introduce the use of a dynamic law to control the behaviour of the following robust loss functions during training: ParamSmoothL1, Huber and Tukey. Third, we propose an ensemble regression based on Convolutional Neural Networks (CNNs). In this ensemble, we use both the basic networks and our proposed network (REX-INCEP). The proposed individual CNN regressors are trained with different loss functions, namely MSE, dynamic ParamSmoothL1, dynamic Huber and dynamic Tukey. Our approach is evaluated on the SCUT-FBP5500 database using the two evaluation scenarios provided by the database creators: 60%–40% split and five-fold cross-validation. In both evaluation scenarios, our approach outperforms the state of the art on several metrics. These comparisons highlight the effectiveness of the proposed solutions for FBP. They also show that the proposed dynamic robust losses lead to more flexible and accurate estimators. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
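
Two of the robust losses named in the abstract above, Huber and Tukey, have standard textbook forms, sketched below. The abstract does not state the dynamic law used to control them, so `linear_schedule` is a purely hypothetical stand-in that moves the loss threshold linearly over the epochs. ParamSmoothL1 is not shown because its definition is not given here.

```python
def huber(residual, delta):
    """Huber loss: quadratic near zero, linear beyond the threshold delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def tukey(residual, c):
    """Tukey biweight loss: saturates at c**2 / 6 for |residual| > c."""
    if abs(residual) > c:
        return c * c / 6.0
    u = 1.0 - (residual / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


def linear_schedule(epoch, n_epochs, start, end):
    """Hypothetical dynamic law: interpolate the threshold from start to end."""
    t = epoch / max(n_epochs - 1, 1)
    return start + t * (end - start)


# Example: the Huber threshold shrinks during training, so large residuals
# are penalised more gently in later epochs.
n_epochs = 10
losses = [huber(3.0, linear_schedule(e, n_epochs, 2.0, 0.5)) for e in range(n_epochs)]
```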

13. Dynamic Successor Features for transfer learning and guided exploration.

Author: Tasfi, Norman, Santana, Eder, Liboni, Luisa, and Capretz, Miriam
Subjects: *MACHINE learning, *REINFORCEMENT learning, *DEEP learning, *ALGORITHMS
Abstract: The Successor Feature framework for Reinforcement Learning algorithms improves task transfer by decomposing the learned state–action value function. The decomposition involves two components, one that captures future-expected state features and the other that models the task-related reward structure. However, successful transfer between tasks depends heavily on how the reward function changes, possibly leading to failure of the original Successor Feature formulation. This paper proposes the Dynamic Successor Feature framework, DynSF, by extending the mathematical formulation of the original Successor Feature framework to center around a learned state-transition model. Under this formulation, the state-transition model dynamically induces the acting policy. The flexibility of DynSF also extends to the architecture, requiring only a state-transition model and a small vector of parameters. This architecture provides immense flexibility in the choice of the model used to learn the state-transition model. The DynSF framework is evaluated and compared to other baseline algorithms through several experiments in a continuous grid world environment, a robotic Reacher, and pixels in the Doom environment. • DynSF enables a state-transition model to be used in Reinforcement Learning. • Policy and discount factors can be parameterized dynamically. • Different supervised ML algorithms can be used to learn the state transition model. • DynSF can be used for guided exploration and transfer learning. • DynSF performs better during task transfer than the Successor Feature framework. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
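
For reference, the decomposition mentioned in the abstract above is usually written as follows in the original Successor Feature framework. The notation is an assumption taken from the common formulation, not from this paper: state features φ, reward weights w, successor features ψ, and discount factor γ. The DynSF extension itself is not reproduced here.

```latex
r(s, a, s') = \phi(s, a, s')^{\top} \mathbf{w},
\qquad
\psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t, s_{t+1}) \,\middle|\, s_0 = s,\ a_0 = a \right],
\qquad
Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} \mathbf{w}
```

The first component captures the expected future state features and the second models the task-related reward structure, which is why transfer depends on how the reward weights change between tasks.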

14. Soft reordering one-dimensional convolutional neural network for credit scoring.

Author: Qian, Hongyi, Ma, Ping, Gao, Songfeng, and Song, You
Subjects: *CONVOLUTIONAL neural networks, *DEEP learning, *CREDIT ratings, *CREDIT scoring systems, *MACHINE learning, *CLASSIFICATION algorithms
Abstract: Credit scoring systems have seen revolutionary development in recent decades, with many classification algorithms being proposed. However, with the increase in the data volume, the performance of traditional algorithms tends to encounter bottlenecks. Although deep learning methods have advantages in handling big data, they are not commonly applied in credit scoring. As one of the most frequently used methods in deep learning, convolutional neural networks (CNNs) use convolutional kernels as feature extraction tools and have been very successful in tasks related to images or text. This is because image and text data naturally have a structural characteristic called spatial local correlation, which means that the pixels or tokens covered by the same convolutional kernel are highly correlated, and they can be jointly processed to extract meaningful feature representations. However, the tabular data used for credit scoring do not naturally have such a characteristic. The main contribution of this paper is to propose a novel end-to-end soft reordering one-dimensional CNN (SR-1D-CNN), which can adaptively reorganize the original tabular data and make them more conducive to CNN learning. Several real-world credit scoring datasets of different sizes are used for a comprehensive comparison with traditional machine learning classifiers and other deep learning methods. The experimental results demonstrate that the soft reordering mechanism can effectively improve the classification performance of the CNN for tabular data. With the increase in the data scale, the proposed approach obtains superior results to those of other benchmark models. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

15. SurvSHAP(t): Time-dependent explanations of machine learning survival models.

Author: Krzyziński, Mateusz, Spytek, Mikołaj, Baniecki, Hubert, and Biecek, Przemysław
Subjects: *DEEP learning, *SURVIVAL analysis (Biometry), *MACHINE learning, *STATISTICAL learning, *PROPORTIONAL hazards models
Abstract: Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods yet are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. It is based on SHapley Additive exPlanations with solid theoretical foundations and a broad adoption among machine learning practitioners. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at https://github.com/MI2DataLab/survshap. • We introduce the first time-dependent explanation for interpreting machine learning survival models. • The proposed SurvSHAP(t) method accurately explains the predicted survival function. • SurvSHAP(t) is able to detect time-dependent variable effects and its aggregation determines the local variable importance. • An open-source implementation of SurvSHAP(t) is available on GitHub. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
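
The idea of a time-dependent explanation can be illustrated by computing Shapley values of a survival function separately at each time point. The sketch below is not the SurvSHAP(t) implementation from the linked repository. It assumes a toy exponential survival model, exact enumeration over feature orderings, and a single reference point `baseline` in place of background data. All names are hypothetical.

```python
import math
from itertools import permutations


def survival(x, t, coefs=(0.8, -0.5, 0.0)):
    """Toy model S(t | x) = exp(-t * exp(coefs . x)); coefs are made up."""
    risk = math.exp(sum(b * v for b, v in zip(coefs, x)))
    return math.exp(-t * risk)


def shapley_at_time(x, baseline, t):
    """Exact Shapley values of S(t | .) for x relative to a baseline point."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = survival(current, t)
        # Switch features from baseline to x one at a time and credit each
        # feature with the change in predicted survival it causes.
        for j in order:
            current[j] = x[j]
            value = survival(current, t)
            phi[j] += value - prev
            prev = value
    return [p / len(orders) for p in phi]


x = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)
times = [0.5, 1.0, 2.0]
# One attribution vector per time point: the explanation is a function of t.
explanation = {t: shapley_at_time(x, baseline, t) for t in times}
```

At every time point the attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the third feature, whose coefficient is zero, receives no attribution.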

16. FL-Defender: Combating targeted attacks in federated learning.

Author: Jebreel, Najeeb Moharram and Domingo-Ferrer, Josep
Subjects: *DEEP learning, *GLOBAL method of teaching, *DATA distribution, *MACHINE learning, *CENTROID
Abstract: Federated learning (FL) enables learning a global machine learning model from data distributed among a set of participating workers. This makes it possible (i) to train more accurate models due to learning from rich, joint training data and (ii) to improve privacy by not sharing the workers' local private data with others. However, the distributed nature of FL makes it vulnerable to targeted poisoning attacks that negatively impact on the integrity of the learned model while, unfortunately, being difficult to detect. Existing defenses against those attacks are limited by assumptions on the workers' data distribution and/or are ill-suited to high-dimensional models. In this paper, we analyze targeted attacks against FL, specifically label-flipping and backdoor attacks, and find that the neurons in the last layer of a deep learning (DL) model that are related to these attacks exhibit a different behavior from the unrelated neurons. This makes the last-layer gradients valuable features for attack detection. Accordingly, we propose FL-Defender to combat FL targeted attacks. It consists of (i) engineering robust discriminative features by calculating the worker-wise angle similarity for the workers' last-layer gradients, (ii) compressing the resulting similarity vectors using PCA to reduce redundant information, and (iii) re-weighting the workers' updates based on their deviation from the centroid of the compressed similarity vectors. Experiments on three data sets show the effectiveness of our method in defending against label-flipping and backdoor attacks. Compared to several state-of-the-art defenses, FL-Defender achieves the lowest attack success rates while maintaining the main task accuracy. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
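
Note on record 16: the three steps named in the FL-Defender abstract (similarity of last-layer gradients, PCA compression, centroid-based re-weighting) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the use of a single principal component found by power iteration, the median as the centroid, and the linear weighting rule are all assumptions made for illustration.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def first_principal_scores(rows, iters=200):
    """Project centered rows onto their first principal component.

    Assumption: one component is enough for the illustration; it is
    found by power iteration on the (unnormalized) covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [float(j + 1) for j in range(d)]  # deterministic, non-uniform start
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all rows identical: nothing to separate
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def worker_weights(last_layer_grads):
    """Return one aggregation weight per worker (weights sum to 1)."""
    n = len(last_layer_grads)
    # (i) worker-wise similarity vectors
    sims = [[cosine(g, h) for h in last_layer_grads] for g in last_layer_grads]
    # (ii) compress each similarity vector (here: to one PCA score)
    scores = first_principal_scores(sims)
    # (iii) re-weight by deviation from the centroid (assumed: the median)
    centroid = statistics.median(scores)
    deviations = [abs(s - centroid) for s in scores]
    max_dev = max(deviations)
    if max_dev == 0.0:
        return [1.0 / n] * n
    raw = [1.0 - dev / max_dev for dev in deviations]  # hypothetical rule
    total = sum(raw)
    return [r / total for r in raw] if total else [1.0 / n] * n


# Four similar (honest) gradients and one that points elsewhere.
grads = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [1.0, 0.0, 0.1],
    [0.95, 0.1, 0.05],
    [-1.0, 0.0, 0.9],
]
weights = worker_weights(grads)
```

In this toy input the fifth worker deviates most from the centroid, so the assumed rule gives it the smallest weight. A real system would keep more principal components and tune the weighting rule.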

17. Hyperparameter optimization through context-based meta-reinforcement learning with task-aware representation.

Author: Wu, Jia, Liu, Xiyuan, and Chen, Senpeng
Subjects: *METAHEURISTIC algorithms, *REINFORCEMENT learning, *MACHINE learning, *DEEP learning, *DECISION making
Abstract: In this paper, we combine context-based Meta-Reinforcement Learning with task-aware representation to efficiently overcome data-inefficiency and limited generalization in the hyperparameter optimization problem. First, we propose a new context-based meta-RL model that disentangles task inference and control, which improves the meta-training efficiency and accelerates the learning process for unseen tasks. Second, the task properties are inferred on-line, which includes not only the dataset representation but also the task-solving experience, thus encouraging the agent to explore in a much smarter fashion. Third, we employ amortized meta-learning to meta-train the agent, which is simple and runs faster than the gradient-based meta-training method. Experimental results suggest that our method can search for the optimal hyperparameter configuration with limited computational cost in a reasonable time. • A task encoder is proposed to achieve the task inference. • The rich task-specific information allows the agent to make better decisions. • An amortized meta-learning approach is proposed to accelerate the meta-training. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

18. Variational autoencoder densified graph attention for fusing synonymous entities: Model and protocol.

Author: Li, Qian, Wang, Daling, Feng, Shi, Song, Kaisong, Zhang, Yifei, and Yu, Ge
Subjects: *KNOWLEDGE graphs, *NOUN phrases (Grammar), *KNOWLEDGE representation (Information theory), *REPRESENTATIONS of graphs, *MACHINE learning, *GRAPH algorithms, *DEEP learning
Abstract: The prediction of missing links of open knowledge graphs (OpenKGs) poses unique challenges compared with well-studied curated knowledge graphs (CuratedKGs). Unlike CuratedKGs whose entities are fully disambiguated against a fixed vocabulary, OpenKGs consist of entities represented by non-canonicalized free-form noun phrases and do not require an ontology specification, which drives the synonymity (multiple entities with different surface forms have the same meaning) and sparsity (a large portion of entities with few links). How to capture synonymous features in such sparse situations and how to evaluate the multiple answers pose challenges to existing models and evaluation protocols. In this paper, we propose VGAT, a variational autoencoder densified graph attention model to automatically mine synonymity features, and propose CR, a cluster ranking protocol to evaluate multiple answers in OpenKGs. For the model, VGAT investigates the following key ideas: (1) phrasal synonymity encoder attempts to capture phrasal features, which can automatically make entities with synonymous texts have closer representations; (2) neighbor synonymity encoder mines structural features with a graph attention network, which can recursively make entities with synonymous neighbors closer in representations. (3) densification attempts to densify the OpenKGs by generating similar embeddings and negative samples. For the protocol, CR is designed from the significance and compactness perspectives to comprehensively evaluate multiple answers. Extensive experiments and analysis show the effectiveness of the VGAT model and rationality of the CR protocol. • Propose a new model to automatically mine synonymous features. • Design a novel evaluation protocol to evaluate multiple answers. • Densify OpenKGs by variational autoencoder and negative samples. • Improve the performance of link prediction. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

19. Referent graph embedding model for name entity recognition of Chinese car reviews.

Author: Fang, Zhao, Zhang, Qiang, Kok, Stanley, Li, Ling, Wang, Anning, and Yang, Shanlin
Subjects: *DEEP learning, *AUTOMOBILES, *PROBLEM solving, *MACHINE learning, *NATURAL language processing
Abstract: Name entity recognition (NER) is one of the most basic tasks for extracting information from Internet text. Chinese NER remains a major challenge due to the language complexity. Although researchers have recently used domain knowledge to embed word-level information into the deep learning models to deal with the Chinese NER, they have not considered the global interdependence between word-level information, i.e., the entities in the same document should be semantically related to each other. In addition, domain knowledge often cannot be used efficiently due to the presence of irregular expressions in the Internet text, such as abbreviations and aliases. In this paper, we propose a referent graph embedding model for the NER, specifically concentrating on the Chinese car review. First, domain knowledge is used to generate character-level candidate entities and model the global interdependence between these entities based on the referent graph model. Second, the latest BERT-based character vectors and the character-level candidate entities are jointly embedded into the deep learning model to perform the NER. Last, Chinese car reviews are collected and labeled for use as the experimental dataset. The experimental results demonstrate the efficiency and effectiveness of the proposed model for the Chinese car NER task compared with the other state-of-the-art models. • This paper proposes an RGE-NER model to solve the NER problem for Chinese car reviews. The model innovatively combines referent graph and deep learning models. • Word-based and pronunciation-based methods are designed to expand the index range of entities in lexicon, which reduces the impact of irregular text expressions. • The latest BERT-based character vectors and the character-level candidate entities are jointly embedded into the deep learning model to perform the NER. • The embedding weights of candidate entities are measured by exploiting the global interdependence between candidate entities instead of algorithm learning, which significantly reduces the time complexity while improving the model performance. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

20. A comprehensive survey on sentiment analysis: Approaches, challenges and trends.

Author: Birjali, Marouane, Kasri, Mohammed, and Beni-Hssane, Abderrahim
Subjects: *SENTIMENT analysis, *BLOGS, *SOCIAL networks, *PRODUCT reviews
Abstract: Sentiment analysis (SA), also called Opinion Mining (OM), is the task of extracting and analyzing people's opinions, sentiments, attitudes, perceptions, etc., toward different entities such as topics, products, and services. The fast evolution of Internet-based applications like websites, social networks, and blogs leads people to generate enormous heaps of opinions and reviews about products, services, and day-to-day activities. Sentiment analysis serves as a powerful tool for businesses, governments, and researchers to extract and analyze public mood and views, gain business insight, and make better decisions. This paper presents a complete study of sentiment analysis approaches, challenges, and trends, to give researchers a global survey on sentiment analysis and its related fields. The paper presents the applications of sentiment analysis and describes the generic process of this task. Then, it reviews, compares, and investigates the approaches used to give an exhaustive view of their advantages and drawbacks. The challenges of sentiment analysis are discussed next to clarify future directions. • Sentiment analysis is constantly evolving through approaches, data and models. • The paper provides an unprecedented and comprehensive survey on sentiment analysis. • Traditional and recent models are discussed, compared and classified. • Pointing out the reasons to select the proper model for sentiment analysis. • The paper summarizes the sentiment analysis models to monitor future trends. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

21. HADA: An automated tool for hardware dimensioning of AI applications.

Author: De Filippo, Allegra, Borghesi, Andrea, Boscarino, Andrea, and Milano, Michela
Subjects: *ARTIFICIAL intelligence, *ONLINE algorithms, *DEEP learning, *MACHINE learning, *HARDWARE
Abstract: In recent years, the uptake of Artificial Intelligence (AI) in industry is increasing. For many AI techniques, like Deep Learning, optimization, planning, etc., computational and storage requirements are significant. The problem of determining what is the right hardware (HW on premise or on the cloud) architecture and its dimensioning for AI algorithms is still crucial. Searching for the optimal solution is often challenging, as it is not trivial to anticipate the behavior of an algorithm on diverse architectures. This is especially true if the AI application must respect quality-of-service constraints or budgets. In this scenario, having an automated decision support tool to match algorithms, user constraints and HW resources would be a great advantage for companies and practitioners working with AI applications. In this paper, we tackle this challenge with an approach that relies on the Empirical Model Learning paradigm, based on the integration of Machine Learning (ML) models into an optimization problem. The key idea is to integrate domain knowledge held by experts with data-driven models that learn the relationships between HW requirements and AI algorithm performances. In particular, the approach starts with benchmarking multiple AI algorithms on different HW resources, generating data used to train ML models; then, optimization is used to find the best HW configuration that respects user-defined constraints (e.g., budget, time, solution quality). In the experimental evaluation we validate our approach on a complex problem, namely online algorithms for energy systems, an area characterized by uncertainty and tight HW and real-time constraints. Results show the effectiveness of our approach and its flexibility: We can train the ML models only once and reuse them in the optimization model to tackle a variety of problems, determined by different data instances and user-defined constraints. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

22. JointMatcher: Numerically-aware entity matching using pre-trained language models with attention concentration.

Author: Ye, Chen, Jiang, Shihao, Zhang, Hua, Wu, Yifan, Shi, Jiankai, Wang, Hongzhi, and Dai, Guojun
Subjects: *DEEP learning, *MACHINE learning
Abstract: Entity matching (EM) aims to identify whether two records refer to the same underlying real-world entity. Traditional entity matching methods mainly focus on structured data, where the attribute values are short and atomic. Recently, there has been an increasing demand for matching textual records, such as matching descriptions of products that correspond to long spans of text, which challenges the applications of these methods. Although a few deep learning (DL) solutions have been proposed, these solutions tend to "directly" use the DL techniques and treat the EM as NLP tasks without determining the unique demand for the EM task. Thus, the performance of these DL-based solutions is still far from satisfactory. In this paper, we present JointMatcher, a novel EM method based on the pre-trained Transformer-based language models so that the generated features of the textual records contain the context information. We realize that more attention paid to the similar segments and number-contained segments of the record pair is crucial for accurate matching. To integrate the highly contextualized features with the consideration of paying more attention to the similar segments and the number-contained segments, JointMatcher is equipped with the relevance-aware encoder and the numerically-aware encoder. Extensive experiments using structured and real-world textual datasets demonstrated that JointMatcher outperforms the previous state-of-the-art (SOTA) results without injecting any domain knowledge when small or medium size training sets are used. • The pitfalls overlooked by existing pre-trained LM-based EM methods are identified. • A novel pre-trained LM-based EM model JointMatcher is developed. • Two encoders are proposed to pay more attention to the important segments of the input record pair. • Experimental results show JointMatcher achieves good performance under limited training data. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

23. Multimodality in meta-learning: A comprehensive survey.

Author: Ma, Yao, Zhao, Shilin, Wang, Weixiao, Li, Yaoman, and King, Irwin
Subjects: *MULTIMODAL user interfaces, *MACHINE learning, *DEEP learning
Abstract: Meta-learning has gained wide popularity as a training framework that is more data-efficient than traditional machine learning methods. However, its generalization ability in complex task distributions, such as multimodal tasks, has not been thoroughly studied. Recently, some studies on multimodality-based meta-learning have emerged. This survey provides a comprehensive overview of the multimodality-based meta-learning landscape in terms of the methodologies and applications. We first formalize the definition of meta-learning in multimodality, along with the research challenges in this growing field, such as how to enrich the input in few-shot learning (FSL) or zero-shot learning (ZSL) in multimodal scenarios and how to generalize the models to new tasks. We then propose a new taxonomy to discuss typical meta-learning algorithms in multimodal tasks systematically. We investigate the contributions of related papers and summarize them by our taxonomy. Finally, we propose potential research directions for this promising field. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

24. Target-level sentiment analysis for news articles.

Author: Žitnik, Slavko, Blagus, Neli, and Bajec, Marko
Subjects: *SENTIMENT analysis, *USER-generated content, *MACHINE learning, *BLOGS, *TEXT mining, *DEEP learning, *SOCIAL media
Abstract: The rapid growth of social media, news sites, and blogs increases the opportunity to express and share an opinion on the Internet. Researchers from different fields take advantage of nearly limitless data. Thus, in the past decade, opinion mining or sentiment analysis has become an important research discipline. In this paper, we focus on the target-level sentiment analysis, wherein the task is to predict the sentiment concerning specific (multiple) entities that appear as coreference mentions throughout the document. We created a new annotated dataset of Slovene news articles, additionally annotated with named entities and coreferences that are the basis for the proposed task. Using entity-document representation, we compared the task with the traditional sentiment analysis, evaluating traditional machine learning and deep neural network approaches. According to existing approaches, the proposed task represents a challenging problem. The results show that we can achieve the best results using a customised BERT adapter (a minor improvement over a standard text-classification adapter). We outperformed existing aspect-based state-of-the-art approaches by 13%, reaching up to 77% accuracy and a 73% F1 score. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

25. Building interpretable models for business process prediction using shared and specialised attention mechanisms.

Author: Wickramanayake, Bemali, He, Zhipeng, Ouyang, Chun, Moreira, Catarina, Xu, Yue, and Sindhgatta, Renuka
Subjects: *DEEP learning, *PREDICTION models, *MACHINE learning, *FORECASTING, *BUSINESS intelligence
Abstract: Predictive process analytics, often underpinned by deep learning techniques, is a newly emerged discipline dedicated to providing business process intelligence in modern organisations. Whilst accuracy has been a dominant criterion in building predictive capabilities, the use of deep learning techniques comes at the cost of the resulting models being used as 'black boxes', i.e., they are unable to provide insights into why a certain business process prediction was made. So far, little attention has been paid to interpretability in the design of deep learning-based process predictive models. In this paper, we address the 'black-box' problem in the context of predictive process analytics by developing attention-based models that are capable of explaining both what a process prediction is and why it was made. We propose i) two types of attention: event attention to capture the impact of specific events on a prediction, and attribute attention to reveal which attribute(s) of an event influenced the prediction; and ii) two attention mechanisms: a shared attention mechanism and a specialised attention mechanism, reflecting different design decisions on whether to construct attribute attention on individual input features (specialised) or using the concatenated feature tensor of all input feature vectors (shared). These lead to two distinct attention-based models, and both are interpretable models that incorporate interpretability directly into the structure of a process predictive model. We conduct experimental evaluation of the proposed models using real-life data and comparative analysis between the models for accuracy and interpretability, and draw insights from the evaluation and analysis results. The results demonstrate that i) the proposed attention-based models can achieve reasonably high accuracy; ii) both are capable of providing relevant interpretations (when validated against domain knowledge); and iii) whilst the two models perform equally in terms of prediction accuracy, the specialised attention-based model tends to provide more relevant interpretations than the shared attention-based model, reflecting the fact that the specialised attention-based model is designed to facilitate better interpretability. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

26. CPDGA: Change point driven growing auto-encoder for lifelong anomaly detection.

Author: Corizzo, Roberto, Baron, Michael, and Japkowicz, Nathalie
Subjects: *OBJECT recognition (Computer vision), *DEEP learning, *FOREST productivity, *STATISTICAL learning, *MACHINE learning
Abstract: Lifelong learning addresses the challenge of acquiring new knowledge and tackling new tasks in a continually evolving environment. Although this thread of research has recently received increased interest, most lifelong machine learning approaches proposed thus far focus on object recognition or classification tasks. In contrast, lifelong approaches for anomaly detection are still unexplored. This paper presents a method for lifelong anomaly detection loosely based on biological principles, which can adapt to the environment and efficiently recall old information from its memory bank. Inspired by the interaction between the cortex and the hippocampus in biology, we combine deep learning with statistical change point detection. Our method induces concepts from its environment and organizes them in a semantically coherent forest structure in an unsupervised manner. At runtime, we analyze new objects, one by one, with respect to the current forest of concepts. If a new object fits an existing concept, it is added to the pool of objects representing that concept. Otherwise, it is further analyzed to determine whether it represents a new concept, a new sub-concept, or it is an anomaly. Experiments conducted over different applied settings show that the synergic interaction of change point detection with an evolving forest of concepts yields a higher anomaly detection performance than state-of-the-art methods. • A Change Point Driven Growing Autoencoder (CPDGA) for lifelong anomaly detection. • Unsupervised concept formation and memory organization in a forest structure. • Hierarchical knowledge is continually updated and exploited for anomaly detection. • Competitive high anomaly detection performance in complex real-world domains. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

27. ConfusionVis: Comparative evaluation and selection of multi-class classifiers based on confusion matrices.

Author: Theissler, Andreas, Thomas, Mark, Burch, Michael, and Gerschner, Felix
Subjects: *BALEEN whales, *DEEP learning, *MACHINE learning, *MARINE biologists, *MATRICES (Mathematics), *FEATURE selection, *WILDLIFE conservation
Abstract: In machine learning, the presumably best model is selected from a variety of model candidates generated by testing different model types, hyperparameters, or feature subsets. The advent of deep learning has made model selection even more challenging due to the huge parameter search space. Relying on a single metric to select the best model does not consider class imbalances or the different costs of misclassifications. We argue that incorporating human knowledge to interactively analyse the per-class errors and class confusions over all model candidates enables a more efficient training process and yields better models for given applications. This paper proposes the model-agnostic approach ConfusionVis which allows to comparatively evaluate and select multi-class classifiers based on their confusion matrices. This contributes to making the models' results understandable, while treating the models as black boxes. Therefore, we propose a novel method to measure and visualise distances between confusion matrices and an interactive query interface to incorporate all composition levels of class errors. The approach is evaluated in a user study and the applicability is shown by a case study where marine biologists investigate the conservation efforts of baleen whales by classifying whale species in acoustic recordings. ConfusionVis is available online: https://www.ml-and-vis.org/confusionvis. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

28. Deep semi-supervised learning with contrastive learning and partial label propagation for image data.

Author: Gan, Yanglan, Zhu, Huichun, Guo, Wenjing, Xu, Guangwei, and Zou, Guobing
Subjects: *SUPERVISED learning, *DEEP learning, *DATA augmentation, *MACHINE learning, *LEARNING modules, *ARTIFICIAL neural networks
Abstract: Deep semi-supervised learning is becoming an active research topic because it jointly utilizes labeled and unlabeled samples in training deep neural networks. Recent advances are mainly focused on inductive semi-supervised learning which generally extends supervised algorithms to include unlabeled data. In this paper, we propose CL_PLP, a new transductive deep semi-supervised learning algorithm based on contrastive self-supervised learning and partial label propagation. The proposed method consists of two modules, contrastive self-supervised learning module extracting features from labeled and unlabeled data and partial label propagation module generating confident pseudo-labels through label propagation. For contrastive learning, we propose an improved twins network model by adding multiple projector layers and the contrastive loss term. Meanwhile, we adopt strong and weak data augmentation to increase the diversity of the dataset and the robustness of the model. For the partial label propagation module, we interrupt the label propagation process according to the quality of pseudo-labels and improve the impact of high-quality pseudo-labels. The performance of our algorithm on three standard baseline datasets CIFAR-10, CIFAR-100 and miniImageNet is better than previous state-of-the-art transductive deep semi-supervised learning methods. By transferring our model to the medical COVID19-Xray dataset, it also achieves good performance. Finally, we propose a strategy to integrate our partial label propagation module with inductive semi-supervised learning method, and the results prove that it can further improve their performance and obtain additional high-quality pseudo-labels for the unlabeled data. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

29. Explainable machine learning in image classification models: An uncertainty quantification perspective.

Author: Zhang, Xiaoge, Chan, Felix T.S., and Mahadevan, Sankaran
Subjects: *DEEP learning, *MACHINE learning, *CUMULATIVE distribution function, *BAYESIAN analysis, *DIFFERENTIAL evolution, *CLASSIFICATION
Abstract: The poor explainability of deep learning models has hindered their adoption in safety and quality-critical applications. This paper focuses on image classification models and aims to enhance the explainability of deep learning models through the development of an uncertainty quantification-based framework. The proposed methodology consists of three major steps. In the first step, we adopt dropout-based Bayesian neural network to characterize the structure and parameter uncertainty inherent in deep learning models, propagate and represent such uncertainties to the model prediction as a distribution. Next, we employ entropy as a quantitative indicator to measure the uncertainty in model prediction, and develop an Empirical Cumulative Distribution Function (ECDF)-based approach to determine an appropriate threshold value for the purpose of deciding when to accept or reject the model prediction. Secondly, in the cases with high model prediction uncertainty, we combine the prediction difference analysis (PDA) approach with dropout-based Bayesian neural network to quantify the uncertainty in pixel-wise feature importance, and identify the locations in the input image that highly correlate with the model prediction uncertainty. In the third step, we develop a robustness-based design optimization formulation to enhance the relevance between input features and model prediction, and leverage a differential evolution approach to optimize the pixels in the input image with high uncertainty in feature importance. Experimental studies in MNIST and CIFAR-10 image classifications are included to demonstrate the effectiveness of the proposed approach in increasing the explainability of deep learning models. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
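
Note on record 29: the accept/reject step described in the abstract (entropy of a dropout-based predictive distribution, compared against a threshold read off an empirical cumulative distribution function) can be sketched as follows. This is an illustration only and not the authors' code. The function names and the quantile rule used to pick the threshold are assumptions, and the probability vectors stand in for repeated dropout forward passes of a real network.

```python
import math


def predictive_entropy(mc_probs):
    """Entropy of the mean class distribution over stochastic passes.

    mc_probs: list of probability vectors, one per dropout forward pass.
    """
    passes = len(mc_probs)
    classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / passes for c in range(classes)]
    return -sum(q * math.log(q) for q in mean if q > 0.0)


def ecdf_threshold(entropies, quantile=0.9):
    """Smallest observed entropy whose ECDF value reaches `quantile`.

    Assumption: the threshold is taken as a fixed quantile of entropies
    collected on reference data; the abstract does not state the rule.
    """
    ordered = sorted(entropies)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / n >= quantile:
            return value
    return ordered[-1]


def accept_prediction(mc_probs, threshold):
    """Accept the model prediction only if its entropy is within threshold."""
    return predictive_entropy(mc_probs) <= threshold


# Hypothetical dropout outputs for a confident and an uncertain input.
confident = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]
uncertain = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30]]
```

Inputs whose entropy exceeds the threshold would be passed on for the closer, pixel-level analysis that the abstract describes.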

30. Deep learning for missing value imputation of continuous data and the effect of data discretization.

Author: Lin, Wei-Chao, Tsai, Chih-Fong, and Zhong, Jia Rong
Subjects: *MISSING data (Statistics), *DEEP learning, *MACHINE learning, *DATA mining
Abstract: Often real-world datasets are incomplete and contain some missing attribute values. Furthermore, many data mining and machine learning techniques cannot directly handle incomplete datasets. Missing value imputation is the major solution for constructing a learning model to estimate specific values to replace the missing ones. Deep learning techniques have been employed for missing value imputation and demonstrated their superiority over many other well-known imputation methods. However, very few studies have attempted to assess the imputation performance of deep learning techniques for tabular or structured data with continuous values. Moreover, the effect on the imputation results when the continuous data need to be discretized has never been examined. In this paper, two supervised deep neural networks, i.e., multilayer perceptron (MLP) and deep belief networks (DBN), are compared for missing value imputation. Moreover, two differently ordered combinations of data discretization and imputation steps are examined. The results show that MLP and DBN significantly outperform the baseline imputation methods based on the mean, KNN, CART, and SVM, with DBN performing the best. On the other hand, when considering the discretization of continuous data, the order in which the two steps are combined is not the most important, but rather, the chosen imputation algorithm. That is, the final performance is much better when using DBN for imputation, regardless of whether discretization is performed in the first or second step, than the other imputation methods. • Deep learning for imputing missing continuous values of tabular or structured data is studied. • In particular, multilayer perceptron (MLP) and deep belief networks (DBN) are employed. • Two different ordered combinations of data discretization and imputation steps are examined. • MLP and DBN significantly outperform the baseline imputation methods. • DBN is the better choice for imputation when the discretization of continuous data is required. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

31. A latent batch-constrained deep reinforcement learning approach for precision dosing clinical decision support.

Author: Qiu, Xihe, Tan, Xiaoyu, Li, Qiong, Chen, Shaotao, Ru, Yajun, and Jin, Yaochu
Subjects: *REINFORCEMENT learning, *DEEP learning, *INTENSIVE care units, *CLINICAL decision support systems, *MEDICAL prescriptions, *RECOMMENDER systems
Abstract: Precise prescription of medication dosing is crucial to patients, especially among Intensive Care Unit (ICU) patients. However, improper administration of some sensitive therapeutic medications (e.g., heparin) might place patients at unneeded, even life-threatening, risk. Numerous factors such as a patient's clinical phenotype, genotype, and environmental factors will affect the heparin dose response. As a result, it is challenging to prescribe the optimal initial dose of heparin. In this paper, an individualized dosing policy is proposed to determine the optimal initial dose and minimize the risk of mis-dosing, as well as protecting patients from late complications associated with medication dosing. A latent batch-constrained deep reinforcement learning (RL) algorithm is proposed to guarantee the safety of the medication recommendation system. The agent can observe a latent representation of patients and generate medication dosing solutions in successive and limited action spaces. The individualized dosing policy aims to reduce the extrapolation errors in the off-policy algorithms, by evaluating over-dosing and under-dosing of heparin in patients. Our results evaluated on Medical Information Mart for Intensive Care III (MIMIC-III) database demonstrate that the latent batch-constrained RL algorithm can work effectively from the retrospective data, showing promise to be used in future medication dosing policies. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

32. Intelligent knowledge consolidation: From data to wisdom.

Author: Hussain, Musarrat, Satti, Fahad Ahmed, Ali, Syed Imran, Hussain, Jamil, Ali, Taqdir, Kim, Hun-Sung, Yoon, Kun-Ho, Chung, TaeChoong, and Lee, Sungyoung
Subjects: *ARTIFICIAL intelligence, *MACHINE learning, *DECISION making, *LOGISTIC regression analysis, *WISDOM, *DEEP learning, *MODAL logic
Abstract: Knowledge based systems have accomplished remarkable achievements in assisting evidence based decision making for complex problems. However, machine learning-driven, intelligent systems of today are dependent on the underlying knowledge model, which is acquired from domain experts, or the available datasets in a structured or unstructured format. Most of the existing literature utilized a single modal, while very few have combined multi-modalities (mainly two) for knowledge acquisition. In order to achieve a strong Artificial Intelligence, multi-domain and multi-modal knowledge acquisition, and consolidation is required. This paper presents the research work, driving the realization of such a comprehensive framework, in the field of healthcare. Using area specific, state-of-the-art machine learning techniques, we first extract knowledge from structured and unstructured data, which is consolidated with expert knowledge and managed through ripple down rules. Our presented technique shows an accuracy of 92.05%, which is much higher than single modal deep learning at 78.20%, naive bayes at 69.70%, logistic regression at 61.20%, expert driven knowledge at 86.02%, and naive knowledge combination at 70.86%. Thus, through the application of our proposed technique, we provide the foundations for an accurate and evolvable knowledge-base, that can greatly enhance decision making in the healthcare domain. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

33. Deep reinforcement learning for transportation network combinatorial optimization: A survey.

Author: Wang, Qi and Tang, Chunlei
Subjects: *DEEP learning, *REINFORCEMENT learning, *COMBINATORIAL optimization, *VEHICLE routing problem, *MACHINE learning, *HEURISTIC algorithms
Abstract: Traveling salesman and vehicle routing problems with their variants, as classic combinatorial optimization problems, have attracted considerable attention for decades because of their theoretical and practical value. Many classic algorithms have been proposed, for example, exact algorithms, heuristic algorithms, solution solvers, etc. Still, due to their complexity, even the most advanced traditional methods require too much computational time or are not well-defined mathematically; algorithm-based decision-making is no exception. Also, these methods cannot be generalized to a larger scale or other similar problems. With the latest developments in machine and deep learning, people believe it is feasible to apply reinforcement learning and other technologies in the decision-making or heuristic for learning combinatorial optimization. In this paper, we first gave an overview on how to combine deep reinforcement learning with NP-hard combinatorial optimization, emphasizing general optimization problems as data points and exploring the relevant distribution of data used for learning in a given task. We next reviewed state-of-the-art learning techniques related to combinatorial optimization problems on graphs. Then, we summarized the experimental methods of using reinforcement learning to solve combinatorial optimization problems and analyzed the performance comparison of different algorithms. Lastly, we sorted out the challenges encountered by deep reinforcement learning in solving combinatorial optimization problems and future research directions. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

34. AMFF: A new attention-based multi-feature fusion method for intention recognition.

Author: Liu, Cong and Xu, Xiaolong
Subjects: *DEEP learning, *QUESTION answering systems, *CONVOLUTIONAL neural networks, *INTENTION, *MACHINE learning
Abstract: Intention recognition is based on a dialog between users to identify their real intentions, which plays a key role in the question answering system. However, the content of a dialog is usually in the form of short text. Due to data sparsity, many current classification models show poor performance on short text. To address this issue, we propose AMFF, an attention-based multi-feature fusion method for intention recognition. In this paper, we enrich short text features by fusing features extracted from term frequency-inverse document frequency (TF-IDF), convolutional neural networks (CNNs) and long short-term memory (LSTM). For the purpose of measuring the important features, we utilize the attention mechanisms to assign weights for the fusion features. Experimental results on the TREC, SST1 and SST2 datasets demonstrate that the proposed AMFF model outperforms traditional machine learning models and typical deep learning models on short text classification. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

35. Knowledge distillation via instance-level sequence learning.

Author: Zhao, Haoran, Sun, Xin, Dong, Junyu, Dong, Zihe, and Li, Qiong
Subjects: *MACHINE learning, *KNOWLEDGE transfer, *COMPUTER vision, *STATISTICAL sampling, *DEEP learning
Abstract: Recently, distillation approaches for extracting general knowledge from a teacher network to guide a student network have been suggested. Most existing methods transfer knowledge from the teacher to the student network by feeding a sequence of random minibatches sampled uniformly from the data. We argue that, instead, a compact student network should be guided gradually using samples ordered in a meaningful sequence. Thus, the gap in feature representation between the teacher and student network can be bridged step by step. In this paper, we provide a curriculum learning knowledge distillation framework via instance-level sequence learning. It employs the student network of the early epoch as a snapshot to create a curriculum for the student network's next training phase. We performed extensive experiments using the CIFAR-10, CIFAR-100, SVHN, and CINIC-10 datasets. When compared with several state-of-the-art methods, our framework achieved the best performance with fewer iterations. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

36. Oriented stochastic loss descent algorithm to train very deep multi-layer neural networks without vanishing gradients.

Author: Abuqaddom, Inas, Mahafzah, Basel A., and Faris, Hossam
Subjects: *ALGORITHMS, *RANDOM numbers, *MACHINE learning, *MATHEMATICAL optimization, *DIRECTIONAL derivatives
Abstract: Deep multi-layer neural networks represent hypotheses of very high degree polynomials to solve very complex problems. Gradient descent optimization algorithms are utilized to train such deep networks through backpropagation, which suffers from permanent problems such as the vanishing gradient problem. To overcome the vanishing problem, we introduce a new anti-vanishing back-propagated learning algorithm called oriented stochastic loss descent (OSLD). OSLD updates a random-initialized parameter iteratively in the opposite direction of its partial derivative sign by a small positive random number, which is scaled by a tuned ratio of the model loss. This paper compares OSLD to stochastic gradient descent algorithm as the basic backpropagation algorithm and Adam as one of the best backpropagation algorithms in five benchmark models. Experimental results show that OSLD is very competitive to Adam in small and moderate depth models, and OSLD outperforms Adam in very long models. Moreover, OSLD is compatible with current backpropagation architectures except for learning rates. Finally, OSLD is stable and opens more choices in front of the very deep multi-layer neural networks. • A new back-propagated gradient descent optimization algorithm without a vanishing problem is proposed. • The proposed algorithm is called oriented stochastic loss descent (OSLD). • OSLD updates w_i in the opposite side of its gradient sign by a tuned ratio of random loss. • OSLD is competitive to Adam and is compatible with most backpropagation architectures. • OSLD is stable and opens more choices in front of very deep multi-layer neural networks. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

37. How does that name sound? Name representation learning using accent-specific speech generation.

Author: Elyashar, Aviad, Puzis, Rami, and Fire, Michael
Subjects: *WEB search engines, *DEEP learning, *MACHINE learning, *ALGORITHMS, *PERSONAL names
Abstract: Searching for information about a specific person is a frequent online activity. In most cases, users are aided in the search process by queries containing a name in Web search engines. Typically, Web search engines provide just a few accurate results associated with a name-containing query. Most existing solutions for suggesting synonyms in online search are based on pattern matching and phonetic encoding; however, very often, the performance of such solutions is less than optimal. In this paper, we propose SpokenName2Vec, a novel and generic algorithm which addresses the synonym suggestion problem by utilizing automated speech generation, and deep learning to produce novel spoken name embeddings. These embeddings capture the way people pronounce names in a particular language and accent. Utilizing a name's pronunciation can help detect names that sound alike, but are written differently. We demonstrated the proposed approach on a large-scale dataset with more than 250,000 forenames and surnames and evaluated it on two ground truth datasets containing 7400 forenames and 25,000 surnames (including their verified synonyms). The performance of SpokenName2Vec was found superior to the 10 other algorithms evaluated, including phonetic encoding, string similarity, and machine learning algorithms. The results obtained emphasize the potential of spoken name embeddings for improved synonym suggestion. • Proposing SpokenName2Vec, a novel and generic algorithm which addresses the synonym suggestion problem by utilizing automated speech generation and deep learning to produce spoken name embeddings. • Behind the Name dataset: In total, 37,916 synonyms were retrieved for the 7,399 distinct names. • Spoken Name dataset: 250K WAV files associated with the names in the dataset for 11 languages. • A demonstration of the quality of SpokenName2Vec on forenames and surnames, including a comparison to other 10 algorithms. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

38. Unified Deep Learning approach for Efficient Intrusion Detection System using Integrated Spatial–Temporal Features.

Author: Rajesh Kanna, P and Santhi, P
Subjects: *DEEP learning, *CONVOLUTIONAL neural networks, *MACHINE learning, *FEATURE extraction, *COMPUTER network security, *LEARNING ability
Abstract: Intrusion detection systems (IDS) differentiate the malicious entries from the legitimate entries in network traffic data and helps in securing the networks. Deep learning algorithms have been greatly employed in the network security field for large scale data in modern cyberspace networks because of their ability to learn the deeply integrated features. However, learning both space and time aspects of system information are very challenging for any individual deep knowledge model. While Convolutional Neural Networks (CNN) effectively acquires the spatial aspects, the Long Short-Term Memory (LSTM) neural networks perform better for temporal features. Integrating the benefits of these models has the potential for improving the large scale IDS. In this paper, a high accurate IDS model is proposed by using a unified model of Optimized CNN (OCNN) and Hierarchical Multi-scale LSTM (HMLSTM) for effective extraction and learning of spatial–temporal features. The proposed IDS model performs the pre-processing, feature extraction through network training and network testing and final classification. In the OCNN–HMLSTM model, the Lion Swarm Optimization (LSO) is used to tune the hyper-parameters of CNN for the optimal configuration of learning spatial features. The HMLSTM learns the hierarchical relationships between the different features and extracts the time features. Lastly, the unified IDS approach utilizes the extracted spatial–temporal features for categorizing the network data. Tests are performed over public IDS datasets namely NSL-KDD, ISCX-IDS and UNSWNB15. Assessing the performance of OCNN–HMLSTM against the contemporary IDS methods, the proposed model performs better intrusion detection with high accuracy of above 90% with less false values and better classification coefficients. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

39. DeepCorn: A semi-supervised deep learning method for high-throughput image-based corn kernel counting and yield estimation.

Author: Khaki, Saeed, Pham, Hieu, Han, Ye, Kuhl, Andy, Kent, Wade, and Wang, Lizhi
Subjects: *DEEP learning, *PLANT breeding, *FEATURE extraction, *MACHINE learning, *SWEET corn, *COMMERCIAL associations, *CORN yields
Abstract: The success of modern farming and plant breeding relies on accurate and efficient collection of data. For a commercial organization that manages large amounts of crops, collecting accurate and consistent data is a bottleneck. Due to limited time and labor, accurately phenotyping crops to record color, head count, height, weight, etc. is severely limited. However, this information, combined with other genetic and environmental factors, is vital for developing new superior crop species that help feed the world's growing population. Recent advances in machine learning, in particular deep learning, have shown promise in mitigating this bottleneck. In this paper, we propose a novel deep learning method for counting on-ear corn kernels in-field to aid in the gathering of real-time data and, ultimately, to improve decision making to maximize yield. We name this approach DeepCorn, and show that this framework is robust under various conditions. DeepCorn estimates the density of corn kernels in an image of corn ears and predicts the number of kernels based on the estimated density map. DeepCorn uses a truncated VGG-16 as a backbone for feature extraction and merges feature maps from multiple scales of the network to make it robust against image scale variations. We also adopt a semi-supervised learning approach to further improve the performance of our proposed method. Our proposed method achieves the MAE and RMSE of 41.36 and 60.27 in the corn kernel counting task, respectively. Our experimental results demonstrate the superiority and effectiveness of our proposed method compared to other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

40. Image classification with deep learning in the presence of noisy labels: A survey.

Author: Algan, Görkem and Ulusoy, Ilkay
Subjects: *DEEP learning, *LABELS, *DRUG labeling, *MACHINE learning, *IMAGING systems, *PERFORMANCE art
Abstract: Image classification systems recently made a giant leap with the advancement of deep neural networks. However, these systems require an excessive amount of labeled data to be adequately trained. Gathering a correctly annotated dataset is not always feasible due to several factors, such as the expensiveness of the labeling process or difficulty of correctly classifying data, even for the experts. Because of these practical challenges, label noise is a common problem in real-world datasets, and numerous methods to train deep neural networks with label noise are proposed in the literature. Although deep neural networks are known to be relatively robust to label noise, their tendency to overfit data makes them vulnerable to memorizing even random noise. Therefore, it is crucial to consider the existence of label noise and develop counter algorithms to fade away its adverse effects to train deep neural networks efficiently. Even though an extensive survey of machine learning techniques under label noise exists, the literature lacks a comprehensive survey of methodologies centered explicitly around deep learning in the presence of noisy labels. This paper aims to present these algorithms while categorizing them into one of the two subgroups: noise model based and noise model free methods. Algorithms in the first group aim to estimate the noise structure and use this information to avoid the adverse effects of noisy labels. Differently, methods in the second group try to come up with inherently noise robust algorithms by using approaches like robust losses, regularizers or other learning paradigms. • Label noise is a common problem in real-world datasets. • Noise robust learning techniques are important to achieve state of the art performance. • Many works are proposed in the literature to tackle noisy labels. • Some works aim to estimate underlying noise structure. • Other works try to achieve robustness without explicitly modeling the noise structure. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

41. Unsupervised feature selection via transformed auto-encoder.

Author: Zhang, Yunhe, Lu, Zhoumin, and Wang, Shiping
Subjects: *MACHINE learning, *COST functions, *DEEP learning, *COMPUTATIONAL complexity, *FEATURE selection
Abstract: As one of the fundamental research issues, feature selection plays a critical role in machine learning. By the removal of irrelevant features, it attempts to reduce computational complexities of upstream tasks, usually with computation accelerations and performance improvements. This paper proposes an auto-encoder based scheme for unsupervised feature selection. Due to the inherent consistency, this framework can solve traditional constrained feature selection problems approximately. Specifically, the proposed model takes non-negativity, orthogonality, and sparsity into account, whose internal characteristics are exploited sufficiently. It can also employ other loss functions and flexible activation functions. The former can fit a wide range of learning tasks, and the latter has the ability to play the role of regularization terms to impose regularization constraints on the model. Thereinafter, the proposed model is validated on multiple benchmark datasets, where various activation and loss functions are analyzed for finding better feature selectors. Finally, extensive experiments demonstrate the superiority of the proposed method against other compared state-of-the-arts. • Propose to select features by auto-encoder with non-negativity and orthogonality. • Construct a lifted transformed net that can rank original features. • Provide a new perspective for feature selection with efficient embedding property. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

42. Robust hybrid deep learning models for Alzheimer's progression detection.

Author: Abuhmed, Tamer, El-Sappagh, Shaker, and Alonso, Jose M.
Subjects: *BLENDED learning, *DECISION support systems, *DEEP learning, *ALZHEIMER'S disease, *MACHINE learning, *TIME series analysis
Abstract: The prevalence of Alzheimer's disease (AD) in the growing elderly population makes accurately predicting AD progression crucial. Due to AD's complex etiology and pathogenesis, an effective and medically practical solution is a challenging task. In this paper, we developed and evaluated two novel hybrid deep learning architectures for AD progression detection. These models are based on the fusion of multiple deep bidirectional long short-term memory (BiLSTM) models. The first architecture is an interpretable multitask regression model that predicts seven crucial cognitive scores for the patient 2.5 years after their last observations. The predicted scores are used to build an interpretable clinical decision support system based on a glass-box model. This architecture aims to explore the role of multitasking models in producing more stable, robust, and accurate results. The second architecture is a hybrid model where the deep features extracted from the BiLSTM model are used to train multiple machine learning classifiers. The two architectures were comprehensively evaluated using different time series modalities of 1371 subjects participated in the study of the Alzheimer's disease neuroimaging initiative (ADNI). The extensive, real-world experimental results over ADNI data help establish the effectiveness and practicality of the proposed deep learning models. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

43. Federated learning for machinery fault diagnosis with dynamic validation and self-supervision.

Author: Zhang, Wei, Li, Xiang, Ma, Hui, Luo, Zhong, and Li, Xu
Subjects: *SUPERVISED learning, *ROTATING machinery, *FAULT diagnosis, *MACHINE learning, *MACHINERY, *CONFLICT of interests
Abstract: Intelligent data-driven machinery fault diagnosis methods have been successfully and popularly developed in the past years. While promising diagnostic performance has been achieved, the existing methods generally require large amounts of high-quality supervised data for training, which are mostly difficult and expensive to collect in real industries. Therefore, it is motivated that the distributed data of multiple clients can be integrated and exploited to build a powerful data-driven model. However, that basically requires data sharing among different users, and is not preferred in most industrial cases due to potential conflict of interests. In order to address the data island problem, a federated learning method for machinery fault diagnosis is proposed in this paper. Model training is locally implemented within each participated client, and a self-supervised learning scheme is proposed to enhance the learning performance. The server aggregates the locally updated models in each training round under the dynamic validation scheme, and a global fault diagnosis model can be established. Only the models are mutually communicated rather than the data, which ensures data privacy among different clients. The experiments on two datasets suggest the proposed method offers a promising approach on confidential decentralized learning. • A specially designed federated learning method is proposed for machinery fault diagnosis problems. • A self-supervised learning algorithm is proposed for better explorations of time-series machinery data. • A dynamic validation scheme is proposed to adaptively implement model averaging operation. • The challenging scenarios with non-independent and identically distributed user data are addressed. • The proposed data privacy-preserving learning scheme is validated through experiments on two rotating machinery datasets. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

44. Channel and spatial attention based deep object co-segmentation.

Author: Chen, Jia, Chen, Yasong, Li, Weihao, Ning, Guoqin, Tong, Mingwen, and Hilton, Adrian
Subjects: *INFORMATION commons, *MACHINE learning
Abstract: Object co-segmentation is a challenging task, which aims to segment common objects in multiple images at the same time. Generally, common information of the same object needs to be found to solve this problem. For various scenarios, common objects in different images only have the same semantic information. In this paper, we propose a deep object co-segmentation method based on channel and spatial attention, which combines the attention mechanism with a deep neural network to enhance the common semantic information. Siamese encoder and decoder structure are used for this task. Firstly, the encoder network is employed to extract low-level and high-level features of image pairs. Secondly, we introduce an improved attention mechanism in the channel and spatial domain to enhance the multi-level semantic features of common objects. Then, the decoder module accepts the enhanced feature maps and generates the masks of both images. Finally, we evaluate our approach on the commonly used datasets for the co-segmentation task. And the experimental results show that our approach achieves competitive performance. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

45. A deep learning based algorithm for multi-criteria recommender systems.

Author: Shambour, Qusai
Subjects: *RECOMMENDER systems, *MACHINE learning, *NATURAL language processing, *COMPUTER vision, *DEEP learning, *INFORMATION overload
Abstract: Recommender systems have become exceptionally widespread in recent years to deal with the information overload problem by providing personalized recommendations. Multi-criteria recommender systems proved to have more accurate recommendations compared to single-criterion recommender systems as multi-criteria rating reflects the user appreciation of an item in terms of many aspects. On the other hand, deep learning techniques achieve promising performance in many research areas such as image processing, computer vision, pattern recognition and natural language processing. Recently, the application of deep learning in recommender systems has been frequently explored with encouraging results. Accordingly, this paper proposes a deep learning based algorithm for multi-criteria recommender systems in which deep autoencoders are employed to exploit the non-trivial, nonlinear and hidden relations between users with regard to multi-criteria preferences, and generate more accurate recommendations. Experiments on the Yahoo! Movies and TripAdvisor multi-criteria datasets show that the proposed algorithm proves to be very effective in terms of producing more accurate predictions compared with the state-of-the-art recommendation algorithms. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

46. AutoML: A survey of the state-of-the-art.

Author: He, Xin, Zhao, Kaiyong, and Chu, Xiaowen
Subjects: *DEEP learning, *MACHINE learning, *IMAGE recognition (Computer vision)
Abstract: Deep learning (DL) techniques have obtained remarkable achievements on various tasks, such as image recognition, object detection, and language modeling. However, building a high-quality DL system for a specific task highly relies on human expertise, hindering its wide application. Meanwhile, automated machine learning (AutoML) is a promising solution for building a DL system without human assistance and is being extensively studied. This paper presents a comprehensive and up-to-date review of the state-of-the-art (SOTA) in AutoML. According to the DL pipeline, we introduce AutoML methods – covering data preparation, feature engineering, hyperparameter optimization, and neural architecture search (NAS) – with a particular focus on NAS, as it is currently a hot sub-topic of AutoML. We summarize the representative NAS algorithms' performance on the CIFAR-10 and ImageNet datasets and further discuss the following subjects of NAS methods: one/two-stage NAS, one-shot NAS, joint hyperparameter and architecture optimization, and resource-aware NAS. Finally, we discuss some open problems related to the existing AutoML methods for future research. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

47. DeLTa: Deep local pattern representation for time-series clustering and classification using visual perception.

Author: Anand, Gaurangi and Nayak, Richi
Subjects: *CONVOLUTIONAL neural networks, *SIGNAL convolution, *VISUAL perception, *SUPERVISED learning, *CLASSIFICATION, *MACHINE learning
Abstract: Time-series analysis is of enormous significance to a multitude of domains such as Internet-of-Things (IoT), prognostics, health, and robotics. Machine learning tasks require time-series data in the form of features for the application of (un)supervised algorithms. The existing feature representation methods lack generality and are domain-specific, especially those based on supervised learning. In this paper, we propose a novel time-series feature representation method based on feature transformation and feature learning. The feature transformation process is inspired by the human cognitive thinking used in visual recognition, where the 1-D time-series data is transformed into a 2-D image dataset. A feature set is learned by imposing a pre-trained convolutional neural network on the transformed search space. This generates two complementary high-dimensional feature sets: (1) one with the matching of the overall 2-D layout of the time-series; and (2) another with matching based on the activation of the local 2-D patterns irrespective of the overall layout. Empirical analysis on a large number of benchmark datasets shows the advantage of the domain-agnostic nature of DeLTa in achieving higher accuracy in comparison to relevant benchmarking methods. Source code is publicly available at https://github.com/technophyte/DeLTa. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

48. A weak supervision machine vision detection method based on artificial defect simulation.

Author: Li, Changsheng, Huang, Yanjiang, Li, Hai, and Zhang, Xianmin
Subjects: *COMPUTER vision, *CELL phones, *SUPERVISION, *MACHINE learning, *LABOR time, *DEEP learning, *AUTOMATIC speech recognition
Abstract: During a practical detection process, insufficient defect data, unbalanced defect types and the high cost of defect labeling can present problems. Therefore, it often takes considerable time and labor to collect actual samples to improve the accuracy of defect classification and recognition. In this paper, we propose a weak supervision machine vision detection method based on artificial defect simulation. First, four typical mobile phone screen defects – scratches, floaters, light stains and severe stains – are simulated by the proposed synthesis algorithms, and an artificial defect database is created. Next, the artificial dataset is applied to a deep learning recognition algorithm, and an initial model is trained. Then, the collected actual defects are augmented due to the insufficient training quantity. The augmented actual defects are then applied as the training data, and the initial model is retrained by fine tuning. Finally, the well-retrained model is used for defect recognition. The experimental results demonstrate that satisfactory performance is achieved with the proposed detection method. • A weak supervision machine vision detection method is proposed. • Synthesis algorithms of four different defects of mobile phone screens are designed. • The method can solve the problem of insufficient samples and unbalanced defect types. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

49. Incremental learning model inspired in Rehearsal for deep convolutional networks.

Author: Muñoz, David, Narváez, Camilo, Cobos, Carlos, Mendoza, Martha, and Herrera, Francisco
Subjects: *MACHINE learning, *DEEP learning, *REHEARSALS, *SIGNAL convolution, *ARTIFICIAL neural networks
Abstract: In Deep Learning, training a model properly with a high quantity and quality of data is crucial in order to achieve a good performance. In some tasks, however, the necessary data is not available at a particular moment and only becomes available over time. In which case, incremental learning is used to train the model correctly. An open problem remains, however, in the form of the stability–plasticity dilemma: how to incrementally train a model that is able to respond well to new data (plasticity) while also retaining previous knowledge (stability). In this paper, an incremental learning model inspired in Rehearsal (recall of past memories based on a subset of data) named CRIF is proposed, and two instances for the framework are employed, one using a random-based selection of representative samples (Naive Incremental Learning, NIL), the other using Crowding Distance and Best vs. Second Best metrics in conjunction for this task (RILBC). The experiments were performed on five datasets (MNIST, Fashion-MNIST, CIFAR-10, Caltech 101, and Tiny ImageNet) in two different incremental scenarios: a strictly class-incremental scenario, and a pseudo class-incremental scenario with unbalanced data. In Caltech 101, Transfer Learning was used, and in this scenario as well as in the other three datasets, the proposed method, NIL, achieved better results in most of the quality metrics than comparison algorithms such as RMSProp Inc (base line) and iCaRL (state-of-the-art proposal) and outperformed the other proposed method, RILBC. NIL also requires less time to achieve these results. • An incremental learning model inspired in Rehearsal (recall of past memories based on a subset of data) is proposed. • Experiments were performed over MNIST, Fashion-MNIST, CIFAR-10 and Caltech 101 in two different scenarios. • Several metrics were used to compare learning quality results when each new megabatch of data is used. • Friedman's non-parametric statistical test and Holm post-hoc test were used for supporting the analysis of the results. • Random-based selection of representative samples obtains the best results. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

50. Deep clustering by maximizing mutual information in variational auto-encoder.

Author: Xu, Chaoyang, Dai, Yuanfei, Lin, Renjie, and Wang, Shiping
Subjects: *COMPUTER vision, *KEY performance indicators (Management), *MACHINE learning, *DEEP learning, *LEARNING problems
Abstract: Unsupervised clustering, which is extensively employed in deep learning and computer vision as a fundamental technique, has attracted much attention in recent years. Deep embedding clustering often uses auto-encoders to learn representations for clustering. However, auto-encoders tend to corrupt the learning representations when simultaneously learning embedded representations and performing clustering. In this paper, we propose a Deep Clustering via Variational Auto-Encoder (DC-VAE) of mutual information maximization. First, we formulate the deep clustering problem as learning soft cluster assignments within the framework of variational auto-encoder. Second, we impose mutual information maximization on the observed data and the representations to prevent soft cluster assignments from distorting learning representations. Third, we derive a new generalization evidence lower bound objects related to several previous models and introduce parameters to balance learning informative representations and clustering. It is shown that the proposed model can significantly boost the performance of clustering by learning effective and reliable representations for downstream machine learning tasks. Through experimental results on several datasets, we demonstrate that the proposed model is competitive with existing state-of-the-arts on multiple performance metrics. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
