121 results on '"Hierarchical multi-label classification"'
Search Results
2. Hierarchical multi-label classification model for science and technology news based on heterogeneous graph semantic enhancement.
- Author
-
Cheng, Quan, Cheng, Jingyi, Chen, Jian, and Liu, Shaojun
- Subjects
CONVOLUTIONAL neural networks, KNOWLEDGE graphs, TECHNOLOGICAL innovations, CLASSIFICATION, TECHNOLOGICAL progress, ECONOMIC development
- Abstract
In the context of high-quality economic development, technological innovation has emerged as a fundamental driver of socio-economic progress. The consequent proliferation of science and technology news, which acts as a vital medium for disseminating technological advancements and policy changes, has attracted considerable attention from technology management agencies and innovation organizations. Nevertheless, online science and technology news has historically exhibited characteristics such as limited scale, disorderliness, and multi-dimensionality, which is extremely inconvenient for users of deep application. While single-label classification techniques can effectively categorize textual information, they face challenges in leading science and technology news classification due to a lack of a hierarchical knowledge framework and insufficient capacity to reveal knowledge integration features. This study proposes a hierarchical multi-label classification model for science and technology news, enhanced by heterogeneous graph semantics. The model captures multi-dimensional themes and hierarchical structural features within science and technology news through a hierarchical transmission module. It integrates graph convolutional networks to extract node information and hierarchical relationships from heterogeneous graphs, while also incorporating prior knowledge from domain knowledge graphs to address data scarcity. This approach enhances the understanding and classification capabilities of the semantics of science and technology news. Experimental results demonstrate that the model achieves precision, recall, and F1 scores of 84.21%, 88.89%, and 86.49%, respectively, significantly surpassing baseline models. This research presents an innovative solution for hierarchical multi-label classification tasks, demonstrating significant application potential in addressing data scarcity and complex thematic classification challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Hierarchical multi-label classification model for science and technology news based on heterogeneous graph semantic enhancement
- Author
-
Quan Cheng, Jingyi Cheng, Jian Chen, and Shaojun Liu
- Subjects
Hierarchical multi-label classification, Graph convolutional neural network, Knowledge graph, Science and technology news, Electronic computers. Computer science, QA75.5-76.95
- Abstract
In the context of high-quality economic development, technological innovation has emerged as a fundamental driver of socio-economic progress. The consequent proliferation of science and technology news, which acts as a vital medium for disseminating technological advancements and policy changes, has attracted considerable attention from technology management agencies and innovation organizations. Nevertheless, online science and technology news has historically exhibited characteristics such as limited scale, disorderliness, and multi-dimensionality, which is extremely inconvenient for users of deep application. While single-label classification techniques can effectively categorize textual information, they face challenges in leading science and technology news classification due to a lack of a hierarchical knowledge framework and insufficient capacity to reveal knowledge integration features. This study proposes a hierarchical multi-label classification model for science and technology news, enhanced by heterogeneous graph semantics. The model captures multi-dimensional themes and hierarchical structural features within science and technology news through a hierarchical transmission module. It integrates graph convolutional networks to extract node information and hierarchical relationships from heterogeneous graphs, while also incorporating prior knowledge from domain knowledge graphs to address data scarcity. This approach enhances the understanding and classification capabilities of the semantics of science and technology news. Experimental results demonstrate that the model achieves precision, recall, and F1 scores of 84.21%, 88.89%, and 86.49%, respectively, significantly surpassing baseline models. This research presents an innovative solution for hierarchical multi-label classification tasks, demonstrating significant application potential in addressing data scarcity and complex thematic classification challenges.
- Published
- 2024
4. Dimensionality Reduction for Hierarchical Multi-Label Classification: A Systematic Mapping Study
- Author
-
Raimundo Osvaldo Vieira and Helyane Bronoski Borges
- Subjects
Hierarchical Multi-label Classification, Dimension, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Hierarchical multi-label classification problems typically deal with datasets with many attributes and labels, which can negatively impact the classifier performance. The application of dimensionality reduction methods can significantly improve the performance of classifiers. Dimensionality reduction can be performed by feature extraction or feature selection, according to the problem domain and dataset characteristics. This work carried out a systematic literature mapping to identify the approaches and techniques of dimensionality reduction that have been used in hierarchical multi-label classification tasks. Searches were performed on 7 important databases for the Computer Science field. From a list of 184 retrieved papers, 12 were selected for analysis, from which it was possible to determine a general overview of studies conducted from 2010 to 2022. It was identified that feature selection was the most frequent reduction method, with the filter approach standing out. In addition, it was detected that most of the works used a tree hierarchical structure. As its main outcome, this paper presents the state of the art of the dimensionality reduction problem for hierarchical multi-label classification, indicating trends and research issues in the field.
- Published
- 2024
5. Dimensionality Reduction for Hierarchical Multi-Label Classification: A Systematic Mapping Study.
- Author
-
Osvaldo Vieira, Raimundo and Bronoski Borges, Helyane
- Abstract
Hierarchical multi-label classification problems typically deal with datasets with many attributes and labels, which can negatively impact the classifier performance. The application of dimensionality reduction methods can significantly improve the performance of classifiers. Dimensionality reduction can be performed by feature extraction or feature selection, according to the problem domain and dataset characteristics. This work carried out a systematic literature mapping to identify the approaches and techniques of dimensionality reduction that have been used in hierarchical multi-label classification tasks. Searches were performed on 7 important databases for the Computer Science field. From a list of 184 retrieved papers, 12 were selected for analysis, from which it was possible to determine a general overview of studies conducted from 2010 to 2022. It was identified that feature selection was the most frequent reduction method, with the filter approach standing out. In addition, it was detected that most of the works used a tree hierarchical structure. As its main outcome, this paper presents the state of the art of the dimensionality reduction problem for hierarchical multi-label classification, indicating trends and research issues in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Constructive Machine Learning and Hierarchical Multi-label Classification for Molecules Design
- Author
-
de Souza Silva, Rodney Renato, Cerri, Ricardo, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Naldi, Murilo C., editor, and Bianchi, Reinaldo A. C., editor
- Published
- 2023
7. A Network-based Approach for Inferring Thresholds in Co-expression Networks
- Author
-
López-Rozo, Nicolás, Romero, Miguel, Finke, Jorge, Rocha, Camilo, Kacprzyk, Janusz, Series Editor, Cherifi, Hocine, editor, Mantegna, Rosario Nunzio, editor, Rocha, Luis M., editor, Cherifi, Chantal, editor, and Miccichè, Salvatore, editor
- Published
- 2023
8. HMNet: a hierarchical multi-modal network for educational video concept prediction.
- Author
-
Huang, Wei, Xiao, Tong, Liu, Qi, Huang, Zhenya, Ma, Jianhui, and Chen, Enhong
- Abstract
Educational video concept prediction is a challenging task in the online education system that aims to assign appropriate hierarchical concepts to the video. The key to this problem is to model and fuse the multimodal information of the video. However, most prior studies tend to ignore the incremental characteristics of the educational video, and most of the video segmentation strategies do not apply well to the educational video. Moreover, most existing methods overlook the class hierarchy and do not consider the class dependencies when predicting the hierarchical concepts of a video. To that end, in this paper, we propose a Hierarchical Multi-modal Network (HMNet) framework for predicting the hierarchical concepts of educational videos via fusing the multimodal information and modeling the class dependencies. Specifically, we first apply a video divider for extracting keyframes from the video, which considers the incremental characteristics of the educational video. The video is divided into a series of video sections with subtitles. Then, we utilize a multi-modal encoder to obtain the unified representation for multi-modality. Finally, we design a hierarchical predictor capable of fusing the multi-modality representation, modeling the class dependencies and predicting the hierarchical concepts of video in a top-down manner. Extensive experimental results on two real-world datasets demonstrate the effectiveness and explanatory power of HMNet. [ABSTRACT FROM AUTHOR]
- Published
- 2023
9. Multi-label classification via closed frequent labelsets and label taxonomies.
- Author
-
Ferrandin, Mauri and Cerri, Ricardo
- Subjects
*DIRECTED acyclic graphs, *CLASSIFICATION algorithms, *TAXONOMY, *CLASSIFICATION, *HIERARCHICAL Bayes model
- Abstract
Multi-label classification (MLC) has been a widely explored field in recent years. The most common approaches that deal with MLC problems are classified into two groups: (i) problem transformation, which aims to adapt the multi-label data, making the use of traditional binary or multiclass classification algorithms feasible, and (ii) algorithm adaptation, which focuses on modifying algorithms used in binary or multiclass classification, enabling them to make multi-label predictions. Several approaches have been proposed aiming to explore the relationships among the labels, with some of them through the transformation of a flat multi-label label space into a hierarchical multi-label label space, creating a tree-structured label taxonomy and inducing a hierarchical multi-label classifier to solve the classification problem. This paper presents a novel method in which a label hierarchy structured as a directed acyclic graph (DAG) is created from the multi-label label space, taking into account the label co-occurrences using the notion of closed frequent labelset. With this, it is possible to solve an MLC task as if it were a hierarchical multi-label classification (HMC) task. Global and local HMC approaches were tested with the obtained label hierarchies and compared with the approaches using tree-structured label hierarchies, showing very competitive results. The main advantage of the proposed approach is better exploration and representation of the relationships between labels through the use of DAG-structured taxonomies, improving the results. Experimental results over 32 multi-label datasets from different domains showed that the proposed approach is better than related approaches in most of the multi-label evaluation measures and very competitive when compared with the state-of-the-art approaches. Moreover, we found that both tree- and, especially, DAG-structured label hierarchies combined with a local hierarchical classifier are more suitable to deal with imbalanced multi-label datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
10. Hierarchical multi-label classification method for science and technology resource texts.
- Author
-
王岳, 李雅文, and 李昂
- Subjects
CLASSIFICATION algorithms, CLASSIFICATION
- Abstract
Copyright of Journal of Computer Engineering & Applications is the property of Beijing Journal of Computer Engineering & Applications Journal Co Ltd. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
11. Graph Enhanced Transformer for Aspect Category Detection.
- Author
-
Chen, Chen, Wang, Hou-Feng, Zhu, Qing-Qing, and Liu, Jun-Fei
- Subjects
SENTIMENT analysis
- Abstract
Aspect category detection is one challenging subtask of aspect based sentiment analysis, which categorizes a review sentence into a set of predefined aspect categories. Most existing methods regard the aspect category detection as a flat classification problem. However, aspect categories are inter-related, and they are usually organized with a hierarchical tree structure. To leverage the structure information, this paper proposes a hierarchical multi-label classification model to detect aspect categories and uses a graph enhanced transformer network to integrate label dependency information into prediction features. Experiments have been conducted on four widely-used benchmark datasets, showing that the proposed model outperforms all strong baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2023
12. Following the Lecturer: Hierarchical Knowledge Concepts Prediction for Educational Videos
- Author
-
Zhang, Xin, Liu, Qi, Huang, Wei, He, Weidong, Xiao, Tong, Huang, Ye, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Fang, Lu, editor, Povey, Daniel, editor, Zhai, Guangtao, editor, Mei, Tao, editor, and Wang, Ruiping, editor
- Published
- 2022
13. BIT-WOW at NLPCC-2022 Task5 Track1: Hierarchical Multi-label Classification via Label-Aware Graph Convolutional Network
- Author
-
Wang, Bo, Lu, Yi-Fan, Wei, Xiaochi, Liu, Xiao, Shi, Ge, Yuan, Changsen, Huang, Heyan, Feng, Chong, Mao, Xianling, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lu, Wei, editor, Huang, Shujian, editor, Hong, Yu, editor, and Zhou, Xiabing, editor
- Published
- 2022
14. Simultaneous Fault Diagnosis Based on Hierarchical Multi-Label Classification and Sparse Bayesian Extreme Learning Machine.
- Author
-
Ye, Qing and Liu, Changhua
- Subjects
MACHINE learning, FAULT diagnosis, MATHEMATICAL optimization, CLASSIFICATION
- Abstract
This paper proposes an intelligent simultaneous fault diagnosis model based on a hierarchical multi-label classification strategy and sparse Bayesian extreme learning machine. The intelligent diagnosis model compares the similarity between an unknown sample to be diagnosed and each single fault mode, then outputs the probability of each fault mode occurring. First, multiple two-class sub-classifiers based on SBELM are trained by using single-fault samples to extract the correlation between various pairs of single-fault, and the sub-classifiers are integrated with the proposed hierarchical multi-label classification (HMLC) strategy to form the diagnostic model based on HMLC-SBELM. Then, samples of single faults and simultaneous faults are used to generate the optimal discriminative thresholds by using optimization algorithms. Finally, the probabilistic output generated by the HMLC-SBELM-based model is transformed into the final fault modes by using the optimal discriminative threshold. The model performance is evaluated by using actual vibration signals of the main reducer and is compared with several classical models. The contrastive results indicate that the proposed model is more accurate, efficient, and stable. [ABSTRACT FROM AUTHOR]
- Published
- 2023
15. A consistency-aware deep capsule network for hierarchical multi-label image classification.
- Author
-
Noor, Khondaker Tasrif, Robles-Kelly, Antonio, Zhang, Leo Yu, Bouadjenek, Mohamed Reda, and Luo, Wei
- Subjects
*IMAGE recognition (Computer vision), *CAPSULE neural networks, *COMPUTER vision, *CLASSIFICATION, *AUTOMOBILES
- Abstract
Hierarchical classification is a significant challenge in computer vision due to the logical order and interconnectedness of multiple labels. This paper presents HD-CapsNet, a novel neural network architecture based on deep capsule networks, specifically designed for hierarchical multi-label classification (HMC). By incorporating a tree-like hierarchical structure, HD-CapsNet is designed to leverage the inherent ontological order within the hierarchical label tree, thereby ensuring classification consistency across different levels. Additionally, we introduce a specialized loss function that promotes accurate hierarchical relationships while penalizing inconsistencies. This not only enhances classification performance but also strengthens the network's robustness. We rigorously evaluate HD-CapsNet's efficacy by benchmarking it against existing HMC methods across six diverse datasets: Fashion-MNIST, Marine-Tree, CIFAR-10, CIFAR-100, Caltech-UCSD Birds-200-2011, and Stanford Cars. Our results conclusively demonstrate that HD-CapsNet excels in learning hierarchical relationships and significantly outperforms the competition in various image classification tasks. Our implementation is available at https://github.com/tasrif-khondaker/HD-CapsNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Customer service complaint work order classification based on matrix factorization and attention multi-task learning
- Author
-
Yong SONG, Zhiwei YAN, Yukun QIN, Dongming ZHAO, Xiaozhou YE, Yuanyuan CHAI, and Ye OUYANG
- Subjects
hierarchical multi-label classification, attention mechanism, multi-task learning, customer service work order classification, Telecommunication, TK5101-6720, Technology
- Abstract
The automatic classification of complaint work orders is the requirement of the digital and intelligent development of customer service of communication operators. The categories of customer service complaint work orders have multiple levels, each level has multiple labels, and the levels are related, which belongs to a typical hierarchical multi-label text classification (HMTC) problem. Most of the existing solutions are based on classifiers to process all classification labels at the same time, or use multiple classifiers for each level, ignoring the dependence between hierarchies. A matrix factorization and attention-based multi-task learning approach (MF-AMLA) to deal with hierarchical multi-label text classification tasks was proposed. Under the classification data of real complaint work orders in the customer service scenario of communication operators, the maximum Top1 F1 value of MF-AMLA is increased by 21.1% and 5.7% respectively compared with the commonly used machine learning algorithm and deep learning algorithm in this scenario. It has been launched in the customer service system of one mobile operator, the accuracy of model output is more than 97%, and the processing efficiency of customer service agent unit time has been improved by 22.1%.
- Published
- 2022
17. A Hierarchical Multi-label Classification of Multi-resident Activities
- Author
-
Mehri, Hiba, Lemlouma, Tayeb, Montavont, Nicolas, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Yin, Hujun, editor, Camacho, David, editor, Tino, Peter, editor, Allmendinger, Richard, editor, Tallón-Ballesteros, Antonio J., editor, Tang, Ke, editor, Cho, Sung-Bae, editor, Novais, Paulo, editor, and Nascimento, Susana, editor
- Published
- 2021
18. Feature Selection for Hierarchical Multi-label Classification
- Author
-
da Silva, Luan V. M., Cerri, Ricardo, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Abreu, Pedro Henriques, editor, Rodrigues, Pedro Pereira, editor, Fernández, Alberto, editor, and Gama, João, editor
- Published
- 2021
19. Predictive Bi-clustering Trees for Hierarchical Multi-label Classification
- Author
-
Santos, Bruna Z., Nakano, Felipe K., Cerri, Ricardo, Vens, Celine, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Hutter, Frank, editor, Kersting, Kristian, editor, Lijffijt, Jefrey, editor, and Valera, Isabel, editor
- Published
- 2021
20. Towards ultrasonic guided wave fine-grained damage detection on hierarchical multi-label classification network.
- Author
-
Guo, Ziye, Zhou, Ruohua, Gao, Yan, Fu, Wei, and Yu, Qiuyu
- Subjects
*ULTRASONIC waves, *HIERARCHICAL Bayes model, *IMAGE recognition (Computer vision), *DEEP learning, *CLASSIFICATION, *STRUCTURAL health monitoring
- Abstract
This study advocates the hierarchical multi-label classification (HMC) network for fine-grained damage detection using ultrasonic guided wave. Existing deep learning methods only focus on one aspect of the damage type or the degree of damage progression, and both are important for the long-term safe operation of the structure. To address this limitation, an ultrasonic guided wave HMC network (GHmcNet) and its lightweight version (L-GHmcNet) are developed. The essential designs of the proposed methods are derived from two motivations: (1) achieving robust fine-grained damage classification; (2) designing a lightweight network. The hierarchical labels (defect type-depth-size) are constructed on guided wave data. Inspired by the work of image classification, residual connections are introduced to transfer and fuse features between parent and children categories. The GHmcNet has three output channels and a weighted cross-entropy loss function is imposed on each channel. The weights of the corresponding loss are automatically adjusted during training. To reduce the parameters, we improve the structure of GHmcNet by applying different numbers of convolutional layers and channels at different levels based on the number of classes in each level, proposing the lightweight model L-GHmcNet. Both numerical and experimental studies are carried out, in which the dataset contains signals of different damage types, depths and sizes. The results show the superiority of the proposed methods in terms of accuracy at three levels compared to state-of-the-art methods. Moreover, the high accuracy classification results of L-GHmcNet demonstrate that the proposed lightweight network is a successful attempt. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems
- Author
-
Vidulin, Vedrana, Džeroski, Sašo, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Appice, Annalisa, editor, Tsoumakas, Grigorios, editor, Manolopoulos, Yannis, editor, and Matwin, Stan, editor
- Published
- 2020
22. Hyperbolic Embeddings for Hierarchical Multi-label Classification
- Author
-
Stepišnik, Tomaž, Kocev, Dragi, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Helic, Denis, editor, Leitner, Gerhard, editor, Stettinger, Martin, editor, Felfernig, Alexander, editor, and Raś, Zbigniew W., editor
- Published
- 2020
23. Customer service complaint work order classification based on matrix factorization and attention multi-task learning.
- Author
-
SONG Yong, YAN Zhiwei, QIN Yukun, ZHAO Dongming, YE Xiaozhou, CHAI Yuanyuan, and OUYANG Ye
- Abstract
The automatic classification of complaint work orders is the requirement of the digital and intelligent development of customer service of communication operators. The categories of customer service complaint work orders have multiple levels, each level has multiple labels, and the levels are related, which belongs to a typical hierarchical multi-label text classification (HMTC) problem. Most of the existing solutions are based on classifiers to process all classification labels at the same time, or use multiple classifiers for each level, ignoring the dependence between hierarchies. A matrix factorization and attention-based multi-task learning approach (MF-AMLA) to deal with hierarchical multi-label text classification tasks was proposed. Under the classification data of real complaint work orders in the customer service scenario of communication operators, the maximum Top1 F1 value of MF-AMLA is increased by 21.1% and 5.7% respectively compared with the commonly used machine learning algorithm and deep learning algorithm in this scenario. It has been launched in the customer service system of one mobile operator, the accuracy of model output is more than 97%, and the processing efficiency of customer service agent unit time has been improved by 22.1%. [ABSTRACT FROM AUTHOR]
- Published
- 2022
24. Simultaneous Fault Diagnosis Based on Hierarchical Multi-Label Classification and Sparse Bayesian Extreme Learning Machine
- Author
-
Qing Ye and Changhua Liu
- Subjects
simultaneous fault diagnosis, hierarchical multi-label classification, decision threshold, main reducer, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
This paper proposes an intelligent simultaneous fault diagnosis model based on a hierarchical multi-label classification strategy and sparse Bayesian extreme learning machine. The intelligent diagnosis model compares the similarity between an unknown sample to be diagnosed and each single fault mode, then outputs the probability of each fault mode occurring. First, multiple two-class sub-classifiers based on SBELM are trained by using single-fault samples to extract the correlation between various pairs of single-fault, and the sub-classifiers are integrated with the proposed hierarchical multi-label classification (HMLC) strategy to form the diagnostic model based on HMLC-SBELM. Then, samples of single faults and simultaneous faults are used to generate the optimal discriminative thresholds by using optimization algorithms. Finally, the probabilistic output generated by the HMLC-SBELM-based model is transformed into the final fault modes by using the optimal discriminative threshold. The model performance is evaluated by using actual vibration signals of the main reducer and is compared with several classical models. The contrastive results indicate that the proposed model is more accurate, efficient, and stable.
- Published
- 2023
25. EC number prediction of protein sequences based on combination of hierarchical and global features.
- Author
-
Yang F, Han QL, Zhao WD, and Zhao Y
- Subjects
- Amino Acid Sequence, Proteins chemistry, Algorithms, Sequence Analysis, Protein methods, Enzymes chemistry, Enzymes metabolism, Computational Biology methods
- Abstract
The identification of enzyme functions plays a crucial role in understanding the mechanisms of biological activities and advancing the development of life sciences. However, existing enzyme EC number prediction methods did not fully utilize protein sequence information and still had shortcomings in identification accuracy. To address this issue, we proposed an EC number prediction network using hierarchical features and global features (ECPN-HFGF). This method first utilized residual networks to extract generic features from protein sequences, and then employed hierarchical feature extraction modules and global feature extraction modules to further extract hierarchical and global features of protein sequences. Subsequently, the prediction results of both feature types were combined, and a multitask learning framework was utilized to achieve accurate prediction of enzyme EC numbers. Experimental results indicated that the ECPN-HFGF method performed best in the task of predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. The ECPN-HFGF method effectively combined hierarchical and global features of protein sequences, allowing for rapid and accurate EC number prediction. Compared to current commonly used methods, this method offers significantly higher prediction accuracy, providing an efficient approach for the advancement of enzymology research and enzyme engineering applications.
- Published
- 2024
26. Ensembling Descendant Term Classifiers to Improve Gene - Abnormal Phenotype Predictions
- Author
-
Notaro, Marco, Schubach, Max, Frasca, Marco, Mesiti, Marco, Robinson, Peter N., Valentini, Giorgio, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Bartoletti, Massimo, editor, Barla, Annalisa, editor, Bracciali, Andrea, editor, Klau, Gunnar W., editor, Peterson, Leif, editor, Policriti, Alberto, editor, and Tagliaferri, Roberto, editor
- Published
- 2019
27. Machine learning for discovering missing or wrong protein function annotations
- Author
-
Felipe Kenji Nakano, Mathias Lietaert, and Celine Vens
- Subjects
Hierarchical multi-label classification ,Protein function prediction ,Benchmark datasets ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: A massive amount of proteomic data is generated on a daily basis; nonetheless, annotating all sequences is costly and often unfeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical multi-label classification (HMC) methods to predict annotations, using the Functional Catalogue (FunCat) or Gene Ontology (GO) label hierarchies. Most of these studies employed benchmark datasets created more than a decade ago, and thus train their models on outdated information. In this work, we provide an updated version of these datasets. By querying recent versions of FunCat and GO yeast annotations, we provide 24 new datasets in total. We compare four HMC methods, providing baseline results for the new datasets. Furthermore, we also evaluate whether the predictive models are able to discover new or wrong annotations, by training them on the old data and evaluating their results against the most recent information. Results: The results demonstrated that the method based on predictive clustering trees, Clus-Ensemble, proposed in 2008, achieved superior results compared to more recent methods on the standard evaluation task. For the discovery of new knowledge, Clus-Ensemble performed better when discovering new annotations in the FunCat taxonomy, whereas hierarchical multi-label classification with genetic algorithm (HMC-GA), a method based on genetic algorithms, was overall superior when detecting annotations that were removed. In the GO datasets, Clus-Ensemble once again had the upper hand when discovering new annotations, whereas HMC-GA performed better for detecting removed annotations. However, in this evaluation, there were less significant differences among the methods. Conclusions: The experiments have shown that protein function prediction is a very challenging task which should be further investigated.
We believe that the baseline results associated with the updated datasets provided in this work should be considered as guidelines for future studies; nonetheless, the old versions of the datasets should not be disregarded, since other tasks in machine learning could benefit from them.
- Published
- 2019
- Full Text
- View/download PDF
28. Hierarchical multi-label news article classification with distributed semantic model based features
- Author
-
Ivana Clairine Irsan and Masayu Leylia Khodra
- Subjects
Multi-label classification ,Hierarchical multi-label classification ,CNN ,Word embedding ,News ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Automatic news categorization is essential to automatically handle the classification of multi-label news articles in online portals. This research employs several potential methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses a Convolutional Neural Network (CNN) to build the top-level classifier. The second method could improve classification performance by calculating the average of the word vectors obtained from a distributed semantic model. The third method combines lexical and semantic methods to extract document features, multiplying word term frequency (lexical) by the word vector average (semantic). The model built using Calibrated Label Ranking as the multi-label classification method and trained using the Naïve Bayes algorithm has the best F1-measure, 0.7531. The multiplication of word term frequency and the average of word vectors was also used to build this classifier. This configuration improved multi-label classification performance by 4.25% compared to the baseline. The distributed semantic model that gave the best performance in this experiment was obtained from 300-dimension word2vec trained on Wikipedia’s articles. The performance of the multi-label classification model is also influenced by the news release date: a difference in period between training and testing data also decreases the models’ performance.
- Published
- 2019
- Full Text
- View/download PDF
29. A co‐training‐based approach for the hierarchical multi‐label classification of research papers.
- Author
-
Masmoudi, Abir, Bellaaj, Hatem, Drira, Khalil, and Jmaiel, Mohamed
- Subjects
*LABELS, *HIERARCHICAL Bayes model, *CLASSIFICATION, *DIGITAL libraries - Abstract
This paper focuses on the problem of the hierarchical multi‐label classification of research papers, which is the task of assigning the set of relevant labels for a paper from a hierarchy, using reduced amounts of labelled training data. Specifically, we study leveraging unlabelled data, which are usually plentiful and easy to collect, in addition to the few available labelled ones in a semi‐supervised learning framework for achieving better performance results. Thus, in this paper, we propose a semi‐supervised approach for the hierarchical multi‐label classification task of research papers based on the well‐known Co‐training algorithm, which exploits content and bibliographic coupling information as two distinct views of the papers. In our approach, two hierarchical multi‐label classifiers are learnt on different views of the labelled data, and iteratively select their most confident unlabelled samples, which are then added to the labelled set. The success of our suggested Co‐training‐based approach lies in two main components. The first is the use of two suggested selection criteria (i.e., Maximum Agreement and Labels Cardinality Consistency) that enforce selecting confident unlabelled samples. The second is the application of an oversampling method that rebalances the label distribution of the initial labelled set, which reduces the reinforcement of the label imbalance issue during Co‐training learning. The proposed approach is evaluated using a collection of scientific papers extracted from the ACM digital library. The experiments performed show the effectiveness of our approach with regard to several baseline methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
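Note on the record above: the abstract describes the co-training loop in words (two classifiers, one per view, each repeatedly labelling its most confident unlabelled samples and adding them to the labelled set). The sketch below shows only that generic loop, under strong simplifying assumptions: flat single-label data, one numeric feature per view, and a tiny nearest-centroid stand-in classifier. It does not implement the paper's hierarchical multi-label classifiers, its Maximum Agreement and Labels Cardinality Consistency criteria, or its oversampling step; all names here are hypothetical.

```python
def fit_centroids(xs, ys):
    """Stand-in classifier: one centroid per class over a 1-D feature."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, confidence); confidence is the margin between
    the distance to the runner-up centroid and to the nearest one."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, 0.0
    margin = abs(x - centroids[ranked[1]]) - abs(x - centroids[best])
    return best, margin


def co_train(labelled, unlabelled, rounds=3, per_round=1):
    """labelled: list of ((view_a, view_b), label) pairs.
    unlabelled: list of (view_a, view_b) pairs."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labelled, pool
            # Train on one view only, using everything labelled so far.
            model = fit_centroids(
                [x[view] for x, _ in labelled], [y for _, y in labelled]
            )
            # Move this view's most confident samples to the labelled set.
            ranked = sorted(
                pool, key=lambda x: predict(model, x[view])[1], reverse=True
            )
            for x in ranked[:per_round]:
                labelled.append((x, predict(model, x[view])[0]))
                pool.remove(x)
    return labelled, pool


# Toy data (hypothetical): each sample has two 1-D views.
seed = [((0.0, 0.0), "neg"), ((10.0, 10.0), "pos")]
pool = [(1.0, 2.0), (8.5, 8.0), (4.0, 1.0), (6.0, 9.5)]
result, remaining = co_train(seed, pool, rounds=2, per_round=1)
```

In the toy data, the sample (6.0, 9.5) is ambiguous in the first view but clear in the second, which is the situation co-training is meant to exploit.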
30. A hierarchical multi-label classification method based on neural networks for gene function prediction
- Author
-
Shou Feng, Ping Fu, and Wenbin Zheng
- Subjects
Gene function prediction ,gene ontology ,hierarchical multi-label classification ,neural network ,Biotechnology ,TP248.13-248.65 - Abstract
Gene function prediction is used to assign biological or biochemical functions to genes, which continues to be a challenging problem in modern biology. Genes may exhibit many functions simultaneously, and these functions are organized into a hierarchy, such as a directed acyclic graph (DAG) for Gene Ontology (GO). Because of these characteristics, gene function prediction can be seen as a typical hierarchical multi-label classification (HMC) task. A novel HMC method based on neural networks is proposed in this article for predicting gene function based on GO. The proposed method is a local approach that transforms the HMC task into a set of subtasks. Three strategies are implemented in this method to improve its performance. First, to tackle the imbalanced data set problem when building the training data set for each class, a negative instance selection policy and the SMOTE approach are used to preprocess each imbalanced training data set. Second, a particular multi-layer perceptron (MLP) is designed for each node in GO. Third, a post-processing method based on the Bayesian network is used to guarantee that the results are consistent with the hierarchy constraint. The experimental results indicate that the proposed HMC-MLPN method is a promising method for gene function prediction based on a comparison with two other state-of-the-art methods.
- Published
- 2018
- Full Text
- View/download PDF
31. Decision Making on Noisy Data with Additional Knowledge
- Author
-
Ye, Yuting
- Subjects
Statistics ,Biostatistics ,Binomial Mixture Model ,Disease Diagnosis ,Hierarchical Multi-label Classification ,U-shape constraint - Abstract
This dissertation addresses two statistical problems of dealing with noisy data with the aid of additional knowledge. My purpose is to highlight that in the era of big data, there is an increasing number of complicated problems with low signal-to-noise ratio, which cannot be simply solved by existing statistical or machine learning methods. For instance, biological data is notorious for its limited sample size but a substantial number of features (a typical p ≫ n problem). Fortunately, there is always additional knowledge from experts or insights that can be employed to devise smart methods to tackle these noisy data. Chapter 2 discusses my work supervised by Professor Haiyan Huang on hierarchical multi-label classification. This project is motivated by automatic disease diagnosis, where we aim to predict the patient’s status with limited samples in each disease. The structural information that depicts the relationship between diseases can mitigate the low signal-to-noise-ratio issue. We introduce a new statistic called multidimensional-local-precision-rate (mLPR) for each object in each class. We show that classification decisions made by simply sorting objects across classes, in the descending order of mLPRs, can in theory ensure the class hierarchy while maximizing CATCH, a pre-defined performance metric related to the area under a hit curve. In practical implementation, we need to estimate mLPRs from data. Ranking the objects across classes in the descending order of estimated mLPRs, however, would not ensure the optimization of CATCH and/or the class hierarchy anymore. In response to this, we introduce a new ranking algorithm called HierRank, which optimizes an empirical version of CATCH defined based on the estimated mLPRs. The ranking results from HierRank are ensured to satisfy the hierarchical constraint. 
The superior performance of our approach over state-of-the-art methods in the literature is demonstrated with a synthetic dataset and two real datasets. Chapter 3 discusses my work supervised by Professor Peter J. Bickel on the binomial mixture model with the U-shape constraint under the regime that the binomial size m can be relatively large compared to the sample size n. This project is motivated by the GeneFishing method (Liu et al., 2019), whose output is a combination of the parameter of interest and the subsampling noise. To tackle the noise in the output, we utilize the observation that the density of the output has a U shape and model the output with the binomial mixture model under a U shape constraint. We first analyze the estimation of the underlying distribution F in the binomial mixture model under various conditions for F. Equipped with these theoretical understandings, we propose a simple method Ucut to identify the cutoffs of the U shape and recover the underlying distribution based on the Grenander estimator. It has been shown that when m = Ω(n), the identified cutoffs converge at the rate O(n^{−1/3}). The L1 distance between the recovered distribution and the true one decreases at the same rate. To demonstrate the performance, we apply our method to a variety of simulation studies, a GTEX dataset used in (Liu et al., 2019) and a single cell dataset from Tabula Muris.
- Published
- 2021
32. DietHub: Dietary habits analysis through understanding the content of recipes.
- Author
-
Petković, Matej, Popovski, Gorjan, Seljak, Barbara Koroušić, Kocev, Dragi, and Eftimov, Tome
- Subjects
*FOOD habits, *MEDITERRANEAN diet, *FOOD science, *FOOD labeling, *ARTIFICIAL intelligence, *COMPREHENSION - Abstract
Understanding the content of self-reported meals and online-published recipes is a basic requirement for further linking food and dietary concepts to heterogeneous health networks. Despite the huge amount of work that is done in the biomedical domain, the food and nutrition domains are relatively low-resourced. DietHub represents a step forward in food science & technology that requires knowledge from a broad spectrum of areas. DietHub is an AI workflow methodology that annotates online-published recipes or self-reported meals with the food concepts that are mentioned in them. The food semantic labels that are used are hierarchical food semantic tags from the Hansard taxonomy. DietHub overviews and exploits several state-of-the-art methods from two areas of AI: representation learning and predictive modelling. We evaluated DietHub by applying it to a corpus of online-published recipes of different styles, such as health, cooking and region. Once the selected recipes were annotated, we compared them considering their styles. The results show justifiable comparison of Mediterranean diet recipes with recipes from other diets. Key Findings and Conclusions: The experimental evaluation reveals that DietHub has high predictive power and correctly annotates the recipes with semantic tags. The analysis of the annotations shows that there is no statistically significant difference between the Mediterranean diet and each of the diets: diabetic, weight loss, heart healthy recipes, low fat, low calorie, high fiber, and dairy free. All in all, the presented work shows that DietHub can be successfully used to analyze corpora of food-related textual documents and provide a deeper insight into human dietary behaviour. • An AI methodology for understanding the content of online-published recipes. • DietHub overviews and exploits several SoA methods from AI. • DietHub correctly annotates recipes with food semantic tags. 
• Justifiable comparison of Mediterranean diet recipes with recipes from other diets. • DietHub provides a deeper insight into human dietary behaviour. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
33. Ensembles of extremely randomized predictive clustering trees for predicting structured outputs.
- Author
-
Kocev, Dragi, Ceci, Michelangelo, and Stepišnik, Tomaž
- Subjects
RANDOM forest algorithms ,DECISION trees ,GENE regulatory networks ,PREDICTION models ,ELECTRICITY pricing - Abstract
We address the task of learning ensembles of predictive models for structured output prediction (SOP). We focus on three SOP tasks: multi-target regression (MTR), multi-label classification (MLC) and hierarchical multi-label classification (HMC). In contrast to standard classification and regression, where the output is a single (discrete or continuous) variable, in SOP the output is a data structure: a tuple of continuous variables (MTR), a tuple of binary variables (MLC) or a tuple of binary variables with hierarchical dependencies (HMC). SOP is gaining increasing interest in the research community due to its applicability in a variety of practically relevant domains. In this context, we consider the Extra-Tree ensemble learning method, the overall top performer in the DREAM4 and DREAM5 challenges for gene network reconstruction. We extend this method for SOP tasks and call the extension Extra-PCTs ensembles. As base predictive models we propose using predictive clustering trees (PCTs), a generalization of decision trees for predicting structured outputs. We conduct a comprehensive experimental evaluation of the proposed method on a collection of 41 benchmark datasets: 21 for MTR, 10 for MLC and 10 for HMC. We first investigate the influence of the size of the ensemble and the size of the feature subset considered at each node. We then compare the performance of Extra-PCTs to other ensemble methods (random forests and bagging), as well as to single PCTs. The experimental evaluation reveals that the Extra-PCTs achieve optimal performance in terms of predictive power and computational cost, with 50 base predictive models across the three tasks. The recommended values for feature subset sizes vary across the tasks, and also depend on whether the dataset contains only binary and/or sparse attributes. The Extra-PCTs give better predictive performance than a single tree (the differences are typically statistically significant). 
Moreover, the Extra-PCTs are the best performing ensemble method (except for the MLC task, where performances are similar to those of random forests), and Extra-PCTs can be used to learn good feature rankings for all of the tasks considered here. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
34. Active learning for hierarchical multi-label classification.
- Author
-
Nakano, Felipe Kenji, Cerri, Ricardo, and Vens, Celine
- Subjects
SUPERVISED learning ,LABELS ,HIERARCHICAL Bayes model ,ALGORITHMS ,CLASSIFICATION ,MACHINE learning - Abstract
Due to technological advances, a massive amount of data is produced daily, presenting challenges for application areas where data needs to be labelled by a domain specialist or by expensive procedures, in order to be useful for supervised machine learning purposes. In order to select which data points will provide more information when labelled, one can make use of active learning methods. Active learning (AL) is a subfield of machine learning which addresses methods to build models with fewer, but more representative instances. Even though AL has been vastly studied, it has not been thoroughly investigated in hierarchical multi-label classification, a learning task where multiple class labels can be assigned to an instance and these labels are hierarchically structured. In this work, we provide a public framework containing baseline and state-of-the-art algorithms suitable for this task. Additionally, we also propose a new algorithm, namely Hierarchical Query-By-Committee (H-QBC), which is validated on datasets from different domains. Our results show that H-QBC is capable of providing superior predictive performance results compared to its competitors, while being computationally efficient and parameter free. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
35. A new sentence embedding framework for the education and professional training domain with application to hierarchical multi-label text classification.
- Author
-
Lefebvre, Guillaume, Elghazel, Haytham, Guillet, Theodore, Aussem, Alexandre, and Sonnati, Matthieu
- Subjects
*HIERARCHICAL Bayes model, *NATURAL language processing, *TRANSFORMER models, *CLASSIFICATION algorithms, *CLASSIFICATION - Abstract
In recent years, Natural Language Processing (NLP) has made significant advances through advanced general language embeddings, allowing breakthroughs in NLP tasks such as semantic similarity and text classification. However, complexity increases with hierarchical multi-label classification (HMC), where a single entity can belong to several hierarchically organized classes. In such complex situations, applied to domain-specific texts, such as those of the education and professional training domain, general language embedding models often inadequately represent the unique terminologies and contextual nuances of a specialized domain. To tackle this problem, we present HMCCCProbT, a novel hierarchical multi-label text classification approach. This innovative framework chains multiple classifiers, where each individual classifier is built using a novel sentence-embedding method, BERTEPro, based on existing Transformer models, whose pre-training has been extended on education and professional training texts, before being fine-tuned on several NLP tasks. Each individual classifier is responsible for the predictions of a given hierarchical level and propagates local probability predictions augmented with the input feature vectors to the classifier in charge of the subsequent level. HMCCCProbT tackles issues of model scalability and semantic interpretation, offering a powerful solution to the challenges of domain-specific hierarchical multi-label classification. Experiments over three domain-specific textual HMC datasets indicate that HMCCCProbT compares favorably to state-of-the-art HMC algorithms in terms of classification accuracy, and that BERTEPro obtains better probability predictions, well suited to HMCCCProbT, than three other vector representation techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Prediction of Human Phenotype Ontology terms by means of hierarchical ensemble methods
- Author
-
Marco Notaro, Max Schubach, Peter N. Robinson, and Giorgio Valentini
- Subjects
Human Phenotype Ontology ,Hierarchical multi-label classification ,Hierarchical ensemble methods ,Gene-Abnormal phenotype association ,Human Phenotype Ontology term prediction ,Phenotype gene prioritization ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Background: The prediction of human gene–abnormal phenotype associations is a fundamental step toward the discovery of novel genes associated with human disorders, especially when no genes are known to be associated with a specific disease. In this context the Human Phenotype Ontology (HPO) provides a standard categorization of the abnormalities associated with human diseases. While the problem of the prediction of gene–disease associations has been widely investigated, the related problem of gene–phenotypic feature (i.e., HPO term) associations has been largely overlooked, even though for most human genes no HPO term associations are known and despite the increasing application of the HPO to relevant medical problems. Moreover, most of the methods proposed in the literature are not able to capture the hierarchical relationships between HPO terms, thus resulting in inconsistent and relatively inaccurate predictions. Results: We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO. The modular structure of the proposed methods, which consists of a “flat” learning first step and a hierarchical combination of the predictions in the second step, allows the predictions of virtually any flat learning method to be enhanced. The experimental results show that hierarchical ensemble methods are able to predict novel associations between genes and abnormal phenotypes with results that are competitive with state-of-the-art algorithms and with a significant reduction of the computational complexity. Conclusions: Hierarchical ensembles are efficient computational methods that guarantee biologically meaningful predictions that obey the true path rule, and can be used as a tool to improve and make consistent the HPO term predictions starting from virtually any flat learning method. 
The implementation of the proposed methods is available as an R package from the CRAN repository.
- Published
- 2017
- Full Text
- View/download PDF
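Note on the record above: the abstract describes a two-step scheme, with flat per-term predictions first and a hierarchical combination second, so that the final scores obey the true path rule (a term cannot be predicted more strongly than its ancestors). The sketch below shows one simple top-down correction of that kind on a DAG, capping each term's score by its parents' corrected scores. It is an illustrative assumption, not necessarily either of the two methods the authors propose; the function name, term names, and scores are hypothetical.

```python
def enforce_hierarchy(scores, parents):
    """Return scores in which no term scores higher than any of its parents.

    scores:  dict mapping term -> flat classifier score in [0, 1]
    parents: dict mapping term -> list of parent terms (missing for roots)
    Assumes the hierarchy is acyclic (a tree or a DAG).
    """
    corrected = {}

    def visit(term):
        if term not in corrected:
            # Cap by the smallest corrected parent score; roots have no cap.
            cap = min((visit(p) for p in parents.get(term, [])), default=1.0)
            corrected[term] = min(scores[term], cap)
        return corrected[term]

    for term in scores:
        visit(term)
    return corrected


# Toy DAG (hypothetical): "c" has two parents, "a" and "b".
flat_scores = {"root": 0.9, "a": 0.4, "b": 0.8, "c": 0.7}
dag_parents = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
print(enforce_hierarchy(flat_scores, dag_parents))
# {'root': 0.9, 'a': 0.4, 'b': 0.8, 'c': 0.4}
```

After the correction, thresholding the scores at any single cut-off yields a label set that is closed under ancestors, which is what hierarchy consistency requires.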
37. HmcNet: A General Approach for Hierarchical Multi-Label Classification
- Author
-
Huang, Wei, Chen, Enhong, Liu, Qi, Xiong, Hui, Huang, Zhenya, Tong, Shiwei, and Zhang, Dan
- Abstract
Hierarchical multi-label classification (HMC) deals with the problem of assigning each entity to multiple classes with a taxonomic structure (e.g., tree). Within this structure, classes at different levels tend to have dependencies under the hierarchy constraints. However, most prior studies for HMC tasks tend to ignore the class dependencies within the hierarchy. Moreover, most existing methods generate incoherent predictions and do not satisfy the hierarchy constraint. To this end, based on the previously developed HARNN, we propose a general framework, HmcNet, for introducing explicit and implicit class hierarchy constraints to generate coherent predictions. We develop an efficient Prune-based Coherent Prediction (PCP) strategy for optimal path selection, which produces coherent predictions in a principled way. HmcNet can be well explained from two perspectives. First, it develops the Hierarchical Attention-based Memory (HAM) unit with implicit class hierarchy constraints to capture class dependencies more intuitively. Second, it subsumes explicit class hierarchy constraints during the training and inference phases and generates coherent predictions in a consistent manner. Finally, extensive experimental results on six real-world datasets demonstrate the effectiveness and interpretability of the HmcNet framework. To facilitate future research, our code has been made publicly available.
- Published
- 2023
38. Web Genre Classification via Hierarchical Multi-label Classification
- Author
-
Madjarov, Gjorgji, Vidulin, Vedrana, Dimitrovski, Ivica, Kocev, Dragi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Jackowski, Konrad, editor, Burduk, Robert, editor, Walkowiak, Krzysztof, editor, Wozniak, Michal, editor, and Yin, Hujun, editor
- Published
- 2015
- Full Text
- View/download PDF
39. A multi-label classification method based on an incremental hypernetwork.
- Author
-
王 进, 陈知良, 李 航, 李智星, 卜亚楠, 陈乔松, and 邓 欣
- Subjects
CLASSIFICATION ,LABELS ,PERFORMANCES - Abstract
Copyright of Journal of Chongqing University of Posts & Telecommunications (Natural Science Edition) is the property of Chongqing University of Posts & Telecommunications and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2019
- Full Text
- View/download PDF
40. Inducing Hierarchical Multi-label Classification rules with Genetic Algorithms.
- Author
-
Cerri, Ricardo, Basgalupp, Márcio P., Barros, Rodrigo C., and de Carvalho, André C.P.L.F.
- Subjects
ANT algorithms ,GENETIC algorithms ,HIERARCHICAL Bayes model ,CLASSIFICATION ,DATA distribution ,OPERATOR functions - Abstract
Hierarchical Multi-Label Classification is a challenging classification task where the classes are hierarchically structured, with superclass and subclass relationships. It is a very common task, for instance, in Protein Function Prediction, where a protein can simultaneously perform multiple functions. In these tasks it is very difficult to achieve a high predictive performance, since hundreds or even thousands of classes with imbalanced data distributions have to be considered. In addition, the models should ideally be easily interpretable to allow the validation of the knowledge extracted from the data. This work proposes and investigates the use of Genetic Algorithms to induce rules that are both hierarchical and multi-label. Several experiments with different fitness functions and genetic operators are performed to obtain different Hierarchical Multi-Label Classification rules. The different proposed configurations of Genetic Algorithms are evaluated together with state-of-the-art methods for HMC rule induction based on Ant Colony Optimization and Predictive Clustering Trees, using many datasets related to the Protein Function Prediction task. The experimental results show that it is possible to recommend the best configuration in terms of predictive performance and model interpretability. Highlights • A Genetic Algorithm for Hierarchical and Multi-label Classification is proposed. • A sequential covering procedure is used to generate a set of classification rules. • Several variations of the algorithm are proposed and evaluated. • Performance is evaluated using hierarchical protein function prediction datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
41. Hierarchical multi-label news article classification with distributed semantic model based features.
- Author
-
Irsan, Ivana Clairine and Khodra, Masayu Leylia
- Subjects
WORD frequency ,CLASSIFICATION ,ARTIFICIAL neural networks ,HIERARCHICAL Bayes model ,LABELS - Abstract
Automatic news categorization is essential to automatically handle the classification of multi-label news articles in online portals. This research employs several potential methods to improve the performance of a hierarchical multi-label classifier for Indonesian news articles. The first method uses a Convolutional Neural Network (CNN) to build the top-level classifier. The second method could improve classification performance by calculating the average of the word vectors obtained from a distributed semantic model. The third method combines lexical and semantic methods to extract document features, multiplying word term frequency (lexical) by the word vector average (semantic). The model built using Calibrated Label Ranking as the multi-label classification method and trained using the Naïve Bayes algorithm has the best F1-measure, 0.7531. The multiplication of word term frequency and the average of word vectors was also used to build this classifier. This configuration improved multi-label classification performance by 4.25% compared to the baseline. The distributed semantic model that gave the best performance in this experiment was obtained from 300-dimension word2vec trained on Wikipedia's articles. The performance of the multi-label classification model is also influenced by the news release date: a difference in period between training and testing data also decreases the models' performance. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
42. Hierarchical Text Categorization Using Level Based Neural Networks of Word Embedding Sequences with Sharing Layer Information.
- Author
-
KLUNGPORNKUN, Mongkud and VATEEKUL, Peerapon
- Subjects
*ARTIFICIAL neural networks, *EMBEDDINGS (Mathematics), *INFORMATION sharing, *SIGNAL convolution, *DEEP learning - Abstract
In text corpora, it is common to categorize each document into a predefined class hierarchy, which is usually a tree. One of the most widely used approaches is a level-based strategy that induces a multiclass classifier for each class level independently. However, prior attempts did not utilize information from the parent level and employed a bag of words rather than considering a sequence of words. In this paper, we present a novel level-based hierarchical text categorization with a strategy called "sharing layer information." For each class level, a neural network is constructed, where its input is a sequence of word embedding vectors generated from Convolutional Neural Networks (CNN). Also, a training strategy to avoid imbalance issues is proposed, called "the balanced resampling with mini-batch training." Furthermore, a label correction strategy is proposed to reconcile the predicted results from all networks on different class levels. The experiment was conducted on 2 standard benchmarks, WIPO and Wiki, comparing against a top-down based SVM framework with TF-IDF inputs called "HR-SVM." The results show that the proposed model can achieve the highest accuracy in terms of micro F1 and outperforms the baseline in the top levels in terms of macro F1. [ABSTRACT FROM AUTHOR]
- Published
- 2019
43. Label Correction Strategy on Hierarchical Multi-Label Classification
- Author
-
Ananpiriyakul, Thanawut, Poomsirivilai, Piyapan, Vateekul, Peerapon, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Kobsa, Alfred, editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Tanaka, Yuzuru, editor, Wahlster, Wolfgang, editor, Siekmann, Jörg, editor, and Perner, Petra, editor
- Published
- 2014
44. The Use of the Label Hierarchy in Hierarchical Multi-label Classification Improves Performance
- Author
-
Levatić, Jurica, Kocev, Dragi, Džeroski, Sašo, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Appice, Annalisa, editor, Ceci, Michelangelo, editor, Loglisci, Corrado, editor, Manco, Giuseppe, editor, Masciari, Elio, editor, and Ras, Zbigniew W., editor
- Published
- 2014
45. ReliefF for Hierarchical Multi-label Classification
- Author
-
Slavkov, Ivica, Karcheska, Jana, Kocev, Dragi, Kalajdziski, Slobodan, Džeroski, Sašo, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Appice, Annalisa, editor, Ceci, Michelangelo, editor, Loglisci, Corrado, editor, Manco, Giuseppe, editor, Masciari, Elio, editor, and Ras, Zbigniew W., editor
- Published
- 2014
46. Hierarchical Multi-Label Object Detection Framework for Remote Sensing Images
- Author
-
Su-Jin Shin, Seyeob Kim, Youngjung Kim, and Sungho Kim
- Subjects
object detection ,remote sensing images ,convolutional neural network (CNN) ,hierarchical multi-label classification ,Science - Abstract
Detecting objects such as aircraft and ships is a fundamental research area in remote sensing analytics. Owing to the prosperity and development of CNNs, many methodologies have been proposed for object detection within remote sensing images. Despite this advance, the use of object detection datasets with a more complex structure, i.e., datasets with hierarchically multi-labeled objects, remains limited with existing detection models. Especially in remote sensing images, since objects are obtained from a bird’s-eye view, the objects are captured with restricted visual features and are not always guaranteed to be labeled up to fine categories. We propose a hierarchical multi-label object detection framework applicable to hierarchically partial-annotated datasets. In the framework, an object detection pipeline called Decoupled Hierarchical Classification Refinement (DHCR) fuses the results of two networks: (1) an object detection network with multiple classifiers, and (2) a hierarchical sibling classification network for supporting hierarchical multi-label classification. Our framework additionally introduces a region proposal method for efficient detection on vain areas of the remote sensing images, called clustering-guided cropping strategy. Thorough experiments validate the effectiveness of our framework on our own object detection datasets constructed with remote sensing images from WorldView-3 and SkySat satellites. Under our proposed framework, DHCR-based detections significantly improve the performance of respective baseline models and we achieve state-of-the-art results on the datasets.
- Published
- 2020
47. Probabilistic Clustering for Hierarchical Multi-Label Classification of Protein Functions
- Author
-
Barros, Rodrigo C., Cerri, Ricardo, Freitas, Alex A., de Carvalho, André C. P. L. F., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Blockeel, Hendrik, editor, Kersting, Kristian, editor, Nijssen, Siegfried, editor, and Železný, Filip, editor
- Published
- 2013
48. Exploiting Label Dependency for Hierarchical Multi-label Classification
- Author
-
Alaydie, Noor, Reddy, Chandan K., Fotouhi, Farshad, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Tan, Pang-Ning, editor, Chawla, Sanjay, editor, Ho, Chin Kuan, editor, and Bailey, James, editor
- Published
- 2012
49. A hierarchical multi-label classification method based on neural networks for gene function prediction.
- Author
-
Feng, Shou, Fu, Ping, and Zheng, Wenbin
- Subjects
GENE expression ,ONTOLOGIES (Information retrieval) ,FINITE element method ,DATA analysis ,MACHINE learning - Abstract
Gene function prediction is used to assign biological or biochemical functions to genes, which continues to be a challenging problem in modern biology. Genes may exhibit many functions simultaneously, and these functions are organized into a hierarchy, such as a directed acyclic graph (DAG) for Gene Ontology (GO). Because of these characteristics, gene function prediction can be seen as a typical hierarchical multi-label classification (HMC) task. A novel HMC method based on neural networks is proposed in this article for predicting gene function based on GO. The proposed method is a local approach that transforms the HMC task into a set of subtasks. Three strategies are implemented in this method to improve its performance. First, to tackle the imbalanced data set problem when building the training data set for each class, a negative-instance selection policy and the SMOTE approach are used to preprocess each imbalanced training data set. Second, a particular multi-layer perceptron (MLP) is designed for each node in GO. Third, a post-processing method based on the Bayesian network is used to guarantee that the results are consistent with the hierarchy constraint. The experimental results indicate that the proposed HMC-MLPN method is a promising method for gene function prediction based on a comparison with two other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
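The hierarchy constraint mentioned in the abstract above can be illustrated with a small sketch. This is not the Bayesian-network post-processing the authors describe; it swaps in a much simpler stand-in that caps each node's score at the lowest corrected score among its parents, so that no term in a DAG scores higher than any of its ancestors. The function name `enforce_hierarchy`, the node names, and the scores are all hypothetical.

```python
def enforce_hierarchy(scores, parents):
    """Cap each node's score by the corrected scores of all its parents.

    scores  -- dict: node -> raw per-node classifier score in [0, 1]
    parents -- dict: node -> list of parent nodes (a DAG; roots are absent or map to [])
    Simple stand-in for illustration, not the Bayesian-network method in the abstract.
    """
    corrected = {}

    def resolve(node):
        # Memoised recursion: a node is corrected only after all of its parents.
        if node not in corrected:
            cap = min((resolve(p) for p in parents.get(node, [])), default=1.0)
            corrected[node] = min(scores[node], cap)
        return corrected[node]

    for node in scores:
        resolve(node)
    return corrected


# Hypothetical four-node DAG: "c" has two parents, "a" and "b".
raw = {"root": 0.9, "a": 0.6, "b": 0.95, "c": 0.8}
dag = {"a": ["root"], "b": ["root"], "c": ["a", "b"]}
consistent = enforce_hierarchy(raw, dag)
```

After correction, thresholding the scores at any cut-off yields a label set in which every predicted node also has all of its ancestors predicted.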
50. A comparison of hierarchical multi-output recognition approaches for anuran classification.
- Author
-
Colonna, Juan G., Gama, João, and Nakamura, Eduardo F.
- Subjects
SPECIES distribution ,MACHINE learning ,CLADISTIC analysis ,BIOLOGICAL monitoring ,BIODIVERSITY - Abstract
In bioacoustic recognition approaches, a “flat” classifier is usually trained to recognize several species of anurans, where the number of classes is equal to the number of species. Consequently, the complexity of the classification function increases proportionally with the number of species. To avoid this issue, we propose a “hierarchical” approach that decomposes the problem into three taxonomic levels: the family, the genus, and the species. To accomplish this, we transform the original single-labelled problem into a multi-output problem (multi-label and multi-class) considering the biological taxonomy of the species. We then develop a top-down method using a set of classifiers organized as a hierarchical tree. We test and compare two hierarchical methods, using (1) one classifier per parent node and (2) one classifier per level, against a flat approach. Thus, we conclude that it is possible to predict the same set of species as a flat classifier, and additionally obtain new information about the samples and their taxonomic relationship. This helps us to better understand the problem and achieve additional conclusions by the inspection of the confusion matrices at the three classification levels. In addition, we propose a soft decision rule based on the joint probabilities of hierarchy pathways. With this we are able to identify and reject confusing cases. We carry out our experiments using cross-validation performed by individuals. This form of CV avoids mixing syllables that belong to the same specimens in the testing and training sets, preventing an overestimate of the accuracy and generalizing the predictive capabilities of the system. We tested our methods in a dataset with sixty individual frogs, from ten different species, eight genera, and four families, achieving a final Macro-Fscore of 80 and 70% with and without applying the rejection rule, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2018
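The soft decision rule mentioned in the abstract above, based on the joint probabilities of hierarchy pathways, might look roughly like the following sketch. The abstract does not give the exact formula, so this assumes the joint probability of a family-genus-species path is the product of the per-level classifier probabilities, and that a sample is rejected when the best path falls below a threshold. The function name `best_path`, the threshold value, and the toy taxonomy are hypothetical.

```python
from math import prod


def best_path(level_probs, paths, threshold=0.5):
    """Pick the most probable root-to-leaf path, or reject the sample.

    level_probs -- list of dicts, one per level: label -> predicted probability
    paths       -- valid paths in the taxonomy, each a tuple with one label per level
    threshold   -- assumed rejection cut-off on the joint path probability
    Returns (path, joint), with path set to None when the sample is rejected.
    """
    scored = []
    for path in paths:
        # Assumption: joint probability is the product of per-level probabilities.
        joint = prod(probs.get(label, 0.0) for probs, label in zip(level_probs, path))
        scored.append((joint, path))
    joint, path = max(scored)
    if joint < threshold:
        return None, joint
    return path, joint


# Hypothetical three-level taxonomy (family, genus, species).
PATHS = [
    ("family_A", "genus_A1", "species_A1a"),
    ("family_A", "genus_A2", "species_A2a"),
    ("family_B", "genus_B1", "species_B1a"),
]

confident = [
    {"family_A": 0.9, "family_B": 0.1},
    {"genus_A1": 0.8, "genus_A2": 0.1, "genus_B1": 0.1},
    {"species_A1a": 0.9, "species_A2a": 0.05, "species_B1a": 0.05},
]
confusing = [
    {"family_A": 0.5, "family_B": 0.5},
    {"genus_A1": 0.5, "genus_A2": 0.25, "genus_B1": 0.25},
    {"species_A1a": 0.5, "species_A2a": 0.25, "species_B1a": 0.25},
]

accepted_path, accepted_joint = best_path(confident, PATHS)
rejected_path, rejected_joint = best_path(confusing, PATHS)
```

Because only valid taxonomy paths are scored, the prediction is always consistent across the three levels, and a low joint probability flags samples on which the level classifiers disagree or are unsure.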