420 results for '"Self training"'
Search Results
2. Domain Adaptation for Medical Image Segmentation Using Transformation-Invariant Self-training
- Author
-
Ghamsarian, Negin, Gamazo Tejero, Javier, Márquez-Neila, Pablo, Wolf, Sebastian, Zinkernagel, Martin, Schoeffmann, Klaus, Sznitman, Raphael, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Greenspan, Hayit, editor, Madabhushi, Anant, editor, Mousavi, Parvin, editor, Salcudean, Septimiu, editor, Duncan, James, editor, Syeda-Mahmood, Tanveer, editor, and Taylor, Russell, editor
- Published
- 2023
- Full Text
- View/download PDF
3. DeMRC: Dynamically Enhanced Multi-hop Reading Comprehension Model for Low Data
- Author
-
Tang, Xiu, Xu, Yangchao, Lu, Xuefeng, He, Qiang, Fang, Jun, Chen, Junjie, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Chen, Weitong, editor, Yao, Lina, editor, Cai, Taotao, editor, Pan, Shirui, editor, Shen, Tao, editor, and Li, Xue, editor
- Published
- 2022
- Full Text
- View/download PDF
4. Self-Training-Transductive-Learning Broad Learning System (STTL-BLS): A model for effective and efficient image classification.
- Author
-
Yi, Lin, Lv, Di, Liu, Dinghao, Li, Suhuan, and Liu, Ran
- Subjects
-
CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), FEATURE extraction, TIME complexity, SOURCE code
- Abstract
A novel model called Self-Training-Transductive-Learning Broad Learning System (STTL-BLS) is proposed for image classification. The model consists of two key blocks: the Feature Block (FB) and the Enhancement Block (EB). The FB utilizes the Proportion of Large Values Attention (PLVA) technique and an Encoder for feature extraction. Multiple FBs are cascaded in the model to learn discriminative features. The EB enhances feature learning and prevents under-fitting on complex datasets. Additionally, an architecture that combines characteristics of the Broad Learning System (BLS) and gradient descent is designed for STTL-BLS, enabling the model to leverage the advantages of both BLS and Convolutional Neural Networks (CNNs). Moreover, a training algorithm (STTL) that combines self-training and transductive learning is presented to improve the model's generalization ability. Experimental results demonstrate that the accuracy of the proposed model surpasses all compared BLS variants and is comparable or even superior to deep networks: on small-scale datasets, STTL-BLS has an average accuracy improvement of 14.82 percentage points over the other models; on large-scale datasets, 12.95 percentage points. Notably, the proposed model exhibits low time complexity, with the shortest testing time on the small-scale datasets among all compared models: its average testing time is 46.4 s less than that of the other models. It proves to be a valuable additional solution for image classification tasks on both small- and large-scale datasets. The source code for this paper can be accessed at https://github.com/threedteam/sttl_bls. • Proportion of Large Values Attention (PLVA) is introduced for feature extraction. • The Enhancement Block is introduced for feature learning and to prevent under-fitting. • The designed architecture has the characteristics of both BLS and gradient descent. • Self-training and transductive learning are combined in the training algorithm. [ABSTRACT FROM AUTHOR]
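As an illustration of the STTL idea only: a minimal transductive self-training loop in which confident predictions on the unlabeled test pool are folded back into training. A generic scikit-learn classifier stands in for the paper's FB/EB architecture; the function name and threshold are assumptions, not the authors' code.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def transductive_self_training(X_train, y_train, X_test, rounds=5, thresh=0.95):
        clf = LogisticRegression(max_iter=1000)
        X, y = X_train, y_train
        for _ in range(rounds):
            clf.fit(X, y)
            proba = clf.predict_proba(X_test)
            keep = proba.max(axis=1) >= thresh          # confident test predictions
            if not keep.any():
                break
            # transductive step: the test samples themselves join the training set
            X = np.vstack([X_train, X_test[keep]])
            y = np.concatenate([y_train, clf.classes_[proba[keep].argmax(axis=1)]])
        return clf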
- Published
- 2024
- Full Text
- View/download PDF
5. Adapting OCR with Limited Supervision
- Author
-
Das, Deepayan, Jawahar, C. V., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bai, Xiang, editor, Karatzas, Dimosthenis, editor, and Lopresti, Daniel, editor
- Published
- 2020
- Full Text
- View/download PDF
6. Training deep neural networks with noisy clinical labels: toward accurate detection of prostate cancer in US data.
- Author
-
Javadi, Golara, Samadi, Samareh, Bayat, Sharareh, Sojoudi, Samira, Hurtado, Antonio, Eshumani, Walid, Chang, Silvia, Black, Peter, Mousavi, Parvin, and Abolmaesumi, Purang
- Abstract
Purpose: Ultrasound is the standard-of-care to guide the systematic biopsy of the prostate. During the biopsy procedure, up to 12 biopsy cores are randomly sampled from six zones within the prostate, where the histopathology of those cores is used to determine the presence and grade of the cancer. Histopathology reports only provide statistical information on the presence of cancer and do not normally contain fine-grain information of cancer distribution within each core. This limitation hinders the development of machine learning models to detect the presence of cancer in ultrasound so that biopsy can be more targeted to highly suspicious prostate regions. Methods: In this paper, we tackle this challenge in the form of training with noisy labels derived from histopathology. Noisy labels often result in the model overfitting to the training data, hence limiting its generalizability. To avoid overfitting, we focus on the generalization of the features of the model and present an iterative data label refinement algorithm to amend the labels gradually. We simultaneously train two classifiers, with the same structure, and automatically stop the training when we observe any sign of overfitting. Then, we use a confident learning approach to clean the data labels and continue with the training. This process is iteratively applied to the training data and labels until convergence. Results: We illustrate the performance of the proposed method by classifying prostate cancer using a dataset of ultrasound images from 353 biopsy cores obtained from 90 patients. We achieve area under the curve, sensitivity, specificity, and accuracy of 0.73, 0.80, 0.63, and 0.69, respectively. Conclusion: Our approach is able to provide clinicians with a visualization of regions that likely contain cancerous tissue to obtain more accurate biopsy samples. The results demonstrate that our proposed method produces superior accuracy compared to the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
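A hedged sketch of this style of label refinement (not the authors' implementation: out-of-fold probabilities stand in for their two-classifier scheme with overfitting detection, and integer labels 0..K-1 are assumed):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict

    def refine_labels(X, y, n_iter=3, margin=0.30):
        y = y.copy()                        # noisy histopathology-derived labels
        for _ in range(n_iter):
            clf = RandomForestClassifier(n_estimators=200, random_state=0)
            proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
            given = proba[np.arange(len(y)), y]            # prob. of the current label
            suspect = proba.max(axis=1) - given > margin   # confidently contradicted
            if not suspect.any():
                break
            y[suspect] = proba[suspect].argmax(axis=1)     # amend labels gradually
        clf.fit(X, y)
        return clf, y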
- Published
- 2022
- Full Text
- View/download PDF
7. Revisiting instance search: A new benchmark using cycle self-training.
- Author
-
Zhang, Yuqi, Liu, Chong, Chen, Weihua, Xu, Xianzhe, Wang, Fan, Li, Hao, Hu, Shiyu, and Zhao, Xin
- Subjects
-
GENERALIZATION
- Abstract
Instance search aims at retrieving a particular object instance from a set of scene images. Although studied in previous competitions like TRECVID, there has been limited literature and few datasets on this topic. In this paper, to overcome the generalization issue when arbitrary categories are involved in the search and to benefit from the large amount of unlabeled data, we propose a cycle self-training framework that trains the instance search pipeline with automatic supervision. Given the two-stage pipeline with a localization and a ranking module, the cycle self-training includes a ranker-guided localizer and a localizer-guided ranker, each carefully designed to handle the noisy labels that come with self-supervision. Furthermore, we build and release large-scale ground-truth annotations for instances to facilitate algorithm evaluation and analysis in this research topic, especially for small objects in complex backgrounds. The datasets are publicly available at https://github.com/instance-search/instance-search. Extensive experiments show the effectiveness of the proposed cycle self-training framework and its superior performance compared with other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. G2L: A Global to Local Alignment Method for Unsupervised Domain Adaptive Semantic Segmentation.
- Author
-
Manh, Nguyen Viet, Nam, Kieu Dang, Sang, Dinh Viet, and Nguyen, Thi-Oanh
- Subjects
IMAGE registration, KNOWLEDGE transfer, FOURIER transforms, CONFIDENCE
- Abstract
Unsupervised domain adaptation (UDA) for semantic segmentation aims to transfer knowledge from a source dataset with dense pixel-level annotations to an unlabeled target dataset. However, the performance of UDA methods often suffers from domain shift, the discrepancy between the feature distributions of the two domains. Several attempts have been made to marginally align these distributions at the image level. However, due to the so-called category-level domain shift, such global alignments do not guarantee good separability of the deep features extracted from different categories in the target domain. As a result, the generated pseudo-labels can be noisy and thus poison the learning process on the target domain. Some recent methods focus on denoising the pseudo-labels online using category-wise information. This paper introduces a novel UDA method called Global-to-Local alignment (G2L) that leverages fine-grained adversarial training and a newly proposed chromatic Fourier transform to address the image-level domain shift from a global perspective. Next, our method deals with the category-level domain shift from a local view. Specifically, we propose a long-tail category rating strategy and apply dynamic confidence thresholds and category-wise priority weights when generating and denoising the pseudo-labels, so as to favor rare categories. Finally, self-distillation is used to boost the final segmentation results. Experiments on the popular benchmarks GTA5 → Cityscapes and SYNTHIA → Cityscapes show that our method yields superior accuracy compared with other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
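A sketch of what per-category dynamic confidence thresholding can look like (our simplified reading of the idea; the paper's long-tail rating strategy and priority weights are richer than this):

    import numpy as np

    def pseudo_labels_dynamic(proba, base_thresh=0.9, rare_discount=0.2):
        """proba: (N, C) softmax outputs for target-domain samples/pixels."""
        pred = proba.argmax(axis=1)
        conf = proba.max(axis=1)
        freq = np.bincount(pred, minlength=proba.shape[1]) / len(pred)
        # rarer predicted categories get a lower threshold so they are not drowned out
        thresh = base_thresh - rare_discount * (1.0 - freq / (freq.max() + 1e-12))
        keep = conf >= thresh[pred]
        return np.where(keep, pred, -1)   # -1 marks samples left unlabeled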
- Published
- 2022
- Full Text
- View/download PDF
9. Entropy-aware self-training for graph convolutional networks.
- Author
-
Zhao, Gongpei, Wang, Tao, Li, Yidong, Jin, Yi, and Lang, Congyan
- Subjects
-
BOOSTING algorithms, RANDOM walks, ALGORITHMS
- Abstract
• An entropy-aggregation layer is proposed to strengthen the reasoning ability of GCNs. • An ingenious checking part based on self-training enhances node classification. • Sufficient experiments and analyses validate the superiority of ES-GCN. Recently, graph convolutional networks (GCNs) have achieved significant success in many graph-based learning tasks, especially node classification, due to their excellent ability in representation learning. Nevertheless, it remains challenging for GCN models to obtain satisfying predictions on graphs where only a few nodes have known labels. In this paper, we propose a novel entropy-aware self-training algorithm to boost semi-supervised node classification on graphs with little supervised information. Firstly, an entropy-aggregation layer is developed to strengthen the reasoning ability of GCN models. To the best of our knowledge, this is the first work to combine entropy-based random walk theory with GCN design. Furthermore, we propose an ingenious checking part that adds new nodes as supervision after each training round to enhance node prediction. In particular, the checking part is designed based on aggregated features, which is demonstrated to be more effective than previous methods and boosts node classification significantly. The proposed algorithm is validated on six public benchmarks in comparison with several state-of-the-art baseline algorithms, and the results illustrate its excellent performance. [ABSTRACT FROM AUTHOR]
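A generic sketch of an entropy-based checking step (our illustration, not the paper's exact design, which operates on aggregated features): unlabeled nodes whose predictions have the lowest entropy are added as new supervision after each round.

    import numpy as np

    def select_confident_nodes(proba, unlabeled_idx, k_per_class=10):
        ent = -(proba * np.log(proba + 1e-12)).sum(axis=1)   # prediction entropy
        pred = proba.argmax(axis=1)
        chosen = []
        for c in range(proba.shape[1]):
            cand = sorted((i for i in unlabeled_idx if pred[i] == c),
                          key=lambda i: ent[i])              # lowest entropy first
            chosen.extend(cand[:k_per_class])
        return chosen   # add these nodes with pseudo-label pred[i] for the next round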
- Published
- 2021
- Full Text
- View/download PDF
10. Improving Machine Reading Comprehension with Multi-Task Learning and Self-Training
- Author
-
Jianquan Ouyang and Mengen Fu
- Subjects
machine reading comprehension, Natural Language Processing, multi-task learning, Self Training, pre-trained model, Mathematics, QA1-939
- Abstract
Machine Reading Comprehension (MRC) is an AI challenge that requires machines to determine the correct answer to a question based on a given passage. Extractive MRC requires extracting an answer span for a question from a given passage, as in span-extraction tasks; in contrast, non-extractive MRC infers answers from the content of reference passages, covering Yes/No question answering and unanswerable questions. Due to the specificity of the two types of MRC tasks, researchers usually work on one type of task separately, but real-life applications often require models that can handle many different types of tasks in parallel. Therefore, to meet the comprehensive requirements of such applications, we construct a multi-task fusion training reading comprehension model based on the pre-trained BERT model. The model uses BERT to obtain contextual representations, which are then shared by three downstream sub-modules for span extraction, Yes/No question answering, and unanswerable questions. Next, we fuse the outputs of the three sub-modules into a new span-extraction output and use the fused cross-entropy loss function for global training. In the training phase, since our model requires a large amount of labeled training data, which is often expensive to obtain or unavailable in many tasks, we additionally use self-training to generate pseudo-labeled training data, improving the model's accuracy and generalization performance. We evaluated the model on the SQuAD 2.0 and CAIL2019 datasets. The experiments show that our model can efficiently handle different tasks. We achieved 83.2 EM and 86.7 F1 on the SQuAD 2.0 dataset and 73.0 EM and 85.3 F1 on the CAIL2019 dataset.
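A hedged PyTorch sketch of the shared-encoder, three-head design with a fused cross-entropy loss (the head names and loss weighting are our assumptions; in the paper the sequence representations come from BERT):

    import torch
    import torch.nn as nn

    class MultiTaskMRC(nn.Module):
        def __init__(self, hidden=768):
            super().__init__()
            self.span_head = nn.Linear(hidden, 2)      # start/end logits per token
            self.yesno_head = nn.Linear(hidden, 3)     # yes / no / span-type question
            self.answerable_head = nn.Linear(hidden, 2)

        def forward(self, seq_repr):                   # seq_repr: (B, T, H) from BERT
            start_end = self.span_head(seq_repr)       # (B, T, 2)
            cls = seq_repr[:, 0]                       # [CLS]-position vector
            return start_end, self.yesno_head(cls), self.answerable_head(cls)

    def fused_loss(start_end, yn_logits, ans_logits, starts, ends, yn, ans):
        ce = nn.CrossEntropyLoss()
        span = ce(start_end[..., 0], starts) + ce(start_end[..., 1], ends)
        return span + ce(yn_logits, yn) + ce(ans_logits, ans)   # global training signal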
- Published
- 2022
- Full Text
- View/download PDF
11. Few-Shot Learning and Self-Training for eNodeB Log Analysis for Service-Level Assurance in LTE Networks.
- Author
-
Aoki, Shogo, Shiomoto, Kohei, and Eng, Chin Lam
- Abstract
With increasing network topology complexity and the continuous evolution of new wireless technology, it is challenging to address network service outages with traditional methods. In long-term evolution (LTE) networks, a large number of base stations called eNodeBs are deployed to cover entire service areas spanning various kinds of geographical regions. Each eNodeB generates a large number of key performance indicators (KPIs). Hundreds of thousands of eNodeBs are typically deployed to cover a nation-wide service area, so operators need to handle hundreds of millions of KPIs. It is impractical to handle such a huge amount of KPI data manually, and automation of data processing is therefore desired. To improve network operation efficiency, a suitable machine learning technique is used to learn and classify individual eNodeBs into different states based on multiple performance metrics during a specific time window. However, supervised learning requires a large labeled dataset, and annotating the data costs considerable human labor and time. To mitigate these cost and time issues, we propose a method based on few-shot learning that uses the Prototypical Networks algorithm to complement the eNodeB state analysis. Using a dataset from a live LTE network consisting of thousands of eNodeBs, our experimental results show that the proposed technique provides high performance while using a small number of labeled samples. [ABSTRACT FROM AUTHOR]
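For reference, the core Prototypical Networks step is compact enough to sketch (a minimal numpy version, assuming each eNodeB time window has already been embedded as a feature vector):

    import numpy as np

    def prototypes(support_x, support_y):
        """Mean embedding per class from the few labeled examples."""
        classes = np.unique(support_y)
        return classes, np.stack([support_x[support_y == c].mean(axis=0)
                                  for c in classes])

    def classify(query_x, classes, protos):
        # squared Euclidean distance to each class prototype
        d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        return classes[d.argmin(axis=1)]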
- Published
- 2020
- Full Text
- View/download PDF
12. Harnessing the Power of Self-Training for Gaze Point Estimation in Dual Camera Transportation Datasets
- Author
-
Bhagat, Hirva Alpesh
- Abstract
This thesis proposes a novel approach for efficiently estimating gaze points in dual camera transportation datasets. Traditional methods for gaze point estimation are dependent on large amounts of labeled data, which can be both expensive and time-consuming to collect. Additionally, alignment and calibration of the two camera views present significant challenges. To overcome these limitations, this thesis investigates the use of self-learning techniques such as semi-supervised learning and self-training, which can reduce the need for labeled data while maintaining high accuracy. The proposed method is evaluated on the DGAZE dataset and achieves a 57.2% improvement in performance compared to the previous methods. This approach can prove to be a valuable tool for studying visual attention in transportation research, leading to more cost-effective and efficient research in this field.
- Published
- 2023
13. A Self-Adaptive Temporal-Spatial Self-Training Algorithm for Semisupervised Fault Diagnosis of Industrial Processes
- Author
-
Jinsong Zhao and Shaodong Zheng
- Subjects
Measure (data warehouse), Process (engineering), Property (programming), Computer science, Self adaptive, Work in process, Fault (power engineering), Computer Science Applications, Control and Systems Engineering, Benchmark (computing), Electrical and Electronic Engineering, Self training, Algorithm, Information Systems
- Abstract
Investigating process monitoring techniques is required to reduce the loss of property and life caused by industrial process accidents. Fault diagnosis, which attempts to determine the fault type, is a vital step in process monitoring because it helps operators respond to abnormal situations appropriately. Adequate data labels for training supervised fault diagnosis models are difficult to acquire in practice; however, semi-supervised methods, which are attracting increasing attention, can use unlabeled data. Self-labeled algorithms are an effective paradigm of semi-supervised methods, but their applications in industrial process fault diagnosis do not meet expectations, because they are prone to performance deterioration when handling industrial process data. To address this issue, a self-training algorithm with a modified confidence measure is proposed. The confidence measure is temporal-spatial: the temporal identities of the data are introduced into its definition and calculation, which makes the algorithm adaptable to industrial processes. The proposed algorithm is also self-adaptive, avoiding time-consuming hyper-parameter tuning. The benchmark Tennessee Eastman process data were used to evaluate the proposed algorithm, and the experimental results demonstrate its superiority over competing semi-supervised methods.
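One plausible shape for a temporal confidence measure (an assumption on our part; the paper's definition differs in detail): damp a sample's classifier confidence unless its temporal neighbors agree with it.

    import numpy as np

    def temporal_confidence(proba, window=5):
        """proba: (T, C) classifier outputs for consecutive process samples."""
        pred = proba.argmax(axis=1)
        conf = np.empty(len(pred))
        half = window // 2
        for t in range(len(pred)):
            lo, hi = max(0, t - half), min(len(pred), t + half + 1)
            agree = (pred[lo:hi] == pred[t]).mean()    # temporal agreement ratio
            conf[t] = agree * proba[t, pred[t]]        # lone spikes score low
        return conf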
- Published
- 2022
- Full Text
- View/download PDF
14. Effective training duration and frequency for lip-seal training in older people using a self-training instrument
- Author
-
Midori Ohta, Takeshi Oki, Kaoru Sakurai, Takayuki Ueda, and Tomofumi Takano
- Subjects
education, Swallowing, Medicine, Humans, Muscle Strength, General Dentistry, Aged, Cross-Over Studies, Training (meteorology), Continuous training, Crossover study, Lip, Test (assessment), Deglutition, Duration (music), Physical therapy, Female, Geriatrics and Gerontology, Older people, Self training
- Abstract
Objective: To determine the effects of training duration and frequency on lip-seal strength (LSS) in older people. Background: Lip-seal is important for speaking, eating and swallowing. LSS decreases after training ends; therefore, continuous training is essential. Materials and methods: Participants underwent resistance training of LSS. Regarding training duration, eight women aged ≥65 years participated in a crossover study with training A (direction: 1, duration: 50 seconds) and training B (directions: 3, duration: 3 minutes), performed daily for 4 weeks. Regarding training frequency, 40 women aged ≥65 years were divided into four groups based on frequency (every-day, every-other-day, once-a-week and control groups), and all groups except the control group performed training B for 4 weeks. LSS was measured at weeks 0, 2 and 4 using a digital strain gauge. Friedman's test was used, followed by the Steel-Dwass test (α = 0.05). Results: Regarding the effects of training duration, significant differences in LSS were noted between weeks 0 and 4 for training B, but no difference was noted for training A. Regarding training frequency, significant differences were observed between weeks 0 and 2 or 4 in the every-day and once-a-week groups. Significant differences were observed in the every-other-day group between weeks 0 and 4, and no difference in the control group. For all groups, median LSS was higher in week 2 or 4 than in week 0. Conclusion: Lip-seal training for 3 minutes per session every day, every other day or once a week for 4 weeks increased the LSS of older people.
- Published
- 2021
15. A novel semi-supervised self-training method based on resampling for Twitter fake account identification
- Author
-
Shouqiang Sun, Ziming Zeng, Jie Yin, Jingjing Sun, and Tingting Li
- Subjects
Computer science, Process (computing), Semi-supervised learning, Library and Information Sciences, Machine learning, Data set, Identification (information), Resampling, Classifier (linguistics), Labeled data, Artificial intelligence, Self training, Information Systems
- Abstract
Purpose: Twitter fake accounts refer to bot accounts created by third-party organizations to influence public opinion, spread commercial propaganda or impersonate others. Effective identification of bot accounts helps the public accurately judge the information being disseminated. However, in practical fake-account identification, manually labeling Twitter accounts is expensive and inefficient, and the labeled data are usually class-imbalanced. To this end, the authors propose a novel framework to solve these problems. Design/methodology/approach: In the proposed framework, the authors introduce the concept of semi-supervised self-training learning and apply it to a real Twitter account dataset from Kaggle. Specifically, the authors first train the classifier on the initial small amount of labeled account data, then use the trained classifier to automatically label large-scale unlabeled account data. Next, high-confidence instances are iteratively selected from the unlabeled data to expand the labeled data. Finally, an expanded Twitter account training set is obtained. It is worth mentioning that the resampling technique is integrated into the self-training process, and the data classes are balanced at the initial stage of each self-training iteration. Findings: The proposed framework effectively improves labeling efficiency and reduces the influence of class imbalance. It shows excellent identification results with 6 different base classifiers, especially for the initial small-scale labeled Twitter accounts. Originality/value: This paper provides novel insights into identifying Twitter fake accounts. First, the authors are the first to introduce a self-training method for automatically labeling Twitter accounts in a semi-supervised setting. Second, the resampling technique is integrated into the self-training process to effectively reduce the influence of class imbalance on identification.
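A hedged sketch of this kind of loop, with plain random oversampling standing in for whichever resampling technique the authors used (classifier choice and threshold are ours):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def balance(X, y, rng):
        classes, counts = np.unique(y, return_counts=True)
        idx = np.concatenate([rng.choice(np.where(y == c)[0], counts.max(), replace=True)
                              for c in classes])
        return X[idx], y[idx]

    def self_train(X_lab, y_lab, X_unlab, rounds=5, thresh=0.9, seed=0):
        rng = np.random.default_rng(seed)
        pool = np.ones(len(X_unlab), dtype=bool)     # unlabeled accounts still available
        clf = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            Xb, yb = balance(X_lab, y_lab, rng)      # rebalance classes every round
            clf.fit(Xb, yb)
            if not pool.any():
                break
            proba = clf.predict_proba(X_unlab[pool])
            take = proba.max(axis=1) >= thresh       # high-confidence instances only
            if not take.any():
                break
            ids = np.where(pool)[0][take]
            X_lab = np.vstack([X_lab, X_unlab[ids]])
            y_lab = np.concatenate([y_lab, clf.classes_[proba[take].argmax(axis=1)]])
            pool[ids] = False
        return clf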
- Published
- 2021
- Full Text
- View/download PDF
16. Entropy-aware self-training for graph convolutional networks
- Author
-
Tao Wang, Congyan Lang, Yi Jin, Yidong Li, and Gongpei Zhao
- Subjects
Theoretical computer science, Artificial Intelligence, Computer science, Cognitive Neuroscience, Node (networking), Entropy (information theory), Layer (object-oriented design), Random walk, Self training, Feature learning, Graph, Computer Science Applications
- Abstract
Recently, graph convolutional networks (GCNs) have achieved significant success in many graph-based learning tasks, especially node classification, due to their excellent ability in representation learning. Nevertheless, it remains challenging for GCN models to obtain satisfying predictions on graphs where only a few nodes have known labels. In this paper, we propose a novel entropy-aware self-training algorithm to boost semi-supervised node classification on graphs with little supervised information. Firstly, an entropy-aggregation layer is developed to strengthen the reasoning ability of GCN models. To the best of our knowledge, this is the first work to combine entropy-based random walk theory with GCN design. Furthermore, we propose an ingenious checking part that adds new nodes as supervision after each training round to enhance node prediction. In particular, the checking part is designed based on aggregated features, which is demonstrated to be more effective than previous methods and boosts node classification significantly. The proposed algorithm is validated on six public benchmarks in comparison with several state-of-the-art baseline algorithms, and the results illustrate its excellent performance.
- Published
- 2021
- Full Text
- View/download PDF
17. Semi-Supervised Self-Training of Hate and Offensive Speech from Social Media
- Author
-
Samira Sadaoui and Safa Alsafari
- Subjects
ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, Computer science, Applied psychology, Offensive, Social media, Self training
- Abstract
Improving Offensive and Hate Speech (OHS) classifiers' performance requires a large, confidently labeled textual training dataset. Our study devises a semi-supervised classification approach with ...
- Published
- 2021
- Full Text
- View/download PDF
18. Extracting Relations from Italian Wikipedia Using Self-Training
- Author
-
Siciliani, Lucia, Cassotti, Pierluigi, Basile, Pierpaolo, De Gemmis, Marco, Lops, Pasquale, and Semeraro, Giovanni
- Subjects
Computational Linguistics, Italian Wikipedia, open information extraction, Self training, Linguistics, Wikipedia, Language
- Abstract
This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework. WikiOIE is based on UDPipe and the Universal Dependencies project for text processing. It easily allows customizing the information extraction (IE) approach to automatically extract triples (subject, predicate, object). This dataset contains relations extracted by a supervised approach based on self-training. The output of the extraction process is provided in JSON format. Version 2 of the dataset was extracted using an improved version of the learning algorithm. The files of version 2 are identified by the suffix "_reg" in the file name. More information and the Java code are available here: https://github.com/pippokill/WikiOIE Self-training approach: Lucia Siciliani, Pierluigi Cassotti, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2021. Extracting Relations from Italian Wikipedia using Self-Training. In Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021). CEUR-WS. WikiOIE framework: Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.
- Published
- 2022
19. A Surgical Training Simulator for Quantitative Assessment of the Anastomotic Technique of Coronary Artery Bypass Grafting
- Author
-
Park, Y., Shinke, M., Kanemitsu, N., Yagi, T., Azuma, T., Shiraishi, Y., Kormos, R., Umezu, M., Magjarevic, R., editor, Nagel, J. H., editor, Lim, Chwee Teck, editor, and Goh, James C. H., editor
- Published
- 2009
- Full Text
- View/download PDF
20. A semi-supervised learning method for hyperspectral imagery based on self-training and local-based affinity propagation
- Author
-
Liguo Wang, Wenlong Zhu, Haizhu Pan, Cheng Li, Yanping Teng, Yanzhong Liu, and Haimiao Ge
- Subjects
Computer science, Hyperspectral imaging, Pattern recognition, Semi-supervised learning, Remote sensing (archaeology), General Earth and Planetary Sciences, Affinity propagation, Artificial intelligence, Self training
- Abstract
In hyperspectral remote sensing, the classification of hyperspectral imagery is an important issue of concern. However, obtaining sufficient labelled samples for the classification is hard work and...
- Published
- 2021
- Full Text
- View/download PDF
21. Graph Convolutional Network-based Model for Incident-related Congestion Prediction: A Case Study of Shanghai Expressways
- Author
-
Hui Li, Weishan Sun, Wenbin Wang, Xi Wang, and Yibo Chai
- Subjects
General Computer Science, Computer science, Management Information Systems, Transport engineering, Congestion prediction, Megacity, Traffic congestion, Obstacle, Graph (abstract data type), China, Self training
- Abstract
Traffic congestion has become a significant obstacle to the development of megacities in China. Although local governments have devoted many resources to constructing road infrastructure, it is still insufficient for the increasing traffic demand. As a first step toward optimizing real-time traffic control, this study uses the Shanghai Expressways as a case study to predict incident-related congestion. Our study proposes a graph convolutional network-based model to identify correlations in multi-dimensional sensor-detected data, while simultaneously taking into account environmental, spatiotemporal, and network features in predicting traffic conditions immediately after a traffic incident. The average accuracy, average AUC, and average F1 score of the predictive model are 92.78%, 95.98%, and 88.78%, respectively, on small-scale ground-truth data. Furthermore, we improve the predictive model's performance using semi-supervised learning by including more unlabeled data instances. As a result, the accuracy, AUC, and F1 score of the model increase by 2.69%, 1.25%, and 4.72%, respectively. The findings of this article have important implications for improving the management and development of expressways in Shanghai, as well as in other metropolitan areas in China.
- Published
- 2021
- Full Text
- View/download PDF
22. Attitudes toward overtime work and self‐training: A survey on obstetricians and gynecologists in Japan
- Author
-
Michinori Mayama, Chisato Kodera, Takayuki Enomoto, Michiko Kido, Tokumasa Suemitsu, Takuma Ohsuga, Masayuki Sekine, Yosuke Sugita, Takeshi Umazume, Kazutoshi Nakano, Yuto Maeda, Satoshi Nakagawa, Koji Nishijima, Takashi Murakami, Hidemichi Watari, Yukio Suzuki, Ayako Shibata, Makio Shozu, Nobuya Unno, Yohei Onodera, Jumpei Ogura, and Hiroaki Komatsu
- Subjects
Generation gap, education, Obstetrics and Gynecology, Questionnaire, Overtime work, Obstetrics, Attitude, Japan, Obstetrics and gynaecology, Gynecology, Surveys and Questionnaires, Family medicine, Humans, Medicine, Christian ministry, Quality (business), Self training, Welfare
- Abstract
Aim: The Ministry of Health, Labour and Welfare of Japan proposed a regulation of overtime work as part of its work-style reform. However, the regulation may deteriorate the quality of medical services due to the reduction in training time. Thus, this study aimed to reveal generation gaps in views on self-training and overtime work among members of the Japan Society of Obstetrics and Gynecology (JSOG). Methods: A web-based, self-administered questionnaire survey was conducted among members of the JSOG. In total, 1256 respondents were included in the analysis. Data were collected on age, sex, experience as a medical doctor, location of workplace, work style, type of main workplace, and number of full-time doctors in the main workplace. The study examined the respondents' attitudes toward overtime work and self-training. The respondents were categorized based on experience as a medical doctor. Results: By years of experience, 112 (8.9%), 226 (18.0%), 383 (30.5%), and 535 (42.6%) doctors had been working for ≤5, 6-10, 11-19, and ≥20 years, respectively. Although 54.5% of doctors with ≤5 years of experience expected the regulation on working hours to improve the quality of medical services, those with ≥20 years of experience expected potential deterioration. After adjusting for covariates, more years of experience were significantly associated with the expectation of deterioration in the quality of medical services. Conclusions: The study revealed a generation gap in views about self-training and overtime work among obstetricians and gynecologists in Japan.
- Published
- 2021
- Full Text
- View/download PDF
23. Perspective of Making Self-training Habit from Psychological Consideration and Practice
- Author
-
Hiroshi Bando, Akito Moriyasu, Hiroya Hanabusa, Mitsuru Murakami, and Makoto Takasugi
- Subjects
Protocol (science), Self-efficacy, Motivation, Rehabilitation, Perspective (graphical), Applied psychology, General Medicine, Sport psychology, Task (project management), Push-up, Self-training, Habit, Psychology, Self training
- Abstract
The authors and collaborators have continued clinical practice and research on rehabilitation and self-training, in which various problems have been found. Protocol: The author himself tried a home push-up self-training exercise for 2 months, which was successfully completed. Results: Positive changes were an increase from 94 to 96.5 cm in chest circumference and from 45 to 100 repetitions in continuous push-ups. Discussion: From the viewpoint of sport psychology, a close relationship among motivation, self-efficacy and performance has been observed. Self-efficacy can influence one's beliefs concerning accomplishing and continuing tasks, activities and effort. This report will hopefully become a reference for future practice and research development.
- Published
- 2021
- Full Text
- View/download PDF
24. An Effective Tumor Classification With Deep Forest and Self-Training
- Author
-
Lili Shen, Xiaojun Sun, and Zhanbo Chen
- Subjects
semi-supervised learning, Gene expression omnibus, General Computer Science, Process (engineering), Computer science, Tumor classification, Supervised learning, General Engineering, Sample (statistics), Machine learning, Field (computer science), TK1-9971, Random forest, ComputingMethodologies_PATTERNRECOGNITION, self-training, Robustness (computer science), deep forest, General Materials Science, Electrical engineering. Electronics. Nuclear engineering, Artificial intelligence, Self training
- Abstract
In recent years, tumor classification based on the Gene Expression Omnibus has attracted continuous attention in the field of bioinformatics. Integrating machine learning techniques is an efficient way to solve such problems. Generally, in order to obtain good performance in supervised learning tasks, a large number of labelled samples is required. However, in many cases, only a few labelled samples and abundant unlabelled samples exist in the training database, and labelling these unlabelled samples manually is difficult and expensive. Therefore, semi-supervised learning approaches have been proposed that utilize unlabelled samples to improve the performance of a model. However, noisy samples decrease the robustness of models in semi-supervised learning. We want a training style in which samples are used for training in order from high to low confidence; self-training meets this requirement, and the deep forest approach with the hyper-parameter settings used in this work obtains good accuracy. Therefore, in this paper, we present a novel semi-supervised learning approach with a deep forest model to increase the performance of tumor classification while employing unlabelled samples and minimizing cost; that is, an updated unlabelled-sample mechanism is investigated to expand the number of high-confidence pseudo-labelled samples. Multiple real-world experiments indicate that our proposed approach can obtain up to 0.96 accuracy and F1-score, and an AUC of 0.9798.
- Published
- 2021
- Full Text
- View/download PDF
25. PRPS-ST: A Protocol-Agnostic Self-training Method for Gene Expression–Based Classification of Blood Cancers
- Author
-
Christopher Rushton, Ryan D. Morin, Bruno M. Grande, David W. Scott, Aixiang Jiang, Jeffrey Tang, and Laura K. Hilton
- Subjects
Protocol (science), Computer science, Gene Expression, General Medicine, Computational biology, Data type, Article, Blood cancer, Class imbalance, Binary classification, Hematologic Neoplasms, Neoplasms, Gene expression, Humans, Enhanced sensitivity, Self training
- Abstract
Gene expression classifiers are gaining increasing popularity for stratifying tumors into subgroups with distinct biological features. A fundamental limitation shared by current classifiers is the requirement for comparable training and testing datasets. Here, we describe a self-training implementation of our probability ratio-based classification prediction score method (PRPS-ST), which facilitates the porting of existing classification models to other gene expression datasets. In comparison with gold standards, we demonstrate favorable performance of PRPS-ST in gene expression–based classification of diffuse large B-cell lymphoma (DLBCL) and B-lineage acute lymphoblastic leukemia (B-ALL) using a diverse variety of gene expression data types and preprocessing methods, including in classifications with a high degree of class imbalance. Tumors classified by our method were significantly enriched for prototypical genetic features of their respective subgroups. Interestingly, this included cases that were unclassifiable by established methods, implying the potentially enhanced sensitivity of PRPS-ST. Significance: The adoption of binary classifiers such as cell of origin (COO) has been thwarted, in part, by the challenges imposed by batch effects and the continual evolution of gene expression technologies. PRPS-ST resolves this by enabling classifiers to be ported across platforms while retaining high accuracy.
- Published
- 2020
- Full Text
- View/download PDF
26. A self-training hierarchical prototype-based approach for semi-supervised classification
- Author
-
Xiaowei Gu
- Subjects
Structure (mathematical logic), Information Systems and Management, Computer science, Process (engineering), Machine learning, Computer Science Applications, Theoretical Computer Science, Artificial Intelligence, Control and Systems Engineering, Benchmark (computing), Key (cryptography), Artificial intelligence, Self training, Software
- Abstract
This paper introduces a novel self-training hierarchical prototype-based approach for semi-supervised classification. The proposed approach firstly identifies meaningful prototypes from labelled samples at multiple levels of granularity and, then, self-organizes a highly transparent, multi-layered recognition model by arranging them in a form of pyramidal hierarchies. After this, the learning model continues to self-evolve its structure and self-expand its knowledge base to incorporate new patterns recognized from unlabelled samples by exploiting the pseudo-label technique. Thanks to its prototype-based nature, the overall computational process of the proposed approach is highly explainable and traceable. Experimental studies with various benchmark image recognition problems demonstrate the state-of-the-art performance of the proposed approach, showing its strong capability to mine key information from unlabelled data for classification.
- Published
- 2020
- Full Text
- View/download PDF
27. A Prediction Approach Based on Self-Training and Deep Learning for Biological Data
- Author
-
Mohamed Lamine Berkane, Mahmoud Boufaida, and Mohamed Nadjib Boufenara
- Subjects
Biological data, ComputingMethodologies_PATTERNRECOGNITION, Computer science, Deep learning, Artificial intelligence, Machine learning, Self training
- Abstract
With the exponential growth of biological data, labeling this kind of data becomes difficult and costly. Although unlabeled data are comparatively more plentiful than labeled ones, most supervised learning methods are not designed to use unlabeled data. Semi-supervised learning methods are motivated by the availability of large unlabeled datasets rather than a small amount of labeled examples. However, incorporating unlabeled data into learning does not guarantee an improvement in classification performance. This paper introduces an approach based on a semi-supervised learning model, namely self-training with a deep learning algorithm, to predict missing classes from labeled and unlabeled data. To assess the performance of the proposed approach, two datasets are used with four performance measures: precision, recall, F-measure, and area under the ROC curve (AUC).
- Published
- 2020
- Full Text
- View/download PDF
28. Tunnel condition assessment via cloud model‐based random forests and self‐training approach
- Author
-
Hehua Zhu, J. Woody Ju, Feng Guo, Mengqi Zhu, and Xueqin Chen
- Subjects
Computer science, Decision tree, Cloud computing, Machine learning, Computer Graphics and Computer-Aided Design, Condition assessment, Computer Science Applications, Random forest, Computational Theory and Mathematics, Artificial intelligence, CRFS, Self training, Civil and Structural Engineering
- Abstract
To proactively assess the losses caused by the deterioration of metro tunnels during the operational period, a new method, the cloud model‐based random forests (CRFs), is proposed to discu...
- Published
- 2020
- Full Text
- View/download PDF
29. A semi-supervised self-training method based on density peaks and natural neighbors
- Author
-
Junnan Li and Suwen Zhao
- Subjects
General Computer Science, Computer science, Decision tree, Pattern recognition, Computational intelligence, k-nearest neighbors algorithm, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Artificial intelligence, Cluster analysis, Self training, Classifier (UML)
- Abstract
The semi-supervised self-training method is one of the successful methodologies of semi-supervised classification and can train a classifier by exploiting both labeled and unlabeled data. However, most self-training methods are limited by the distribution of the initial labeled data, rely heavily on parameters, and have poor predictive ability in the self-training process. To solve these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering (DPCNaN) is first presented by introducing natural neighbors. The DPCNaN can reveal the real structure and distribution of data without any parameters, and then helps STDPNaN restore the real data space whether its distribution is spherical or non-spherical. An ensemble classifier is also employed to improve the predictive ability of STDPNaN in the self-training process. Intensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving the classification accuracy of k-nearest neighbor, support vector machine, and classification and regression tree; (b) STDPNaN also outperforms comparison methods without any restriction on the number of labeled data; (c) the running time of STDPNaN is acceptable.
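For orientation, the two density-peaks quantities the method builds on (Rodriguez and Laio's rho and delta) are easy to sketch; a fixed cutoff distance dc is assumed here, whereas DPCNaN replaces it with parameter-free natural neighbors.

    import numpy as np
    from scipy.spatial.distance import cdist

    def density_peaks(X, dc=1.0):
        D = cdist(X, X)
        rho = (D < dc).sum(axis=1) - 1               # local density (self excluded)
        delta = np.empty(len(X))
        for i in range(len(X)):
            higher = np.where(rho > rho[i])[0]       # points with higher density
            delta[i] = D[i, higher].min() if len(higher) else D[i].max()
        return rho, delta                            # peaks: high rho AND high delta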
- Published
- 2020
- Full Text
- View/download PDF
30. Semi‐Supervised Learning
- Author
-
Gaurav Malik, Deepak Kumar Sharma, and Manish Devgan
- Subjects
Computer science, Artificial intelligence, Semi-supervised learning, Baum–Welch algorithm, Machine learning, Self training
- Published
- 2020
- Full Text
- View/download PDF
31. Self-training algorithm combining density peak and cut edge weight
- Author
-
Yang Liu
- Subjects
Computer science, Edge (geometry), Self training, Algorithm
- Published
- 2020
- Full Text
- View/download PDF
32. A boosting Self-Training Framework based on Instance Generation with Natural Neighbors for K Nearest Neighbor
- Author
-
Junnan Li and Qingsheng Zhu
- Subjects
Boosting (machine learning), Computer science, Machine learning, Ensemble learning, k-nearest neighbors algorithm, ComputingMethodologies_PATTERNRECOGNITION, Artificial Intelligence, Labeled data, Artificial intelligence, Self training, Classifier (UML)
- Abstract
The semi-supervised self-training method is one of the successful methodologies of semi-supervised classification. Mislabeling is the most challenging issue in self-training methods, and ensemble learning is one of the common techniques for dealing with it. Specifically, ensemble learning can solve or alleviate mislabeling by constructing an ensemble classifier that improves prediction accuracy in the self-training process. However, most ensemble learning methods may not perform well in self-training methods because it is difficult for them to train an effective ensemble classifier with a small amount of labeled data. Inspired by successful boosting methods, we introduce a new boosting self-training framework based on instance generation with natural neighbors (BoostSTIG) in this paper. BoostSTIG is compatible with most boosting methods and self-training methods. It can use most boosting methods to solve or alleviate the mislabeling of existing self-training methods by improving prediction accuracy in the self-training process. In addition, an instance-generation step with natural neighbors is proposed to enlarge the initial labeled data in BoostSTIG, which makes boosting methods more suitable for self-training methods. In experiments, we apply the BoostSTIG framework to 2 self-training methods and 4 boosting methods, and then validate BoostSTIG by comparing it with some state-of-the-art techniques on real datasets. Intensive experiments show that BoostSTIG can improve the performance of the tested self-training methods and train an effective k-nearest-neighbor classifier.
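A minimal sketch of the boosting-inside-self-training combination as we read it (AdaBoost chosen arbitrarily; the instance-generation step with natural neighbors is omitted):

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    def boosted_self_training(X_lab, y_lab, X_unlab, rounds=3, thresh=0.9):
        ens = AdaBoostClassifier(n_estimators=100)
        for _ in range(rounds):
            ens.fit(X_lab, y_lab)                    # boosted ensemble does the labeling
            if len(X_unlab) == 0:
                break
            proba = ens.predict_proba(X_unlab)
            take = proba.max(axis=1) >= thresh
            if not take.any():
                break
            X_lab = np.vstack([X_lab, X_unlab[take]])
            y_lab = np.concatenate([y_lab, ens.classes_[proba[take].argmax(axis=1)]])
            X_unlab = X_unlab[~take]
        return ens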
- Published
- 2020
- Full Text
- View/download PDF
33. Self-training and learning the waveform features of microseismic data using an adaptive dictionary
- Author
-
Quan Zhang, Hang Wang, Jinwei Fang, Guoyin Zhang, and Yangkang Chen
- Subjects
Microseism, Computer science, Process (computing), Geophysics, Hydraulic fracturing, Geochemistry and Petrology, Waveform, Unsupervised learning, Data mining, Dictionary learning, Self training
- Abstract
Microseismic monitoring is an indispensable technique for characterizing the physical processes caused by extraction or injection of fluids during hydraulic fracturing. Microseismic data, however, are often contaminated with strong random noise and have a low signal-to-noise ratio (S/N). The low S/N in most microseismic data severely affects the accuracy and reliability of source localization and source-mechanism inversion results. We have developed a new denoising framework to enhance the quality of microseismic data. We use the method of adaptive sparse dictionaries to learn the waveform features of the microseismic data by iteratively updating the dictionary atoms and sparse coefficients in an unsupervised way. Unlike most existing dictionary learning applications in the seismic community, we learn the features from 1D microseismic data, and thereby learn 1D features of the waveforms. We develop a sparse dictionary learning framework, prepare the training patches, and implement the algorithm to obtain favorable denoising performance. We use extensive numerical examples and real microseismic data examples to demonstrate the validity of our method. Results show that the features of microseismic waveforms can be learned to distinguish signal patches from noise patches even from a single channel of microseismic data. However, more training data can make the learned features smoother and better at representing useful signal components.
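A hedged sketch of the 1D patch-based dictionary-learning workflow using scikit-learn's off-the-shelf learner (the paper's adaptive algorithm and parameter choices differ; patch size, stride and sparsity here are illustrative):

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning

    def denoise_trace(trace, patch=64, stride=8, n_atoms=32, alpha=1.0):
        starts = np.arange(0, len(trace) - patch + 1, stride)
        P = np.stack([trace[s:s + patch] for s in starts])   # 1D training patches
        dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=alpha,
                                           transform_algorithm="omp",
                                           transform_n_nonzero_coefs=4)
        code = dico.fit(P).transform(P)                      # sparse coefficients
        recon = code @ dico.components_                      # denoised patches
        out = np.zeros(len(trace)); weight = np.zeros(len(trace))
        for k, s in enumerate(starts):                       # overlap-add averaging
            out[s:s + patch] += recon[k]
            weight[s:s + patch] += 1
        return out / np.maximum(weight, 1)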
- Published
- 2020
- Full Text
- View/download PDF
34. Divide-and-conquer ensemble self-training method based on probability difference
- Author
-
Tingting Li and Jia Lu
- Subjects
Divide and conquer algorithms, Structure (mathematical logic), General Computer Science, Generalization, Computer science, Process (computing), Computational intelligence, Pattern recognition, ComputingMethodologies_PATTERNRECOGNITION, Noise (video), Artificial intelligence, Classifier (UML), Self training
- Abstract
The self-training method can train an effective classifier by exploiting labeled and unlabeled instances. In the self-training process, high-confidence instances are usually selected iteratively and added to the training set for learning. Unfortunately, the structure information of high-confidence instances is so similar that it leads to local over-fitting during the iterations. To avoid this over-fitting phenomenon and improve the classification performance of self-training methods, a novel divide-and-conquer ensemble self-training framework based on probability difference is proposed. First, the probability difference of each instance is calculated from the category probabilities of the individual classifiers, and the low-fuzzy and high-fuzzy instances of each classifier are separated by this probability difference. Then, a divide-and-conquer strategy is adopted: the low-fuzzy instances agreed on by all classifiers are labeled directly, and the high-fuzzy instances are labeled manually. Finally, the labeled instances are added to the training set for iterative self-training. This method expands the training set by selecting low-fuzzy instances with accurate structure information and high-fuzzy instances with more comprehensive structure information, and it effectively improves the generalization performance of the method. The method is well suited to noisy datasets and can obtain structure information even from a few labeled instances. The effectiveness of the proposed method is verified by comparative experiments on University of California Irvine (UCI) datasets.
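The probability-difference split is simple to illustrate (our reading: instances whose top two ensemble-averaged class probabilities are close count as high-fuzzy):

    import numpy as np

    def split_by_fuzziness(probas, gap=0.3):
        """probas: list of (N, C) arrays, one per ensemble classifier."""
        avg = np.mean(probas, axis=0)
        top2 = np.sort(avg, axis=1)[:, -2:]
        diff = top2[:, 1] - top2[:, 0]      # probability difference
        low_fuzzy = diff >= gap             # confident: label automatically
        return low_fuzzy, ~low_fuzzy, avg.argmax(axis=1)   # high-fuzzy -> manual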
- Published
- 2020
- Full Text
- View/download PDF
35. STDS: self-training data streams for mining limited labeled data in non-stationary environment
- Author
-
Jafar Tanha, Arash Sharifi, Shirin Khezri, and Ali Ahmadi
- Subjects
Concept drift, Data stream mining, Computer science, Machine learning, ComputingMethodologies_PATTERNRECOGNITION, Data point, Artificial Intelligence, Labeled data, Artificial intelligence, Cluster analysis, Self training, Classifier (UML)
- Abstract
In this article, we focus on the classification problem of semi-supervised learning in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. There are several approaches to semi-supervised learning in stationary environments that are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and employs an approach to handle concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric for identifying a set of high-confidence predictions and a proper underlying base learner. We therefore propose an ensemble approach to find a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is trained on the set of labeled data in the current chunk; otherwise, a percentage of high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data in the next chunk for updating the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms supervised methods and most other semi-supervised learning methods.
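A small sketch of chunk-to-chunk drift detection with KL divergence (per-feature histogram binning and the threshold are our assumptions):

    import numpy as np
    from scipy.stats import entropy

    def kl_drift(chunk_a, chunk_b, bins=20, threshold=0.5):
        """chunk_a, chunk_b: (N, d) arrays of consecutive stream chunks."""
        kls = []
        for j in range(chunk_a.shape[1]):
            lo = min(chunk_a[:, j].min(), chunk_b[:, j].min())
            hi = max(chunk_a[:, j].max(), chunk_b[:, j].max()) + 1e-9
            p, _ = np.histogram(chunk_a[:, j], bins=bins, range=(lo, hi))
            q, _ = np.histogram(chunk_b[:, j], bins=bins, range=(lo, hi))
            kls.append(entropy(p + 1, q + 1))   # +1 smoothing; entropy(p, q) = KL(p||q)
        return float(np.mean(kls)) > threshold  # True -> retrain instead of update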
- Published
- 2020
- Full Text
- View/download PDF
36. METHODOLOGY OF ORGANISING SELF-TRAINING OF PROSPECTIVE SINGERS FOR STAGING POPULAR PERFORMANCES
- Author
-
D. Lievit
- Subjects
Medical education, Psychology, Self training
- Published
- 2020
- Full Text
- View/download PDF
37. Improved well-log classification using semisupervised label propagation and self-training, with comparisons to popular supervised algorithms
- Author
-
Alison Malcolm, Michael W. Dunham, and J. Kim Welford
- Subjects
Computer science, Machine learning, ComputingMethodologies_PATTERNRECOGNITION, Geophysics, Geochemistry and Petrology, Artificial intelligence, Self training, Label propagation
- Abstract
Machine-learning techniques allow geoscientists to extract meaningful information from data in an automated fashion, and they are also an efficient alternative to traditional manual interpretation methods. Many geophysical problems have an abundance of unlabeled data and a paucity of labeled data, and the lithology classification of wireline data reflects this situation. Training supervised algorithms on small labeled data sets can lead to overtraining, and subsequent predictions for the numerous unlabeled data may be unstable. However, semisupervised algorithms are designed for classification problems with limited amounts of labeled data, and they are theoretically able to achieve better accuracies than supervised algorithms in these situations. We explore this hypothesis by applying two semisupervised techniques, label propagation (LP) and self-training, to a well-log data set and compare their performance to three popular supervised algorithms. LP is an established method, but our self-training method is a unique adaptation of existing implementations. The well-log data were made public through an SEG competition held in 2016. We simulate a semisupervised scenario with these data by assuming that only one of the 10 wells has labels (i.e., core samples), and our objective is to predict the labels for the remaining nine wells. We generate results from these data in two stages. The first stage is applying all the algorithms in question to the data as is (i.e., the global data), and the results from this motivate the second stage, which is applying all algorithms to the data when they are decomposed into two separate data sets. Overall, our findings suggest that LP does not outperform the supervised methods, but our self-training method coupled with LP can outperform the supervised methods by a notable margin if the assumptions of LP are met.
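Both techniques named in the title exist as off-the-shelf scikit-learn estimators, so a minimal version of the workflow can be sketched (feature engineering, the authors' custom self-training adaptation, and tuning are omitted; -1 marks uncored depths, per sklearn's convention):

    import numpy as np
    from sklearn.semi_supervised import LabelPropagation, SelfTrainingClassifier
    from sklearn.ensemble import RandomForestClassifier

    def semisupervised_facies(X, y):
        """X: wireline-log features; y: facies codes with -1 where unlabeled."""
        lp = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y)
        st = SelfTrainingClassifier(RandomForestClassifier(n_estimators=200),
                                    threshold=0.9).fit(X, y)
        return lp.transduction_, st.predict(X)   # labels inferred for every sample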
- Published
- 2020
- Full Text
- View/download PDF
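For readers who want to reproduce the general setup, both semisupervised techniques compared above are available off the shelf. The sketch below is a rough stand-in rather than the paper's implementation: it applies scikit-learn's LabelPropagation to a partially labeled feature matrix in which unlabeled samples are encoded as -1, then keeps only confident propagated labels as a self-training-style filter. The synthetic features, the kNN kernel settings, and the 0.9 cutoff are all assumptions.

import numpy as np
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # placeholder wireline features (e.g. gamma ray, resistivity)
y = rng.integers(0, 3, size=200)       # placeholder lithology classes
y_partial = y.copy()
y_partial[20:] = -1                    # keep labels for one "well" only; -1 marks unlabeled

lp = LabelPropagation(kernel="knn", n_neighbors=7)
lp.fit(X, y_partial)

# Self-training-style filter: accept only confident propagated labels.
proba = lp.predict_proba(X[20:])
confident = proba.max(axis=1) >= 0.9
print(f"{confident.sum()} of {len(proba)} unlabeled samples pseudo-labeled")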
38. METHODS OF DEVELOPMENT AND USE OF MENTAL MAPS DURING SELF-TRAINING OF HIGHER EDUCATION PERSONS ENGAGED IN MARTIAL ARTS
- Author
-
O.O. Nesterenko, O.A. Samoilenko, S.V. Levchenko, V.P. Skliarenko, and S.I. Karpenko
- Subjects
Medical education ,Fuel Technology ,Martial arts ,Higher education ,business.industry ,Process Chemistry and Technology ,Mental mapping ,Economic Geology ,General Medicine ,Psychology ,business ,Self training - Published
- 2020
- Full Text
- View/download PDF
39. Chronological Self-Training for Real-Time Speaker Diarization
- Author
-
Dirk Padfield and Daniel J. Liebling
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Sound (cs.SD) ,Computer Science - Computation and Language ,Computer science ,Speech recognition ,Computer Science - Sound ,Machine Learning (cs.LG) ,Speaker diarisation ,Audio and Speech Processing (eess.AS) ,FOS: Electrical engineering, electronic engineering, information engineering ,Self training ,Computation and Language (cs.CL) ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Diarization partitions an audio stream into segments based on the voices of the speakers. Real-time diarization systems that include an enrollment step should limit enrollment training samples to reduce user interaction time. Although training on a small number of samples yields poor performance, we show that the accuracy can be improved dramatically using a chronological self-training approach. We studied the tradeoff between training time and classification performance and found that 1 second is sufficient to reach over 95% accuracy. We evaluated on 700 audio conversation files of about 10 minutes each from 6 different languages and demonstrated average diarization error rates as low as 10%.
Comment: 5 pages, 5 figures, ICASSP 2021
- Published
- 2022
- Full Text
- View/download PDF
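The core idea of chronological self-training, processing segments in time order and folding confident predictions back into the speaker models, can be sketched in a few lines. This is an illustrative reconstruction under our own assumptions (cosine similarity to per-speaker centroids and a fixed 0.7 threshold), not the authors' system.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def chronological_diarize(segments, enrollment, threshold=0.7):
    """segments: list of embeddings in time order; enrollment: {speaker: [embeddings]}."""
    labels = []
    for emb in segments:
        # Score each speaker by similarity to the centroid of its current examples.
        scores = {spk: cosine(emb, np.mean(examples, axis=0))
                  for spk, examples in enrollment.items()}
        best = max(scores, key=scores.get)
        labels.append(best)
        if scores[best] >= threshold:
            enrollment[best].append(emb)  # self-training: the model grows over time
    return labels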
40. THEORETICAL TRAINING OF CADETS OF HIGHER EDUCATION INSTITUTIONS OF THE MINISTRY OF INTERNAL AFFAIRS IN SPORTS AND PEDAGOGICAL DISCIPLINES
- Author
-
O. Arkhipova, E. Krestnikova, and D. Gladkikh
- Subjects
Officer ,Medical education ,Higher education ,business.industry ,Political science ,Foundation (evidence) ,Christian ministry ,business ,Training (civil) ,Self training ,Professional activity - Abstract
The purpose of the study was to assess the theoretical foundation of knowledge in sports and pedagogical disciplines among cadets of the Ministry of Internal Affairs universities, since this foundation determines their success in further professional activities. Theory not only motivates, programs, and regulates, but also controls the practical activities of the future police officer. The successful acquisition of certain knowledge and skills serves as a criterion for their entry into the general cultural repertoire of a specialist, expanding the opportunities for the development of their professional activity.
- Published
- 2021
- Full Text
- View/download PDF
41. Uncertainty-Aware Self-Training for Semi-Supervised Event Temporal Relation Extraction
- Author
-
Wei Bi, Jun Zhao, Yubo Chen, Xinyu Zuo, Pengfei Cao, and Kang Liu
- Subjects
Sample selection ,Event (computing) ,business.industry ,Computer science ,Process (engineering) ,Natural language understanding ,computer.software_genre ,Machine learning ,Relationship extraction ,Task (project management) ,Artificial intelligence ,business ,Self training ,Data Annotation ,computer - Abstract
Extracting event temporal relations is an important task for natural language understanding. Many works have been proposed for supervised event temporal relation extraction, which typically requires a large amount of human-annotated data for model training. However, data annotation for this task is very time-consuming and challenging. To this end, we study the problem of semi-supervised event temporal relation extraction. Self-training, a widely used semi-supervised learning method, can be applied to this problem; however, it suffers from noisy pseudo-labeling. In this paper, we propose an uncertainty-aware self-training framework (UAST) that quantifies model uncertainty to cope with pseudo-labeling errors. Specifically, UAST utilizes (1) an Uncertainty Estimation module to compute the model uncertainty for pseudo-labeling unlabeled data; (2) a Sample Selection with Exploration module to select informative samples based on uncertainty estimates; and (3) an Uncertainty-Aware Learning module to explicitly incorporate the model uncertainty into the self-training process. Experimental results indicate that our approach significantly outperforms previous state-of-the-art methods.
- Published
- 2021
- Full Text
- View/download PDF
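The abstract does not specify the uncertainty estimator, so the sketch below uses Monte Carlo dropout, one common choice, to score unlabeled samples by predictive entropy before pseudo-labeling. Treat it as a stand-in for the kind of computation an Uncertainty Estimation module performs, not as the paper's method; the pass count and selection rule are assumptions.

import torch

def mc_dropout_uncertainty(model, x, n_passes=10):
    """Predictive entropy over stochastic forward passes (dropout left active)."""
    model.train()  # keep dropout on, which is what makes the passes stochastic
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean = probs.mean(dim=0)
    entropy = -(mean * mean.clamp_min(1e-10).log()).sum(dim=-1)
    return mean, entropy

# Pseudo-label only the least-uncertain unlabeled samples, e.g.:
# mean, ent = mc_dropout_uncertainty(model, unlabeled_batch)
# keep = ent <= ent.kthvalue(k).values  # the k most confident samples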
42. Active learning algorithms for multitopic classification
- Author
-
Universitat Politècnica de Catalunya. Departament d'Enginyeria Telemàtica, Moreno Bilbao, M. Asunción, Ruiz Costa-Jussà, Marta, Bonafonte Pardàs, Guillem, Universitat Politècnica de Catalunya. Departament d'Enginyeria Telemàtica, Moreno Bilbao, M. Asunción, Ruiz Costa-Jussà, Marta, and Bonafonte Pardàs, Guillem
- Abstract
In this master thesis we develop a model that surpasses previous studies in detecting cyberbullying and other disorders that commonly affect teenagers. We analyze short sentences from social media using techniques that have not been studied in depth in language processing, in order to detect these problems. Deep learning is nowadays the common approach to text analysis. However, limited dataset size is one of the most common problems: it is not practical to dedicate thousands of hours of human labelling every time we want to create a new model. Different techniques have been used over the years to solve, or at least mitigate, this problem, for instance transfer learning or self-learning; one of the best-known remedies is data augmentation. In this thesis we make use of active learning and self-training to address the limited availability of labelled data, using unlabeled data to improve the performance of our models. The architecture of the model is composed of a BERT model plus a linear layer that projects the BERT sentence embedding onto the number of classes we want to detect. We take advantage of this already functional model to label new data that we then use to train our final model. Using noise techniques we modify the data so that the final model has to predict less structured data and learn from difficult scenarios. Thanks to this technique we were able to improve the results for some of the classes: for instance, the modified F-score increases by 7% for substance abuse (drugs, alcohol, etc.) and by 3% for disorders (anxiety, depression, and distress), while keeping the performance of the other classes.
- Published
- 2021
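The classifier architecture described above (a BERT encoder plus a linear layer projecting the sentence embedding onto the target classes) is straightforward to sketch with the Hugging Face transformers library; the model name, class count, and [CLS]-token pooling below are placeholder assumptions, not details taken from the thesis.

import torch
from transformers import AutoModel, AutoTokenizer

class BertClassifier(torch.nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.head = torch.nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token as the sentence embedding
        return self.head(cls)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["example tweet"], return_tensors="pt", padding=True)
logits = BertClassifier()(batch["input_ids"], batch["attention_mask"])

In a self-training loop, the trained model's most confident predictions on unlabeled sentences would be added to the training set for the next round.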
43. Dual-Consistency Self-Training For Unsupervised Domain Adaptation
- Author
-
Jie Wang, Yasuto Yokota, Chaoliang Zhong, Masaru Ide, Cheng Feng, and Jun Sun
- Subjects
Dual consistency ,Domain adaptation ,Computer science ,business.industry ,Artificial intelligence ,Machine learning ,computer.software_genre ,business ,Self training ,computer - Published
- 2021
- Full Text
- View/download PDF
44. An Improved Self-Training Method for Positive Unlabeled Time Series Classification Using DTW Barycenter Averaging
- Author
-
Yabo Dong, Duanqing Xu, Tongbin Zuo, Jing Li, and Haowen Zhang
- Subjects
Time series classification ,Dynamic time warping ,Computer science ,Boundary (topology) ,TP1-1185 ,Biochemistry ,Article ,Analytical Chemistry ,Domain (software engineering) ,Set (abstract data type) ,self-training ,Cluster Analysis ,Humans ,Electrical and Electronic Engineering ,Instrumentation ,Sequence ,business.industry ,Chemical technology ,positive unlabeled time series classification ,Pattern recognition ,Atomic and Molecular Physics, and Optics ,ComputingMethodologies_PATTERNRECOGNITION ,dynamic time warping ,Labeled data ,Artificial intelligence ,business ,Self training ,DTW barycenter averaging - Abstract
Traditional supervised time series classification (TSC) tasks assume that all training data are labeled. However, in practice, manually labelling all unlabeled data can be very time-consuming and often requires the participation of skilled domain experts. In this paper, we are concerned with the positive unlabeled time series classification problem (PUTSC), which refers to automatically labelling a large unlabeled set U based on a small positive labeled set PL. Self-training (ST) is the most widely used method for solving the PUTSC problem and has attracted increased attention due to its simplicity and effectiveness. Existing ST methods simply employ the one-nearest-neighbor (1NN) rule to determine which unlabeled time series should be labeled. Nevertheless, we note that the 1NN rule might not be optimal for PUTSC tasks because it may be sensitive to initial labeled data located near the boundary between the positive and negative classes. To overcome this issue, we propose an exploratory methodology called ST-average. Unlike conventional ST-based approaches, ST-average labels the data using the average sequence calculated by the DTW barycenter averaging technique. Compared with any individual in the PL set, the average sequence is more representative. Our proposal is insensitive to the initial labeled data and is more reliable than existing ST-based methods. Moreover, we demonstrate that ST-average can naturally be combined with many existing techniques used in the original ST. Experimental results on public datasets show that ST-average performs better than related popular methods.
- Published
- 2021
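The heart of ST-average, labelling by distance to a DTW barycenter of the positive set instead of to any single labeled neighbor, can be sketched with the tslearn library. This is an illustration of the idea, not the authors' code; the greedy one-candidate-per-round loop in the trailing comment is an assumption.

import numpy as np
from tslearn.barycenters import dtw_barycenter_averaging
from tslearn.metrics import dtw

def st_average_step(positive_set, unlabeled_set):
    """Return the index of the unlabeled series closest to the PL barycenter."""
    barycenter = dtw_barycenter_averaging(np.asarray(positive_set))
    distances = [dtw(barycenter, series) for series in unlabeled_set]
    return int(np.argmin(distances))

# Each self-training round moves the best candidate from U into PL:
# i = st_average_step(PL, U); PL.append(U.pop(i))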
45. Personalized Simulated HUT as an At-home Prediction Model for Heart Rate Changes in Syncope Patients and At-home, Orthostatic Self-training Efficacy
- Author
-
Helmut Ahammer, Herbert F. Jelinek, Dahlia Hassan, Dominik Wehler, and Robert Krones
- Subjects
medicine.medical_specialty ,Orthostatic vital signs ,biology ,business.industry ,Internal medicine ,Heart rate ,Syncope (genus) ,Cardiology ,medicine ,biology.organism_classification ,business ,Self training - Abstract
Head-up tilt (HUT) testing supports the diagnosis of syncope by detecting abnormalities in heart rate and blood pressure changes. Home-based self-training can benefit neurocardiogenic patients if, during clinical HUT, heart rate decreases in the early stage of the upright position. However, HUT testing is not always possible in the hospital, as it is inconvenient and sometimes even risky for patients with cardiac abnormalities: it may trigger a loss of consciousness and arrhythmia. To address this, the current paper introduces a personalized HUT simulation to determine the efficacy of at-home training. To develop the model, Holter ECG recordings were obtained from 28 syncope patients and the simulated output was compared to clinical findings. The model aims to predict the heart rate changes associated with the simulated HUT that indicate the efficacy of an at-home program. Heart rate enters the model as a velocity variable, measured in liters per second against gravity. The results show that a decrease in heart rate early in the simulated HUT, as determined by the model, identifies with greater than 84% efficiency the syncope patients who will benefit from at-home training, and allows physicians to recommend home training during an online or telemedicine consultation.
Keywords — head-up tilt test, syncope, blood flow, heart rate prediction
Clinical Relevance — The cardiovascular model predicts the patient-specific efficacy of at-home tilt-training for patients diagnosed with syncope.
- Published
- 2021
- Full Text
- View/download PDF
46. Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers
- Author
-
Dong-Hoon Lee, Namgyu Kim, and William Xiu Shun Wong
- Subjects
Information Systems and Management ,Sociology and Political Science ,Computer science ,business.industry ,Artificial intelligence ,business ,Machine learning ,computer.software_genre ,Self training ,computer - Published
- 2019
- Full Text
- View/download PDF
47. Interpolative self-training approach for link prediction
- Author
-
Somayyeh Aghababaei and Masoud Makrehchi
- Subjects
Artificial Intelligence ,business.industry ,Computer science ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Machine learning ,computer.software_genre ,Link (knot theory) ,computer ,Self training ,Theoretical Computer Science - Published
- 2019
- Full Text
- View/download PDF
48. Deep Contextualized Self-training for Low Resource Dependency Parsing
- Author
-
Roi Reichart and Guy Rotman
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Linguistics and Language ,Computer Science - Computation and Language ,Low resource ,business.industry ,Computer science ,Communication ,lcsh:P98-98.5 ,computer.software_genre ,Machine Learning (cs.LG) ,Computer Science Applications ,Human-Computer Interaction ,Artificial Intelligence ,Dependency grammar ,Labeled data ,Artificial intelligence ,lcsh:Computational linguistics. Natural language processing ,business ,Computation and Language (cs.CL) ,computer ,Self training ,Natural language processing - Abstract
Neural dependency parsing has proven very effective, achieving state-of-the-art results on numerous domains and languages. Unfortunately, it requires large amounts of labeled data, which is costly and laborious to create. In this paper we propose a self-training algorithm that alleviates this annotation bottleneck by training a parser on its own output. Our Deep Contextualized Self-training (DCST) algorithm utilizes representation models trained on sequence labeling tasks that are derived from the parser's output when applied to unlabeled data, and integrates these models with the base parser through a gating mechanism. We conduct experiments across multiple languages, both in low resource in-domain and in cross-domain setups, and demonstrate that DCST substantially outperforms traditional self-training as well as recent semi-supervised training methods.
Comment: Accepted to TACL in September 2019
- Published
- 2019
- Full Text
- View/download PDF
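The gating mechanism mentioned in the abstract, which mixes the base parser's word representations with those of the auxiliary sequence-labeling models, might look like the following sketch. The dimensions are placeholders and this is our own minimal reconstruction, not the released DCST code.

import torch

class GatedCombination(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = torch.nn.Linear(2 * dim, dim)

    def forward(self, base_repr, aux_repr):
        # g in (0, 1) decides, per dimension, how much of each encoder to keep.
        g = torch.sigmoid(self.gate(torch.cat([base_repr, aux_repr], dim=-1)))
        return g * base_repr + (1 - g) * aux_repr

mix = GatedCombination(dim=256)
combined = mix(torch.randn(8, 10, 256), torch.randn(8, 10, 256))  # (batch, tokens, dim)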
49. Logistics optimisation of slab pre-marshalling problem in steel industry
- Author
-
Lixin Tang, Ying Meng, Jiyin Liu, Peixin Ge, and Ren Zhao
- Subjects
0209 industrial biotechnology ,021103 operations research ,Computer science ,business.industry ,Strategy and Management ,0211 other engineering and technologies ,02 engineering and technology ,Structural engineering ,Management Science and Operations Research ,Hybrid algorithm ,Industrial and Manufacturing Engineering ,Marshalling ,020901 industrial engineering & automation ,Stack (abstract data type) ,Group (periodic table) ,Slab ,business ,Self training - Abstract
We study the slab pre-marshalling problem of re-positioning slabs so that they are stored in the fewest possible stacks and each stack contains only slabs of the same group, which can b...
- Published
- 2019
- Full Text
- View/download PDF
50. Materials for Self-Training of Foreign Students in the Course 'Fundamentals of Linguistics'
- Author
-
Oksana Voloshina
- Subjects
Mathematics education ,Psychology ,Self training ,Course (navigation) - Published
- 2019
- Full Text
- View/download PDF