200 results on '"Unlabelled data"'
Search Results
2. Exploiting unlabelled data for relation extraction
- Author
-
Tran, Thy, Ananiadou, Sophia, and Batista-Navarro, Riza Theresa
- Subjects
006.3 ,Unlabelled Data ,Relation Extraction - Abstract
Information extraction transforms unstructured text into structured data by annotating semantic information on raw data. A crucial step in information extraction is relation extraction, which identifies semantic relationships between named entities in text. The resulting relations can be used to construct and populate knowledge bases as well as in various applications such as information retrieval and question answering. Relation extraction has been widely studied using fully supervised learning and distantly supervised approaches; these approaches require either manually- or automatically-annotated data. In contrast, the massive amount of freely available unlabelled text is underused. We hence focus on leveraging the unlabelled data to improve and extend relation extraction. We approach the use of unlabelled text from three directions: (i) use it for pre-training word representations, (ii) conduct unsupervised learning, and (iii) perform weak supervision. Regarding the first direction, we want to leverage syntactic information for relation extraction. Instead of directly tuning such information on a relation extraction corpus, we propose a novel graph neural model for learning syntactically-informed word representations. The proposed method allows us to enrich pretrained word representations with syntactic information rather than re-training language models from scratch as in previous work. Throughout this work, we can confirm that our novel representations are beneficial for relations in two different domains. In the second direction, we study unsupervised relation extraction, which is a promising approach because it does not require manually- or automatically-labelled data. We hypothesise that inductive biases are extremely important to direct unsupervised relation extraction. We hence employ two simple methods using only entity types to infer relations. Despite their simplicity, our methods can outperform existing approaches on two popular datasets.
These surprising results suggest that entity types provide a strong inductive bias for unsupervised relation extraction. The last direction is inspired by recent evidence that large-scale pretrained language models capture some sort of relational facts. We want to investigate whether these pretrained language models can serve as weak annotators. To this end, we evaluate three large pretrained language models by matching sentences against relations' exemplars. The matching scores decide how likely a given sentence expresses a relation. The top relations are further used as weak annotations to train a relation classifier. We observe that pretrained language models are confused by highly similar relations, thus, we propose a method that models the labelling confusion to correct relation prediction. We validate the proposed method on two datasets with different characteristics, showing that it can effectively model labelling noise from our weak annotator. Overall, we illustrate that exploring the use of unlabelled data is an important step towards improving relation extraction. The use of unlabelled data is a promising path for relation extraction and should receive more attention from researchers.
- Published
- 2021
3. Predicting Financial Literacy via Semi-supervised Learning
- Author
-
Rudd, David Hason, Huo, Huan, Xu, Guandong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Long, Guodong, editor, Yu, Xinghuo, editor, and Wang, Sen, editor
- Published
- 2022
- Full Text
- View/download PDF
4. AT-ST: Self-training Adaptation Strategy for OCR in Domains with Limited Transcriptions
- Author
-
Kišš, Martin, Beneš, Karel, Hradiš, Michal, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lladós, Josep, editor, Lopresti, Daniel, editor, and Uchida, Seiichi, editor
- Published
- 2021
- Full Text
- View/download PDF
5. Soft Voting Windowing Ensembles for Learning from Partially Labelled Streams
- Author
-
Floyd, Sean L. A., Viktor, Herna L., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Ceci, Michelangelo, editor, Loglisci, Corrado, editor, Manco, Giuseppe, editor, Masciari, Elio, editor, and Ras, Zbigniew, editor
- Published
- 2020
- Full Text
- View/download PDF
6. RESEARCH ON THE GEOLOGICAL ENTITIES BUSINESS RELATION EXTRACTION BASED ON THE BOOTSTRAPPING METHOD.
- Author
-
Lv Pengfei, Yao Zheng, Wang Chunning, Zhu Yueqin, and Liu Wei
- Subjects
GEOLOGICAL research ,MACHINE learning ,KNOWLEDGE base - Abstract
Copyright of Transformations in Business & Economics is the property of Vilnius University and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2022
7. Integrating contextual information into multi-class classification to improve the context-aware recommendation.
- Author
-
STITINI, Oumaima, KALOUN, Soulaimane, and BENCHAREF, Omar
- Subjects
RECOMMENDER systems ,INFORMATION retrieval ,UBIQUITOUS computing ,CLASSIFICATION ,INFORMATION processing ,DATA mining - Abstract
Researchers and practitioners in various fields, including e-commerce customization, information retrieval, ubiquitous and mobile computing, data mining, marketing, and management, have realized the value of contextual information. Context-aware recommender systems assist users in finding their chosen material in a reasonable amount of time by utilizing information that describes the scenario in which the items will be consumed. For better personalized user recommendation, recommender systems leverage contextual information in their recommendation process, which is called context-aware recommendation. Classification is used for context prediction, that is, the prediction of future context based on recorded previous context. The context prediction algorithm's goal is to recognize typical behavior patterns that have been seen in the past and then offer the most likely continuation of a presently observed collection of context components based on this knowledge. In this article we study the correlation between multi-class classification and context-aware recommendation. From this correlation we conclude that the linkage between contextual information and classification enhances and improves the recommendation results. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. HLMCC: A Hybrid Learning Anomaly Detection Model for Unlabeled Data in Internet of Things
- Author
-
Nusaybah Alghanmi, Reem Alotaibi, and Seyed M Buhari
- Subjects
Anomaly detection ,Internet of Things ,machine learning ,unlabelled data ,sensor data ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The Internet of Things (IoT) is a network of distributed devices or sensors connected through the internet to allow gathering and sharing of data. The data generated by these devices is affected by anomalies or abnormal behaviour due to attack issues, or breakdown in devices, as examples. However, most current anomaly detection systems rely on labelled data, while the class labels for IoT data are usually unavailable. Furthermore, the manual labelling task is expensive and time-consuming to perform due to the need for domain experts. More importantly, the volume of data in the IoT is growing rapidly, creating a need to predict the classification labels for future data. This study proposes a Hybrid Learning Model which uses both Clustering and Classification methods (HLMCC) to automate the labelling process and detect anomalies in IoT data. The model consists of two practical phases, automatic labelling and detecting anomalies. First, the HLMCC groups the data into normal and anomaly clusters by adopting Hierarchical Affinity Propagation (HAP) clustering. Second, the labelled data obtained from the clustering phase is used to train the Decision Trees (DTs) and to classify future unseen data. The results show that the HLMCC is able to automate the labelling of data, which is beneficial to minimize human involvement. Moreover, HLMCC outperforms the DTs on the originally labelled datasets and the state-of-the-art model over a wide range of evaluation metrics based on the average ranks. HLMCC produces the highest average ranks against other models in terms of False Positive Rate (FPR), recall, precision and the Area Under the Precision-Recall curve (AUCPR) with 1.8, 1.6, 1.8 and 1.8, respectively.
- Published
- 2019
- Full Text
- View/download PDF
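The two-phase idea in the HLMCC abstract above (cluster unlabelled readings to obtain labels, then train a classifier on those labels) can be sketched as follows. This is a minimal illustration only, not the authors' method: a simple one-dimensional two-means split stands in for Hierarchical Affinity Propagation, a depth-1 decision stump stands in for the decision trees, and the rule that the smaller cluster is the anomalous one is an assumption. All function names and the sample readings are hypothetical.

```python
from statistics import mean


def two_means_1d(values, iterations=20):
    """Split 1-D readings into two clusters (0 = low, 1 = high).

    Stand-in for the HAP clustering named in the abstract.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all readings identical: nothing to split
            break
        low, high = mean(low_group), mean(high_group)
    return [0 if abs(v - low) <= abs(v - high) else 1 for v in values]


def auto_label(values):
    """Phase 1: label readings, treating the smaller cluster as anomalous.

    The "smaller cluster = anomaly" rule is an assumption of this sketch.
    Returns 1 for anomaly and 0 for normal.
    """
    clusters = two_means_1d(values)
    anomaly_cluster = 1 if clusters.count(1) <= clusters.count(0) else 0
    return [1 if c == anomaly_cluster else 0 for c in clusters]


def fit_stump(values, labels):
    """Phase 2: fit a depth-1 decision tree on the automatically made labels.

    Tries every midpoint between consecutive distinct values as a threshold
    and keeps the threshold and orientation with the fewest training errors.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values to fit a stump")
    best = None
    for left, right in zip(distinct, distinct[1:]):
        threshold = (left + right) / 2
        for above in (0, 1):  # label predicted when value > threshold
            errors = sum(
                (above if v > threshold else 1 - above) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return lambda v: above if v > threshold else 1 - above


# Hypothetical sensor readings: mostly around 20, two outliers around 35.
readings = [20.1, 19.8, 20.4, 20.0, 19.9, 35.2, 20.2, 34.8]
labels = auto_label(readings)
classify = fit_stump(readings, labels)
print(labels, classify(21.0), classify(40.0))
```

The point of the design is that the classifier never sees human labels: it is trained only on what the clustering phase produced, so any clustering mistakes are inherited by the second phase.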
9. Combining Dimensionality Reduction with Random Forests for Multi-label Classification Under Interactivity Constraints
- Author
-
Nair-Benrekia, Noureddine-Yassine, Kuntz, Pascale, Meyer, Frank, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Kim, Jinho, editor, Shim, Kyuseok, editor, Cao, Longbing, editor, Lee, Jae-Gil, editor, Lin, Xuemin, editor, and Moon, Yang-Sae, editor
- Published
- 2017
- Full Text
- View/download PDF
10. Context Aware Data Fusion on Massive IOT Data in Dynamic IOT Analytics.
- Author
-
Saranya, S. S. and Fatima, N. Sabiyath
- Subjects
MULTISENSOR data fusion ,CONVOLUTIONAL neural networks ,SIGNAL convolution ,INTERNET of things ,NOISE control ,DATA scrubbing - Abstract
Educational data management is a critical task for researchers due to the mammoth volume of data generated by sensors and IoT (Internet of Things) devices. Managing this huge volume of data and cleaning it of impurities is an inherent need. DF (Data Fusion) processes combine data from multiple sources based on their similarity for easier management. DF processes focus on many factors, such as the nature of the data and the application that uses that data. Many DFAs (Data Fusion approaches) have been proposed without detailing the context for integrating data in fusion tasks. This work attempts to cover this gap of context's relevance by proposing a technique called CDFT (Context aware Data Fusion technique). In this research work, data from IoT devices is first gathered and pre-processed to prepare it for fusion processing. A boundary based noise reduction algorithm is introduced for data pre-processing, which attempts to label the unlabelled attributes in the gathered data so that data fusion can be done accurately. After pre-processing, context aware data fusion is performed, which combines the data from multiple IoT devices with regard to context. Finally, this combined data is learnt using a convolutional neural network to check data fusion performance. The proposed CDFT is simulated in Matlab, whose results prove that the proposed technique obtains optimal outcomes. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
11. Microarray Breast Cancer Data Clustering Using Map Reduce Based K-Means Algorithm.
- Author
-
Thottathy, Hymavathi, Pavan, Kanadam Karteeka, and Panchadula, Rajeev Priyatam
- Subjects
MICROARRAY technology ,K-means clustering ,CLUSTER analysis (Statistics) ,BREAST cancer ,GENE expression - Abstract
Breast cancer is one of the world's most advanced and most common cancers occurring in women. An early diagnosis of breast cancer offers treatment for it; therefore, several experiments are in development establishing approaches for the early detection of breast cancer. The great increase in research in the last decade in microarray data processing is a potent tool of diagnosing diseases. Based on genomic knowledge, micro-arrays have changed the way clinical pathology recognizes, identifies, and classifies the diseases of humans, particularly those of cancer. In this article, we examined microarray data for breast cancer with the k-means clustering algorithm, but it was hard to scale and process a large number of micro-array data alone. To this end, we use the MapReduce paradigm for evaluating microarray data on breast cancer. Moreover, the efficiency of the parallel k-means model is measured with the operating period, the scaling, and all runtime of the model. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
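The combination named in the title of entry 11, k-means expressed in MapReduce style, can be illustrated with a small single-process sketch. It only shows the shape of the computation (a map step that assigns each point to its nearest centroid and a reduce step that averages the points per centroid); it is not the authors' implementation, it does not use any real MapReduce framework, and the function names and toy points are hypothetical.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit one (centroid index, point) pair per input point."""
    for point in points:
        yield nearest(point, centroids), point


def reduce_phase(pairs, centroids):
    """Reduce step: group points by centroid index and average each group.

    A centroid that received no points keeps its previous position.
    """
    groups = defaultdict(list)
    for index, point in pairs:
        groups[index].append(point)
    updated = list(centroids)
    for index, members in groups.items():
        updated[index] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return updated


def kmeans(points, centroids, max_iterations=10):
    """Repeat map and reduce until the centroids stop moving."""
    for _ in range(max_iterations):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy 2-D points standing in for expression profiles (hypothetical data).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(final_centroids)
```

In a distributed setting the map step would run in parallel over partitions of the data and only the per-centroid groups would be shuffled to reducers; here both steps run in one process purely for illustration.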
12. Semisupervised inference for explained variance in high dimensional linear regression and its applications.
- Author
-
Tony Cai, T. and Guo, Zijian
- Subjects
SIGNAL detection ,MATHEMATICAL statistics ,VARIANCES ,FORECASTING ,STATISTICAL accuracy - Abstract
Summary: The paper considers statistical inference for the explained variance $\beta^{\top}\Sigma\beta$ under the high dimensional linear model $Y = X\beta + \varepsilon$ in the semisupervised setting, where $\beta$ is the regression vector and $\Sigma$ is the design covariance matrix. A calibrated estimator, which efficiently integrates both labelled and unlabelled data, is proposed. It is shown that the estimator achieves the minimax optimal rate of convergence in the general semisupervised framework. The optimality result characterizes how the unlabelled data contribute to the estimation accuracy. Moreover, the limiting distribution for the proposed estimator is established and the unlabelled data have also proved useful in reducing the length of the confidence interval for the explained variance. The method proposed is extended to semisupervised inference for the unweighted quadratic functional $\|\beta\|_2^2$. The inference results obtained are then applied to a range of high dimensional statistical problems, including signal detection and global testing, prediction accuracy evaluation and confidence ball construction. The numerical improvement of incorporating the unlabelled data is demonstrated through simulation studies and an analysis of estimating heritability for a yeast segregant data set with multiple traits. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
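For orientation, the explained variance in entry 12 can be related to the total variance of the response. The identity below is a standard decomposition, written under assumptions that the abstract does not state explicitly: each row $x$ of the design matrix has covariance $\Sigma$, and the noise $\varepsilon$ is independent of $x$ with variance $\sigma^{2}$ (the symbol $\sigma^{2}$ is introduced here for illustration).

```latex
\operatorname{Var}(Y)
  = \operatorname{Var}(x^{\top}\beta) + \operatorname{Var}(\varepsilon)
  = \beta^{\top}\Sigma\beta + \sigma^{2},
\qquad
\frac{\beta^{\top}\Sigma\beta}{\beta^{\top}\Sigma\beta + \sigma^{2}}
  = \text{proportion of the variance of } Y \text{ explained by } x .
```

Because $\Sigma$ depends only on the covariates, unlabelled observations of $x$ carry information about it, which gives some intuition for why unlabelled data can help here.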
13. Clustering
- Author
-
Bramer, Max, Mackie, Ian, Series editor, Abramsky, Samson, Advisory board, Breitman, Karin, Advisory board, Hankin, Chris, Advisory board, Kozen, Dexter C., Advisory board, Pitts, Andrew, Advisory board, Riis Nielson, Hanne, Advisory board, Skiena, Steven S, Advisory board, Stewart, Iain, Advisory board, and Bramer, Max
- Published
- 2016
- Full Text
- View/download PDF
14. Introduction to Data Mining
- Author
-
Bramer, Max, Mackie, Ian, Series editor, Abramsky, Samson, Advisory board, Breitman, Karin, Advisory board, Hankin, Chris, Advisory board, Kozen, Dexter C., Advisory board, Pitts, Andrew, Advisory board, Riis Nielson, Hanne, Advisory board, Skiena, Steven S, Advisory board, Stewart, Iain, Advisory board, and Bramer, Max
- Published
- 2016
- Full Text
- View/download PDF
15. Driven Learning for Driving: How Introspection Improves Semantic Mapping
- Author
-
Triebel, Rudolph, Grimmett, Hugo, Paul, Rohan, Posner, Ingmar, Siciliano, Bruno, Series editor, Khatib, Oussama, Series editor, Inaba, Masayuki, editor, and Corke, Peter, editor
- Published
- 2016
- Full Text
- View/download PDF
16. Selecting right questions with Restricted Boltzmann Machines
- Author
-
Zięba, Maciej, Tomczak, Jakub M., Brzostowski, Krzysztof, Kacprzyk, Janusz, Series editor, Selvaraj, Henry, editor, Zydek, Dawid, editor, and Chmaj, Grzegorz, editor
- Published
- 2015
- Full Text
- View/download PDF
17. Ising Bandits with Side Information
- Author
-
Ghosh, Shaona, Prügel-Bennett, Adam, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Appice, Annalisa, editor, Rodrigues, Pedro Pereira, editor, Santos Costa, Vítor, editor, Soares, Carlos, editor, Gama, João, editor, and Jorge, Alípio, editor
- Published
- 2015
- Full Text
- View/download PDF
18. Inferring Aspect-Specific Opinion Structure in Product Reviews Using Co-training
- Author
-
Carter, Dave, Inkpen, Diana, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Gelbukh, Alexander, editor
- Published
- 2015
- Full Text
- View/download PDF
19. Semi-supervised Learning Using an Unsupervised Atlas
- Author
-
Pitelis, Nikolaos, Russell, Chris, Agapito, Lourdes, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Calders, Toon, editor, Esposito, Floriana, editor, Hüllermeier, Eyke, editor, and Meo, Rosa, editor
- Published
- 2014
- Full Text
- View/download PDF
20. Text Quantification
- Author
-
Sebastiani, Fabrizio, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, de Rijke, Maarten, editor, Kenter, Tom, editor, de Vries, Arjen P., editor, Zhai, ChengXiang, editor, de Jong, Franciska, editor, Radinsky, Kira, editor, and Hofmann, Katja, editor
- Published
- 2014
- Full Text
- View/download PDF
21. The T Index: Measuring the Reliability of Accuracy Estimates Obtained from Non-Probability Samples
- Author
-
François Waldner
- Subjects
accuracy assessment ,validation ,classification ,spatial balance ,unlabelled data ,sample selection bias ,Science - Abstract
In remote sensing, the term accuracy typically expresses the degree of correctness of a map. Best practices in accuracy assessment have been widely researched and include guidelines on how to select validation data using probability sampling designs. In practice, however, probability samples may be lacking and, instead, cross-validation using non-probability samples is common. This practice is risky because the resulting accuracy estimates can easily be mistaken for map accuracy. The following question arises: to what extent are accuracy estimates obtained from non-probability samples representative of map accuracy? This letter introduces the T index to answer this question. Certain cross-validation designs (such as the common single-split or hold-out validation) provide representative accuracy estimates when hold-out sets are simple random samples of the map population. The T index essentially measures the probability of a hold-out set of unknown sampling design to be a simple random sample. To that aim, we compare its spread in the feature space against the spread of random unlabelled samples of the same size. Data spread is measured by a variant of Moran’s I autocorrelation index. Consistent interpretation of the T index is proposed through the prism of significance testing, with T values < 0.05 indicating unreliable accuracy estimates. Its relevance and interpretation guidelines are also illustrated in a case study on crop-type mapping. Uptake of the T index by the remote-sensing community will help inform about—and sometimes caution against—the representativeness of accuracy estimates obtained by cross-validation, so that users can better decide whether a map is fit for their purpose or how its accuracy impacts their application. Subsequently, the T index will build trust and improve the transparency of accuracy assessment in conditions which deviate from best practices.
- Published
- 2020
- Full Text
- View/download PDF
22. Applications in Intelligent Sound Analysis
- Author
-
Schuller, Björn and Schuller, Björn W.
- Published
- 2013
- Full Text
- View/download PDF
23. Fast and Adaptive Deep Fusion Learning for Detecting Visual Objects
- Author
-
Doulamis, Nikolaos, Doulamis, Anastasios, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Fusiello, Andrea, editor, Murino, Vittorio, editor, and Cucchiara, Rita, editor
- Published
- 2012
- Full Text
- View/download PDF
24. A Unifying Theory of Active Discovery and Learning
- Author
-
Hospedales, Timothy M., Gong, Shaogang, Xiang, Tao, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Fitzgibbon, Andrew, editor, Lazebnik, Svetlana, editor, Perona, Pietro, editor, Sato, Yoichi, editor, and Schmid, Cordelia, editor
- Published
- 2012
- Full Text
- View/download PDF
25. Multimodal CSI-Based Human Activity Recognition Using GANs
- Author
-
Wei Cui, Lihua Xie, Dazhuo Wang, Jianfei Yang, and Sumei Sun
- Subjects
Environmental dynamics ,Computer Networks and Communications ,Computer science ,Unlabelled data ,Privacy protection ,Stability (learning theory) ,Machine learning ,Computer Science Applications ,Activity recognition ,Hardware and Architecture ,Channel state information ,Signal Processing ,Artificial intelligence ,Wearable technology ,Computer Science::Information Theory ,Information Systems ,Generator (mathematics) - Abstract
Channel State Information (CSI) based human activity recognition has received great attention in recent years due to its advantages in privacy protection, insensitivity to illumination, and no requirement for wearable devices. In this paper, we propose a Multimodal Channel State Information Based Activity Recognition (MCBAR) system that leverages existing WiFi infrastructures and monitors human activities from CSI measurements. MCBAR aims to address the performance degradation of WiFi-based human recognition systems due to environmental dynamics. Specifically, we address the issue of non-uniformly distributed unlabelled data with rarely-performed activities by taking advantage of a generative adversarial network (GAN) and semi-supervised learning. We apply a multimodal generator to approximate the CSI data distribution in different environment settings with limited measured CSI data. The CSI data generated by the multimodal generator can provide better diversity for knowledge transfer. This multimodal generator improves the ability of MCBAR to recognize specific activities with various CSI patterns caused by environmental dynamics. Compared to state-of-the-art CSI-based recognition systems, MCBAR is more robust as it is able to handle the non-uniformly distributed CSI data collected from a new environment setting. In addition, diverse generated data from the multimodal generator improves the stability of the system. We have tested MCBAR under multiple experimental settings at different places. The experimental results demonstrate that our algorithm overcomes environmental dynamics and outperforms existing human activity recognition systems.
- Published
- 2021
- Full Text
- View/download PDF
26. On Improving Performance and Increasing Useability of EFS
- Author
-
Lughofer, Edwin, Kacprzyk, Janusz, editor, and Lughofer, Edwin
- Published
- 2011
- Full Text
- View/download PDF
27. Stream-Based Active Unusual Event Detection
- Author
-
Loy, Chen Change, Xiang, Tao, Gong, Shaogang, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Kimmel, Ron, editor, Klette, Reinhard, editor, and Sugimoto, Akihiro, editor
- Published
- 2011
- Full Text
- View/download PDF
28. Active learning combining uncertainty and diversity for multi‐class image classification
- Author
-
Yingjie Gu, Zhong Jin, and Steve C. Chiu
- Subjects
multiclass image classification ,computer vision ,pattern recognition applications ,unlabelled data ,active learning algorithm ,support vector machine ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
In computer vision and pattern recognition applications, there are usually a vast number of unlabelled data whereas the labelled data are very limited. Active learning is a kind of method that selects the most representative or informative examples for labelling and training; thus, the best prediction accuracy can be achieved. A novel active learning algorithm is proposed here based on one‐versus‐one strategy support vector machine (SVM) to solve multi‐class image classification. A new uncertainty measure is proposed based on some binary SVM classifiers and some of the most uncertain examples are selected from SVM output. To ensure that the selected examples are diverse from each other, Gaussian kernel is adopted to measure the similarity between any two examples. From the previous selected examples, a batch of diverse and uncertain examples are selected by the dynamic programming method for labelling. The experimental results on two datasets demonstrate the effectiveness of the proposed algorithm.
- Published
- 2015
- Full Text
- View/download PDF
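The selection principle in the abstract of entry 28 (pick examples that are both uncertain and different from one another, with a Gaussian kernel measuring similarity) can be sketched as below. This is an illustration under stated assumptions rather than the authors' algorithm: uncertainty is approximated by the gap between the two highest class scores instead of their SVM-based measure, a greedy loop replaces their dynamic programming step, and the `trade_off` weight, function names, and sample data are all hypothetical.

```python
import math


def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian (RBF) kernel value between two feature vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def uncertainty(scores):
    """Higher when the two best class scores are close together.

    Assumed stand-in for the SVM-based uncertainty measure in the abstract.
    """
    top, second = sorted(scores, reverse=True)[:2]
    return 1.0 - (top - second)


def select_batch(features, class_scores, batch_size, trade_off=0.5):
    """Greedily pick examples that are uncertain and unlike those already picked."""
    chosen = []
    candidates = list(range(len(features)))
    while candidates and len(chosen) < batch_size:

        def gain(i):
            # Redundancy: similarity to the most similar already-chosen example.
            redundancy = max(
                (gaussian_similarity(features[i], features[j]) for j in chosen),
                default=0.0,
            )
            return trade_off * uncertainty(class_scores[i]) - (1 - trade_off) * redundancy

        best = max(candidates, key=gain)
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Hypothetical pool: examples 0 and 1 are near-duplicates and both uncertain,
# example 2 is uncertain and far away, example 3 is confidently classified.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 9.0)]
class_scores = [
    (0.50, 0.45, 0.05),
    (0.48, 0.47, 0.05),
    (0.40, 0.35, 0.25),
    (0.90, 0.05, 0.05),
]
batch = select_batch(features, class_scores, batch_size=2)
print(batch)
```

With these toy inputs the second pick skips the near-duplicate of the first, which is the behaviour the diversity term is meant to produce; a pure uncertainty ranking would have chosen both near-duplicates.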
29. Low‐rank representation for semi‐supervised software defect prediction.
- Author
-
Zhang, Zhi-Wu, Jing, Xiao-Yuan, and Wu, Fei
- Abstract
Software defect prediction based on machine learning is an active research topic in the field of software engineering. The historical defect data in software repositories may contain noise because automatic defect collection is based on modified logs and defect reports. When the previous defect labels of modules are limited, predicting the defect‐prone modules becomes a challenging problem. In this study, the authors propose a graph‐based semi‐supervised defect prediction approach to solve the problems of insufficient labelled data and noisy data. Graph‐based semi‐supervised learning methods use the labelled and unlabelled data simultaneously and consider them as the nodes of the graph at the training phase. Therefore, they solve the problem of insufficient labelled samples. To improve stability on noisy defect data, a powerful clustering method, low‐rank representation (LRR), and neighbourhood distance are used to construct the relationship graph of samples. On this basis, the authors propose a new semi‐supervised defect prediction approach, named low‐rank representation‐based semi‐supervised software defect prediction (LRRSSDP). The widely used datasets from NASA projects and noisy datasets are employed as test data to evaluate the performance. Experimental results show that (i) LRRSSDP outperforms several representative state‐of‐the‐art semi‐supervised defect prediction methods; and (ii) LRRSSDP can maintain robustness in noisy environments. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
30. Learning with Missing or Incomplete Data
- Author
-
Gabrys, Bogdan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Foggia, Pasquale, editor, Sansone, Carlo, editor, and Vento, Mario, editor
- Published
- 2009
- Full Text
- View/download PDF
31. Semi-supervised Prediction of Protein Interaction Sentences Exploiting Semantically Encoded Metrics
- Author
-
Polajnar, Tamara, Girolami, Mark, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Istrail, Sorin, editor, Pevzner, Pavel, editor, Waterman, Michael S., editor, Kadirkamanathan, Visakan, editor, Sanguinetti, Guido, editor, Girolami, Mark, editor, Niranjan, Mahesan, editor, and Noirel, Josselin, editor
- Published
- 2009
- Full Text
- View/download PDF
32. A Study of Semi-supervised Generative Ensembles
- Author
-
Zanda, Manuela, Brown, Gavin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Benediktsson, Jón Atli, editor, and Roli, Fabio, editor
- Published
- 2009
- Full Text
- View/download PDF
33. Adaptive Biometric Systems That Can Improve with Use
- Author
-
Roli, Fabio, Didaci, Luca, Marcialis, Gian Luca, Ratha, Nalini K., editor, and Govindaraju, Venu, editor
- Published
- 2008
- Full Text
- View/download PDF
34. Semi-supervised PCA-Based Face Recognition Using Self-training
- Author
-
Roli, Fabio, Marcialis, Gian Luca, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Yeung, Dit-Yan, editor, Kwok, James T., editor, Fred, Ana, editor, Roli, Fabio, editor, and de Ridder, Dick, editor
- Published
- 2006
- Full Text
- View/download PDF
35. Detecting and Verifying Dissimilar Patterns in Unlabelled Data
- Author
-
Wallace, Manolis, Mylonas, Phivos, Kollias, Stefanos, Kacprzyk, Janusz, editor, Hoffmann, Frank, editor, Köppen, Mario, editor, Klawonn, Frank, editor, and Roy, Rajkumar, editor
- Published
- 2005
- Full Text
- View/download PDF
36. An Efficient Method to Estimate Labelled Sample Size for Transductive LDA(QDA/MDA) Based on Bayes Risk
- Author
-
Liu, Han, Yuan, Xiaobin, Tang, Qianying, Kustra, Rafal, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Boulicaut, Jean-François, editor, Esposito, Floriana, editor, Giannotti, Fosca, editor, and Pedreschi, Dino, editor
- Published
- 2004
- Full Text
- View/download PDF
37. Semi-supervised Kernel Regression Using Whitened Function Classes
- Author
-
Franz, Matthias O., Kwon, Younghee, Rasmussen, Carl Edward, Schölkopf, Bernhard, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Rasmussen, Carl Edward, editor, Bülthoff, Heinrich H., editor, Schölkopf, Bernhard, editor, and Giese, Martin A., editor
- Published
- 2004
- Full Text
- View/download PDF
38. Transductive Learning Machine Based on the Affinity-Rule for Semi-supervised Problems and Its Algorithm
- Author
-
Long, Weijiang, Zhang, Wenxiu, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Yin, Fu-Liang, editor, Wang, Jun, editor, and Guo, Chengan, editor
- Published
- 2004
- Full Text
- View/download PDF
39. Using Unlabelled Data to Train a Multilayer Perceptron
- Author
-
Verikas, Antanas, Gelzinis, Adas, Malmqvist, Kerstin, Bacauskiene, Marija, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Singh, Sameer, editor, Murshed, Nabeel, editor, and Kropatsch, Walter, editor
- Published
- 2001
- Full Text
- View/download PDF
40. A Probabilistic Approach to High-Resolution Sleep Analysis
- Author
-
Sykacek, Peter, Roberts, Stephen, Rezek, Iead, Flexer, Arthur, Dorffner, Georg, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Dorffner, Georg, editor, Bischof, Horst, editor, and Hornik, Kurt, editor
- Published
- 2001
- Full Text
- View/download PDF
41. A Dynamic Approach to Reducing Dialog in On-Line Decision Guides
- Author
-
Doyle, Michelle, Cunningham, Pádraig, Goos, G., editor, Hartmanis, J., editor, van Leeuwen, J., editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Blanzieri, Enrico, editor, and Portinale, Luigi, editor
- Published
- 2000
- Full Text
- View/download PDF
42. Knowledge-maximized ensemble algorithm for different types of concept drift.
- Author
-
Ren, Siqi, Liao, Bo, Zhu, Wen, and Li, Keqin
- Subjects
*ONLINE data processing , *ALGORITHMS , *DATA mining , *SENSOR networks , *DATA extraction - Abstract
Knowledge extraction from data streams has attracted attention in recent years due to its wide range of applications, including sensor networks, web clickstreams, and user interest analysis. Concept drift is one of the most important research topics in data stream mining. Many algorithms that can adapt to concept drift have been proposed. However, most of them specialize in only one type of concept drift and can rarely be used in the environments with a large number of unavailable sample labels. In this study, we propose a new data stream classifier called knowledge-maximized ensemble (KME). First, supervised and unsupervised knowledge are leveraged to detect concept drift, recognize recurrent concepts, and evaluate the weights of ensemble members. Second, the preserved labelled instances in past blocks can be reused to enhance the recognition ability of the candidate member. The final decision for an incoming observation is derived from all the prediction results of the component classifiers. Accordingly, the maximum utilization of the relevant information in a data stream can be achieved, which is critical to models with limited training data. Third, KME can react to multiple types of concept drift by combining the mechanisms of online and chunk-based ensembles. Finally, we compare KME with eight state-of-the-art classifiers on several synthetic and real-world datasets. The comparison demonstrates the effectiveness of KME in various types of concept drift scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
43. Markov Random Field Modelling of fMRI Data Using a Mean Field EM-algorithm
- Author
-
Svensén, Markus, Kruggel, Frithjof, von Cramon, D. Yves, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Hancock, Edwin R., editor, and Pelillo, Marcello, editor
- Published
- 1999
- Full Text
- View/download PDF
44. Context Aware Data Fusion on Massive IOT Data in Dynamic IOT Analytics
- Author
-
Dr.N. Sabiyath Fatima and S.S. Saranya
- Subjects
Human-Computer Interaction ,Information Systems and Management ,Computer science ,Unlabelled data ,Analytics ,business.industry ,Context (language use) ,Library and Information Sciences ,business ,Internet of Things ,Sensor fusion ,Data science ,Software - Abstract
Educational Data management is a critical task for researchers due to the mammoth volume of data generated by sensors and IoT (Internet of Things) devices. Managing this huge volume of data and cleaning it of impurities is an inherent need. DF (Data Fusion) processes combine data from multiple sources based on their similarity for easier management. DF processes depend on many factors, such as the nature of the data and the application that uses it. Many DFAs (Data Fusion approaches) have been proposed without detailing the context for integrating data in fusion tasks. This work attempts to cover this gap concerning the relevance of context by proposing a technique, CDFT (Context aware Data Fusion technique). In this research work, data from IoT devices are first gathered and pre-processed to prepare them for fusion processing. A boundary based noise reduction algorithm is introduced for data pre-processing; it attempts to label the unlabelled attributes in the gathered data so that data fusion can be done accurately. After pre-processing, context aware data fusion is performed, which combines the data from multiple IoT devices while taking context into account. Finally, the combined data are learnt using a convolutional neural network to check data fusion performance. The proposed CDFT is simulated in Matlab, and the results show that the proposed technique obtains optimal outcomes.
- Published
- 2020
- Full Text
- View/download PDF
45. Profile generation system using artificial intelligence for information recovery and analysis
- Author
-
Javier Prieto, David García-Retuerta, Fernando De la Prieta, Álvaro Bartolomé, and Pablo Chamoso
- Subjects
General Computer Science ,business.industry ,Process (engineering) ,Unlabelled data ,Computer science ,Information recovery ,020206 networking & telecommunications ,Computational intelligence ,02 engineering and technology ,0202 electrical engineering, electronic engineering, information engineering ,Added value ,020201 artificial intelligence & image processing ,The Internet ,Lack of knowledge ,Artificial intelligence ,business ,Personally identifiable information - Abstract
The advances in data computing and analysis methodologies have contributed to the added value of data. Several years ago it was difficult to imagine that we would ever be able to extract such a large amount of information from the Internet. All this thanks to the ability of current techniques to process large volumes of data in a short period of time. The Internet provides access to a large amount of unstructured or unlabelled data, which are hard to retrieve for any human due to the lack of knowledge of the available sources of information. Moreover, in many cases people are unaware of the online availability of their personal data. This article presents a system for retrieving personal information from the Internet on the basis of several input criteria. The system is capable of differentiating the information of different people with the same name by using artificial intelligence techniques. In the conducted case study, the information has been gathered from sources containing information about people living in Spain, but it could be adapted to the specific sources of information of other countries. The system has been validated in a case study which included several participants and the obtained results have been quite satisfactory.
- Published
- 2020
- Full Text
- View/download PDF
46. Improving supervised wind power forecasting models using extended numerical weather variables and unlabelled data.
- Author
-
Fang, Shengchen and Chiang, Hsiao‐Dong
- Abstract
A variety of supervised forecasting models using numerical weather prediction data have been utilised for short‐term wind power forecasting. These forecasting models only use meteorological variables of the target wind farm as essential features. This study proposes a novel method to improve existing supervised forecasting models such as support vector machines, artificial neural networks, and Gaussian processes (GPs) for higher forecasting accuracy. The proposed method develops a data‐driven feature extraction procedure to utilise unlabelled numerical weather data, and the feature extraction procedure transforms extended numerical weather variables into supplementary input features which then can be used for supervised forecasting models. The only modification to an existing supervised forecasting model is the addition of these supplementary input features, and thus does not alter the training algorithm of the supervised forecasting model. For illustrative purposes, the GP is used as the supervised forecasting model to be improved. Numerical evaluation of the proposed method was performed on a subset of data provided in the 2012 Global Energy Forecasting Competition (GEF 2012). Evaluation results reveal that the proposed method achieves higher forecasting accuracy for all wind farms. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
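The abstract of entry 46 describes fitting a feature extractor on unlabelled weather data and appending its output to the inputs of an unchanged supervised model. A minimal sketch of that idea follows. It assumes PCA as the extractor, which the abstract does not name, and the function and variable names (`fit_pca`, `add_supplementary_features`, `X_base`, `X_extended`) are hypothetical.

```python
import numpy as np

def fit_pca(X_unlabelled, n_components):
    """Fit PCA on unlabelled data; return the feature mean and the top principal axes."""
    X = np.asarray(X_unlabelled, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def add_supplementary_features(X_base, X_extended, mean, components):
    """Project the extended variables onto the axes and append them to the base features."""
    extra = (np.asarray(X_extended, dtype=float) - mean) @ components.T
    return np.hstack([np.asarray(X_base, dtype=float), extra])

# Synthetic stand-ins: 500 unlabelled rows of 8 extended weather variables,
# and 20 labelled rows with 3 base features each.
rng = np.random.default_rng(0)
X_unlabelled = rng.normal(size=(500, 8))
X_base = rng.normal(size=(20, 3))
X_extended = rng.normal(size=(20, 8))

mean, components = fit_pca(X_unlabelled, n_components=2)
X_augmented = add_supplementary_features(X_base, X_extended, mean, components)
```

The augmented matrix can be passed to any supervised regressor in place of the base features, so the training algorithm of the model itself is left untouched.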
47. Machine Learning Based Clustering of COVID-19 Symptoms
- Author
-
Brahmaleen Kaur Sidhu and Amreen Ghumman
- Subjects
Coronavirus disease 2019 (COVID-19) ,business.industry ,Computer science ,Unlabelled data ,Disease ,medicine.disease_cause ,Machine learning ,computer.software_genre ,Pandemic ,medicine ,Unsupervised learning ,Artificial intelligence ,business ,Cluster analysis ,Set (psychology) ,computer ,Coronavirus - Abstract
Coronavirus disease (COVID-19) is an infectious respiratory illness caused by Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2). The virus is new and has caused an unprecedented pandemic that has led into unforeseeable circumstances. Due to lack of knowledge, medical fraternity is facing a lot of hurdles in the detection and treatment of this disease which is caused by coronavirus. It is immensely crucial to handle the prevailing situation for the sake of healthier lives in future. In present scenario, data recorded from the occurrences of the disease worldwide can play a vital role in creating a better understanding of its nature. Based on data, artificial intelligence techniques like machine learning provide useful insights and solutions for real world problems that are otherwise incomprehensible for human. Specifically, clustering is an unsupervised machine learning technique that uses unlabelled data to identify different behaviour, groups and categories of observations. This paper presents a machine learning based clustering approach to identify different clusters of symptomatic profiles of patients suffering from covid-19. Dividing the set of medical symptoms experienced by covid-19 patients into distinct clusters will facilitate an understanding of the different kinds of effects this disease has on human beings. It will also enable development of suitable and effective treatments.
- Published
- 2021
- Full Text
- View/download PDF
48. Semi-supervised Domain Adaptation via adversarial training
- Author
-
Anton-David Almasan and Antonin Couturier
- Subjects
Domain adaptation ,Signal processing ,Computer science ,Unlabelled data ,business.industry ,Supervised learning ,Training (meteorology) ,Machine learning ,computer.software_genre ,Convolutional neural network ,Adversarial system ,ComputingMethodologies_PATTERNRECOGNITION ,Artificial intelligence ,business ,computer ,Mixing (physics) - Abstract
Whilst convolutional neural networks (CNN) offer state-of-the-art performance for classification and detection tasks in computer vision, their successful adoption in defence applications is limited by the cost of labelled data and the inability to use crowd sourcing due to classification issues. Popular approaches to solve this problem use the expansive labelled data for training. It would be more cost-efficient to learn representations from the unlabelled data whilst leveraging labelled data from existing datasets, as empirically the performance of supervised learning is far greater than unsupervised-learning. In this paper we investigate the benefits of mixing Domain Adaptation and semi-supervised learning to train CNNs and showcase using adversarial training to tackle this issue.
- Published
- 2021
- Full Text
- View/download PDF
49. A Semi-Supervised Deep Learning Approach for the Classification of Steel Surface Defects
- Author
-
Siyamalan Manivannan and Mathuranathan Mayuravaani
- Subjects
Surface (mathematics) ,Computer science ,business.industry ,Unlabelled data ,Deep learning ,Supervised learning ,Semi-supervised learning ,Machine learning ,computer.software_genre ,Convolutional neural network ,Task (project management) ,ComputingMethodologies_PATTERNRECOGNITION ,Margin (machine learning) ,Artificial intelligence ,business ,computer - Abstract
Automatic surface inspection (ASI) to identify defects in manufactured items plays an important role in ensuring the production quality in industrial manufacturing processes. Various approaches have been proposed for this purpose, and the majority of them use supervised learning. Supervised learning requires labelled data for training, and obtaining a large amount of labelled data is a difficult and time-consuming task. Semi-supervised learning approaches, on the other hand, have become popular for ASI, as they make use of both labelled and unlabelled data. In this work, we propose a Convolutional Neural Network based semi-supervised learning approach for the recognition of steel surface defects. Our approach predicts the labels of the unlabelled data, and weights them based on their prediction confidence. These weighted samples are then used with their corresponding predicted labels, together with the labelled data, for training the network. Our approach mainly differs from the existing approaches in the way the unlabelled samples are weighted when training the network: we weight the samples based on how confidently they are predicted, and we propose a margin-based approach to determine the prediction confidence. Experimental results on a public steel surface detection dataset (NEU surface defects) show that the proposed method can achieve a state-of-the-art accuracy of 99.15 ±0.08%, which is competitive with the performance achieved by fully supervised deep learning approaches while using only 10% of the labelled training data. In addition, comparison with recently proposed semi-supervised learning approaches for ASI shows the effectiveness of our approach.
- Published
- 2021
- Full Text
- View/download PDF
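The abstract of entry 49 describes weighting pseudo-labelled samples by a margin-based prediction confidence. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes the margin is the gap between the two largest class probabilities, which the abstract does not specify, and the function names are hypothetical.

```python
import numpy as np

def margin_pseudo_labels(probs):
    """Return (pseudo_labels, weights) for an (n_samples, n_classes) probability array."""
    probs = np.asarray(probs, dtype=float)
    # Sort each row ascending; the last two columns hold the two largest probabilities.
    top_two = np.sort(probs, axis=1)[:, -2:]
    weights = top_two[:, 1] - top_two[:, 0]  # margin: large when the prediction is clear-cut
    labels = probs.argmax(axis=1)            # pseudo-label: the most probable class
    return labels, weights

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Cross-entropy on pseudo-labels, with each sample scaled by its margin weight."""
    probs = np.asarray(probs, dtype=float)
    picked = probs[np.arange(len(labels)), labels]
    return float(np.sum(weights * -np.log(picked + eps)) / max(np.sum(weights), eps))

# Two unlabelled samples: one confident prediction, one ambiguous.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25]])
labels, weights = margin_pseudo_labels(probs)
loss = weighted_cross_entropy(probs, labels, weights)
```

Under this definition the ambiguous second sample gets a weight near zero, so a wrong pseudo-label on it contributes little to the training loss.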
50. Machine Learning Aided Diagnosis of Diseases Without Clinical Gold Standard: A New Score for Laryngopharyngeal Reflux Disease Based on pH Monitoring
- Author
-
Lei Wang, Lianyong Li, Yuzhu Guo, Wang Lipeng, Changmin Qu, Zhezhe Sun, Ying Zhou, Baowei Li, Changqing Zhong, Haolun Han, Gang Wang, Hongdan Liu, Xiaoli Zhang, Wei Wu, Xinwei Bao, Simeng Li, and Bingxin Xu
- Subjects
medicine.medical_specialty ,General Computer Science ,02 engineering and technology ,Disease ,LPRD ,Ph monitoring ,03 medical and health sciences ,Laryngopharyngeal reflux ,0302 clinical medicine ,0202 electrical engineering, electronic engineering, information engineering ,Medicine ,General Materials Science ,030223 otorhinolaryngology ,business.industry ,Unlabelled data ,pH monitoring ,General Engineering ,Reflux ,Gold standard (test) ,medicine.disease ,Latent class model ,Computer aided diagnosis ,machine learning ,Ryan score ,020201 artificial intelligence & image processing ,Radiology ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,business ,lcsh:TK1-9971 ,Aided diagnosis - Abstract
Objective: Laryngopharyngeal Reflux Disease (LPRD) is prevalent and has a range of symptoms. However, diagnosing LPRD is difficult because of the lack of specific symptoms or clinical gold standard. An objective and reliable test, which does not rely on a perfect clinical gold standard, is required in clinical practice. Methods: 60 normal volunteers and 74 confirmed Laryngopharyngeal Reflux (LPR) patients were labelled based on the combined consideration of the reflux symptom index, reflux finding score, 24h oropharyngeal pH monitoring and results of anti-reflux treatment. 72 candidate features were extracted from pH recordings and the most efficient feature combination was detected using a stepwise wrapper method. The labelled data were combined with 1552 unlabelled data for feature selection and model training using semi-supervised learning. A latent class model method was used to assess the proposed model based on 64 additional validation data and an imperfect clinical reference test. Results: A new score (named W score), which significantly improved the sensitivity of the LPRD test (82.67% vs. 24.09%) and has a relatively high specificity (80.19%), was proposed. W score concurs with the complicated clinical test. Conclusion: W score which significantly improves LPRD diagnostic efficiency, can aid the clinical diagnosis of LPRD. W score provides an objective, efficient, and reliable indicator for the application of anti-reflux treatment in clinical practice. Significance: More LPRD patients can benefit from anti-reflux treatment in clinical practice and fewer patients may suffer from, for example, the side effects of unnecessary long-term Proton Pump Inhibitors (PPI) treatments.
- Published
- 2020