101 results for "Weakly supervised"
Search Results
2. Weakly supervised video anomaly detection based on hyperbolic space.
- Author
- Qi, Meilin and Wu, Yuanyuan
- Subjects
- HYPERBOLIC spaces; ANOMALY detection (Computer security); IMPLICIT learning; DATA modeling; VIDEOS
- Abstract
In recent years, there has been a proliferation of weakly supervised methods in the field of video anomaly detection. Despite significant progress in existing research, these efforts have primarily focused on addressing this issue within Euclidean space. Conducting weakly supervised video anomaly detection in Euclidean space imposes a fundamental limitation by constraining the ability to model complex patterns due to the dimensionality constraints of the embedding space and lacking the capacity to model long-term contextual information. This inadequacy can lead to misjudgments of anomalous events due to insufficient video representation. However, hyperbolic space has shown significant potential for modeling complex data, offering new insights. In this paper, we rethink weakly supervised video anomaly detection with a novel perspective: transforming video features from Euclidean space into hyperbolic space may enable the network to learn implicit relationships in normal and anomalous videos, thereby enhancing its ability to effectively distinguish between them. Finally, to validate our approach, we conducted extensive experiments on the UCF-Crime and XD-Violence datasets. Experimental results show that our method not only has the lowest number of parameters but also achieves state-of-the-art performance on the XD-Violence dataset using only RGB information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
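The transformation the abstract above hinges on, mapping Euclidean clip features into hyperbolic space, is typically implemented with the exponential map onto the Poincare ball. Below is a minimal sketch of that mapping and of the hyperbolic distance that could feed an anomaly score; the curvature value, the 512-d I3D-style features, and the use of a mean prototype are illustrative assumptions, not details taken from the paper.

```python
import torch

def expmap0(x: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Map Euclidean vectors onto the Poincare ball of curvature c (exponential map at the origin)."""
    sqrt_c = c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * x / (sqrt_c * norm)

def poincare_distance(u: torch.Tensor, v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Geodesic distance on the Poincare ball; larger distances can be read as 'more anomalous'."""
    sqrt_c = c ** 0.5
    diff2 = (u - v).pow(2).sum(-1)
    denom = (1 - c * u.pow(2).sum(-1)).clamp_min(eps) * (1 - c * v.pow(2).sum(-1)).clamp_min(eps)
    arg = 1 + 2 * c * diff2 / denom
    return (1.0 / sqrt_c) * torch.acosh(arg.clamp_min(1 + eps))

# Toy usage: embed per-segment video features and score them against a "normal" prototype.
feats = torch.randn(8, 512)                 # 8 video segments, 512-d Euclidean features (e.g., from I3D)
ball_feats = expmap0(feats, c=1.0)          # now inside the unit Poincare ball
prototype = expmap0(feats.mean(0, keepdim=True), c=1.0)
scores = poincare_distance(ball_feats, prototype)   # one hyperbolic distance per segment
print(scores)
```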
3. Annotate less but perform better: weakly supervised shadow detection via label augmentation.
- Author
- Chen, Hongyu, Chen, Xiao-Diao, Wu, Wen, Yang, Wenya, and Mao, Xiaoyang
- Subjects
- IMAGE segmentation; IMAGE reconstruction; DETECTORS; ANNOTATIONS; PIXELS
- Abstract
Shadow detection is essential for scene understanding and image restoration. Existing paradigms for producing shadow detection training data usually rely on densely labeling each image pixel, which becomes a bottleneck when scaling up the number of images. To tackle this problem, this paper designs a learning framework for Weakly supervised Shadow Detection, namely WSD, which labels shadow images with only a few strokes. Firstly, it creates two shadow detection datasets with scribble annotations, namely Scr-SBU and Scr-ISTD. Secondly, it proposes an uncertainty-guided label augmentation scheme based on graph convolutional networks, which can propagate the sparse scribble annotations to more reliable regions and thus avoid the model converging to an undesired local minimum caused by intra-class discontinuity. Finally, it introduces a multi-task learning framework to jointly learn shadow detection and edge detection, which encourages the generated shadow maps to be comprehensive and well aligned with shadow boundaries. Experimental results on benchmark datasets demonstrate that our framework even outperforms existing semi-supervised and fully supervised shadow detectors while requiring only 2% of pixels to be labeled. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
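The label-augmentation idea in the WSD record above, spreading a few scribble strokes to nearby reliable regions before training, can be approximated with plain label propagation over a region-affinity graph. The sketch below uses a Gaussian affinity and a confidence threshold; the paper's actual mechanism is an uncertainty-guided graph convolutional network, so the affinity construction, the threshold, and the region descriptors here are all assumptions for illustration.

```python
import numpy as np

def propagate_scribbles(features, scribble_labels, n_iters=50, sigma=0.5, keep_thresh=0.9):
    """
    features:        (N, D) array, one descriptor per superpixel/region.
    scribble_labels: (N,) array with 1 = shadow scribble, 0 = non-shadow scribble, -1 = unlabeled.
    Returns hard pseudo-labels (-1 where confidence stays below keep_thresh).
    """
    n = len(features)
    # Dense Gaussian affinity between region descriptors (a stand-in for a learned GCN adjacency).
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    w /= w.sum(1, keepdims=True) + 1e-8           # row-normalize

    probs = np.full((n, 2), 0.5)
    labeled = scribble_labels >= 0
    probs[labeled] = np.eye(2)[scribble_labels[labeled]]
    for _ in range(n_iters):
        probs = w @ probs                          # diffuse neighbor beliefs
        probs[labeled] = np.eye(2)[scribble_labels[labeled]]   # clamp scribbled regions
        probs /= probs.sum(1, keepdims=True) + 1e-8

    conf = probs.max(1)
    pseudo = probs.argmax(1)
    pseudo[conf < keep_thresh] = -1                # keep only reliable regions, as the paper advocates
    return pseudo

# Toy usage: 6 regions, 2 of them scribbled.
feats = np.random.rand(6, 16)
scribbles = np.array([1, -1, -1, 0, -1, -1])
print(propagate_scribbles(feats, scribbles))
```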
4. Semantic segmentation of point clouds of ancient buildings based on weak supervision.
- Author
- Zhao, Jianghong, Yu, Haiquan, Hua, Xinnan, Wang, Xin, Yang, Jia, Zhao, Jifu, and Xu, Ailin
- Subjects
- POINT cloud; ANCIENT architecture; HISTORIC buildings; BUILDING information modeling; PROFESSIONALISM
- Abstract
Semantic segmentation of point clouds of ancient buildings plays an important role in Historical Building Information Modelling (HBIM). Annotating point clouds of ancient architecture demands strong domain expertise and a large workload, which greatly restricts the application of point cloud semantic segmentation in this field; this paper therefore studies weakly supervised semantic segmentation of ancient-architecture point clouds. To address the small inter-class differences between ancient architectural components, it introduces a self-attention mechanism that can effectively distinguish similar components within a neighbourhood. Moreover, it analyses the insufficiency of the positional encoding in the baseline and constructs a high-precision point cloud semantic segmentation network for ancient buildings, the Semantic Query Network based on Dual Local Attention (SQN-DLA). Using only 0.1% of the annotations in our homemade dataset and the Architectural Cultural Heritage (ArCH) dataset, the mean Intersection over Union (mIoU) reaches 66.02% and 58.03%, respectively, an improvement of 3.51% and 3.91% over the baseline. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Review and outlook of unsupervised and weakly supervised video anomaly detection methods [无监督和弱监督视频异常检测方法回顾与前瞻].
- Author
- 张琳, 陈兆波, 马晓轩, and 张凡博
- Abstract
With the continuous development of monitoring technology, surveillance cameras have been widely deployed in various scenarios, and manual inspection of video for abnormal events has become impractical. Video anomaly detection, as a core component of intelligent surveillance systems, is therefore receiving extensive attention and research. With the development of deep learning, the field has made significant progress and many new anomaly detection methods have emerged. This review surveys unsupervised and weakly supervised video anomaly detection methods applied to various data types, analyzes the contributions of existing methods, and compares the performance of different models. In addition, it compiles commonly used and newly released datasets and summarizes the challenges and development trends that future work will face. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. A noise-robust water segmentation method based on synthetic aperture radar images combined with automatic sample collection.
- Author
- Hou, Zhuoyan, Meng, Mengmeng, Zhou, Guichao, Zhang, Xuedong, Cao, Mingjun, Qian, Junhao, Li, Ning, Huang, Yabo, Wu, Lin, and Xie, Linglin
- Subjects
- SYNTHETIC aperture radar; BODIES of water; SYNTHETIC apertures; SPECKLE interference; AUTOMATIC identification; WATER sampling; WATER use
- Abstract
Synthetic Aperture Radar (SAR) images have been widely used for surface water identification due to their all-weather capabilities. However, the presence of inherent speckle noise in SAR data poses a challenge for accurate water identification. Additionally, annotating high-quality water body samples requires significant human labour, which can be costly and time-consuming. Aiming at the above problems, a noise-robust automatic water identification architecture without artificial labels is proposed. First, a two-stage automatic sample collection method that utilizes k-means++ clustering and morphological concepts is designed. Then, a weakly supervised, noise-resistant SAR water body segmentation method, NRM-ACUNet, is developed based on U-Net combined with the LNR-Dice loss function and Conditionally Parameterized Convolutions (CondConv) to minimize the impact of sample noise. Experimental results show that the morphological processing can improve water body sample quality compared with k-means++ alone, and that NRM-ACUNet outperforms U-Net when trained on noise-containing pseudo-samples, achieving an F1 score of 96.8% and an accuracy improvement of 52.06%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation.
- Author
- Zhai, Wei, Wu, Pingyu, Zhu, Kai, Cao, Yang, Wu, Feng, and Zha, Zheng-Jun
- Subjects
- CROSS-entropy method; LOCALIZATION (Mathematics)
- Abstract
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator, this paper presents two astonishing experimental observations on the object localization learning process: For a trained network, as the foreground mask expands, (1) the cross-entropy converges to zero when the foreground mask covers only part of the object region. (2) The activation value continuously increases until the foreground mask expands to the object boundary. Therefore, to achieve a more effective localization performance, we argue for the usage of activation value to learn more object regions. In this paper, we propose a background activation suppression (BAS) method. Specifically, an activation map constraint module is designed to facilitate the learning of generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
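The record above argues for supervising the foreground-map generator with activation values instead of cross-entropy. The sketch below shows the three ingredients that idea implies: a background-activation ratio to suppress, a foreground guidance term, and an area constraint against the trivial all-foreground solution. The published BAS loss weights and normalizes these differently, so treat this as an illustration rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bas_style_losses(activation_map, fg_mask, lambda_area=1.0):
    """
    activation_map: (B, 1, H, W) class-specific activation map from the backbone (after ReLU).
    fg_mask:        (B, 1, H, W) foreground prediction map in [0, 1] from the generator.
    Returns the three ingredients a BAS-style objective combines.
    """
    # 1) Background activation suppression: activation left outside the predicted foreground
    #    should be small relative to the total activation.
    bg_activation = (activation_map * (1.0 - fg_mask)).sum(dim=(1, 2, 3))
    total_activation = activation_map.sum(dim=(1, 2, 3)) + 1e-6
    loss_bas = (bg_activation / total_activation).mean()

    # 2) Foreground region guidance: the masked region should keep a high mean activation.
    fg_activation = (activation_map * fg_mask).sum(dim=(1, 2, 3)) / (fg_mask.sum(dim=(1, 2, 3)) + 1e-6)
    loss_fg = (-fg_activation).mean()

    # 3) Area constraint: discourage the trivial "everything is foreground" solution.
    loss_area = fg_mask.mean()

    return loss_bas, loss_fg, lambda_area * loss_area

# Toy usage with random tensors in place of real network outputs.
act = F.relu(torch.randn(2, 1, 14, 14))
mask = torch.sigmoid(torch.randn(2, 1, 14, 14))
print([v.item() for v in bas_style_losses(act, mask)])
```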
8. Advances and Challenges in Deep Learning-Based Change Detection for Remote Sensing Images: A Review through Various Learning Paradigms.
- Author
- Wang, Lukang, Zhang, Min, Gao, Xu, and Shi, Wenzhong
- Subjects
- REMOTE sensing; PATTERN recognition systems; SURFACE of the earth; OPTICAL remote sensing; DEEP learning; EMERGENCY management
- Abstract
Change detection (CD) in remote sensing (RS) imagery is a pivotal method for detecting changes in the Earth's surface, finding wide applications in urban planning, disaster management, and national security. Recently, deep learning (DL) has experienced explosive growth and, with its superior capabilities in feature learning and pattern recognition, it has introduced innovative approaches to CD. This review explores the latest techniques, applications, and challenges in DL-based CD, examining them through the lens of various learning paradigms, including fully supervised, semi-supervised, weakly supervised, and unsupervised. Initially, the review introduces the basic network architectures for CD methods using DL. Then, it provides a comprehensive analysis of CD methods under different learning paradigms, summarizing commonly used frameworks. Additionally, an overview of publicly available datasets for CD is offered. Finally, the review addresses the opportunities and challenges in the field, including: (a) incomplete supervised CD, encompassing semi-supervised and weakly supervised methods, which is still in its infancy and requires further in-depth investigation; (b) the potential of self-supervised learning, offering significant opportunities for Few-shot and One-shot Learning of CD; (c) the development of Foundation Models, with their multi-task adaptability, providing new perspectives and tools for CD; and (d) the expansion of data sources, presenting both opportunities and challenges for multimodal CD. These areas suggest promising directions for future research in CD. In conclusion, this review aims to assist researchers in gaining a comprehensive understanding of the CD field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Weakly supervised salient object detection via bounding-box annotation and SAM model.
- Author
- Liu, Xiangquan and Huang, Xiaoming
- Subjects
- DEEP learning; CONVOLUTIONAL neural networks; DIGITAL technology; ARTIFICIAL intelligence; EQUATIONS
- Abstract
Salient object detection (SOD) aims to detect the most attractive region in an image. Fully supervised SOD based on deep learning usually needs a large amount of data with human annotation. Researchers have gradually focused on SOD with weakly supervised annotations such as category, scribble, and bounding-box labels, but these existing weakly supervised methods achieve limited performance and show a large gap with fully supervised methods. In this work, we propose a novel two-stage weakly supervised method based on bounding-box annotation and the recent large visual model Segment Anything (SAM). In the first stage, we use the bounding-box annotation as the box prompt of SAM to generate initial labels and propose an object completeness check and an object inversion check to exclude low-quality labels; we then select reliable pseudo labels to train an initial SOD model. In the second stage, we use the initial SOD model to predict the saliency maps of the excluded images and adopt SAM in everything mode to generate segmentation candidates; we then fuse the saliency maps and segmentation candidates to predict pseudo labels. Finally, we use all reliable pseudo labels generated in the two stages to train a refined SOD model. We also design a simple but effective SOD model that can capture rich global context information. Performance evaluation on four public datasets shows that the proposed method significantly outperforms other weakly supervised methods and achieves comparable performance with fully supervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
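The first stage described in the record above filters SAM outputs with an "object completeness check" and an "object inversion check" before using them as pseudo labels. The abstract does not spell those checks out, so the sketch below implements two plausible heuristics (box-fill ratio and leakage outside the box); the thresholds are assumptions, and sam_segment is a hypothetical wrapper around a Segment Anything box-prompt call rather than the authors' code.

```python
import numpy as np

def sam_segment(image, box):
    """Placeholder for a SAM call with a box prompt; expected to return a boolean H x W mask.
    Hypothetical wrapper, not part of the paper or of the segment-anything package."""
    raise NotImplementedError

def box_fill_ratio(mask, box):
    x0, y0, x1, y1 = box
    inside = mask[y0:y1, x0:x1]
    return inside.mean() if inside.size else 0.0

def accept_pseudo_label(mask, box, min_fill=0.25, max_fill=0.95, max_outside=0.05):
    """Keep a SAM mask as a pseudo label only if it passes two heuristic checks:
    - completeness: the mask fills a sensible fraction of its box (not a sliver, not the whole box);
    - inversion: almost no mask pixels fall outside the annotated box (the mask was not flipped onto background)."""
    x0, y0, x1, y1 = box
    fill = box_fill_ratio(mask, box)
    outside = mask.copy()
    outside[y0:y1, x0:x1] = False
    outside_ratio = outside.sum() / max(mask.sum(), 1)
    return (min_fill <= fill <= max_fill) and (outside_ratio <= max_outside)

# Toy usage with a synthetic mask instead of a real SAM output.
mask = np.zeros((100, 100), dtype=bool)
mask[30:70, 30:70] = True
print(accept_pseudo_label(mask, box=(25, 25, 75, 75)))   # True: plausible pseudo label
print(accept_pseudo_label(mask, box=(0, 0, 20, 20)))     # False: mask lies outside the box
```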
10. Knowledge evolution learning: A cost-free weakly supervised semantic segmentation framework for high-resolution land cover classification.
- Author
- Cui, Hao, Zhang, Guo, Chen, Yujia, Li, Xue, Hou, Shasha, Li, Haifeng, Ma, Xiaolong, Guan, Na, and Tang, Xuemin
- Subjects
- LAND cover; CONVOLUTIONAL neural networks; DEEP learning
- Abstract
Despite the success of deep learning in land cover classification, high-resolution (HR) land cover mapping remains challenging due to the time-consuming and labor-intensive process of collecting training samples. Many global land cover products (LCP) can reflect the low-level commonality (LLC) knowledge of land covers, such as basic shape and underlying semantic information. Therefore, we expect to use LCP as weakly supervised information to guide the semantic segmentation of HR images. We regard high-level specialty (HLS) knowledge as HR information unavailable in the LCP. We believe LLC knowledge can gradually evolve into HLS knowledge through self-active learning. Hence, we design a knowledge evolution learning strategy from LLC to HLS knowledge and correspondingly devise a knowledge evolution weakly supervised learning framework (KE-WESUP) based on LCP. KE-WESUP mainly includes three tasks: (1) Abstraction of LLC knowledge. KE-WESUP first adopts a training method based on superpixels to alleviate the inconsistency between LCP and HR images and directly learns the LLC knowledge from LCP according to the feature-fitting capacity of convolutional neural networks. (2) Automatic exploration of HLS knowledge. We propose a dynamic label optimization strategy to obtain a small number of point labels with high confidence and encourage the model to automatically mine HLS knowledge through the knowledge exploration mechanism, which prompts the model to adapt to complex HR scenes. (3) Dynamic interaction of LLC and HLS knowledge. We adopt the consistency regularization method to achieve further optimization and verification of LLC and HLS knowledge. To verify the effectiveness of KE-WESUP, we conduct experiments on USDA National Agriculture Imagery Program (1 m) and GaoFen-2 (1 m) data using WorldCover (10 m) as labels. The results show that KE-WESUP achieves outstanding results in both experiments and has significant advantages over existing weakly supervised methods. Therefore, the proposed method has great potential in utilizing the prior information of LCP and is expected to become a new paradigm for large-scale HR land cover classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. One label is all you need: Interpretable AI-enhanced histopathology for oncology.
- Author
- Tavolara, Thomas E., Su, Ziyu, Gurcan, Metin N., and Niazi, M. Khalid Khan
- Subjects
- HISTOPATHOLOGY; GENETIC markers; CLINICAL medicine; ONCOLOGY; ARTIFICIAL intelligence; HEMATOXYLIN & eosin staining
- Abstract
Artificial Intelligence (AI)-enhanced histopathology presents unprecedented opportunities to benefit oncology through interpretable methods that require only one overall label per hematoxylin and eosin (H&E) slide with no tissue-level annotations. We present a structured review of these methods organized by their degree of verifiability and by commonly recurring application areas in oncological characterization. First, we discuss morphological markers (tumor presence/absence, metastases, subtypes, grades) in which AI-identified regions of interest (ROIs) within whole slide images (WSIs) verifiably overlap with pathologist-identified ROIs. Second, we discuss molecular markers (gene expression, molecular subtyping) that are not verified via H&E but rather based on overlap with positive regions on adjacent tissue. Third, we discuss genetic markers (mutations, mutational burden, microsatellite instability, chromosomal instability) that current technologies cannot verify if AI methods spatially resolve specific genetic alterations. Fourth, we discuss the direct prediction of survival to which AI-identified histopathological features quantitatively correlate but are nonetheless not mechanistically verifiable. Finally, we discuss in detail several opportunities and challenges for these one-label-per-slide methods within oncology. Opportunities include reducing the cost of research and clinical care, reducing the workload of clinicians, personalized medicine, and unlocking the full potential of histopathology through new imaging-based biomarkers. Current challenges include explainability and interpretability, validation via adjacent tissue sections, reproducibility, data availability, computational needs, data requirements, domain adaptability, external validation, dataset imbalances, and finally commercialization and clinical potential. Ultimately, the relative ease and minimum upfront cost with which relevant data can be collected in addition to the plethora of available AI methods for outcome-driven analysis will surmount these current limitations and achieve the innumerable opportunities associated with AI-driven histopathology for the benefit of oncology. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
12. Weakly supervised graph learning for action recognition in untrimmed video.
- Author
- Yao, Xiao, Zhang, Jia, Chen, Ruixuan, Zhang, Dan, and Zeng, Yifeng
- Subjects
- SUPERVISED learning; RECOGNITION (Psychology); VIDEOS
- Abstract
Action recognition in real-world scenarios is a challenging task which involves the action localization and classification for untrimmed video. Since the untrimmed video in real scenarios lacks fine annotation, existing supervised learning methods have limited effectiveness and robustness in performance. Moreover, state-of-the-art methods discuss each action proposal individually, ignoring the exploration of semantic relationship between different proposals from continuity of video. To address these issues, we propose a weakly supervised approach to explore the proposal relations using Graph Convolutional Networks (GCNs). Specifically, the method introduces action similarity edges and temporal similarity edges to represent the context semantic relationship between different proposals for graph constructing, and the similarity of action features is used to weakly supervise the spatial semantic relationship between labeled and unlabeled samples to achieve the effective recognition of actions in the video. We validate the effectiveness of the proposed method on public benchmarks for untrimmed video (THUMOS14 and ActivityNet). The experimental results demonstrate that the proposed method in this paper has achieved state-of-the-art results, and achieves better robustness and generalization performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. FBN: Weakly Supervised Thyroid Nodule Segmentation Optimized by Online Foreground and Background.
- Author
- Yu, Ruiguo, Yan, Shaoqi, Gao, Jie, Zhao, Mankun, Fu, Xuzhou, Yan, Yang, Li, Ming, and Li, Xuewei
- Subjects
- THYROID nodules; ULTRASONIC imaging
- Abstract
The main objective of the work described here was to train a semantic segmentation model using classification data for thyroid nodule ultrasound images to reduce the pressure of obtaining pixel-level labeled data sets. Furthermore, we improved the segmentation performance of the model by mining the image information to narrow the gap between weakly supervised semantic segmentation (WSSS) and fully supervised semantic segmentation. Most WSSS methods use a class activation map (CAM) to generate segmentation results. However, the lack of supervision information makes it difficult for a CAM to highlight the object region completely. Therefore, we here propose a novel foreground and background pair (FB-Pair) representation method, which consists of high- and low-response regions highlighted by the original CAM-generated online in the original image. During training, the original CAM is revised using the CAM generated by the FB-Pair. In addition, we design a self-supervised learning pretext task based on FB-Pair, which requires the model to predict whether the pixels in FB-Pair are from the original image during training. After this task, the model will accurately distinguish between different categories of objects. Experiments on the thyroid nodule ultrasound image (TUI) data set revealed that our proposed method outperformed existing methods, with a 5.7% improvement in the mean intersection-over-union (mIoU) performance of segmentation compared with the second-best method and a reduction to 2.9% in the difference between the performance of benign and malignant nodules. Our method trains a well-performing segmentation model on ultrasound images of thyroid nodules using only classification data. In addition, we determined that CAM can take full advantage of the information in the images to highlight the target regions more accurately and thus improve the segmentation performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Credible Dual-Expert Learning for Weakly Supervised Semantic Segmentation.
- Author
- Zhang, Bingfeng, Xiao, Jimin, Wei, Yunchao, and Zhao, Yao
- Subjects
- PREDICTION models; NOISE; PIXELS
- Abstract
Great progress has been witnessed for weakly supervised semantic segmentation, which aims to segment objects without dense pixel annotations. Most approaches concentrate on generating high quality pseudo labels, which are then fed into a standard segmentation model as supervision. However, such a solution has one major limitation: noise of pseudo labels is inevitable, which is unsolvable for the standard segmentation model. In this paper, we propose a credible dual-expert learning (CDL) framework to mitigate the noise of pseudo labels. Specifically, we first observe that the model predictions with different optimization loss functions will have different credible regions; thus, it is possible to make self-corrections with multiple predictions. Based on this observation, we design a dual-expert structure to mine credible predictions, which are then processed by our noise correction module to update pseudo labels in an online way. Meanwhile, to handle the case that the dual-expert produces incredible predictions for the same region, we design a relationship transfer module to provide feature relationships, enabling our noise correction module to transfer predictions from the credible regions to such incredible regions. Considering the above designs, we propose a base CDL network and an extended CDL network to satisfy different requirements. Extensive experiments show that directly replacing our model with a conventional fully supervised segmentation model, the performances of various weakly supervised semantic segmentation pipelines were boosted, achieving new state-of-the-art performances on both PASCAL VOC 2012 and MS COCO with a clear margin. Code will be available at: https://github.com/zbf1991/CDL. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. Weak Localization of Radiographic Manifestations in Pulmonary Tuberculosis from Chest X-ray: A Systematic Review.
- Author
- Feyisa, Degaga Wolde, Ayano, Yehualashet Megersa, Debelee, Taye Girma, and Schwenker, Friedhelm
- Subjects
- TUBERCULOSIS; PULMONARY manifestations of general diseases; X-rays; BACTERIAL diseases; PLEURAL effusions; BRAIN function localization
- Abstract
Pulmonary tuberculosis (PTB) is a bacterial infection that affects the lung. PTB remains one of the infectious diseases with the highest global mortalities. Chest radiography is a technique that is often employed in the diagnosis of PTB. Radiologists identify the severity and stage of PTB by inspecting radiographic features in the patient's chest X-ray (CXR). The most common radiographic features seen on CXRs include cavitation, consolidation, masses, pleural effusion, calcification, and nodules. Identifying these CXR features will help physicians in diagnosing a patient. However, identifying these radiographic features for intricate disorders is challenging, and the accuracy depends on the radiologist's experience and level of expertise. So, researchers have proposed deep learning (DL) techniques to detect and mark areas of tuberculosis infection in CXRs. DL models have been proposed in the literature because of their inherent capacity to detect diseases and segment the manifestation regions from medical images. However, fully supervised semantic segmentation requires several pixel-by-pixel labeled images. The annotation of such a large amount of data by trained physicians has some challenges. First, the annotation requires a significant amount of time. Second, the cost of hiring trained physicians is expensive. In addition, the subjectivity of medical data poses a difficulty in having standardized annotation. As a result, there is increasing interest in weak localization techniques. Therefore, in this review, we identify methods employed in the weakly supervised segmentation and localization of radiographic manifestations of pulmonary tuberculosis from chest X-rays. First, we identify the most commonly used public chest X-ray datasets for tuberculosis identification. Following that, we discuss the approaches for weakly localizing tuberculosis radiographic manifestations in chest X-rays. The weakly supervised localization of PTB can highlight the region of the chest X-ray image that contributed the most to the DL model's classification output and help pinpoint the diseased area. Finally, we discuss the limitations and challenges of weakly supervised techniques in localizing TB manifestations regions in chest X-ray images. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. BLPSeg: Balance the Label Preference in Scribble-Supervised Semantic Segmentation.
- Author
- Wang, Yude, Zhang, Jie, Kan, Meina, Shan, Shiguang, and Chen, Xilin
- Subjects
- TASK analysis; ANNOTATIONS; PIXELS; SEMANTICS; PROBABILITY theory
- Abstract
Scribble-supervised semantic segmentation is an appealing weakly supervised technique with low labeling cost. Existing approaches mainly consider diffusing the labeled region of scribble by low-level feature similarity to narrow the supervision gap between scribble labels and mask labels. In this study, we observe an annotation bias between scribble and object mask, i.e., label workers tend to scribble on the spacious region instead of corners. This label preference makes the model learn well on those frequently labeled regions but poor on rarely labeled pixels. Therefore, we propose BLPSeg to balance the label preference for complete segmentation. Specifically, the BLPSeg first predicts an annotation probability map to evaluate the rarity of labels on each image, then utilizes a novel BLP loss to balance the model training by up-weighting those rare annotations. Additionally, to further alleviate the impact of label preference, we design a local aggregation module (LAM) to propagate supervision from labeled to unlabeled regions in gradient backpropagation. We conduct extensive experiments to illustrate the effectiveness of our BLPSeg. Our single-stage method even outperforms other advanced multi-stage methods and achieves state-of-the-art performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
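The BLP loss in the record above up-weights rarely annotated pixels using a predicted annotation probability map. A minimal sketch of that idea is below, weighting a per-pixel cross-entropy by the inverse annotation probability on scribbled pixels; the exact weighting function, the gamma exponent, and the ignore-index convention are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def blp_style_loss(logits, scribble_labels, annot_prob, ignore_index=255, gamma=1.0, eps=1e-6):
    """
    logits:          (B, C, H, W) segmentation logits.
    scribble_labels: (B, H, W) int labels; ignore_index marks unscribbled pixels.
    annot_prob:      (B, H, W) predicted probability that a worker would scribble on each pixel.
    Rarely annotated pixels (low annot_prob) get a larger weight, balancing the label preference.
    """
    ce = F.cross_entropy(logits, scribble_labels, ignore_index=ignore_index, reduction="none")  # (B, H, W)
    weight = (1.0 / (annot_prob + eps)) ** gamma
    labeled = (scribble_labels != ignore_index).float()
    return (weight * ce * labeled).sum() / (labeled.sum() + eps)

# Toy usage.
B, C, H, W = 2, 21, 8, 8
logits = torch.randn(B, C, H, W)
labels = torch.full((B, H, W), 255, dtype=torch.long)
labels[:, 2:4, 2:4] = 1                            # a small scribble
annot_prob = torch.rand(B, H, W).clamp(0.05, 1.0)
print(blp_style_loss(logits, labels, annot_prob).item())
```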
17. Near Real-Time Flood Mapping with Weakly Supervised Machine Learning.
- Author
- Vongkusolkit, Jirapa, Peng, Bo, Wu, Meiliu, Huang, Qunying, and Andresen, Christian G.
- Subjects
- DEEP learning; MACHINE learning; SUPERVISED learning; HURRICANE Florence, 2018; COMPUTER vision; CONVOLUTIONAL neural networks; FLOODS; REMOTE sensing
- Abstract
Advances in deep learning and computer vision are making significant contributions to flood mapping, particularly when integrated with remotely sensed data. Although existing supervised methods, especially deep convolutional neural networks, have proved to be effective, they require intensive manual labeling of flooded pixels to train a multi-layer deep neural network that learns abstract semantic features of the input data. This research introduces a novel weakly supervised approach for pixel-wise flood mapping by leveraging multi-temporal remote sensing imagery and image processing techniques (e.g., Normalized Difference Water Index and edge detection) to create weakly labeled data. Using these weakly labeled data, a bi-temporal U-Net model is then proposed and trained for flood detection without the need for time-consuming and labor-intensive human annotations. Using floods from Hurricanes Florence and Harvey as case studies, we evaluated the performance of the proposed bi-temporal U-Net model and baseline models, such as decision tree, random forest, gradient boost, and adaptive boosting classifiers. To assess the effectiveness of our approach, we conducted a comprehensive assessment that (1) covered multiple test sites with varying degrees of urbanization, and (2) utilized both bi-temporal (i.e., pre- and post-flood) and uni-temporal (i.e., only post-flood) input. The experimental results showed that the proposed framework of weakly labeled data generation and the bi-temporal U-Net could produce near real-time urban flood maps with consistently high precision, recall, f1 score, IoU score, and overall accuracy compared with baseline machine learning algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
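The weak-label generation step in the record above relies on water indices computed from multi-temporal imagery. The sketch below builds weak flood labels by thresholding the Normalized Difference Water Index before and after the event; the zero threshold and the simple pre/post differencing are assumptions, and the paper additionally cleans its labels with edge detection.

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized Difference Water Index (McFeeters): (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir + eps)

def weak_flood_labels(pre_green, pre_nir, post_green, post_nir, water_thresh=0.0):
    """
    Build weak pixel labels for flood mapping without manual annotation:
    a pixel is weakly labeled 'flood' if it looks like water after the event but not before.
    """
    water_pre = ndwi(pre_green, pre_nir) > water_thresh
    water_post = ndwi(post_green, post_nir) > water_thresh
    return (water_post & ~water_pre).astype(np.uint8)   # 1 = weak flood label, 0 = background

# Toy usage with random reflectance-like arrays standing in for pre-/post-flood bands.
rng = np.random.default_rng(0)
pre_g, pre_n = rng.random((64, 64)), rng.random((64, 64))
post_g, post_n = rng.random((64, 64)), rng.random((64, 64))
labels = weak_flood_labels(pre_g, pre_n, post_g, post_n)
print(labels.sum(), "weakly labeled flood pixels")
```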
18. Weakly supervised deblurring method for solar speckle images based on structural re-parameterization [基于结构重参数化的太阳斑点图像弱监督去模糊方法].
- Author
- 邓林浩, 蒋慕蓉, 杨磊, 谌俊毅, and 金亚辉
- Subjects
- MACHINE learning; SPECKLE interferometry; FEATURE extraction; DEEP learning; SUPERVISED learning; OBSERVATORIES; SPECKLE interference
- Abstract
Supervised deep learning algorithms are prone to generating artifacts when restoring the blurred solar speckle images taken by Yunnan Observatories, and they suffer from long training times and over-reliance on reference images. This paper therefore proposes a weakly supervised method based on structural re-parameterization combined with multi-branch modules to reconstruct solar speckle images. First, the deblurring model is designed by combining single-scale and multi-scale networks, with multi-branch modules constructed to extract features at different scales, enhance detailed information, and reduce artifact generation. Second, each branch structure is re-parameterized so that the reuse of structural parameters runs through the entire feature extraction process. The deblurring model is then embedded in weakly supervised training: blurred images are first sorted into levels, a degradation model is used to learn the different levels of degradation and to build paired datasets for each level, and the deblurring model is trained to invert the degradation and reconstruct the solar speckle images. Experimental results show that, compared with existing deblurring methods, the proposed method offers higher training efficiency and less dependence on reference images, and can meet the high-resolution reconstruction requirements of solar speckle images. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
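"Structural re-parameterization" in the record above refers to training with parallel branches and folding them into a single convolution for inference. The sketch below shows the standard RepVGG-style folding of a 3x3 branch, a 1x1 branch, and an identity branch into one 3x3 kernel; batch normalization is omitted to keep the merge exact, and the paper's actual branch design is not reproduced here.

```python
import torch
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    """Training-time block with parallel 3x3, 1x1, and identity branches (no BN, to keep the merge exact)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=True)

    def forward(self, x):
        return self.conv3(x) + self.conv1(x) + x   # 3x3 + 1x1 + identity

    def reparameterize(self) -> nn.Conv2d:
        """Fold all branches into a single 3x3 conv for inference (structural re-parameterization)."""
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels, 3, padding=1, bias=True)
        w = self.conv3.weight.data.clone()
        b = self.conv3.bias.data + self.conv1.bias.data
        # Place the 1x1 kernel at the center of the 3x3 kernel.
        w[:, :, 1:2, 1:2] += self.conv1.weight.data
        # Identity branch = centered 1x1 kernel equal to the identity matrix over channels.
        eye = torch.eye(self.conv3.in_channels).view_as(self.conv1.weight.data)
        w[:, :, 1:2, 1:2] += eye
        fused.weight.data.copy_(w)
        fused.bias.data.copy_(b)
        return fused

# Sanity check: the fused conv reproduces the multi-branch output.
block = MultiBranchBlock(8).eval()
x = torch.randn(1, 8, 16, 16)
with torch.no_grad():
    assert torch.allclose(block(x), block.reparameterize()(x), atol=1e-5)
print("multi-branch block and fused 3x3 conv match")
```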
19. Segment-level event perception with semantic dictionary for weakly supervised audio-visual video parsing.
- Author
- Xie, Zhuyang, Yang, Yan, Yu, Yankai, Wang, Jie, Liu, Yan, and Jiang, Yongquan
- Subjects
- ENCYCLOPEDIAS & dictionaries; TIMESTAMPS; VIDEOS; ANNOTATIONS; FORECASTING
- Abstract
Videos capture auditory and visual signals, each conveying distinct events. Simultaneously analyzing these multimodal signals enhances human comprehension of the video content. We focus on the audio-visual video parsing task, in which we integrate auditory and visual cues to identify events in each modality and pinpoint their temporal boundaries. Since fine-grained segment-level annotation is labor-intensive and time-consuming, only video-level labels are provided during the training phase. Labels and timestamps for each modality are unknown. A prevalent strategy is to aggregate audio and visual features through cross-modal attention and further denoise video labels to parse events within video segments in a weakly supervised manner. However, these denoised labels have limitations: they are restricted to the video level, and segment-level annotations remain unknown. In this paper, we propose a semantic dictionary description method for audio-visual video parsing, termed SDDP (Semantic Dictionary Description for video Parsing), which uses a semantic dictionary to explicitly represent the content of video segments. In particular, we query the relevance of each segment with semantic words from the dictionary and determine the pertinent semantic words to redescribe each segment. These redescribed segments encode event-related information, facilitating cross-modal video parsing. Furthermore, a pseudo label generation strategy is introduced to convert the relevance of semantic dictionary queries into segment-level pseudo labels, which provide segment-level event information to supervise event prediction. Extensive experiments demonstrate the effectiveness of the proposed method, achieving superior performance compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
20. Epipolar constraint-guided differentiable keypoint detection and description.
- Author
- Li, Xi, Feng, Yulong, Yu, Xianguo, Cong, Yirui, and Chen, Lili
- Abstract
Sparse local feature matching techniques have made significant strides in a variety of visual geometry tasks. Among them, the weakly supervised methods have drawn particular attention recently and outperform the fully supervised counterparts via decoupled describe-then-detect training. However, they often rely on policy gradients for detector training, overlooking keypoint reliability. Meanwhile, many of the sparse local feature matching methods put more emphasis on accuracy over speed, making them unfriendly for real-time applications. To address these issues, we introduce the differentiable keypoint extraction and the dispersity peak loss to generate clean score maps and enhance the reliability of the keypoints. The proposed model is trained under the weakly supervised fashion by leveraging the epipolar constraint between images. Additionally, we propose an efficient model that achieves a good balance between accuracy and speed. Experiments on various public benchmarks show our method achieving higher performance than existing ones. Code will be available at https://github.com/FYL0123/WSDK. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
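The weak supervision in the record above comes from the epipolar constraint between two views rather than from keypoint annotations. The sketch below computes the Sampson epipolar error for putative matches given a fundamental matrix, which is the kind of residual such losses are built on; the paper's full detector and descriptor training loop is not reproduced, and the random F in the usage example is only there to exercise the shapes.

```python
import torch

def sampson_epipolar_error(pts1, pts2, F, eps=1e-8):
    """
    pts1, pts2: (N, 2) matched keypoint coordinates in two images (pixels).
    F:          (3, 3) fundamental matrix relating the two views.
    Returns the per-match Sampson error; with known camera geometry this is a supervision
    signal that needs no keypoint annotations at all.
    """
    ones = torch.ones(pts1.shape[0], 1, dtype=pts1.dtype)
    x1 = torch.cat([pts1, ones], dim=1)           # homogeneous coordinates, (N, 3)
    x2 = torch.cat([pts2, ones], dim=1)
    Fx1 = x1 @ F.T                                 # epipolar lines in image 2, (N, 3)
    Ftx2 = x2 @ F                                  # epipolar lines in image 1, (N, 3)
    algebraic = (x2 * Fx1).sum(dim=1)              # x2^T F x1, zero for a perfect match
    denom = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return algebraic ** 2 / (denom + eps)

# Toy usage: random matches and a random F, just to show the call pattern.
pts1 = torch.rand(5, 2) * 100
pts2 = torch.rand(5, 2) * 100
F = torch.randn(3, 3)
loss = sampson_epipolar_error(pts1, pts2, F).mean()
print(loss.item())
```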
21. Mask-free Iterative Refinement Network for weakly-supervised Few-shot Semantic Segmentation.
- Author
- Chen, Shanjuan, Yu, Yunlong, Li, Yingming, Lu, Ziqian, and Zhou, Yulin
- Subjects
- IMAGE segmentation; ANNOTATIONS; SUPERVISION
- Abstract
The dependence on densely labeled samples in Few-shot Semantic Segmentation (FSS) poses challenges in terms of sample annotation. This paper introduces a more challenging few-shot semantic segmentation task that only utilizes intra-class images as weak supervision. To address this task, we propose a Mask-free Iterative Refinement Network consisting of a mask generation module (MGM) and an iterative refinement module (IRM), respectively addressing the inherent two challenges of locating segmented objects and deriving class-specific features in the absence of support mask and semantic labels. MGM generates pseudo-masks for the support image without requiring any training, which provides a rough estimation of the object locations, serving as an initial guidance for segmentation. IRM is designed to capture the co-occurrence class-specific information between the support and query images for segmentation and progressively refine the predicted mask in a bootstrap manner with the guidance of the class-specific information. Experimental results on three FSS benchmarks, i.e., FSS-1000, PASCAL-5i, and COCO-20i demonstrate that our proposed method achieves comparable or decent performance compared to existing zero-shot and supervised segmentation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
22. Scarcity of Labels in Non-Stationary Data Streams: A Survey.
- Author
- Fahy, Conor, Yang, Shengxiang, and Gongora, Mario
- Subjects
- SCARCITY; CONCEPT learning; ACTIVE learning
- Abstract
In a dynamic stream there is an assumption that the underlying process generating the stream is non-stationary and that concepts within the stream will drift and change as the stream progresses. Concepts learned by a classification model are prone to change, and non-adaptive models are likely to deteriorate and become ineffective over time. The challenge of recognising and reacting to change in a stream is compounded by the scarcity of labels problem. This refers to the very realistic situation in which the true class label of an incoming point is not immediately available (or might never be available), or in which manually annotating data points is prohibitively expensive. In a high-velocity stream, it is perhaps impossible to manually label every incoming point and pursue a fully supervised approach. In this article, we formally describe the types of change which can occur in a data stream and then catalogue the methods for dealing with change when there is limited access to labels. We present an overview of the most influential ideas in the field along with recent advancements, and we highlight trends, research gaps, and future research directions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. Annotation-Free Deep Learning-Based Prediction of Thyroid Molecular Cancer Biomarker BRAF (V600E) from Cytological Slides.
- Author
- Wang, Ching-Wei, Muzakky, Hikam, Lee, Yu-Ching, Lin, Yi-Jia, and Chao, Tai-Kuang
- Subjects
- DEEP learning; THYROID cancer; BRAF genes; NEEDLE biopsy; BIOMARKERS; CYTOLOGY
- Abstract
Thyroid cancer is the most common endocrine cancer. Papillary thyroid cancer (PTC) is the most prevalent form of malignancy among all thyroid cancers arising from follicular cells. Fine needle aspiration cytology (FNAC) is a non-invasive method regarded as the most cost-effective and accurate diagnostic method of choice in diagnosing PTC. Identification of the BRAF (V600E) mutation in thyroid neoplasia may be beneficial because it is specific for malignancy, implies a worse prognosis, and is the target for selective BRAF inhibitors. To the authors' best knowledge, this is the first automated precision oncology framework to effectively predict the BRAF (V600E) immunostaining result in thyroidectomy specimens directly from Papanicolaou-stained thyroid fine-needle aspiration cytology and ThinPrep cytological slides, which is helpful for novel targeted therapies and prognosis prediction. The proposed deep learning (DL) framework is evaluated on a dataset of 118 whole slide images. The results show that the proposed DL-based technique achieves an accuracy of 87%, a precision of 94%, a sensitivity of 91%, a specificity of 71%, and a mean of sensitivity and specificity of 81%, and outperformed three state-of-the-art deep learning approaches. This study demonstrates the feasibility of DL-based prediction of critical molecular features in cytological slides, which not only aids in accurate diagnosis but also provides useful information in guiding clinical decision-making in patients with thyroid cancer. With the accumulation of data and the continuous advancement of technology, the performance of DL systems is expected to improve in the near future. Therefore, we expect that DL can provide a cost-effective and time-effective alternative tool for patients in the era of precision oncology. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. Weakly Supervised Salient Object Detection by Hierarchically Enhanced Scribbles.
- Author
- Wang, Xiongying, Al-Huda, Zaid, Peng, Bo, and Tang, Xin
- Subjects
- OBJECT recognition (Computer vision); DEEP learning; RANDOM fields; CONTOURS (Cartography)
- Abstract
The performance of salient object detection (SOD) has been significantly advanced by using deep convolutional networks. However, it largely depends on the high cost of pixel-level annotations. To reduce human effort while improving the prediction accuracy, we propose a novel two-phase learning framework. The weakly supervised information in terms of scribbles is provided as initial labels. Then, as the first phase, high-quality pseudo-labels are generated by mapping scribbles onto object/object-part contours. These contour maps are predicted by the hierarchical contour detection algorithm, providing superior accuracy and smoothness. In the second phase, a deep neural network is alternately trained and predicted. The pseudo-labels are refined in an iterated process, where a conditional random field (CRF) model and a filter module are designed to promote the performance. Extensive experiments on five benchmarks show that our framework can achieve comparable results with the state-of-the-art fully and weakly supervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. CamNuvem: A Robbery Dataset for Video Anomaly Detection.
- Author
- de Paula, Davi D., Salvadeo, Denis H. P., and de Araujo, Darlan M. N.
- Subjects
- VIDEO surveillance; ANOMALY detection (Computer security); ROBBERY; CRIMINAL investigation; CAMCORDERS; HUMAN behavior
- Abstract
(1) Background: The research area of video surveillance anomaly detection aims to automatically detect the moment when a video surveillance camera captures something that does not fit the normal pattern. This is a difficult task, but it is important to automate, improve, and lower the cost of the detection of crimes and other accidents. The UCF–Crime dataset is currently the most realistic crime dataset, and it contains hundreds of videos distributed in several categories; it includes a robbery category, which contains videos of people stealing material goods using violence, but this category only includes a few videos. (2) Methods: This work focuses only on the robbery category, presenting a new weakly labelled dataset that contains 486 new real–world robbery surveillance videos acquired from public sources. (3) Results: We have modified and applied three state–of–the–art video surveillance anomaly detection methods to create a benchmark for future studies. We showed that in the best scenario, taking into account only the anomaly videos in our dataset, the best method achieved an AUC of 66.35%. When all anomaly and normal videos were taken into account, the best method achieved an AUC of 88.75%. (4) Conclusion: This result shows that there is a huge research opportunity to create new methods and approaches that can improve robbery detection in video surveillance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Classification of Alzheimer's Disease Based on Weakly Supervised Learning and Attention Mechanism.
- Author
- Wu, Xiaosheng, Gao, Shuangshuang, Sun, Junding, Zhang, Yudong, and Wang, Shuihua
- Subjects
- ALZHEIMER'S disease; MAGNETIC resonance imaging; DATA augmentation; IMAGE recognition (Computer vision); BRAIN damage
- Abstract
The brain lesion images of Alzheimer's disease (AD) patients differ only slightly from the magnetic resonance imaging of healthy subjects, so the classification performance of general image recognition techniques is not ideal. Alzheimer's datasets are also small, making it difficult to train large-scale neural networks. In this paper, we propose a network model (WS-AMN) that fuses weak supervision and an attention mechanism. A weakly supervised data augmentation network is used as the basic model, the attention map generated by weakly supervised learning guides the data augmentation, and an attention module with channel and spatial domains is embedded in the residual network to focus on the distinctive channels and spatial regions of images. The location information enhances the corresponding related features and suppresses the influence of irrelevant features. The results show an F1-score of 99.63% and an accuracy of 99.61%. Our model provides a high-performance solution for accurate classification of AD. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
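The "attention module with channel domain and spatial domain" described in the record above matches the widely used CBAM pattern. The sketch below is a generic channel-then-spatial attention block under that assumption; the reduction ratio, kernel size, and how WS-AMN actually embeds the module in its residual network are not taken from the paper.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention: channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 8, spatial_kernel: int = 7):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        spatial = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(spatial))

# Toy usage on a feature map the size a ResNet stage might produce.
feat = torch.randn(2, 64, 28, 28)
print(ChannelSpatialAttention(64)(feat).shape)   # torch.Size([2, 64, 28, 28])
```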
27. Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation.
- Author
- Zhang, Bingfeng, Xiao, Jimin, Jiao, Jianbo, Wei, Yunchao, and Zhao, Yao
- Subjects
- PIXELS; CONVOLUTIONAL neural networks; SOURCE code
- Abstract
Weakly supervised semantic segmentation is receiving great attention due to its low human annotation cost. In this paper, we aim to tackle bounding box supervised semantic segmentation, i.e., training accurate semantic segmentation models using bounding box annotations as supervision. To this end, we propose the affinity attention graph neural network (A²GNN). Following previous practices, we first generate pseudo semantic-aware seeds, which are then formed into semantic graphs based on our newly proposed affinity Convolutional Neural Network (CNN). Then the built graphs are input to our A²GNN, in which an affinity attention layer is designed to acquire the short- and long-distance information from soft graph edges to accurately propagate semantic labels from the confident seeds to the unlabeled pixels. However, to guarantee the precision of the seeds, we only adopt a limited number of confident pixel seed labels for A²GNN, which may lead to insufficient supervision for training. To alleviate this issue, we further introduce a new loss function and a consistency-checking mechanism to leverage the bounding box constraint, so that more reliable guidance can be included for the model optimization. Experiments show that our approach achieves new state-of-the-art performances on the Pascal VOC 2012 dataset (val: 76.5 percent, test: 75.2 percent). More importantly, our approach can be readily applied to the bounding box supervised instance segmentation task or other weakly supervised semantic segmentation tasks, with state-of-the-art or comparable performance among almost all weakly supervised tasks on the PASCAL VOC or COCO dataset. Our source code will be available at https://github.com/zbf1991/A2GNN. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. Decoupling foreground and background with Siamese ViT networks for weakly-supervised semantic segmentation.
- Author
- Lin, Meiling, Li, Gongyan, Xu, Shaoyun, Hao, Yuexing, and Zhang, Shu
- Subjects
- CONFIDENCE regions (Mathematics); DATA mining; ALGORITHMS; HEURISTIC
- Abstract
Due to the coarse granularity of information extraction in image-level annotation-based weakly supervised semantic segmentation algorithms, there exists a significant gap between the generated pseudo-labels and the real pixel-level labels. In this paper, we propose the DeFB-SV framework, which consists of a dual-branch Siamese network structure. This framework separates the foreground and background of images by generating unified resolution and mixed resolution class activation maps, which are then fused to obtain pseudo-labels. The mixed-resolution class activation maps are produced by a new mixed-resolution patch partition method, where we introduce a semantically heuristic patch scorer to divide the image into patches of different sizes based on semantics. Additionally, a novel multi-confidence region division mechanism is proposed to enable the adaptive extraction of the effective parts of pseudo-labels, further enhancing the accuracy of weakly supervised semantic segmentation algorithms. The proposed semantic segmentation framework, DeFB-SV, is evaluated on the PASCAL VOC 2012 and MS COCO 2014 datasets, demonstrating comparable segmentation performance with state-of-the-art methods. • A novel weakly supervised semantic segmentation framework named DeFB-SV. • A Siamese network consisting of two ViT branches yielding fine-grained pseudo-labels. • A semantically heuristic patch scorer generating mixed-resolution image patches. • A multi-confidence-region strategy achieving finer segmentation results adaptively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Weakly supervised 3D point cloud semantic segmentation for architectural heritage using teacher-guided consistency and contrast learning.
- Author
- Huang, Shuowen, Hu, Qingwu, Ai, Mingyao, Zhao, Pengcheng, Li, Jian, Cui, Hao, and Wang, Shaohua
- Subjects
- POINT cloud; LEARNING strategies; POINT set theory; ARCHES; SUPERVISION
- Abstract
Point cloud semantic segmentation is significant for managing and protecting architectural heritage. Currently, fully supervised methods require a large amount of annotated data, while weakly supervised methods are difficult to transfer directly to architectural heritage. This paper proposes an end-to-end teacher-guided consistency and contrastive learning weakly supervised (TCCWS) framework for architectural heritage point cloud semantic segmentation, which can fully utilize limited labeled points to train network. Specifically, a teacher-student framework is designed to generate pseudo labels and a pseudo label dividing module is proposed to distinguish reliable and ambiguous point sets. Based on it, a consistency and contrastive learning strategy is designed to fully utilize supervision signals to learn the features of point clouds. The framework is tested on the ArCH dataset and self-collected point cloud, which demonstrates that the proposed method can achieve effective semantic segmentation of architectural heritage using only 0.1 % of annotated points. • Develop an end-to-end weakly supervised framework for architectural heritage point cloud semantic segmentation. • Propose a pseudo label dividing module to distinguish reliable and ambiguous sets in pseudo labels. • Design a consistency and contrastive learning strategy to facilitate the training of teacher-student framework. • Experiment results on the real architectural heritage data showed the proposed method achieved the outstanding performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
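The teacher-guided consistency part of the record above can be sketched with the usual exponential-moving-average teacher plus a confidence-filtered pseudo-label loss on the unlabeled points. The code below shows only that part; the paper's pseudo-label dividing module and its contrastive branch are omitted, and the momentum, confidence threshold, and per-point logit shapes are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module, momentum: float = 0.99):
    """Exponential-moving-average teacher update, the usual way a teacher-student pair is kept in sync."""
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def weak_supervision_loss(student_logits, teacher_logits, labels, conf_thresh=0.8, ignore_index=-1):
    """
    student_logits, teacher_logits: (N, C) per-point class logits from two augmented views of the same cloud.
    labels: (N,) sparse ground-truth classes; ignore_index marks the ~99.9% unlabeled points.
    Combines supervised CE on labeled points with a confidence-filtered consistency term on the rest.
    """
    sup = F.cross_entropy(student_logits, labels, ignore_index=ignore_index)
    with torch.no_grad():
        probs = teacher_logits.softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
        reliable = (conf > conf_thresh) & (labels == ignore_index)   # teacher-confident, unlabeled points
    if reliable.any():
        cons = F.cross_entropy(student_logits[reliable], pseudo[reliable])
    else:
        cons = student_logits.sum() * 0.0
    return sup + cons

# Toy usage.
N, C = 1024, 10
student_logits, teacher_logits = torch.randn(N, C, requires_grad=True), torch.randn(N, C)
labels = torch.full((N,), -1, dtype=torch.long)
labels[:8] = torch.randint(0, C, (8,))            # only ~0.8% of points carry labels here
print(weak_supervision_loss(student_logits, teacher_logits, labels).item())
```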
30. Scribble-based boundary-aware network for weakly supervised salient object detection in remote sensing images.
- Author
- Huang, Zhou, Xiang, Tian-Zhu, Chen, Huai-Xin, and Dai, Hang
- Subjects
- REMOTE sensing; OBJECT recognition (Computer vision); COMMUNITIES
- Abstract
Existing CNN-based salient object detection (SOD) models rely heavily on large-scale pixel-level annotations, which are labor-intensive, time-consuming, and expensive. In contrast, sparse annotations (e.g. , image-level or scribble) are gradually entering the SOD community. However, few efforts have been devoted to studying SOD from sparse annotations, especially in remote sensing. Moreover, sparse annotations usually contain a small amount of information, making it challenging to train a well-performing model, thereby causing its performance to lag largely behind fully-supervised models. Although some SOD methods adopt prior cues (e.g. , edges) to improve performance, they usually lack targeted discrimination of object boundaries and thus provide saliency maps with poor boundary localization. To this end, in this paper, we propose a novel weakly-supervised SOD framework to predict the saliency of remote sensing images from sparse scribble annotations. To achieve it, we first construct the scribble-based remote sensing saliency dataset by relabeling an existing large-scale SOD dataset with scribbles, namely S-EOR dataset. After that, we present a novel scribble-based boundary-aware network (SBA-Net) for remote sensing saliency detection. Specifically, we design a boundary-aware module (BAM) to explore object boundary semantics, which is explicitly supervised by high-confidence object boundary (pseudo) labels generated by the boundary label generation (BLG) module, forcing the model to learn features that highlight object structures and thus boosting object boundary localization. Extensive quantitative and qualitative comparisons of two public remote sensing SOD datasets show that the proposed method outperforms current weakly supervised and unsupervised SOD methods and is highly competitive with existing fully supervised methods. Numerous ablation experiments demonstrate the effectiveness and generalization of the proposed model. The dataset and code will be publicly available at: https://github.com/ZhouHuang23/SBA-Net. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Weakly Supervised Violence Detection in Surveillance Video.
- Author
- Choqueluque-Roman, David and Camara-Chavez, Guillermo
- Subjects
- VIDEO surveillance; ARTIFICIAL neural networks; ARCHITECTURAL style; SURVEILLANCE detection; PERSONAL security; VIOLENCE; TRACKING algorithms
- Abstract
Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras used in public and private areas is challenging for human operators. The manual nature of this task significantly increases the possibility of ignoring important events due to human limitations when paying attention to multiple targets at a time. Researchers have proposed several methods to detect violent events automatically to overcome this problem. So far, most previous studies have focused only on classifying short clips without performing spatial localization. In this work, we tackle this problem by proposing a weakly supervised method to detect violent actions spatially and temporally in surveillance videos using only video-level labels. The proposed method follows a Fast-RCNN style architecture that has been temporally extended. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then, given an input video and the action proposals, we extract spatiotemporal features using deep neural networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube as violent or non-violent. We obtain results similar to the state of the art on three public databases, Hockey Fight, RLVSD, and RWF-2000, achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
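The final step in the record above trains a multiple-instance learning classifier from video-level labels only. A common way to do that is the MIL ranking objective of Sultani et al., sketched below for violent versus normal bags of action-tube scores; the margin and regularizer weights are conventional values, not the ones used in this paper.

```python
import torch
import torch.nn.functional as F

def mil_ranking_loss(scores_violent, scores_normal, margin=1.0, lambda_smooth=8e-5, lambda_sparse=8e-5):
    """
    scores_violent: (T,) predicted violence scores for the action tubes / segments of a VIOLENT video.
    scores_normal:  (T,) scores for the tubes of a NORMAL video.
    Multiple-instance ranking objective: the most violent-looking tube in a violent video should
    outscore every tube in a normal video, with smoothness and sparsity regularizers.
    """
    rank = F.relu(margin - scores_violent.max() + scores_normal.max())
    smooth = ((scores_violent[1:] - scores_violent[:-1]) ** 2).sum()
    sparse = scores_violent.sum()
    return rank + lambda_smooth * smooth + lambda_sparse * sparse

# Toy usage: 32 tubes per video, scores in [0, 1] from some classifier head.
violent = torch.rand(32, requires_grad=True)
normal = torch.rand(32)
loss = mil_ranking_loss(torch.sigmoid(violent), torch.sigmoid(normal))
loss.backward()
print(loss.item())
```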
32. Weakly Supervised Video Object Segmentation via Dual-attention Cross-branch Fusion.
- Author
- Wei, Lili, Lang, Congyan, Liang, Liqian, Feng, Songhe, Wang, Tao, and Chen, Shidi
- Subjects
- OPTICAL flow; IMAGE segmentation; VIDEOS
- Abstract
Recently, concerning the challenge of collecting large-scale explicitly annotated videos, weakly supervised video object segmentation (WSVOS) using video tags has attracted much attention. Existing WSVOS approaches follow a general pipeline including two phases, i.e., a pseudo masks generation phase and a refinement phase. To explore the intrinsic property and correlation buried in the video frames, most of them focus on the latter phase by introducing optical flow as temporal information to provide more supervision. However, these optical flow-based studies are greatly affected by illumination and distortion and lack consideration of the discriminative capacity of multi-level deep features. In this article, with the goal of capturing more effective temporal information and investigating a temporal information fusion strategy accordingly, we propose a unified WSVOS model by adopting a two-branch architecture with a multi-level cross-branch fusion strategy, named the dual-attention cross-branch fusion network (DACF-Net). Concretely, the two branches of DACF-Net, i.e., a temporal prediction subnetwork (TPN) and a spatial segmentation subnetwork (SSN), are used for extracting temporal information and generating predicted segmentation masks, respectively. To perform the cross-branch fusion between TPN and SSN, we propose a dual-attention fusion module that can be plugged into the SSN flexibly. We also propose a cross-frame coherence loss (CFCL) to achieve smooth segmentation results by exploiting the coherence of masks produced by TPN and SSN. Extensive experiments demonstrate the effectiveness of the proposed approach compared with state-of-the-art methods on two challenging datasets, i.e., Davis-2016 and YouTube-Objects. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
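One plausible reading of the cross-frame coherence loss (CFCL) mentioned above is a simple per-pixel agreement term between the masks produced by the temporal branch (TPN) and the spatial branch (SSN) over a clip. The sketch below shows that minimal interpretation; the paper's exact formulation may differ, and the function name is an assumption.

```python
import torch

def coherence_loss(ssn_masks, tpn_masks):
    """Penalize disagreement between the spatial-branch masks (SSN) and the
    temporal-branch masks (TPN) over a clip, encouraging coherent, temporally
    smooth segmentations.

    ssn_masks, tpn_masks: (T, 1, H, W) probabilities in [0, 1].
    """
    return torch.mean(torch.abs(ssn_masks - tpn_masks))

# toy usage on an 8-frame clip
t, h, w = 8, 32, 32
print(coherence_loss(torch.rand(t, 1, h, w), torch.rand(t, 1, h, w)).item())
```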
33. TiWS-iForest: Isolation forest in weakly supervised and tiny ML scenarios.
- Author
-
Barbariol, Tommaso and Susto, Gian Antonio
- Subjects
- *
DECISION support systems , *ANOMALY detection (Computer security) , *INFORMATION resources , *PSYCHOLOGICAL feedback - Abstract
Unsupervised anomaly detection tackles the problem of finding anomalies inside datasets without label availability; since data tagging is typically hard or expensive to obtain, such approaches have seen huge applicability in recent years. In this context, Isolation Forest is a popular algorithm able to define an anomaly score by means of an ensemble of peculiar trees called isolation trees. These are built using a random partitioning procedure that is extremely fast and cheap to train. However, we find that the standard algorithm might be improved in terms of memory requirements, latency and performance; this is of particular importance in low-resource scenarios and in TinyML implementations on ultra-constrained microprocessors. Moreover, anomaly detection approaches currently do not take advantage of weak supervision: being typically consumed in Decision Support Systems, feedback from the users, even if rare, can be a valuable source of information that is currently unexplored. Besides showing iForest training limitations, we propose here TiWS-iForest, an approach that, by leveraging weak supervision, is able to reduce Isolation Forest complexity and to enhance detection performance. We show the effectiveness of TiWS-iForest on real-world datasets, and we share the code in a public repository to enhance reproducibility. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
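The core idea of TiWS-iForest, as summarized above, is to use a small amount of weak (user-feedback) supervision to keep only the most useful isolation trees, reducing memory and latency. The sketch below is a simplified illustration built on scikit-learn's IsolationForest: it ranks trees by how well each one alone separates a tiny feedback set and keeps the top ones. The per-tree depth-based scoring, the AUC-based ranking, and the cut-off of 10 trees are assumptions, not the paper's exact selection rule.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# unlabeled training data plus a tiny weakly labeled feedback set
X_train = rng.normal(size=(2000, 5))
X_feed = np.vstack([rng.normal(size=(30, 5)),            # normal feedback
                    rng.normal(loc=4.0, size=(10, 5))])  # anomalies confirmed by users
y_feed = np.array([0] * 30 + [1] * 10)

forest = IsolationForest(n_estimators=100, random_state=0).fit(X_train)

def tree_scores(est, features, X):
    """Anomaly score of one isolation tree: shorter root-to-leaf path = more anomalous."""
    depth = np.asarray(est.decision_path(X[:, features]).sum(axis=1)).ravel()
    return -depth

# rank trees by how well each one alone separates the weak feedback labels
aucs = [roc_auc_score(y_feed, tree_scores(est, feats, X_feed))
        for est, feats in zip(forest.estimators_, forest.estimators_features_)]
keep = np.argsort(aucs)[::-1][:10]          # keep the 10 most useful trees

def reduced_forest_score(X):
    """Anomaly score from the reduced forest (higher = more anomalous)."""
    return np.mean([tree_scores(forest.estimators_[i],
                                forest.estimators_features_[i], X)
                    for i in keep], axis=0)

print(roc_auc_score(y_feed, reduced_forest_score(X_feed)))
```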
35. Attention-Based Dropout Layer for Weakly Supervised Single Object Localization and Semantic Segmentation.
- Author
-
Choe, Junsuk, Lee, Seungho, and Shim, Hyunjung
- Subjects
- *
WEAK localization (Quantum mechanics) , *IMAGE segmentation , *FEATURE extraction , *TASK analysis - Abstract
Both weakly supervised single object localization and semantic segmentation techniques learn an object's location using only image-level labels. However, these techniques are limited to cover only the most discriminative part of the object and not the entire object. To address this problem, we propose an attention-based dropout layer, which utilizes the attention mechanism to locate the entire object efficiently. To achieve this, we devise two key components: 1) hiding the most discriminative part from the model to capture the entire object, and 2) highlighting the informative region to improve the classification power of the model. These allow the classifier to be maintained with a reasonable accuracy while the entire object is covered. Through extensive experiments, we demonstrate that the proposed method effectively improves the weakly supervised single object localization accuracy, thereby achieving a new state-of-the-art localization accuracy on CUB-200-2011 and an accuracy comparable to existing state-of-the-art methods on ImageNet-1k. The proposed method is also effective in improving the weakly supervised semantic segmentation performance on Pascal VOC and MS COCO. Furthermore, the proposed method is more efficient than existing techniques in terms of parameter and computation overheads. Additionally, the proposed method can be easily applied to various backbone networks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
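The attention-based dropout layer described above either hides the most discriminative region or re-weights features by an importance map during training. The sketch below is a minimal PyTorch rendering of that idea; the threshold, drop rate, and per-forward random choice are assumptions rather than the authors' exact settings.

```python
import torch
import torch.nn as nn

class AttentionBasedDropout(nn.Module):
    """Sketch of an attention-based dropout layer: during training it either
    hides the most discriminative spatial region (drop mask) or re-weights the
    features by an importance map, chosen at random per forward pass."""

    def __init__(self, drop_rate=0.75, drop_threshold=0.8):
        super().__init__()
        self.drop_rate = drop_rate
        self.drop_threshold = drop_threshold

    def forward(self, x):                                      # x: (B, C, H, W)
        if not self.training:
            return x
        attention = x.mean(dim=1, keepdim=True)                # (B, 1, H, W)
        max_val = attention.amax(dim=(2, 3), keepdim=True)
        drop_mask = (attention < self.drop_threshold * max_val).float()
        importance = torch.sigmoid(attention)
        # pick the drop mask with probability drop_rate, else the importance map
        use_drop = torch.rand(()) < self.drop_rate
        return x * (drop_mask if use_drop else importance)

# toy usage
layer = AttentionBasedDropout().train()
print(layer(torch.randn(2, 64, 14, 14)).shape)
```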
36. Tobler's First Law in GeoAI: A Spatially Explicit Deep Learning Model for Terrain Feature Detection under Weak Supervision.
- Author
-
Li, Wenwen, Hsu, Chia-Yu, and Hu, Maosheng
- Subjects
- *
GEOSPATIAL data , *TERRAIN mapping , *DEEP learning , *ARTIFICIAL intelligence , *REMOTE sensing - Abstract
Recent interest in geospatial artificial intelligence (GeoAI) has fostered a wide range of applications using artificial intelligence (AI), especially deep learning for geospatial problem solving. Major challenges, however, such as a lack of training data and ignorance of spatial principles and spatial effects in AI model design remain, significantly hindering the in-depth integration of AI with geospatial research. This article reports our work in developing a cutting-edge deep learning model that enables object detection, especially of natural features, in a weakly supervised manner. Our work has made three innovative contributions: First, we present a novel method of object detection using only weak labels. This is achieved by developing a spatially explicit model according to Tobler's first law of geography to enable weakly supervised object detection. Second, we integrate the idea of an attention map into the deep learning–based object detection pipeline and develop a multistage training strategy to further boost detection performance. Third, we have successfully applied this model for the automated detection of Mars impact craters, the inspection of which often involved tremendous manual work prior to our solution. Our model is generalizable for detecting both natural and man-made features on the surface of the Earth and other planets. This research has made a major contribution to the enrichment of the theoretical and methodological body of knowledge of GeoAI. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Weakly Supervised Neuron Reconstruction From Optical Microscopy Images With Morphological Priors.
- Author
-
Chen, Xuejin, Zhang, Chi, Zhao, Jie, Xiong, Zhiwei, Zha, Zheng-Jun, and Wu, Feng
- Subjects
- *
GENERATIVE adversarial networks , *MICROSCOPY , *DEEP learning , *OPTICAL images , *NEURONS , *NEURAL circuitry - Abstract
Manually labeling neurons from high-resolution but noisy and low-contrast optical microscopy (OM) images is tedious. As a result, the lack of annotated data poses a key challenge when applying deep learning techniques for reconstructing neurons from noisy and low-contrast OM images. While traditional tracing methods provide a possible way to efficiently generate labels for supervised network training, the generated pseudo-labels contain many noisy and incorrect labels, which lead to severe performance degradation. On the other hand, the publicly available dataset, BigNeuron, provides a large number of single 3D neurons that are reconstructed using various imaging paradigms and tracing methods. Though the raw OM images are not fully available for these neurons, they convey essential morphological priors for complex 3D neuron structures. In this paper, we propose a new approach to exploit morphological priors from neurons that have been reconstructed for training a deep neural network to extract neuron signals from OM images. We integrate a deep segmentation network in a generative adversarial network (GAN), expecting the segmentation network to be weakly supervised by pseudo-labels at the pixel level while utilizing the supervision of previously reconstructed neurons at the morphology level. In our morphological-prior-guided neuron reconstruction GAN, named MP-NRGAN, the segmentation network extracts neuron signals from raw images, and the discriminator network encourages the extracted neurons to follow the morphology distribution of reconstructed neurons. Comprehensive experiments on the public VISoR-40 dataset and BigNeuron dataset demonstrate that our proposed MP-NRGAN outperforms state-of-the-art approaches with less training effort. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
38. Weakly supervised pneumonia localization in chest X‐rays using generative adversarial networks.
- Author
-
Keshavamurthy, Krishna Nand, Eickhoff, Carsten, and Juluru, Krishna
- Subjects
- *
GENERATIVE adversarial networks , *X-rays , *RECEIVER operating characteristic curves , *PNEUMONIA , *CHEST (Anatomy) , *MACHINE learning - Abstract
Purpose: Automatic localization of pneumonia on chest X-rays (CXRs) is highly desirable both as an interpretive aid to the radiologist and for timely diagnosis of the disease. However, pneumonia's amorphous appearance on CXRs and complexity of normal anatomy in the chest present key challenges that hinder accurate localization. Existing studies in this area are either not optimized to preserve spatial information of abnormality or depend on expensive expert-annotated bounding boxes. We present a novel generative adversarial network (GAN)-based machine learning approach for this problem, which is weakly supervised (does not require any location annotations), was trained to retain spatial information, and can produce pixel-wise abnormality maps highlighting regions of abnormality (as opposed to bounding boxes around abnormality). Methods: Our method is based on the Wasserstein GAN framework and, to the best of our knowledge, the first application of GANs to this problem. Specifically, from an abnormal CXR as input, we generated the corresponding pseudo normal CXR image as output. The pseudo normal CXR is the "hypothetical" normal, if the same abnormal CXR were not to have any abnormalities. We surmise that the difference between the pseudo normal and the abnormal CXR highlights the pixels suspected to have pneumonia and hence is our output abnormality map. We trained our algorithm on an "unpaired" data set of abnormal and normal CXRs and did not require any location annotations such as bounding boxes/segmentations of abnormal regions. Furthermore, we incorporated additional prior knowledge/constraints into the model and showed that they help improve localization performance. We validated the model on a data set consisting of 14,184 CXRs from the Radiological Society of North America pneumonia detection challenge. Results: We evaluated our methods by comparing the generated abnormality maps with radiologist annotated bounding boxes using receiver operating characteristic (ROC) analysis, image similarity metrics such as normalized cross-correlation/mutual information, and abnormality detection rate. We also present visual examples of the abnormality maps, covering various scenarios of abnormality occurrence. Results demonstrate the ability to highlight regions of abnormality with the best method achieving an ROC area under the curve (AUC) of 0.77 and a detection rate of 85%. The GAN tended to perform better as prior knowledge/constraints were incorporated into the model. Conclusions: We presented a novel GAN based approach for localizing pneumonia on CXRs that (1) does not require expensive hand annotated location ground truth; and (2) was trained to produce abnormality maps at the pixel level as opposed to bounding boxes. We demonstrated the efficacy of our methods via quantitative and qualitative results. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
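The abnormality map in the record above is obtained by comparing the input chest X-ray with its generated pseudo-normal counterpart. The sketch below shows only that final step, assuming a trained generator is available; the Wasserstein GAN training itself and the additional prior constraints are omitted, and the helper name is hypothetical.

```python
import torch

def abnormality_map(abnormal_cxr, generator):
    """Derive a pixel-wise abnormality map as the difference between the input
    chest X-ray and its generated pseudo-normal counterpart."""
    with torch.no_grad():
        pseudo_normal = generator(abnormal_cxr)          # "hypothetical" healthy version
    diff = (abnormal_cxr - pseudo_normal).abs()
    # normalize per image to [0, 1] for visualization / thresholding
    flat = diff.flatten(1)
    lo = flat.min(1)[0].view(-1, 1, 1, 1)
    hi = flat.max(1)[0].view(-1, 1, 1, 1)
    return (diff - lo) / (hi - lo + 1e-8)

# toy usage with an identity-like stand-in for the trained generator
x = torch.rand(1, 1, 256, 256)
print(abnormality_map(x, lambda img: img * 0.9).shape)
```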
39. Looking for Abnormalities in Mammograms With Self- and Weakly Supervised Reconstruction.
- Author
-
Tardy, Mickael and Mateus, Diana
- Subjects
- *
MAMMOGRAMS , *BREAST cancer , *HUMAN abnormalities , *EARLY detection of cancer , *IMAGE reconstruction , *BREAST - Abstract
Early breast cancer screening through mammography produces every year millions of images worldwide. Despite the volume of the data generated, these images are not systematically associated with standardized labels. Current protocols encourage giving a malignancy probability to each studied breast but do not require the explicit and burdensome annotation of the affected regions. In this work, we address the problem of abnormality detection in the context of such weakly annotated datasets. We combine domain knowledge about the pathology and clinically available image-wise labels to propose a mixed self- and weakly supervised learning framework for abnormalities reconstruction. We also introduce an auxiliary classification task based on the reconstructed regions to improve explainability. We work with high-resolution imaging that enables our network to capture different findings, including masses, micro-calcifications, distortions, and asymmetries, unlike most state-of-the-art works that mainly focus on masses. We use the popular INBreast dataset as well as our private multi-manufacturer dataset for validation, and we challenge our method in segmentation, detection, and classification versus multiple state-of-the-art methods. Our results include an image-wise AUC of up to 0.86, an overall region detection true positive rate of 0.93, and a pixel-wise F1 score of 64% on malignant masses. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
40. Learning Hierarchical Attention for Weakly-Supervised Chest X-Ray Abnormality Localization and Diagnosis.
- Author
-
Ouyang, Xi, Karanam, Srikrishna, Wu, Ziyan, Chen, Terrence, Huo, Jiayu, Zhou, Xiang Sean, Wang, Qian, and Cheng, Jie-Zhi
- Subjects
- *
X-rays , *DEEP learning , *CLINICAL medicine , *PHYSICIANS , *COMPUTER-assisted image analysis (Medicine) , *ALGORITHMS - Abstract
We consider the problem of abnormality localization for clinical applications. While deep learning has driven much recent progress in medical imaging, many clinical challenges are not fully addressed, limiting its broader usage. While recent methods report high diagnostic accuracies, physicians have concerns trusting these algorithm results for diagnostic decision-making purposes because of a general lack of algorithm decision reasoning and interpretability. One potential way to address this problem is to further train these models to localize abnormalities in addition to just classifying them. However, doing this accurately will require a large amount of disease localization annotations by clinical experts, a task that is prohibitively expensive to accomplish for most applications. In this work, we take a step towards addressing these issues by means of a new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner. Our key algorithmic innovations include the design of explicit ordinal attention constraints, enabling principled model training in a weakly-supervised fashion, while also facilitating the generation of visual-attention-driven model explanations by means of localization cues. On two large-scale chest X-ray datasets (NIH ChestX-ray14 and CheXpert), we demonstrate significant localization performance improvements over the current state of the art while also achieving competitive classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
41. Event-driven weakly supervised video anomaly detection.
- Author
-
Sun, Shengyang and Gong, Xiaojin
- Subjects
- *
WORK design , *VIDEOS - Abstract
Inspired by observations of how human operators monitor videos, this work proposes an event-driven method for weakly supervised video anomaly detection. Complementary to the conventional snippet-level anomaly detection, this work designs an event analysis module to predict event-level anomaly scores as well. It first generates event proposals simply via a temporal sliding window and then constructs a cascaded causal transformer to capture temporal dependencies for potential events of varying durations. Moreover, a dual-memory augmented self-attention scheme is also designed to capture global semantic dependencies for event feature enhancement. The network is learned with a standard multiple instance learning (MIL) loss, together with normal-abnormal contrastive learning losses. During inference, the snippet- and event-level anomaly scores are fused for anomaly detection. Experiments show that the event-level analysis helps to detect anomalous events more continuously and precisely. The performance of the proposed method on three public datasets demonstrates that the proposed approach is competitive with state-of-the-art methods. • This work proposes a WS-VAD method predicting snippet- and event-level anomalies. • This work constructs a cascaded causal transformer to capture temporal dependencies. • This work designs a memory-augmented attention for global semantic dependencies. • Experiments on public datasets show the method achieves competitive performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
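The event analysis module described above builds event proposals with a temporal sliding window and fuses event-level scores with snippet-level scores at inference. The sketch below illustrates both steps in a simplified form; the window sizes, stride, and the blending weight `alpha` are assumptions, not values from the paper.

```python
import numpy as np

def event_proposals(num_snippets, window_sizes=(8, 16, 32), stride=4):
    """Generate event proposals as (start, end) snippet-index pairs using a
    temporal sliding window of several durations."""
    proposals = []
    for w in window_sizes:
        for start in range(0, max(num_snippets - w, 0) + 1, stride):
            proposals.append((start, start + w))
    return proposals

def fuse_scores(snippet_scores, proposals, event_scores, alpha=0.5):
    """Fuse snippet-level scores with event-level scores: each snippet takes the
    maximum score of the events covering it, blended with its own score."""
    fused = np.asarray(snippet_scores, dtype=float)
    event_level = np.zeros_like(fused)
    for (s, e), score in zip(proposals, event_scores):
        event_level[s:e] = np.maximum(event_level[s:e], score)
    return alpha * fused + (1 - alpha) * event_level

# toy usage on a 64-snippet video
snips = np.random.rand(64)
props = event_proposals(len(snips))
print(len(props), fuse_scores(snips, props, np.random.rand(len(props))).shape)
```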
42. Unified weakly and semi-supervised crack segmentation framework using limited coarse labels.
- Author
-
Xiang, Chao, Gan, Vincent J.L., Deng, Lu, Guo, Jingjing, and Xu, Shaopeng
- Subjects
- *
CONVOLUTIONAL neural networks , *GAUSSIAN mixture models , *CRACKING of concrete , *TRANSFORMER models - Abstract
Obtaining extensive, high-quality datasets for crack segmentation with pixel-level labels is expensive and labor-intensive. The Unified Weakly and Semi-supervised Crack Segmentation (UWSCS) framework addresses this challenge by leveraging a limited number of images with coarse labels and a larger set of unlabeled images. Two label correction modules, based on super-pixel segmentation and a shrink module, are incorporated in the model training to improve crack label accuracy and optimize edge refinement. UWSCS employs a dual-encoder fusion network, combining transformers and convolutional neural networks, to enhance crack segmentation in complex backgrounds. An enhanced algorithm using the medial axis transform is proposed for accurately quantifying crack length and width. Extensive experiments were conducted on both synthetic and real crack datasets to validate the superior performance of UWSCS. The results underscore the significant impact of the quality and quantity of labels used in training on model prediction accuracy. Trained on a concrete crack dataset with limited coarse labels, UWSCS achieves an Intersection over Union (IoU) of 77.53%, surpassing the fully supervised model using the same number of coarse labels by 28.64%. It closely approaches the performance of a fully supervised model with the same number of fine labels (IoU of 80.21%). UWSCS outperforms other advanced networks and semi-supervised/weakly supervised algorithms when trained with a limited set of more cost-effective manually labeled coarse labels. Integrated with the crack segmentation network, super-pixel segmentation, and shrink modules during training, UWSCS with limited coarse labels performs similarly to a fully supervised model using fine labels, thereby reducing manual labeling costs by over 90% and enhancing detection efficiency in practical engineering. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
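Crack quantification via the medial axis transform, as mentioned above, can be illustrated with scikit-image: the skeleton gives a length estimate, and twice the distance-to-boundary along the skeleton gives a width estimate. The sketch below shows this basic version, not the paper's enhanced algorithm, and the pixel-size argument is an assumption.

```python
import numpy as np
from skimage.morphology import medial_axis

def crack_length_and_width(mask, pixel_size_mm=1.0):
    """Estimate crack length and width from a binary segmentation mask using
    the medial-axis (skeleton) transform.

    Length ~ number of skeleton pixels (times pixel size).
    Width  ~ twice the distance-to-boundary value along the skeleton.
    """
    skeleton, distance = medial_axis(mask.astype(bool), return_distance=True)
    length = skeleton.sum() * pixel_size_mm
    widths = 2.0 * distance[skeleton] * pixel_size_mm
    mean_width = widths.mean() if widths.size else 0.0
    max_width = widths.max() if widths.size else 0.0
    return length, mean_width, max_width

# toy usage: a 3-pixel-wide horizontal "crack"
mask = np.zeros((64, 64), dtype=np.uint8)
mask[30:33, 5:60] = 1
print(crack_length_and_width(mask, pixel_size_mm=0.2))
```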
43. Weakly supervised semantic segmentation by iteratively refining optimal segmentation with deep cues guidance.
- Author
-
Al-Huda, Zaid, Peng, Bo, Yang, Yan, Algburi, Riyadh Nazar Ali, Ahmad, Muqeet, Khurshid, Faisal, and Moghalles, Khaled
- Subjects
- *
PIXELS , *GAUSSIAN mixture models , *CONVOLUTIONAL neural networks - Abstract
Weakly supervised semantic segmentation under image-level label supervision has undergone impressive improvements over the past years. These approaches can significantly reduce human annotation efforts, although they remain inferior to fully supervised procedures. In this paper, we propose a novel framework that iteratively refines pixel-level annotations and optimizes the segmentation network. We first produce initial deep cues using the combination of activation maps and a saliency map. To produce high-quality pixel-level annotations, a graphical model is constructed over an optimal segmentation of high-quality region hierarchies to propagate information from deep cues to unmarked regions. In the training process, the initial pixel-level annotations are used as supervision to train the segmentation network and to predict segmentation masks. To correct inaccurate labels of segmentation masks, we use these segmentation masks with the graphical model to produce accurate pixel-level annotations and use them as supervision to retrain the segmentation network. Experimental results show that the proposed method can significantly outperform weakly supervised semantic segmentation methods using static labels. The proposed method achieves state-of-the-art performance: a 66.7% mIoU score on the PASCAL VOC 2012 test set and a 27.0% mIoU score on the MS COCO validation set. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
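The initial deep cues in the record above come from combining activation maps with a saliency map before the graphical-model propagation. The sketch below shows one common way to turn those two cues into initial pixel pseudo-labels (confident foreground, confident background, or "ignore"); the thresholds and the exact combination rule are assumptions, not the paper's procedure.

```python
import numpy as np

def initial_deep_cues(cams, saliency, fg_thresh=0.3, bg_thresh=0.05, ignore=255):
    """Combine class activation maps with a saliency map into initial pixel
    pseudo-labels: confident foreground, confident background, or 'ignore'.

    cams:     (C, H, W) class activation maps for the C image-level classes,
              each normalized to [0, 1].
    saliency: (H, W) saliency map in [0, 1].
    Returns an (H, W) label map with values in {0..C-1 (foreground classes),
    C (background), ignore}.
    """
    num_classes, h, w = cams.shape
    labels = np.full((h, w), ignore, dtype=np.uint8)
    best_class = cams.argmax(axis=0)
    best_score = cams.max(axis=0)
    fg = (best_score > fg_thresh) & (saliency > fg_thresh)   # both cues agree on foreground
    bg = saliency < bg_thresh                                # saliency says background
    labels[bg] = num_classes
    labels[fg] = best_class[fg]
    return labels

# toy usage with 3 image-level classes
cams = np.random.rand(3, 32, 32)
sal = np.random.rand(32, 32)
print(np.unique(initial_deep_cues(cams, sal)))
```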
44. Multi-Scale Structure-Aware Network for Weakly Supervised Temporal Action Detection.
- Author
-
Yang, Wenfei, Zhang, Tianzhu, Mao, Zhendong, Zhang, Yongdong, Tian, Qi, and Wu, Feng
- Subjects
- *
VIDEO coding , *IMAGE segmentation , *FEATURE extraction , *SCALABILITY , *NOISE measurement , *OBJECT tracking (Computer vision) - Abstract
Weakly supervised temporal action detection has better scalability and practicability than fully supervised action detection in real-world deployment. However, it is difficult to learn a robust model without temporal action boundary annotations. In this paper, we propose an end-to-end Multi-Scale Structure-Aware Network (MSA-Net) for weakly supervised temporal action detection by exploring both the global structure information of a video and the local structure information of actions. The proposed MSA-Net enjoys several merits. First, to localize actions with different durations, each video is encoded into feature representations with different temporal scales. Second, based on the multi-scale feature representation, we design two effective structure modeling mechanisms, global structure modeling and local structure modeling, which can effectively learn discriminative structure-aware representations for robust and complete action detection. To the best of our knowledge, this is the first work to fully explore the global and local structure information in a unified deep model for weakly supervised action detection. Extensive experimental results on two benchmark datasets demonstrate that the proposed MSA-Net performs favorably against state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
45. Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding.
- Author
-
Yang, Wenfei, Zhang, Tianzhu, Zhang, Yongdong, and Wu, Feng
- Subjects
- *
SUPERVISED learning , *FEATURE extraction , *TASK analysis - Abstract
Weakly supervised temporal sentence grounding has better scalability and practicability than fully supervised methods in real-world application scenarios. However, most existing methods cannot model the fine-grained video-text local correspondences well and do not have effective supervision information for correspondence learning, thus yielding unsatisfying performance. To address the above issues, we propose an end-to-end Local Correspondence Network (LCNet) for weakly supervised temporal sentence grounding. The proposed LCNet enjoys several merits. First, we represent video and text features in a hierarchical manner to model the fine-grained video-text correspondences. Second, we design a self-supervised cycle-consistent loss as a learning guidance for video and text matching. To the best of our knowledge, this is the first work to fully explore the fine-grained correspondences between video and text for temporal sentence grounding by using self-supervised learning. Extensive experimental results on two benchmark datasets demonstrate that the proposed LCNet significantly outperforms existing weakly supervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
46. Weakly Supervised Group Mask Network for Object Detection.
- Author
-
Song, Lingyun, Liu, Jun, Sun, Mingxuan, and Shang, Xuequn
- Abstract
Learning object detectors from weak image annotations is an important yet challenging problem. Many weakly supervised approaches formulate the task as a multiple instance learning (MIL) problem, where each image is represented as a bag of instances. For predicting the score for each object that occurs in an image, existing MIL-based approaches tend to select the instance that responds most strongly to a specific class, which, however, overlooks the contextual information. Besides, objects often exhibit dramatic variations such as scaling and transformations, which makes them hard to detect. In this paper, we propose the weakly supervised group mask network (WSGMN), which mainly has two distinctive properties: (i) it exploits the relations among regions to generate community instances, which contain context information and are robust to object variations; (ii) it generates a mask for each label group and utilizes these masks to dynamically select the feature information of the most useful community instances for recognizing specific objects. Extensive experiments on several benchmark datasets demonstrate the effectiveness of WSGMN on the task of weakly supervised object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
47. Shape robust Siamese network tracking based on weakly supervised learning.
- Author
-
Ma, Ding, Zhou, Yong, Yao, Rui, Zhao, Jiaqi, Liu, Bing, and Gua, Banji
- Subjects
- *
ARTIFICIAL neural networks , *ARTIFICIAL satellite tracking , *OCCLUSION (Chemistry) , *ELECTRONIC data processing - Abstract
This paper combines bounding-box regression with an occlusion-handling scheme for the training data, so that occlusions are handled more accurately and tracking accuracy is improved. Occlusion is currently the major challenge in target tracking. This paper puts forward a weakly supervised framework to address this problem. The main idea is to randomly hide the most discriminative patches in the input images, forcing the network to focus on other relevant parts. Our method only needs to modify the inputs; there is no need to hide any patches during testing. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
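The core training trick described above is to randomly hide the most discriminative patches of the input so the tracker must rely on other relevant regions. The sketch below approximates "most discriminative" by the grid cell with the highest activation magnitude and hides it with some probability; the grid size, probability, and response proxy are assumptions, and nothing is hidden at test time.

```python
import torch

def hide_discriminative_patch(feats, grid=4, hide_prob=0.5):
    """Training-time augmentation: with some probability, zero out the grid cell
    whose activation magnitude is highest (a proxy for the most discriminative
    patch), forcing the network to attend to other relevant regions.

    feats: (B, C, H, W) feature maps or images; returns the same shape.
    """
    b, c, h, w = feats.shape
    ph, pw = h // grid, w // grid
    out = feats.clone()
    # per-cell response = mean absolute activation over channels and pixels
    resp = out.abs().mean(1).unfold(1, ph, ph).unfold(2, pw, pw).mean(dim=(-1, -2))
    top = resp.flatten(1).argmax(1)                     # strongest cell per sample
    for n in range(b):
        if torch.rand(()) < hide_prob:
            i, j = divmod(int(top[n]), grid)
            out[n, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = 0.0
    return out

# toy usage on a small batch
print(hide_discriminative_patch(torch.rand(2, 3, 128, 128)).shape)
```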
48. On Learning 3D Face Morphable Model from In-the-Wild Images.
- Author
-
Tran, Luan and Liu, Xiaoming
- Abstract
As a classic statistical model of 3D facial shape and albedo, 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting, image synthesis. Conventional 3DMM is learned from a set of 3D face scans with associated well-controlled 2D face images, and represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as, the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, lighting, shape and albedo parameters. Two decoders serve as the nonlinear 3DMM to map from the shape and albedo parameters to the 3D shape and albedo, respectively. With the projection parameter, lighting, 3D shape, and albedo, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment, 3D reconstruction, and face editing. Source code and additional results can be found at our project page: http://cvlab.cse.msu.edu/project-nonlinear-3dmm.html. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
49. WSODPB: Weakly supervised object detection with PCSNet and box regression module.
- Author
-
Yi, Sheng, Ma, Huimin, Li, Xi, and Wang, Yu
- Subjects
- *
BOXES , *ALGORITHMS - Abstract
The weakly supervised object detection (WSOD) task uses only image-level annotations to train an object detector. WSOD does not require time-consuming instance-level annotations, which attracts more and more attention. Previous weakly supervised object detection methods iteratively update detectors and pseudo-labels or use rule-based methods, which cannot generate complete and accurate proposals. We utilize the features extracted by the convolutional layers to optimize the proposals generated by rule-based methods, and solve the above problem by combining the two different features. Then, a box regression module is added to the weakly supervised object detection network, supervised by a proposal completeness scoring network (PCSNet). The box regression module modifies proposals to obtain a higher intersection-over-union (IoU) with the ground truth. PCSNet scores the proposals output by the box regression network and utilizes the score to improve the box regression module. In addition, we take advantage of the random proposal scoring (RPS) algorithm to generate more accurate pseudo labels to train the box regression module. The results show that our method achieves significant improvements on the PASCAL VOC 2007 and 2012 datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
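Both the box-regression module and the proposal completeness scoring in the record above revolve around intersection-over-union between proposals and (pseudo) ground-truth boxes. The sketch below shows the standard pairwise IoU computation such components rely on; it is generic utility code, not the WSODPB implementation.

```python
import numpy as np

def iou(boxes_a, boxes_b):
    """Pairwise intersection-over-union between two sets of boxes.

    boxes_*: (N, 4) and (M, 4) arrays in (x1, y1, x2, y2) format.
    Returns an (N, M) IoU matrix.
    """
    x1 = np.maximum(boxes_a[:, None, 0], boxes_b[None, :, 0])
    y1 = np.maximum(boxes_a[:, None, 1], boxes_b[None, :, 1])
    x2 = np.minimum(boxes_a[:, None, 2], boxes_b[None, :, 2])
    y2 = np.minimum(boxes_a[:, None, 3], boxes_b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-8)

# toy usage: two proposals scored against one pseudo ground-truth box
proposals = np.array([[10, 10, 60, 60], [30, 30, 90, 90]], dtype=float)
pseudo_gt = np.array([[12, 8, 58, 62]], dtype=float)
print(iou(proposals, pseudo_gt))
```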
50. Weakly supervised easy-to-hard learning for object detection in image sequences.
- Author
-
Yu, Hongkai, Guo, Dazhou, Yan, Zhipeng, Fu, Lan, Simmons, Jeff, Przybyla, Craig P., and Wang, Song
- Subjects
- *
CONVOLUTIONAL neural networks , *OBJECT tracking (Computer vision) , *COMPUTER vision , *OBJECT recognition (Computer vision) , *DEEP learning , *ARTIFICIAL neural networks - Abstract
Object detection is an important research problem in computer vision. Convolutional Neural Network (CNN) based deep learning models could be used for this problem, but they would require a large number of manually annotated objects for training or fine-tuning. Unfortunately, fine-grained manually annotated objects are not available in many cases. Usually, it is possible to obtain imperfect initialized detections from some weak object detectors using weak supervisions such as prior knowledge of shape, size, or motion. In some real-world applications, objects have little inter-occlusion and few split/merge difficulties, so the spatio-temporal consistency of object tracking is well preserved in the image sequences/videos. Starting from the imperfect initialization, this paper proposes a new easy-to-hard learning method to incrementally improve object detection in image sequences/videos through an unsupervised spatio-temporal analysis that feeds more complex examples, which are hard for object detection, into next-iteration training. The proposed method does not require manual annotations, but uses weak supervisions and spatio-temporal consistency in tracking to simulate the supervisions in the CNN training. Experimental results on three different tasks show significant improvements over the initialized detections by the weak object detectors. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
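The easy-to-hard scheme summarized above starts from the imperfect detections of a weak detector, keeps only spatio-temporally consistent ones as pseudo labels, retrains the CNN detector, and repeats. The sketch below is a high-level skeleton of that loop with caller-supplied placeholder callables; none of the names correspond to the authors' code.

```python
def easy_to_hard_training(frames, weak_detector, train_cnn, detect, select_consistent,
                          num_rounds=3):
    """Skeleton of easy-to-hard self-training on an image sequence:
    1. start from imperfect detections of a weak (rule-based) detector,
    2. keep only detections that are spatio-temporally consistent across frames,
    3. retrain the CNN detector on them, re-detect, and repeat with harder examples.
    All arguments except `frames` are caller-supplied callables."""
    detections = [weak_detector(f) for f in frames]           # imperfect initialization
    model = None
    for _ in range(num_rounds):
        pseudo_labels = select_consistent(frames, detections) # spatio-temporal filtering
        model = train_cnn(frames, pseudo_labels)              # supervised by pseudo labels
        detections = [detect(model, f) for f in frames]       # harder examples next round
    return model, detections

# toy usage with trivial stand-ins for the callables
frames = list(range(5))
model, dets = easy_to_hard_training(
    frames,
    weak_detector=lambda f: [f],
    train_cnn=lambda fs, labels: ("model", len(labels)),
    detect=lambda m, f: [f, f + 1],
    select_consistent=lambda fs, ds: ds)
print(model, dets)
```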