29 results for "spatial transformer networks"
Search Results
2. Adversarial and Random Transformations for Robust Domain Adaptation and Generalization.
- Author
- Xiao, Liang, Xu, Jiaolong, Zhao, Dawei, Shang, Erke, Zhu, Qi, and Dai, Bin
- Subjects
- ARTIFICIAL neural networks; DATA augmentation; MACHINE learning; REINFORCEMENT learning; GENERALIZATION; SEARCH algorithms
- Abstract
Data augmentation has been widely used to improve generalization in training deep neural networks. Recent works show that using worst-case transformations or adversarial augmentation strategies can significantly improve accuracy and robustness. However, due to the non-differentiable properties of image transformations, search algorithms such as reinforcement learning or evolution strategies have to be applied, which are not computationally practical for large-scale problems. In this work, we show that simply applying consistency training with random data augmentation yields state-of-the-art results on domain adaptation (DA) and generalization (DG). To further improve the accuracy and robustness with adversarial examples, we propose a differentiable adversarial data augmentation method based on spatial transformer networks (STNs). The combined adversarial and random-transformation-based method outperforms the state-of-the-art on multiple DA and DG benchmark datasets. Furthermore, the proposed method shows desirable robustness to corruption, which is also validated on commonly used datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
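The differentiable adversarial augmentation this abstract describes can be pictured as gradient ascent on affine warp parameters pushed through an STN-style sampler. A minimal PyTorch sketch, assuming a classifier `model`; the step count, step size, and bound `eps` are illustrative, not the authors' settings:

```python
# Hedged sketch of differentiable adversarial augmentation via an affine
# warp (the STN sampler). `model`, step count, step size, and the bound
# `eps` are illustrative assumptions, not the authors' settings.
import torch
import torch.nn.functional as F

def adversarial_affine(model, x, y, steps=3, lr=0.01, eps=0.1):
    """Gradient ascent on affine parameters to find a loss-maximizing warp."""
    identity = torch.tensor([[1., 0., 0.], [0., 1., 0.]], device=x.device)
    delta = torch.zeros(x.size(0), 2, 3, device=x.device, requires_grad=True)
    for _ in range(steps):
        theta = identity + delta                      # perturbed affine params
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x_adv = F.grid_sample(x, grid, align_corners=False)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += lr * grad.sign()                 # ascend the loss
            delta.clamp_(-eps, eps)                   # keep the warp small
    grid = F.affine_grid(identity + delta.detach(), x.size(), align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)
```

Because the sampler is differentiable in the warp parameters, the worst-case transformation can be found by plain gradient steps, avoiding the reinforcement-learning or evolutionary search the abstract cites as impractical.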
3. Recognition of the Multioriented Text Based on Deep Learning
- Author
- Priyadarsini, K., Janahan, Senthil Kumar, Thirumal, S., Bindu, P., Raj, T. Ajith Bosco, and Majji, Sankararao
- Published
- 2022
4. A Detection Network United Local Feature Points and Components for Fine-Grained Image Classification
- Author
- Du, Yong, Yu, Bin, Hong, Peng, Pan, Wei, Wang, Yang, and Wang, Yu
- Published
- 2022
5. Scalable Handwritten Text Recognition System for Lexicographic Sources of Under-Resourced Languages and Alphabets
- Author
- Idziak, Jan, Šeļa, Artjoms, Woźniak, Michał, Leśniak, Albert, Byszuk, Joanna, and Eder, Maciej
- Published
- 2021
6. Revisiting Data Augmentation for Rotational Invariance in Convolutional Neural Networks
- Author
- Quiroga, Facundo, Ronchetti, Franco, Lanzarini, Laura, and Bariviera, Aurelio F.
- Published
- 2020
7. Cascaded Region Proposal Networks for Proposal-Based Tracking
- Author
- Zhang, Ximing, Fan, Xuewu, and Luo, Shujuan
- Published
- 2020
8. A Deep Learning Framework for Audio Deepfake Detection.
- Author
- Khochare, Janavi, Joshi, Chaitali, Yenarkar, Bakul, Suratkar, Shraddha, and Kazi, Faruk
- Subjects
- CONVOLUTIONAL neural networks; DEEP learning; SPEECH synthesis; MACHINE learning; DEEPFAKES; CLASSIFICATION algorithms
- Abstract
Audio deepfakes have been increasingly emerging as a potential source of deceit with the development of avant-garde methods of synthetic speech generation. Differentiating fake audio from real audio is becoming ever more difficult owing to the increasing accuracy of text-to-speech models, posing a serious threat to speaker verification systems. Within the domain of audio deepfake detection, the majority of experiments have been based on the ASVSpoof or AVSpoof datasets, using various machine learning and deep learning approaches. In this work, experiments were performed on a more recent dataset, the Fake or Real (FoR) dataset, which contains data generated using some of the best text-to-speech models. Two approaches have been adopted to solve the problem: a feature-based approach and an image-based approach. The feature-based approach involves converting audio data into a dataset consisting of various spectral features of the audio samples, which are fed to machine learning algorithms for the classification of audio as fake or real. In the image-based approach, audio samples are converted into mel-spectrograms, which are input to deep learning algorithms, namely the Temporal Convolutional Network (TCN) and the Spatial Transformer Network (STN). TCN was chosen because it is a sequential model that has been shown to give good results on sequential data. A comparison between the performances of both approaches shows that the deep learning algorithms, particularly TCN, outperform the machine learning algorithms by a significant margin, with a 92 percent test accuracy. This solution presents a model for audio deepfake classification with an accuracy comparable to traditional CNN models such as VGG16 and XceptionNet. [ABSTRACT FROM AUTHOR]
- Published
- 2022
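The image-based pipeline above (audio to mel-spectrogram to a temporal convolutional classifier) can be sketched as follows; the file name, layer sizes, and the binary fake/real head are assumptions for illustration, not the paper's configuration:

```python
# A minimal sketch of the image-based pipeline described above: audio to
# mel-spectrogram, then a small dilated-convolution (TCN-style) classifier.
# The file name, layer sizes, and binary fake/real head are assumptions.
import librosa
import numpy as np
import torch
import torch.nn as nn

y, sr = librosa.load("clip.wav", sr=16000)            # hypothetical audio file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
mel_db = librosa.power_to_db(mel, ref=np.max)         # (64, frames), in dB

x = torch.tensor(mel_db, dtype=torch.float32).unsqueeze(0)  # (1, 64, frames)

tcn = nn.Sequential(                                  # dilated temporal stack
    nn.Conv1d(64, 32, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv1d(32, 32, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 2),                                 # logits: fake vs. real
)
logits = tcn(x)
```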
9. Computationally Efficient ANN Model for Small-Scale Problems
- Author
- Sharma, Shikhar, Shivhare, Shiv Naresh, Singh, Navjot, and Kumar, Krishan
- Published
- 2019
10. Improving Deep Image Clustering with Spatial Transformer Layers
- Author
- Souza, Thiago V. M., and Zanchettin, Cleber
- Published
- 2019
11. A Reinforcement Learning Approach for Sequential Spatial Transformer Networks
- Author
- Azimi, Fatemeh, Raue, Federico, Hees, Jörn, and Dengel, Andreas
- Published
- 2019
12. Unsupervised Domain Adaptation From Axial to Short-Axis Multi-Slice Cardiac MR Images by Incorporating Pretrained Task Networks.
- Author
- Koehler, Sven, Hussain, Tarique, Blair, Zach, Huffaker, Tyler, Ritzmann, Florian, Tandon, Animesh, Pickardt, Thomas, Sarikouch, Samir, Latus, Heiner, Greil, Gerald, Wolf, Ivo, and Engelhardt, Sandy
- Subjects
- MAGNETIC resonance imaging; CARDIAC magnetic resonance imaging; CARDIAC imaging; CONGENITAL heart disease; CARDIOVASCULAR diseases
- Abstract
Anisotropic multi-slice Cardiac Magnetic Resonance (CMR) images are conventionally acquired in patient-specific short-axis (SAX) orientation. In specific cardiovascular diseases that affect right ventricular (RV) morphology, acquisitions in standard axial (AX) orientation are preferred by some investigators, due to potential superiority in RV volume measurement for treatment planning. Unfortunately, due to the rare occurrence of these diseases, data in this domain is scarce. Recent research in deep learning-based methods has mainly focused on SAX CMR images and has proven very successful. In this work, we show that there is a considerable domain shift between AX and SAX images, and therefore, direct application of existing models yields sub-optimal results on AX samples. We propose a novel unsupervised domain adaptation approach, which uses task-related probabilities in an attention mechanism. Beyond that, cycle consistency is imposed on the learned patient-individual 3D rigid transformation to improve stability when automatically re-sampling the AX images to SAX orientation. The network was trained on 122 registered 3D AX-SAX CMR volume pairs from a multi-centric patient cohort. A mean 3D Dice of 0.86 ± 0.06 for the left ventricle, 0.65 ± 0.08 for the myocardium, and 0.77 ± 0.10 for the right ventricle was achieved. This is an improvement of 25% in Dice for the RV in comparison to direct application on axial slices. To conclude, our pre-trained task module has seen neither CMR images nor labels from the target domain, but is able to segment them after the domain gap is reduced. Code: https://github.com/Cardio-AI/3d-mri-domain-adaptation [ABSTRACT FROM AUTHOR]
- Published
- 2021
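The cycle-consistency idea in this abstract, that the composed AX→SAX→AX transform should be close to the identity, can be written as a small penalty term. A hedged sketch, assuming the learned rigid transforms are given as homogeneous 4×4 matrices (the paper's exact parameterization may differ):

```python
# Sketch of the cycle-consistency idea from the abstract: composing the
# learned AX->SAX transform with its inverse-direction counterpart should
# give the identity. Representing the rigid transforms as homogeneous 4x4
# matrices is an assumption; the paper's parameterization may differ.
import torch

def cycle_consistency_loss(T_ax2sax, T_sax2ax):
    """Both arguments: (N, 4, 4) batched rigid transformation matrices."""
    composed = torch.bmm(T_sax2ax, T_ax2sax)          # AX -> SAX -> AX
    eye = torch.eye(4, device=composed.device).expand_as(composed)
    return ((composed - eye) ** 2).mean()             # penalize drift from identity
```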
13. A Refined Spatial Transformer Network
- Author
- Shu, Chang, Chen, Xi, Yu, Chong, and Han, Hua
- Published
- 2018
14. An effective recognition approach for contactless palmprint.
- Author
- Xu, Nuoya, Zhu, Qi, Xu, Xiangyu, and Zhang, Daoqiang
- Subjects
- CONVOLUTIONAL neural networks; IMAGE registration; SIGNAL convolution
- Abstract
Biometric characteristics have been widely used for individual identification and verification. The palmprint, as one such biological feature, contains abundant discriminative information and has attracted a lot of interest. In this work, we focus on the identification and verification of contactless palmprint images. Considering the main differences between contact and contactless images, including orientation and deformation, we use a deep network combined with image alignment to further improve the recognition performance on contactless palmprint images. Convolutional neural networks can solve many classification problems well, and researchers have proposed many networks with different architectures. We exploit the residual network in our framework, which achieves promising performance on the image classification problem. In order to improve the accuracy of verification, a spatial transformer network is used to align the images. The proposed method is tested on two public palmprint databases, CASIA and GPDS. Extensive experiments are carried out with several state-of-the-art approaches as comparison, and the results demonstrate the effectiveness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
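The alignment step described above is the standard spatial transformer: a localization network regresses affine parameters, which drive a differentiable grid sampler. A minimal sketch, assuming single-channel input crops; layer sizes are illustrative, not the authors' configuration:

```python
# Minimal spatial transformer in the spirit of the alignment step above:
# a localization network regresses six affine parameters that drive a
# differentiable grid sampler. Layer sizes assume single-channel crops
# and are illustrative, not the authors' configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(                     # localization network
            nn.Conv2d(1, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(32), nn.ReLU(), nn.Linear(32, 6),
        )
        self.fc[-1].weight.data.zero_()               # start as the identity
        self.fc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.fc(self.loc(x)).view(-1, 2, 3)   # predicted affine params
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

Initializing the regressor to the identity transform is the usual trick for stable training: the sampler starts as a no-op and learns corrective warps gradually; the aligned output is then fed to the residual network.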
15. gvnn: Neural Network Library for Geometric Computer Vision
- Author
- Handa, Ankur, Bloesch, Michael, Pătrăucean, Viorica, Stent, Simon, McCormac, John, and Davison, Andrew
- Published
- 2016
16. An Improved Deep Residual Network Prediction Model for the Early Diagnosis of Alzheimer’s Disease
- Author
- Haijing Sun, Anna Wang, Wenhui Wang, and Chen Liu
- Subjects
- residual network; Mish; spatial transformer networks; non-local attention mechanism; Alzheimer’s disease
- Abstract
The early diagnosis of Alzheimer’s disease (AD) can allow patients to take preventive measures before irreversible brain damage occurs. It can be seen from cross-sectional imaging studies of AD that the features of the lesion areas in AD patients, as observed by magnetic resonance imaging (MRI), show significant variation, and these features are distributed throughout the image space. Since the convolutional layer of the general convolutional neural network (CNN) cannot satisfactorily extract long-distance correlation in the feature space, a deep residual network (ResNet) model, based on spatial transformer networks (STN) and the non-local attention mechanism, is proposed in this study for the early diagnosis of AD. In this ResNet model, a new Mish activation function is selected in the ResNet-50 backbone to replace the ReLU function, STN is introduced between the input layer and the improved ResNet-50 backbone, and a non-local attention mechanism is introduced between the fourth and the fifth stages of the improved ResNet-50 backbone. This ResNet model can extract more information from the layers by deepening the network structure through deep ResNet. The introduced STN can transform the spatial information in MRI images of Alzheimer’s patients into another space and retain the key information. The introduced non-local attention mechanism can find the relationship between the lesion areas and normal areas in the feature space. This model can solve the problem of local information loss in traditional CNN and can extract the long-distance correlation in feature space. The proposed method was validated using the ADNI (Alzheimer’s disease neuroimaging initiative) experimental dataset, and compared with several models. The experimental results show that the classification accuracy of the algorithm proposed in this study can reach 97.1%, the macro precision can reach 95.5%, the macro recall can reach 95.3%, and the macro F1 value can reach 95.4%. The proposed model is more effective than other algorithms.
- Published
- 2021
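Two of the ingredients above are easy to make concrete: Mish is x · tanh(softplus(x)), and swapping it for ReLU in a torchvision ResNet-50 is a recursive module replacement. A rough sketch under those assumptions; the STN in front of the backbone and the non-local block between stages four and five are only indicated in comments, not reproduced:

```python
# Mish is x * tanh(softplus(x)). A sketch of swapping it for ReLU
# throughout a torchvision ResNet-50; the class count is an assumption.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(nn.functional.softplus(x))

def swap_relu_for_mish(module):
    """Recursively replace every ReLU submodule with Mish."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, Mish())
        else:
            swap_relu_for_mish(child)

backbone = resnet50(num_classes=3)   # class count here is an assumption
swap_relu_for_mish(backbone)
# Paper's wiring: input -> STN -> improved ResNet-50 (with a non-local
# attention block between the fourth and fifth stages) -> prediction.
```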
17. Learning transform-aware attentive network for object tracking.
- Author
- Lu, Xiankai, Ni, Bingbing, Ma, Chao, and Yang, Xiaokang
- Subjects
- OBJECT tracking (Computer vision); ARTIFICIAL satellite tracking; MOTION
- Abstract
Existing trackers often decompose the task of visual tracking into multiple independent components, such as target appearance sampling, classifier learning, and target state inference. In this paper, we present a transform-aware attentive tracking framework, which uses a deep attentive network to directly predict the target states via spatial transform parameters. During off-line training, the proposed network learns generic motion patterns of target objects from auxiliary large-scale videos. These learned motion patterns are then applied to track target objects in test sequences. Built on the Spatial Transformer Network (STN), the proposed attentive network is fully differentiable and can be trained in an end-to-end manner. Notably, we only fine-tune the pre-trained network in the initial frame. The proposed tracker requires neither online model update nor appearance sampling during the tracking process. Extensive experiments on the OTB-2013, OTB-2015, VOT-2014 and UAV-123 datasets demonstrate the competitive performance of our method against state-of-the-art attentive tracking methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
18. An Automatic Modulation Recognition Method with Low Parameter Estimation Dependence Based on Spatial Transformer Networks.
- Author
- Li, Mingxuan, Li, Ou, Liu, Guangyi, and Zhang, Ce
- Subjects
- DEEP learning; PREDICATE calculus; PARAMETER estimation
- Abstract
Recently, automatic modulation recognition has been an important research topic in wireless communication. With the application of deep learning, it has become promising to use convolutional neural networks on raw in-phase and quadrature signals for developing automatic modulation recognition methods. However, the errors introduced during signal reception and processing will greatly deteriorate the classification performance, which affects the practical application of such methods. Therefore, we first analyze and quantify the errors introduced by signal detection and isolation in noncooperative communication through a baseline convolutional neural network. In response to these errors, we then design a signal spatial transformer module based on the attention model to eliminate errors via a priori learning of signal structure. By cascading the signal spatial transformer module in front of the baseline classification network, we propose a method that can adaptively resample the signal capture to adjust time drift, symbol rate, and clock recovery. In addition, it can automatically apply a perturbation to the signal carrier to correct frequency offset. By applying this improved model to automatic modulation recognition, we obtain a significant improvement in classification performance compared with several existing methods. Our method significantly improves the prospects for applying deep learning-based automatic modulation recognition under nonideal synchronization. [ABSTRACT FROM AUTHOR]
- Published
- 2019
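The "adaptively resample the signal capture" step can be emulated with the same differentiable sampler STNs use for images, by treating the (I, Q) capture as a one-pixel-high image. A sketch with fixed, illustrative shift and rate values; in the paper these would come from the learned transformer module:

```python
# Resampling an I/Q capture with a differentiable sampler, mirroring the
# "adaptively resample the signal" idea above. The shift and rate values
# are fixed placeholders; the paper's module predicts them.
import torch
import torch.nn.functional as F

def resample_iq(iq, shift=0.05, rate=1.02):
    """iq: (N, 2, T) raw in-phase/quadrature samples."""
    n, _, t = iq.shape
    x = iq.unsqueeze(2)                               # (N, 2, 1, T) pseudo-image
    base = torch.linspace(-1, 1, t, device=iq.device)
    xs = (base * rate + shift).view(1, 1, t, 1).expand(n, 1, t, 1)
    grid = torch.cat([xs, torch.zeros_like(xs)], dim=-1)  # (N, 1, T, 2)
    out = F.grid_sample(x, grid, align_corners=True)  # linear interpolation
    return out.squeeze(2)                             # back to (N, 2, T)
```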
19. A deep learning method for image super-resolution based on geometric similarity.
- Author
- Lu, Jian, Hu, Weidong, and Sun, Yi
- Subjects
- DEEP learning; DIGITAL image processing; HIGH resolution imaging; ALGORITHMS; ARTIFICIAL neural networks
- Abstract
A single-image super-resolution (SR) algorithm that combines deep convolutional neural networks (CNNs) with multi-scale similarity is presented in this work. The aim of this method is to address the inability of existing CNN methods to exploit the latent information in the image itself. To exploit this information, image patches that look similar within the same scale and across different scales are first searched for inside the input image. Subsequently, spatial transformer networks (STNs) are embedded into the CNNs to bring the similar patches into good alignment. The STNs give the CNNs the ability to spatially manipulate data. Finally, when SR is performed through the proposed pyramid-shaped CNNs, the high-resolution (HR) image is predicted gradually according to the complementary information provided by these aligned patches. The experimental results confirm the effectiveness of the proposed method and demonstrate that it is competitive with state-of-the-art approaches for single-image SR. [ABSTRACT FROM AUTHOR]
- Published
- 2019
20. Sequence recognition of Chinese license plates.
- Author
- Wang, Jianlin, Huang, He, Qian, Xusheng, Cao, Jinde, and Dai, Yakang
- Subjects
- AUTOMOBILE license plates; FEATURE selection; ARTIFICIAL neural networks; ACCURACY; MATHEMATICAL models
- Abstract
The recognition of license plates is very important for intelligent transportation systems. Generally, the performance of an intelligent recognition algorithm is greatly affected by different shooting angles, illumination conditions and backgrounds of the license plate images. This paper presents a sequence recognition approach for intelligent recognition of Chinese license plates. Firstly, a spatial transformer network (STN) is employed to adjust inclined and deformed license plates such that all the plates have a uniform orientation and thus are easier to recognize. Then, an improved convolutional neural network (CNN) is designed to extract sequence features of the rectified license plates. The features of different convolutional layers are integrated as input to a bi-directional recurrent neural network (BRNN), so character segmentation is not needed. Finally, the recognition is accomplished by the BRNN and connectionist temporal classification (CTC). Due to the lack of adequate Chinese license plates, an effective training method is presented in which the network is pre-trained on a sufficiently large set of synthetic license plates and fine-tuned on our collected real Chinese license plates. The experimental results show that our model achieves better recognition accuracy and lower average edit distance than some existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
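The segmentation-free recognition step rests on connectionist temporal classification; in PyTorch this is `torch.nn.CTCLoss`. A minimal usage sketch with invented sizes (70 character classes, 7-character plates):

```python
# Minimal sketch of the segmentation-free recognition objective: PyTorch's
# built-in CTC loss over per-timestep character probabilities. The alphabet
# size, sequence length, and 7-character plates are invented for illustration.
import torch
import torch.nn as nn

T, N, C = 20, 4, 70                                   # time steps, batch, classes
log_probs = torch.randn(T, N, C).log_softmax(2)       # stand-in for BRNN output
targets = torch.randint(1, C, (N, 7), dtype=torch.long)  # 7-character plates
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 7, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                             # class 0 reserved as blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```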
21. Ω-Net (Omega-Net): Fully automatic, multi-view cardiac MR detection, orientation, and segmentation with deep neural networks.
- Author
- Vigneault, Davis M., Xie, Weidi, Ho, Carolyn Y., Bluemke, David A., and Noble, J. Alison
- Subjects
- CARDIAC magnetic resonance imaging; IMAGE segmentation; ARTIFICIAL neural networks; HYPERTROPHIC cardiomyopathy; IMAGE processing
- Abstract
Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses. Variability in contrast, appearance, orientation, and placement of the heart between patients, clinical views, scanners, and protocols makes fully automatic semantic segmentation a notoriously difficult problem. Here, we present Ω-Net (Omega-Net): a novel convolutional neural network (CNN) architecture for simultaneous localization, transformation into a canonical orientation, and semantic segmentation. First, an initial segmentation is performed on the input image; second, the features learned during this initial segmentation are used to predict the parameters needed to transform the input image into a canonical orientation; and third, a final segmentation is performed on the transformed image. In this work, Ω-Nets of varying depths were trained to detect five foreground classes in any of three clinical views (short axis, SA; four-chamber, 4C; two-chamber, 2C), without prior knowledge of the view being segmented. This constitutes a substantially more challenging problem compared with prior work. The architecture was trained using three-fold cross-validation on a cohort of patients with hypertrophic cardiomyopathy (HCM, N = 42) and healthy control subjects (N = 21). Network performance, as measured by weighted foreground intersection-over-union (IoU), was substantially improved for the best-performing Ω-Net compared with U-Net segmentation without localization or orientation (0.858 vs 0.834). In addition, to be comparable with other works, Ω-Net was retrained from scratch using five-fold cross-validation on the publicly available 2017 MICCAI Automated Cardiac Diagnosis Challenge (ACDC) dataset. The Ω-Net outperformed the state-of-the-art method in segmentation of the LV and RV blood pools, and performed slightly worse in segmentation of the LV myocardium. We conclude that this architecture represents a substantive advancement over prior approaches, with implications for biomedical image segmentation more generally. [ABSTRACT FROM AUTHOR]
- Published
- 2018
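The three-stage design (segment, predict a canonical-orientation transform from what was learned, segment again) can be expressed schematically; every module below is an untrained single-layer stand-in, not Ω-Net itself:

```python
# Schematic forward pass of the three-stage idea above (initial segmentation,
# orientation prediction from its output, final segmentation). All modules
# are untrained stand-ins, not the actual Omega-Net.
import torch
import torch.nn as nn
import torch.nn.functional as F

seg_initial = nn.Conv2d(1, 5, 3, padding=1)           # stage 1: rough segmenter
loc = nn.Sequential(nn.Flatten(), nn.LazyLinear(6))   # stage 2: transform regressor
seg_final = nn.Conv2d(1, 5, 3, padding=1)             # stage 3: final segmenter

x = torch.rand(2, 1, 64, 64)                          # toy batch of images
initial = seg_initial(x)
theta = loc(initial).view(-1, 2, 3)                   # canonical-orientation params
grid = F.affine_grid(theta, x.size(), align_corners=False)
x_canonical = F.grid_sample(x, grid, align_corners=False)
final = seg_final(x_canonical)                        # segment the transformed image
```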
22. An Automatic Modulation Recognition Method with Low Parameter Estimation Dependence Based on Spatial Transformer Networks
- Author
- Mingxuan Li, Ou Li, Guangyi Liu, and Ce Zhang
- Subjects
- deep learning; automatic modulation recognition; spatial transformer networks; signal processing
- Abstract
Recently, automatic modulation recognition has been an important research topic in wireless communication. With the application of deep learning, it has become promising to use convolutional neural networks on raw in-phase and quadrature signals for developing automatic modulation recognition methods. However, the errors introduced during signal reception and processing will greatly deteriorate the classification performance, which affects the practical application of such methods. Therefore, we first analyze and quantify the errors introduced by signal detection and isolation in noncooperative communication through a baseline convolutional neural network. In response to these errors, we then design a signal spatial transformer module based on the attention model to eliminate errors via a priori learning of signal structure. By cascading the signal spatial transformer module in front of the baseline classification network, we propose a method that can adaptively resample the signal capture to adjust time drift, symbol rate, and clock recovery. In addition, it can automatically apply a perturbation to the signal carrier to correct frequency offset. By applying this improved model to automatic modulation recognition, we obtain a significant improvement in classification performance compared with several existing methods. Our method significantly improves the prospects for applying deep learning-based automatic modulation recognition under nonideal synchronization.
- Published
- 2019
23. Understanding when spatial transformer networks do not support invariance, and what to do about it
- Author
- Finnveden, Lukas, Jansson, Ylva, and Lindeberg, Tony
- Abstract
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.
- Published
- 2021
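The central claim, that spatially transforming CNN feature maps does not reproduce the feature maps of the transformed image unless the features are invariant, is easy to check numerically. A toy demonstration with an arbitrary random convolution and a 90° rotation:

```python
# Toy numeric check of the paper's claim: extracting features from a rotated
# image differs from rotating the features of the original, unless the
# features themselves are invariant. Network and angle are arbitrary.
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF

conv = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU())
img = torch.rand(1, 1, 32, 32)

feat_of_rotated = conv(TF.rotate(img, 90))            # transform, then features
rotated_feat = TF.rotate(conv(img), 90)               # features, then transform
print((feat_of_rotated - rotated_feat).abs().max())   # generally nonzero
```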
24. Understanding when spatial transformer networks do not support invariance, and what to do about it
- Author
- Lukas Finnveden, Tony Lindeberg, and Ylva Jansson
- Subjects
- spatial transformer networks; invariant neural networks; convolutional neural networks; deep learning; pattern recognition; spatial transformation; image alignment; network complexity; Computer Vision and Robotics (Autonomous Systems)
- Abstract
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.
- Published
- 2021
25. Understanding when spatial transformer networks do not support invariance, and what to do about it
- Author
- Finnveden, Lukas, Jansson, Ylva, and Lindeberg, Tony
- Abstract
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image with those of its original. STNs are therefore unable to support invariance when transforming CNN feature maps. We present a simple proof for this and study the practical implications, showing that this inability is coupled with decreased classification accuracy. We therefore investigate alternative STN architectures that make use of complex features. We find that while deeper localization networks are difficult to train, localization networks that share parameters with the classification network remain stable as they grow deeper, which allows for higher classification accuracy on difficult datasets. Finally, we explore the interaction between localization network complexity and iterative image alignment.
- Published
- 2020
26. Inability of spatial transformations of CNN feature maps to support invariant recognition
- Author
- Jansson, Ylva, Maydanskiy, Maksim, Finnveden, Lukas, and Lindeberg, Tony
- Abstract
A large number of deep learning architectures use spatial transformations of CNN feature maps or filters to better deal with variability in object appearance caused by natural image transformations. In this paper, we prove that spatial transformations of CNN feature maps cannot align the feature maps of a transformed image to match those of its original, for general affine transformations, unless the extracted features are themselves invariant. Our proof is based on elementary analysis for both the single- and multi-layer network case. The results imply that methods based on spatial transformations of CNN feature maps or filters cannot replace image alignment of the input and cannot enable invariant recognition for general affine transformations, specifically not for scaling transformations or shear transformations. For rotations and reflections, spatially transforming feature maps or filters can enable invariance, but only for networks with learnt or hardcoded rotation- or reflection-invariant features.
- Published
- 2020
27. The problems with using STNs to align CNN feature maps
- Author
- Finnveden, Lukas, Jansson, Ylva, and Lindeberg, Tony
- Abstract
Spatial transformer networks (STNs) were designed to enable CNNs to learn invariance to image transformations. STNs were originally proposed to transform CNN feature maps as well as input images. This enables the use of more complex features when predicting transformation parameters. However, since STNs perform a purely spatial transformation, they do not, in the general case, have the ability to align the feature maps of a transformed image and its original. We present a theoretical argument for this and investigate the practical implications, showing that this inability is coupled with decreased classification accuracy. We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.
- Published
- 2020
28. An Improved Deep Residual Network Prediction Model for the Early Diagnosis of Alzheimer's Disease.
- Author
- Sun, Haijing, Wang, Anna, Wang, Wenhui, and Liu, Chen
- Subjects
- MAGNETIC resonance imaging; ALZHEIMER'S disease; EARLY diagnosis; CONVOLUTIONAL neural networks; CROSS-sectional imaging; CLASSIFICATION algorithms
- Abstract
The early diagnosis of Alzheimer's disease (AD) can allow patients to take preventive measures before irreversible brain damage occurs. It can be seen from cross-sectional imaging studies of AD that the features of the lesion areas in AD patients, as observed by magnetic resonance imaging (MRI), show significant variation, and these features are distributed throughout the image space. Since the convolutional layer of the general convolutional neural network (CNN) cannot satisfactorily extract long-distance correlation in the feature space, a deep residual network (ResNet) model, based on spatial transformer networks (STN) and the non-local attention mechanism, is proposed in this study for the early diagnosis of AD. In this ResNet model, a new Mish activation function is selected in the ResNet-50 backbone to replace the ReLU function, STN is introduced between the input layer and the improved ResNet-50 backbone, and a non-local attention mechanism is introduced between the fourth and the fifth stages of the improved ResNet-50 backbone. This ResNet model can extract more information from the layers by deepening the network structure through deep ResNet. The introduced STN can transform the spatial information in MRI images of Alzheimer's patients into another space and retain the key information. The introduced non-local attention mechanism can find the relationship between the lesion areas and normal areas in the feature space. This model can solve the problem of local information loss in traditional CNN and can extract the long-distance correlation in feature space. The proposed method was validated using the ADNI (Alzheimer's disease neuroimaging initiative) experimental dataset, and compared with several models. The experimental results show that the classification accuracy of the algorithm proposed in this study can reach 97.1%, the macro precision can reach 95.5%, the macro recall can reach 95.3%, and the macro F1 value can reach 95.4%. The proposed model is more effective than other algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
29. Automatic target recognition with convolutional neural networks.
- Author
- Baili, Nada
- Subjects
- Automatic target recognition, convolutional neural networks, spatial transformer networks, wide residual neural networks, DSIAC, Other Computer Engineering
- Abstract
Automatic Target Recognition (ATR) refers to the ability of an algorithm or device to identify targets or other objects based on data obtained from sensors, commonly thermal. ATR is an important technology for both civilian and military computer vision applications. However, the current level of performance available is largely deficient compared to the requirements. This is mainly due to the difficulty of acquiring targets in realistic environments, and also to limitations on the distribution of classified data to the academic community for research purposes. This thesis proposes to solve the ATR task using Convolutional Neural Networks (CNN). We present three learning approaches using WideResNet-28-2 as a backbone CNN. The first method uses random initialization of the network weights. The second method explores transfer learning. Finally, the third approach relies on spatial transformer networks to enhance the geometric invariance of the model. To validate, analyze and compare our three proposed models, we use a large-scale real benchmark dataset that includes civilian and military vehicles. These targets are captured at different viewing angles, different resolutions, and different times of the day. We evaluate the effectiveness of our methods by studying their robustness to realistic case scenarios where no ground truth data is available and targets are automatically detected. We show that the method that uses spatial transformer networks achieves the best results and demonstrates the most robustness to various perturbations that can be encountered in real applications.
- Published
- 2020