Journal: signal processing: image communication / Topic: convolutional neural networks - Searchworks@Jio Institute Digital Library Search Results

Showing total 66 results


1. Learned fractional downsampling network for adaptive video streaming.

Author: Chen, Li-Heng, Bampis, Christos G., Li, Zhi, Sole, Joel, Chen, Chao, and Bovik, Alan C.
Subjects: *STREAMING video & television, *CONVOLUTIONAL neural networks, *VIDEO codecs, *LANCZOS method, *VIDEO processing, *VIDEO coding
Abstract: Given increasing demand for very large format contents and displays, spatial resolution changes have become an important part of video streaming. In particular, video downscaling is a key ingredient that streaming providers implement in their encoding pipeline as part of video quality optimization workflows. Here, we propose a downsampling network architecture that progressively reconstructs residuals at different scales. Since the layers of convolutional neural networks (CNNs) can only be used to alter the resolutions of their inputs by integer scale factors, we seek new ways to achieve fractional scaling, which is crucial in many video processing applications. More concretely, we utilize an alternative building block, formulated as a conventional convolutional layer followed by a differentiable resizer. To validate the efficacy of our proposed downsampling network, we integrated it into a modern video encoding system for adaptive streaming. We extensively evaluated our method using a variety of different video codecs and upsampling algorithms to show its generality. The experimental results show that improvements in coding efficiency over the conventional Lanczos algorithm and state-of-the-art methods are attained, in terms of PSNR, SSIM, and VMAF, when tested on high-resolution test videos. In addition to quantitative experiments, we also carried out a subjective quality study, validating that the proposed downsampling model yields favorable results. • A network architecture that learns residuals prior to scaling and supports non-integer scaling factors, enhancing flexibility in video encoding workflows. • The learned downsampling model was integrated with a realistic video encoding pipeline for adaptive video streaming, to achieve improved coding efficiency. • Demonstrates significant improvements through comprehensive experiments, showing both objective and subjective quality enhancements.
• Recognized as one of the papers with the longest review time in the journal Signal Processing: Image Communication (SPIC), reflecting the thorough and rigorous evaluation it underwent. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2. Multi-scale strip-shaped convolution attention network for lightweight image super-resolution.

Author: Xu, Ke, Pan, Lulu, Peng, Guohua, Zhang, Wenbo, Lv, Yanheng, Li, Guo, Li, Lingxiao, and Lei, Le
Subjects: *CONVOLUTIONAL neural networks, *ARTIFICIAL neural networks, *HIGH resolution imaging, *FEATURE extraction, *COMPUTATIONAL complexity
Abstract: • A lightweight attention mechanism based on strip convolution in parallel called MSA. • We generalize MSA to other networks, and outperform them in performance. • A lightweight network for image SR called multi-scale strip-shaped convolution attention network. • MSAN achieves SOTA performance with fewer model parameter counts. Lightweight convolutional neural networks for Single Image Super-Resolution (SISR) have exhibited remarkable performance improvements in recent years. These models achieve excellent performance by relying on attention mechanisms that incorporate square-shaped convolutions to enhance feature representation. However, these approaches still suffer from redundancy which comes from square-shaped convolutional kernels and overlooks the utilization of multi-scale information. In this paper, we propose a novel attention mechanism called Multi-scale Strip-shaped convolution Attention (MSA), which utilizes three sets of differently sized depth-wise separable stripe convolution kernels in parallel to replace the redundant square-shaped convolution attention and extract multi-scale features. We also generalize MSA to other lightweight neural network models, and experimental results show that MSA outperforms other convolutional based attention mechanisms. Building upon MSA, we propose an Efficient Feature Extraction Block (EFEB), a lightweight block for SISR. Finally, based on EFEB, we propose a lightweight image super-resolution neural network named Multi-scale Strip-shaped convolution Attention Network (MSAN). Experiments demonstrate that MSAN outperforms existing state-of-the-art lightweight SR methods with fewer parameters and lower computational complexity. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Reinforced Res-Unet transformer for underwater image enhancement.

Author: Li, Peitong, Chen, Jiaying, and Cai, Chengtao
Subjects: *CONVOLUTIONAL neural networks, *TRANSFORMER models, *FEATURE selection, *IMAGE intensifiers, *LIGHT propagation
Abstract: Light propagation through water is subject to varying degrees of energy loss, causing captured images to display characteristics of color distortion, reduced contrast, and indistinct details and textures. The data-driven approach offers significant advantages over traditional algorithms, such as improved accuracy and reduced computational costs. However, challenges such as optimizing network architecture, refining coding techniques, and expanding database resources must be addressed to ensure the generation of high-quality reconstructed images across diverse tasks. In this paper, an underwater image enhancement network based on feature fusion is proposed named RUTUIE, which integrates feature fusion techniques. It leverages the strengths of both Resnet and U-shape architecture, primarily structured around a streamlined up-and-down sampling mechanism. Specifically, the U-shaped structure serves as the backbone of ResNet, equipped with two feature transformers at both the encoding and decoding ends, which are linked by a single-stage up-and-down sampling structure. This architecture is designed to minimize the omission of minor features during feature scale transformations. Furthermore, the improved Transformer encoder leverages a feature-level attention mechanism and the advantages of CNNs, endowing the network with both local and global perceptual capabilities. Then, we propose and demonstrate that embedding an adaptive feature selection module at appropriate locations can retain more learned feature representations. Moreover, the application of a previously proposed color transfer method for synthesizing underwater images and augmenting network training. Extensive experiments demonstrate that our work effectively corrects color casts, reconstructs the rich texture information in natural scenes, and improves the contrast. • To meet the needs of UIE task, a network based on feature fusion is constructed. 
• The network refines sampling architecture to avoid the loss of non-significant features. • Appropriate feature recombination to ensure smooth information flow. • The color transfer algorithm is first introduced for simulating underwater images. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. HorSR: High-order spatial interactions and residual global filter for efficient image super-resolution.

Author: Wang, Fengsui and Chu, Xi
Subjects: *FEATURE extraction, *HIGH resolution imaging, *CONVOLUTIONAL neural networks, *TRANSFORMER models
Abstract: • An efficient network based on transformer for efficient image super-resolution is proposed. • A recursive gated convolution is generalized for building a lightweight network. • A residual global filter block is designed to extract high-frequency image information. Recent advances in efficient image super-resolution (EISR) include convolutional neural networks, which exploit distillation and aggregation strategies with copious channel split and concatenation operations to fully exploit limited hierarchical features. In contrast, the Transformer network presents a challenge for EISR because multiheaded self-attention is a computationally demanding process. To respond to this challenge, this paper proposes replacing multiheaded self-attention in the Transformer network with global filtering and recursive gated convolution. This strategy allows us to design a high-order spatial interaction and residual global filter network for efficient image super-resolution (HorSR), which comprises three components: a shallow feature extraction module, a deep feature extraction module, and a high-quality image-reconstruction module. In particular, the deep feature extraction module comprises residual global filtering and recursive gated convolution blocks. The experimental results show that the HorSR network provides state-of-the-art performance with the lowest FLOPs of existing EISR methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

5. Learning content-aware feature fusion for guided depth map super-resolution.

Author: Zuo, Yifan, Wang, Hao, Xu, Yaping, Huang, Huimin, Huang, Xiaoshui, Xia, Xue, and Fang, Yuming
Subjects: *CONVOLUTIONAL neural networks, *DEPTH maps (Digital image processing)
Abstract: RGB-D data including paired RGB color images and depth maps is widely used in downstream computer vision tasks. However, compared with the acquisition of high-resolution color images, the depth maps captured by consumer-level sensors are always in low resolution. After decades of research, most state-of-the-art (SOTA) methods of depth map super-resolution cannot adaptively tune the guidance fusion for all feature positions by channel-wise feature concatenation with spatially sharing convolutional kernels. This paper proposes JTFNet to resolve this issue, which simulates the traditional Joint Trilateral Filter (JTF). Specifically, a novel JTF block is introduced to adaptively tune the fusion pattern between the color features and the depth features for all feature positions. Moreover, based on the variant of JTF block whose target features and guidance features are in the cross-scale shape, the fusion for depth features is performed in a bi-directional way. Therefore, the error accumulation along scales can be effectively mitigated by iterative HR feature guidance. Extensive experiments on mainstream synthetic and real datasets, i.e., Middlebury, NYU and ToF-Mark, show remarkable improvement of our JTFNet over the SOTA methods. • The Joint Trilateral Filter (JTF) block is proposed to adaptively tune the effect of guidance features. • We design two light subnetworks to learn kernel generation for color and depth features. • We propose Bidirectional Fusion blocks to fuse cross-scale depth features based on the JTF block. • Our results on Middlebury, NYU and ToF-Mark show remarkable improvement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Automatic detection of passable roads after floods in remote sensed and social media data.

Author: Ahmad, Kashif, Pogorelov, Konstantin, Riegler, Michael, Ostroukhova, Olga, Halvorsen, Pål, Conci, Nicola, and Dahyot, Rozenn
Subjects: *SOCIAL media, *REMOTE-sensing images, *FLOODS, *INFORMATION resources, *ARTIFICIAL neural networks, *RADARSAT satellites
Abstract: This paper addresses the problem of floods classification and floods aftermath detection based on both social media and satellite imagery. Automatic detection of disasters such as floods is still a very challenging task. The focus lies on identifying passable routes or roads during floods. Two novel solutions are presented, which were developed for two corresponding tasks at the MediaEval 2018 benchmarking challenge. The tasks are (i) identification of images providing evidence for road passability and (ii) differentiation and detection of passable and non-passable roads in images from two complementary sources of information. For the first challenge, we mainly rely on object and scene-level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object and scene-level features are then combined using early, late and double fusion techniques. To identify whether or not it is possible for a vehicle to pass a road in satellite images, we rely on Convolutional Neural Networks and a transfer learning-based classification approach. The evaluation of the proposed methods is carried out on the large-scale datasets provided for the benchmark competition. The results demonstrate significant improvement in the performance over the recent state-of-the-art approaches. Highlights: • This paper addresses the problem of floods classification and floods aftermath detection based on both social media and satellite imagery. • The tasks carried out in this work are (i) identification of images providing evidence for road passability and (ii) differentiation and detection of passable and non-passable roads in images from two complementary sources of information. • Mainly relies on the deep models for both tasks. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

7. DSRNet: Depth Super-Resolution Network guided by blurry depth and clear intensity edges.

Author: Lan, Hui and Jung, Cheolkon
Subjects: *CONVOLUTIONAL neural networks, *VIRTUAL reality
Abstract: Although high resolution (HR) depth images are required in many applications such as virtual reality and autonomous navigation, their resolution and quality generated by consumer depth cameras fall short of the requirements. Existing depth upsampling methods focus on extracting multiscale features of HR color image to guide low resolution (LR) depth upsampling, thus causing blurry and inaccurate edges in depth. In this paper, we propose a depth super-resolution (SR) network guided by blurry depth and clear intensity edges, called DSRNet. DSRNet differentiates effective edges from a number of HR edges with the guidance of blurry depth and clear intensity edges. First, we perform global residual estimation based on an encoder–decoder architecture to extract edge structure from HR color image for depth SR. Then, we distinguish effective edges from HR edges in the decoder side with the guidance of LR depth upsampling. To maintain edges for depth SR, we use intensity edge guidance that extracts clear intensity edges from HR image. Finally, we use residual loss to generate accurate high frequency (HF) residual and reconstruct HR depth maps. Experimental results show that DSRNet successfully reconstructs depth edges in SR results as well as outperforms the state-of-the-art methods in terms of visual quality and quantitative measurements. The proposed model and some test image pairs are available at https://github.com/lanhui-123/DSRNet. • We propose a depth SR network guided by blurry depth and clear intensity edges, called DSRNet. • DSRNet differentiates effective edges from a number of HR edges with the guidance of blurry depth and clear intensity edges. • DSRNet combines global residual estimation with LR depth upsampling and intensity edge guidance for depth SR. • DSRNet calculates the loss on the estimated residual map with its ground truth, thus generating an accurate residual map. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

8. KOHTD: Kazakh offline handwritten text dataset.

Author: Toiganbayeva, Nazgul, Kasem, Mahmoud, Abdimanap, Galymzhan, Bostanbekov, Kairat, Abdallah, Abdelrahman, Alimova, Anel, and Nurseitov, Daniyar
Subjects: *TEXT recognition, *HANDWRITING recognition (Computer science), *DEEP learning, *GENETIC algorithms, *WORD recognition, *MACHINE learning
Abstract: Despite the transition to digital information exchange, many documents, such as invoices, taxes, memos and questionnaires, historical data, and answers to exam questions, still require handwritten inputs. In this regard, there is a need to implement Handwritten Text Recognition (HTR) which is an automatic way to decrypt records using a computer. Handwriting recognition is challenging because of the virtually infinite number of ways a person can write the same message. For this proposal we introduce Kazakh handwritten text recognition research, a comprehensive dataset of Kazakh handwritten texts is necessary. This is particularly true given the lack of a dataset for handwritten Kazakh text. In this paper, we proposed our extensive Kazakh offline Handwritten Text dataset (KOHTD), which has 3000 handwritten exam papers and more than 140335 segmented images and there are approximately 922010 symbols. It can serve researchers in the field of handwriting recognition tasks by using deep and machine learning. We used a variety of popular text recognition methods for word and line recognition in our studies, including CTC-based and attention-based methods. In this paper, we have implemented state-of-the-art deep learning-based methods for Handwriting recognition for KOHTD dataset to create several strong baselines. The findings demonstrate KOHTD's diversity. Also, we proposed a Genetic Algorithm (GA) for line and word segmentation based on random enumeration of a parameter. The dataset and GA code are available at https://github.com/abdoelsayed2016/KOHTD. • Introduce Kazakh handwritten text recognition research, a comprehensive dataset of Kazakh handwritten texts is necessary. • Genetic Algorithm (GA) for line and word segmentation based on random enumeration of a parameter. • Implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines. 
• Solve a handwritten Kazakh interpretation task using well-known RNN models, such as Flor, Abdallah, Bluche, and Puigcerver HTR models. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

9. Dynamic and static feature fusion for increased accuracy in signature verification.

Author: Sadak, Mustafa Semih, Kahraman, Nihan, and Uludağ, Umut
Subjects: *FEATURE extraction, *STATISTICAL hypothesis testing, *FRAUD investigation, *SOUND recordings
Abstract: The success rate in offline signature verification studies has reached high and limiting levels recently. However, any increase in this performance is and will be highly valuable in terms of fraud detection. This study assesses the impact of the sound arising from the friction of pen and paper on handwritten signature verification. A dataset was built containing static data from the signature image and dynamic data from the signature sound by taking samples from 75 participants according to different combinations of pen, paper types, and mobile phone models for recording the sounds of the signatures with their internal microphones. It was aimed to increase verification success by fusing dynamic and static features. From the static data, the features are extracted by the LBP and SIFT algorithms. For dynamic data, spectral flux onset envelopes and spectral centroids of audio signals are plotted and converted to image files. Thus, the dynamic data of the signature sound signal became static data and as in the static image of the signature, feature extraction was performed with the LBP and SIFT algorithms. Classification is performed with the OC-SVM algorithm. Moreover, instead of LBP and SIFT features, another verification method with the deep features obtained with a CNN-based model was also proposed and comparatively analyzed. Test results indicate that the aforementioned fusion of these two traits leads to increased signature verification success rates (statistical significance test results are provided), without incurring large costs, considering the sensor availability and acquisition times. • The sound arising from the friction of pen and paper during the signing is assessed. • A dataset containing sound files and signature image files was built with 75 signers. • The proposed method applied for only sound, only image, and fusion of these two data. • When verification is made with only sound data, the EERs are between 0.09% and 6%. 
• Results show that the fusion of signature sound and signature image decreases EERs. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

10. Traffic sign detection algorithm based on improved YOLOv4-Tiny.

Author: Yao, Yingbiao, Han, Li, Du, Chenjie, Xu, Xin, and Jiang, Xianyang
Subjects: *TRAFFIC monitoring, *TRAFFIC signs & signals, *CONVOLUTIONAL neural networks, *FEATURE extraction, *TRACKING algorithms
Abstract: There are three problems in YOLOv4-Tiny when it is used for traffic sign detection: the feature pyramid network fails to fuse high-level and low-level features sufficiently, the importance of low-level features for small object detection is not considered, and the ability to extract the features of small objects in the backbone network is not strong. Focusing on these problems, this paper proposes an improved YOLOv4-Tiny for real-time traffic sign detection. Firstly, this paper improves YOLOv4-Tiny's feature fusion method and proposes an adaptive feature pyramid network (AFPN), which aims to adaptively fuse the two feature layers with different scales. Secondly, two receptive field blocks (RFB) are added after the two feature layers of the backbone network. These two RFBs are composed of multi-branch structures and dilated convolution with different dilation rates, which can enhance the feature extraction ability of the backbone network. The CCTSDB and GTSDB datasets are used to evaluate the effectiveness of the improved method. The experimental results show that our proposed network is superior to the original network in the precision, recall rate, and mAP. In addition, compared with other state-of-the-art approaches on traffic sign detection, our proposed network has good comprehensive performance in accuracy and speed. The above results show that our improved method is effective in improving the performance of traffic sign detection. • A novel feature fusion method is proposed based on an Adaptive Feature Pyramid Network (AFPN). AFPN can better fuse the output results of the two scale feature layers of the backbone network, so the fused features have more semantic information and location information. • We proposed to add a receptive field block (RFB) [16] after the output layer of the backbone network. RFBs use a multi-branch structure and a dilated convolution layer to superimpose different scale receptive fields and enhance the feature extraction ability of the convolutional neural network, thereby improving the detection accuracy of the network. • The experimental results show that compared with the original network (i.e., YOLOv4-tiny), the proposed network improves the precision, recall rate, and mAP by 2.62%, 2.17%, and 1.34%, respectively, on the CCTSDB dataset, and works better on the CCTSDB_s dataset with smaller traffic signs. At the same time, the frame processing rate of the proposed network is still up to 145.7 FPS. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

11. A dual fusion deep convolutional network for blind universal image denoising.

Author: Lyu, Zhiyu, Chen, Yan, Sun, Haojun, and Hou, Yimin
Subjects: *IMAGE denoising, *CONVOLUTIONAL neural networks
Abstract: Blind image denoising and edge-preserving are two primary challenges to recover an image from low-level vision to high-level vision. Blind denoising requires that a single denoiser can denoise images with any intensity of noise, and it has practical utility since accurate noise levels cannot be acquired from realistic images. On the other hand, edge preservation can provide more image features for subsequent processing, which is also important for the denoising. In this paper, we propose a novel blind universal image denoiser to remove synthetic and realistic noise while preserving the image texture. The denoiser consists of a noise network and a prior network in parallel, and then a fusion block is used to give the weight between these two networks to balance computation cost and denoising performance. We also use the Non-subsampled Shearlet Transform (NSST) to enlarge the size of the receptive field to obtain more detailed information. Extensive denoising experiments on synthetic images and realistic images show the effectiveness of our denoiser. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. EFFNet: Element-wise feature fusion network for defect detection of display panels.

Author: He, Feng, Tan, Jiubin, Wang, Weibo, Liu, Shutian, Zhu, Yuemin, and Liu, Zhengjun
Subjects: *CONVOLUTIONAL neural networks, *LEARNING strategies, *FEATURE extraction, *ARRAY processing, *COMPUTATIONAL complexity, *DEEP learning
Abstract: Online or real-time defect detection of display panels after array process is of paramount importance for quality control and yield rate improvement of products in display industry. However, owing to the limitation in feature representation, the performances of traditional defect detection methods are not satisfactory. This paper develops a novel element-wise feature fusion network (EFFNet) to solve the issue and achieve high-accuracy real-time defect detection of display panels. The method adopts a transfer learning and fine-tuning strategy for feature extraction layers and a decoder with relatively less computational complexity. Particularly, a feature fusion module based on element-wise addition of pyramid features is proposed in skip connection to improve detection efficiency and accuracy. Our method is compared with many state-of-the-art CNN-based models. Additionally, the effects of training dataset size, motion blur, and different backgrounds on the performance of the proposed method are investigated. Extensive experiments, including the ablation study, demonstrate that the developed network can accurately detect defects with complex textures, ambiguous boundaries and low contrast. It also has good robustness against motion blur. It outperforms state-of-the-art methods in terms of mIoU, mPA, and F1-Measure. Moreover, it is able to detect defects at speeds of up to 159 fps with input images of size 256 × 256 pixels. • A deep learning-based method for real-time defect detection of display panels. • An element-wise feature fusion module (EFFM) for the feature decoder. • A comprehensive study of the proposed network and transfer learning strategy. • A highly efficient, effective and robust model for challenging objects. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

13. Vehicle re-identification in still images: Application of semi-supervised learning and re-ranking.

Author: Wu, Fangyu, Yan, Shiyang, Smith, Jeremy S., and Zhang, Bailing
Subjects: *SUPERVISED learning, *K-nearest neighbor classification, *COMPUTER vision, *ARTIFICIAL neural networks, *VEHICLES
Abstract: Vehicle re-identification (re-ID), namely, finding exactly the same vehicle from a large number of vehicle images, remains a great challenge in computer vision. Most existing vehicle re-ID approaches follow a fully-supervised learning methodology, in which sufficient labeled training data is required. However, this limits their scalability to realistic applications, due to the high cost of data labeling. In this paper, we adopted a Generative Adversarial Network (GAN) to generate unlabeled samples and enlarge the training set. A semi-supervised learning scheme with the Convolutional Neural Networks (CNN) was proposed accordingly, which assigns a uniform label distribution to the unlabeled images to regularize the supervised model and improve the performance of the vehicle re-ID system. Besides, an improved re-ranking method based on the Jaccard distance and k-reciprocal nearest neighbors is proposed to optimize the initial rank list. Extensive experiments over the benchmark datasets VeRi-776, VehicleID and VehicleReID have demonstrated that the proposed method outperforms the state-of-the-art approaches for vehicle re-ID. • We propose a novel semi-supervised learning for vehicle re-ID task. • We present a re-ranking method which is introduced for the first time for the vehicle re-ID task. • Achieve state-of-the-art results on two benchmark datasets, VeRi-776 and VehicleID. • We apply the single shot setting on the VehicleReID and obtain promising results. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

14. OFF-ApexNet on micro-expression recognition system.

Author: Gan, Y.S., Liong, Sze-Teng, Yau, Wei-Chuen, Huang, Yen-Chang, and Tan, Lit-Ken
Subjects: *CONVOLUTIONAL neural networks, *OPTICAL flow, *HUMAN facial recognition software, *COMPUTER vision, *FEATURE extraction
Abstract: When a person attempts to conceal an emotion, the genuine emotion is manifest as a micro-expression. Exploration of automatic facial micro-expression recognition systems is relatively new in the computer vision domain. This is due to the difficulty in implementing optimal feature extraction methods to cope with the subtlety and brief motion characteristics of the expression. Most of the existing approaches extract the subtle facial movements based on hand-crafted features. In this paper, we address the micro-expression recognition task with a convolutional neural network (CNN) architecture, which well integrates the features extracted from each video. We introduce the Optical Flow Features from Apex frame Network (OFF-ApexNet). This is a new feature descriptor that combines the optical flow guided context with the CNN. Firstly, we obtain the location of the apex frame from each video sequence as it portrays the highest intensity of facial motion among all frames. Then, the optical flow information is obtained from the apex frame and a reference frame (i.e., onset frame). Finally, the optical flow features are fed into a pre-designed CNN model for further feature enhancement as well as to carry out the expression classification. To evaluate the effectiveness of OFF-ApexNet method, comprehensive evaluations are conducted on three public spontaneous micro-expression datasets (i.e., SMIC, CASME II and SAMM). The promising recognition result suggests that the proposed method can optimally describe the significant micro-expression details. In particular, we report that, in a multi-database with leave-one-subject-out cross-validation (LOSOCV) experimental protocol, the recognition performance reaches 74.60% of recognition accuracy and F-measure of 71.04%. We also note that this is the first work that performs cross-dataset validation on three databases in this domain.
Highlights: • Only two frames from each video were utilized to represent significant motion features. • A feature extractor that incorporates both the handcrafted and data-driven features was proposed. • Promising performances were obtained in three micro-expression databases. • A comparison of the proposed method with the state of the art was reported. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

15. Tensor rank learning in CP decomposition via convolutional neural network.

Author: Zhou, Mingyi, Liu, Yipeng, Long, Zhen, Chen, Longxi, and Zhu, Ce
Subjects: *CONVOLUTIONAL neural networks, *RETINAL blood vessels, *ARTIFICIAL neural networks, *TARDINESS
Abstract: Tensor factorization is a useful technique for capturing the high-order interactions in data analysis. One assumption of tensor decompositions is that a predefined rank should be known in advance. However, the tensor rank prediction is an NP-hard problem. The CANDECOMP/PARAFAC (CP) decomposition is a typical one. In this paper, we propose two methods based on convolutional neural network (CNN) to estimate CP tensor rank from noisy measurements. One applies CNN to the CP rank estimation directly. The other one adds a pre-decomposition for feature acquisition, which inputs rank-one components to CNN. Experimental results on synthetic and real-world datasets show the proposed methods outperform state-of-the-art methods in terms of rank estimation accuracy. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

16. Robust contrast enhancement forensics based on convolutional neural networks.

Author: Shan, Wuyang, Yi, Yaohua, Huang, Ronggang, and Xie, Yong
Subjects: *SIGNAL convolution, *IMAGE analysis, *JPEG (Image coding standard), *ROBUST control, *HUMAN fingerprints
Abstract: Contrast enhancement (CE) is frequently applied to conceal traces of forgery and therefore can provide indirect forensic evidence of tampering when investigating composite images. The performance of existing CE forensic methods, however, suffers fatal degradation when detecting enhanced images stored in the JPEG format. In this paper, we propose a new JPEG-robust CE forensic method based on a modified convolutional neural network (CNN). Unlike traditional CNNs, the first layer of our CNN architecture, termed a GLCM layer, accepts a potentially enhanced image as the input and outputs its Gray-Level Co-occurrence Matrix (GLCM), which contains CE fingerprints. A cropping layer is used for noise reduction in GLCMs. In addition, the output of the cropping layer becomes the input for extracting multiple features for further classification using a tailor-made CNN, which extracts residual CE features under JPEG compression. Extensive experimental results show that the proposed method achieves significant improvements in both global and local CE detection. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

17. Deep learning for printed document source identification.

Author: Tsai, Min-Jen, Tao, Yu-Han, and Yuadi, Imam
Subjects: *DIGITAL image processing, *PATTERN recognition systems, *DEEP learning, *INFORMATION technology, *FEATURE selection, *FEATURE extraction
Abstract: Due to the rapid development of information technology and the wide use of the Internet, information is easily obtained in digital form. Digital content can be freely printed into documents owing to the convenience and accessibility of printers. On the other hand, printed documents can be illegally manipulated for criminal purposes such as forged documents, counterfeit currency, copyright infringement, and so on. Therefore, developing an efficient and appropriate forensic tool to identify the source of printed documents is an important task. Currently, forensic systems using statistical methods and support vector machine (SVM) technology are able to identify the source printer for text and image documents. Such an approach belongs to the category of shallow machine learning, with human interaction during the stages of feature extraction, feature selection and data pre-processing. In this paper, a deep learning system based on Convolutional Neural Networks (CNNs), which can learn the features automatically, is developed to solve this complex image classification problem. Systematic experiments have been performed for both systems. For microscopic documents, the feature-based SVM system outperforms the deep learning system by a limited margin. For scanned documents, both systems perform equally well with high accuracy. Both systems should be constantly evaluated and compared in the interest of universal utilization. Highlights • Deep learning can learn the texture features automatically. • For microscopic documents, feature-based SVM classification still performs better. • For scanned documents, both systems perform equally well with high accuracy. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

18. Multi-scale deep feature fusion based sparse dictionary selection for video summarization.

Author: Wu, Xiao, Ma, Mingyang, Wan, Shuai, Han, Xiuxiu, and Mei, Shaohui
Subjects: *VIDEO summarization, *DEEP learning, *COMPUTER vision, *CONVOLUTIONAL neural networks, *GREEDY algorithms
Abstract: The explosive growth of video data constitutes a series of new challenges in computer vision, and the function of video summarization (VS) is becoming more and more prominent. Recent works have shown the effectiveness of sparse dictionary selection (SDS) based VS, which selects a representative frame set to sufficiently reconstruct a given video. Existing SDS based VS methods use conventional handcrafted features or single-scale deep features, which could diminish their summarization performance due to the underutilization of frame feature representation. Deep learning techniques based on convolutional neural networks (CNNs) exhibit powerful capabilities among various vision tasks, as the CNN provides excellent feature representation. Therefore, in this paper, a multi-scale deep feature fusion based sparse dictionary selection (MSDFF-SDS) is proposed for VS. Specifically, multi-scale features include the directly extracted features from the last fully connected layer and the global average pooling (GAP) processed features from intermediate layers, then VS is formulated as a problem of minimizing the reconstruction error using the multi-scale deep feature fusion. In our formulation, the contribution of each scale of features can be adjusted by a balance parameter, and the row-sparsity consistency of the simultaneous reconstruction coefficient is used to select as few keyframes as possible. The resulting MSDFF-SDS model is solved by using an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets demonstrate that the proposed MSDFF-SDS improves the F-score of keyframe based summarization more than 3% compared with the existing SDS methods, and performs better than most deep-learning methods for skimming based summarization. • Usage of multi-scale features from neural networks for video summarization. • Multi-scale deep feature fusion based sparse dictionary selection. • Efficient greedy optimization for video summarization. 
• Explorations of feature configurations, network architectures, and pooling strategies. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

19. PIMnet: A quality enhancement network for compressed videos with prior information modulation.

Author: Yang, Mingyi, Zhou, Xile, Yang, Fuzheng, Zhou, Mingcai, and Wang, Hao
Subjects: *CONVOLUTIONAL neural networks, *VIDEOS
Abstract: In this paper, we propose a quality enhancement network for compressed videos, named as PIMnet, which can effectively use the spatio-temporal information of multiple frames to improve the video quality. The main idea of PIMnet is to use the Quantization Parameter (QP) and Delta Picture Order Count (Δ POC) of multiple input frames to modulate the network, where QP can reflect the quality of frames and Δ POC can reflect the temporal distance between neighboring frames and the current frame. In PIMnet, the modulated deformable convolution (DCNv2) is performed to align and fuse multiple input frames. The offsets of DCNv2 for alignment are obtained by the flow-guided offset prediction module and the masks of DCNv2 for fusion are obtained by the mask prediction module. The offset and mask prediction modules are modulated by prior information. Afterwards, the features obtained by DCNv2 are further used by the QE module to compute the enhanced result. Extensive experiments demonstrate that the proposed PIMnet can achieve superior performance in quality enhancement. • Quality enhancement network for compressed videos with multi-frame input. • Modulate the network with QP and Δ POC of compressed videos. • Better explore the spatiotemporal information from multiple compressed frames. • Flow-guided offset prediction. • Multi-scale features fusing. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

20. Fluorescence microscopy images denoising via deep convolutional sparse coding.

Author: Chen, Ge, Wang, Jianjun, Wang, Hailin, Wen, Jinming, Gao, Yi, and Xu, Yongjian
Subjects: *IMAGE denoising, *CONVOLUTIONAL neural networks
Abstract: Fluorescence microscopy images captured in low light and short exposure time conditions are always contaminated by photons and readout noises, which reduce the fluorescence microscopy images quality. In most cases, this kind of noise can be modeled as Poisson–Gaussian noise. Correspondingly, its denoising task has always been a hot but challenging topic in recent years. In this paper, by integrating model-driven and learning-driven methodologies, we propose an end-to-end supervised neural network for fluorescence microscopy images denoising, named MCSC-net, which embeds the multi-layer learned iterative soft threshold algorithm (ML-LISTA) into deep convolutional neural network (DCNN). Our approach not only uses the strong learning ability of DCNN to adaptively update all parameters in the ML-LISTA, but also introduces dilated convolution into network training without additional parameters to improve denoising performance. In addition, compared with several related methods on a real data set of fluorescence microscopy images, MCSC-net achieves the best denoising effects both in qualitative and quantitative aspects, which shows its strong appeal in practical denoising applications. • The network is an extension of pursuit algorithm (ML-LISTA). • The network can be deepened without introducing additional parameters. • The introduction of dilated convolution improves the denoising effect. • Our method achieves attractive results in all the comparison methods. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

21. On the vulnerability of deep learning to adversarial attacks for camera model identification.

Author: Marra, F., Gragnaniello, D., and Verdoliva, L.
Subjects: *DEEP learning, *ARTIFICIAL neural networks, *ROBUST control, *JPEG (Image coding standard), *ONLINE social networks
Abstract: Camera model identification is a fundamental task for many investigative activities, and is drawing great attention in the research community. In this context, convolutional neural networks (CNN) are expected to provide a significant performance gain over the current state of the art, as already happened for a wide range of image processing applications. However, recent studies enlightened the vulnerability of CNNs to adversarial attacks, casting shadows on their reliability for critical applications. In this paper, we investigate the robustness to adversarial attacks of CNN-based methods for camera model identification. Several networks and attack methods are considered, both when the attacker has complete knowledge of the network and when only the training set is available. In addition, the analysis concerns both original and JPEG compressed images, to simulate a social network environment. The experiments, carried out on a publicly available dataset with images coming from 29 different camera models, shed some light on the suitability of CNN-based approaches for this task. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

22. Deep convolutional image retrieval: A general framework.

Author: Tzelepi, Maria and Tefas, Anastasios
Subjects: *ARTIFICIAL neural networks, *IMAGE retrieval, *FUNCTIONAL magnetic resonance imaging, *IMAGE segmentation
Abstract: In this paper a Convolutional Neural Network framework for Content Based Image Retrieval is proposed. We employ a deep CNN model to obtain the feature representations from the activations of the deepest layers and we retrain the network in order to produce more efficient image descriptors, relying on the available information. Our method suggests three basic model retraining approaches. That is, the Fully Unsupervised Retraining, if no information except from the dataset itself is available, the Retraining with Relevance Information, if the labels of the dataset are available, and the Relevance Feedback based Retraining, if feedback from users is available. We propose these approaches independently or in a pipeline, where each retraining approach operates as a pretraining step to the subsequent one. We also apply a query expansion method with spatial reranking on top of these approaches in order to boost the retrieval performance. The experimental evaluation on six publicly available image retrieval datasets indicates the effectiveness of the proposed method in learning more efficient representations for the retrieval task, outperforming other CNN-based retrieval techniques, as well as conventional hand-crafted feature-based approaches. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

23. A novel contrast enhancement forensics based on convolutional neural networks.

Author: Sun, Jee-Young, Kim, Seung-Wook, Lee, Sang-Won, and Ko, Sung-Jea
Subjects: *IMAGE segmentation, *IMAGE quality analysis, *DIGITAL image processing, *FORENSIC sciences, *IMAGE reconstruction
Abstract: Contrast enhancement (CE), one of the most popular digital image retouching technologies, is frequently utilized for malicious purposes. As a consequence, verifying the authenticity of digital images in CE forensics has recently drawn significant attention. Current CE forensic methods can be performed using relatively simple handcrafted features based on first-and second-order statistics, but these methods have encountered difficulties in detecting modern counter-forensic attacks. In this paper, we present a novel CE forensic method based on convolutional neural network (CNN). To the best of our knowledge, this is the first work that applies CNN to CE forensics. Unlike the conventional CNN in other research fields that generally accepts the original image as its input, in the proposed method, we feed the CNN with the gray-level co-occurrence matrix (GLCM) which contains traceable features for CE forensics, and is always of the same size, even for input images of different resolutions. By learning the hierarchical feature representations and optimizing the classification results, the proposed CNN can extract a variety of appropriate features to detect the manipulation. The performance of the proposed method is compared to that of three conventional forensic methods. The comparative evaluation is conducted within a dataset consisting of unaltered images, contrast-enhanced images, and counter-forensically attacked images. The experimental results indicate that the proposed method outperforms conventional forensic methods in terms of forgery-detection accuracy, especially in dealing with counter-forensic attacks. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
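
Note on entry 23: the abstract states that the GLCM is always of the same size, even for input images of different resolutions. A minimal sketch of that property follows. It assumes 8-bit grayscale input and a single non-negative pixel offset; the function name `glcm` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def glcm(image, levels=256, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dy, dx).

    Assumes integer gray levels in [0, levels) and dx, dy >= 0.
    """
    img = np.asarray(image, dtype=np.intp)
    h, w = img.shape
    # Reference pixels and their neighbours at the given offset.
    ref = img[:h - dy, :w - dx].ravel()
    nbr = img[dy:, dx:].ravel()
    m = np.zeros((levels, levels), dtype=np.int64)
    # Unbuffered accumulation so repeated (i, j) pairs are all counted.
    np.add.at(m, (ref, nbr), 1)
    return m

rng = np.random.default_rng(0)
small = rng.integers(0, 256, size=(32, 48))
large = rng.integers(0, 256, size=(480, 640))
# Both matrices are levels x levels, whatever the image resolution.
print(glcm(small).shape, glcm(large).shape)
```

The matrix size depends only on the number of gray levels, which is why a network fed with it can use a fixed input shape.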

24. End-to-end subtitle detection and recognition for videos in East Asian languages via CNN ensemble.

Author: Xu, Yan, Shan, Siyuan, Qiu, Ziming, Jia, Zhipeng, Shen, Zhengyang, Wang, Yipei, Shi, Mengfei, and Chang, Eric I-Chao
Subjects: *ARTIFICIAL neural networks, *ASIAN languages, *DYNAMIC programming, *COGNITIVE ability, *TASK performance
Abstract: In this paper, we propose an innovative end-to-end subtitle detection and recognition system for videos in East Asian languages. Our end-to-end system consists of multiple stages. Subtitles are firstly detected by a novel image operator based on the sequence information of consecutive video frames. Then, an ensemble of Convolutional Neural Networks (CNNs) trained on synthetic data is adopted for detecting and recognizing East Asian characters. Finally, a dynamic programming approach leveraging language models is applied to constitute results of the entire body of text lines. The proposed system achieves average end-to-end accuracies of 98.2% and 98.3% on 40 videos in Simplified Chinese and 40 videos in Traditional Chinese respectively, which is a significant outperformance of other existing methods. The near-perfect accuracy of our system dramatically narrows the gap between human cognitive ability and state-of-the-art algorithms used for such a task. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

25. Multi-level channel attention excitation network for human action recognition in videos.

Author: Wu, Hanbo, Ma, Xin, and Li, Yibin
Subjects: *RECOGNITION (Psychology), *HUMAN behavior, *HUMAN activity recognition, *CONVOLUTIONAL neural networks, *WIRELESS channels
Abstract: The channel attention mechanism has continuously attracted strong interest and shown great potential in enhancing the performance of deep CNNs. However, when applied to the video-based human action recognition task, most existing methods generally learn channel attention at frame level, which ignores the temporal dependencies and may limit the recognition performance. In this paper, we propose a novel multi-level channel attention excitation (MCAE) module to model the temporal-related channel attention at both frame and video levels. Specifically, based on video convolutional feature maps, frame-level channel attention (FCA) is generated by exploring time-channel correlations, and video-level channel attention (VCA) is generated by aggregating global motion variations. MCAE firstly recalibrates video feature responses with frame-wise FCA, and then activates the motion-sensitive channel features with motion-aware VCA. The MCAE module learns the channel discriminability from multiple levels and can act as a guidance to facilitate efficient spatiotemporal feature modeling in activated motion-sensitive channels. It can be flexibly embedded into 2D networks with very limited extra computation cost to construct MCAE-Net, which effectively enhances the spatiotemporal representation of 2D models for the video action recognition task. Extensive experiments on five human action datasets show that our method achieves superior or very competitive performance compared with the state of the art, which demonstrates the effectiveness of the proposed method for improving the performance of human action recognition. • Learning temporal-related channel attention at both frame and video levels. • Multilevel channel attention activates discriminative action-related channels. • Efficient spatiotemporal feature modeling in motion-salient feature channels. • Spatiotemporal dependency together with channel attention for action recognition. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

26. Diving deeper into underwater image enhancement: A survey.

Author: Anwar, Saeed and Li, Chongyi
Subjects: *IMAGE intensifiers, *DEEP diving, *UNDERWATER exploration, *CONVOLUTIONAL neural networks, *DEEP learning, *IMAGE processing
Abstract: The powerful representation capacity of deep learning has made it inevitable for the underwater image enhancement community to employ its potential. The exploration of deep underwater image enhancement networks is increasing over time; hence, a comprehensive survey is the need of the hour. In this paper, our main aim is two-fold, (1): to provide a comprehensive and in-depth survey of the deep learning-based underwater image enhancement, which covers various perspectives ranging from algorithms to open issues, and (2): to conduct a qualitative and quantitative comparison of the deep algorithms on diverse datasets to serve as a benchmark, which has been barely explored before. We first introduce the underwater image formation models, which are the base of training data synthesis and design of deep networks, and also helpful for understanding the process of underwater image degradation. Then, we review deep underwater image enhancement algorithms, and a glimpse of some of the aspects of the current networks is presented, including architecture, parameters, training data, loss function, and training configurations. We also summarize the evaluation metrics and underwater image datasets. Following that, a systematically experimental comparison is carried out to analyze the robustness and effectiveness of deep algorithms. Meanwhile, we point out the shortcomings of current benchmark datasets and evaluation metrics. Finally, we discuss several unsolved open issues and suggest possible research directions. We hope that all efforts done in this paper might serve as a comprehensive reference for future research and call for the development of deep learning-based underwater image enhancement. • We provide a thorough review of the recent techniques. • We introduce a new taxonomy of the algorithms based on their structural differences. • A comprehensive analysis is performed based on different architectural aspects. 
• We provide a systematic evaluation of algorithms on three publicly available datasets. • We discuss the challenges and provide insights into possible future directions. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

27. Action Recognition from Still Images Based on Deep VLAD Spatial Pyramids.

Author: Yan, Shiyang, Smith, Jeremy S., and Zhang, Bailing
Subjects: *IMAGE processing, *COMPUTER vision, *ARTIFICIAL neural networks, *COMPUTER algorithms, *PYRAMIDS (Geometry), *CONTEXTUAL analysis
Abstract: The recognition of human actions in images is a challenging task in computer vision. In many applications, actions can be exploited as mid-level semantic features for high level tasks. Actions often appear in fine-grained categorization, where the differences between two categories are small. Recently, deep learning approaches have achieved great success in many vision tasks, e.g., image classification, object detection, and attribute and action recognition. Also, the Bag-of-Visual-Words (BoVW) and its extensions, e.g., Vector of Locally Aggregated Descriptors (VLAD) encoding, have proved to be powerful in identifying global contextual information. In this paper, we propose a new action recognition scheme by combining the powerful feature representational capabilities of Convolutional Neural Networks (CNNs) with the VLAD encoding scheme. Specifically, we encode the CNN features of image patches generated by a region proposal algorithm with VLAD and subsequently represent an image by the compact code, which not only captures the more fine-grained properties of the images but also contains global contextual information. To identify the spatial information, we exploit the spatial pyramid representation and encode CNN features inside each pyramid. Experiments have verified that the proposed schemes are not only suitable for action recognition but also applicable to more general recognition tasks such as attribute classification. The proposed scheme is validated with four benchmark datasets with competitive mAP results of 88.5% on the Stanford 40 Action dataset, 81.3% on the People Playing Musical Instrument dataset, 90.4% on the Berkeley Attributes of People dataset and 74.2% on the 27 Human Attributes dataset. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

28. Robust object proposals re-ranking for object detection in autonomous driving using convolutional neural networks.

Author: Pham, Cuong Cao and Jeon, Jae Wook
Subjects: *NEURAL circuitry, *STEREO vision (Computer science), *IMAGE processing, *LOCALIZATION (Mathematics), *ALGORITHMS
Abstract: Object proposals have recently emerged as an essential cornerstone for object detection. The current state-of-the-art object detectors employ object proposals to detect objects within a modest set of candidate bounding box proposals instead of exhaustively searching across an image using the sliding window approach. However, achieving high recall and good localization with few proposals is still a challenging problem. The challenge becomes even more difficult in the context of autonomous driving, in which small objects, occlusion, shadows, and reflections usually occur. In this paper, we present a robust object proposals re-ranking algorithm that effectively re-ranks candidates generated from a customized class-independent 3DOP (3D Object Proposals) method using a two-stream convolutional neural network (CNN). The goal is to ensure that those proposals that accurately cover the desired objects are amongst the few top-ranked candidates. The proposed algorithm, which we call DeepStereoOP, exploits not only RGB images as in the conventional CNN architecture, but also depth features including disparity map and distance to the ground. Experiments show that the proposed algorithm outperforms all existing object proposal algorithms on the challenging KITTI benchmark in terms of both recall and localization. Furthermore, the combination of DeepStereoOP and Fast R-CNN achieves one of the best detection results on all three KITTI object classes. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

29. A simple framework to leverage state-of-the-art single-image super-resolution methods to restore light fields.

Author: Farrugia, Reuben A. and Guillemot, Christine
Subjects: *SINGULAR value decomposition, *OPTICAL flow, *HIGH resolution imaging, *ARTIFICIAL neural networks, *MARKOV random fields
Abstract: This paper describes a simple framework allowing us to leverage state-of-the-art single image super-resolution (SISR) techniques into light fields, while taking into account specific light field geometrical constraints. The idea is to first compute a representation compacting most of the light field energy into as few components as possible. This is achieved by aligning the light field using optical flow and then by decomposing the aligned light field using singular value decomposition (SVD). The principal basis captures the information that is coherent across all the views, while the other basis contain the high angular frequencies. Super-resolving this principal basis using an SISR method allows us to super-resolve all the information that is coherent across the entire light field. In this paper, to demonstrate the effectiveness of the approach, we have used the very deep super resolution (VDSR) method, which is one of the leading SISR algorithms, to restore the principal basis. The information restored in the principal basis is then propagated to restore all the other views using the computed optical flow. This framework allows the proposed light field super-resolution method to inherit the benefits of the SISR method used. Experimental results show that the proposed method is competitive, and most of the time superior, to recent light field super-resolution methods in terms of both PSNR and SSIM quality metrics, with a lower complexity. Moreover, the subjective results demonstrate that our method manages to restore sharper light fields which enables to generate refocused images of higher quality. • The proposed method is a simple framework that extends single image super-resolution for light field super-resolution. • This is achieved by compacting the energy of the light field and apply single image super-resolution on the principal basis. 
• Experimental results show that the proposed method is competitive with recent light field super-resolution methods. • Moreover, the proposed method achieves better subjective results even when considering non-Lambertian surfaces. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
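
Note on entry 29: the energy compaction step described in the abstract (decomposing an aligned light field with SVD so that a principal basis carries the information coherent across views) can be sketched as follows. This is an illustration only. It assumes the views are already aligned, so the optical flow step is skipped, and the function name `principal_basis` and the synthetic data are hypothetical, not the authors' implementation.

```python
import numpy as np

def principal_basis(views):
    """Decompose aligned views of shape (n_views, H, W) with SVD.

    Returns the first basis image, the factors (U, s, Vt), and the
    fraction of total energy carried by each component.
    """
    n, h, w = views.shape
    A = views.reshape(n, h * w).T              # one column per view
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    energy = s**2 / np.sum(s**2)
    return U[:, 0].reshape(h, w), (U, s, Vt), energy

# Synthetic stand-in for an aligned light field: nine views of the
# same scene that differ only by small perturbations.
rng = np.random.default_rng(0)
scene = rng.random((8, 8))
views = np.stack([scene + 0.01 * rng.standard_normal((8, 8)) for _ in range(9)])

basis, (U, s, Vt), energy = principal_basis(views)
print(energy[0])  # close to 1 when the views are well aligned
```

When alignment is good, nearly all energy falls into the first component, so restoring that single basis image with a single-image method affects every view.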

30. Asymmetry-aware bilinear pooling in multi-modal data for head pose estimation.

Author: Chen, Jiazhong, Li, Qingqing, Ren, Dakai, Cao, Hua, and Ling, Hefei
Subjects: *CONVOLUTIONAL neural networks, *INFORMATION asymmetry
Abstract: The head pose in the roll and yaw directions is determined by the asymmetric appearance of human faces, and the contextual information of this asymmetric appearance is encoded in a head pose related neighborhood. However, the CNNs used in existing head pose estimation methods often operate uniformly on the features of the full image. Thus it is hard for those methods to collect the contextual information of such asymmetric appearance. To address this issue, this paper proposes a novel head pose estimation method that can perceive the asymmetric appearance of human faces. Specifically, the awareness of such asymmetry is achieved by local pairwise feature interaction in the head pose related neighborhood via bilinear pooling. Evaluations on two public datasets demonstrate that our method achieves promising results. • We propose a novel bilinear pooling for multi-modal data head pose estimation. • Our method can perceive the asymmetric appearance in the head pose related neighborhood. • We provide an extensive ablation study to explain the achieved performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

31. Graph-based discriminative features learning for fine-grained image retrieval.

Author: Sun, Han, Lang, Wenxi, Xu, Can, Liu, Ningzhong, and Zhou, Huiyu
Subjects: *IMAGE retrieval, *CONVOLUTIONAL neural networks, *IMAGE databases, *K-nearest neighbor classification, *COMPUTER vision
Abstract: Fine-grained image retrieval, which aims to retrieve images with the same subcategories from general visual categories, has gradually become a hot topic in computer vision. Though fine-grained image retrieval has made a breakthrough with the development of convolutional neural networks, its performance is still limited by the low discriminative feature embedding. To solve this problem, most prior works focus on mining more discriminative features with various strategies. In this paper, we propose a novel graph-based discriminative features learning network for fine-grained image retrieval (GDF-Net). We first design a global fine-grained feature aggregation module, which reconstructs the discriminative features through capturing context correlation based on a K-Nearest Neighbor graph. To reduce storage overhead and speed up retrieval, we further design a semantic hash encoding module, which generates a semantically compact hash code under the guidance of Cauchy quantization loss and bit balance loss. Validated by extensive experiments and ablation studies, our method consistently outperforms state-of-the-art generic retrieval methods as well as fine-grained retrieval methods on three datasets, namely CUB Birds, Stanford Dogs and Stanford Cars. • We propose the GDF-Net framework to solve fine-grained image retrieval problems by mining correlations between discriminative features and constructing hash codes. • We design GFFAM to explore the interdependencies among feature vectors based on a graph convolutional network, which will guide the fusion of independent discriminative elements to enhance the global fine-grained features. • We design SHECM to learn a compact hash code to improve the retrieval performance by combining the classification loss with Cauchy cross-entropy loss and bit balance loss. • Experimental results on three datasets achieve state-of-the-art results. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

32. Hyperspectral image super-resolution using cluster-based deep convolutional networks.

Author: Zou, Changzhong and Zhang, Can
Subjects: *SPECTRAL imaging, *MULTISPECTRAL imaging, *HIGH resolution imaging, *CONVOLUTIONAL neural networks, *SPATIAL resolution, *DEEP learning
Abstract: In recent years, deep convolutional neural networks (CNNs) have been widely exploited for hyperspectral image (HSI) super-resolution and have obtained remarkable performance. However, most of the existing CNN-based methods have two main problems. One is to use two-dimensional (2D) convolution to extract spatial information without paying attention to the mining of spectral information of hyperspectral images. The other is to use three-dimensional (3D) convolution, which reduces the efficiency of the model when the network parameters increase. To address the above issues, we propose a clustering deep residual neural network (CDRNN) for hyperspectral image super-resolution in this paper. The proposed CDRNN learns the complex, nonlinear mappings between low spatial resolution HSI and high spatial resolution HSI. At first, an unsupervised clustering method is used to divide a low spatial resolution HSI into several classes according to spectral correlation. Then, the spectrum-pairs from the classified low spatial resolution HSI and the corresponding high spatial resolution HSI are used to train the CDRNN to establish the nonlinear mapping for each class. Finally, we classify the given low spatial resolution HSI into the determined category and use the trained CDRNN to reconstruct the final high spatial resolution HSI from the classified low spatial resolution HSI. We conduct extensive experiments on three simulated benchmark datasets and a real HSI to evaluate the super-resolution performance of the proposed method. Experimental results show that our proposed method achieves significant improvement over state-of-the-art methods. • A new super-resolution framework based on clustering is proposed to accurately extract the corresponding spatial-spectral features of HSI. • We modify the input and output convolution to make it more suitable, and modify the number of convolutions in the residual block to improve efficiency. • We evaluate the performance of the proposed method on four benchmark datasets, showing the superiority of the proposed method. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

33. Gated fusion network for SAO filter and inter frame prediction in Versatile Video Coding.

Author: Kuanar, Shiba, Athitsos, Vassilis, Mahapatra, Dwarikanath, and Rao, K.R.
Subjects: *VIDEO coding, *CONVOLUTIONAL neural networks, *IMAGE reconstruction, *DATA augmentation, *DEEP learning, *COMPUTATIONAL complexity
Abstract: In order to achieve higher coding efficiency, the Versatile Video Coding (VVC) standard includes several new components at the expense of an increase in decoder computational complexity. These technologies often create ringing and contouring effects on the reconstructed frames at a low bit rate and introduce blurring and distortion. To smooth those visual artifacts, the H.266/VVC framework supports four post-processing filter operations. The state-of-the-art CNN-based in-loop filters prefer to deploy multiple networks for various quantization parameters and frame resolutions, which increases training resources and subsequently becomes overhead at decoder frame reconstruction. This paper presents a single deep-learning-based model for the sample adaptive offset (SAO) non-linear filtering operation on the decoder side, uses feature correlation among adjacent frames, and substantiates the merits of intra–inter frame quality enhancement. We introduced a variable filter size dual multi-scale convolutional neural network (D-MSCNN) to attenuate the compression artifact and incorporated strided deconvolution to restore the high-frequency details on the distorted frame. Our model follows sequential training across all QP values and updates the model weights. Using data augmentation, weight fusion, and residual learning, we demonstrated that our model could be trained effectively by transferring the convolution prior feature indices to the decoder to produce a dense output map. Objective measurements demonstrate that the proposed method outperforms the baseline VVC method in PSNR, MS-SSIM, and VMAF metrics and achieves an average of 5.16% bit rate saving on different test sequence categories. • We presented a gated fusion-guided framework in our model design, which effectively combines the inter–intra frame local and temporal feature heterogeneity. • Our decoupled model includes a modified loss function to constrain the pixel errors and incorporates the intermediate convolution feature maps through skip connections. • Our loss function can be viewed as a generalization of MSE at each batch and adds image gradients as priors for final image reconstruction. • A data-driven deconvolution framework is integrated into the decoder module to overcome the quantization artifacts. • The end-to-end framework learns the feature map aggregation in separate sub-tasks, optimizes the parameters, and reduces the noise to a greater capacity. • Our model's qualitative and quantitative evaluation shows the effectiveness of artifact removal, especially at crowded target regions, and performs favorably against the existing in-loop deep learning models. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

34. Hierarchical aesthetic quality assessment using deep convolutional neural networks.

Author: Kao, Yueying, Huang, Kaiqi, and Maybank, Steve
Subjects: *SIGNAL convolution, *ARTIFICIAL neural networks, *DEEP learning, *IMAGE analysis, *IMAGE quality analysis
Abstract: Aesthetic image analysis has attracted much attention in recent years. However, assessing the aesthetic quality and assigning an aesthetic score are challenging problems. In this paper, we propose a novel framework for assessing the aesthetic quality of images. Firstly, we divide the images into three categories: “scene”, “object” and “texture”. Each category has an associated convolutional neural network (CNN) which learns the aesthetic features for the category in question. The object CNN is trained using the whole images and a salient region in each image. The texture CNN is trained using small regions in the original images. Furthermore, an A&C CNN is developed to simultaneously assess the aesthetic quality and identify the category for overall images. For each CNN, classification and regression models are developed separately to predict aesthetic class (high or low) and to assign an aesthetic score. Experimental results on a recently published large-scale dataset show that the proposed method can outperform the state-of-the-art methods for each category. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

35. Deep Convolutional Neural Networks for pedestrian detection.

Author: Tomè, D., Monti, F., Baroffio, L., Bondi, L., Tagliasacchi, M., and Tubaro, S.
Subjects: *SIGNAL convolution, *ARTIFICIAL neural networks, *ROBOTICS, *IMAGE segmentation, *COMPUTER vision
Abstract: Pedestrian detection is a popular research topic due to its paramount importance for a number of applications, especially in the fields of automotive, surveillance and robotics. Despite the significant improvements, pedestrian detection is still an open challenge that calls for more and more accurate algorithms. In the last few years, deep learning and in particular Convolutional Neural Networks emerged as the state of the art in terms of accuracy for a number of computer vision tasks such as image classification, object detection and segmentation, often outperforming the previous gold standards by a large margin. In this paper, we propose a pedestrian detection system based on deep learning, adapting a general-purpose convolutional network to the task at hand. By thoroughly analyzing and optimizing each step of the detection pipeline we propose an architecture that outperforms traditional methods, achieving a task accuracy close to that of state-of-the-art approaches, while requiring a low computational time. Finally, we tested the system on an NVIDIA Jetson TK1, a 192-core platform that is envisioned to be a forerunner computational brain of future self-driving cars. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

36. An antagonistic training algorithm for TFT-LCD module mura defect detection.

Author: Lin, Guimin, Kong, Lingfeng, Liu, Tianjian, Qiu, Lida, and Chen, Xiyao
Subjects: *DEEP learning, *CONVOLUTIONAL neural networks, *MANUAL labor, *CRYSTAL models, *ALGORITHMS, *MANUFACTURING processes
Abstract: Although the production process of liquid crystal display model has been automated, the quality detection still depends on manual work. The mura defect is one of the common defects appearing in TFT-LCD modules. Since a mura defect is not significantly different from the common background, it is difficult to detect. This paper presents a deep channel attention-based classification network (DCANet), which acts as a powerful feature extractor for object detectors, and proposes an antagonistic training algorithm based on convolutional neural networks. With the proposed training approach, deep learning-based object detectors can achieve high accuracy even with a small number of training samples of mura defects. The experimental results show that, compared to the vanilla training method, deep learning-based detectors trained by our proposed method significantly improve their performance on mura defect detection with few training samples. Even when trained on only 600 samples, the mistake rate and miss rate are only 8.08% and 0.267% respectively, which fulfills the enterprise's requirements of 10% and 0.3%. • A lightweight backbone network based upon dense blocks with channel attention (DCANet) is proposed and analyzed. • A data augmentation method that generates antagonistic samples is presented. • An antagonistic training framework is proposed to train deep learning-based object detectors. • A detailed study of the proposed backbone network and training algorithm is provided. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

37. Feature compensation network based on non-uniform quantization of channels for digital image global manipulation forensics.

Author: Zhang, Yuxue, Yan, Yunfeng, and Feng, Guorui
Subjects: *DIGITAL images, *EDITING software, *CONVOLUTIONAL neural networks, *IMAGE compression
Abstract: With the popularity of image editing software and technology, it has become very easy to manipulate an image, which has challenged the authenticity and security of digital images. Digital Image Manipulation Forensics (DIMF) has become an important research direction to confirm the authenticity of images, detect the manipulations that images have undergone, and avoid the misuse of image editing. Currently, most DIMF targets the detection of multiple image manipulations with fixed parameters. However, we consider detecting image manipulation in a more complex scenario where the parameters are chosen to be arbitrary. In this paper, we propose a Feature Compensation Network (FCNet) based on non-uniform quantization of channels. Briefly, it contains three important parts: (1) a feature enhancement block, which extracts and enhances valid information from low-level features and eliminates semantic gaps between them and the high-level features; (2) a sensitivity estimation block, which obtains the importance coefficients of each channel and guides the non-uniform quantization of low-level features; (3) adaptive average pooling, which keeps the resolution of low-level features and high-level features consistent and ensures that subsequent feature fusion is appropriate. Through extensive experiments, we have demonstrated the effectiveness of the proposed method in detecting multiple image manipulations. • For global manipulation forensics, the spatial information of the low-level features can assist in extracting more manipulated traces, and the semantic information of the high-level features can help understand which manipulation the input image has undergone, and one cannot be without the other. • We propose a feature compensation network based on the non-uniform quantization of channels, which reuses low-level features to enhance the ability of the network to extract manipulated traces. • We design a feature enhancement module to extract manipulated traces in the low-level features and eliminate semantic gaps between the low-level features and the high-level features. At the same time, the importance of the features of different channels is estimated by using the sensitive property of the CNN. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

38. Hybrid deep-learning framework for object-based forgery detection in video.

Author: Tan, Shunquan, Chen, Baoying, Zeng, Jishen, Li, Bin, and Huang, Jiwu
Subjects: *RECURRENT neural networks, *FORGERY, *CONVOLUTIONAL neural networks
Abstract: Detection of object-based video forgery has received attention in recent years. However, until recently, the dominant object-based forgery detectors were still based on hand-crafted features, and their performance was not satisfactory. In this paper, we propose a novel hybrid deep-learning network which for the first time incorporates two-dimensional/three-dimensional convolutional neural networks and a recurrent neural network for object-based video forgery detection in videos with advanced encoding formats. Please note that the proposed framework is a full end-to-end data-driven solution. It comprises a specifically initialized three-dimensional convolutional layer which tries to mix up primitive intra-frame and inter-frame features, a two-dimensional CNN which tries to extract low-dimensional intra-frame features, and a four-layer bi-directional LSTM network which tries to catch high-level temporal features. In this way, our proposed approach captures the inherent intra-frame and inter-frame properties of a target video within a unified framework. The extensive experiments conducted on the largest object-based forged video database ever reported in the literature show that our hybrid framework achieves superior performance in forged video detection and forged segment localization. Moreover, the experiments conducted on datasets of videos with degraded quality demonstrated that our proposed framework is more robust in real-life scenarios. • We have proposed a full end-to-end network for object-based video forgery detection. • The extensive experiments show that our framework achieves superior performance. • Moreover, the experiments show our framework is more robust on degraded videos. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

39. FloodNet: Underwater image restoration based on residual dense learning.

Author: Gangisetty, Shankar and Rai, Raghu Raj
Subjects: *IMAGE reconstruction, *DEEP learning, *CONVOLUTIONAL neural networks, *FEATURE extraction, *UNDERWATER construction, *LIGHT scattering, *DOCUMENT imaging systems
Abstract: The efficiency of the underwater image restoration task is hindered by degradation factors such as light scattering, color shift, suspended particles and haze occurring in the underwater environment. In recent years, different methods have relied on the underwater image formation model and deep learning techniques to restore the underwater image, but they tend to produce unnatural artifacts and reduced levels of sharpness. To address these challenges, in this paper, we propose FloodNet using residual dense learning with the objective of estimating restored underwater images from a wide variety of degraded underwater images. The proposed FloodNet architecture is a fully convolutional neural network that comprises three modules, namely, low-level feature extraction for extracting features from the degraded underwater image, residual dense blocks that are densely connected via skip-connections for hierarchical feature fusion and to improve the flow of gradient during back-propagation, and finally global feature fusion to adaptively utilize both local and global residual learning to obtain a restored underwater image. We conduct the quantitative and qualitative evaluations on paired and unpaired underwater image datasets, as well as the application and user study analysis. The results demonstrate superior performance of FloodNet against existing SOTA methods. • Propose FloodNet, an end-to-end CNN architecture to restore underwater images from a wide variety of degraded underwater images. • Introduce residual dense blocks to connect the convolutional layers using skip-connections. • Global feature fusion to adaptively utilize both local and global residual learning. • Exhaustive perspective and quantitative experimental analysis on paired and unpaired underwater images. • Perform application and user study analysis. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

40. Double compression detection in HEVC-coded video with the same coding parameters using picture partitioning information.

Author: Uddin, Kutub, Yang, Yoonmo, and Oh, Byung Tae
Subjects: *VIDEO compression, *VIDEO coding, *CONVOLUTIONAL neural networks, *MACHINE learning
Abstract: Detection of double compression, particularly in the high-efficiency video coding (HEVC) compressed domain, is one of the most effective ways of authenticating videos in the field of forensic analysis. The ability to identify abnormalities in videos depends on diverse coding parameters (such as quantization parameters, size and structure of the group of pictures, and modes of compression). Many methods have been introduced to detect HEVC double compression with different coding parameters. However, detecting HEVC double compression under the same coding environments remains a challenging task, as recompressions leave small footprints. In this paper, we introduce a novel method based on frame partitioning information to distinguish between single and double compressions with the same coding parameters. We propose extracting statistical and deep convolution neural network (DCNN) features from partition pictures and prediction modes, including coding unit, prediction unit, transform unit, and most probable modes information. Finally, machine learning technology is integrated to categorize videos into two classes, single and double compressions, by combining the statistical and DCNN features. We obtain the best experimental results by assembling the statistical and DCNN features for wide video graphics array (WVGA) and high-definition (HD) sequences with average accuracies of 99.66% and 99.60% in all-intra and 99.46% and 99.33% in low-delay P modes respectively. Experimental results of the proposed system show its effectiveness and efficiency over the state-of-the-art techniques in video forensics. • Proposes video forensic analysis of HEVC-coded videos to ensure authenticity and integrity by detecting double compression. • Focuses mainly on the partitioning and prediction information for discriminating HEVC single and double compressions. • Introduces two classes of features, statistical features and deep convolution neural network (DCNN) features, for evaluating the proposed system. • Experiments are carried out in separate and combined fashion for statistical and DCNN features to show the robustness of each feature set. • The full analysis and comparisons of the quantitative results are provided. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

41. Feature back-projection guided residual refinement for real-time stereo matching network.

Author: Wen, Bin, Zhu, Han, Yang, Chao, Li, Zhicong, and Cao, Renxuan
Subjects: *CONVOLUTIONAL neural networks, *GRAPHICS processing units, *FEATURE extraction, *STEREO vision (Computer science), *STEREO image
Abstract: In recent stereo matching research, deep convolutional neural networks (CNNs) have shown excellent performance in estimating depth from stereo image pairs. Previous works mainly focus on improving the robust performance of the stereo matching network to obtain higher matching accuracy. In this paper, we propose an end-to-end real-time stereo matching network (FBPGNet). FBPGNet manifests its characteristics in three parts: a feature extraction module (FEM), an initial disparity estimation module (IDEM), and a feature back-projection guided residual refinement module (FBPG). The FEM is designed to capture semantic and contextual information, and is composed of a residual block, dilation convolution and a spatial attention mechanism. The IDEM is proposed to produce an initial low-resolution (LR) disparity map, and utilizes an hourglass 3D convolution architecture. In addition, the FBPG is employed to refine the up-sampled low-resolution disparity map, taking the features from the FEM and the low-resolution disparity map as guide information. Experiments show that the proposed stereo matching network achieves prediction accuracy and inference speed comparable to recent real-time stereo matching networks, and can achieve 25 fps on a high-end GPU. • We design a lightweight but efficient module to extract features. The module is composed of a linear residual network, dilation convolution and a spatial attention mechanism. • We propose a lightweight 3D convolutional neural network with an hourglass structure to generate the initial disparity map. • We propose a feature back-projection guided residual refinement module. This module uses a back-projection generator to generate high-frequency features to guide the disparity refinement. • Experiments show that our proposed stereo matching network can achieve 25 fps on a high-end GPU. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

42. Convolutional analysis operator learning for multifocus image fusion.

Author: Zhang, Chengfang and Feng, Ziliang
Subjects: *IMAGE fusion, *CONVOLUTIONAL neural networks, *INVERSE problems
Abstract: Sparse representation (SR), convolutional sparse representation (CSR) and convolutional dictionary learning (CDL) are synthesis-based priors that have proven to be successful in signal inverse problems (such as multifocus image fusion). Unlike "synthesis" formulas, the "analysis" model assigns probabilities to signals through various forward measurements of signals. Analysis operator learning (AOL) is a classical analysis-based learning method. Convolutional analysis operator learning (CAOL) is the convolutional form of AOL. CAOL uses an unsupervised learning method to train an autocoded convolutional neural network (CNN) to solve the inverse problem more accurately. From the perspective of CAOL, this paper introduces learned convolutional regularizers into multifocus image fusion and proposes a CAOL-based multifocus image fusion algorithm. In the CDL stage, the convergent block proximal extrapolated gradient method with majorizer (BPEG-M) and an adaptive momentum restarting scheme are used. In the sparse fusion stage, the alternating direction method of multipliers (ADMM) approach with convolutional basis pursuit denoising (CBPDN) and the l1-norm maximum strategy are employed for the high-frequency and low-frequency components, respectively. Three types of multifocus images (static gray images, gray images in sports and color images) are tested to verify the performance of the proposed method. A comparison with representative methods demonstrates the superiority of our method in terms of subjective observation and objective evaluation. • A new CAOL-based multifocus image fusion framework is proposed. • Different rules are used to alleviate fusion defects in connection areas. • The fusion performance under different filters with CAOL is discussed. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
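
Note on entry 42: the abstract names an l1-norm maximum strategy without defining it. A minimal sketch of the usual choose-max reading follows, assuming each source image has already been decomposed into one coefficient vector per pixel position. The function name `l1_max_fuse` and the nested-list data layout are hypothetical and are not taken from the paper.

```python
def l1_max_fuse(coeffs_a, coeffs_b):
    """Fuse two coefficient maps position by position (illustrative only).

    coeffs_a and coeffs_b are lists of coefficient vectors, one vector per
    pixel position. At each position the vector with the larger l1 norm is
    kept; ties go to the first source.
    """
    fused = []
    for vec_a, vec_b in zip(coeffs_a, coeffs_b):
        norm_a = sum(abs(c) for c in vec_a)  # l1 norm of source A at this position
        norm_b = sum(abs(c) for c in vec_b)  # l1 norm of source B at this position
        fused.append(list(vec_a) if norm_a >= norm_b else list(vec_b))
    return fused
```

The l1 norm acts here as an activity measure: the source whose coefficients respond more strongly at a position is assumed to be the one in focus there.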

43. Tone mapping high dynamic range images based on region-adaptive self-supervised deep learning.

Author: Zhou, Fei, Liao, Guangsen, Duan, Jiang, Liu, Bozhi, and Qiu, Guoping
Subjects: *HIGH dynamic range imaging, *DEEP learning, *CONVOLUTIONAL neural networks
Abstract: This paper presents a region-adaptive self-supervised deep learning (RASSDL) technique for high dynamic range (HDR) image tone mapping. The RASSDL tone mapping operator (TMO) is a convolutional neural network (CNN) trained on local image regions that can seamlessly tone map images of arbitrary sizes. The training of RASSDL TMO is through the design of a self-supervising target that automatically adapts to the local image regions based on their information contents. The self-supervising target is designed to ensure the tone-mapped output achieves a balance between preserving the relative contrast of the original scene and the visibilities of the fine details to achieve faithful reproduction of the HDR scene. Distinguishing from many existing TMOs that require manual tuning of parameters, RASSDL is parameter-free and completely automatic. Experimental results demonstrate that RASSDL TMO can achieve state-of-the-art performance in terms of preserving overall contrasts, revealing fine details, and being free from visual artifacts. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

44. Parallel multiscale context-based edge-preserving optical flow estimation with occlusion detection.

Author: Zhang, Congxuan, Feng, Cheng, Chen, Zhen, Hu, Weiming, and Li, Ming
Subjects: *OPTICAL flow, *CONVOLUTIONAL neural networks, *OCCLUSION (Chemistry)
Abstract: Although convolutional neural network (CNN)-based optical flow approaches have exhibited good performance in terms of computational accuracy and efficiency in recent years, the issue of edge-blurring caused by motion occlusions remains. In this paper, we propose a parallel multiscale context-based edge-preserving optical flow estimation method with occlusion detection, named PMC-PWC. First, we exploit a parallel multiscale context (PMC) network for occlusion detection, in which the proposed PMC model is able to aggregate the multiscale context information to improve the performance of occlusion detection near motion boundaries. Second, we combine the PMC model with a context network to build an occlusion estimation module and incorporate it into a pyramid, warping, and cost volume model to construct an edge-preserving optical flow computation network. Third, we design a novel loss function including an endpoint error (EPE)-based loss, a binary cross-entropy loss and an edge loss to supervise the proposed PMC-PWC network to produce optical flow and occlusion simultaneously. Finally, we run the proposed PMC-PWC method on the MPI-Sintel and KITTI datasets to conduct a comprehensive comparison with several state-of-the-art approaches. The experimental results indicate that the proposed PMC-PWC method performed well in terms of both accuracy and robustness, especially due to the significant benefits of edge preservation and occlusion handling. • We construct a parallel multiscale context network for occlusion detection, which extracts multiscale context information to refine the occlusion boundaries. • We combine the PMC network with a context network to establish an occlusion detection module and incorporate it into a pyramid, warping, and cost volume network to construct an edge-preserving optical flow model. • We exploit a novel loss function by integrating an edge loss with an EPE-based loss and a binary cross-entropy loss. The proposed loss function supervises the network to estimate flow field and occlusions simultaneously. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
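
Note on entry 44: the abstract lists an endpoint error (EPE)-based loss and a binary cross-entropy loss among the terms of its loss function. The sketch below shows the standard definitions of those two terms only. The edge loss is omitted because the abstract does not define it, and the weights `w_epe` and `w_bce` are hypothetical placeholders rather than values from the paper.

```python
import math

def endpoint_error(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Each flow is a list of (u, v) tuples, one per pixel.
    """
    dists = [math.hypot(up - ug, vp - vg)
             for (up, vp), (ug, vg) in zip(flow_pred, flow_gt)]
    return sum(dists) / len(dists)

def binary_cross_entropy(occ_prob, occ_gt, eps=1e-7):
    """Mean binary cross-entropy between occlusion probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(occ_prob, occ_gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(occ_prob)

def combined_loss(flow_pred, flow_gt, occ_prob, occ_gt, w_epe=1.0, w_bce=1.0):
    """Weighted sum of the two terms; the weights are assumed, not from the paper."""
    return (w_epe * endpoint_error(flow_pred, flow_gt)
            + w_bce * binary_cross_entropy(occ_prob, occ_gt))
```

Summing the terms lets one network be supervised for flow and occlusion at once, which is the joint estimation the abstract describes.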

45. Improved fine-grained object retrieval with Hard Global Softmin Loss objective.

Author: Wang, Xiaodong, Zeng, Xianxian, Zhang, Yun, Chen, Kairui, and Li, Dong
Subjects: *CONVOLUTIONAL neural networks, *PROBLEM solving, *IMAGE retrieval, *COMPUTER vision
Abstract: Image retrieval is a general task in computer vision, which aims at returning images similar to the query. Recently, extensive research attention has been drawn to fine-grained object retrieval, which is one of the difficult tasks of image retrieval. Compared to general image retrieval, the data of fine-grained objects show a great diversity in the same class, while a small diversity in different classes. Therefore, the key to fine-grained object retrieval resides in training models to obtain discriminative features. One class of methods is based on local structure loss functions, e.g. pairwise and triplet loss, to generate distinguishable features for fine-grained object retrieval. However, these methods are time-consuming at the training stage and of low accuracy. To solve these problems, some methods based on global structure loss functions have been proposed. Convolutional neural networks are optimized with the global centers and then generate distinguishable features. In this paper, based on the global structure loss functions and hard mining strategy, we propose the Hard Global Softmin Loss to improve the performance of fine-grained object retrieval. Furthermore, a learnable parameter is introduced into the proposed loss, which is dynamically adjusted by the network throughout the training. Extensive experiments show that the proposed loss function is effective and helpful for promoting the retrieval performance. Specifically, significant improvements are obtained over the state-of-the-art on four popular fine-grained datasets in our experiments. Our code is publicly available at https://github.com/RiyaoDong/HGSL. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

46. Micro-expression recognition from local facial regions.

Author: Aouayeb, Mouath, Hamidouche, Wassim, Soladie, Catherine, Kpalma, Kidiyo, and Seguier, Renaud
Subjects: *DEEP learning, *LONG-term memory, *SHORT-term memory, *CONVOLUTIONAL neural networks, *HUMAN beings, *EMOTIONS
Abstract: A Micro-Expression (MiE) is an involuntary facial reaction that reflects the real emotion and thoughts of a human being. It is very difficult for a normal human to detect an MiE, since it is a very fast and local face reaction with low intensity. As a consequence, it is a challenging task for researchers to build an automatic system for MiE recognition. Previous works on MiE recognition have attempted to use the whole face, yet a facial MiE appears in a small region of the face, which makes the extraction of relevant features a hard task. In this paper, we propose a novel deep learning approach that leverages the locality aspect of MiEs by learning spatio-temporal features from local facial regions using a composite architecture of Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM). The proposed solution succeeds in extracting relevant local features for MiE recognition. Experimental results on benchmark datasets demonstrate the highest recognition accuracy of our solution with respect to state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

47. Speckle noise removal based on structural convolutional neural networks with feature fusion for medical image.

Author: Li, Dazi, Yu, Wenjie, Wang, Kunfeng, Jiang, Daozhong, and Jin, Qibing
Subjects: *SPECKLE interference, *CONVOLUTIONAL neural networks, *DEEP learning, *ADDITIVE white Gaussian noise, *IMAGE fusion, *DIAGNOSTIC imaging
Abstract: Application of convolutional neural networks (CNNs) for image additive white Gaussian noise (AWGN) removal has attracted considerable attention with the rapid development of deep learning in recent years. However, little work has been done on the removal of multiplicative speckle noise from images. Moreover, most of the existing speckle noise removal algorithms are based on traditional methods with human prior knowledge, which means that the parameters of the algorithms need to be set manually. Nowadays, deep learning methods show clear advantages in image feature extraction. Multiplicative speckle noise is very common in real-life images, especially in medical images. In this paper, a novel neural network structure is proposed to recover noisy images with speckle noise. Our proposed method mainly consists of three subnetworks. One is a rough clean image estimation subnetwork. Another is a noise estimation subnetwork. The last one is an information fusion network based on U-Net and several convolutional layers. Different from the existing speckle denoising models based on the statistics of images, the proposed network model can handle speckle denoising of different noise levels with an end-to-end trainable model. Extensive experimental results on several test datasets clearly demonstrate the superior performance of our proposed network over state-of-the-art methods in terms of quantitative metrics and visual quality. • A new model based on structural CNN with feature fusion is proposed. • The proposed method can guide U-Net to reconstruct clean image model. • The model is proved to be effective in groups of medical images denoising tests. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

48. S2-aware network for visual recognition.

Author: Zhao, Wenyi, Yang, Huihua, Pan, Xipeng, and Li, Lingqiao
Subjects: *IMAGE recognition (Computer vision), *COMPUTER vision, *CONVOLUTIONAL neural networks, *PETRI nets
Abstract: Capturing the comprehensive information of various sizes and shapes of images in the same convolution layer is typically a challenging task in computer vision. There are two main kinds of methods for capturing those features. The first uses the inception structure and its variants. The second utilizes larger convolution kernels on specific layers or stacks more convolution blocks. However, these methods can be computationally intensive or suffer from vanishing gradients. In this paper, to accommodate feature distributions with different sizes and shapes and to reduce computational cost, we propose a width- and depth-aware module named the WD-module to match feature distributions. Moreover, the proposed WD-module requires less computation and fewer parameters than traditional residual convolution layers. To verify the effectiveness of our proposed method, a size- and shape-aware backbone network named S2A-Net was built by stacking the WD-modules. Visualizations of heat maps and features show that the proposed S2A-Net can adapt to objects with different sizes and shapes in visual recognition tasks and learn more comprehensive characteristics. Experimental results show that the proposed method has higher accuracy in image recognition and outperforms other state-of-the-art networks with the same number of layers. • A width-aware and depth-aware module is built. • A size-aware and shape-aware network for visual recognition is built. • Compared with other state-of-the-art methods, the proposed method achieves better results while consuming fewer parameters and FLOPs. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

49. Image source identification with known post-processed based on convolutional neural network.

Author: Liao, Xin, Chen, Jing, and Chen, Jiaxin
Subjects: *CONVOLUTIONAL neural networks, *PROBLEM solving, *HIGHPASS electric filters, *DIGITAL images
Abstract: Image source identification is important for verifying the origin and authenticity of digital images. However, when images are altered by post-processing, the performance of existing source verification methods may degrade. In this paper, we propose a convolutional neural network (CNN) to solve this problem. Specifically, we present a theoretical framework for different tampering operations to determine whether a single operation affects the photo response non-uniformity (PRNU) contained in images. We then divide these operations into two categories: non-influential operations and influential operations. In addition, images altered by a combination of non-influential and influential operations are equivalent to images that have undergone only a single influential operation. To make the proposed CNN robust to both non-influential and influential operations, we define a multi-kernel noise extractor that consists of a high-pass filter and three parallel convolution filters of different sizes. The features generated by the parallel convolution layers are then fed to subsequent convolutional layers for further feature extraction. The experimental results demonstrate the effectiveness of our method. • The tampering operations are divided into influential and non-influential operations. • Images altered by non-influential and influential operations are equivalent to images that have undergone a single influential operation. • An image source identification network containing a multi-kernel noise extractor is designed. • The experimental results show that our method can obtain better performance. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
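
The abstract above describes a multi-kernel noise extractor: a high-pass filter followed by three parallel convolution filters of different sizes. The following is a minimal NumPy sketch of that data flow only, not the authors' network. The Laplacian-style high-pass kernel, the branch sizes 3, 5 and 7, the random stand-in weights, and the helper names `conv2d_same` and `multi_kernel_noise_extractor` are all hypothetical, since the abstract does not specify them.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(image, kernel):
    """Zero-padded 2-D cross-correlation whose output matches the input size."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Assumed fixed high-pass kernel; its weights sum to zero, so flat regions
# are suppressed and the noise-like residual is kept.
HIGH_PASS = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=float) / 8.0

def multi_kernel_noise_extractor(image, kernels):
    """High-pass the image, then apply each kernel to the residual in parallel."""
    residual = conv2d_same(image, HIGH_PASS)
    return np.stack([conv2d_same(residual, k) for k in kernels])

rng = np.random.default_rng(0)
# Random weights stand in for learned filters; the sizes are an assumption.
kernels = [rng.normal(size=(size, size)) for size in (3, 5, 7)]
image = rng.random((32, 32))

features = multi_kernel_noise_extractor(image, kernels)
print(features.shape)  # (3, 32, 32): one feature map per parallel branch
```

In a trained network the branch weights would be learned and the stacked maps passed to further convolutional layers, as the abstract states; here they are only concatenated to show the shape of the result.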

50. Multi-scale spatial convolution algorithm for lane line detection and lane offset estimation in complex road conditions.

Author: Haris, Malik, Hou, Jin, and Wang, Xiaomin
Subjects: *ALGORITHMS, *CONVOLUTIONAL neural networks, *OBJECT recognition (Computer vision), *DEEP learning, *TRAFFIC estimation, *HOUGH transforms
Abstract: Deep learning has made remarkable progress in image classification and object detection. Nevertheless, in autonomous driving research, real-time lane line detection and lane offset estimation in complex traffic scenes remain challenging tasks. Traditional detection methods require manual parameter tuning and are still highly susceptible to interference caused by obstructing objects, illumination changes, and pavement wear, so designing a robust lane detection and lane offset estimation algorithm remains difficult. In this paper, we propose a convolutional neural network for lane offset estimation and lane line detection in complex road environments, which recasts lane line detection as an instance segmentation problem. With this change in how lanes are processed, the network forms a separate instance for each line. A global scale perception optimization mechanism is designed to address this issue, especially where the lane line width gradually narrows toward the vanishing point of the lane. At the same time, to realize multi-task processing and improve performance, an end-to-end lane offset estimation network is used in addition to the lane line detection network. • Multi-Scale Spatial Convolution Algorithm. • Lane Line Detection. • Lane Offset Estimation. • Multi-tasking in Lane Detection. • Lane Detection by active sensor (i.e. Camera). [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
