Search Results
316 results
2. Key–Value Pair Identification from Tables Using Multimodal Learning.
- Author
- Chu, Jung Soo, Pyo, Bryan, Parth, Vik, Hussein, Ahmed, and Wang, Patrick
- Subjects
- MACHINE learning, NATURAL language processing, OPTICAL character recognition, COMPUTER vision, IMAGE processing
- Abstract
Computer vision and optical character recognition techniques have rapidly advanced in order to accurately capture text and other features from paper documents. While state-of-the-art tools in these fields now yield high accuracy, analyzing their outputs requires more research. Since tables are common in such documents, a new pipeline, based on multimodal learning, is proposed to better extract key–value pairs from tables. Its performance is evaluated on a synthetic dataset of randomly generated tables and on a dataset of mechanical part documents provided by SiliconExpert Technologies, and it is compared with a state-of-the-art model built for similar tasks, LayoutLM. The proposed pipeline provides a fully automated, end-to-end scalable solution, beginning with image processing and computer vision components and ending with a machine learning model that uses data from optical character recognition and natural language processing to make the final decisions. In the best configuration, the pipeline achieved 96.26% accuracy on a large, synthetically generated training and test set. Compared with LayoutLM, the proposed pipeline performed similarly on the synthetic dataset and better on the real dataset. These results show the potential of the multimodal approach for extracting key–value pairs from tables in real paper documents. [ABSTRACT FROM AUTHOR]
- Published
- 2023
3. Editorial: Computer Vision Theory and Applications at VISAPP 2020.
- Author
- Radeva, Petia and Farinella, Giovanni Maria
- Subjects
- COMPUTATIONAL mathematics, COMPUTER vision, ARTIFICIAL intelligence
- Published
- 2021
4. AEMNet: Unsupervised Video Anomaly Detection Method Based on Attention-Enhanced Memory Networks.
- Author
- Zhang, Linliang, Yan, Lianshan, Peng, Shouxin, and Pan, Lihu
- Subjects
- INTRUSION detection systems (Computer security), ANOMALY detection (Computer security), COMPUTER vision, LEARNING ability, VIDEOS
- Abstract
Video anomaly detection has always been a challenging task in computer vision due to data imbalance and susceptibility to scene variations such as lighting and occlusions. In response to this challenge, this paper proposes an unsupervised video anomaly detection method based on an attention-enhanced memory network. The method utilizes a dual-stream autoencoder structure, enhancing the model's ability to learn important appearance and motion features by introducing coordinate attention and variance attention mechanisms that emphasize significant characteristics of static objects and rapidly moving regions. By adding memory modules to both the appearance and motion branches, the network's memory is reinforced, enabling it to capture long-term spatiotemporal dependencies in videos and thereby improving the accuracy of anomaly detection. Furthermore, optimizing the network's activation functions to handle negative inputs enhances its nonlinear modeling capability, enabling better adaptation to complex environments, including variations in lighting and occlusions, and further improving the effectiveness of anomaly detection. The paper conducts comparative experiments and ablation studies using three publicly available datasets and various models. The results demonstrate that, compared to baseline models, the AUC is improved by 3.9%, 4.7%, and 1.7% on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets, respectively. Compared with the other models, the average AUC is improved by 4.3%, 5.4%, and 6.2%, with an average improvement of 8.75% in the ERR metric, validating the effectiveness and adaptability of the proposed method (a sketch of the memory-module idea follows this record). The code can be obtained at the following URL: https://github.com/AcademicWhite/AEMNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
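The memory branches described above can be illustrated with a small addressing module. Below is a minimal PyTorch sketch assuming cosine-similarity addressing over a learnable memory bank; the module name, sizes and addressing scheme are illustrative assumptions, not the authors' released implementation (see their repository for the real code).

```python
# Minimal sketch of a memory-addressing module in the spirit of the paper's
# appearance/motion memory branches (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryModule(nn.Module):
    def __init__(self, num_items: int = 100, feat_dim: int = 256):
        super().__init__()
        # Learnable memory bank: each row stores one prototypical normal pattern.
        self.memory = nn.Parameter(torch.randn(num_items, feat_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, feat_dim) encoder features.
        # Cosine-similarity addressing, then a soft read-out from the bank.
        attn = F.softmax(
            F.normalize(z, dim=1) @ F.normalize(self.memory, dim=1).t(), dim=1
        )
        return attn @ self.memory  # features rebuilt from normal prototypes
```

Anomalies then show up as large reconstruction errors, because abnormal features cannot be expressed well by the stored normal prototypes.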
5. Residual Network for Image Compression Artifact Reduction.
- Author
- Hu, Jianhua, Luo, Guixiang, Wang, Bo, Wu, Weimei, Yang, Jiahui, and Guo, Jianding
- Subjects
- IMAGE compression, TRANSFORMER models, COMPUTER vision, CONVOLUTIONAL neural networks, IMAGE transmission, COMPUTER systems
- Abstract
This paper proposes an image compression artifact reduction algorithm based on a Swin Transformer and residual network (STRN), aiming to reduce the blurring and distortion found in traditionally compressed images. The algorithm uses a dual-channel mechanism to remove artifacts, taking advantage of the complementary features of the transformer and residual networks. The Swin Transformer branch addresses long-range dependency, improving reconstructed image quality, while the residual network mitigates gradient loss and recovers image details during compression. The paper demonstrates that training a convolutional network built on transformer and residual branches significantly reduces artifacts and yields better reconstructed image quality than previous and current mainstream methods based on traditional convolutional neural networks. The proposed approach removes blocking artifacts by subtracting the estimated artifacts from the input image while preserving most of the original details (a sketch of this step follows this record). The method is therefore highly effective in improving image quality and reducing visual artifacts caused by traditional compression, and it is useful for enhancing image transmission and storage efficiency in computer vision systems that employ digital visual codecs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
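The "subtract the estimated artifacts" idea lends itself to a compact illustration. The sketch below is a hedged PyTorch stand-in: `ArtifactEstimator` is a placeholder convolutional backbone, not the paper's Swin Transformer/residual dual-channel network.

```python
# Minimal sketch of residual artifact removal: predict the artifact map,
# then subtract it from the compressed input (placeholder backbone).
import torch
import torch.nn as nn

class ArtifactEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        # Illustrative stand-in for the STRN dual-channel network.
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, compressed: torch.Tensor) -> torch.Tensor:
        artifacts = self.body(compressed)  # estimated compression artifacts
        return compressed - artifacts      # restored image keeps original details
```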
6. A Method for 3D Human Pose Estimation and Similarity Calculation in Tai Chi Videos.
- Author
- Cai, Xingquan, Lu, Rui, Zhang, Haoyu, Huo, Yuqing, Sun, Haiyan, and Ji, Jiaqi
- Subjects
- POSE estimation (Computer vision), TAI chi, COMPUTER vision, HUMAN mechanics, VIDEOS, CHANNEL coding
- Abstract
Human pose estimation from video sequences has become a hot research topic in robotics and computer vision. However, existing three-dimensional (3D) pose estimation methods usually analyze individual frames, yielding low accuracy under varying human movement speeds and limiting practical application. In this paper, we propose a method for estimating 3D pose and calculating similarity from Tai Chi video sequences based on a Seq2Seq network. Specifically, using the 2D joint coordinate sequences of the original images as input, our method constructs an encoder and a decoder to build the Seq2Seq network, and introduces an attention mechanism that weights the input data to obtain an intermediate vector, which is decoded to estimate the 3D joint sequence. Afterwards, using a template video and a target video as input, our method calculates the cost of passing through each point within the constraints to construct a cost matrix for video similarity. With the cost matrix, our method determines the optimal path and uses the correspondence of the video sequences to calculate the image similarity of corresponding frames (a sketch of this cost-matrix step follows this record). The experimental data show that the proposed method effectively improves the accuracy of 3D pose estimation and increases the speed of video similarity calculation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
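The cost-matrix-and-optimal-path step is essentially a dynamic-time-warping-style alignment. A minimal NumPy sketch follows, assuming per-frame Euclidean joint distances and the standard three-predecessor recurrence; the paper's exact constraints and cost definition may differ.

```python
# Minimal DTW-style cost matrix over two 3D pose sequences (illustrative).
import numpy as np

def video_cost_matrix(template: np.ndarray, target: np.ndarray) -> np.ndarray:
    # template: (T1, J, 3), target: (T2, J, 3) sequences of 3D joint positions.
    T1, T2 = len(template), len(target)
    cost = np.linalg.norm(template[:, None] - target[None, :], axis=(2, 3))
    acc = np.full((T1 + 1, T2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            # Each cell extends the cheapest of the three allowed predecessors.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[1:, 1:]  # backtracking through this yields the optimal path
```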
7. Self-Supervised Texture Image Anomaly Detection by Fusing Normalizing Flow and Dictionary Learning.
- Author
- Guo, Yaohua, Song, Lijuan, and Ma, Zirui
- Subjects
- ANOMALY detection (Computer security), INTRUSION detection systems (Computer security), COMPUTER vision, DEEP learning
- Abstract
Industrial image anomaly detection against textured backgrounds is a common study area in anomaly identification. The interference of texture images and the minuteness of texture anomalies are the main reasons why many existing models fail to detect anomalies. Motivated by these issues, we propose an anomaly detection strategy that combines dictionary learning and normalizing flow. Our method enhances an existing two-stage anomaly detection approach: to improve the baseline, this research adds a normalizing flow to the representation learning stage and combines deep learning with dictionary learning. After experimental validation, the improved algorithm exceeds 95% detection accuracy on all MVTec AD texture-type data and shows strong robustness. The baseline method's detection accuracy on the Carpet data was 67.9%; our improvements raise it to 99.7%. [ABSTRACT FROM AUTHOR]
- Published
- 2023
8. A Fast and Accurate Human Pose Estimation Method Based on Multi-Scale Feature Fusion Grid Structure.
- Author
- Li, Qiming, Wan, Daizong, and Yang, Xiaoyan
- Subjects
- POSE estimation (Computer vision), VISUAL fields, COMPUTER vision, HUMAN beings
- Abstract
Human pose estimation (HPE) is a research hotspot in computer vision. Most existing approaches first generate low-resolution representations from high-resolution ones through continuous serial downsampling, and then reconstruct high-resolution results through continuous serial upsampling, which loses much effective feature information and slows model inference. In this paper, the Fast Accuracy Network (FANet), a framework that enables fast and high-accuracy HPE, is proposed. First, a grid structure is proposed and adopted, which can be regarded as a set of deep paths and shallow paths. The structure uses multiple high-resolution and low-resolution branch pairs to perform skip-level connections at different scale-space levels, so that information can be exchanged between different resolution representations many times; feature fusion across multi-scale space is thus realized to obtain richer feature information. Second, an improved bottleneck block is proposed to extract effective feature information with fewer parameters, reducing the computational burden without sacrificing accuracy. The experimental results show that, compared with other current models, FANet has faster inference speed together with a slight improvement in accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
9. Neural Network Based on Work Piece Recognition and Robot Intelligent Capture in Complex Environments.
- Author
- Tang, Bo
- Subjects
- CONVOLUTIONAL neural networks, SPACE robotics, MANUFACTURING processes, INDUSTRIAL robots, IMAGE recognition (Computer vision), MANUAL labor, ROBOT programming, COMPUTER vision
- Abstract
With today's rapid development of science and technology, manual work in factories has gradually been replaced by machines, and industrial intelligence has deepened further. Workpiece recognition uses machine learning, computer vision and other technologies to identify a target workpiece, and intelligent robotic grasping is a higher-level operation built on workpiece recognition; it is the key to realizing intelligent industrial robots. Due to complex environmental factors and the diversity of shapes and sizes of the objects to be grasped, the accuracy and efficiency of workpiece recognition are not yet ideal, and intelligent grasping is harder still. Aiming at these problems, this paper builds a grasp planning model based on a convolutional neural network and grasping-pose mapping rules. Based on the established model, a grasp candidate sampling algorithm is designed and a transfer learning method is adopted: a convolutional neural network pretrained for image recognition on the ImageNet dataset is migrated to the grasp detection task on the Carnegie Mellon dataset (a sketch of this transfer step follows this record). Experiments show that the proposed network model performs well, with a correct grasp rate as high as 81.27%, achieving more stable and reliable recognition and intelligent grasping. [ABSTRACT FROM AUTHOR]
- Published
- 2020
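The transfer-learning step, migrating an ImageNet-pretrained CNN to grasp detection, can be sketched in a few lines of torchvision. The head layout and the five-value grasp output below are assumptions for illustration, not the paper's exact network.

```python
# Minimal sketch of migrating ImageNet weights to a grasp-detection head.
import torch.nn as nn
from torchvision import models

def build_grasp_net(num_outputs: int = 5) -> nn.Module:
    # Load ImageNet-pretrained weights, then swap the classifier for a grasp
    # head predicting e.g. (x, y, angle, width, confidence) per candidate.
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_outputs)
    return backbone
```

Only the new head is randomly initialized, so the network can be fine-tuned on the comparatively small grasp dataset without overfitting.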
10. Dynamic Coal Quantity Detection and Classification of Permanent Magnet Direct Drive Belt Conveyor Based on Machine Vision and Deep Learning.
- Author
- Wang, Guimei, Li, Xuehui, and Yang, Lijie
- Subjects
- DEEP learning, CONVEYOR belts, COMPUTER vision, BELT conveyors, BELT drives, PERMANENT magnets, COAL
- Abstract
Real-time and accurate measurement of coal quantity is the key to energy saving and speed regulation of belt conveyors. The electronic belt scale and the nuclear scale are the commonly used methods for detecting coal quantity. However, the electronic belt scale uses contact measurement with low measurement accuracy and a large error range, and although nuclear detection methods have high accuracy, they carry serious potential safety hazards due to radiation. For these reasons, this paper presents a method of coal quantity detection and classification based on machine vision and deep learning. An industrial camera collects dynamic images of the coal on the conveyor belt illuminated by a laser transmitter; the collected images are processed by preprocessing, skeleton extraction, laser-line thinning, disconnection repair, image fusion and filling to obtain coal-flow cross-sectional images. From the cross-sectional area and the belt speed, the coal volume per unit time is obtained and dynamic coal quantity detection is realized (a sketch of this computation follows this record). On this basis, to realize dynamic classification of coal quantity, the cross-section images corresponding to different coal quantities are grouped into classes to establish a coal quantity dataset, and a Dense-VGG network for dynamic coal classification is built on the VGG16 network. After training, the dynamic classification performance of the method is verified on an experimental platform. The experimental results show that the classification accuracy reaches 94.34%, and the processing time for a single frame is 0.270 s. [ABSTRACT FROM AUTHOR]
- Published
- 2021
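The volume computation itself is simple once the laser-line cross-section has been extracted. A minimal NumPy sketch, assuming a calibrated height profile and a constant belt speed (the calibration constants are illustrative):

```python
# Minimal sketch: volume rate = cross-sectional area x belt speed.
import numpy as np

def coal_volume_rate(profile_px: np.ndarray,
                     mm_per_px: float,
                     belt_speed_mm_s: float) -> float:
    # profile_px: coal height above the belt at each pixel column, in pixels.
    heights_mm = profile_px * mm_per_px
    area_mm2 = np.trapz(heights_mm, dx=mm_per_px)  # cross-sectional area
    return area_mm2 * belt_speed_mm_s              # volume per second, mm^3/s
```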
11. Copy-Move Forgery Detection and Localization Using Deep Learning.
- Author
- Mehrjardi, Fatemeh Zare, Latif, Ali Mohammad, and Zarchi, Mohsen Sardari
- Subjects
- DEEP learning, FORGERY, COMPUTER vision, FEATURE extraction
- Abstract
Forgery detection is one of the challenging subjects in computer vision. Forgery is performed by manipulating an image with editing tools; image manipulation tries to change the concept of the image while preserving the integrity of its texture and structure as much as possible. Images are used as evidence in some applications, so manipulated images are not reliable. Copy-move forgery is one of the simplest image manipulation methods: it removes or inserts information with minimal clues by copying a part of the image and pasting it elsewhere in the same image. Recently, traditional (block-based and keypoint-based) and deep learning methods have been proposed to detect forged images. Traditional methods include two main steps, feature extraction and feature matching. Unlike traditional methods, deep learning performs forgery detection automatically by extracting hierarchical features directly from the data. This paper presents a deep learning method for forgery detection at both image and pixel levels. In this method, we use a pre-trained deep model with a global average pooling (GAP) layer instead of the default fully connected layers to detect forgery; the GAP layer creates a good dependency between the feature maps and the classes. For pixel-level detection, a visualization technique called heatmap activation is applied to forged images; it identifies parts of the image that are candidates for forgery, after which the best candidate is selected and the forgery is localized (a sketch of this heatmap step follows this record). The proposed method is evaluated on the CoMoFoD and MICC datasets, and extensive experiments showed its satisfactory performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
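The GAP-plus-heatmap localization is closely related to class activation mapping. A minimal NumPy sketch follows, assuming the last-conv feature maps and the GAP classifier weights of the "forged" class are available; shapes and names are illustrative, not the paper's code.

```python
# Minimal class-activation-map sketch for forgery localization.
import numpy as np

def class_activation_map(feature_maps: np.ndarray,
                         class_weights: np.ndarray) -> np.ndarray:
    # feature_maps: (C, H, W) last-conv activations.
    # class_weights: (C,) weights of the "forged" class after GAP.
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # (H, W)
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)  # normalized heatmap; peaks flag forgery
```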
12. Emotion Recognition from Facial Expression Using Hybrid CNN–LSTM Network.
- Author
- Mohana, M., Subashini, P., and Krishnaveni, M.
- Subjects
- EMOTION recognition, FACIAL expression, ARTIFICIAL intelligence, COMPUTER vision, CONVOLUTIONAL neural networks, DEEP learning
- Abstract
Facial Expression Recognition (FER) is a prominent research area in computer vision and artificial intelligence that plays a crucial role in human–computer interaction. Existing FER systems focus on spatial features for identifying emotion, which suffers when recognizing emotions from a dynamic sequence of facial expressions in real time. Deep learning techniques based on the fusion of convolutional neural networks (CNN) and long short-term memory (LSTM) are presented in this paper for recognizing emotion and identifying the relationships within sequences of facial expressions. In this approach, a hyperparameter-tuned VGG-19 backbone is employed to extract spatial features automatically from a sequence of images, avoiding the shortcomings of conventional feature extraction methods. These features are then fed into a bidirectional LSTM (Bi-LSTM) that extracts spatiotemporal features of the time series in both directions, recognizing emotion from a sequence of expressions (a sketch of this pipeline follows this record). The proposed method's performance is evaluated using the CK+ benchmark as well as an in-house dataset captured with a purpose-built IoT kit, and verified through hold-out cross-validation. The proposed technique achieves an accuracy of 0.92 on CK+ and 0.84 on the in-house dataset. The experimental results reveal that the proposed method outperforms baseline methods and state-of-the-art approaches. Furthermore, precision, recall, F1-score, and ROC curve metrics were used to evaluate the performance of the proposed system. [ABSTRACT FROM AUTHOR]
- Published
- 2023
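The CNN-to-Bi-LSTM pipeline can be sketched compactly in PyTorch. The layer sizes below are illustrative, and a stock torchvision VGG-19 stands in for the authors' hyperparameter-tuned variant.

```python
# Minimal sketch of a hybrid CNN + Bi-LSTM expression-sequence classifier.
import torch
import torch.nn as nn
from torchvision import models

class CnnBiLstm(nn.Module):
    def __init__(self, num_classes: int = 7, hidden: int = 128):
        super().__init__()
        vgg = models.vgg19(weights=None)     # stand-in for the tuned VGG-19
        self.cnn = vgg.features              # per-frame spatial features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.rnn = nn.LSTM(512, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W) sequence of face crops.
        b, t = clips.shape[:2]
        feats = self.pool(self.cnn(clips.flatten(0, 1))).flatten(1)  # (b*t, 512)
        out, _ = self.rnn(feats.view(b, t, -1))  # spatiotemporal features
        return self.head(out[:, -1])             # classify from the last step
```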
13. Image Feature Extraction and Object Recognition Based on Vision Neural Mechanism.
- Author
- Wei, Peng Cheng and Zou, Yang
- Subjects
- COMPUTER vision, COMPUTER engineering, OPTICAL information processing, OBJECT recognition (Computer vision), VISION, ARTIFICIAL intelligence, FEATURE extraction
- Abstract
As an important branch of artificial intelligence, computer vision plays a huge role in the rapid development of artificial intelligence. From a biological point of view, vision is far more important than hearing or touch in the acquisition and processing of information, because about 70% of the human cerebral cortex is involved in processing visual information. Advances in computer vision technology are therefore critical to the development of artificial intelligence designed to let machines think and handle things like humans. The acquisition and processing of visual information has always been both the focus and the difficulty of computer vision research. The main problems of traditional computer vision in processing visual information are that the extracted image features are weakly discriminative, their generalization in complex background scenes is insufficient, and object recognition ability is poor. In response to these problems, based on the visual neural mechanism, this paper establishes a computational model of the neuronal cells in the human primary visual cortex, models the recognition response mechanism of the visual ventral stream, and performs image feature extraction and object recognition on the training samples. The results show that, compared with traditional methods, the proposed method effectively improves the discriminability of image features, and the features extracted in complex background scenes generalize well; on this basis, the training samples can be effectively recognized. The recognition of the edges, orientations and contours of the training samples shows the advantages of the biological vision mechanism in object recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2020
14. Millimeter-Wave Radar and Machine Vision-Based Lane Recognition.
- Author
- Li, Wei, Guan, Yue, Chen, Liguo, and Sun, Lining
- Subjects
- MILLIMETER wave radar, COMPUTER vision, MACHINE learning, PATTERN recognition systems, LEAST squares
- Abstract
A camera can sense the environment on the lane by extracting the lane lines, but such detection is limited to a short distance and is affected by illumination and other factors; radar can detect objects a long distance away but cannot detect lane conditions. This paper combined machine vision with millimeter-wave radar: nearby, distinct lane lines are extracted from images, while the radar obtains the motion trajectories of distant vehicles, and the least-squares method fits curves to those trajectories in order to reconstruct the distant lane lines (a sketch of this fitting step follows this record). When fusing the two segments of lane lines, the goodness of fit is applied to match corresponding lane lines. For areas between the two segments that neither camera nor radar can detect, we established a lane model, used a probabilistic neural network to select the matching lane model, and then applied the approximate mathematical expression of the selected model, thus obtaining the final information about the road ahead of the current vehicle. [ABSTRACT FROM AUTHOR]
- Published
- 2018
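The least-squares reconstruction of a distant lane line from radar trajectories reduces to a polynomial fit. A minimal NumPy sketch, assuming a quadratic lane model (the paper's lane models may differ):

```python
# Minimal sketch: fit a lane curve to radar vehicle-trajectory points.
import numpy as np

def fit_lane(xs: np.ndarray, ys: np.ndarray) -> np.poly1d:
    # (xs, ys): longitudinal/lateral positions of a tracked vehicle over time.
    coeffs = np.polyfit(xs, ys, deg=2)  # least-squares curve fit
    return np.poly1d(coeffs)            # callable lane model y = f(x)
```

The goodness of fit of such a curve against the camera-extracted near segment can then drive the matching step the abstract describes.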
15. A Novel Dual U-Net Generative Adversarial Network for Image Inpainting.
- Author
- Yuan, Jianjun, Wu, Hong, and Wu, Fujun
- Subjects
- GENERATIVE adversarial networks, INPAINTING, DEEP learning, COMPUTER vision, VISUAL fields, EMULSION paint
- Abstract
Thanks to the rapid development of deep learning in recent years, image inpainting has made significant progress. As a fundamental task in computer vision, many researchers are committed to exploring more efficient methods, and state-of-the-art research results show that generative adversarial networks (GAN) have superior performance. However, due to the inherent ill-posedness of image inpainting, these approaches suffer from a lack of detailed information, local structural fractures or boundary artifacts. In this paper, we leverage the properties of the GAN architecture to process images in more detail and more comprehensively. A novel dual U-Net GAN is designed to inpaint images, composed of a U-Net-based generator and a U-Net-based discriminator. The former captures semantic information at different scales layer by layer and decodes it back to the original size to repair damaged images, while the latter optimizes the network by combining reconstruction loss, adversarial loss, perceptual loss and style loss. In particular, the U-Net-based discriminator provides per-pixel detail and global feedback to the generator, guaranteeing the global consistency of the inpainted image and the realism of local shapes and textures. Extensive experiments demonstrate that, for different proportions of damage, the images inpainted by our model have reasonable texture structure and contextual semantic information. Furthermore, the proposed model outperforms state-of-the-art models in both qualitative and quantitative comparisons. The code will be available at https://github.com/yjjswu. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Intelligent Classification of Metallographic Based on Improved Deep Residual Efficiency Networks.
- Author
- Huang, Xiaohong, Liu, Yanping, Qi, Xueqian, and Song, Yue
- Subjects
- ARTIFICIAL neural networks, ARTIFICIAL intelligence, COMPUTER vision, CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), COMPUTER performance
- Abstract
The recognition of steel microstructure images plays a crucial role in metallographic analysis. Although some progress has been made through the application of artificial intelligence algorithms, several challenges remain. First, existing algorithms exhibit weak nonlinear feature extraction capabilities and noticeable limitations. Second, they overlook the intrinsic noise and redundant interference present in microscopic images. To address these issues, this paper investigates the automatic recognition of metallographic structures by leveraging residual structures in deep neural networks. An enhanced residual network model based on transfer learning is proposed, which uses pre-trained ImageNet weights to facilitate learning from small sample data; it offers higher classification accuracy and higher F1 scores. In addition, a deep residual shrinkage network model based on an attention mechanism is proposed. This model incorporates an attention sub-network into the original residual module and employs a soft threshold function to eliminate redundant features, including noise (a sketch of this thresholding follows this record). The proposed algorithms are evaluated against various convolutional neural networks on 20 types of metallographic test sets. The experimental results show that the two methods achieve high accuracy rates of 95% and 94.44%, respectively, with F1 scores of 0.9464 and 0.9419; accuracy improves significantly while model complexity is maintained, and the models exhibit strong generalization. Our research contributes to enhancing production efficiency, strengthening quality control, and improving material performance through computer vision technology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
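The soft-threshold shrinkage at the heart of the deep residual shrinkage module is nearly a one-liner. A minimal PyTorch sketch, assuming the threshold `tau` is produced per channel by the attention sub-network, which is omitted here:

```python
# Minimal sketch of soft-threshold shrinkage for suppressing noisy features.
import torch

def soft_threshold(x: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    # Shrinks small-magnitude responses to exactly zero, keeping strong ones;
    # tau (assumed attention-predicted, per channel) sets the cut-off.
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)
```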
17. Neural Network-Based Method for Early Diagnosis of Autism Spectral Disorder Head-Banging Behavior from Recorded Videos.
- Author
- Sadek, Esraa T., Seada, Noha A., and Ghoniemy, Said
- Subjects
- BEHAVIOR disorders, AUTISM spectrum disorders, COMPUTER vision, AUTISM, NEURAL computers, EARLY diagnosis, EAR
- Abstract
Autism spectrum disorder (ASD) is a mental developmental disorder associated with social and communicational defects and Stereotypical Motor Movements (SMM). SMM is a set of repetitive motor activities associated with several mental developmental disorders like autism. SMM has several forms, such as arm flapping, head banging, ear covering, and spinning, with various degrees of severity that might lead to self-injury in severe cases. Developing a computer-vision-based technology to detect noticeable SMM behaviors can help in the early diagnosis of autism. In this paper, a computer-vision-based neural network model is proposed to detect and recognize repetitive motor behaviors. The proposed model proceeds in three main stages: first, data preparation; second, human body feature extraction using deep-learning pose estimation and a skeleton representation model; and finally, multiclass classification to distinguish several classes of head banging. The proposed solution was evaluated on the Self Stimulatory Behavior Dataset (SSBD), a public dataset of three classes of repetitive motor behaviors associated with autism. We also collected a set of 40 videos of autistic children exhibiting head banging from public domains like YouTube, and additionally captured 25 videos of typically developing subjects mimicking head banging. The collected and recorded videos were used to evaluate the proposed model. This work demonstrates the applicability of diagnosing symptoms of mental developmental syndromes using vision-based techniques in cooperation with neural networks, and the results show that these techniques operate well in challenging real-world applications. The proposed model achieved 85.5% accuracy on SSBD and 93% on the collected and recorded videos. [ABSTRACT FROM AUTHOR]
- Published
- 2023
18. Masked Visual Transformer for Efficient Training with Small Dataset.
- Author
- Guan, Chen-Zhi
- Subjects
- TRANSFORMER models, CONVOLUTIONAL neural networks, COMPUTER vision
- Abstract
Vision Transformers (ViTs) are becoming an architectural paradigm replacing convolutional neural networks (CNNs) in computer vision. ViTs offer competitive performance with respect to CNNs, but they, especially vanilla ViTs, are hungrier for data than typical CNNs because they lack convolution's inductive bias. Recently, a few works have focused on training vanilla ViTs efficiently with small datasets. In this paper, we study training vanilla ViTs with small datasets containing thousands of images, and propose a method applied at the self-supervised pretraining stage. The proposed method combines parametric instance discrimination with CutMix and Multi-crop (a sketch of the CutMix ingredient follows this record). Furthermore, we introduce image masking to reduce overfitting when pretraining on small datasets. State-of-the-art results are achieved by our method for training from scratch with vanilla ViT backbones on seven small-scale datasets. The transfer performance of our method is also tested on small datasets, and the results show that it improves significantly. [ABSTRACT FROM AUTHOR]
- Published
- 2023
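Of the pretraining ingredients above, CutMix is the easiest to illustrate. A minimal PyTorch sketch follows, with the instance-discrimination and Multi-crop parts omitted; pairing each image with a shuffled partner is the standard CutMix recipe, not code from the paper.

```python
# Minimal CutMix sketch: paste a random patch from a shuffled batch partner.
import numpy as np
import torch

def cutmix(images: torch.Tensor, lam: float):
    # images: (batch, C, H, W); lam controls the kept-area ratio.
    b, _, h, w = images.shape
    perm = torch.randperm(b)
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    return mixed, perm  # perm maps each image to its mixed-in partner
```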
19. EDITORIAL.
- Author
- DE GREGORIO, MASSIMO and FRUCCI, MARIA
- Subjects
- COMPUTER vision, ARTIFICIAL intelligence, VISION, PATTERN perception, FORM perception, HUMAN-computer interaction
- Abstract
No abstract received. [ABSTRACT FROM AUTHOR]
- Published
- 2008
20. Small Object Detection Methods in Complex Background: An Overview.
- Author
- Li, Zhigang, Guo, Qimei, Sun, Bo, Cao, Difei, Li, Yingqi, and Sun, Xiaochuan
- Subjects
- OBJECT recognition (Computer vision), COMPUTER vision, VISUAL fields
- Abstract
Small object detection (SOD) has been a research hotspot in computer vision. In complex backgrounds (CBs) especially, SOD faces various challenges, including inconspicuous small-object features, object distortion due to CB interference, and inaccurate object localization due to various kinds of noise. So far, many methods have been proposed to improve SOD in CBs. In this paper, based on an extensive study of the related literature, we first outline the current challenges and some cutting-edge solutions for SOD, and then introduce the types of complex background interference present in small-object images, the imaging characteristics of different types of images, and the characteristics of small objects. Next, image pre-processing methods are summarized. On this basis, machine learning-based and traditional SOD methods are surveyed. Finally, future development directions are given. [ABSTRACT FROM AUTHOR]
- Published
- 2023
21. Improving Registration of Augmented Reality by Incorporating DCNNS into Visual SLAM.
- Author
- Chen, Yongbin, He, Hanwu, Chen, Heen, and Zhu, Teng
- Subjects
- AUGMENTED reality, SLAM (Robotics), THREE-dimensional imaging, NEURAL computers, COMPUTER vision
- Abstract
Augmented reality (AR) analyzes the characteristics of a scene and adds computer-generated geometric information to the real environment through visual fusion, reinforcing perception of the world. Three-dimensional (3D) registration is one of the core issues in AR; the key is to estimate the visual sensor's pose in the 3D environment and work out the objects in the scene. Recently, computer vision has made significant progress, but registration based on natural feature points in 3D space is still a severe problem for AR systems: it is difficult to work out the mobile camera's pose in the 3D scene precisely because of unstable factors such as image noise, changing light and complex background patterns. Designing a stable, reliable and efficient scene recognition algorithm therefore remains very challenging. In this paper, we propose an algorithm that combines Visual Simultaneous Localization and Mapping (SLAM) with Deep Convolutional Neural Networks (DCNNs) to boost the performance of AR registration. Semantic segmentation is a dense prediction task that aims to predict a category for each pixel in an image; applied to AR registration, it narrows the search range for feature points between two frames, enhancing the stability of the system. Comparative experiments in this paper show that semantic scene information brings a revolutionary breakthrough to AR interaction. [ABSTRACT FROM AUTHOR]
- Published
- 2018
22. A New Method on Super Pixel Reducing Stereo Matching Time of Integrated Imaging.
- Author
- Wang, Xue-Guang, Li, Ming, Zhang, Lei, Zhao, Hui, and Palaoag, Thelma D.
- Subjects
- BINOCULAR vision, STEREO image, TEST methods, COMPUTER vision, PIXELS
- Abstract
Stereo vision and 3D reconstruction technologies attract increasing attention in many fields. The stereo matching algorithm is the core of stereo vision and also a key technical difficulty. A novel method based on super pixels is presented in this paper to reduce the amount of calculation and the matching time. Stereo images from the University of Tsukuba are used to test our method. The proposed method spends only 1% of the time spent by the conventional method; through a two-step super-pixel matching optimization, it takes 6.72 s to match a picture, which is 12.96% of the pre-optimization time. [ABSTRACT FROM AUTHOR]
- Published
- 2021
23. Head Pose Estimation Based on Multi-Level Feature Fusion.
- Author
- Yan, Chunman and Zhang, Xiao
- Subjects
- POSE estimation (Computer vision), COMPUTER vision, FEATURE extraction, APPLICATION software, QUATERNIONS, PROBLEM solving, EULER angles
- Abstract
Head Pose Estimation (HPE) has a wide range of applications in computer vision, but still faces challenges: (1) existing studies commonly use Euler angles or quaternions as pose labels, which may lead to discontinuity problems; (2) HPE does not effectively address regression via rotation matrices; (3) there is a low recognition rate in complex scenes, high computational requirements, etc. This paper presents an improved unconstrained HPE model to address these challenges. First, a rotation matrix form is introduced to solve the problem of unclear rotation labels. Second, a continuous 6D rotation matrix representation is used for efficient and robust direct regression (a sketch of this representation follows this record). The lightweight RepVGG-A2 framework is used for feature extraction, and a multi-level feature fusion module and a coordinate attention mechanism with residual connection are added to improve the network's ability to perceive contextual information and attend to salient features. The model's accuracy was further improved by replacing the network activation function and improving the loss function. Experiments on the BIWI dataset with a 7:3 train/test split show that the mean absolute error of HPE for the proposed model is 2.41. Trained on the 300W_LP dataset and tested on the AFLW2000 and BIWI datasets, the mean absolute errors are 4.34 and 3.93, respectively. The experimental results demonstrate that the improved network has better HPE performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
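The continuous 6D representation regresses the first two columns of the rotation matrix and recovers the third by Gram-Schmidt orthogonalization. A minimal NumPy sketch of the recovery step:

```python
# Minimal sketch: recover a rotation matrix from its continuous 6D form.
import numpy as np

def rotation_from_6d(r6: np.ndarray) -> np.ndarray:
    # r6: the first two columns of a rotation matrix, flattened to 6 values.
    a1, a2 = r6[:3], r6[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1          # remove the component along b1
    b2 /= np.linalg.norm(b2)
    b3 = np.cross(b1, b2)                  # third column completes the basis
    return np.stack([b1, b2, b3], axis=1)  # orthonormal 3x3 rotation matrix
```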
24. SSD Optimization Model Based on Shallow Feature Fusion.
- Author
- Yang, Zhe, Bu, Zi-Yu, and Liu, Chun-Ping
- Subjects
- OBJECT recognition (Computer vision), DEEP learning, COMPUTER vision, VISUAL fields, PYRAMIDS
- Abstract
Object detection is an important research branch in computer vision. The single-shot detector (SSD) is a deep learning object detection model that achieves a good balance between detection accuracy and detection speed, but its recognition accuracy for small objects is poor. To address this limitation, this paper improves the structure of the SSD feature pyramid: the shallow feature map carrying small-object information is up-sampled and fused with the upper feature map, enhancing the shallow map's ability to represent detailed information. In this way, the overall detection accuracy of the SSD is improved while a relatively high detection speed is maintained. The proposed model is verified by experiments on two common benchmarks, the Pascal VOC and MS COCO datasets. On the Pascal VOC07+12, MS COCO14, and VOC07+12+COCO datasets, the improved model achieves mean average precision values of 80.1% (+3.3% over the conventional model), 49.9% (+6.8%), and 82.1% (+3.0%), respectively, while reaching a detection speed of 42.2 frames per second. [ABSTRACT FROM AUTHOR]
- Published
- 2022
25. Automatic Image Pixel Clustering based on Mussels Wandering Optimization.
- Author
- Zhong, Xin and Shih, Frank Y.
- Subjects
- MUSSELS, COMPUTER vision, NP-hard problems, SUM of squares, SWARM intelligence, PIXELS
- Abstract
Image pixel clustering, or segmentation, aims to identify pixel groups in an image without any preliminary labels. It remains a challenging task in computer vision since the sizes and shapes of object segments vary. Moreover, determining the number of segments in an image without prior knowledge of its content is an NP-hard problem. In this paper, we present an automatic image pixel clustering scheme based on mussels wandering optimization. An activation variable is applied to determine the number of clusters automatically during cluster-center optimization. We revise the within- and between-class sum-of-squares ratio for random natural image content and develop a novel fitness function for the image pixel clustering task. Our proposed scheme is compared against existing state-of-the-art techniques using both synthetic data and the real ASD dataset. Experimental results show the superior performance of the proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2021
26. Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification.
- Author
- Rasheed, Ali Salim, Zabihzadeh, Davood, and Al-Obaidi, Sumia Abdulhussien Razooqi
- Subjects
- CONTENT-based image retrieval, DISTANCE education, COMPUTER vision, MACHINE learning
- Abstract
Metric learning algorithms aim to bring conceptually related data items closer while keeping dissimilar ones at a distance. The most common approach to metric learning is based on the Mahalanobis method. Despite its success, this method is limited to finding a linear projection and also suffers from poor scalability with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm, which yields a higher convergence rate than state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted. The method is evaluated on several challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that the proposed method significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2020
27. Visual Interfaces to Computers: A Systems-Oriented First Course in Reliable Control via Imagery (“Visual Interfaces”).
- Author
- Kender, John R.
- Subjects
- COMPUTER vision, COMPUTER interfaces, EDUCATION
- Abstract
We present the rationale, description, and critique of a first course in image computing that is not a traditional computer vision principles-and-tools course. "Visual Interfaces to Computers" is instead complementary to standard Computer Vision, User Interface, and Graphics courses; in fact, VI:CV::UI:G. It is organized by case studies of working visual systems that use camera input for data or control information in service of higher user goals, such as GUI control, user identification, or automobile steering. Many CV scientific principles and engineering tools are therefore taught, as well as those of psychophysics, AI, and EE, but taught selectively and always within the context of total system design. Course content is derived from conference and journal articles and Ph.D. theses, augmented with video tapes and real-time web site demos. Students do two homework assignments, one to design a "visual combination lock", and one to parse an image into English. They also do a final paper or project of their own choosing, often in teams of two, and often with surprisingly deep results. The course is assisted by a custom C-based tool kit, "XILite", a user-friendly (and comparatively bug-free) modification of Sun's X-windows Image Library for our lab's camera-equipped Sun workstations. The course has been offered twice to a wide audience with good reviews. [ABSTRACT FROM AUTHOR]
- Published
- 2001
28. An Adaptive Parameter Choosing Approach for Regularization Model.
- Author
- Xu, Xiaowei and Bu, Ting
- Subjects
- TIKHONOV regularization, REGULARIZATION parameter, PARAMETER estimation, PATTERN recognition systems, ARTIFICIAL intelligence, COMPUTER vision
- Abstract
The choice of regularization parameters is a troublesome issue for most regularization methods, e.g. the Tikhonov regularization method, the total variation (TV) method, etc. An appropriate parameter for a given regularization approach can produce excellent results; however, general parameter-choosing methods, e.g. Generalized Cross Validation (GCV), do not always give precise results in practical applications. In this paper, we consider finding a more appropriate regularization parameter within a feasible range and apply the estimated parameter to the Tikhonov model. Meanwhile, we obtain the optimal regularization parameter through designed criteria and evaluate the recovered solution. The relevant parameter intervals and designed criteria of this method are also presented in the paper. Numerical experiments demonstrate that our method evidently outperforms the GCV method for image deblurring. In particular, the parameter estimation algorithm can also be applied to many regularization models related to pattern recognition, artificial intelligence, computer vision, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2018
29. SRDT: A Novel Robust RGB-D Tracker Based on Siamese Region Proposal Network and Depth Information.
- Author
- Sun, Zhen, Wu, Junfei, Wang, Lu, and Li, Qingdang
- Subjects
- INFORMATION networks, ARTIFICIAL neural networks, OBJECT tracking (Computer vision), COMPUTER vision, VISUAL fields, FEATURE extraction
- Abstract
Visual tracking is still a challenging fundamental task in computer vision, especially in complex scenes with long-term occlusion, non-rigid deformation and fast movement. In this paper, we present an RGB-D tracker based on a Siamese Region Proposal Network and depth information. First, a Siamese network with shared parameters is constructed to extract features from the target patch and the search area. Second, a Region Proposal Network estimates the target position in the RGB channels, while the depth information in the RGB-D video is used to determine the target's occlusion state and fine-tune the target position. Finally, the tracker uses depth information to recover from occlusion when the target is fully occluded. The experimental results show that the method performs better in tracking accuracy and tracking speed on the large-scale Princeton RGB-D Tracking Benchmark (PTB) dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2020
30. Research on Human Movement Target Recognition Algorithm in Complex Traffic Environment.
- Author
- Zou, Ying, Wang, Dahu, and Liu, Leian
- Subjects
- HUMAN mechanics, PATTERN recognition systems, HUMAN experimentation, TRAFFIC monitoring, SUPPORT vector machines, COMPUTER vision, MOTION analysis
- Abstract
With the growth of the population and the continuous increase in the number of trips, the traffic pressure people face is increasing. With the development and advancement of computer technology, intelligent transportation provides a better way to effectively alleviate traffic pressure and reduce the incidence of traffic accidents. In recent years, intelligent traffic monitoring systems, as one of the important branches of intelligent transportation, have also received more and more attention. Video-based moving target recognition involves theoretical knowledge from various fields, such as artificial intelligence, image processing, pattern recognition and computer vision; it is an important means of realizing the "safe city" and "smart city" and a key technology for intelligent monitoring. Research on human motion target recognition algorithms in complex traffic environments therefore has important theoretical and practical value. In intelligent traffic monitoring, the detection and recognition of moving targets in video images influences the subsequent classification and behavior understanding of those targets. In this paper, commonly used moving target detection methods are studied first, the convergence problem of the traditional Adaboost algorithm is improved, and an Adaboost algorithm based on adaptive weight updating is proposed; a support vector machine (SVM) is then used to identify the detected moving targets. Finally, simulation experiments on acquired video images show that the proposed human motion target recognition algorithm based on adaptive-weight Adaboost and SVM has good feasibility and rationality. [ABSTRACT FROM AUTHOR]
- Published
- 2020
31. Algorithm for Curved Surface Mesh Generation Based on Delaunay Refinement.
- Author
- Zhou, Longquan, Wang, Hongjuan, Lu, Xinming, Zhang, Wei, and Zhang, Xingli
- Subjects
- CURVED surfaces, FINITE element method, ALGORITHMS, COMPUTER vision, COMPUTER graphics, NUMERICAL grid generation (Numerical analysis)
- Abstract
Curved surface mesh generation is a key step in many areas. Here, a mesh generation algorithm for closed curved surfaces based on Delaunay refinement is proposed. We focus on improving the shape quality of the generated meshes and making them conform to a 2-manifold. The Delaunay tetrahedralization of the initial sample is generated first, from which the initial surface mesh, a subset of the tetrahedralization, can be obtained. A triangle is refined by inserting a new point if it is too large or of bad quality. For each sample point, we also check whether the adjoining triangles form a topological disk; if not, the largest triangle is refined. Finally, the surface mesh is updated after a new point is inserted into the sample. The definition of the mesh size function for surface mesh generation is also given in this paper. Meshing experiments on several models demonstrate that the new algorithm is advantageous in generating high-quality surface meshes; the element count is suitable and the meshes approximate the curved surfaces well. The presented method can be used for a wide range of problems, including computer graphics, computer vision and the finite element method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
32. Enhanced Graph Neural Network with Multi-Task Learning and Data Augmentation for Semi-Supervised Node Classification.
- Author
- Fan, Cheng, Wang, Buhong, and Wang, Zhen
- Subjects
- DATA augmentation, SUPERVISED learning, COMPUTER vision, CLASSIFICATION, PROBLEM solving
- Abstract
Graph neural networks (GNNs) have achieved impressive success in various applications. However, training dedicated GNNs on small-scale graphs still faces problems such as over-fitting and limited performance improvements. Techniques such as data augmentation are commonly used in computer vision (CV) but are barely applied to graph-structured data to solve these problems. In this paper, we propose a training framework named MTDA (Multi-Task learning with Data Augmentation)-GNN, which combines data augmentation and multi-task learning to improve GNN node classification performance on small-scale graph data. First, we use Graph Auto-Encoders (GAE) as a link predictor, modifying the original graphs' topological structure by promoting intra-class edges and demoting inter-class edges, thereby denoising the original graph and realizing data augmentation. The modified graph is then used as input to the node classification model. Besides defining node-pair classification as an auxiliary task, we introduce multi-task learning during training, forcing the predicted labels to conform to the observed pairwise relationships and improving the model's classification ability. In addition, an adaptive dynamic weighting strategy distributes the weights of the different tasks automatically. Experiments on benchmark datasets demonstrate that the proposed MTDA-GNN outperforms traditional GNNs in graph-based semi-supervised node classification. [ABSTRACT FROM AUTHOR]
- Published
- 2023
33. An Accelerated and Flexible SIFT Parallel-Computing Approach Based on the General Multi-Core Platform.
- Author
- Wang, Gang, Zhou, Mingliang, Fang, Bin, Huang, Haichao, Shu, Zhenyu, and Chen, Xueshu
- Subjects
- IMAGE registration, COMPUTER vision, RASPBERRY Pi, COMPUTER engineering, PARALLEL programming
- Abstract
Visual retrieval has been a significant technology in computer vision, and visual feature descriptors are the key to it. A famous local feature descriptor is the Scale Invariant Feature Transform (SIFT), which maintains invariant mappings for scaled, rotated and simulated images. To utilize the SIFT descriptor effectively for visual matching on different hardware platforms, this paper proposes an accelerated SIFT algorithm based on the SIFT feature computing principle of general multi-core platforms. First, our multi-core task allocation method introduces WFM theory into task assignment for each core, improving core computing resource utilization for high-efficiency parallel computing (a sketch of the multi-core idea follows this record). Then, to improve the efficiency of picture matching, we introduce a global geometric constraint condition to optimize picture matching in the multi-core parallelization approach. Experimental results show that the proposed approach saves on average 87.31% of the single-core time on the Intel x86 platform and on average 33.79% of the single-core time on the Raspberry Pi platform. [ABSTRACT FROM AUTHOR]
- Published
- 2022
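The multi-core idea can be approximated with a plain process pool. The sketch below uses OpenCV and Python multiprocessing; it stands in for, and does not reproduce, the paper's WFM-based task allocation and geometric verification.

```python
# Minimal sketch: farm SIFT extraction out to multiple cores.
import cv2
from multiprocessing import Pool

def extract_sift(path: str):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()  # created inside each worker; not picklable
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return path, descriptors

if __name__ == "__main__":
    paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg", "img_3.jpg"]  # illustrative
    with Pool(processes=4) as pool:  # one task stream per core
        results = pool.map(extract_sift, paths)
```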
34. LWRN: Light-Weight Residual Network for Edge Detection.
- Author
- Han, Chen, Li, Dingyu, and Wang, Xuanyin
- Subjects
- CONVOLUTIONAL neural networks, COMPUTER vision, VISUAL fields, IMAGE representation
- Abstract
Edge detection is one of the most fundamental fields in computer vision. With the rapid development of the combination of convolutional neural networks and multi-scale image representation, significant progress has been made in this field. However, most models are huge, which makes them hard to apply in practice, and their large number of parameters can waste computing resources. In this paper, we qualitatively analyze the role of each part of the network and propose a modified light-weight architecture based on our results and the study of former works. Our new architecture is composed of residual blocks, max-pooling layers and batch normalization layers. Compared with previous models, it performs better in memory, convergence and computational efficiency at similar model size, and it can achieve better accuracy with a smaller model. When evaluated on the well-known BSDS500 benchmark, we achieve an ODS F-measure of 0.769 with fewer than 0.3 M parameters, improving on the state-of-the-art result of 0.766 at this model size. [ABSTRACT FROM AUTHOR]
- Published
- 2022
35. SentiNet: A Nonverbal Facial Sentiment Analysis Using Convolutional Neural Network.
- Author
- Refat, Md Abu Rumman, Singh, Bikash Chandra, and Rahman, Mohammad Muntasir
- Subjects
- CONVOLUTIONAL neural networks, SENTIMENT analysis, FACIAL expression, DATA augmentation, DEEP learning, AUGMENTED reality, COMPUTER vision
- Abstract
Human facial expressions are an essential and fundamental component of expressing the state of the human mind. The automatic analysis of these nonverbal facial expressions has become a fascinating and quite challenging problem in computer vision, with applications in different areas such as psychology, human–machine interaction, health, and augmented reality. Recently, deep learning (DL) has become a widespread technique for studying human nonverbal facial sentiment expressions, and several research attempts have proposed models on this topic. The purpose of this paper is to apply an appropriate convolutional neural network (CNN) approach, with several layers of different dimensions and data augmentation, to efficiently classify the seven basic human facial expressions: anger, sadness, fear, disgust, happiness, surprise, and neutral. In particular, this study proposes a convolutional neural network architecture, together with learning factors, that minimizes memory footprint and total training time owing to the model's shallow architecture. We then demonstrate the proposed model's network complexity, computational cost, and classification accuracy on three benchmark datasets: FER2013, KDEF, and JAFFE. Our approach achieves accuracies of 67.5%, 79.5%, and 90.0% on FER2013, KDEF, and JAFFE, respectively, which is better than other state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
36. Nondestructive Visual Inspection Method of Double-Yolked Duck Egg.
- Author
- Li, Li, Wang, Qiaohua, Weng, Fujiong, and Yuan, Cheng
- Subjects
- INSPECTION & review, LIGHT transmission, EGG yolk, EGGS, BIRD eggs, COMPUTER vision, IMAGE processing, PATTERN recognition systems
- Abstract
Duck eggs are rich in good protein and deeply favored by consumers. Among them, double-yolked duck eggs have high commercial value but are not easy to incubate, so in the egg products industry they are often picked out by inspection before processing, circulation or incubation. This paper applied computer vision technology to single- and double-yolked duck eggs of similar shape and size, which are hard to distinguish merely by their external characteristics, studied an effective image processing algorithm and built a recognition model. We collected transmitted-light color images of duck eggs, extracted the B-channel image, carried out iterative morphological opening-based reconstruction and threshold segmentation to acquire the yolk area, used convexity defects to identify double-yolked eggs (a sketch of this test follows this record), and finally applied watershed segmentation and ellipse fitting to separate the yolks. In testing, 150 single- and double-yolked eggs were distinguished by this method, with correct recognition rates reaching 99% and 100%, respectively. The experimental results indicate that this method has high accuracy in recognizing double-yolked duck eggs and can provide technical support for their nondestructive identification. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
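Below is an illustrative OpenCV sketch of the pipeline stages the abstract names: take the B channel, segment the yolk region by thresholding and morphological opening, and inspect convexity defects to flag a double yolk. The synthetic image, Otsu thresholding, and the defect-depth cutoff are assumptions, not values from the paper.

```python
import cv2
import numpy as np

# Synthetic stand-in for a transmission-light image: two overlapping
# bright disks playing the role of two merged yolks.
img = np.zeros((200, 280, 3), np.uint8)
cv2.circle(img, (95, 100), 45, (180, 120, 90), -1)
cv2.circle(img, (175, 100), 45, (180, 120, 90), -1)

b_channel = img[:, :, 0]                      # OpenCV images are BGR
_, yolk = cv2.threshold(b_channel, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
yolk = cv2.morphologyEx(yolk, cv2.MORPH_OPEN, kernel)

contours, _ = cv2.findContours(yolk, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
cnt = max(contours, key=cv2.contourArea)
hull = cv2.convexHull(cnt, returnPoints=False)
defects = cv2.convexityDefects(cnt, hull)

# Two merged yolks leave a deep "waist" that shows up as a convexity
# defect; the 20-pixel depth cutoff is an assumed value.
is_double = defects is not None and (defects[:, 0, 3] / 256.0).max() > 20
print("double-yolked" if is_double else "single-yolked")
```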
37. A MULTI-LAYER CONTRAST ANALYSIS METHOD FOR TEXTURE CLASSIFICATION BASED ON LBP.
- Author
-
CHEN, HENG-XIN, TANG, Y. Y., FANG, BIN, and WANG, PATRICK S. P.
- Subjects
CONTRAST effect ,TEXTURES ,PATTERN recognition systems ,COMPUTER vision ,CODING theory ,INVARIANTS (Mathematics) ,CLASSIFICATION - Abstract
Texture classification is one of the important fields in pattern recognition and machine vision research. The LBP method [13-15], proposed by Ojala, can be used to classify texture images effectively and has rotation-invariant, illumination-invariant, and multi-resolution characteristics. However, since the contrast between neighboring pixels is not considered, the classification rate of this method is strongly influenced by the light source type and light source orientation. The LMLCP (Local Multiple Layer Contrast Pattern) method proposed in this paper maps the contrast value between two nearby pixels to a rank value, which represents a relative contrast value range, and computes the statistical histogram following the LBP method. Because LMLCP causes a rapid expansion of the feature dimension, the special feature encoding method used in 3DLBP [6] is adopted. Experiments built on Outex_TC_00012 [12] demonstrate that LMLCP achieves a clearly more accurate classification rate than the LBP method. [ABSTRACT FROM AUTHOR] (A baseline LBP sketch follows this entry.)
- Published
- 2011
- Full Text
- View/download PDF
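For reference, here is a minimal sketch of the baseline the paper builds on: a uniform, rotation-invariant LBP histogram as a texture descriptor. This shows plain LBP only; the paper's LMLCP additionally quantizes the contrast between neighboring pixels into rank layers.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1.0):
    # 'uniform' gives the rotation-invariant uniform patterns of Ojala et al.
    codes = local_binary_pattern(gray, P, R, method="uniform")
    # Uniform LBP produces P + 2 distinct code values.
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
    return hist

# Usage with a random "texture" standing in for a real image:
texture = np.random.randint(0, 256, (128, 128)).astype(np.uint8)
print(lbp_histogram(texture))
```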
38. Sit-to-Stand Test for Neurodegenerative Diseases Video Classification.
- Author
-
Convertini, Nicola, Dentamaro, Vincenzo, Impedovo, Donato, and Pirlo, Giuseppe
- Subjects
NOSOLOGY ,NEURODEGENERATION ,HUMAN mechanics ,DIAGNOSIS ,VIDEO recording - Abstract
In this extended version of our earlier paper, an automatic video-based diagnosis system for dementia classification is presented. Starting from video recordings of patients and control subjects performing the sit-to-stand test, the system extracts patterns that discriminate patients with dementia from healthy subjects. The original system achieved an accuracy of 0.808 using a rigorous inter-patient separation scheme especially suited for medical purposes, in which some people are used for training and different people for testing. Adding features from the kinematic theory of rapid human movement and its sigma-lognormal model to classic features increased the system's overall performance to an F1 score of 0.947. In addition, multi-class classification was performed with the aim of classifying neurodegenerative disease severities. This is an original, pioneering work on sit-to-stand video classification for neurodegenerative diseases; its novelties lie in phase segmentation, the experimental setup, and the application of the kinematic theory of rapid human movements to sit-to-stand videos for neurodegenerative disease assessment. [ABSTRACT FROM AUTHOR] (A sketch of the inter-patient split follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
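The inter-patient separation scheme the abstract stresses means all recordings of a given person go either to training or to testing, never both. A small sketch using scikit-learn's group-aware splitter; the feature and label arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(40, 16)              # 40 video feature vectors (placeholder)
y = np.random.randint(0, 2, 40)         # 1 = dementia, 0 = healthy control
patients = np.repeat(np.arange(10), 4)  # 4 recordings per person

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patients))

# No patient appears on both sides of the split:
assert set(patients[train_idx]).isdisjoint(patients[test_idx])
print(len(train_idx), "training clips,", len(test_idx), "test clips")
```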
39. MSFE-PANet: Improved YOLOv4-Based Small Object Detection Method in Complex Scenes.
- Author
-
Pan, Xiaoying, Jia, Ningxin, Mu, Yuanzhen, and Bai, Weidong
- Subjects
- *
OBJECT recognition (Computer vision) , *ARTIFICIAL intelligence , *ARTIFICIAL vision , *OBJECT tracking (Computer vision) , *COMPUTER vision , *PROBLEM solving - Abstract
With the rapid development of computer vision and artificial intelligence technology, visual object detection has made unprecedented progress, and small object detection in complex scenes has attracted more and more attention. To solve the problems of ambiguity, overlap, and occlusion in small object detection in complex scenes, this paper proposes a multi-scale fusion feature enhanced path aggregation network, MSFE-PANet. By adding an attention mechanism and feature fusion, it strengthens the fusion of the strong positioning information of deep feature maps with the strong semantic information of shallow feature maps, which helps the network find regions of interest in complex scenes and improves its sensitivity to small objects. A rejection loss function and the network prediction scales are designed to address missed and false detections of overlapping and occluded small objects in complex backgrounds. The proposed method achieves an accuracy of 40.7% on the VisDrone2021 dataset and 89.7% on the PASCAL VOC dataset. Comparative analysis with mainstream object detection algorithms proves the superiority of this method in detecting small objects in complex scenes. [ABSTRACT FROM AUTHOR] (A generic fusion sketch follows this entry.)
- Published
- 2023
- Full Text
- View/download PDF
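Below is a generic PyTorch sketch of attention-weighted fusion of a deep and a shallow feature map, in the spirit of (but not identical to) the paper's fusion module; the channel counts and the SE-style channel attention are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Squeeze-and-excitation style channel attention (an assumption;
        # the paper uses coordinate and variance attention mechanisms).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 8, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 8, channels, 1), nn.Sigmoid(),
        )

    def forward(self, shallow, deep):
        # Upsample the low-resolution map to the high-resolution grid,
        # concatenate, project back down, and reweight channels.
        deep_up = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        fused = self.reduce(torch.cat([shallow, deep_up], dim=1))
        return fused * self.attn(fused)

shallow = torch.randn(1, 128, 64, 64)  # high-resolution feature map
deep = torch.randn(1, 128, 16, 16)     # low-resolution feature map
print(AttentionFusion()(shallow, deep).shape)  # torch.Size([1, 128, 64, 64])
```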
40. Real-Time Implementation of Traffic Signs Detection and Identification Application on Graphics Processing Units.
- Author
-
Ayachi, Riadh, Afif, Mouna, Said, Yahia, and Abdelali, Abdessalem Ben
- Subjects
TRAFFIC monitoring ,TRAFFIC signs & signals ,DEEP learning ,GRAPHICS processing units ,CONVOLUTIONAL neural networks ,COMPUTER vision - Abstract
Traffic sign detection has become an important feature of advanced driver assistance systems and even self-driving cars. In this paper, we present an implementation of a traffic sign detection method on Graphics Processing Units (GPUs) under real-time conditions. The proposed model is based on deep convolutional neural networks, a deep learning approach that has recently been used to solve many computer vision tasks successfully. Unlike older techniques, the model detects and identifies traffic signs at the same time without the need for any external modules. To achieve real-time inference, we implement the proposed model on the GPU, a natural choice for deep learning-based models. We also build a large traffic sign detection dataset containing 10,000 images captured from Chinese roads under real-world factors such as lighting, occlusion, and complex backgrounds; 73 traffic sign classes are covered. The evaluation of the proposed model on this dataset shows robust performance in terms of speed and accuracy. [ABSTRACT FROM AUTHOR] (A GPU inference sketch follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
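A minimal sketch of the deployment pattern the abstract describes: moving a trained network to the GPU and timing inference. The model here is a placeholder torchvision backbone, not the authors' network; only the 73-class output size comes from the abstract.

```python
import time
import torch
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torchvision.models.mobilenet_v2(num_classes=73).to(device).eval()

frame = torch.randn(1, 3, 224, 224, device=device)  # stand-in camera frame
with torch.no_grad():
    for _ in range(5):                  # warm-up iterations
        model(frame)
    if device.type == "cuda":
        torch.cuda.synchronize()        # wait for queued GPU work
    t0 = time.perf_counter()
    out = model(frame)
    if device.type == "cuda":
        torch.cuda.synchronize()
print(f"latency: {(time.perf_counter() - t0) * 1e3:.1f} ms, logits: {out.shape}")
```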
41. Lip Segmentation Based on Combined Color Space and ACM with Rhombic Initial Contour.
- Author
-
Lu, Yuanyao and Liu, Qingqing
- Subjects
IMAGE segmentation ,PATTERN recognition systems ,DISCRETE Hartley transforms ,APPROXIMATION algorithms ,COMPUTER vision - Abstract
Lip segmentation is a critical step in a lip-reading system because it closely relates to the accuracy of system recognition. In this paper, we aim to improve the accuracy of lip segmentation. A novel color space is proposed, consisting of a component of the CIE-LUV space and the sum of the second and third components of the image after the discrete Hartley transform (DHT). We select a rhombus as the initial contour because its shape approximates a closed lip relatively well. These ideas are realized with the active contour model (ACM): the ACM is implemented with the Chan-Vese model, the result for each component is obtained separately, and the final result is obtained by merging the per-component results. Experiments show that this method yields a more accurate and smoother lip contour. Meanwhile, the proposed method is more efficient than the classic ACM because it avoids some of its problems, such as the radius of the initial contour needing to be set manually according to the image size. [ABSTRACT FROM AUTHOR] (A Chan-Vese sketch follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
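An illustrative sketch of Chan-Vese segmentation started from a rhombic initial level set, the idea the abstract proposes for lips. The color-space construction is omitted, a synthetic grayscale image stands in for real data, and the `max_num_iter` parameter name assumes a recent scikit-image version.

```python
import numpy as np
from skimage.segmentation import chan_vese

def rhombic_level_set(shape):
    h, w = shape
    yy, xx = np.mgrid[:h, :w]
    # |x|/a + |y|/b <= 1 defines a rhombus centered in the image.
    d = np.abs(xx - w / 2) / (w / 3) + np.abs(yy - h / 2) / (h / 4)
    return np.where(d <= 1.0, 1.0, -1.0)

img = np.zeros((80, 120))
img[30:55, 35:90] = 1.0                 # bright blob standing in for a lip
img += 0.1 * np.random.randn(*img.shape)

mask = chan_vese(img, init_level_set=rhombic_level_set(img.shape),
                 max_num_iter=200)
print(mask.sum(), "pixels segmented")
```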
42. Real-Time Calibration and Registration Method for Indoor Scene with Joint Depth and Color Camera.
- Author
-
Zhang, Fengquan, Lei, Tingshen, Li, Jinhong, Cai, Xingquan, Shao, Xuqiang, Chang, Jian, and Tian, Feng
- Subjects
CALIBRATION ,IMAGE registration ,COMPUTER vision ,COMPUTATIONAL complexity ,VIDEO processing ,COLOR image processing - Abstract
Traditional vision registration technologies require precisely designed markers or rich texture information captured from the video scenes; vision-based methods have high computational complexity, while hardware-based registration technologies lack accuracy. Therefore, in this paper, we propose a novel registration method that takes advantage of an RGB-D camera to obtain depth information in real time, and a binocular system using a Time-of-Flight (ToF) camera and a commercial color camera is constructed to realize the three-dimensional registration technique. First, we calibrate the binocular system to obtain the relative positions of the two cameras. The systematic errors are fitted and corrected with a B-spline curve method. To reduce anomalies and random noise, an elimination algorithm and an improved bilateral filtering algorithm are proposed to optimize the depth map. To meet the system's real-time requirement, processing is further accelerated by parallel computing with CUDA. Then, a Camshift-based tracking algorithm is applied to capture the real object registered in the video stream, and the position and orientation of the object are tracked according to the correspondence between the color image and the 3D data. Finally, experiments are implemented and compared using our binocular system, and the results demonstrate the feasibility and effectiveness of our method. [ABSTRACT FROM AUTHOR] (A depth-denoising sketch follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
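A small OpenCV sketch of one stage of the pipeline above: denoising a ToF depth map with a bilateral filter, which smooths noise while keeping depth discontinuities sharp. The synthetic depth map and filter parameters are assumptions, and the paper's improved bilateral filter is replaced here by the standard one.

```python
import cv2
import numpy as np

depth = np.full((240, 320), 1500, np.float32)      # background at 1500 mm
depth[80:160, 120:220] = 900                       # object at 900 mm
depth += np.random.randn(240, 320).astype(np.float32) * 15  # sensor noise

# d: neighborhood diameter; sigmaColor: how different depth values may mix;
# sigmaSpace: spatial extent of the smoothing.
denoised = cv2.bilateralFilter(depth, d=9, sigmaColor=50, sigmaSpace=7)
print("noise std before/after:",
      depth[:60, :60].std(), denoised[:60, :60].std())
```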
43. Evaluation of Image Complexity Based on SVOR.
- Author
-
Xiao, Bo, Duan, Jin, Liu, Xuelian, Zhu, Yong, and Wang, Hao
- Subjects
SUPPORT vector machines ,IMAGE processing ,COMPUTATIONAL complexity ,COMPUTER vision ,SENSORY perception ,STATISTICAL correlation - Abstract
Because of human subjectivity, traditional image complexity evaluation models cannot accurately capture how complex an image appears to the Human Visual System (HVS). At the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), an evaluation method of visual search difficulty based on visual search time was proposed for the first time. In this paper, the ordinal relation of image complexity as perceived by humans is discussed, and a quantitative evaluation model based on Convolutional Neural Network (CNN) features and Support Vector Ordinal Regression (SVOR) with explicit inequality constraints on the thresholds is proposed. The results show that evaluation models based on SVOR and pyramid CNN features of images describe the ordinal relation of image complexity among different images more accurately, achieving a Kendall's tau correlation of 0.4858, better overall than SVR under the same conditions, whose highest Kendall's tau correlation is 0.4794. [ABSTRACT FROM AUTHOR] (A Kendall's tau sketch follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
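A short sketch of the evaluation metric used above: Kendall's tau between a model's predicted complexity scores and the human ranking. The scores below are made-up placeholders; the SVOR model itself is not reproduced here.

```python
from scipy.stats import kendalltau

human_rank = [1, 2, 3, 4, 5, 6]             # ground-truth complexity order
predicted = [1.2, 1.9, 3.4, 3.1, 5.5, 6.0]  # model's complexity scores

# Kendall's tau counts concordant vs. discordant pairs, so it measures
# how well the predicted ordering matches the human ordering.
tau, p_value = kendalltau(human_rank, predicted)
print(f"Kendall's tau = {tau:.4f} (p = {p_value:.3f})")
```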
44. A Visual Secret Sharing Scheme Based on Improved Local Binary Pattern.
- Author
-
Zhang, Wenyin, Shih, Frank Y., Hu, Shunbo, and Jian, Muwei
- Subjects
PATTERN recognition systems ,COMPUTER vision ,INFORMATION sharing ,DATA security ,ALGORITHMS - Abstract
A visual secret sharing (VSS) scheme is intended to share secret information within a group to avoid the potential threat of interruption and modification. In this paper, we present a novel VSS scheme based on an improved local binary pattern (LBP) operator. It makes full use of the local contrast features of LBP to conceal secret image data in different image shares, from which the secret can be recovered easily and exactly. By varying LBP extensions, various kinds of VSS schemes for sharing secret information can be designed. Compared with currently available VSS algorithms, the proposed scheme demonstrates better randomness in the shares with less pixel expansion, and exact reconstruction at lower computational cost. [ABSTRACT FROM AUTHOR] (A toy two-share sketch follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
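To make the VSS idea concrete, here is a toy two-share XOR scheme (essentially a one-time pad on a binary image), not the paper's LBP-based construction: either share alone is statistically random noise, while the two together recover the secret exactly and without pixel expansion.

```python
import numpy as np

rng = np.random.default_rng(0)
secret = np.zeros((32, 32), np.uint8)
secret[8:24, 8:24] = 1                  # binary secret image

share1 = rng.integers(0, 2, secret.shape, dtype=np.uint8)  # pure noise
share2 = share1 ^ secret                # also looks like noise on its own

recovered = share1 ^ share2             # exact reconstruction
print("recovered exactly:", np.array_equal(recovered, secret))
```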
45. A Study on the Analysis Model of the Ranking of the Theme of Weibo.
- Author
-
Zhang, Rui, Jin, Zhigang, and Liu, Xiaohui
- Subjects
RANKINGS of websites ,IMAGE retrieval ,COMPUTER vision ,TEXT processing (Computer science) ,DEEP learning - Abstract
Sina Weibo, the most popular Chinese social platform, with hundreds of millions of user-contributed images and texts, is growing rapidly. However, the noise between image and text, as well as their incomplete correspondence, makes accurate image retrieval and ranking difficult. In this paper, we propose a deep learning framework that uses visual features, text content, and the popularity of Weibo posts to calculate the similarity between an image and its text, training the model to maximize the likelihood of the target description sentence given the training image. In addition, the retrieval results are reranked using the popularity of the image. Comparison experiments on a large-scale Sina Weibo dataset prove the validity of the proposed method. [ABSTRACT FROM AUTHOR] (A reranking sketch follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
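A tiny sketch of the reranking step the abstract mentions: blending a similarity score with a log-scaled popularity signal. The weight alpha, the scores, and the blending formula are illustrative assumptions, not the paper's method.

```python
import numpy as np

similarity = np.array([0.91, 0.85, 0.80])  # image-text match scores
popularity = np.array([120, 15000, 640])   # e.g. repost/like counts
alpha = 0.8                                # assumed trade-off weight

# Log-scale popularity so a viral post does not dominate the ranking.
pop_norm = np.log1p(popularity) / np.log1p(popularity).max()
final = alpha * similarity + (1 - alpha) * pop_norm
print(final.argsort()[::-1])               # reranked result order, best first
```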
46. Local Stereo Matching: An Adaptive Weighted Guided Image Filtering-Based Approach.
- Author
-
Zhang, Ben and Zhu, Denglin
- Subjects
BINOCULAR vision ,COMPUTER vision ,REGULARIZATION parameter ,COMPUTER systems ,INTELLIGENT transportation systems ,ALGORITHMS - Abstract
Innovative applications in rapidly evolving domains such as robotic navigation and autonomous (driverless) vehicles rely on binocular computer vision systems that meet stringent response-time and accuracy requirements. A key problem in these vision systems is stereo matching, which involves matching pixels from two input images in order to construct the output, a 3D map. Building upon existing local stereo matching algorithms, this paper proposes a novel stereo matching algorithm based on weighted guided filtering. The proposed algorithm consists of three main steps, each designed to improve accuracy. First, the matching costs are computed using a combination of complementary methods (absolute difference, Census, and gradient algorithms) to reduce errors. Second, the costs are aggregated using an adaptive weighted guided image filtering method, in which the regularization parameters are adjusted adaptively using the Canny method, further reducing errors. Third, a disparity map is generated using the winner-take-all strategy; this map is subsequently refined using a densification method. Our experimental results indicate that the proposed algorithm provides a higher level of accuracy than a collection of existing state-of-the-art local algorithms. [ABSTRACT FROM AUTHOR] (A bare-bones matching sketch follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
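A bare-bones sketch of the local stereo matching skeleton the paper builds on: an absolute-difference cost volume, simple box-filter cost aggregation (where the paper uses adaptive weighted guided filtering), and a winner-take-all disparity pick. Images and the disparity range are toy values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wta_disparity(left, right, max_disp=16, win=7):
    h, w = left.shape
    costs = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        # Absolute-difference matching cost for disparity d.
        diff = np.abs(left[:, d:] - right[:, :w - d])
        # Box-filter aggregation over a win x win support window.
        costs[d, :, d:] = uniform_filter(diff, size=win)
    # Winner-take-all: pick the lowest-cost disparity per pixel.
    return np.argmin(costs, axis=0)

left = np.random.rand(60, 80)
right = np.roll(left, -5, axis=1)   # synthetic pair with a 5-pixel shift
disp = wta_disparity(left, right)
print(np.bincount(disp.ravel()).argmax())  # prints 5, the planted disparity
```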
47. PREFACE.
- Author
-
BLOCH, ISABELLE and CESAR JR., ROBERTO M.
- Subjects
PREFACES & forewords ,PATTERN recognition systems ,SCIENCE periodicals ,COMPUTER vision ,IMAGE processing ,IMAGE analysis - Published
- 2013
- Full Text
- View/download PDF
48. Geometric Positioning and Color Recognition of Greenhouse Electric Work Robot Based on Visual Processing.
- Author
-
Xu, Zhifu, Shi, Xiaoyan, Ye, Hongbao, and Hua, Shan
- Subjects
GREENHOUSES ,COMPUTER vision ,INDUSTRIAL robots ,ROBOT kinematics ,AUTOMATION ,INDUSTRIAL efficiency ,MOTION control devices ,ROBOT vision - Abstract
With the continuous development of science and technology, industrial production technology and production efficiency keep improving. Greenhouse electric work robots are industrial production tools built around automatic control technology; they affect the quality of industrial products and thus the profitability of the factory. Following programmed work routines, a greenhouse electric work robot can reproduce production steps and reduce the workload of workers. In today's era, industrial production steps are more complicated and the production process more flexible, so a robot with a fixed posture and motion cannot meet the needs of modern industry, which restricts the development of the factory. To better support the work of industrial robots, it is necessary to study machine vision-based geometric positioning and color recognition to improve their working efficiency. This paper establishes an active-positioning machine vision system for the precise positioning of robot parts at greenhouse electric work stations. A matching method that combines image processing and shape-based feature recognition with a threshold shape criterion is used to identify object features. Experiments prove that the method can quickly and accurately obtain object boundaries, centroid calculations, and identification data; combined with robot kinematics and real-time motion control to eliminate positioning error, it meets the self-alignment requirements of industrial robots. [ABSTRACT FROM AUTHOR] (A centroid sketch follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
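An illustrative OpenCV sketch of the positioning primitives the abstract relies on: segment a colored part, find its contour, and compute its boundary and centroid from image moments. The synthetic scene and HSV range are assumptions.

```python
import cv2
import numpy as np

scene = np.zeros((200, 200, 3), np.uint8)
cv2.rectangle(scene, (60, 80), (140, 150), (0, 0, 255), -1)  # red "part"

hsv = cv2.cvtColor(scene, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 120, 120), (10, 255, 255))       # red hue range

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
cnt = max(contours, key=cv2.contourArea)
m = cv2.moments(cnt)
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid coordinates
print("centroid:", (cx, cy), "bounding box:", cv2.boundingRect(cnt))
```

In a real cell, the centroid and boundary would then be mapped through the robot's kinematics to drive the self-alignment step the abstract describes.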
49. Design and Development of Image Recognition Toolkit Based on Deep Learning.
- Author
-
Zhao, Hui, Zhang, Hai-Xia, Cao, Qing-Jiao, Sun, Sheng-Juan, Han, Xuanzhe, and Palaoag, Thelma D.
- Subjects
IMAGE recognition (Computer vision) ,DEEP learning ,ARTIFICIAL neural networks ,COMPUTER vision ,CONVOLUTIONAL neural networks ,MACHINE learning - Abstract
Deep learning algorithms have shown superior performance to traditional algorithms when dealing with computationally intensive tasks in many fields. Models based on deep learning perform well and can improve recognition accuracy in computer vision applications. TensorFlow is a flexible open-source machine learning platform proposed by Google that can run on a variety of platforms, such as CPUs, GPUs, and mobile devices, and supports the currently popular deep learning models. In this paper, an image recognition toolkit based on TensorFlow is designed and developed to simplify the development of the growing number of image recognition applications. The toolkit uses convolutional neural networks to build a training model consisting of two convolutional layers, with a batch normalization layer before each convolutional layer and a pooling layer after each. The last two layers of the model are fully connected layers that output the recognition results. A (mini-)batch gradient descent algorithm is adopted for optimization; it combines the advantages of full-batch gradient descent and stochastic gradient descent, greatly reducing the number of convergence iterations with little effect on the converged result. The model has about 1.7 million trainable parameters in total. To prevent overfitting, a dropout layer with a rate of 0.5 is added before each fully connected layer. The convolutional neural network model is trained and tested on the MNIST set in TensorFlow, and the experimental results show that the toolkit achieves a recognition accuracy of 99% on the MNIST test set. The toolkit provides strong technical support for developing various image recognition applications, reduces their difficulty, and improves the efficiency of resource utilization. [ABSTRACT FROM AUTHOR] (A sketch of this architecture follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
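A Keras sketch reconstructing the architecture the abstract describes: two convolutional layers, each preceded by batch normalization and followed by pooling, then two fully connected layers with dropout 0.5 before each. The filter counts and the dense width are assumptions not stated in the abstract, and running the script downloads MNIST.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.BatchNormalization(),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.BatchNormalization(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                  # dropout before each dense layer
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0      # add channel axis, scale to [0, 1]
x_test = x_test[..., None] / 255.0
model.fit(x_train, y_train, batch_size=128, epochs=1, validation_split=0.1)
print(model.evaluate(x_test, y_test))
```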
50. Target Object Recognition Using Multiresolution SVD and Guided Filter with Convolutional Neural Network.
- Author
-
Biswas, Biswajit, Ghosh, Swarup Kr, Ghosh, Anupam, Chakraborty, Chandan, and Mitra, Pabitra
- Subjects
CONVOLUTIONAL neural networks ,IMAGE fusion ,SINGULAR value decomposition ,COMPUTER vision ,FILTERS & filtration - Abstract
Designing an efficient fusion scheme that combines multiple images into a single highly informative fused image is still a challenging task in computer vision. A fast and effective image fusion scheme based on multi-resolution singular value decomposition (MR-SVD) with a guided filter (GF) is introduced in this paper. The proposed scheme decomposes an image at two scales by MR-SVD into a lower approximation layer and a detail layer, containing, respectively, the lower and higher variations of pixel intensity. It generates lower and detail layers of the left-focused (LF) and right-focused (RF) images by applying MR-SVD to each series of multi-focus images. The GF is utilized to create a refined, smooth-textured weight fusion map by weighted averaging over spatial features of the lower and detail layers of each image. The fused image of LF and RF is obtained by the inverse MR-SVD. Finally, a deep convolutional autoencoder (CAE) is applied to segment the fused results using a trained-patches mechanism. Comparing with state-of-the-art fusion and segmentation methods, we show that the proposed scheme provides superior fusion and segmentation results, both qualitatively and quantitatively. [ABSTRACT FROM AUTHOR] (A two-scale fusion sketch follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
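A simplified two-scale fusion sketch in the spirit of the abstract: split each focused image into a base (low-variation) and a detail layer, then fuse bases by averaging and details by picking the stronger response. The paper decomposes with MR-SVD and weights with a guided filter; here a Gaussian blur and a max-absolute rule stand in to keep the sketch short.

```python
import cv2
import numpy as np

def two_scale_fuse(left_focused, right_focused, ksize=(31, 31)):
    # Base layers capture low-frequency content; details are the residuals.
    base_l = cv2.GaussianBlur(left_focused, ksize, 0)
    base_r = cv2.GaussianBlur(right_focused, ksize, 0)
    det_l, det_r = left_focused - base_l, right_focused - base_r
    fused_base = 0.5 * (base_l + base_r)
    # Keep whichever detail response is stronger at each pixel.
    fused_det = np.where(np.abs(det_l) >= np.abs(det_r), det_l, det_r)
    return fused_base + fused_det

lf = np.random.rand(64, 64).astype(np.float32)  # stand-in left-focused image
rf = np.random.rand(64, 64).astype(np.float32)  # stand-in right-focused image
print(two_scale_fuse(lf, rf).shape)
```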