7,489 results
Search Results
2. Web-based diagnostic platform for microorganism-induced deterioration on paper-based cultural relics with iterative training from human feedback
- Author
-
Chenshu Liu, Songbin Ben, Chongwen Liu, Xianchao Li, Qingxia Meng, Yilin Hao, Qian Jiao, and Pinyi Yang
- Subjects
Paper-based cultural relics, Conservation, Computer vision, Deep learning, Strain classification, Web application, Fine Arts, Analytical chemistry, QD71-142 - Abstract
Abstract Purpose Paper-based artifacts hold significant cultural and social value. However, paper is intrinsically vulnerable to microorganisms such as mold, because its cellulose composition can serve as a nutrient source for them. Mold can not only damage a paper's structural integrity, posing significant challenges to conservation work, but may also expose individuals handling the contaminated artifacts to health risks. Current approaches to strain identification usually require extensive training, prolonged analysis, expensive operation, and carry a higher risk of secondary damage due to sampling. Thus, in current conservation practice with mold-contaminated artifacts, little pre-screening or strain identification is performed before mold removal, and the cleaning techniques used are usually broad-spectrum rather than strain-specific. With deep learning showing promising applications across various domains, this study investigated the feasibility of using a convolutional neural network (CNN) for fast in-situ recognition and classification of mold on paper. Methods Molds were first non-invasively sampled from ancient Xuan-paper-based Chinese books from the Qing and Ming dynasties. Strains were identified using molecular biology methods, and the four most prevalent strains were inoculated on Xuan paper to create mockups for image collection. Microscopic images of the molds, as well as of their stains on paper, were collected using a compound microscope and a commercial microscope lens for cell phone cameras; these images were then used to train CNN models under a transfer learning scheme to classify the mold. To enable involvement and contributions from the research community, a web interface was constructed that drives the classification process while providing interactive features for users to learn about the classified strain.
Moreover, a feedback function was embedded in the web interface for catching potential classification errors, adding additional training images, or introducing new strains, all to refine the generalizability and robustness of the model. Results & Conclusion In this study, we constructed a suite of high-confidence CNN classification models for diagnosing mold contamination in conservation. At the same time, we built a web interface that allows the model to be refined recurrently with human feedback by engaging the research community. Overall, the proposed framework opens new avenues for effective and timely identification of mold, thus enabling proactive and targeted mold remediation strategies in conservation.
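The feedback loop described in this abstract (catching misclassifications, adding training images, introducing new strains) can be pictured as a small accumulator that signals when enough corrections have arrived to retrain the classifier. This is an illustrative sketch only, not the authors' implementation; the class name, threshold, and strain labels are all hypothetical.

```python
class FeedbackStore:
    """Accumulates user-submitted corrections from a web interface and
    signals when enough feedback has arrived to retrain the classifier.
    Strain names below are placeholders, not the strains from the study."""

    def __init__(self, retrain_threshold=50):
        self.retrain_threshold = retrain_threshold
        self.corrections = []   # (image_path, predicted_label, corrected_label)
        self.known_strains = {"strain_A", "strain_B", "strain_C", "strain_D"}

    def submit(self, image_path, predicted, corrected):
        """Record one correction; an unseen label extends the strain catalogue."""
        self.known_strains.add(corrected)   # introducing a new strain is allowed
        self.corrections.append((image_path, predicted, corrected))
        return self.needs_retraining()

    def needs_retraining(self):
        """True once the batch of corrections reaches the retraining threshold."""
        return len(self.corrections) >= self.retrain_threshold
```

In practice such a store would sit behind the feedback endpoint, with the retraining job consuming (and clearing) the accumulated corrections.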
- Published
- 2024
- Full Text
- View/download PDF
3. NSTU-BDTAKA: An open dataset for Bangladeshi paper currency detection and recognition
- Author
-
Md. Jubayar Alam Rafi, Mohammad Rony, and Nazia Majadi
- Subjects
Computer vision, Deep learning, Image analysis, Taka detection, Taka recognition, YOLOv5 model, Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390 - Abstract
One of the most popular and well-established forms of payment in use today is paper money. Handling paper money can be challenging for those with vision impairments, and assistive technology has continually evolved to better serve elderly and disabled people. To detect paper currency and extract other useful information from it, image processing techniques and other advanced technologies, such as artificial intelligence and deep learning, can be used. In this paper, we present a meticulously curated and comprehensive dataset named 'NSTU-BDTAKA', tailored for the simultaneous detection and recognition of a specific object of cultural significance: the Bangladeshi paper currency (in Bengali, 'Taka'). This research aims to facilitate the development and evaluation of models for both taka detection and recognition tasks, offering a rich resource for researchers and practitioners alike. The dataset is divided into two distinct components: (i) taka detection and (ii) taka recognition. The taka detection subset comprises 3,111 high-resolution images, each meticulously annotated with rectangular bounding boxes that encompass instances of the taka. These annotations serve as ground truth for training and validating object detection models, and we adopt the state-of-the-art YOLOv5 architecture for this purpose. In the taka recognition subset, the dataset has been extended to include a vast collection of 28,875 images, each showcasing various instances of the taka captured in diverse contexts and environments. The recognition subset is designed to address the nuanced task of taka recognition, providing researchers with a comprehensive set of images to train, validate, and test recognition models. It encompasses challenges such as variations in lighting, scale, orientation, and occlusion, further enhancing the robustness of developed recognition algorithms.
The dataset NSTU-BDTAKA not only serves as a benchmark for taka detection and recognition but also fosters advancements in object detection and recognition methods that can be extrapolated to other cultural artifacts and objects. We envision that the dataset will catalyze research efforts in the field of computer vision, enabling the development of more accurate, robust, and efficient models for both detection and recognition tasks.
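Bounding-box annotations in YOLO-style datasets such as this one store each box as a class index plus center coordinates and width/height normalized to the image size. A minimal converter to pixel corners, assuming the standard YOLOv5 label layout (the function name is ours):

```python
def yolo_to_pixels(label_line, img_w, img_h):
    """Convert one YOLO label line 'cls cx cy w h' (values normalized to
    [0, 1]) into a class index and an (x1, y1, x2, y2) pixel bounding box."""
    cls, cx, cy, w, h = label_line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x1 = (cx - w / 2) * img_w   # left edge in pixels
    y1 = (cy - h / 2) * img_h   # top edge in pixels
    x2 = (cx + w / 2) * img_w   # right edge in pixels
    y2 = (cy + h / 2) * img_h   # bottom edge in pixels
    return int(cls), (round(x1), round(y1), round(x2), round(y2))
```

For example, a box centered in a 100x200 image and covering half of each dimension, `"3 0.5 0.5 0.5 0.5"`, maps to the pixel box (25, 50, 75, 150).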
- Published
- 2024
- Full Text
- View/download PDF
4. An Overview of Machine Learning in Orthopedic Surgery: An Educational Paper.
- Author
-
Padash, Sirwa, Mickley, John P., Vera Garcia, Diana V., Nugen, Fred, Khosravi, Bardia, Erickson, Bradley J., Wyles, Cody C., and Taunton, Michael J.
- Abstract
The growth of artificial intelligence, combined with the collection and storage of large amounts of data in electronic medical records, has created an opportunity for orthopedic research and its translation into the clinical environment. Machine learning (ML) is a type of artificial intelligence tool well suited to processing the large amount of available data. Specific areas of ML frequently used by orthopedic surgeons performing total joint arthroplasty include tabular data analysis (spreadsheets), medical image processing, and natural language processing (extracting concepts from text). Previous studies have described models able to identify fractures in radiographs, identify implant type in radiographs, and determine the stage of osteoarthritis based on walking analysis. Despite the growing popularity of ML, it has limitations, including its reliance on "good" data, the potential for overfitting, a long creation life cycle, and the ability to perform only one narrow task. This educational article provides a general overview of ML, discusses these challenges, and includes examples of successfully published models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Computer vision digitization of smartphone images of anesthesia paper health records from low-middle income countries.
- Author
-
Folks, Ryan D., Naik, Bhiken I., Brown, Donald E., and Durieux, Marcel E.
- Subjects
- *
MEDICAL records, *ARTIFICIAL neural networks, *COMPUTER vision, *DIASTOLIC blood pressure, *MEDICAL personnel, *DEEP learning, *SYSTOLIC blood pressure - Abstract
Background: In low-middle income countries, healthcare providers primarily use paper health records for capturing data. Paper health records are used predominantly because of the prohibitive cost of acquiring and maintaining automated data capture devices and electronic medical records. Data recorded on paper health records is not easily accessible to healthcare providers in a digital format. The lack of real-time, accessible digital data limits the ability of healthcare providers, researchers, and quality-improvement champions to leverage data to improve patient outcomes. In this project, we demonstrate the novel use of computer vision software to digitize handwritten intraoperative data elements from smartphone photographs of paper anesthesia charts from the University Teaching Hospital of Kigali. We specifically report our approach to digitizing checkbox data, symbol-denoted systolic and diastolic blood pressure, and physiological data. Methods: We implemented approaches for removing perspective distortions from smartphone photographs, removing shadows, and improving image readability through morphological operations. YOLOv8 models were used to deconstruct the anesthesia paper chart into specific data sections. Handwritten blood pressure symbols and physiological data were identified, and values were assigned using deep neural networks. Our work builds upon the contributions of previous research by improving upon their methods, updating the deep learning models to newer architectures, and consolidating them into a single piece of software. Results: The model for extracting the sections of the anesthesia paper chart achieved an average box precision of 0.99, an average box recall of 0.99, and an mAP@0.5–0.95 of 0.97. Our software digitizes checkbox data with greater than 99% accuracy and digitizes blood pressure data with mean average errors of 1.0 and 1.36 mmHg for systolic and diastolic blood pressure, respectively. Overall accuracy for physiological data, which includes oxygen saturation, inspired oxygen concentration, and end-tidal carbon dioxide concentration, was 85.2%. Conclusions: We demonstrate that under normal photography conditions we can digitize checkbox, blood pressure, and physiological data to within human accuracy when provided with legible handwriting. Our contributions provide improved access to digital data for healthcare practitioners in low-middle income countries. [ABSTRACT FROM AUTHOR]
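The perspective-distortion removal step this abstract mentions rests on estimating a homography from the four corners of the photographed chart to an upright rectangle. The numpy-only sketch below shows the underlying direct linear transform (DLT) math as an assumption about that step, not the authors' actual code (which works at the level of full images, typically via OpenCV):

```python
import numpy as np

def homography(src, dst):
    """Estimate the 3x3 homography mapping four src points to four dst
    points via the direct linear transform (SVD null-space solution)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)        # null vector of A, reshaped to 3x3
    return H / H[2, 2]              # normalize so H[2, 2] == 1

def warp_point(H, p):
    """Apply homography H to a 2D point using homogeneous coordinates."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return (x / w, y / w)
```

Warping every pixel of the photograph through such an H (mapping the chart's detected corners to a rectangle) yields the rectified, distortion-free chart image.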
- Published
- 2024
- Full Text
- View/download PDF
6. Deep Learning for Automated Visual Inspection in Manufacturing and Maintenance: A Survey of Open- Access Papers
- Author
-
Nils Hütten, Miguel Alves Gomes, Florian Hölken, Karlo Andricevic, Richard Meyes, and Tobias Meisen
- Subjects
automated visual inspection, industrial applications, deep learning, computer vision, convolutional neural network, vision transformer, Technology, Applied mathematics. Quantitative methods, T57-57.97 - Abstract
Quality assessment in industrial applications is often carried out through visual inspection, usually performed or supported by human domain experts. However, the manual visual inspection of processes and products is error-prone and expensive. It is therefore not surprising that the automation of visual inspection in manufacturing and maintenance is heavily researched and discussed. The use of artificial intelligence as an approach to visual inspection in industrial applications has been considered for decades. Recent successes, driven by advances in deep learning, present a possible paradigm shift and have the potential to facilitate automated visual inspection, even under complex environmental conditions. For this reason, we explore the question of to what extent deep learning is already being used in the field of automated visual inspection and which potential improvements to the state of the art could be realized utilizing concepts from academic research. By conducting an extensive review of the openly accessible literature, we provide an overview of proposed and in-use deep-learning models presented in recent years. Our survey consists of 196 open-access publications, of which 31.7% are manufacturing use cases and 68.3% are maintenance use cases. Furthermore, the survey also shows that the majority of the models currently in use are based on convolutional neural networks, the current de facto standard for image classification, object recognition, or object segmentation tasks. Nevertheless, we see the emergence of vision transformer models that seem to outperform convolutional neural networks but require more resources, which also opens up new research opportunities for the future. Another finding is that in 97% of the publications, the authors use supervised learning techniques to train their models. 
However, with a median dataset size of 2,500 samples, deep-learning models cannot be trained from scratch, so it would be beneficial to use other training paradigms, such as self-supervised learning. In addition, we identified a gap of approximately three years between the publication of deep-learning-based computer vision approaches and their introduction in industrial visual inspection applications. Based on our findings, we additionally discuss potential future developments in the area of automated visual inspection.
- Published
- 2024
- Full Text
- View/download PDF
7. Deep Learning for 3D Reconstruction, Augmentation, and Registration: A Review Paper.
- Author
-
Vinodkumar, Prasoon Kumar, Karabulut, Dogus, Avots, Egils, Ozcinar, Cagri, and Anbarjafari, Gholamreza
- Subjects
- *
DEEP learning, *COMPUTER vision, *GRAPH neural networks, *ARTIFICIAL intelligence, *MACHINE learning, *GENERATIVE adversarial networks - Abstract
The research groups in computer vision, graphics, and machine learning have dedicated a substantial amount of attention to the areas of 3D object reconstruction, augmentation, and registration. Deep learning is the predominant method used in artificial intelligence for addressing computer vision challenges. However, deep learning on three-dimensional data presents distinct obstacles and is now in its nascent phase. There have been significant advancements in deep learning specifically for three-dimensional data, offering a range of ways to address these issues. This study offers a comprehensive examination of the latest advancements in deep learning methodologies. We examine many benchmark models for the tasks of 3D object registration, augmentation, and reconstruction. We thoroughly analyse their architectures, advantages, and constraints. In summary, this report provides a comprehensive overview of recent advancements in three-dimensional deep learning and highlights unresolved research areas that will need to be addressed in the future. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Special issue on intelligent systems: ISMIS 2022 selected papers.
- Author
-
Ceci, Michelangelo, Flesca, Sergio, Manco, Giuseppe, and Masciari, Elio
- Subjects
MACHINE learning, ARTIFICIAL intelligence, DECISION support systems, KNOWLEDGE representation (Information theory), COMPUTER vision, DEEP learning - Abstract
This document is a special issue of the Journal of Intelligent Information Systems, focusing on the selected papers from the International Symposium on Methodologies for Intelligent Systems (ISMIS 2022). The symposium, held in Cosenza, Italy, showcased research on various topics related to artificial intelligence, including decision support, knowledge representation, machine learning, computer vision, and more. The special issue includes eleven papers that have undergone rigorous peer-reviewing and cover a wide range of research topics, such as deep learning, anomaly detection, malware detection, sentiment classification, and healthcare professionals' burnout. The authors express their gratitude to the contributors and reviewers for their valuable contributions. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
9. An interactive assessment framework for residential space layouts using pix2pix predictive model at the early-stage building design
- Author
-
Mostafavi, Fatemeh, Tahsildoost, Mohammad, Zomorodian, Zahra Sadat, and Shahrestani, Seyed Shayan
- Published
- 2024
- Full Text
- View/download PDF
10. AI Machine Vision based Oven White Paper Color Classification and Label Position Real-time Monitoring System to Check Direction.
- Author
-
Hee-Chul Kim, Youn-Saup Yoon, and Yong-Mo Kim
- Subjects
COMPUTER vision, DEEP learning, JOB classification, MANUFACTURING process automation, ARTIFICIAL intelligence, COLOR image processing - Abstract
We develop a machine vision system for batch inspection of oven white paper by model color, as part of the oven manufacturing automation process. The vision system performs white paper object detection (spring), color clustering, and histogram extraction. In addition, for the automated home appliance process, we intend to develop an automatic mold-combination detection algorithm that inspects the label position and direction (angle/coordinates) using deep learning. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Enhancing IoT Network Security: Unveiling the Power of Self-Supervised Learning against DDoS Attacks.
- Author
-
Almaraz-Rivera, Josue Genaro, Cantoral-Ceballos, Jose Antonio, and Botero, Juan Felipe
- Subjects
SUPERVISED learning, DENIAL of service attacks, COMPUTER network security, INTRUSION detection systems (Computer security), COMPUTER network traffic, INTERNET of things, ELECTRONIC paper - Abstract
The Internet of Things (IoT), projected to exceed 30 billion active device connections globally by 2025, presents an expansive attack surface. The frequent collection and dissemination of confidential data on these devices exposes them to significant security risks, including user information theft and denial-of-service attacks. This paper introduces a smart, network-based Intrusion Detection System (IDS) designed to protect IoT networks from distributed denial-of-service attacks. Our methodology involves generating synthetic images from flow-level traffic data of the Bot-IoT and LATAM-DDoS-IoT datasets and conducting experiments within both supervised and self-supervised learning paradigms. Self-supervised learning is identified in the state of the art as a promising way to replace the need for massive amounts of manually labeled data while providing robust generalization. Our results show that self-supervised learning surpassed supervised learning in classification performance for certain tests. Specifically, it exceeded supervised learning's F1 score for attack detection by 4.83% and its accuracy by 14.61% on the multiclass protocol classification task. Drawing from the extensive ablation studies presented in our research, we recommend an optimal training framework for upcoming contrastive learning experiments that emphasize visual representations in the cybersecurity realm. This training approach has enabled us to highlight the broader applicability of self-supervised learning, which, in some instances, outperformed supervised learning's transferability by over 5% in precision and nearly 1% in F1 score. [ABSTRACT FROM AUTHOR]
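Contrastive self-supervised learning of the kind compared in this abstract is typically driven by an NT-Xent-style loss over pairs of augmented views. The abstract does not name the loss actually used, so the numpy sketch below is a generic SimCLR-style formulation for illustration only:

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """NT-Xent contrastive loss over 2N embeddings, where rows i and i + N
    are the two augmented views (positive pair) of the same sample."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    n2 = z.shape[0]
    sim = z @ z.T / temperature                        # scaled pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    pos = np.roll(np.arange(n2), n2 // 2)              # index of each row's positive
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(n2), pos].mean())
```

The loss falls as each embedding moves closer to its augmented twin and away from the other samples in the batch, which is what lets such models train without manual labels.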
- Published
- 2023
- Full Text
- View/download PDF
12. Improved HardNet and Stricter Outlier Filtering to Guide Reliable Matching.
- Author
-
Meng Xu, Chen Shen, Jun Zhang, Zhipeng Wang, Zhiwei Ruan, Stefan Poslad, and Pengfei Xu
- Subjects
DEEP learning, IMAGE registration, COMPUTER vision, IMAGE retrieval, CONFERENCE papers, STATISTICAL sampling, MICROPOLAR elasticity - Abstract
As a fundamental problem in computer vision, image matching has wide applications in pose estimation, 3D reconstruction, image retrieval, etc. Under the influence of external factors, image matching pipelines built from classical local detectors, e.g., the scale-invariant feature transform (SIFT), together with outlier filtering approaches, e.g., random sample consensus (RANSAC), show high computation speed but poor robustness under changing illumination and viewpoint conditions, while image matching approaches based on deep learning (such as HardNet and OANet) achieve reliable results on large-scale datasets with challenging scenes. However, past learning-based approaches have been limited by the distinctiveness and quality of the dataset and by the training strategy. As an extension of a previous conference paper, this paper proposes an accurate and robust image matching approach that uses less training data in an end-to-end manner and can be used to estimate the pose error. This research first proposes a novel dataset cleaning and construction strategy to eliminate noise and improve training efficiency. Secondly, a novel loss named the quadratic hinge triplet (QHT) loss is proposed to gather more effective and stable feature matches. Thirdly, in the outlier filtering process, a stricter OANet and bundle adjustment are applied to judge samples, adding epipolar distance and triangulation constraints to generate more outstanding matches. Finally, to recall matching pairs, dynamic guided matching is used, and the inliers are submitted after the PyRANSAC process. Multiple evaluation metrics are used, and the method took 1st place in Track 1 of the CVPR Image Matching Challenge Workshop. The results show that the proposed method has advanced performance on the large-scale and challenging Phototourism benchmark. [ABSTRACT FROM AUTHOR]
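The abstract names the quadratic hinge triplet (QHT) loss without reproducing its definition, so the sketch below assumes the common construction of such a loss, a hinge over the standard triplet margin with a squared penalty, purely for illustration; the paper's exact form may differ.

```python
import numpy as np

def quadratic_hinge_triplet(anchor, positive, negative, margin=1.0):
    """Assumed QHT form: squared hinge over the standard triplet margin.
    Inputs are (N, d) descriptor arrays; returns one loss value per triplet."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)   # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)   # anchor-negative distance
    # Zero loss once the negative is at least `margin` farther than the
    # positive; the squared hinge penalizes hard triplets more steeply.
    return np.maximum(0.0, d_pos - d_neg + margin) ** 2
```

Compared with a plain hinge, squaring makes the gradient grow with the violation, concentrating training on the hardest descriptor triplets.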
- Published
- 2023
- Full Text
- View/download PDF
13. Pantograph Slider Detection Architecture and Solution Based on Deep Learning.
- Author
-
Guo, Qichang, Tang, Anjie, and Yuan, Jiabin
- Subjects
IMAGE processing, TECHNICAL specifications, COMPUTER vision, DEEP learning, PANTOGRAPH - Abstract
Railway transportation has been integrated into people's lives. According to the "Notice on the Release of the General Technical Specification of the High-speed Railway Power Supply Safety Testing (6C System) System" issued by the National Railway Administration of China in 2012, pantograph and slider monitoring devices must be installed in high-speed railway stations, station throats, and the inlet and exit lines of high-speed railway sections, and slider damage must be detected with high precision. The good condition of the pantograph slider is therefore very important for the normal operation of the railway system. As a component that supplies power to high-speed rail and subway trains, the pantograph must be monitored to ensure its integrity. Pantograph wear arises mainly from the contact between the slider and the long power supply wire during high-speed operation, which inevitably produces scratches and thus depressions on the upper surface of the slider. During long-term use, if a depression becomes too deep, there is a risk of fracture. Therefore, it is necessary to monitor the slider regularly and replace sliders with serious wear. At present, most traditional methods use automation technology or simple computer vision techniques for detection, which is inefficient. Therefore, this paper introduces computer vision and deep learning technology into pantograph slider wear detection. Specifically, this paper studies wear detection of the pantograph slider based on deep learning, with the main purpose of improving detection accuracy and segmentation quality. From a methodological perspective, this paper employs a linear array camera to enhance the quality of the datasets and integrates an attention mechanism to improve segmentation performance. Furthermore, this study introduces a novel image stitching method to address issues related to incomplete images, thereby providing a comprehensive solution. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A deep learning counting model applied to quality control
- Author
-
Jaramillo, Juan R.
- Published
- 2023
- Full Text
- View/download PDF
15. Leveraging supplementary modalities in automated real estate valuation using comparative judgments and deep learning
- Author
-
Despotovic, Miroslav, Koch, David, Stumpe, Eric, Brunauer, Wolfgang A., and Zeppelzauer, Matthias
- Published
- 2023
- Full Text
- View/download PDF
16. Intelligent Noncontact Structural Displacement Detection Method Based on Computer Vision and Deep Learning.
- Author
-
Liu, Hongbo, Zhang, Fan, Ma, Rui, Wang, Longxuan, Chen, Zhihua, Zhang, Qian, and Guo, Liulu
- Subjects
DISPLACEMENT (Psychology), COMPUTER vision, EUCLIDEAN distance, COMPUTER simulation, DEEP learning, BAMBOO - Abstract
Accurate identification of structural displacements is important for structural state assessment and performance evaluation. This paper proposes a real-time structural displacement detection model based on computer vision and deep learning. The model consists of three stages: identification, tracking, and displacement resolution. First, the displacement target is identified and tracked by the improved YOLO v7 algorithm and the improved DeepSORT algorithm. Then, the Euclidean distance method based on inverse perspective mapping (IPM-ED) is proposed for the analytical conversion of the displacement. Next, the accuracy and effectiveness of this displacement detection model are evaluated through four groups of bamboo axial compression tests. A comparative analysis is conducted between the IPM-ED displacement analysis method and the commonly used ED displacement analysis method. Finally, the robustness of this method is tested by using a cable breakage test of a cable dome structure as an application case. The research results demonstrate that the maximum average error of the four groups of bamboo displacement tests is only 3.10 mm, and the maximum relative error of peak displacement is only 6.54 mm. The RMSE basically stays around 3.5 mm. The maximum displacement error in the application case is only 4.91 mm, with a maximum MAPE of 4.94%. In addition, the error percentage under the IPM-ED algorithm is basically within 5%, while the error percentage of the ED algorithm is more than 10%. The method in this paper achieves efficient and intelligent identification of structural displacements in a non-contact manner. 
The proposed method is suited to environments where contact displacement sensors are easily affected by vibration, where complex sites would require additional sensor-fixing equipment, where deployment on super-high structures is unsafe, or where narrow spaces make contact sensors inconvenient to deploy; it therefore has broad application prospects. [ABSTRACT FROM AUTHOR]
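The error statistics this abstract reports (RMSE, MAPE) are standard metrics; for reference, a minimal numpy implementation of both (function names are ours):

```python
import numpy as np

def rmse(pred, true):
    """Root-mean-square error between predicted and reference displacements."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def mape(pred, true):
    """Mean absolute percentage error (reference values must be nonzero)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(np.abs((pred - true) / true)) * 100.0)
```

RMSE keeps the units of the measurement (here millimeters), while MAPE normalizes by the reference displacement, which is why the paper can quote both a ~3.5 mm RMSE and a 4.94% maximum MAPE.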
- Published
- 2024
- Full Text
- View/download PDF
17. The people behind the papers – Thomas Naert and Soeren Lienkamp.
- Subjects
- *
DEEP learning, *POLYCYSTIC kidney disease, *CYSTIC kidney disease, *DEVELOPMENTAL biology, *COMPUTER vision - Published
- 2021
- Full Text
- View/download PDF
18. INTRODUCTION TO THE SPECIAL ISSUE ON NEXT GENERATION PERVASIVE RECONFIGURABLE COMPUTING FOR HIGH PERFORMANCE REAL TIME APPLICATIONS.
- Author
-
VENKATESAN, C., YU-DONG ZHANG, CHOW CHEE ONN, and YONG SHI
- Subjects
MACHINE learning, REINFORCEMENT learning, HIGH performance computing, COMPUTER vision, ARTIFICIAL intelligence, PARSING (Computer grammar), DEEP learning - Abstract
This document introduces a special issue of the journal "Scalable Computing: Practice & Experience" focused on next-generation pervasive reconfigurable computing for high-performance real-time applications. The authors discuss the importance of adaptable platforms for real-time tasks and highlight the benefits of reconfigurable computing in accelerating applications like image processing and machine learning. The special issue aims to explore recent advancements in this field and includes research papers on topics such as network security, malware detection, software reliability prediction, and optimization algorithms for wing design. The papers cover a range of computer science and technology topics, showcasing advancements and their potential impact on various computing domains. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
19. Guest Editorial: Advanced image restoration and enhancement in the wild.
- Author
-
Wang, Longguang, Li, Juncheng, Yokoya, Naoto, Timofte, Radu, and Guo, Yulan
- Subjects
IMAGE intensifiers, IMAGE reconstruction, COMPUTER vision, SCHOLARSHIPS, COMPUTER engineering, IMAGE denoising, DEEP learning, VIDEO compression - Abstract
This document is a guest editorial from the journal IET Computer Vision, discussing the topic of advanced image restoration and enhancement. The editorial highlights the challenges faced in this field, such as the complexity of degradation models for real-world low-quality images and the difficulty of acquiring paired data. It also introduces a special issue of the journal that includes five accepted papers, which focus on video reconstruction and image super-resolution. The editorial concludes by providing brief summaries of each accepted paper. The guest editors of the special issue are also mentioned, along with their research interests and affiliations. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
20. Camera eats first: exploring food aesthetics portrayed on social media using deep learning
- Author
-
Gambetti, Alessandro and Han, Qiwei
- Published
- 2022
- Full Text
- View/download PDF
21. Recognizing workers' construction activities on a reinforcement processing area through the position relationship of objects detected by faster R-CNN
- Author
-
Li, Jiaqi, Zhou, Guangyi, Li, Dongfang, Zhang, Mingyuan, and Zhao, Xuefeng
- Published
- 2023
- Full Text
- View/download PDF
22. The emergence and evolution of urban AI.
- Author
-
Batty, Michael
- Subjects
ARTIFICIAL intelligence, DEEP learning, COMPUTER vision - Abstract
The fourth paper, on "Emotional AI and Crime," takes the argument into the key area of how good or bad AI techniques designed for facial and related recognition are. To an extent, the focus on AI here is wider than what one might find in any narrower technical discussion of AI, for context is all-important in understanding urban AI. Artificial intelligence (AI) emerged alongside the development of the digital computer more than 80 years ago, during the Second World War. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
23. OBJECT DETECTION FOR THE VISUALLY IMPAIRED.
- Author
-
CHIMWANGA, BRIGHTSON
- Subjects
OBJECT recognition (Computer vision), MACHINE learning, ARTIFICIAL intelligence, ASSISTIVE computer technology, COMPUTER vision - Abstract
This paper presents the design and development of a mobile application, built using Flutter, that leverages object detection to enhance the lives of visually impaired individuals. The application addresses a crucial challenge faced by this community: the lack of real-time information about their surroundings. A solution is proposed that utilizes pre-trained machine learning models, potentially through TensorFlow Lite for on-device processing, to identify objects within the user's field of view as captured by the smartphone camera. The application goes beyond simple object recognition; detected objects are translated into natural language descriptions through text-to-speech functionality, providing crucial auditory cues about the environment. This real-time information stream empowers users to navigate their surroundings with greater confidence and independence. Accessibility is a core principle of this paper. The user interface will be designed for compatibility with screen readers, ensuring seamless interaction for users who rely on assistive technologies. Haptic feedback mechanisms will be incorporated to provide non-visual cues and enhance the user experience. The ultimate goal of this paper is to create a user-friendly and informative application that empowers visually impaired individuals to gain greater independence in their daily lives. The application has the potential to improve spatial awareness, foster a sense of security, and promote overall inclusion within society. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
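The detection-to-speech step described in entry 23 could be sketched as a small post-processing function that turns model outputs into a sentence for a text-to-speech engine. The detection tuple format, labels, and confidence threshold below are illustrative assumptions, not the application's actual API:

```python
# Hypothetical sketch: turning raw detections into a spoken-style sentence.
# The (label, confidence, position) format is an illustrative assumption.

def describe_detections(detections, min_confidence=0.5):
    """Convert (label, confidence, position) tuples into one sentence.

    `position` is a coarse horizontal cue: 'left', 'center', or 'right'.
    """
    kept = [d for d in detections if d[1] >= min_confidence]
    if not kept:
        return "No objects detected nearby."
    phrases = [f"a {label} ahead" if pos == "center"
               else f"a {label} on your {pos}"
               for label, _, pos in kept]
    return "Detected " + ", ".join(phrases) + "."

sentence = describe_detections([
    ("chair", 0.91, "left"),
    ("door", 0.78, "center"),
    ("cat", 0.31, "right"),   # below threshold, dropped
])
print(sentence)  # Detected a chair on your left, a door ahead.
```

In a full pipeline, the returned string would be handed to the platform's text-to-speech service rather than printed.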
24. A lightweight dual-attention network for tomato leaf disease identification.
- Author
-
Enxu Zhang, Ning Zhang, Fei Li, and Cheng Lv
- Subjects
MACHINE learning ,COMPUTER vision ,RICE diseases & pests ,IMAGE recognition (Computer vision) ,PLANT diseases ,DIGITAL-to-analog converters ,DEEP learning - Abstract
Tomato disease image recognition plays a crucial role in agricultural production. Today, while machine vision methods based on deep learning have achieved some success in disease recognition, they still face several challenges. These include issues such as imbalanced datasets, unclear disease features, small inter-class differences, and large intra-class variations. To address these challenges, this paper proposes a method for classifying and recognizing tomato leaf diseases based on machine vision. First, to enhance the disease feature details in images, a piecewise linear transformation method is used for image enhancement, and oversampling is employed to expand the dataset, compensating for the imbalanced dataset. Next, this paper introduces a convolutional block with a dual attention mechanism called the DAC Block, which is used to construct a lightweight model named LDAMNet. The DAC Block innovatively uses Hybrid Channel Attention (HCA) and Coordinate Attention (CSA) to process the channel information and spatial information of input images respectively, enhancing the model's feature extraction capabilities. Additionally, this paper proposes a Robust Cross-Entropy (RCE) loss function that is robust to noisy labels, aimed at reducing the impact of noisy labels on the LDAMNet model during training. Experimental results show that this method achieves an average recognition accuracy of 98.71% on the tomato disease dataset, effectively retaining disease information in images and capturing disease areas. Furthermore, the method also demonstrates strong recognition capabilities on rice crop disease datasets, indicating good generalization performance and the ability to function effectively in disease recognition across different crops. The research findings of this paper provide new ideas and methods for the field of crop disease recognition.
However, future research needs to further optimize the model's structure and computational efficiency, and validate its application effects in more practical scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
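The piecewise linear image enhancement mentioned in entry 24 can be illustrated with a minimal sketch; the three-segment intensity mapping below uses illustrative breakpoints, not the paper's actual parameters:

```python
import numpy as np

# A minimal sketch of piecewise linear contrast stretching, one plausible
# reading of the enhancement step in entry 24; the breakpoints (r1, s1)
# and (r2, s2) are illustrative, not the paper's values.

def piecewise_linear(img, r1=70, s1=30, r2=160, s2=220):
    """Map pixel intensities through three linear segments:
    [0, r1] -> [0, s1], (r1, r2] -> (s1, s2], (r2, 255] -> (s2, 255].
    The mid-range segment gets a steeper slope, so disease features
    sitting in that range gain contrast."""
    img = img.astype(np.float64)
    out = np.empty_like(img)
    low = img <= r1
    mid = (img > r1) & (img <= r2)
    high = img > r2
    out[low] = img[low] * s1 / r1
    out[mid] = s1 + (img[mid] - r1) * (s2 - s1) / (r2 - r1)
    out[high] = s2 + (img[high] - r2) * (255 - s2) / (255 - r2)
    return out.round().astype(np.uint8)

patch = np.array([[0, 70, 160, 255]], dtype=np.uint8)
print(piecewise_linear(patch))  # maps 0 -> 0, 70 -> 30, 160 -> 220, 255 -> 255
```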
25. STDPNet: a dual-path surface defect detection neural network based on shearlet transform.
- Author
-
An, Dong, Hu, Ronghua, Fan, Liting, Chen, Zhili, Liu, Zetong, and Zhou, Peng
- Subjects
OBJECT recognition (Computer vision) ,COMPUTER vision ,SURFACE defects ,CONVOLUTIONAL neural networks ,MANUFACTURING processes - Abstract
Defect detection systems based on machine vision have been widely used as an essential part of intelligent manufacturing systems. However, in traditional object detection methods that rely on images as input, differences in defect areas, blurred images, and complex background interference can seriously impair detection accuracy. To meet these challenges, this paper proposes a dual-path neural network based on the shearlet transform (STDPNet), taking advantage of the shearlet transform's strength in multi-scale analysis and combining it with the improved object detection algorithm proposed in this paper. First, images are decomposed at multiple scales and in multiple directions with the shearlet transform, and the multi-directional sub-band information, rather than the raw image, is input to the detection network. Then, a dual-path object detection network is proposed to handle the differences between frequency bands, and a transfer learning strategy between paths is introduced to improve model performance. Finally, training results on the NEU public surface defect dataset show that the mean average precision of STDPNet reaches 86.81% at a detection speed of 44.45 f/s, exceeding that of Faster R-CNN by 12%. Experiments on different datasets prove that its accuracy is significantly superior to other models, and the proposed method is more advantageous for large, fuzzy, and indistinguishable defect types. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Small aircraft detection using deep learning
- Author
-
Kiyak, Emre and Unal, Gulay
- Published
- 2021
- Full Text
- View/download PDF
27. Measuring water holding capacity in pork meat images using deep learning.
- Author
-
de Sousa Reis, Vinicius Clemente, Ferreira, Isaura Maria, Durval, Mariah Castro, Antunes, Robson Carlos, and Backes, Andre Ricardo
- Subjects
- *
DEEP learning , *FILTER paper , *MEAT , *COMPUTER vision , *WATER sampling , *IMAGE segmentation - Abstract
Water holding capacity (WHC) plays an important role in obtaining high-quality pork meat. This attribute is usually estimated by pressing the meat and measuring the amount of water expelled by the sample and absorbed by a filter paper. In this work, we used the deep learning (DL) architecture named U-Net to estimate WHC from filter paper images of pork samples obtained using the press method. We evaluated the ability of the U-Net to segment the different regions of the WHC images and, since the images are much larger than the traditional input size of the U-Net, we also evaluated its performance as the input size changes. Results show that U-Net can segment the external and internal areas of the WHC images with great precision, even though the difference in the appearance of these areas is subtle. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
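Because the WHC images in entry 27 are much larger than a typical U-Net input, one plausible pre-processing step is tiling the image into fixed-size patches before segmentation; a minimal sketch, assuming a 256-pixel tile and zero-padding at the edges (both assumptions, not the paper's stated procedure):

```python
import numpy as np

def tile_image(img, tile=256):
    """Split a (H, W) image into non-overlapping tile x tile patches,
    zero-padding the bottom/right edges so H and W divide evenly.
    Returns the patch stack and the (rows, cols) grid for restitching."""
    h, w = img.shape
    ph = (tile - h % tile) % tile          # bottom padding
    pw = (tile - w % tile) % tile          # right padding
    padded = np.pad(img, ((0, ph), (0, pw)))
    rows, cols = padded.shape[0] // tile, padded.shape[1] // tile
    patches = (padded
               .reshape(rows, tile, cols, tile)
               .transpose(0, 2, 1, 3)      # group the two tile axes last
               .reshape(rows * cols, tile, tile))
    return patches, (rows, cols)

img = np.ones((300, 520))
patches, grid = tile_image(img)
print(patches.shape, grid)  # (6, 256, 256) (2, 3)
```

Each patch can then be segmented independently and the predicted masks stitched back using the returned grid shape.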
28. Tourists’ perceptions of urban space: a computer vision approach
- Author
-
Zhang, Kun, Zhang, Jinyi, Li, Chunlin, Jiao, Yan, and Wang, Ying
- Published
- 2022
- Full Text
- View/download PDF
29. From words to pixels: text and image mining methods for service research
- Author
-
Villarroel Ordenes, Francisco and Zhang, Shunyuan
- Published
- 2019
- Full Text
- View/download PDF
30. Smart library book sorting application with intelligence computer vision technology
- Author
-
Shi, Xiaohua, Tang, Kaicheng, and Lu, Hongtao
- Published
- 2021
- Full Text
- View/download PDF
31. Bibliometric and visualized analysis of deep learning in remote sensing.
- Author
-
Bai, Yang, Sun, Xiyan, Ji, Yuanfa, Huang, Jianhua, Fu, Wentao, and Shi, Huien
- Subjects
BIBLIOMETRICS ,DEEP learning ,REMOTE sensing ,DISTANCE education ,COMPUTER vision ,CONVOLUTIONAL neural networks ,IMAGE processing - Abstract
Deep learning (DL) has been proven to be a powerful method in computer vision and is receiving increasing attention in remote sensing. It is important to analyse the research progress, hotspots, trends and methods in the field of deep learning in remote sensing. First, the main research countries (11), research institutions (20), researchers (20), and the most cited references (20) and hotspots (8) in this field were identified by analysing a total of 2,467 published papers with the bibliometric and visualized analysis (BVA) method. Then, based on the above analysis results, the research basis and the progress of hotspots in this field were summarized by reading a total of 181 relevant papers in detail with the traditional literature combing (TLC) method. The results indicate that deep learning is becoming an important tool for remote sensing and has been widely used in the vast majority of remote sensing tasks related to image processing. Among deep learning methods, the convolutional neural network (CNN) is undoubtedly the most widely used model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Guest Editorial: Special issue on advances in representation learning for computer vision.
- Author
-
Teoh, Andrew Beng Jin, Song Ong, Thian, Lim, Kian Ming, and Lee, Chin Poo
- Subjects
COMPUTER vision ,DEEP learning ,ARTIFICIAL neural networks ,IMAGE representation ,CONVOLUTIONAL neural networks ,IMAGE recognition (Computer vision) ,DATA privacy - Abstract
This document is a guest editorial for a special issue of the CAAI Transactions on Intelligence Technology journal. The special issue focuses on advances in representation learning for computer vision. The editorial highlights the success of deep learning methods in deriving powerful representations from visual data, but also acknowledges the challenges of conducting representation learning with deep models, especially with large and noisy datasets. The document provides summaries of several research papers included in the special issue, covering topics such as cancellable biometrics, medical image analysis, watermarking for medical images, facial pattern description, multi-biometric strategies, semantic segmentation, image enhancement, image classification, and hyperspectral image super-resolution. The authors express their hope that these papers will enhance readers' understanding of current trends and guide future research in the field. The document also includes brief biographies of the authors. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
33. Towards automatic waste containers management in cities via computer vision: containers localization and geo-positioning in city maps.
- Author
-
Moral, Paula, García-Martín, Álvaro, Escudero-Viñolo, Marcos, Martínez, José M., Bescós, Jesús, Peñuela, Jesús, Martínez, Juan Carlos, and Alvis, Gonzalo
- Subjects
- *
WASTE management , *ORGANIC wastes , *PAPER recycling , *COMPUTER vision , *GLASS recycling , *DEEP learning , *SCIENTIFIC method , *OBJECT recognition (Computer vision) - Abstract
• Methodology to automatically generate geo-located waste container maps. • Use of Computer Vision algorithms to detect waste containers. • Automatic division into locations with and without containers in city maps. • Robust model with consistent performance disregarding the container type. • System evaluated in eleven Spanish cities with an average accuracy of 89%. This paper describes the scientific achievements of a collaboration between a research group and the waste management division of a company. While these results might be the basis for several practical or commercial developments, we here focus on a novel scientific contribution: a methodology to automatically generate geo-located waste container maps. It is based on the use of Computer Vision algorithms to detect waste containers and identify their geographic location and dimensions. Algorithms analyze a video sequence and provide an automatic discrimination between images with and without containers. More precisely, two state-of-the-art object detectors based on deep learning techniques have been selected for testing, according to their performance and to their adaptability to an on-board real-time environment: EfficientDet and YOLOv5. Experimental results indicate that the proposed visual model for waste container detection is able to effectively operate with consistent performance disregarding the container type (organic waste, plastic, glass and paper recycling,...) and the city layout, which has been assessed by evaluating it on eleven different Spanish cities that vary in terms of size, climate, urban layout and containers' appearance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. Recent Advances in Computer Vision: Technologies and Applications.
- Author
-
Gao, Mingliang, Zou, Guofeng, Li, Yun, and Guo, Xiangyu
- Subjects
IMAGE recognition (Computer vision) ,COMPUTER vision ,NATURAL language processing ,ARTIFICIAL intelligence ,MACHINE learning ,DEEP learning ,IMAGE segmentation - Abstract
This document is a special issue of the journal "Electronics" that focuses on recent advances in computer vision. The introduction explains how computer vision has transformed various industries and daily life by enabling machines to interpret and understand visual information. It also highlights the challenges that still exist in the field, such as model robustness and interpretability. The future of computer vision is discussed, including the development of multimodal models and advancements in areas like self-supervised learning. The special issue includes 10 papers that cover a range of topics, including stereo matching, low-light image enhancement, automated test grading, image segmentation, virtual clothing design, large-scale learning, camera pose estimation, few-shot segmentation, and image-to-audio conversion. The papers present novel studies, approaches, and reviews that contribute to the advancement of computer vision. The document concludes by emphasizing the importance of computer vision in addressing various challenges and the need for ongoing research and interdisciplinary collaboration to tackle complex real-world problems. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
35. Surface roughness measurement using microscopic vision and deep learning.
- Author
-
Chuhan Shang, Zhang Lieping, Gepreel, Khaled A., Huaian Yi, Bo-Lin Jian, and Lu Enhui
- Subjects
SURFACE roughness measurement ,DEEP learning ,COMPUTER vision ,STATISTICAL correlation ,FRACTALS ,SURFACE roughness - Abstract
Due to the self-affine property of the grinding surface, sample images with different roughness captured by a micron-scale camera exhibit certain similarities. This similarity affects the prediction accuracy of deep learning models. In this paper, we propose an illumination method that can mitigate the impact of self-affinity, using two-scale fractal theory as a foundation. This is followed by the establishment of a machine vision detection method that integrates a neural network and correlation functions. Initially, a neural network is employed to classify the microscopic image of the workpiece surface, thereby determining its roughness category. Subsequently, the corresponding correlation function is selected according to the established roughness category. Finally, the surface roughness of the workpiece is calculated from the correlation function. The experimental results demonstrate that images obtained using this lighting method exhibit significantly enhanced accuracy in neural network classification: in comparison to traditional lighting methods, accuracy at the micrometer scale increased from approximately 50% to over 95%. Concurrently, the mean squared error (MSE) of the surface roughness calculated by the proposed method does not exceed 0.003, and the mean relative error (MRE) does not exceed 5%. Two-scale fractal geometry offers a novel approach to image processing and machine learning, with significant potential for advancement. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
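The two-stage pipeline in entry 35 — classify the roughness category, then apply a category-specific correlation function — can be sketched as a simple dispatch; the categories, linear forms, and the `mean_gray_level` statistic below are all illustrative assumptions, not the paper's fitted functions:

```python
# Hypothetical two-stage roughness estimation: a classifier's predicted
# category selects which correlation function maps an image statistic to
# a roughness value. The coefficients are made up for illustration.

CORRELATION = {
    # category -> function(image statistic) -> Ra estimate (micrometres)
    "fine":   lambda g: 0.002 * g + 0.05,
    "medium": lambda g: 0.004 * g + 0.20,
    "coarse": lambda g: 0.008 * g + 0.60,
}

def estimate_roughness(category, mean_gray_level):
    """Dispatch to the correlation function for the predicted category."""
    try:
        return CORRELATION[category](mean_gray_level)
    except KeyError:
        raise ValueError(f"unknown roughness category: {category!r}")

ra = estimate_roughness("medium", 120.0)
print(round(ra, 2))  # 0.68
```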
36. EVPTMFF: Bridge Crack Detection Based on Efficient Visual Pyramid Transformer and Multiple-Feature Fusion.
- Author
-
Li, Gang, Zhou, Pan, Shen, Dan, and Zhao, Shanmeng
- Subjects
TRANSFORMER models ,COMPUTER vision ,PYRAMIDS ,BRIDGE maintenance & repair ,SURFACE cracks ,ECONOMIC security - Abstract
One of the key tasks to ensure infrastructure safety is the periodic detection of bridge cracks. Since manual crack detection is subjective and inefficient, it is very important to develop an automatic crack recognition system using machine vision. Inspired by the pyramid vision transformer (PVT) and the feature pyramid network (FPN) variants, a method combining PVT, residual transformer (REST), holistically nested edge detection (HED), and downstream detection tasks is proposed in this paper, named EVPTMFF (efficient visual pyramid transformer and multiple-feature fusion). Based on the PVT, the multiheaded attention module is replaced with an efficient attention module, which processes the data efficiently and flexibly. To improve the performance of EVPTMFF, the original perceptual field windows are changed: adjacent windows partially overlap, which is more conducive to feature interaction and improves detection performance. To prove the generalization ability of the model, three different bridge-related datasets were collected and formed. We carried out experiments on these three datasets, and EVPTMFF showed good results; for larger datasets, the performance advantage was more significant. Practical Applications: The crack detection model proposed in this paper detects well under different illumination and interference conditions. The detection results, collected data, time, and other information are saved to the bridge crack detection software. This can help engineers quickly and accurately detect cracks on the bridge surface, as well as predict the development trend of cracks and possible safety issues. In practical application, the bridge crack detection system can help engineers find and solve bridge crack problems in time and avoid the security risks and economic losses caused by cracks.
At the same time, the efficiency and accuracy of bridge maintenance can be improved, and the maintenance cost and time can be reduced. The bridge crack detection system can be integrated with other hardware equipment and management systems to form a complete bridge management platform, which contributes to the traffic construction and social and economic development of the city. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Machine translation from signed to spoken languages: state of the art and challenges.
- Author
-
De Coster, Mathieu, Shterionov, Dimitar, Van Herreweghe, Mieke, and Dambre, Joni
- Subjects
LITERATURE reviews ,LANGUAGE policy ,SIGN language ,COMPUTER vision ,ORAL communication ,MACHINE translating - Abstract
Automatic translation from signed to spoken languages is an interdisciplinary research domain on the intersection of computer vision, machine translation (MT), and linguistics. While the domain is growing in terms of popularity—the majority of scientific papers on sign language (SL) translation have been published in the past five years—research in this domain is performed mostly by computer scientists in isolation. This article presents an extensive and cross-domain overview of the work on SL translation. We first give a high level introduction to SL linguistics and MT to illustrate the requirements of automatic SL translation. Then, we present a systematic literature review of the state of the art in the domain. Finally, we outline important challenges for future research. We find that significant advances have been made on the shoulders of spoken language MT research. However, current approaches often lack linguistic motivation or are not adapted to the different characteristics of SLs. We explore challenges related to the representation of SL data, the collection of datasets and the evaluation of SL translation models. We advocate for interdisciplinary research and for grounding future research in linguistic analysis of SLs. Furthermore, the inclusion of deaf and hearing end users of SL translation applications in use case identification, data collection, and evaluation, is of utmost importance in the creation of useful SL translation models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. RESEARCH ON AUTOMATIC UNATTENDED BILL COLLECTION, PASTE AND VERIFICATION INTEGRATED ROBOT EQUIPMENT AND CONTROL PLATFORM BASED ON DEEP CONVOLUTIONAL NEURAL NETWORK.
- Author
-
CHAO WANG, XI CHEN, and YING WANG
- Subjects
CONVOLUTIONAL neural networks ,COLLECTING of accounts ,ROBOT control systems ,COMPUTER vision ,DEEP learning ,AUTOMATIC identification - Abstract
A new solution for fully automated, unmanned ticket pasting and verification based on deep convolutional neural networks is designed to address the low efficiency, error-proneness, and wasted manpower in the supplier service hall. The technology makes full use of machine vision, image processing, an AI precise positioning correction algorithm, and other methods to build an automatic unattended bill collection, pasting, and verification platform. Through high-speed identification of invoice information, 3D vision-guided planning, control of the robotic arm's path, ultrasonic-sensor-based detection of invoice pasting and duplication, tidal temporary storage of paper invoices, and so on, automatic high-speed identification and inspection of bills in the supplier service hall is realized, and the efficiency and accuracy of bill processing in the supplier hall are improved. Experiments show that this method strengthens identification calibration and order correlation, and improves the efficiency of invoice filing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. HeMoDU: High-Efficiency Multi-Object Detection Algorithm for Unmanned Aerial Vehicles on Urban Roads.
- Author
-
Shi, Hanyi, Wang, Ningzhi, Xu, Xinyao, Qian, Yue, Zeng, Lingbin, and Zhu, Yi
- Subjects
OBJECT recognition (Computer vision) ,ALGORITHMS ,DEEP learning ,TRAFFIC monitoring - Abstract
Unmanned aerial vehicle (UAV)-based object detection methods are widely used in traffic detection due to their high flexibility and extensive coverage. In recent years, with the increasing complexity of the urban road environment, UAV object detection algorithms based on deep learning have gradually become a research hotspot. However, how to further improve algorithmic efficiency in response to the numerous and rapidly changing road elements, and thus achieve high-speed and accurate road object detection, remains a challenging issue. Given this context, this paper proposes the high-efficiency multi-object detection algorithm for UAVs (HeMoDU). HeMoDU reconstructs a state-of-the-art, deep-learning-based object detection model and optimizes several aspects to improve computational efficiency and detection accuracy. To validate the performance of HeMoDU in urban road environments, this paper uses the public urban road datasets VisDrone2019 and UA-DETRAC for evaluation. The experimental results show that the HeMoDU model effectively improves the speed and accuracy of UAV object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Special Issue on Recent Advances in Machine Learning and Computational Intelligence.
- Author
-
Wu, Yue, Zhang, Xinglong, and Jia, Pengfei
- Subjects
MACHINE learning ,REINFORCEMENT learning ,NATURAL language processing ,OPTIMIZATION algorithms ,COMPUTER vision ,COMPUTATIONAL intelligence ,DEEP learning - Abstract
In reviewing this Special Issue, various topics have been addressed, predominantly machine learning techniques and heuristic search algorithms. Machine learning and computational intelligence are currently high-profile research areas attracting the attention of many researchers. In the first paper, L. Zhao and H. Jin improved the traditional vector-weighted optimization algorithm (INFO) and designed a promising optimization algorithm (IDEINFO) [[8]]. [Extracted from the article]
- Published
- 2023
- Full Text
- View/download PDF
41. Automatic guava disease detection using different deep learning approaches.
- Author
-
Tewari, Vaibhav, Azeem, Noamaan Abdul, and Sharma, Sanjeev
- Subjects
DEEP learning ,GUAVA ,COMPUTER vision ,CONVOLUTIONAL neural networks ,IMAGE recognition (Computer vision) ,PLANT diseases - Abstract
In many countries, agriculture plays a major role in the economy. The health of the crop is therefore very important, but many plant diseases are difficult to diagnose; a close inspection, or an expert's advice, is often required. As a result, it is important to address diseases in plants. Several attempts have been made to develop programs that detect plant diseases, driven by the ever-rising growth of computer vision and deep learning. This paper focuses on detecting diseases in guava fruits. We use a dataset with pictures of four common diseases found in the fruit, namely Phytopthra, Red Rust, Scab, and Styler and Root. With the help of transfer learning and Convolutional Neural Networks (CNN), this paper trains various models on the dataset and compares the results, which were evaluated using accuracy, precision, recall, and F1 score. Multiple models reached a test accuracy of 99%, with the highest accuracy of 99.62% for DenseNet169. The results of this study are also compared with previous methods and, according to the results, the proposed methods achieved better results than the previous approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. The strong substructure and feature attention mechanism for image semantic segmentation.
- Author
-
Zhang, Yuhang, Ren, Hongshuai, Yang, Wensi, Wang, Yang, Ye, Kejiang, and Xu, Cheng‐Zhong
- Subjects
DEEP learning ,GENERATIVE adversarial networks ,COMPUTER vision - Abstract
Semantic segmentation is a hot topic in computer vision, and various deep learning networks are designed to achieve higher accuracy by fully exploring the capability of neural networks. This paper addresses the issue by proposing novel substructures for popular networks. Meanwhile, we present a cross-channel structure, which reduces the parameter count even as the kernel size becomes larger. After that, to overcome the weakness of an insufficient dataset of satellite image data, we propose a feature attention mechanism with a generative adversarial network to enhance the images' features. We show the recognition result on the satellite image dataset with large pictures. This paper evaluates the substructures on the PASCAL VOC2012 dataset and improves the mIOU from 74.68% to 88.15%. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
43. Pose estimation algorithm based on point pair features using PointNet++.
- Author
-
Chen, Yifan, Li, Zhenjian, Li, Qingdang, and Zhang, Mingyue
- Subjects
MACHINE learning ,POINT cloud ,COMPUTER vision ,FIX-point estimation ,BASE pairs ,DEEP learning - Abstract
This study proposes an innovative deep learning algorithm for pose estimation based on point clouds, aimed at addressing the challenges of pose estimation for objects affected by the environment. Previous research on using deep learning for pose estimation has primarily been conducted using RGB-D data. This paper introduces an algorithm that utilizes point cloud data for deep learning-based pose computation. The algorithm builds upon previous work by integrating PointNet++ technology and the classical Point Pair Features algorithm, achieving accurate pose estimation for objects across different scene scales. Additionally, an adaptive parameter-density clustering method suitable for point clouds is introduced, effectively segmenting clusters in varying point cloud density environments. This resolves the complex issue of parameter determination for density clustering in different point cloud environments and enhances the robustness of clustering. Furthermore, the LineMod dataset is transformed into a point cloud dataset, and experiments are conducted on the transformed dataset to achieve promising results with our algorithm. Finally, experiments under both strong and weak lighting conditions demonstrate the algorithm's robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
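The adaptive parameter-density clustering in entry 43 presumably derives the clustering radius from the point cloud itself; one common heuristic consistent with that idea (not the paper's exact method) sets the radius from mean k-nearest-neighbour distances:

```python
import numpy as np

# A sketch of adaptive parameter selection for density clustering:
# instead of a fixed eps, derive a density-aware radius from the cloud's
# own k-nearest-neighbour distances. Illustrative heuristic only.

def adaptive_eps(points, k=4):
    """Return the mean distance to each point's k-th nearest neighbour,
    usable as a DBSCAN-style eps for this particular cloud."""
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))   # full pairwise distance matrix
    dists.sort(axis=1)                     # column 0 is the self-distance
    kth = dists[:, k]                      # k-th neighbour per point
    return float(kth.mean())

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(50, 3))    # tightly packed cloud
sparse = rng.normal(0.0, 1.0, size=(50, 3))   # spread-out cloud
print(adaptive_eps(dense) < adaptive_eps(sparse))  # True
```

A denser cloud yields a smaller radius, which is the behaviour needed to segment clusters across varying point cloud densities.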
44. Biometric Based E-Learning Application To Detect Learner Attention Prediction.
- Author
-
Pushpa, P. N., Subapriya. V., Uday, Vismaya C., Saha, Shithi, and Dhanya S., Sree
- Subjects
COMPUTER vision ,CONVOLUTIONAL neural networks ,DIGITAL learning ,BIOMETRY ,SYSTEMS theory ,MOBILE learning - Abstract
Dynamic monitoring of students' attention in classrooms and in online study is essential. Existing models fail to predict student attention, understanding, and engagement automatically from biometric features. The current traditional system is manual, inefficient, and time-consuming, and in both classroom-based traditional education and e-learning systems, dynamic learner behaviour is not considered. To address this problem, this survey paper explores how the dynamic behaviour of audiences and students can be captured using computer vision and machine learning techniques to predict engagement level. The paper also focuses on an image-processing and deep-learning pipeline that extracts biometric features from students, detects their attention or distraction towards the class, and, based on the outcome, recommends theory- and video-based content to regain attention. For the deep learning model, we use a Convolutional Neural Network (CNN) to train on the input data. [ABSTRACT FROM AUTHOR]
- Published
- 2021
45. Object detection network based on dense dilated encoder net.
- Author
-
Liu, Shaohua, Yang, Ao, She, Chundong, and Du, Kang
- Subjects
OBJECT recognition (Computer vision) ,COMPUTER vision ,INFORMATION sharing ,PYRAMIDS ,ANCHORS ,OBJECT tracking (Computer vision) - Abstract
In this paper, the authors apply the feature pyramid network (FPN) to the single‐stage anchor‐free object detection algorithm CenterNet, and the effectiveness of the multi‐level feature fusion of FPN for the object detection algorithm is proved by experiments. However, multi‐level feature fusion leads to an increase in computational cost. In this regard, this paper proposes an object detection algorithm, called DDE‐Net, that does not use multi‐level feature fusion and only uses single‐level feature for optimization. The key component in it: the dense dilated encoder, which encourages dense information exchange of features between different spatial scales. This paper presents extensive experiments, and DDE‐Net shows strong performance compared to that of other popular models on the PASCAL VOC and on the COCO2017 dataset. On the COCO2017 dataset, the authors' DDE‐Net achieves comparable results with its feature pyramids counterpart RetinaNet, while applying the same backbone with smaller params and GFLOPs than RetinaNet. With an image size of 512 × 512, DDE‐Net achieves 37.3 AP running at 81 fps on 2080 Ti. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
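The dense dilated encoder in entry 45 enlarges the receptive field of a single-level feature by spacing kernel taps apart rather than adding parameters; a minimal 1-D illustration of dilation (an illustrative sketch, not the DDE-Net implementation):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution where kernel taps are `dilation` apart.
    A k-tap kernel covers a span of (k - 1) * dilation + 1 inputs."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # effective receptive field
    out = [sum(kernel[j] * x[i + j * dilation] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out), span

x = np.arange(8, dtype=float)              # [0, 1, ..., 7]
out, span = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)
print(span, out.tolist())  # 5 [6.0, 9.0, 12.0, 15.0]
```

With dilation 2, the same 3-tap kernel sees 5 inputs at once; stacking several dilation rates is what lets a dilated encoder mix information across spatial scales from one feature level.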
46. An intuitive pre-processing method based on human–robot interactions: zero-shot learning semantic segmentation based on synthetic semantic template.
- Author
-
Chen, Yen-Chun and Lai, Chin-Feng
- Subjects
HUMAN-robot interaction ,DEEP learning ,COMPUTER vision ,MACHINE learning ,DATABASES - Abstract
In industry, robots are widely used to solve repetitive or dangerous actions in product production, so that product production can be more efficient. However, the problem that robots are often challenged is the convenience and the efficiency of introducing the production line. Therefore, the intuitive robot guidance method is an important issue; this paper will introduce the concept of human–robot interactions (HRI) and use deep learning methods on the machine vision system to complete the robot-guided assembly operation analysis, and the assembly operation analysis requires semantic segmentation as pre-processing. Therefore, we propose a novel semantic template correlation model architecture based on zero-shot learning (ZSL) to achieve the effect of rapid deployment. The semantic template correlation model is to search for the object area offline learning through the semantic template generated by the physics engine, and when inferring online, we can directly enter the semantic template to obtain the relevant object region. Finally, this paper verifies that the MIoU can be increased by more than 5% through the verification of the general database VOC2012. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
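The abstract reports segmentation quality as MIoU (mean intersection over union). A minimal sketch of how that metric is typically computed over flattened label maps — the two-class example is illustrative, not data from the paper:

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes, skipping absent classes."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # ignore classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Tiny flattened example: one background pixel mislabeled as foreground.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```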
47. Elastic Adaptively Parametric Compounded Units for Convolutional Neural Network.
- Author
- Zhang, Changfan, Xu, Yifu, and Sheng, Zhenwen
- Subjects
CONVOLUTIONAL neural networks ,COMPUTER vision ,IMAGE recognition (Computer vision) - Abstract
The activation function introduces nonlinearity into convolutional neural networks, which has greatly advanced computer vision tasks. This paper proposes elastic adaptively parametric compounded units to improve the performance of convolutional neural networks for image recognition. The activation function takes the structural advantages of two mainstream functions as its fundamental architecture. A SENet model is embedded in the proposed activation function to adaptively recalibrate the feature-mapping weight in each channel, thereby enhancing the function's fitting capability. In addition, the function has an elastic slope in the positive input region, obtained by simulating random noise, to improve the generalization capability of neural networks; a special protection mechanism prevents the generated noise from producing overly large variations during training. To verify the effectiveness of the activation function, comparative experiments were conducted on the CIFAR-10 and CIFAR-100 image datasets under the exact same model. Experimental results show that the proposed activation function outperforms other functions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
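The ingredients the abstract names — an SE-style channel gate on one branch, a noise-perturbed slope on the positive branch, and a clamp that protects against overly large noise — can be sketched scalar-wise. This is a hypothetical reading of the design, not the paper's actual unit; the gate placement, noise bound, and clamp range are all assumptions:

```python
import math
import random

def se_gate(channel_mean):
    """SE-style gate: squash a per-channel statistic into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-channel_mean))

def elastic_activation(x, channel_mean, noise_scale=0.0, rng=None):
    """
    Hypothetical compounded unit: a ReLU-like positive branch whose slope is
    perturbed by bounded noise (the "elastic" part), and a leaky negative
    branch whose slope is recalibrated per channel by the SE gate.
    """
    if x >= 0:
        jitter = rng.uniform(-1.0, 1.0) * noise_scale if rng else 0.0
        # protection mechanism: clamp the slope so noise cannot flip or
        # explode the output during training
        slope = min(max(1.0 + jitter, 0.5), 1.5)
        return slope * x
    return se_gate(channel_mean) * x

rng = random.Random(0)
print(elastic_activation(1.0, channel_mean=0.3, noise_scale=0.1, rng=rng))
print(elastic_activation(-1.0, channel_mean=0.3))
```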
48. DEEP LEARNING-BASED IRAQI BANKNOTES CLASSIFICATION SYSTEM FOR BLIND PEOPLE.
- Author
- Awad, Sohaib Rajab, Sharef, Baraa T., Salih, Abdulkreem M., and Malallah, Fahad Layth
- Subjects
COMPUTER vision ,CONVOLUTIONAL neural networks ,PEOPLE with visual disabilities ,MACHINE learning ,DEEP learning ,HUMAN-computer interaction - Abstract
Modern systems increasingly focus on improving people's quality of life, and new technologies are now used extensively across sectors of our societies, such as education and medicine. One medical application uses computer vision to help blind people in their daily endeavors, reducing their dependence on those close to them and giving visually impaired people independence in conducting daily financial operations. Motivated by this fact, the work concentrates on assisting the visually impaired to distinguish among Iraqi banknotes. In essence, we employ computer vision in conjunction with deep learning algorithms to build a multiclass classification model for the banknotes. The system produces a vocal command corresponding to the categorized banknote image, informing the visually impaired user of each banknote's denomination. Iraqi banknotes have two sides, an Arabic side and an English side, which is an important consideration for human-computer interaction (HCI) when constructing the classification model. In this paper, we use a database comprising 3,961 image samples of the seven Iraqi paper currency categories, and a nineteen-layer convolutional neural network (CNN) is trained on this database to distinguish among the denominations. The developed system exhibits an accuracy of 98.6%, demonstrating the feasibility of the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
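The final step the abstract describes — turning the CNN's output into a spoken denomination — amounts to a softmax over the seven classes followed by a label lookup. A minimal sketch; the label ordering and logit values are assumptions, not taken from the paper:

```python
import math

# Hypothetical class labels for the seven Iraqi denominations; the abstract
# does not give the actual class ordering used by the paper's 19-layer CNN.
LABELS = ["250", "500", "1000", "5000", "10000", "25000", "50000"]

def softmax(logits):
    """Numerically stable softmax over a list of raw CNN outputs."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def announce(logits):
    """Pick the most probable class and return the denomination to vocalize."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, confidence = announce([0.1, 0.3, 5.2, 0.0, 0.2, 0.1, 0.4])
print(f"Announce: {label} dinars (p={confidence:.2f})")
```

In the described system, the returned label string would be fed to a text-to-speech engine to produce the vocal command.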
49. Enhancing automated vehicle identification by integrating YOLO v8 and OCR techniques for high-precision license plate detection and recognition.
- Author
- Moussaoui, Hanae, Akkad, Nabil El, Benslimane, Mohamed, El-Shafai, Walid, Baihan, Abdullah, Hewage, Chaminda, and Rathore, Rajkumar Singh
- Subjects
AUTOMOBILE license plates ,PATTERN recognition systems ,AUTONOMOUS vehicles ,COMPUTER vision ,TEXT recognition ,DEEP learning ,IDENTIFICATION - Abstract
Vehicle identification systems are vital components of contemporary life, enabling safety, trade, transit, and law enforcement; they improve community and individual well-being by increasing vehicle management, security, and transparency. These tasks entail locating and extracting license plates from images or video frames using computer vision and machine learning techniques, then recognizing the letters and digits on the plates. This paper proposes a new license plate detection and recognition method based on the deep learning YOLO v8 model, image processing techniques, and OCR for text recognition. The first step was dataset creation: 270 images were gathered from the internet and annotated with CVAT (Computer Vision Annotation Tool), an open-source platform that simplifies annotating and labeling images and videos for computer vision tasks. The newly released YOLO v8 was then employed to detect the number-plate region in the input image. After extracting the plate, the k-means clustering algorithm, thresholding techniques, and the morphological opening operation were used to enhance the image and make the license plate characters clearer before OCR was applied to extract them. Finally, a text file containing the characters reflecting the vehicle's country is generated. To assess the efficiency of the proposed approach, several metrics were employed, namely precision, recall, F1-score, and CLA, and the method was compared with existing techniques in the literature. The suggested method obtained convincing results in both detection and recognition, with an accuracy of 99% in detection and 98% in character recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
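The pre-OCR cleanup stage described above (k-means clustering for thresholding, then morphological opening) can be sketched on a 1-D row of grey levels. This is a simplified illustration of the two operations, assuming 2-means clustering and a 3-wide structuring element; the paper's actual parameters and 2-D implementation are not given in the abstract:

```python
def two_means_threshold(pixels, iters=10):
    """Binarize grey levels by 1-D 2-means clustering: the threshold lands
    midway between the two cluster centroids (dark background, bright text)."""
    c0, c1 = float(min(pixels)), float(max(pixels))
    for _ in range(iters):
        g0 = [p for p in pixels if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in pixels if abs(p - c0) > abs(p - c1)]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    t = (c0 + c1) / 2
    return [1 if p > t else 0 for p in pixels]

def opening_1d(bits):
    """Morphological opening (erosion then dilation, 3-wide window):
    removes isolated speckles while preserving solid character strokes."""
    n = len(bits)
    eroded = [1 if all(bits[max(0, i - 1):i + 2]) else 0 for i in range(n)]
    return [1 if any(eroded[max(0, i - 1):i + 2]) else 0 for i in range(n)]

row = [10, 12, 11, 200, 210, 205, 9]  # dark paper with a bright stroke
print(two_means_threshold(row))
print(opening_1d([0, 1, 0, 0, 1, 1, 1, 0]))  # lone speckle vs. solid run
```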
50. A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion.
- Author
- Li, Nianfeng, Wang, Zhenyan, Huang, Yongyuan, Tian, Jia, Li, Xinyuan, and Xiao, Zhiguo
- Subjects
TEXT recognition ,FEATURE extraction ,COMPUTER vision ,VISUAL fields ,DEEP learning - Abstract
Scene text detection is an important research field in computer vision, playing a crucial role in various application scenarios. However, existing scene text detection methods often fail to achieve satisfactory results on text instances of different sizes and shapes or against complex backgrounds. To address the challenge of detecting diverse text in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. The method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features at different scales, enhancing the network's perception of text regions and improving its feature extraction capability. Simultaneously, an improved cascaded feature fusion module (PFFM) fully integrates the extracted feature maps, expanding the receptive field and enriching the expressive ability of the feature maps. Finally, a lightweight subspace attention module (SAM) partitions the concatenated feature maps into several subspace feature maps, facilitating spatial-information interaction among features of different scales. Comparative experiments on the ICDAR2015, Total-Text, and MSRA-TD500 datasets against existing scene text detection methods show that the proposed method achieves good accuracy, recall, and F-score, verifying its effectiveness and practicality. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
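The subspace attention idea above — partitioning concatenated feature maps into groups and computing attention within each group rather than over all channels at once — can be sketched on a flat list of channel responses. A rough illustration under stated assumptions (softmax attention per group on 1-D values); the paper's SAM operates on 2-D feature maps and its exact formulation is not given in the abstract:

```python
import math

def subspace_attention(features, num_groups):
    """Split channel responses into equal sub-spaces and reweight each channel
    by a softmax computed only within its own sub-space."""
    assert len(features) % num_groups == 0, "channels must split evenly"
    size = len(features) // num_groups
    out = []
    for g in range(num_groups):
        group = features[g * size:(g + 1) * size]
        m = max(group)  # subtract max for numerical stability
        exps = [math.exp(v - m) for v in group]
        total = sum(exps)
        # attention weight stays local to the group, so one dominant scale
        # cannot suppress channels belonging to another scale
        out.extend(v * (e / total) for v, e in zip(group, exps))
    return out

print(subspace_attention([1.0, 1.0, 2.0, 2.0], num_groups=2))
```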