347 results for "Umapada Pal"
Search Results
2. A comprehensive scheme for tattoo text detection
- Author
Ayan Banerjee, Palaiahnakote Shivakumara, Umapada Pal, Ramachandra Raghavendra, and Cheng-Lin Liu
- Subjects
Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software
- Published
- 2022
3. A New Deep Wavefront Based Model for Text Localization in 3D Video
- Author
Umapada Pal, Apostolos Antonacopoulos, Tong Lu, Yue Lu, Lokesh Nandanwar, Palaiahnakote Shivakumara, and Raghavendra Ramachandra
- Subjects
Wavefront, Computer science, Search engine indexing, Video processing, Shadow, Polygon, Media Technology, Computer vision, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
With the evolution of electronic devices, such as 3D cameras, addressing the challenges of text localization in 3D video (e.g., for indexing) is increasingly drawing the attention of the multimedia and video processing community. Existing methods focus on 2D video, and their performance degrades in the presence of the challenges of 3D video, such as shadow areas associated with text and irregularly sized and shaped text. This paper proposes the first approach that successfully addresses the challenges of 3D video in addition to those of 2D. It employs a number of innovations: the first is the Generalized Gradient Vector Flow (GGVF) for dominant point detection; the second is the Wavefront concept for text candidate point detection from those dominant points. In addition, an Adaptive B-Spline Polygon Curve Network (ABS-Net) is proposed for accurate text localization in 3D videos by constructing tight-fitting bounding polygons using text candidate points. Extensive experiments on custom (3D video) and standard datasets (2D video and scene text) show that the proposed method is practical and useful, and overall outperforms existing state-of-the-art methods.
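The tight-fitting polygon idea can be sketched briefly. The toy helper below (hypothetical, not the paper's ABS-Net) fits a closed B-spline through candidate text points with SciPy and samples it into a bounding polygon:

```python
import numpy as np
from scipy.interpolate import splprep, splev

def bspline_polygon(points, n_vertices=40, smooth=0.0):
    """Fit a closed B-spline through 2D candidate points and sample it
    into a bounding polygon. Points are ordered by angle around their
    centroid so the curve wraps the region exactly once."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - centroid[1], pts[:, 0] - centroid[0])
    pts = pts[np.argsort(angles)]
    pts = np.vstack([pts, pts[:1]])          # close the loop for per=True
    tck, _ = splprep([pts[:, 0], pts[:, 1]], s=smooth, per=True)
    u = np.linspace(0.0, 1.0, n_vertices, endpoint=False)
    x, y = splev(u, tck)
    return np.stack([x, y], axis=1)

# toy usage: noisy points around an elliptical "text region"
theta = np.linspace(0, 2 * np.pi, 25, endpoint=False)
candidates = np.c_[60 * np.cos(theta), 25 * np.sin(theta)]
candidates += np.random.randn(25, 2)
print(bspline_polygon(candidates).shape)     # (40, 2)
```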
- Published
- 2022
4. An Episodic Learning Network for Text Detection on Human Bodies in Sports Images
- Author
Pinaki Nath Chowdhury, Palaiahnakote Shivakumara, Ramachandra Raghavendra, Sauradip Nag, Umapada Pal, Tong Lu, and Daniel Lopresti
- Subjects
Visual search, Computer science, Image quality, Pooling, Scalability, Media Technology, Computer vision, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
Due to the proliferation of sports-related multimedia content on the WWW, effective visual search and retrieval present interesting research challenges. These are caused by poor image quality, a wide range of possible camera viewpoints, pose variations on the part of athletes engaged in playing a sport, deformations of text appearing on athletes' clothing and uniforms in motion, occlusions caused by other objects, etc. To address these challenges, this paper presents a new method for detecting text on human bodies in sports images. Unlike most existing methods, which attempt to exploit the locations of a player's torso, face, and skin, we propose an end-to-end episodic learning approach that employs inductive learning criteria for detecting clothing regions in an image, which are then used for text detection. Our method integrates a Residual Network (ResNet) and a Pyramidal Pooling Module (PPM) for generating a spatial attention map. The Progressive Scale Expansion (PSE) algorithm is adapted for text detection from these regions. Experimental results on our own dataset as well as several benchmarks (like RBNR and MMM, which contain images of runners in marathons, and Re-ID, which is a person re-identification dataset) demonstrate that the proposed method outperforms existing methods in terms of precision and F1-score. We also present results for sports images chosen from natural scene text detection datasets such as CTW1500 and MS-COCO to show the proposed method is effective and reliable across a range of inputs.
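As an illustration of the attention step, here is a minimal PSPNet-style pyramid pooling block in PyTorch that turns a backbone feature map into a single-channel spatial attention map; it is a generic sketch under stated assumptions, not the paper's exact PPM or episodic training setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling: pool the feature map over several
    grid sizes, project each pooled map with a 1x1 conv, upsample back,
    concatenate with the input, and squash to a 1-channel attention map."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins])
        self.attn = nn.Conv2d(in_ch + out_ch * len(bins), 1, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x] + [F.interpolate(s(x), size=(h, w), mode="bilinear",
                                     align_corners=False)
                       for s in self.stages]
        return torch.sigmoid(self.attn(torch.cat(feats, dim=1)))

# toy usage on a ResNet-like feature map
attn = PyramidPooling(in_ch=256)(torch.randn(1, 256, 32, 32))
print(attn.shape)  # torch.Size([1, 1, 32, 32])
```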
- Published
- 2022
5. Multi‐gradient‐direction based deep learning model for arecanut disease identification
- Author
S. B. Mallikarjuna, Palaiahnakote Shivakumara, Vijeta Khare, M. Basavanna, Umapada Pal, and B. Poornima
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Computer Vision and Pattern Recognition, Information Systems
- Published
- 2022
6. Oil palm tree counting in drone images
- Author
Tong Lu, Palaiahnakote Shivakumara, Umapada Pal, Lokesh Nandanwar, Faizal Samiron, and Pinaki Nath Chowdhury
- Subjects
Pixel, Computer science, Pattern recognition, k-nearest neighbors algorithm, Intersection, Artificial Intelligence, Region of interest, Signal Processing, Computer Vision and Pattern Recognition, Software
- Abstract
When images are captured by drones, the effects of oblique angles, distance variations and the open environment are the main challenges for successful palm tree detection. This paper presents a method for palm tree counting in drone images using a novel idea of detecting dominant points by exploring Generalized Gradient Vector Flow, which defines symmetry based on the gradient direction of the pixels. For each dominant point, we use angle information for classifying diagonal dominant points. It is intuitive that the directions of the branches of a tree converge at the center of the tree irrespective of the type of tree or plant. This observation motivated us to extend the direction of each diagonal dominant point until it intersects with that of another diagonal dominant point, which results in candidate points. For each candidate point, the proposed method constructs a ring by taking the distance between the intersection point and the nearest neighbor candidate point as the radius. This outputs regions of interest that include the center of each tree in the image. To ease the effect of complex backgrounds, we explore the YOLOv5 architecture to remove false regions of interest. This step results in counting oil palm trees in the images irrespective of the type of palm tree. Experimental results on our dataset of images captured by drones and a standard dataset of coconut tree images captured by unmanned aerial vehicles show that the proposed method is effective and performs better than SOTA methods.
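The ring-construction step can be sketched in a few lines. A toy example (assuming candidate tree-center points are already available) using a k-d tree for the nearest-neighbor radius:

```python
import numpy as np
from scipy.spatial import cKDTree

def candidate_rings(points):
    """For each candidate point (a presumed tree center), build a ring
    ROI whose radius is the distance to its nearest neighboring
    candidate, mirroring the ring-construction step described above."""
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)
    # k=2: the nearest neighbor of a point is itself at distance 0,
    # so take the second column as the true nearest-neighbor distance
    dists, _ = tree.query(pts, k=2)
    return [(tuple(p), r) for p, r in zip(pts, dists[:, 1])]

centers = [(10, 12), (48, 50), (52, 47), (90, 15)]
for center, radius in candidate_rings(centers):
    print(f"ring at {center} with radius {radius:.1f}")
```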
- Published
- 2022
7. Deformable scene text detection using harmonic features and modified pixel aggregation network
- Author
Umapada Pal, Tanmay Jain, Shivakumara Palaiahnakote, and Cheng-Lin Liu
- Subjects
Maximally stable extremal regions, Pixel, Computer science, Harmonic (mathematics), Pattern recognition, Text detection, Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software
- Abstract
Although text detection methods have addressed several challenges in the past, there is a dearth of effective methods for text detection in deformable images, such as images containing text embedded on cloth, banners, rubber, sports jerseys, uniforms, etc. This is because deformable regions contain surfaces of arbitrary shape, which leads to poor text quality. This paper presents a new method for deformable text detection in natural scene images. It is observed that although the shapes of characters change in a deformable region, the pixel values and the spatial relationship between the pixels do not change. This motivated us to explore the extraction of Maximally Stable Extremal Regions (MSER) in an image, in which pixels that share common features are grouped into components. The unique character shape variations led us to explore harmonic features to represent the component shape variations, using which a classifier separates text and non-text components in the output of the MSER step. Additionally, the objective of developing a lightweight method with low computational cost motivated us to introduce a modified Pixel Aggregation Network (PAN) for deformable text detection at the component level. Comprehensive experiments on our Deformable Text Dataset (DTD) and standard natural scene text datasets, namely, MSRATD-500, ICDAR 2019 MLT, Total-Text, CTW1500, ICDAR 2019 ArT and DSTA1500, show that the proposed model outperforms the existing methods on our dataset as well as the standard datasets.
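The MSER component-extraction step is standard in OpenCV; a minimal sketch (the harmonic features and the classifier are omitted, the file path is a placeholder, and the parameter values are illustrative):

```python
import cv2
import numpy as np

# Minimal MSER component extraction on a grayscale image.
image = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create(5, 60, 14400)   # delta, min_area, max_area
regions, bboxes = mser.detectRegions(image)

mask = np.zeros_like(image)
for region in regions:                 # region: Nx2 array of (x, y) pixels
    mask[region[:, 1], region[:, 0]] = 255
print(f"{len(regions)} stable regions")
```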
- Published
- 2021
8. A Knowledge Enforcement Network-Based Approach for Classifying a Photographer’s Images
- Author
Palaiahnakote Shivakumara, Pinaki Nath Chowdhury, Umapada Pal, David Doermann, Raghavendra Ramachandra, Tong Lu, and Michael Blumenstein
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
Classification of photos captured by different photographers is an important and challenging problem in knowledge-based systems and image processing. Monitoring and authenticating images uploaded on social media are essential, and verifying the source is one key piece of evidence. We present a novel framework for classifying photos of different photographers based on the combination of local features and deep learning models. The proposed work uses focused and defocused information in the input images to extract contextual information. The model estimates the weighted gradient and calculates entropy to strengthen context features. The focused and defocused information is fused to estimate the cross-covariance and define a linear relationship between them. This relationship results in a feature matrix fed to a Knowledge Enforcement Network (KEN) for obtaining representative features. Due to the strong discriminative ability of deep learning models, we employ the lightweight and accurate MobileNetV2. The outputs of KEN and MobileNetV2 are sent to a classifier for photographer classification. Experimental results of the proposed model on our dataset of 46 photographer classes (46,234 images) and publicly available datasets of 41 photographer classes (218,303 images) show that the method outperforms the existing techniques by 5%–10% on average. The dataset created for the experimental purpose will be made available upon publication.
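The cross-covariance fusion step has a one-line core. A small numpy illustration (the focused/defocused feature matrices here are random stand-ins, not real features):

```python
import numpy as np

def cross_covariance(focused, defocused):
    """Cross-covariance between two feature matrices (n samples x d dims),
    a stand-in for the step that links focused and defocused context
    features before the KEN stage."""
    f = focused - focused.mean(axis=0)
    g = defocused - defocused.mean(axis=0)
    return f.T @ g / (len(f) - 1)          # d x d feature matrix

rng = np.random.default_rng(0)
focused = rng.normal(size=(100, 16))       # hypothetical focused features
defocused = rng.normal(size=(100, 16))     # hypothetical defocused features
print(cross_covariance(focused, defocused).shape)   # (16, 16)
```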
- Published
- 2022
9. A new deep model for family and non-family photo identification
- Author
Nor Badrul Anuar, Tong Lu, Umapada Pal, Palaiahnakote Shivakumara, Tapan Karnik, and Pinaki Nath Chowdhury
- Subjects
Information retrieval, Parsing, Computer Networks and Communications, Computer science, Deep learning, Hardware and Architecture, Region of interest, Photo identification, Media Technology, Human trafficking, Artificial intelligence, Nuclear family, Software
- Abstract
Human trafficking is a global issue, and the problems related to it remain unsolved. This paper presents a new method for identifying photos of different types of families and non-families, so that the method can assist investigation teams in addressing this issue. We believe that human body parts are the main cues for representing family and non-family photos. Based on this intuition, we propose to segment hair, head, cloth, torso, and skin regions from each human in input photos by exploring a self-correlation-based human parsing method. This step results in regions of interest (ROI). Motivated by the ability of deep learning models to solve complex problems and by the lightweight nature of MobileNet, we further explore MobileNetV2 for identifying photos of different families and non-families by considering the ROI as the input. For the experiments in this work, we consider a dataset of ten classes, which includes five family classes, namely, Couple, Nuclear Family, Multi-Cultural Family, Father–Child and Mother–Child, and five non-family classes, namely, Male Friends, Female Friends, Mixed Friends, Male Celebrity and Female Celebrity. The results of the proposed method are demonstrated by testing on our dataset of family and non-family photos. Comparative results show that our proposed method outperforms existing methods in terms of classification rate and F-Score.
- Published
- 2021
10. Modeling local and global behavior for trajectory classification using graph based algorithm
- Author
Rajkumar Saini, Umapada Pal, Partha Pratim Roy, and Pradeep Kumar
- Subjects
Dynamic time warping, Computer science, Particle swarm optimization, Minimum spanning tree, Complete bipartite graph, Artificial Intelligence, Signal Processing, Bipartite graph, Computer Vision and Pattern Recognition, Algorithm, Software
- Abstract
Understanding motion patterns is of great importance for analyzing the behavior of objects in a surveillance area. Grouping motion patterns into clusters in such a way that similar motion patterns lie in the same cluster and the inter-cluster variance is maximized is a challenging task. Variation in the duration of trajectory patterns, in terms of time or the number of points in them (even among trajectories from the same cluster), makes it difficult to classify them correctly into their respective clusters using full-length trajectories alone; local cues can be used along with the global information. Trajectories can be segmented into distinctive parts, and the local contribution of these parts can be used to improve the performance of the system. In this work, we have formulated the trajectory classification problem as a graph-based similarity problem using the Douglas–Peucker (DP) algorithm, Complete Bipartite Graphs (CBG), and Minimum Spanning Trees (MST). The local behavior of objects has been analyzed using their motion segments, and Dynamic Time Warping (DTW) has been used for finding the similarity among motion trajectories. Class-wise global and local costs have been computed using DTW, CBG, and MST, and their fusion has been done using Particle Swarm Optimization (PSO) to improve the classification rate. Trajectory datasets, namely T15, LabOmni, and CROSS, have been used in the experiments. The proposed method yields encouraging results and outperforms state-of-the-art techniques.
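DTW, the similarity measure at the heart of this pipeline, is easy to sketch; the DP segmentation, bipartite-graph matching and PSO fusion are omitted here:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 2D trajectories
    (arrays of shape [n, 2] and [m, 2]) using Euclidean point costs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t1 = np.array([[0, 0], [1, 1], [2, 2], [3, 3]], dtype=float)
t2 = np.array([[0, 0], [1, 1], [1, 2], [2, 2], [3, 3]], dtype=float)
print(f"DTW(t1, t2) = {dtw_distance(t1, t2):.3f}")
```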
- Published
- 2021
11. A Conformable Moments-Based Deep Learning System for Forged Handwriting Detection
- Author
Lokesh Nandanwar, Palaiahnakote Shivakumara, Hamid A. Jalab, Rabha W. Ibrahim, Ramachandra Raghavendra, Umapada Pal, Tong Lu, and Michael Blumenstein
- Subjects
Artificial Intelligence, Computer Networks and Communications, Software, Computer Science Applications
- Abstract
Detecting forged handwriting is important in a wide variety of machine learning applications, and it is challenging when the input images are degraded with noise and blur. This article presents a new model based on conformable moments (CMs) and deep ensemble neural networks (DENNs) for forged handwriting detection in noisy and blurry environments. Since CMs involve fractional calculus with the ability to model nonlinearities and geometrical moments as well as preserving spatial relationships between pixels, fine details in images are preserved. This motivates us to introduce a DENN classifier, which integrates steganographic kernels and spatial features to classify input images as normal (original, clean images), altered (handwriting changed through copy-paste and insertion operations), noisy (noise added to the original image), blurred (blur added to the original image), altered-noisy (noise added to the altered image), and altered-blurred (blur added to the altered image). To evaluate our model, we use a newly introduced dataset, which comprises handwritten words altered at the character level, as well as several standard datasets, namely ACPR 2019, ICPR 2018-FDC, and the IMEI dataset. The first two of these datasets include handwriting samples that are altered at the character and word levels, and the third dataset comprises forged International Mobile Equipment Identity (IMEI) numbers. Experimental results demonstrate that the proposed method outperforms the existing methods in terms of classification rate.
- Published
- 2022
12. A new DCT-PCM method for license plate number detection in drone images
- Author
Hon Hock Woon, Hamam Mokayed, Palaiahnakote Shivakumara, Mohan S. Kankanhalli, Umapada Pal, and Tong Lu
- Subjects
Computer science, Deep learning, Geometric transformation, Drone, Phase congruency, Perspective distortion, Artificial Intelligence, Distortion, Signal Processing, Discrete cosine transform, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Software
- Abstract
License plate number detection in drone images is a complex problem because the images are generally captured at oblique angles and pose several challenges like perspective distortion, non-uniform illumination, degradation, blur, occlusion, loss of visibility, etc. Unlike most existing methods, which focus on images captured head-on (from an orthogonal direction), the proposed work focuses on drone text images. Inspired by the Phase Congruency Model (PCM), which is invariant to non-uniform illumination, contrast variations, geometric transformations and, to some extent, distortion, we explore the combination of DCT and PCM (DCT-PCM) for detecting license plate number text in drone images. Motivated by the strong discriminative power of deep learning models, the proposed method exploits fully connected neural networks for eliminating false positives to achieve better detection results. Furthermore, the proposed work constructs a working model that fits the real environment. To evaluate the proposed method, we use our own dataset captured by drones and a benchmark license plate dataset, namely, Medialab, for experimentation. We also demonstrate the effectiveness of the proposed method on benchmark natural scene text detection datasets, namely, SVT, MSRA-TD-500, ICDAR 2017 MLT and Total-Text.
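The DCT half of the idea is easy to illustrate. A minimal sketch that suppresses low-frequency DCT coefficients of a patch to emphasize stroke-like content (the phase congruency component and the neural false-positive filter are omitted; `keep` is an illustrative parameter):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_highpass(gray, keep=0.1):
    """Zero out the lowest-frequency block of the 2D DCT of an image
    patch and reconstruct it, emphasizing high-frequency stroke-like
    content. Only the DCT side of DCT-PCM is sketched here."""
    coeffs = dctn(gray.astype(float), norm="ortho")
    h, w = coeffs.shape
    coeffs[: int(h * keep), : int(w * keep)] = 0.0   # drop low frequencies
    return idctn(coeffs, norm="ortho")

patch = np.random.rand(64, 64)           # stand-in for a plate region
print(dct_highpass(patch).shape)          # (64, 64)
```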
- Published
- 2021
13. Local Resultant Gradient Vector Difference and Inpainting for 3D Text Detection in the Wild
- Author
Dajian Zhong, Palaiahnakote Shivakumara, Lokesh Nandanwar, Umapada Pal, Michael Blumenstein, and Yue Lu
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
Three-dimensional (3D) text appearing in natural scene images is common due to 3D cameras and the capture of text from different angles, which presents new problems for text detection. This is because of the presence of depth information, shadows, and decorative characters in the images. In this work, we consider those images where 3D text appears with depth, as well as shadow information, for text detection. We propose a novel method based on local resultant gradient vector difference (LRGVD), inpainting and a deep learning model for detecting 3D as well as two-dimensional (2D) texts in natural scene images. The boundary of components that is invariant to the above challenges is detected by exploring LRGVD. The LRGVD uses gradient magnitude and direction in a novel way for detecting the boundary of the components. Further, we propose an inpainting method in a new way for restoring the character background information using boundaries. For a given region and the input image, the inpainting method divides the whole image into planes and then propagates the values in the planes into the missing region based on posterior probabilities and neighboring information. This results in text regions with false positives. Then, a differentiable binarization network (DB-Net) is proposed for detecting text irrespective of orientation, background, 3D or 2D, etc. Experiments conducted on our 3D text images and standard datasets of natural scene text images, namely ICDAR 2019 MLT, ICDAR 2019 ArT, DAST1500, Total-Text and SCUT-CTW1500, show that the proposed method is effective in detecting 3D and 2D texts in the images.
- Published
- 2022
14. New Deep Spatio-Structural Features of Handwritten Text Lines for Document Age Classification
- Author
Palaiahnakote Shivakumara, Alloy Das, K. S. Raghunandan, Umapada Pal, and Michael Blumenstein
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
Document age estimation using handwritten text line images is useful for several pattern recognition and artificial intelligence applications such as forged signature verification, writer identification, gender identification, personality trait identification, and fraudulent document identification. This paper presents a novel method for document age classification at the text line level. For segmenting text lines from handwritten document images, wavelet decomposition is used in a novel way. We explore multiple levels of wavelet decomposition, which introduce blur as the number of levels increases, for detecting word components. The detected components are then used in a direction-guided growing approach with linearity and nonlinearity criteria for segmenting text lines. For the classification of text line images of different ages, inspired by the observation that, as the age of a document increases, the quality of its image degrades, the proposed method extracts structural, contrast, and spatial features to study degradations at different wavelet decomposition levels. The specific advantages of DenseNet, namely, strong feature propagation, mitigation of the vanishing gradient problem, reuse of features, and a reduced number of parameters, motivated us to use DenseNet121 along with a Multi-Layer Perceptron (MLP) for the classification of text lines of different ages by feeding the features and the original image as input. To demonstrate the efficacy of the proposed model, experiments were conducted on our own as well as standard datasets for both text line segmentation and document age classification. The results show that the proposed method outperforms the existing methods for text line segmentation in terms of precision, recall and F-measure, and for document age classification in terms of average classification rate.
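Multi-level wavelet decomposition of a document image is a one-call operation with PyWavelets; a minimal sketch on a random stand-in image (the component detection and growing steps are not shown):

```python
import numpy as np
import pywt

# Multi-level 2D wavelet decomposition; deeper levels give increasingly
# blurred approximations, the property the method above exploits.
image = np.random.rand(256, 256)          # stand-in for a document image
coeffs = pywt.wavedec2(image, wavelet="haar", level=3)

approx = coeffs[0]                        # level-3 approximation (most blurred)
print("approximation:", approx.shape)
for i, (cH, cV, cD) in enumerate(coeffs[1:]):
    # detail coefficients are ordered from coarsest to finest
    print(f"detail set {i}: horizontal {cH.shape}, "
          f"vertical {cV.shape}, diagonal {cD.shape}")
```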
- Published
- 2022
15. A rotation and scale invariant approach for multi-oriented floor plan image retrieval
- Author
Chiranjoy Chattopadhyay, Rasika Khade, Umapada Pal, and Krupa N. Jariwala
- Subjects
Computer science, Feature extraction, Floor plan, Digital image, Artificial Intelligence, Signal Processing, Query by Example, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Rotation (mathematics), Image retrieval, Software, Digitization
- Abstract
An automatic system for the analysis and retrieval of building floor plan images is helpful for architects while designing new projects and for providing recommendations to buyers. For such systems, query by example is preferred over query by keyword, for which the user's requirements must be available in digital image form. Floor plans are converted to digital form by scanning and often get rotated by a small angle during digitization. In this paper, we propose a geometric feature-based approach for floor plan image retrieval, and our key contribution is to handle different kinds of rotation and scale while retrieving similar floor plans from the database. Our framework is divided into three phases, namely outer shape feature extraction and internal object feature extraction, followed by matching and retrieval. For our experimentation, we rotated images of the ROBIN dataset, as no rotated floor plan dataset was available. Our experiments show that the proposed methodology outperforms recent competing methods.
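Rotation- and scale-invariant shape description is a classic problem; a quick illustration with Hu moments (a generic stand-in, not the authors' exact feature set):

```python
import cv2
import numpy as np

# Hu moments are invariant to rotation and scale: the descriptor of a
# toy "floor plan" outline stays close to that of a rotated, rescaled copy.
plan = np.zeros((200, 200), np.uint8)
cv2.rectangle(plan, (40, 60), (160, 140), 255, -1)

rotated = cv2.warpAffine(
    plan, cv2.getRotationMatrix2D((100, 100), 30, 0.8), (200, 200))

for name, img in [("original", plan), ("rotated+scaled", rotated)]:
    hu = cv2.HuMoments(cv2.moments(img)).flatten()
    # log transform compresses the large dynamic range of the moments
    print(name, -np.sign(hu) * np.log10(np.abs(hu) + 1e-30))
```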
- Published
- 2021
16. A new context-based feature for classification of emotions in photographs
- Author
Divya Krishnani, Umapada Pal, Tong Lu, G.H. Kumar, Daniel P. Lopresti, and Palaiahnakote Shivakumara
- Subjects
Facial expression, Computer Networks and Communications, Computer science, Feature vector, Emotion classification, Pattern recognition, Hardware and Architecture, Robustness (computer science), Media Technology, Artificial intelligence, Software
- Abstract
A high volume of images is shared on the public Internet each day. Many of these are photographs of people with facial expressions and actions displaying various emotions. In this work, we examine the problem of classifying broad categories of emotions based on such images, including Bullying, Mildly Aggressive, Very Aggressive, Unhappy, Disdain and Happy. This work proposes Context-based Features for Classification of Emotions in Photographs (CFCEP). The proposed method first detects faces as foreground components, and other (non-face) information as background components, to extract context features. Next, for each foreground and background component, we explore the Hanman transform to study local variations in the components. The proposed method combines the Hanman transform (H) values of foreground and background components according to their merits, which results in two feature vectors. The two feature vectors are fused by deriving weights to generate one feature vector. Furthermore, the feature vector is fed to a CNN classifier for classification of images of different emotions uploaded on social media and the public Internet. Experimental results on our dataset of different emotion classes and on benchmark datasets show that the proposed method is effective in terms of average classification rate. It reports 91.7% for our 10-class dataset, 92.3% for the 5-class standard dataset and 81.4% for the FERPlus dataset. In addition, a comparative study with existing methods on the 5-class benchmark dataset, the standard facial expression dataset (FERPlus) and another 10-class dataset shows that the proposed method is the best in terms of scalability and robustness.
- Published
- 2021
17. Benchmarked multi-script Thai scene text dataset and its multi-class detection solution
- Author
Muhammad Saqib, Hemmaphan Suwanwiwat, Umapada Pal, and Abhijit Das
- Subjects
Computer Networks and Communications, Computer science, Arabic, Convolutional neural network, Numeral system, Annotation, Hardware and Architecture, Media Technology, Artificial intelligence, Precision and recall, Software, Natural language processing
- Abstract
Detecting text in scene images is a prevalent research topic. Text detection is considered challenging since a scene image may contain multiple scripts, each with different properties; it is therefore crucial to research scene text detection for a given geographical location and its scripts. As no work on large-scale multi-script Thai scene text detection is found in the literature, this study focuses on multi-script text that includes Thai, English (Roman), Chinese or Chinese-like script, and Arabic. These scripts can generally be seen around Thailand. Thai script contains more consonants and vowels than the Roman/English script, and has its own numerals. Furthermore, the placement of letters, intonation marks and vowels is different from English or Chinese-like scripts. Hence, detecting and recognising Thai text can be considered challenging. This study proposes a multi-script dataset which includes the aforementioned scripts and numerals, along with a benchmark employing the Single Shot Multi-Box Detector (SSD) and Faster Regions with Convolutional Neural Networks (F-RCNN). The dataset consists of 600 scene images recorded in Thailand, together with their manual detection annotations. This study also proposes a detection technique that casts the multi-script scene text detection problem as a multi-class detection problem, which was found to work more effectively than legacy approaches. The experimental results from employing the proposed technique on the dataset achieved encouraging precision and recall rates when compared with such methods. The proposed dataset is available upon email request to the corresponding authors.
- Published
- 2021
18. Arbitrarily-Oriented Text Detection in Low Light Natural Scene Images
- Author
Palaiahnakote Shivakumara, Umapada Pal, Yao Xiao, Minglong Xue, Chao Zhang, Zhibo Yang, Daniel P. Lopresti, and Tong Lu
- Subjects
Maximally stable extremal regions, Pixel, Image quality, Computer science, Feature extraction, Pattern recognition, Image segmentation, Convolutional neural network, Computer Science Applications, Signal Processing, Media Technology, Artificial intelligence, Electrical and Electronic Engineering
- Abstract
Text detection in low light natural scene images is challenging due to poor image quality and low contrast. Unlike most existing methods that focus on well-lit (normally daylight) images, the proposed method considers much darker natural scene images. For this task, our method first integrates spatial and frequency domain features through fusion to enhance fine details in the image. Next, we use Maximally Stable Extremal Regions (MSER) for detecting text candidates from the enhanced images. We then introduce Cloud of Line Distribution (COLD) features, which capture the distribution of pixels of text candidates in the polar domain. The extracted features are sent to a Convolutional Neural Network (CNN) to correct the bounding boxes for arbitrarily oriented text lines by removing false positives. Experiments are conducted on a dataset of low light images to evaluate the proposed enhancement step. The results show our approach is more effective than existing methods in terms of standard quality measures, namely, BRISQUE, NIQE and PIQE. In addition, experimental results on a variety of standard benchmark datasets, namely, ICDAR 2013, ICDAR 2015, SVT, Total-Text, ICDAR 2017-MLT and CTW1500, show that the proposed approach not only produces better results for low light images, but is at the same time also competitive for daylight images.
- Published
- 2021
19. Forged text detection in video, scene, and document images
- Author
Prabir Mondal, Umapada Pal, Tong Lu, K. S. Raghunandan, Lokesh Nandanwar, Palaiahnakote Shivakumara, and Daniel P. Lopresti
- Subjects
Brightness, Contextual image classification, Computer science, Feature extraction, Pattern recognition, Text detection, Image segmentation, Signal Processing, Computer Vision and Pattern Recognition, Artificial intelligence, Electrical and Electronic Engineering, Software, Principal axis theorem
- Abstract
Rapid advances in artificial intelligence have made it possible to produce forgeries good enough to fool an average user. As a result, there is growing interest in developing robust methods to counter such forgeries. This study presents a new Fourier spectrum-based method for detecting forged text in video images. The authors' premise is that the brightness distribution and the spectrum shape exhibit irregular patterns (inconsistencies) for forged text, while appearing more regular for original text. The method divides the spectrum of an input image into sectors and tracks to highlight these effects. Specifically, positive and negative coefficients for sectors and tracks are extracted to quantify the brightness distribution. Variations in the shape of the spectrum are analysed by determining the angular relationship between the principal axes and the sectors/tracks of the spectrum. Next, these two features are combined to detect forged text in images of IMEI (International Mobile Equipment Identity) numbers and documents. For evaluation, the following datasets are used: the authors' own video dataset and standard datasets, namely, the IMEI number dataset, the ICPR 2018 Fraud Document Contest dataset, and a natural scene text dataset. Experimental results show that the proposed method outperforms existing methods in terms of average classification rate and F-score.
- Published
- 2020
20. Graph attention network for detecting license plates in crowded street scenes
- Author
Umapada Pal, Palaiahnakote Shivakumara, Tong Lu, Daniel P. Lopresti, Swati Kanchan, Ramachandra Raghavendra, and Pinaki Nath Chowdhury
- Subjects
Pixel, Computer science, Artificial Intelligence, Attention network, Signal Processing, Graph (abstract data type), Computer vision, Computer Vision and Pattern Recognition, Software
- Abstract
Detecting multiple license plate numbers in crowded street scenes is challenging and requires the attention of researchers. In contrast to existing methods that focus on images that are not crowded with vehicles, in this work we aim at situations that are common and complex, for example, city environments where numerous vehicles of different types, like cars, trucks and motorbikes, may be present in a single image. In such cases, one can expect large variations in license plates in terms of quality, backgrounds, and various forms of occlusion. To address these challenges, we explore an Adaptive Progressive Scale Expansion based Graph Attention Network (APSEGAT). This approach extracts local information which represents the license plates irrespective of vehicle types and numbers, because it works at the pixel level in a progressive way, and identifies the dominant information in the image, which may include other parts of vehicles, drivers and pedestrians, and various other background objects. To overcome this problem, we integrate concepts from graph attention networks with progressive scale expansion networks. For evaluating the proposed method, we use our own dataset, named AMLPR, which contains images captured in different crowded street scenes at different times, the benchmark dataset UFPR-ALPR, which provides images of a single vehicle, and another benchmark dataset, UCSD, which contains images of cars with different orientations. Experimental results on these datasets show that the method outperforms existing methods and is effective in detecting license plate numbers in crowded street scenes.
- Published
- 2020
21. A new augmentation-based method for text detection in night and day license plate images
- Author
Tong Lu, Michael Blumenstein, Umapada Pal, Pinaki Nath Chowdhury, and Palaiahnakote Shivakumara
- Subjects
Pixel, Computer Networks and Communications, Computer science, Software Engineering, Color space, Hardware and Architecture, Media Technology, Computer vision, Artificial intelligence, Software
- Abstract
Although a number of methods have been developed for License Plate Detection (LPD), most of these focus on day images. As a result, license plate detection in night images is still an elusive goal for researchers. This paper presents a new method for LPD based on augmentation and Gradient Vector Flow (GVF) in night and day images. The augmentation involves expanding windows for each pixel in the R, G and B color spaces of the input image until the process finds dominant pixels in both night and day license plate images in the respective color spaces. We propose to fuse the dominant pixels in the R, G and B color spaces to restore missing pixels. On the fusion results for night and day images, the proposed method explores Gradient Vector Flow (GVF) patterns to eliminate false dominant pixels, which results in candidate pixels. The proposed method further explores GVF arrow patterns to define a unique loop pattern that represents holes in the characters, which gives candidate components. Furthermore, the proposed approach uses a recognition concept to fix the bounding boxes, merge the bounding boxes and eliminate false positives, resulting in text/license plate detection in both night and day images. Experimental results on night images from our dataset and day images from standard license plate datasets demonstrate that the proposed approach is robust compared to state-of-the-art methods. To show the effectiveness of the proposed method, we also tested our approach on standard natural scene datasets, namely, ICDAR 2015, MSRA-TD-500, ICDAR 2017-MLT, Total-Text, CTW1500 and MS-COCO, and their results are discussed.
- Published
- 2020
22. Distance Metric Learned Collaborative Representation Classifier (DML-CRC)
- Author
Tapabrata Chakraborti, Steven Mills, Umapada Pal, and Brendan McCane
- Subjects
Contextual image classification, Computer science, Feature vector, Feature extraction, Machine learning, Convolutional neural network, Transfer of learning
- Abstract
Any generic deep machine learning algorithm is essentially a function fitting exercise, where the network tunes its weights and parameters to learn discriminatory features by minimizing some cost function. Though the network tries to learn the optimal feature space, it seldom tries to learn an optimal distance metric in the cost function, and hence misses out on an additional layer of abstraction. We present a simple, effective way of achieving this by learning a generic Mahalanobis distance in a collaborative loss function in an end-to-end fashion, with any standard convolutional network as the feature learner. The proposed method, DML-CRC, gives state-of-the-art performance on the benchmark fine-grained classification datasets CUB Birds, Oxford Flowers and Oxford-IIIT Pets using the VGG-19 deep network. The method is network agnostic and can be used for other similar classification tasks.
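The learnable-metric idea can be captured in a few lines of PyTorch. A generic sketch (M = L^T L keeps the metric positive semi-definite; the collaborative representation loss itself is not reproduced):

```python
import torch
import torch.nn as nn

class MahalanobisMetric(nn.Module):
    """Learnable Mahalanobis distance d(x, y) = (x-y)^T M (x-y) with
    M = L^T L positive semi-definite by construction. A generic sketch
    of learning a distance metric end-to-end on top of a feature
    extractor, not the paper's exact DML-CRC loss."""
    def __init__(self, dim):
        super().__init__()
        self.L = nn.Parameter(torch.eye(dim) + 0.01 * torch.randn(dim, dim))

    def forward(self, x, y):
        diff = (x - y) @ self.L.t()        # project the difference by L
        return (diff ** 2).sum(dim=-1)     # squared Mahalanobis distance

features = torch.randn(8, 128)             # e.g., VGG-19 embeddings
metric = MahalanobisMetric(128)
d = metric(features[:4], features[4:])      # pairwise distances, shape (4,)
d.mean().backward()                         # the metric is trainable end-to-end
print(d.shape)
```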
- Published
- 2020
23. Indic handwritten script identification using offline-online multi-modal deep network
- Author
Partha Pratim Roy, Ankan Kumar Bhunia, Aneeshan Sain, Ayan Kumar Bhunia, Subham Mukherjee, and Umapada Pal
- Subjects
Modality (human–computer interaction), Computer science, Deep learning, Hardware and Architecture, Signal Processing, Artificial intelligence, Software, Natural language processing, Information Systems
- Abstract
In this paper, we propose a novel approach to word-level Indic script identification using only character-level data in the training stage. Our method uses a multi-modal deep network which takes both offline and online modalities of the data as input, in order to jointly exploit the information from both modalities for the script identification task. We take handwritten data in either modality as input, and the opposite modality is generated through inter-modality conversion. Thereafter, we feed this offline-online modality pair to our network. Hence, along with the advantage of utilizing information from both modalities, the proposed framework can work for both offline and online script identification, which alleviates the need for designing two separate script identification modules for each modality. We also propose a novel conditional multi-modal fusion scheme to combine the information from the offline and online modalities, which takes into account the original modality of the data being fed to our network and thus combines them adaptively. An exhaustive experimental study has been done on a data set including English (Roman) and six other official Indic scripts. Our proposed scheme outperforms traditional classifiers with handcrafted features as well as deep learning-based methods. Experimental results show that using only character-level training data can achieve performance competitive with traditional training using word-level data.
- Published
- 2020
24. DELP-DAR system for license plate detection and recognition
- Author
M. Adel Alimi, Zied Selmi, Mohamed Ben Halima, and Umapada Pal
- Subjects
Computer science, Image processing, Convolutional neural network, Artificial Intelligence, Robustness (computer science), Segmentation, Deep learning, Pattern recognition, Signal Processing, Computer Vision and Pattern Recognition, Software
- Abstract
Automatic License Plate detection and Recognition (ALPR) is a popular and active research topic in the fields of computer vision, image processing and intelligent transport systems. ALPR is used to make detection and recognition processes more robust and efficient in highly complicated environments and backgrounds. Several research investigations are still necessary due to constraints such as the completeness of countries' numbering systems, different colors, various languages, multiple sizes and varied fonts. For this, we present in this paper an automatic framework for License Plate (LP) detection and recognition from complex scenes, based on a Mask Region Convolutional Neural Network (Mask R-CNN) used for LP detection, segmentation and recognition. Although some studies have focused on LP detection, LP recognition, LP segmentation or just two of them, our study uses Mask R-CNN in all three stages. The framework is evaluated on four datasets from different countries, and consequently with various languages, including images captured from multiple scenes under numerous conditions such as varied orientations, poor quality, blur and complex environmental backgrounds. Extensive experiments show the robustness and efficiency of our suggested system, which achieves an accuracy rate of 99.3% on the AOLP dataset and 98.9% on the Caltech dataset.
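For reference, torchvision ships a ready-made Mask R-CNN; a minimal inference sketch (COCO-pretrained weights downloaded at first run as a stand-in, not the authors' license-plate-trained model; requires torchvision >= 0.13 for the `weights` argument):

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load a COCO-pretrained Mask R-CNN and run it on a dummy image.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)             # stand-in for a street photo
with torch.no_grad():
    out = model([image])[0]                  # one dict per input image

keep = out["scores"] > 0.5                   # confidence threshold
print(out["boxes"][keep].shape,              # [N, 4] detection boxes
      out["masks"][keep].shape)              # [N, 1, H, W] instance masks
```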
- Published
- 2020
25. Delaunay triangulation based text detection from multi-view images of natural scene
- Author
Soumyadip Roy, Govindaraj Hemantha Kumar, Umapada Pal, Tong Lu, and Palaiahnakote Shivakumara
- Subjects
Similarity (geometry), Computer science, Delaunay triangulation, Pattern recognition, k-nearest neighbors algorithm, Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software
- Abstract
Text detection in the wild is still considered a challenging issue by researchers because of its several real-time applications, such as forensics, where CCTV cameras capture images of the same scene at different angles. Unlike existing methods that consider a single view captured orthogonally for text detection, this paper considers multiple views (view-1 and view-2 of the same spot) of the same scene captured at different angles or heights. For each pair from the same scene, the proposed method extracts features that describe characteristics of text components based on Delaunay Triangulation (DT), namely corner points, area and cavity of the DT. The features of corresponding DTs in view-1 and view-2 are compared through a cosine distance measure to estimate the similarity between the two components of view-1 and view-2. If the pair satisfies the similarity condition, the components are considered Candidate Text Components (CTC). In other words, these are the common components of view-1 and view-2 that satisfy the similarity condition. For each CTC of view-1 and view-2, the proposed method finds nearest neighbor components to restore the components of the same text line, based on estimating the degree of similarity between the CTC and neighbor components using Chi-square and cosine distance measures. Furthermore, the proposed method uses a recognition step to detect correct text by comparing the recognition results of view-1 and view-2. The same recognition step is used for removing false positives to improve the performance of the proposed method. Experimental results on our own dataset, which contains pairs of images from different situations, and the standard datasets, namely, ICDAR 2013, MSRATD-500, CTW1500, Total-Text, ICDAR 2017 MLT and COCO-Text, show that the proposed method outperforms the existing methods.
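A Delaunay triangulation and a simple area-based descriptor take only a few lines with SciPy; a toy sketch of comparing two views (the histogram descriptor is an illustrative stand-in, not the paper's exact corner/area/cavity features):

```python
import numpy as np
from scipy.spatial import Delaunay

def triangle_features(points, bins=8):
    """Normalized histogram of Delaunay triangle areas for a set of
    component points in the unit square; a fixed-length descriptor."""
    tri = Delaunay(points)
    corners = points[tri.simplices]               # (n_tri, 3, 2)
    a, b, c = corners[:, 0], corners[:, 1], corners[:, 2]
    areas = 0.5 * np.abs((b[:, 0] - a[:, 0]) * (c[:, 1] - a[:, 1])
                         - (b[:, 1] - a[:, 1]) * (c[:, 0] - a[:, 0]))
    hist, _ = np.histogram(areas, bins=bins, range=(0.0, 0.5))
    return hist / max(hist.sum(), 1)

def cosine_similarity(f1, f2):
    return f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12)

rng = np.random.default_rng(1)
view1 = rng.random((12, 2))
view2 = view1 + 0.01 * rng.standard_normal((12, 2))   # slightly shifted view
s = cosine_similarity(triangle_features(view1), triangle_features(view2))
print(f"similarity = {s:.3f}")
```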
- Published
- 2020
26. Multi-scale Attention Guided Pose Transfer
- Author
Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, and Umapada Pal
- Subjects
Artificial Intelligence, Signal Processing, Computer Vision and Pattern Recognition, Software, Multimedia
- Abstract
Pose transfer refers to the probabilistic image generation of a person with a previously unseen novel pose from another image of that person having a different pose. Due to potential academic and commercial applications, this problem has been studied extensively in recent years. Among the various approaches to the problem, attention-guided progressive generation has been shown to produce state-of-the-art results in most cases. In this paper, we present an improved network architecture for pose transfer by introducing attention links at every resolution level of the encoder and decoder. By utilizing such a dense multi-scale attention-guided approach, we are able to achieve significant improvement over the existing methods, both visually and analytically. We conclude our findings with extensive qualitative and quantitative comparisons against several existing methods on the DeepFashion dataset.
- Published
- 2022
27. A deep action-oriented video image classification system for text detection and recognition
- Author
Tong Lu, Daniel P. Lopresti, Abhra Chaudhuri, G. Hemantha Kumar, Pinaki Nath Chowdhury, Palaiahnakote Shivakumara, and Umapada Pal
- Subjects
Action image classification, Maximally stable extremal regions, Computer science, Text detection, Face detection, Deep neural networks, Text recognition, Artificial neural network, Pattern recognition, Video image, Artificial intelligence, Hybrid model
- Abstract
For video images with complex actions, achieving accurate text detection and recognition results is very challenging. This paper presents a hybrid model for the classification of action-oriented video images which reduces the complexity of the problem to improve text detection and recognition performance. Here, we consider the following five categories of genres, namely concert, cooking, craft, teleshopping and yoga. For classifying action-oriented video images, we explore ResNet50 for learning general pixel-distribution-level information, a VGG16 network for learning the features of Maximally Stable Extremal Regions, and another VGG16 for learning facial components obtained by a multi-task cascaded convolutional network. The approach integrates the outputs of the three above-mentioned models using a fully connected neural network for classification of the five action-oriented image classes. We demonstrate the efficacy of the proposed method by testing on our dataset and two other standard datasets, namely, the Scene Text dataset, which contains 10 classes of scene images with text information, and the Stanford 40 Actions dataset, which contains 40 action classes without text information. Our method outperforms the related existing work and significantly enhances the class-specific performance of text detection and recognition. Article highlights: (1) The method uses pixel, stable-region and face-component information in a novel way for solving complex classification problems. (2) The proposed work fuses different deep learning models for successful classification of action-oriented images. (3) Experiments on our own dataset as well as standard datasets show that the proposed model outperforms related state-of-the-art (SOTA) methods.
- Published
- 2021
28. A New Method for Detecting Altered Text in Document Images
- Author
Lokesh Nandanwar, Bidyut B. Chaudhuri, Umapada Pal, Daniel P. Lopresti, Tong Lu, Bhagesh Seraogi, and Palaiahnakote Shivakumara
- Subjects
Information retrieval, Graphics software, Artificial Intelligence, Computer science, Computer Vision and Pattern Recognition, Artificial intelligence, Software
- Abstract
As more and more office documents are captured, stored, and shared in digital format, and as image editing software becomes increasingly powerful, there is a growing concern about document authenticity. To prevent illicit activities, this paper presents a new method for detecting altered text in document images. The proposed method explores the relationship between positive and negative DCT coefficients to extract the effect of distortions caused by tampering, fusing the images reconstructed from the respective positive and negative coefficients, which results in Positive-Negative DCT coefficient Fusion (PNDF). To take advantage of spatial information, we propose to fuse the R, G, and B color channels of input images, which results in RGBF (RGB Fusion). Next, the same fusion operation is used for fusing PNDF and RGBF, which results in a fused image for the original input one. We compute a histogram to extract features from the fused image, which results in a feature vector. The feature vector is then fed to a deep neural network for classifying altered text images. The proposed method is tested on our own dataset and the standard datasets from the ICPR 2018 Fraud Contest, Altered Handwriting (AH), and faked IMEI number images. The results show that the proposed method is effective and outperforms the existing methods irrespective of image type.
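The positive/negative coefficient split is simple to reproduce; a minimal numpy/SciPy sketch of the PNDF step (the fusion rule here is a plain average and the histogram size is illustrative, standing in for the paper's weighting scheme; RGBF is omitted):

```python
import numpy as np
from scipy.fft import dctn, idctn

def pndf(gray):
    """Positive-Negative DCT coefficient Fusion sketch: reconstruct the
    image separately from the positive and negative DCT coefficients,
    fuse the two reconstructions, and take a histogram as features."""
    C = dctn(gray.astype(float), norm="ortho")
    pos = idctn(np.where(C > 0, C, 0.0), norm="ortho")
    neg = idctn(np.where(C < 0, C, 0.0), norm="ortho")
    fused = 0.5 * (pos + neg)                 # simple average fusion
    hist, _ = np.histogram(fused, bins=64)
    return fused, hist / hist.sum()

img = np.random.rand(128, 128)                # stand-in for a document patch
fused, feats = pndf(img)
print(fused.shape, feats.shape)                # (128, 128) (64,)
```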
- Published
- 2021
29. A New Hybrid Method for Caption and Scene Text Classification in Action Video Images
- Author
Michael Blumenstein, Umapada Pal, Palaiahnakote Shivakumara, Lokesh Nandanwar, and Tong Lu
- Subjects
Artificial Intelligence, Computer science, Computer vision, Computer Vision and Pattern Recognition, Artificial intelligence, Video image, Software
- Abstract
Achieving a better recognition rate for text in action video images is challenging due to multiple types of text with unpredictable actions in the background. In this paper, we propose a new method for the classification of caption text (which is edited text) and scene text (text that is part of the video) in video images. This work considers five action classes, namely, Yoga, Concert, Teleshopping, Craft, and Recipes, where both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain reconstructed images for caption and scene text. The fusion criterion involves computing the variances of the coefficients of corresponding pixels of the DCT and Fourier images, and these variances are taken as the respective weights. This step results in Reconstructed image-1. Inspired by the special property of Chebyshev-Harmonic-Fourier Moments (CHFM), namely the ability to reconstruct a redundancy-free image, we explore CHFM for obtaining Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) for classification of caption/scene text. Experimental results on five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results before and after classification obtained from different methods show that recognition performance improves significantly after classification.
- Published
- 2021
30. Text proposals with location-awareness-attention network for arbitrarily shaped scene text detection and recognition
- Author
Dajian Zhong, Shujing Lyu, Palaiahnakote Shivakumara, Umapada Pal, and Yue Lu
- Subjects
Artificial Intelligence, General Engineering, Computer Science Applications
- Published
- 2022
31. ARNet: Active-Reference Network for Few-Shot Image Semantic Segmentation
- Author
Guangchen Shi, Umapada Pal, Yirui Wu, Shivakumara Palaiahnakote, and Tong Lu
- Subjects
Computer science, Pattern recognition, Boundary (topology), Feature (computer vision), Segmentation, Artificial intelligence
- Abstract
To make predictions on unseen classes, few-shot segmentation has recently become a research focus. However, most methods build on pixel-level annotations requiring a large quantity of manual work. Moreover, the inherent information on same-category objects used to guide segmentation can have large diversity in feature representation due to differences in size, appearance, layout, and so on. To tackle these problems, we present an active-reference network (ARNet) for few-shot segmentation. The proposed active-reference mechanism not only accurately supports co-occurrent objects in either support or query images, but also relaxes the strict requirement for pixel-level labeling, allowing for weak boundary labeling. To extract a more intrinsic feature representation, a category-modulation module (CMM) is further applied to fuse features extracted from multiple support images, thus discarding useless information and enhancing contributive information. Experiments on the PASCAL-5i dataset show the proposed method achieves an m-IoU score of 56.5% for 1-shot and 59.8% for 5-shot segmentation, being 0.5% and 1.3% higher than the current state-of-the-art method.
- Published
- 2021
32. Improved Ring Radius Transform-Based Reconstruction for Video Character Recognition
- Author
-
Bhaarat Chetty, Tong Lu, Zhiheng Huang, Palaiahnakote Shivakumara, Michael Blumenstein, Umapada Pal, and G. Hemantha Kumar
- Subjects
Ring (mathematics) ,Orientation (computer vision) ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,Radius ,01 natural sciences ,0801 Artificial Intelligence and Image Processing, 1702 Cognitive Sciences ,Character (mathematics) ,Low contrast ,Artificial Intelligence ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial Intelligence & Image Processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,010306 general physics ,Shape reconstruction ,business ,Software ,Character recognition ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Character shape reconstruction in video is challenging due to low contrast, complex backgrounds, and the arbitrary orientation of characters. This work proposes an Improved Ring Radius Transform (IRRT) for reconstructing impaired characters through medial axis prediction. First, the technique proposes a novel idea based on the Tangent Vector (TV) concept that identifies each actual pair of end pixels caused by gaps in impaired character components. Next, a new normal vector concept is proposed to determine the actual direction in which IRRT predicts medial axis pixels for each pair of end pixels. The prediction process repeats iteratively to find all the medial axis pixels for every gap in question. Further, the medial axis pixels, together with their radii, are used to reconstruct the shapes of impaired characters. The proposed technique is tested on benchmark datasets consisting of video, natural scene, object, and multi-lingual data to demonstrate that it reconstructs shapes well, even for heterogeneous data. Comparative studies with different binarization and character recognition methods show that the proposed technique is effective, useful, and outperforms existing methods.
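The core reconstruction idea, recovering a shape from medial axis pixels and their radii, can be illustrated with off-the-shelf tools; the sketch below uses scikit-image and omits the paper's tangent/normal-vector gap prediction:

```python
import numpy as np
from skimage.morphology import medial_axis
from skimage.draw import disk

def reconstruct_from_medial_axis(binary_char):
    """Recover a character's shape from its medial axis pixels and radii,
    the core idea behind radius-transform-based reconstruction."""
    skel, dist = medial_axis(binary_char, return_distance=True)
    out = np.zeros_like(binary_char, dtype=bool)
    for r, c in zip(*np.nonzero(skel)):
        # Each medial axis pixel contributes a disk of its radius.
        rr, cc = disk((r, c), max(dist[r, c], 1), shape=out.shape)
        out[rr, cc] = True
    return out
```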
- Published
- 2021
33. Static Palm Sign Gesture Recognition with Leap Motion and Genetic Algorithm
- Author
-
Rajkumar Saini, Umapada Pal, Sumit Rakesh, Hamam Mokayed, and György Kovács
- Subjects
InformationSystems_INFORMATIONINTERFACESANDPRESENTATION(e.g.,HCI) ,business.industry ,Computer science ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Video camera ,law.invention ,Support vector machine ,Naive Bayes classifier ,ComputingMethodologies_PATTERNRECOGNITION ,law ,Gesture recognition ,Classifier (linguistics) ,Computer vision ,Artificial intelligence ,business ,Sign (mathematics) ,Gesture - Abstract
Sign gesture recognition models sign gestures in order to facilitate communication with hearing- and speech-impaired people. Sign gestures are recorded with devices such as a video camera or a depth camera; palm gestures can also be recorded with the Leap Motion sensor. In this paper, we address palm sign gesture recognition using the Leap Motion sensor. We extract geometric features from Leap Motion recordings. Next, we employ a Genetic Algorithm (GA) for feature selection. The genetically selected features are fed to different classifiers for gesture recognition. Here we use Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) classifiers to compare their results. A gesture recognition accuracy of 74.00% is recorded with the RF classifier on the Leap Motion sign gesture dataset.
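A minimal genetic algorithm for feature selection in this spirit, with cross-validated Random Forest accuracy as the fitness, might look as follows (population size, operators, and rates are our choices, not the paper's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def ga_select(X, y, pop=20, gens=30, p_mut=0.05):
    """Individuals are binary masks over feature columns of X;
    fitness is 3-fold cross-validated accuracy on the masked features."""
    n = X.shape[1]
    population = rng.integers(0, 2, size=(pop, n)).astype(bool)

    def fitness(mask):
        if not mask.any():
            return 0.0
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]          # truncation selection
        cuts = rng.integers(1, n, size=pop // 2)         # one-point crossover
        children = np.array([np.concatenate([a[:c], b[c:]])
                             for a, b, c in zip(parents,
                                                np.roll(parents, 1, 0), cuts)])
        children ^= rng.random(children.shape) < p_mut   # bit-flip mutation
        population = np.vstack([parents, children])
    return max(population, key=fitness)                  # best feature mask
```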
- Published
- 2021
34. Special issue on deep learning for video text analysis
- Author
-
Ujjwal Maulik, Subhadip Basu, and Umapada Pal
- Subjects
Text mining ,Multimedia ,Artificial Intelligence ,business.industry ,Computer science ,Deep learning ,Signal Processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,computer.software_genre ,computer ,Software - Published
- 2020
35. Curved text detection in blurred/non-blurred video/scene images
- Author
-
Palaiahnakote Shivakumara, Minglong Xue, Umapada Pal, Chao Zhang, and Tong Lu
- Subjects
Deblurring ,Pixel ,Computer Networks and Communications ,Computer science ,business.industry ,Low-pass filter ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,k-means clustering ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,020207 software engineering ,02 engineering and technology ,k-nearest neighbors algorithm ,Computer Science::Graphics ,Hardware and Architecture ,Minimum bounding box ,Computer Science::Computer Vision and Pattern Recognition ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Bhattacharyya distance ,Computer vision ,Artificial intelligence ,Cluster analysis ,business ,Software ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Text detection in video/images is challenging due to the presence of multiple types of blur caused by defocus and motion. In this paper, we present a new method for detecting text in blurred/non-blurred images. Unlike existing methods that use deblurring or classifiers, the proposed method estimates the degree of blur in images based on contrast variations among neighboring pixels and a low pass filter, which yields candidate pixels for deblurring. We consider the gradient value of each pixel as a weight for the degree of blur. The proposed method then performs K-means clustering on the weighted values of candidate pixels to obtain text candidates irrespective of blur type. Next, the Bhattacharyya distance is used to extract the symmetry property of text to remove false text candidates, which provides text components. Further, the proposed method fixes a bounding box for each text component based on the nearest neighbor criterion and the direction of the text component. Experimental results on defocus, motion, and non-blurred images as well as standard datasets of curved text show that the proposed method outperforms existing methods.
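A hedged sketch of the candidate step: local blur is approximated as the gap between a pixel and its low-pass-filtered value, weighted by gradient magnitude, and K-means separates the two populations (the filter size and the assumption that the higher-mean cluster holds text are ours):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def text_candidates(gray):
    """Blur-aware text candidate detection on an 8-bit grayscale image."""
    low = cv2.GaussianBlur(gray, (5, 5), 0)
    blur_degree = cv2.absdiff(gray, low).astype(np.float32)  # contrast vs. low-pass
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    weighted = blur_degree * np.sqrt(gx * gx + gy * gy)      # gradient as weight
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(weighted.reshape(-1, 1))
    labels = labels.reshape(gray.shape)
    # Assume the cluster with the higher mean weight holds the text candidates.
    means = [weighted[labels == k].mean() for k in (0, 1)]
    return labels == int(np.argmax(means))
```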
- Published
- 2019
36. Multi-Script-Oriented Text Detection and Recognition in Video/Scene/Born Digital Images
- Author
-
K. S. Raghunandan, Palaiahnakote Shivakumara, G. Hemantha Kumar, Sangheeta Roy, Umapada Pal, and Tong Lu
- Subjects
Orientation (computer vision) ,Computer science ,business.industry ,Feature vector ,Feature extraction ,Pattern recognition ,02 engineering and technology ,Contourlet ,k-nearest neighbors algorithm ,Digital image ,Wavelet ,Most significant bit ,Computer Science::Computer Vision and Pattern Recognition ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,020201 artificial intelligence & image processing ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Bit plane - Abstract
Achieving good text detection and recognition results for multi-script-oriented images is a challenging task. First, we explore bit plane slicing in order to take advantage of the most significant bit information for identifying text components. A new iterative nearest neighbor symmetry is then proposed, based on the shapes of convex and concave deficiencies of text components in bit planes, to identify candidate planes. Further, we introduce a new concept called mutual nearest neighbor pair components, based on gradient direction, to identify representative pairs of texts in each candidate bit plane. The representative pairs are used to restore words with the help of the edge image of the input, which yields the text detection results (words). Second, we propose a new idea for fixing windows for character components of arbitrarily oriented words based on the angular relationship between sub-bands and a fused band. For each window, we extract features in the contourlet wavelet domain to detect characters with the help of an SVM classifier. Further, we explore HMMs for recognizing characters and words of any orientation using the same feature vector. The proposed method is evaluated on standard databases such as ICDAR, YVT video, ICDAR, SVT, MSRA scene data, ICDAR born-digital data, and multi-lingual data to show its superiority to state-of-the-art methods.
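Bit plane slicing itself is a one-liner; the sketch below extracts all eight planes of an 8-bit image, of which the most significant plane carries the coarse structure the method mines for text components:

```python
import numpy as np

def bit_planes(gray):
    """Slice an 8-bit grayscale image into its 8 bit planes;
    plane 7 is the most significant bit (MSB)."""
    return [((gray >> k) & 1).astype(np.uint8) for k in range(8)]

# The MSB plane alone preserves the coarse structure of the image:
# msb = bit_planes(img)[7] * 255
```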
- Published
- 2019
37. Fractional means based method for multi-oriented keyword spotting in video/scene/license plate images
- Author
-
Rabha W. Ibrahim, Sangheeta Roy, Umapada Pal, Palaiahnakote Shivakumara, Tong Lu, Ainuddin Wahid Abdul Wahab, Vijeta Khare, and Hamid A. Jalab
- Subjects
0209 industrial biotechnology ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,General Engineering ,Process (computing) ,Context (language use) ,Pattern recognition ,02 engineering and technology ,Spotting ,Computer Science Applications ,020901 industrial engineering & automation ,Artificial Intelligence ,Keyword spotting ,0202 electrical engineering, electronic engineering, information engineering ,Canny edge detector ,Benchmark (computing) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Cluster analysis - Abstract
Retrieving desired information from databases containing video, natural scene, and license plate images through keyword spotting is a big challenge for expert systems, due to the complexities caused by background and foreground variations of text in real-time environments. To reduce the background complexity of input images, we introduce a new model based on fractional means that considers the neighboring information of pixels to widen the gap between text and background; text candidates are then obtained with the help of k-means clustering. The proposed approach explores the combination of Radon and Fourier coefficients to define context features, based on the regular patterns given by the coefficient distributions for the foreground and background of text candidates. This process eliminates non-text candidates regardless of font type and size, color, orientation, and script, and results in representatives of texts. The proposed approach then exploits the fact that text pixels share almost the same values to restore missing text components using the Canny edge image, through a new idea of minimum-cost-path-based ring growing, and then outputs keywords. Furthermore, the proposed approach extracts the same features locally and globally for spotting words in images. Experimental results on different benchmark databases, namely ICDAR 2013, ICDAR 2015, YVT, NUS video data, ICDAR 2013, ICDAR 2015, SVT, MSRA, UCSC, Medialab, and Uninsubria license plate data, show that the proposed method is effective and useful compared to existing methods.
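A hedged sketch of Radon-plus-Fourier context features for a text-candidate patch; the projection angles, the per-angle pooling, and the number of Fourier coefficients kept are our choices, not the paper's:

```python
import numpy as np
from skimage.transform import radon

def context_features(patch, angles=None, n_coeffs=32):
    """Combine Radon and Fourier coefficients into one feature vector
    for a grayscale text-candidate patch."""
    if angles is None:
        angles = np.linspace(0.0, 180.0, 36, endpoint=False)
    sinogram = radon(patch.astype(float), theta=angles, circle=False)
    radon_feat = sinogram.mean(axis=0)                    # one value per angle
    fourier = np.abs(np.fft.fft2(patch))
    fourier_feat = np.sort(fourier.ravel())[-n_coeffs:]   # dominant coefficients
    return np.concatenate([radon_feat, fourier_feat])
```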
- Published
- 2019
38. Sub-Stroke-Wise Relative Feature for Online Indic Handwriting Recognition
- Author
-
Umapada Pal, Nilanjana Bhattacharya, and Partha Pratim Roy
- Subjects
Similarity (geometry) ,General Computer Science ,business.industry ,Character (computing) ,Computer science ,02 engineering and technology ,Distinctive feature ,computer.software_genre ,language.human_language ,Bengali ,Handwriting recognition ,Devanagari ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,language ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Cursive ,computer ,Natural language processing - Abstract
The main problem in Bangla (Bengali) and Devanagari handwriting recognition is the shape similarity of characters. There are only a few pieces of work on writer-independent cursive online Indic text recognition, and the shape similarity problem needs more attention from researchers. To handle the shape similarity problem of cursive characters in the Bangla and Devanagari scripts, in this article we propose a new category of features called 'sub-stroke-wise relative features' (SRF), which are based on relative information about the constituent parts of handwritten strokes. Relative information among parts within a character can be a distinctive feature, as it scales up small dissimilarities and enhances discrimination among similar-looking shapes. Contextual anticipatory phenomena are also automatically modeled by this type of feature, as it takes into account the influence of previous and forthcoming strokes. We have tested popular state-of-the-art feature sets as well as the proposed SRF using various lexicons (up to 20,000 words) and observed that SRF significantly outperforms the state-of-the-art feature sets for online Bangla and Devanagari cursive word recognition.
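To illustrate the flavor of relative features between sub-strokes (the exact SRF definition is in the paper), one can split a stroke into parts and take pairwise relative directions and length ratios, as in this sketch (the number of parts and the chord-based encoding are our assumptions):

```python
import numpy as np

def sub_stroke_relative_features(points, n_parts=4):
    """Pairwise relative features between the sub-strokes of one online
    stroke, given as an (N, 2) sequence of x, y pen coordinates."""
    parts = np.array_split(np.asarray(points, float), n_parts)
    vecs = [p[-1] - p[0] for p in parts]        # chord vector of each sub-stroke
    feats = []
    for i in range(n_parts):
        for j in range(i + 1, n_parts):
            a, b = vecs[i], vecs[j]
            angle = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
            ratio = (np.linalg.norm(b) + 1e-6) / (np.linalg.norm(a) + 1e-6)
            feats.extend([np.sin(angle), np.cos(angle), ratio])
    return np.array(feats)
```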
- Published
- 2018
39. A New Convolutional Neural Network based on a Sparse Convolutional Layer for Animal Face Detection
- Author
-
Wael Ouarda, Adel M. Alimi, Fatma BenSaid, Islem Jarraya, and Umapada Pal
- Subjects
Computer science ,business.industry ,Classifier (linguistics) ,Detector ,Pattern recognition ,Feature selection ,Artificial intelligence ,Layer (object-oriented design) ,Face detection ,business ,Convolutional neural network - Abstract
This paper focuses on the face detection problem for three popular animal categories that need control, such as horses, cats, and dogs. To be precise, a new Convolutional Neural Network for Animal Face Detection (CNNAFD) is investigated, using processed filters based on gradient features applied in a new way. A new convolutional layer is proposed through a sparse feature selection method known as Automated Negotiation-based Online Feature Selection (ANOFS). CNNAFD ends with stacked fully connected layers, which represent a strong classifier. The fusion of CNNAFD and MobileNetV2 constructs the new network CNNAFD-MobileNetV2, which improves the classification results and gives better detection decisions. Our work also introduces a new Tunisian Horse Detection Database (THDD). The proposed detector with the new CNNAFD-MobileNetV2 network achieved an average precision of 99.78%, 99%, and 98.28% for cats, dogs, and horses, respectively.
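A sketch of the two-branch fusion idea, a custom convolutional branch concatenated with a MobileNetV2 backbone, in Keras; the custom branch here is a plain convolution placeholder, not the proposed sparse ANOFS layer, and all layer sizes are illustrative:

```python
import tensorflow as tf

def build_fused_classifier(input_shape=(224, 224, 3), n_classes=2):
    """Fuse a custom convolutional branch with a MobileNetV2 backbone,
    mirroring the CNNAFD-MobileNetV2 fusion at a high level."""
    inp = tf.keras.layers.Input(shape=input_shape)
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights='imagenet', input_tensor=inp)
    custom = tf.keras.layers.Conv2D(32, 3, activation='relu')(inp)
    custom = tf.keras.layers.GlobalAveragePooling2D()(custom)
    deep = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Concatenate()([deep, custom])
    x = tf.keras.layers.Dense(128, activation='relu')(x)
    out = tf.keras.layers.Dense(n_classes, activation='softmax')(x)
    return tf.keras.Model(inp, out)
```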
- Published
- 2021
40. Recognizing Bengali Word Images - A Zero-Shot Learning Perspective
- Author
-
Prashant Kumar Prasad, Lambert Schomaker, Sukalpa Chanda, Jochem Baas, Umapada Pal, and Daniel Haitink
- Subjects
Computer science ,business.industry ,Character recognition ,Perspective (graphical) ,Shape ,Image recognition ,02 engineering and technology ,computer.software_genre ,Zero shot learning ,Class (biology) ,language.human_language ,Signature (logic) ,Bengali ,Word recognition ,0202 electrical engineering, electronic engineering, information engineering ,language ,Couplings ,Training ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
Zero-Shot Learning (ZSL) techniques can classify a completely unseen class, one never seen during training. This makes them apt for real-life classification problems, where it is not possible to train a system with annotated data for all possible class types. This work investigates recognition of word images written in the Bengali script in a ZSL framework. The proposed approach performs zero-shot word recognition by coupling deep-learned features procured from various CNN architectures with 13 basic shapes/stroke primitives commonly observed in Bengali script characters. Following the ZSL framework, those 13 basic shapes are termed 'signature/semantic attributes'. The obtained results are promising; evaluation was carried out in a five-fold cross-validation setup dealing with samples from 250 word classes.
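The attribute-matching step of such a ZSL pipeline reduces to nearest-signature search; a minimal sketch, assuming the 13 attribute scores are already predicted by a CNN trained elsewhere (the cosine similarity and the made-up data in the usage note are our choices):

```python
import numpy as np

def zero_shot_predict(attr_scores, class_signatures, class_names):
    """Match predicted attribute scores (length-13 vector) against one
    binary signature per word class; return the nearest class name."""
    S = class_signatures / (np.linalg.norm(class_signatures, axis=1,
                                           keepdims=True) + 1e-8)
    a = attr_scores / (np.linalg.norm(attr_scores) + 1e-8)
    return class_names[int(np.argmax(S @ a))]

# Usage with made-up data: 250 word classes x 13 shape attributes.
# sigs = np.random.randint(0, 2, (250, 13)).astype(float)
# names = [f"word_{i}" for i in range(250)]
# zero_shot_predict(np.random.rand(13), sigs, names)
```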
- Published
- 2021
41. DCT-phase statistics for forged IMEI numbers and air ticket detection
- Author
-
Palaiahnakote Shivakumara, Tong Lu, Devanur S. Guru, Lokesh Nandanwar, Michael Blumenstein, Umapada Pal, V. Basavaraja, and Swati Kanchan
- Subjects
0209 industrial biotechnology ,Computer science ,Feature vector ,General Engineering ,02 engineering and technology ,Computer Science Applications ,01 Mathematical Sciences, 08 Information and Computing Sciences, 09 Engineering ,020901 industrial engineering & automation ,Artificial Intelligence ,Ticket ,Statistics ,0202 electrical engineering, electronic engineering, information engineering ,Discrete cosine transform ,020201 artificial intelligence & image processing ,Artificial Intelligence & Image Processing ,Classifier (UML) - Abstract
New tools are being developed with the intention of providing more flexibility and greater user-friendliness for editing images and documents with digital technologies, but, unfortunately, they are also being used for manipulating and tampering with information. Examples of such crimes include creating forged International Mobile Equipment Identity (IMEI) numbers, which are embedded on mobile packages and inside smart mobile cases, for illicit activities. Another example is altering the name or date on air tickets to breach security at the airport. This paper presents a new expert system for detecting forged IMEI numbers as well as altered air ticket images. The proposed method derives the phase spectrum using the Discrete Cosine Transform (DCT) to highlight suspicious regions; unlike the phase spectrum from a Fourier transform, it is not rendered ineffective by power spectrum noise. From the phase spectrum, our method extracts phase statistics to study the effect of distortions introduced by forgery operations. This results in feature vectors, which are fed to a Support Vector Machine (SVM) classifier for detecting forged IMEI numbers and air ticket images. Experimental results on our dataset of forged IMEI numbers (created by us for this work), on altered air tickets, on benchmark datasets of video caption text (which is tampered text), and on altered receipts from the ICPR 2018 FDC dataset show that the proposed method is robust across different datasets. Furthermore, comparative studies on the same datasets show that the proposed method outperforms existing methods. The dataset created will be made freely available on request to the authors.
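The paper's exact phase derivation from the DCT is not reproduced here; as a loosely analogous stand-in, the sketch below pools blockwise DCT coefficients into sign and moment statistics and feeds them to an SVM (block size and the five statistics are our assumptions):

```python
import numpy as np
from scipy.fft import dctn
from scipy.stats import skew, kurtosis
from sklearn.svm import SVC

def dct_stats(gray, block=8):
    """Blockwise 2D DCT of a grayscale image, summarized by the sign
    pattern of coefficients (a crude 'phase' proxy) plus four moments."""
    h = (gray.shape[0] // block) * block
    w = (gray.shape[1] // block) * block
    coeffs = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            c = dctn(gray[i:i + block, j:j + block].astype(float), norm='ortho')
            coeffs.append(c.ravel())
    c = np.concatenate(coeffs)
    return np.array([np.mean(c > 0), c.mean(), c.std(), skew(c), kurtosis(c)])

# clf = SVC(kernel='rbf').fit(np.stack([dct_stats(im) for im in train_imgs]), labels)
```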
- Published
- 2021
42. Inception-based Deep Learning Architecture for Tuberculosis Screening using Chest X-rays
- Author
-
Umapada Pal, Dipayan Das, and K. C. Santosh
- Subjects
Computer science ,business.industry ,Deep learning ,0206 medical engineering ,Feature extraction ,CAD ,02 engineering and technology ,Machine learning ,computer.software_genre ,020601 biomedical engineering ,Convolutional neural network ,Facial recognition system ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,Computer-aided diagnosis ,Benchmark (computing) ,Artificial intelligence ,business ,computer ,Mass screening - Abstract
The motivation for this work is the pressing need to screen Tuberculosis (TB) positive patients in the severely resource-constrained regions of the world. Chest X-ray (CXR) is considered a promising indicator for the onset of TB, but the lack of skilled radiologists in such regions worsens the situation. Therefore, several computer-aided diagnosis (CAD) systems have been proposed to solve the decision-making problem, ranging from hand-engineered feature extraction methods to deep learning or Convolutional Neural Network (CNN) based methods. Feature extraction, being a time- and resource-intensive process, often delays mass screening. Hence, an end-to-end CNN architecture is proposed in this work to solve the problem. Two benchmark CXR datasets have been used, collected from Shenzhen (China) and Montgomery County (USA), on which the proposed methodology achieved maximum abnormality detection accuracies (ACC) of 91.7% (0.96 AUC) and 87.47% (0.92 AUC), respectively. On these datasets, to the best of our knowledge, the obtained results are superior to those of state-of-the-art deep learning based works.
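A minimal Inception-based screener in the spirit of the paper, built by transfer learning in Keras; the pooling head, dropout rate, and optimizer are illustrative, not the authors' settings:

```python
import tensorflow as tf

def build_tb_screener(input_shape=(299, 299, 3)):
    """InceptionV3 backbone with a small binary head (normal vs. TB)."""
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights='imagenet', input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dropout(0.5)(x)
    out = tf.keras.layers.Dense(1, activation='sigmoid')(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy', tf.keras.metrics.AUC(name='auc')])
    return model
```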
- Published
- 2021
43. Local Gradient Difference Features for Classification of 2D-3D Natural Scene Text Images
- Author
-
Nor Badrul Anuar, Lokesh Nandanwar, Palaiahnakote Shivakumara, Tong Lu, Daniel P. Lopresti, Umapada Pal, and Ramachandra Raghavendra
- Subjects
Artificial neural network ,Pixel ,Computer science ,business.industry ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,Text detection ,Measure (mathematics) ,Image (mathematics) ,0202 electrical engineering, electronic engineering, information engineering ,Natural (music) ,020201 artificial intelligence & image processing ,Artificial intelligence ,Graphical model ,business - Abstract
Methods developed for normal 2D text detection do not work well for text that is rendered using decorative or 3D effects. This paper proposes a new method for the classification of 2D and 3D natural scene text images, so that an appropriate recognition method can be chosen based on the classification result for better performance. The proposed method explores local gradient differences to obtain candidate pixels, which represent a stroke. To study the spatial distribution of candidate pixels, we propose a measure, called COLD, which is denser for pixels toward the center of strokes and scattered for non-stroke pixels. This observation leads us to introduce mass features for extracting the regular spatial pattern of COLD that indicates a 2D text image. The extracted features are fed into a Neural Network (NN) for classification. The proposed method is tested on (i) a new dataset introduced in this work, (ii) a dataset assembled from standard natural scene datasets, and (iii) a non-text image dataset, which contains objects rather than text. Experimental results on images with and without text show that the proposed method is independent of the presence of text. The proposed approach improves text detection and recognition performance significantly after classification.
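One plausible reading of the candidate-pixel step, thresholding the local spread of gradient magnitude, is sketched below; the window size and threshold are our assumptions, not the paper's definition:

```python
import cv2
import numpy as np

def stroke_candidate_pixels(gray, k=1.0):
    """Mark pixels whose neighborhood gradient spread (local max minus
    local min of gradient magnitude) is unusually large."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    kernel = np.ones((3, 3), np.uint8)
    local_max = cv2.dilate(mag, kernel)
    local_min = cv2.erode(mag, kernel)
    diff = local_max - local_min
    return diff > diff.mean() + k * diff.std()
```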
- Published
- 2021
44. Chebyshev-Harmonic-Fourier-Moments and Deep CNNs for Detecting Forged Handwriting
- Author
-
Tong Lu, Sayani Kundu, Umapada Pal, Lokesh Nandanwar, Palaiahnakote Shivakumara, and Daniel P. Lopresti
- Subjects
Contextual image classification ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Image processing ,Pattern recognition ,02 engineering and technology ,01 natural sciences ,Convolutional neural network ,ComputingMethodologies_PATTERNRECOGNITION ,Discriminative model ,Handwriting recognition ,Distortion ,0103 physical sciences ,Pattern recognition (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,010306 general physics ,business ,Image restoration - Abstract
Recently developed sophisticated image processing techniques and tools have made it easier to create high-quality forgeries of handwritten documents, including financial and property records. To detect such forgeries, this paper presents a new method that explores the combination of Chebyshev-Harmonic-Fourier-Moments (CHFM) and deep Convolutional Neural Networks (D-CNNs). Unlike existing methods, which rely on abrupt changes due to distortion created by forgery operations, the proposed method relies on the inconsistencies and irregular changes created by such operations. Inspired by the special properties of CHFM, such as its ability to reconstruct an image free of redundant information, the proposed method explores CHFM to obtain reconstructed images for the color components of the Original, Forged, Noisy, and Blurred classes. Motivated by the strong discriminative power of deep CNNs, the proposed method applies them to the reconstructed images of the respective color components for forged handwriting detection. Experimental results on our dataset and benchmark datasets (namely, the ACPR 2019, ICPR 2018 FCD, and IMEI datasets) show that the proposed method outperforms existing methods in terms of classification rate.
- Published
- 2021
45. Air Writing: Recognizing Multi-Digit Numeral String Traced in Air Using RNN-LSTM Architecture
- Author
-
Adil Rahman, Prasun Roy, and Umapada Pal
- Subjects
Source code ,Computer science ,business.industry ,media_common.quotation_subject ,English numerals ,String (computer science) ,Pattern recognition ,Numerical digit ,Numeral system ,Noise ,Sliding window protocol ,Artificial intelligence ,business ,MNIST database ,media_common - Abstract
Air writing provides a natural and immersive way of interacting with devices, with the potential for significant applications in fields like augmented reality and education. However, such systems often rely on expensive hardware, making them less accessible for general purposes. In this study, we propose a robust and inexpensive system for recognizing multi-digit numerals traced in an air-writing environment, using only a generic device camera for input. We employ a sliding-window-based algorithm to isolate a small segment of the input for processing. A dual network configuration consisting of RNN-LSTM networks is used for noise elimination and digit recognition. We conduct our experiments on English numerals using the MNIST dataset as the baseline to allow easy adaptability of our method. Our results are further improved by the use of the Pendigits and ISI-Air online datasets. We observed a drop in accuracy with an increase in the number of digits, owing to the accumulation of transition noise; however, bi-directional scanning considerably reduces the impact of such noise on recognition accuracy. Under standard conditions, our system produced accuracies of 98.75% and 85.27% for single and multi-digit English numerals, respectively. Incorporating selective frame skipping in the sliding window algorithm resulted in a 60% reduction in computational time, significantly improving system performance. We provide a link to the source code of our system at the end of this paper.
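A compact sketch of the pipeline's two ingredients, sliding windows over the traced trajectory and an LSTM classifier; the dual-network design is folded into a single classifier with an extra 'transition noise' class for brevity, and the window length, stride, and layer sizes are our assumptions:

```python
import numpy as np
import tensorflow as tf

def sliding_windows(trajectory, window=64, stride=8):
    """Slice a traced (N, 2) trajectory into overlapping segments so each
    can be classified as a digit or as inter-digit transition noise."""
    t = np.asarray(trajectory, float)
    return np.stack([t[i:i + window]
                     for i in range(0, len(t) - window + 1, stride)])

def build_digit_lstm(window=64, n_classes=11):
    """LSTM over (x, y) windows: 10 digit classes plus one noise class."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, 2)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```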
- Published
- 2021
46. Text and Non-text Frame Classification in Video
- Author
-
Palaiahnakote Shivakumara and Umapada Pal
- Subjects
business.industry ,Computer science ,Frame (networking) ,Computer vision ,Artificial intelligence ,business - Published
- 2021
47. ICDAR 2021 Competition on Script Identification in the Wild
- Author
-
Abdeljalil Gattal, Nguyen Quoc Cuong, Moises Diaz, Miguel Ferrer, Tadahito Yao, Abhijit Das, Hongliang Li, Seungjae Kim, Aythami Morales, Le Quang Hung, Umapada Pal, Wentao Yang, Kensho Ota, and Donato Impedovo
- Subjects
Research groups ,business.industry ,Computer science ,Deep learning ,Image processing ,Document analysis ,computer.software_genre ,Test (assessment) ,Competition (economics) ,Identification (information) ,Scripting language ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
This paper presents a summary of the 1st Competition on Script Identification in the Wild (SIW 2021), organised in conjunction with the 16th International Conference on Document Analysis and Recognition (ICDAR 2021). The goal of SIW is to evaluate the limits of script identification approaches through a large-scale in-the-wild database including 13 scripts (the MDIW-13 dataset) and two different scenarios (handwritten and printed). The competition includes evaluation over three different tasks, depending on the nature of the data used for training and testing. Nineteen research groups registered for SIW 2021, out of which 6 teams from both academia and industry took part in the final round and submitted a total of 166 algorithms for scoring. Submissions included a wide variety of deep-learning solutions as well as approaches based on standard image processing techniques. The performance achieved by the participants demonstrates the superior accuracy of deep learning methods in comparison with traditional statistical approaches. The best approach obtained classification accuracies of 99% in all three tasks, with experiments over more than 50K test samples. The results suggest that there is still room for improvement, especially over handwritten samples and specific scripts.
- Published
- 2021
48. DCINN: Deformable Convolution and Inception Based Neural Network for Tattoo Text Detection Through Skin Region
- Author
-
Palaiahnakote Shivakumara, Sukalpa Chanda, Umapada Pal, Ramachandra Raghavendra, Tamal Chowdhury, and Tong Lu
- Subjects
Identification (information) ,Artificial neural network ,Feature (computer vision) ,Bounding overwatch ,Computer science ,business.industry ,Orientation (computer vision) ,SKIN REGIONS ,Pattern recognition ,Text detection ,Artificial intelligence ,business ,Convolution - Abstract
Identifying tattoos is an integral part of forensic investigation and crime identification. Tattoo text detection is challenging because tattoos involve freestyle handwriting over the skin region with a variety of decorations. This paper introduces a Deformable Convolution and Inception based Neural Network (DCINN) for detecting tattoo text. Before tattoo text detection, the proposed approach detects skin regions in the tattoo images based on color models. This results in skin regions containing tattoo text, which reduces the background complexity of the tattoo text detection problem. For detecting tattoo text in the skin regions, we explore a DCINN, which generates binary maps from the final feature maps using a differentiable binarization technique. Finally, polygonal bounding boxes are generated from the binary map for text of any orientation. Experiments on our Tattoo-Text dataset and two standard datasets of natural scene text images, namely Total-Text and CTW1500, show that the proposed method is effective in detecting tattoo text as well as natural scene text. Furthermore, the proposed method outperforms existing text detection methods on several criteria.
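The skin-region step can be approximated with a classic color-model rule; the sketch below thresholds the Cr/Cb channels in YCrCb space using widely used default ranges (the paper's exact color models may differ):

```python
import cv2

def skin_mask(bgr):
    """Classic color-model skin detection: keep pixels whose Cr is in
    [133, 173] and Cb in [77, 127], then open the mask to remove speckle."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```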
- Published
- 2021
49. Automatic Signature-Based Writer Identification in Mixed-Script Scenarios
- Author
-
Himadri Mukherjee, Umapada Pal, Kaushik Roy, Sk Md Obaidullah, and Mridul Ghosh
- Subjects
Biometrics ,business.industry ,Computer science ,Deep learning ,computer.software_genre ,Signature (logic) ,Identification (information) ,Scripting language ,Devanagari ,Pattern recognition (psychology) ,Feature (machine learning) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Automated approaches to human identification based on biometric traits have been a popular research topic among scientists for the last few decades. Among the several biometric modalities, the handwritten signature is one of the most common and prevalent. In the past, researchers have proposed various handcrafted feature-based techniques for automatic writer identification from offline signatures. Recently, deep learning-based solutions to several real-life pattern recognition problems have attracted huge interest and revealed promising results. In this paper, we propose a lightweight CNN architecture to identify writers from offline signatures written in two popular scripts, namely Devanagari and Roman. Experiments were conducted using two different frameworks: (i) first, signature script separation is carried out, followed by script-wise writer identification; (ii) second, signatures of the two scripts are mixed together in various ratios and writer identification is performed in a script-independent manner. The outcomes of both frameworks are analyzed for comparison. Furthermore, comparative analysis with recognized CNN architectures as well as handcrafted feature-based approaches shows that the proposed method achieves a better outcome. The dataset used in this paper can be freely downloaded from the link: https://ieee-dataport.org/open-access/multi-script-handwritten-signature-roman-devanagari for research purposes.
- Published
- 2021
50. Word and Character Segmentation
- Author
-
Palaiahnakote Shivakumara and Umapada Pal
- Subjects
Space (punctuation) ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Text detection ,computer.software_genre ,Video image ,Character (mathematics) ,Natural (music) ,Segmentation ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
In the previous chapter, methods for text detection in natural scene and video images were discussed. To recognize text, recognition methods require segmented characters. Therefore, this chapter focuses on word and character segmentation based on the space between words and characters.
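The space-based rule described here maps directly onto a vertical projection profile; a minimal sketch, with the word/character gap threshold as an illustrative parameter:

```python
import numpy as np

def segment_by_gaps(binary_line, word_gap=8):
    """Split a binarized text line at empty columns: gaps of at least
    word_gap pixels separate words, narrower gaps separate characters."""
    profile = (binary_line > 0).sum(axis=0)       # vertical projection
    segments, start = [], None
    for x, v in enumerate(profile):
        if v and start is None:
            start = x
        elif not v and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, binary_line.shape[1]))
    # Group character spans into words by gap width.
    words, current = [], [segments[0]] if segments else []
    for prev, cur in zip(segments, segments[1:]):
        if cur[0] - prev[1] >= word_gap:
            words.append(current)
            current = [cur]
        else:
            current.append(cur)
    if current:
        words.append(current)
    return words   # list of words, each a list of character (start, end) spans
```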
- Published
- 2021