1,405 results
Search Results
2. Information Extraction from Images of Paper-Based Maps.
- Author
-
Kasturi, Rangachar and Alemany, Juan
- Subjects
GEOGRAPHIC information systems ,MAPS ,DATABASES ,INFORMATION storage & retrieval systems ,IMAGE processing ,ARTIFICIAL intelligence ,PATTERN recognition systems ,QUERY (Information retrieval system) - Abstract
The goal of the research described in this paper is the design of a system to automatically extract information from paper-based maps and answer queries related to spatial features and structure of geographic data. The foundation to such a system is a set of image analysis algorithms to extract spatial features from images of paper-based maps. Efficient algorithms to detect symbols, identify and track various types of lines, follow closed contours, compute distances, find shortest paths, etc., from simplified map images have been developed. A query processor analyzes the queries presented by the user in a predefined syntax, controls the operation of the image processing algorithms, and interacts with the user. The query processor is written in Lisp and calls image analysis routines written in Fortran. [ABSTRACT FROM AUTHOR]
- Published
- 1988
- Full Text
- View/download PDF
3. Generating paper texture of historical documents using statistical moments
- Author
-
Rafael Dueire Lins and C.A.B. Mello
- Subjects
business.industry ,Computer science ,Segmentation-based object categorization ,Binary image ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-space segmentation ,Pattern recognition ,Image processing ,Image segmentation ,Automatic image annotation ,Image texture ,Region growing ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Segmentation ,Artificial intelligence ,business ,Feature detection (computer vision) - Abstract
This paper presents a scheme for generating the paper texture of historical documents. A new entropy-based segmentation algorithm decomposes the document image into the paper background and the printing of the document. Statistical analysis then fills in the gaps left by the printing, yielding a blank sheet of paper with a texture similar to that of the original document.
- Published
- 2002
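The entropy-based segmentation in entry 3 is not spelled out in the abstract; as a hedged illustration of the general idea, here is a sketch of maximum-entropy (Kapur) thresholding, a standard entropy criterion for separating ink from background. The function name and the choice of criterion are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def kapur_threshold(gray):
    """Pick the threshold maximizing the summed entropies of the
    below-threshold and above-threshold histograms (Kapur's criterion).
    `gray` must be an integer image with values in [0, 255]."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    cum = np.cumsum(p)
    best_t, best_h = 0, -np.inf
    for t in range(1, 255):
        w0, w1 = cum[t], 1.0 - cum[t]
        if w0 <= 0 or w1 <= 0:
            continue
        p0 = p[: t + 1] / w0          # normalized "ink" histogram
        p1 = p[t + 1 :] / w1          # normalized "paper" histogram
        h0 = -np.sum(p0[p0 > 0] * np.log(p0[p0 > 0]))
        h1 = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
        if h0 + h1 > best_h:
            best_h, best_t = h0 + h1, t
    return best_t
```

On a bimodal document image the returned threshold lands between the ink and paper modes, so `gray > t` isolates the paper background for the texture statistics.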
4. Pushing the Limits of Deep CNNs for Pedestrian Detection.
- Author
-
Hu, Qichang, Wang, Peng, Shen, Chunhua, van den Hengel, Anton, and Porikli, Fatih
- Subjects
ARTIFICIAL neural networks ,ARTIFICIAL intelligence ,ALGORITHMS ,IMAGE processing ,BIG data - Abstract
Compared with other applications in computer vision, convolutional neural networks (CNNs) have underperformed on pedestrian detection. A breakthrough was made very recently using sophisticated deep CNN (DCNN) models, with a number of handcrafted features or explicit occlusion handling mechanism. In this paper, we show that by reusing the convolutional feature maps of a DCNN model as image features to train an ensemble of boosted decision models, we are able to achieve the best reported accuracy without using specially designed learning algorithms. We empirically identify and disclose important implementation details. We also show that pixel labeling may be simply combined with a detector to boost the detection performance. By adding complementary handcrafted features such as optical flow, the DCNN-based detector can be further improved. We advance the state-of-the-art results by lowering the log-average miss rate from 11.7% to 8.9% on the Caltech data set and from 11.2% to 8.6% on the Inria data set. We also achieve a comparable result to state-of-the-art approaches on the KITTI data set. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
5. Spatial and Temporal Downsampling in Event-Based Visual Classification.
- Author
-
Cohen, Gregory, Afshar, Saeed, Orchard, Garrick, Tapson, Jonathan, Benosman, Ryad, and van Schaik, Andre
- Subjects
SPATIO-temporal variation ,ARTIFICIAL intelligence ,IMAGE processing - Abstract
As the interest in event-based vision sensors for mobile and aerial applications grows, there is an increasing need for high-speed and highly robust algorithms for performing visual tasks using event-based data. As event rate and network structure have a direct impact on the power consumed by such systems, it is important to explore the efficiency of the event-based encoding used by these sensors. The work presented in this paper represents the first study solely focused on the effects of both spatial and temporal downsampling on event-based vision data and makes use of a variety of data sets chosen to fully explore and characterize the nature of downsampling operations. The results show that both spatial downsampling and temporal downsampling produce improved classification accuracy and, additionally, a lower overall data rate. This finding is particularly relevant for bandwidth- and power-constrained systems. For a given network containing 1000 hidden layer neurons, the spatially downsampled systems achieved a best case accuracy of 89.38% on N-MNIST as opposed to 81.03% with no downsampling at the same hidden layer size. On the N-Caltech101 data set, the downsampled system achieved a best case accuracy of 18.25%, compared with 7.43% achieved with no downsampling. The results show that downsampling is an important preprocessing technique in event-based visual processing, especially for applications sensitive to power consumption and transmission bandwidth. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
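Entry 5 does not specify how the downsampling is implemented; a minimal sketch under common assumptions is integer division of pixel coordinates for the spatial case and binning of timestamps for the temporal case, collapsing events made duplicate by the coarser resolution. The function name, the event layout `[x, y, t, polarity]`, and the deduplication rule are illustrative, not the paper's exact scheme.

```python
import numpy as np

def downsample_events(events, spatial=2, temporal_us=1000):
    """Downsample an (N, 4) integer array of events [x, y, t, polarity]:
    divide spatial coordinates by `spatial`, bin timestamps into
    `temporal_us`-wide bins, then keep one event per resulting
    (x, y, t-bin, polarity) cell, preserving arrival order."""
    ev = events.copy()
    ev[:, 0] //= spatial
    ev[:, 1] //= spatial
    ev[:, 2] //= temporal_us
    _, idx = np.unique(ev, axis=0, return_index=True)
    return ev[np.sort(idx)]
```

Both operations reduce the event rate directly, which is the quantity the abstract ties to power consumption.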
6. Cover.
- Subjects
IMAGE recognition (Computer vision) ,ARTIFICIAL intelligence ,IMAGE processing ,MACHINE learning ,IMAGE retrieval ,SOFTWARE architecture - Abstract
These instructions give guidelines for preparing papers for this publication and present information for authors publishing in this journal. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
7. Lightweight Modules for Efficient Deep Learning Based Image Restoration.
- Author
-
Lahiri, Avisek, Bairagya, Sourav, Bera, Sutanu, Haldar, Siddhant, and Biswas, Prabir Kumar
- Subjects
IMAGE reconstruction ,IMAGE denoising ,IMAGE processing ,GENERATIVE adversarial networks ,DEEP learning ,ARTIFICIAL intelligence ,BRIDGE design & construction - Abstract
Low-level image restoration is an integral component of modern artificial intelligence (AI) driven camera pipelines. Most of these frameworks are based on deep neural networks, which present a massive computational overhead on resource-constrained platforms such as mobile phones. In this paper, we propose several lightweight low-level modules which can be used to create a computationally low-cost variant of a given baseline model. Recent work on efficient neural network design has mainly focused on classification. However, low-level image processing falls under the ‘image-to-image’ translation genre, which requires additional computational modules not present in classification. This paper seeks to bridge this gap by designing generic efficient modules which can replace essential components used in contemporary deep learning based image restoration networks. We also present and analyse our results highlighting the drawbacks of applying depthwise separable convolutional kernels (a popular method for efficient classification networks) to sub-pixel convolution based upsampling (a popular upsampling strategy for low-level vision applications). This shows that concepts from the classification domain cannot always be seamlessly integrated into ‘image-to-image’ translation tasks. We extensively validate our findings on three popular tasks: image inpainting, denoising and super-resolution. Our results show that the proposed networks consistently output visually similar reconstructions compared to full-capacity baselines, with a significant reduction in parameters, memory footprint and execution time on contemporary mobile devices. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
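Entry 7 contrasts depthwise separable convolution with sub-pixel convolution based upsampling. As a hedged illustration of the latter, here is the channel-to-space rearrangement (pixel shuffle) at the heart of sub-pixel convolution, in plain NumPy; the paper's proposed lightweight modules are not reproduced, and the function name is illustrative.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) feature maps into (C, H*r, W*r).
    This is the channel-to-space step of sub-pixel convolution:
    each group of r*r channels becomes an r-by-r block of pixels."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j) offsets
    x = x.transpose(0, 3, 1, 4, 2)    # interleave: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)
```

A convolution producing `C*r*r` output channels followed by this rearrangement upscales by a factor of `r`; the abstract's observation is that making that convolution depthwise separable interacts badly with this step.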
8. Instance Selection Using Nonlinear Sparse Modeling.
- Author
-
Dornaika, Fadi and Aldine, Ihab Kamal
- Subjects
DATA mining ,ALGORITHMS ,BIG data ,IMAGE processing ,ARTIFICIAL intelligence - Abstract
Sparse modeling representative selection (SMRS) has been recently introduced for selecting the most relevant examples in data sets. SMRS exploits data self-representativeness coding in order to infer a coding matrix with a block sparsity constraint. The relevance scores of samples are then derived from the estimated matrix of coefficients. Since SMRS is based on a linear model for data self-representation, it cannot always provide good relevant samples. Besides, most of its selected samples can be found in dense areas of the input space. In this paper, we propose to overcome the SMRS method’s shortcomings related to the coding matrix estimation. We introduce two nonlinear data self-representativeness coding schemes that are based on Hilbert space and column generation. Experimental evaluation is carried out on summarizing a video movie and on summarizing training image data sets used for classification tasks. These experiments demonstrate that the proposed nonlinear methods can outperform state-of-the-art selection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
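The self-representativeness coding in entry 8 can be sketched in its simplest linear form: express every sample as a combination of the others and score samples by how much they participate in reconstructing the rest. The sketch below substitutes a ridge penalty for SMRS's block-sparsity constraint (and omits the paper's nonlinear kernels entirely), so it shows only the scoring principle.

```python
import numpy as np

def representative_scores(X, lam=0.1):
    """Self-representation relevance scores. X is (features, samples).
    Solve C = argmin ||X - X C||_F^2 + lam ||C||_F^2 (ridge in place of
    SMRS's row-sparsity penalty); the closed form is
    C = (X^T X + lam I)^{-1} X^T X. Score each sample by the L2 norm of
    its row of C: samples used heavily to reconstruct others score high."""
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(n), G)
    return np.linalg.norm(C, axis=1)
```

Near-duplicate samples share reconstruction weight and so score lower individually than a sample that is the sole representative of its direction, which is the intuition behind using the coefficient matrix for selection.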
9. Semi-Continuity of Skeletons in Two-Manifold and Discrete Voronoi Approximation.
- Author
-
Liu, Yong-Jin
- Subjects
VORONOI polygons ,COMPUTER vision ,IMAGE processing ,ARTIFICIAL intelligence ,PATTERN matching - Abstract
The skeleton of a 2D shape is an important geometric structure in pattern analysis and computer vision. In this paper we study the skeleton of a 2D shape in a two-manifold $\mathcal{M}$, based on a geodesic metric. We present a formal definition of the skeleton $S(\Omega)$ of a shape $\Omega$ in $\mathcal{M}$, distinct from its Euclidean counterpart in $\mathbb{R}^2$. We further prove that for a shape sequence $\lbrace \Omega_i \rbrace$ in $\mathcal{M}$ converging to $\Omega$, a discrete Voronoi approximation gives a good approximation to the skeleton $S(\Omega)$. Examples of skeleton computation in topography and brain morphometry are illustrated. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
10. A Review on Intelligence Dehazing and Color Restoration for Underwater Images.
- Author
-
Han, Min, Lyu, Zhiyu, Qiu, Tie, and Xu, Meiling
- Subjects
IMAGE reconstruction ,TELECOMMUNICATION cables ,IMAGE processing ,AUTONOMOUS underwater vehicles ,ARTIFICIAL intelligence - Abstract
Underwater image processing is a research field with great potential to help developers better explore the underwater environment. It has been used in a wide variety of applications, such as underwater microscopic detection, terrain scanning, mine detection, telecommunication cables, and autonomous underwater vehicles. However, underwater imagery suffers from strong absorption, scattering, color distortion, and noise from artificial light sources, causing image blur, haziness, and a bluish or greenish tone. Accordingly, the enhancement of underwater imagery divides into two lines of work: 1) underwater image dehazing and 2) underwater image color restoration. This paper presents the reasons for underwater image degradation, surveys state-of-the-art intelligent algorithms such as deep learning methods for underwater image dehazing and restoration, demonstrates the performance of different dehazing and color restoration methods, introduces an underwater image color evaluation metric, and provides an overview of the major underwater image applications. Finally, we summarize the applications of underwater image processing. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
11. GPGPU-Based ATPG System: Myth or Reality?
- Author
-
Lai, Liyang, Tsai, Kun-Han, and Li, Huawei
- Subjects
MIXED reality ,GRAPHICS processing units ,MYTH ,VIDEO processing ,DATA mining ,IMAGE processing ,ARTIFICIAL intelligence - Abstract
General-purpose computing on graphics processing units (GPGPU) is a programming model that uses graphics cards to perform computations traditionally done by the CPU. It began to become practical with the advent of programmable shaders and floating-point support on GPUs around 2001. The spread of GPGPU accelerated with the introduction of CUDA by NVIDIA in 2006 and, later, OpenCL in 2009. Nowadays, GPGPU is widely deployed in various applications, such as data mining, artificial intelligence, and many scientific computations. GPGPU seemingly promises immense parallelism with massive concurrent cores, and thus much shorter run times. This is true for algorithms that bear intrinsic data and task parallelism, such as image and video processing. For an ATPG system, where some algorithms are sequential in nature, the speedup is not easy to achieve in the real world. Flaws in setting up the speedup evaluation can lead to false promises. Will a GPGPU-based ATPG system become a reality, or is it just a myth? In this paper, we try to provide an answer by surveying state-of-the-art works and by analyzing practical aspects of today’s industrial designs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
12. Topology-Aware Differential Privacy for Decentralized Image Classification.
- Author
-
Guo, Shangwei, Zhang, Tianwei, Xu, Guowen, Yu, Han, Xiang, Tao, and Liu, Yang
- Subjects
PRIVACY ,NOISE control ,ARTIFICIAL intelligence ,FAULT tolerance (Engineering) ,DEEP learning ,QUEUING theory - Abstract
Image classification is a fundamental artificial intelligence task that labels images into one of several predefined classes. However, training complex image classification models requires a large amount of computation resources and data in order to reach state-of-the-art performance. This demand drives the growth of distributed deep learning, where multiple agents cooperatively train global models with their individual datasets. Among such learning systems, decentralized learning is particularly attractive, as it can improve the efficiency and fault tolerance by eliminating the centralized parameter server, which could be the single point of failure or performance bottleneck. Although the agents do not need to disclose their training image samples, they exchange parameters with each other at each iteration, which can put them at the risk of data privacy leakage. Past works demonstrated the possibility of recovering training images from the exchanged parameters. One common defense direction is to adopt Differential Privacy (DP) to secure the optimization algorithms such as Stochastic Gradient Descent (SGD). Those DP-based methods mainly focus on standalone systems, or centralized distributed learning. How to enforce and optimize DP protection in decentralized learning systems is unknown and challenging, due to their complex communication topologies and distinct learning characteristics. In this paper, we design TOP-DP, a novel solution to optimize the differential privacy protection of decentralized image classification systems. The key insight of our solution is to leverage the unique features of decentralized communication topologies to reduce the noise scale and improve the model usability. (1) We enhance the DP-SGD algorithm with this topology-aware noise reduction strategy, and integrate the time-aware noise decay technique.
(2) We design two novel learning protocols (synchronous and asynchronous) to protect systems with different network connectivities and topologies. We formally analyze and prove the DP requirement of our proposed solutions. Experimental evaluations demonstrate that our solution achieves a better trade-off between usability and privacy than prior works. To the best of our knowledge, this is the first DP optimization work from the perspective of network topologies. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
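Entry 12 builds on DP-SGD, whose standard step clips each per-example gradient to a fixed L2 norm, averages, and adds Gaussian noise before descending. The sketch below shows only that baseline step; the paper's topology-aware noise reduction and time-aware decay are its contributions and are not reproduced, and the function and parameter names here are illustrative.

```python
import numpy as np

def dp_sgd_update(w, per_example_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One baseline DP-SGD step: clip each per-example gradient to
    L2 norm <= clip, average, add Gaussian noise with standard deviation
    sigma * clip / batch_size, then take a gradient descent step."""
    rng = np.random.default_rng(rng)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip))   # rescale only if too long
    g_bar = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, sigma * clip / len(clipped), size=w.shape)
    return w - lr * (g_bar + noise)
```

The clipping bounds each example's influence (the sensitivity), which is what makes the added Gaussian noise yield a differential privacy guarantee; topology-aware variants change how large sigma must be.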
13. Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation.
- Author
-
Niu, Yulei, Lu, Zhiwu, Wen, Ji-Rong, Xiang, Tao, and Chang, Shih-Fu
- Subjects
IMAGE processing ,IMAGE reconstruction ,DEEP learning ,ARTIFICIAL neural networks ,ARTIFICIAL intelligence - Abstract
Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from object, scene to abstract concept and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, a novel two-branch deep neural network architecture is proposed, which comprises a very deep main network branch and a companion feature fusion network branch designed for fusing the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. For tackling the second issue, we introduce a label quantity prediction auxiliary task to the main label prediction task to explicitly estimate the optimal label number for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets, and the results show that our method significantly outperforms the state of the art. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
14. PAT—Probabilistic Axon Tracking for Densely Labeled Neurons in Large 3-D Micrographs.
- Author
-
Skibbe, Henrik, Reisert, Marco, Nakae, Ken, Watakabe, Akiya, Hata, Junichi, Mizukami, Hiroaki, Okano, Hideyuki, Yamamori, Tetsuo, and Ishii, Shin
- Subjects
BRAIN imaging ,BRAIN mapping ,ARTIFICIAL intelligence ,IMAGE processing ,MONTE Carlo method - Abstract
A major goal of contemporary neuroscience research is to map the structural connectivity of mammalian brain using microscopy imaging data. In this context, the reconstruction of densely labeled axons from two-photon microscopy images is a challenging and important task. The visually overlapping, crossing, and often strongly distorted images of the axons allow many ambiguous interpretations to be made. We address the problem of tracking axons in densely labeled samples of neurons in large image data sets acquired from marmoset brains. Our high-resolution images were acquired using two-photon microscopy and they provided whole brain coverage, occupying terabytes of memory. Both the image distortions and the large data set size frequently make it impractical to apply present-day neuron tracing algorithms to such data due to the optimization of such algorithms to the precise tracing of either single or sparse sets of neurons. Thus, new tracking techniques are needed. We propose a probabilistic axon tracking algorithm (PAT). PAT tackles the tracking of axons in two steps: locally (L-PAT) and globally (G-PAT). L-PAT is a probabilistic tracking algorithm that can tackle distorted, cluttered images of densely labeled axons. L-PAT divides a large micrograph into smaller image stacks. It then processes each image stack independently before mapping the axons in each image to a sparse model of axon trajectories. G-PAT merges the sparse L-PAT models into a single global model of axon trajectories by minimizing a global objective function using a probabilistic optimization method. We demonstrate the superior performance of PAT over standard approaches on synthetic data. Furthermore, we successfully apply PAT to densely labeled axons in large images acquired from marmoset brains. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. The Augmented Homogeneous Coordinates Matrix-Based Projective Mismatch Removal for Partial-Duplicate Image Search.
- Author
-
Zheng, Yan and Lin, Zhouchen
- Subjects
IMAGE processing ,IMAGE registration ,ARTIFICIAL intelligence ,ALGORITHMS ,BIG data - Abstract
Mismatch removal is a key step in many computer vision problems that involve point matching. The existing methods for checking geometric consistency mainly focus on similarity or affine transformations. In this paper, we propose a novel mismatch removal method that can cope with the projective transformation between two corresponding point sets. Our approach is based on the augmented homogeneous coordinates matrix constructed from the coordinates of anchor matches, whose degeneracy can indicate the correctness of anchor matches. The set of anchor matches is initially all the matches and is iteratively updated by calculating the difference between the estimated matched points, which can be easily computed in closed form, and the actually matched points, and removing those with large differences. Experimental results on synthetic 2D point matching data sets and real image matching data sets verify that our method achieves the highest $F$-score among all the methods under similarity, affine, and projective transformations with noise and outliers. Our method can also achieve faster speed than all other iterative methods. Those non-iterative methods with a slight advantage in speed are not competitive in accuracy when compared with ours. We also show that the set of anchor matches is stable through the iteration and the computation time grows very slowly with respect to the number of matched points. When applied to mismatch removal in partial-duplicate image search, our method achieves the best retrieval precision, and its computing time is also highly competitive. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
16. Geometry-Based Ensembles: Toward a Structural Characterization of the Classification Boundary.
- Author
-
Pujol, Oriol and Masip, David
- Subjects
PATTERN perception ,FEATURE extraction ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,INFORMATION processing - Abstract
This paper introduces a novel binary discriminative learning technique based on the approximation of the nonlinear decision boundary by a piecewise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points—points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov regularized optimization procedure in an additive model to create a final λ-smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and nonlinear behavior is obtained. The simplicity of the method allows its extension to cope with some of today's machine learning challenges, such as online learning, large-scale learning or parallelization, with linear computational complexity. We validate our approach on the UCI database, comparing with several state-of-the-art classification techniques. Finally, we apply our technique in online and large-scale scenarios and in six real-life computer vision and pattern recognition problems: gender recognition based on face images, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas' disease myocardial damage severity detection, old musical scores clef classification, and action recognition using 3D accelerometer data from a wearable device. The results are promising and this paper opens a line of research that deserves further attention. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
17. Gentle Nearest Neighbors Boosting over Proper Scoring Rules.
- Author
-
Nock, Richard, Ali, Wafa Bel Haj, D'Ambrosio, Roberto, Nielsen, Frank, and Barlaud, Michel
- Subjects
ARTIFICIAL intelligence ,PATTERN recognition systems ,IMAGE processing ,SUN computer peripherals ,ELECTRONIC records - Abstract
Tailoring nearest neighbors algorithms to boosting is an important problem. Recent papers study an approach, UNN, which provably minimizes particular convex surrogates under weak assumptions. However, numerical issues make it necessary to experimentally tweak parts of the UNN algorithm, at the possible expense of the algorithm’s convergence and performance. In this paper, we propose a lightweight Newton-Raphson alternative optimizing proper scoring rules from a very broad set, and establish formal convergence rates under the boosting framework that compete with those known for UNN. To the best of our knowledge, no such boosting-compliant convergence rates were previously known in the popular Gentle AdaBoost’s lineage. We provide experiments on a dozen domains, including Caltech and SUN computer vision databases, comparing our approach to major families including support vector machines, (Ada)boosting and stochastic gradient descent. They support three major conclusions: (i) GNNB significantly outperforms UNN, in terms of convergence rate and quality of the outputs, (ii) GNNB performs on par with or better than computationally intensive large margin approaches, (iii) on large domains that rule out those latter approaches for computational reasons, GNNB provides a simple and competitive contender to stochastic gradient descent. Experiments include a divide-and-conquer improvement of GNNB exploiting the link with proper scoring rules optimization. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
18. Efficient Intra Mode Selection for Depth-Map Coding Utilizing Spatiotemporal, Inter-Component and Inter-View Correlations in 3D-HEVC.
- Author
-
Shen, Liquan, Li, Kai, Feng, Guorui, An, Ping, and Liu, Zhi
- Subjects
VIDEO coding ,SPATIOTEMPORAL processes ,IMAGE recognition (Computer vision) ,ARTIFICIAL intelligence ,IMAGE processing - Abstract
3D High Efficiency Video Coding (3D-HEVC) is developed for the compression of the multi-view video plus depth format and is based on the latest generation of video coding standard, HEVC. It further adopts several new intra prediction modes, depth-modeling modes (DMMs), among the intra candidate modes for a better representation of edges in depth maps, which introduces a drastic increase in computational complexity. The depth intra mode decision over the DMMs and the existing intra modes is a very time-consuming part due to the huge complexity of the full rate-distortion (RD) cost calculation. In this paper, a low-complexity intra mode selection algorithm is proposed to reduce the complexity of depth intra prediction in both intra-frames and inter-frames. An experimental analysis is first performed to study the inter-view correlation and the inter-component (texture video and its associated depth) correlation in intra coding information such as the intra mode and the RD cost. All intra modes available in 3D-HEVC are classified into three activity classes assigned different mode-weight factors, and the coding mode complexity of a coding unit (CU) is defined according to the intra mode information from available spatiotemporal, inter-view, and inter-component neighboring coded CUs. The coding mode complexity analysis is utilized to assign different candidate intra modes to different types of CUs. The optimal intra prediction mode and the RD cost value at the current CU depth level are further used to skip unnecessary intra prediction sizes. Experimental results show that the proposed fast depth intra coding algorithm achieves a 61% complexity reduction in intra prediction, while incurring a 0.2% Bjontegaard metric increase for coded and synthesized views compared to the test model of 3D-HEVC. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
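The neighbor-driven candidate selection in entry 18 can be sketched as a two-step heuristic: average the activity-class weights of the neighboring CUs' intra modes, then map that complexity score to a candidate mode set. The class names, weights, thresholds, and mode lists below are hypothetical placeholders; the paper derives its own factors and classes.

```python
# Hypothetical activity classes and mode-weight factors (illustrative only).
CLASS_WEIGHT = {"smooth": 0, "angular": 1, "dmm": 2}

def cu_mode_complexity(neighbor_classes):
    """Coding-mode complexity of a CU as the mean activity weight of the
    intra-mode classes of its spatiotemporal, inter-view, and
    inter-component neighboring coded CUs."""
    if not neighbor_classes:
        return None
    return sum(CLASS_WEIGHT[c] for c in neighbor_classes) / len(neighbor_classes)

def candidate_modes(complexity, thresholds=(0.5, 1.5)):
    """Map complexity to a candidate set: CUs judged simple test only a few
    cheap modes, while complex or unknown CUs fall back to a full search."""
    if complexity is None or complexity >= thresholds[1]:
        return ["planar", "dc", "angular", "dmm"]   # full candidate list
    if complexity >= thresholds[0]:
        return ["planar", "dc", "angular"]          # skip costly DMMs
    return ["planar", "dc"]
```

Skipping the DMM candidates for low-complexity CUs is where the bulk of the RD-cost savings would come from, since DMM evaluation dominates the intra search.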
19. An Adaptive Patch-Based Reconstruction Scheme for View Synthesis by Disparity Estimation Using Optical Flow.
- Author
-
Rezaee Kaviani, Hoda and Shirani, Shahram
- Subjects
IMAGE processing ,IMAGING systems ,ARTIFICIAL intelligence ,IMAGE quality analysis ,IMAGE reconstruction - Abstract
Due to the rapid growth of technology and the dropping cost of cameras, multiview imaging applications have attracted many researchers in recent years. Free-viewpoint and 3D television are among these interesting applications. One of the problems that must be solved to realize such applications is rendering. In this paper, we propose an optical flow-assisted adaptive patch-based view synthesis algorithm. This patch-based scheme reduces the size and number of holes during reconstruction. The size of the patch is determined from edge information for better reconstruction, especially near boundaries. In the first stage of the algorithm, disparity is obtained using optical flow estimation. Then, a reconstructed version of the left and right views is generated using our adaptive patch-based algorithm. The mismatches between each view and its reconstructed version are obtained in the mismatch detection step. This stage outputs two masks, which help with the refinement of disparities and the selection of the best patches for the final synthesis. Finally, the remaining holes are filled using our simple hole-filling scheme and the refined disparities. The objective and subjective performances of the proposed algorithm are compared with recent methods. The results show that the proposed algorithm achieves an improvement of 2.14 dB on average. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
20. An Unsupervised Method to Extract Video Object via Complexity Awareness and Object Local Parts.
- Author
-
Luo, Bing, Li, Hongliang, Meng, Fanman, Wu, Qingbo, and Ngan, King N.
- Subjects
IMAGE processing ,VIDEOS ,IMAGE segmentation ,DIGITAL image processing ,ARTIFICIAL intelligence - Abstract
Existing unsupervised video object segmentation generates object information from the whole video, which ignores analysis of the local clips. However, we observe that local clips and their relationships are also useful for the video object segmentation. For example, the simple background clips can be used to improve the segmentation of complex background clips. In this paper, we propose a novel unsupervised segmentation framework to segment the primary object based on two aspects, i.e., the complexity awareness of video clips and their segmentation propagation. The first one is used to select the simple clips with smooth backgrounds and the second one generates an object prior from the simple clips and propagates the object prior to help and improve the segmentation of the complex clips. A complexity awareness method using the static cues and the dynamic cues are proposed to evaluate the complexity of the video frames. A new object prior learning model based on the local part structure is designed and a local part-based prior propagation is proposed for the complex clip segmentation. To verify our method, we collect a new challenging video segmentation data set, in which each video contains diverse backgrounds. Experimental results demonstrate that our method outperforms several state-of-the-art methods both on a classical data set and our new data set. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
21. Ensemble Subspace Segmentation Under Blockwise Constraints.
- Author
-
Zhao, Handong, Ding, Zhengming, and Fu, Yun
- Subjects
COMPUTER algorithms ,IMAGE processing ,GRAPH theory ,GRAPHIC methods ,ARTIFICIAL intelligence - Abstract
The graph-based subspace segmentation technique has garnered a lot of attention in the visual data representation problem. In general, data (e.g., tracks of moving objects) are drawn from multiple linear subspaces. Thus, how to build a block-diagonal affinity matrix is the critical problem. In this paper, we propose a novel graph-based method, Ensemble Subspace Segmentation under Blockwise constraints (ESSB), which unifies least squares regression and a locality preserving graph regularizer into an ensemble learning framework. Specifically, compact encoding using least squares regression coefficients helps achieve a block-diagonal representation matrix among all samples. Meanwhile, the locality preserving regularizer tends to capture the intrinsic local structure, which further enhances the block-diagonal property. Both the blockwise efforts, i.e., least squares regression and the sparse regularizer, work jointly and are formulated in the ensemble learning framework, making ESSB more robust and efficient, especially when handling high-dimensional data. Finally, an efficient optimization solution based on inexact augmented Lagrange multiplier is derived with theoretical time complexity analysis. To demonstrate the effectiveness of the proposed method, we consider three different applications: face clustering, object clustering, and motion segmentation. Extensive results of both accuracy and normalized mutual information on four benchmarks, i.e., YaleB, ORL, COIL and Hopkins155, are reported. Also, the evaluations of computational cost are provided, based on which the superiority of our proposed method in both accuracy and efficiency is demonstrated compared with 12 baseline algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
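The least-squares self-representation step at the core of the abstract above can be sketched in a few lines. This is a hedged illustration of the generic closed form Z = (X^T X + lam*I)^{-1} X^T X only, not the authors' full ensemble with the locality-preserving regularizer; the toy data and the value of `lam` are invented.

```python
import numpy as np

def lsr_affinity(X, lam=0.1):
    """Least-squares self-representation: min ||X - XZ||^2 + lam*||Z||^2.

    Columns of X are samples. The closed-form coefficient matrix Z tends
    toward a block-diagonal pattern when samples lie in independent
    subspaces, which is what makes it usable as a segmentation affinity.
    """
    n = X.shape[1]
    Z = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ X)
    np.fill_diagonal(Z, 0.0)            # ignore trivial self-representation
    return np.abs(Z) + np.abs(Z.T)      # symmetrize into an affinity matrix

# Two well-separated 1-D subspaces in R^3: the affinity between them vanishes.
rng = np.random.default_rng(0)
A = np.outer([1.0, 0.0, 0.0], rng.standard_normal(4))  # subspace 1, 4 samples
B = np.outer([0.0, 0.0, 1.0], rng.standard_normal(4))  # subspace 2, 4 samples
W = lsr_affinity(np.hstack([A, B]))
print(W[:4, 4:].max() < 1e-8)   # no cross-subspace affinity here
```

Spectral clustering on such a block-diagonal affinity then recovers the subspace membership.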
22. Spatiotemporal Low-Rank Modeling for Complex Scene Background Initialization.
- Author
-
Javed, Sajid, Mahmood, Arif, Bouwmans, Thierry, and Jung, Soon Ki
- Subjects
IMAGE processing ,COMPUTER vision ,VIDEOS ,ARTIFICIAL intelligence ,BIG data - Abstract
Background modeling constitutes the building block of many computer-vision tasks. Traditional schemes model the background as a low rank matrix with corrupted entries. These schemes operate in batch mode and do not scale well with the data size. Moreover, without enforcing spatiotemporal information in the low-rank component, and because of occlusions by foreground objects and redundancy in video data, the design of a background initialization method robust against outliers is very challenging. To overcome these limitations, this paper presents a spatiotemporal low-rank modeling method on dynamic video clips for estimating the robust background model. The proposed method encodes spatiotemporal constraints by regularizing spectral graphs. Initially, a motion-compensated binary matrix is generated using optical flow information to remove redundant data and to create a set of dynamic frames from the input video sequence. Then two graphs are constructed, one between frames for temporal consistency and the other between features for spatial consistency, to encode the local structure for continuously promoting the intrinsic behavior of the low-rank model against outliers. These two terms are then incorporated in the iterative Matrix Completion framework for improved segmentation of background. Rigorous evaluation on severely occluded and dynamic background sequences demonstrates the superior performance of the proposed method over state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
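The abstract's model is graph-regularized low-rank matrix completion; as a minimal stand-in, the sketch below uses the classic batch baseline such methods improve on, a per-pixel temporal median, plus residual thresholding for foreground. Frame sizes, the moving-blob scene, and the threshold are all invented for illustration.

```python
import numpy as np

def median_background(frames):
    """Batch baseline: the per-pixel temporal median is robust to sparse
    foreground occlusions (a crude stand-in for the low-rank model above)."""
    return np.median(np.stack(frames, axis=0), axis=0)

def foreground_mask(frame, background, thresh=0.25):
    """Label pixels whose residual against the background model is large."""
    return np.abs(frame - background) > thresh

bg = np.full((8, 8), 0.5)               # static background
frames = []
for t in range(10):
    f = bg.copy()
    f[t % 8, t % 8] = 1.0               # small moving "object"
    frames.append(f)

est = median_background(frames)
print(np.array_equal(est, bg))                 # median ignores sparse outliers
print(foreground_mask(frames[3], est).sum())   # exactly one foreground pixel
```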
23. Phantomless Auto-Calibration and Online Calibration Assessment for a Tracked Freehand 2-D Ultrasound Probe.
- Author
-
Toews, Matthew and Wells, William M.
- Subjects
MAGNETIC resonance imaging of the brain ,IMAGING phantoms ,IMAGE quality in radiography ,COMPUTED tomography ,ARTIFICIAL intelligence - Abstract
This paper presents a method for automatically calibrating and assessing the calibration quality of an externally tracked 2-D ultrasound (US) probe by scanning arbitrary, natural tissues, as opposed to a specialized calibration phantom, as is the typical practice. A generative topic model quantifies the posterior probability of calibration parameters conditioned on local 2-D image features arising from a generic underlying substrate. Auto-calibration is achieved by identifying the maximum a-posteriori image-to-probe transform, and calibration quality is assessed online in terms of the posterior probability of the current image-to-probe transform. Both are closely linked to the 3-D point reconstruction error (PRE) in aligning feature observations arising from the same underlying physical structure in different US images. The method is of practical importance in that it operates simply by scanning arbitrary textured echogenic structures, e.g., in-vivo tissues in the context of US-guided procedures, without requiring specialized calibration procedures or equipment. Observed data take the form of local scale-invariant features that can be extracted and fit to the model in near real-time. Experiments demonstrate the method on a public data set of in vivo human brain scans of 14 unique subjects acquired in the context of neurosurgery. Online calibration assessment can be performed at approximately 3 Hz for US images of 640 × 480 pixels. Auto-calibration achieves an internal mean PRE of 1.2 mm and a discrepancy of [2 mm, 6 mm] in comparison to calibration via a standard phantom-based method. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
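The point reconstruction error (PRE) that the abstract ties calibration quality to can be illustrated as follows. The transforms and pixel coordinates below are hypothetical, and the actual method scores a posterior over calibrations rather than computing PRE directly.

```python
import numpy as np

def reconstruct(T_probe_to_world, C_image_to_probe, uv):
    """Map a 2-D US pixel (u, v) into 3-D world coordinates; an US image
    point has zero out-of-plane coordinate."""
    p = np.array([uv[0], uv[1], 0.0, 1.0])
    return (T_probe_to_world @ C_image_to_probe @ p)[:3]

def point_reconstruction_error(observations, C):
    """observations: list of (T_i, uv_i) pairs believed to image the same
    physical structure. PRE = mean distance to the centroid of the
    reconstructed 3-D points; a good calibration C makes it small."""
    pts = np.array([reconstruct(T, C, uv) for T, uv in observations])
    return np.linalg.norm(pts - pts.mean(axis=0), axis=1).mean()

C = np.eye(4)                                # hypothetical image-to-probe calibration
T1 = np.eye(4)
T2 = np.eye(4); T2[:3, 3] = [1.0, 2.0, 0.0]  # second probe pose
obs = [(T1, (1.0, 2.0)), (T2, (0.0, 0.0))]   # same physical point, two poses
print(point_reconstruction_error(obs, C))    # a consistent calibration gives 0.0
```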
24. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors.
- Subjects
MANUSCRIPTS ,ARTIFICIAL intelligence ,GEODESICS ,COMPUTER vision ,IMAGE processing - Abstract
Provides instructions and guidelines to prospective authors who wish to submit manuscripts. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
25. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge.
- Author
-
Vinyals, Oriol, Toshev, Alexander, Bengio, Samy, and Erhan, Dumitru
- Subjects
IMAGE processing ,ARTIFICIAL intelligence ,NATURAL language processing ,COMPUTER vision ,MATHEMATICAL models ,MACHINE translating - Abstract
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. Finally, given the recent surge of interest in this task, a competition was organized in 2015 using the newly released COCO dataset. We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
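Decoding "the most likely sentence" from such a model is typically approximated with beam search. The toy step distribution below is invented and stands in for the RNN's conditional token probabilities; only the search procedure itself is the point.

```python
import math

def beam_search(step_logprobs, beam=2, length=3):
    """Tiny beam-search decoder: step_logprobs(prefix) returns a dict
    {token: logprob}. Keeps the `beam` highest-scoring prefixes per step."""
    beams = [((), 0.0)]
    for _ in range(length):
        cand = []
        for prefix, lp in beams:
            for tok, tlp in step_logprobs(prefix).items():
                cand.append((prefix + (tok,), lp + tlp))
        beams = sorted(cand, key=lambda c: -c[1])[:beam]
    return beams[0]

def toy_model(prefix):
    """Hypothetical bigram-ish stand-in for a caption model's next-token
    distribution."""
    if not prefix or prefix[-1] == "a":
        return {"dog": math.log(0.6), "cat": math.log(0.4)}
    return {"a": math.log(0.7), "runs": math.log(0.3)}

print(beam_search(toy_model)[0])   # → ('dog', 'a', 'dog')
```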
26. Online Scheme for Multiple Camera Multiple Target Tracking Based on Multiple Hypothesis Tracking.
- Author
-
Yoo, Haanju, Kim, Kikyung, Byeon, Moonsub, Jeon, Younghan, and Choi, Jin Young
- Subjects
CAMERAS ,ALGORITHMS ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing - Abstract
We propose an online tracking algorithm for multiple target tracking with multiple cameras. In this paper, we suggest a multiple hypothesis tracking (MHT) framework to find an unknown number of multiple tracks through the spatio-temporal association between tracklets generated from multiple cameras. In this framework, the MHT is realized online by solving the maximum weighted clique problem (MWCP) at every frame to estimate the 3D trajectories of the targets. To handle the NP-hard issue of the MWCP, we propose a novel online scheme that formulates the MWCP using feedback information from the previous frame’s result to find optimal tracks at every frame. This scheme enables the MWCP to be decomposed into multiple subproblems and significantly reduces the computation. The experiments show that the proposed algorithm performs comparably with the state-of-the-art batch algorithms, even though it adopts an online scheme. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
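A brute-force illustration of the maximum weighted clique problem (MWCP) the tracker solves per frame. Real implementations use far better solvers (the whole point of the online decomposition above is to keep the subproblems small); the graph here is a made-up toy where nodes stand for tracklet associations and weights for affinities.

```python
from itertools import combinations

def max_weighted_clique(weights, edges):
    """Exhaustive maximum weighted clique search.

    weights: {node: weight}; edges: set of frozenset({u, v}) pairs.
    Only feasible for small graphs, i.e., small per-frame subproblems.
    """
    best, best_w = [], 0.0
    nodes = list(weights)
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # a clique requires every pair in the subset to be connected
            if all(frozenset(p) in edges for p in combinations(subset, 2)):
                w = sum(weights[n] for n in subset)
                if w > best_w:
                    best, best_w = list(subset), w
    return best, best_w

weights = {"a": 2.0, "b": 1.5, "c": 1.0, "d": 3.0}
edges = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}
print(max_weighted_clique(weights, edges))  # → (['a', 'b', 'c'], 4.5)
```

The triangle {a, b, c} (total weight 4.5) beats the heavier edge {c, d} (4.0), which is why clique weight, not node weight, drives the association.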
27. Structure and Motion Recovery Based on Spatial-and-Temporal-Weighted Factorization.
- Author
-
Guanghui Wang, Zelek, John S., and Wu, Q. M. Jonathan
- Subjects
FOCUS (Optics) ,COMPUTER vision ,IMAGE processing ,EMPIRICAL research ,ARTIFICIAL intelligence ,IMAGE reconstruction ,FACTORIZATION - Abstract
This paper focuses on the problem of structure and motion recovery from uncalibrated image sequences. It has been empirically proven that image measurement uncertainties can be modeled spatially and temporally by virtue of reprojection residuals. Consequently, a spatial-and-temporal-weighted factorization (STWF) algorithm is proposed to handle significant noise contained in the tracking data. This paper presents three novelties and contributions. First, the image reprojection residual of a feature point is demonstrated to be generally proportional to the error magnitude associated with the image point. Second, the error distributions are estimated from a different perspective, that of the reprojection residuals. The image errors are modeled both spatially and temporally to cope with different kinds of uncertainties. Previous studies have considered only the spatial information. Third, based on the estimated error distributions, an STWF algorithm is proposed to improve the overall accuracy and robustness of traditional approaches. Unlike existing approaches, the proposed technique does not require prior information of image measurement and is easy to implement. Extensive experiments on synthetic data and real images validate the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
28. Spectral Clustering Ensemble Applied to SAR Image Segmentation.
- Author
-
Xiangrong Zhang, Licheng Jiao, Fang Liu, Liefeng Bo, and Maoguo Gong
- Subjects
ARTIFICIAL intelligence ,ELECTRONIC pulse techniques ,IMAGE processing ,PATTERN recognition systems ,SYNTHETIC aperture radar ,COMPUTER vision ,IMAGING systems ,COHERENT radar - Abstract
Spectral clustering (SC) has been used with success in the field of computer vision for data clustering. In this paper, a new algorithm named SC ensemble (SCE) is proposed for the segmentation of synthetic aperture radar (SAR) images. With the gray-level co-occurrence matrix-based statistical features and the energy features from the undecimated wavelet decomposition extracted for each pixel as input, our algorithm performs segmentation by combining multiple SC results, as opposed to using the outcome of a single clustering process as in the existing literature. The random subspace, random scaling parameter, and Nyström approximation for component SC are applied to construct the SCE. This technique provides the necessary diversity as well as high-quality component learners for an efficient ensemble. It also overcomes the shortcomings faced by SC, such as the selection of the scaling parameter, and the instability resulting from the Nyström approximation method in image segmentation. Experimental results show that the proposed method is effective for SAR image segmentation and insensitive to the scaling parameter. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
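One common way to combine component clusterings, as the SCE ensemble must, is a co-association matrix. This generic sketch is not the paper's specific combination scheme (which ensembles spectral clusterings over random subspaces and scaling parameters); the labelings below are invented.

```python
import numpy as np

def co_association(labelings):
    """Combine several clusterings into one affinity: entry (i, j) is the
    fraction of component clusterings that put samples i and j in the same
    cluster. Clustering this matrix yields the ensemble result."""
    n = len(labelings[0])
    A = np.zeros((n, n))
    for lab in labelings:
        lab = np.asarray(lab)
        A += (lab[:, None] == lab[None, :]).astype(float)
    return A / len(labelings)

# Three component clusterings of 4 samples that mostly agree.
L = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
A = co_association(L)
print(A[0, 1])   # samples 0 and 1 co-cluster in 2 of 3 clusterings
```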
29. Segmentation by Fusion of Histogram-Based K-Means Clusters in Different Color Spaces.
- Author
-
Mignotte, Max
- Subjects
IMAGE quality analysis ,IMAGE reconstruction ,IMAGE stabilization ,IMAGE processing ,COMPUTER vision ,ARTIFICIAL intelligence - Abstract
This paper presents a new, simple, and efficient segmentation approach, based on a fusion procedure which aims at combining several segmentation maps associated with simpler partition models in order to finally get a more reliable and accurate segmentation result. The different label fields to be fused in our application are given by the same and simple (K-means based) clustering technique on an input image expressed in different color spaces. Our fusion strategy aims at combining these segmentation maps with a final clustering procedure using, as input features, the local histograms of the class labels previously estimated and associated with each site for all these initial partitions. This fusion framework remains simple to implement, fast, general enough to be applied to various computer vision applications (e.g., motion detection and segmentation), and has been successfully applied on the Berkeley image database. The experiments reported in this paper illustrate the potential of this approach compared to the state-of-the-art segmentation methods recently proposed in the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
30. Multiview Photometric Stereo.
- Author
-
Hernández, Carlos, Vogiatzis, George, and Cipolla, Roberto
- Subjects
PATTERN recognition systems ,PATTERN perception ,COMPUTER vision ,MACHINE learning ,ARTIFICIAL intelligence ,OPTICAL pattern recognition ,PHOTOMETRY ,IMAGE reconstruction ,IMAGE processing - Abstract
This paper addresses the problem of obtaining complete, detailed reconstructions of textureless shiny objects. We present an algorithm which uses silhouettes of the object, as well as images obtained under changing illumination conditions. In contrast with previous photometric stereo techniques, ours is not limited to a single viewpoint but produces accurate reconstructions in full 3D. A number of images of the object are obtained from multiple viewpoints, under varying lighting conditions. Starting from the silhouettes, the algorithm recovers camera motion and constructs the object's visual hull. This is then used to recover the illumination and initialize a multiview photometric stereo scheme to obtain a closed surface reconstruction. There are two main contributions in this paper: First, we describe a robust technique to estimate light directions and intensities and, second, we introduce a novel formulation of photometric stereo which combines multiple viewpoints and, hence, allows closed surface reconstructions. The algorithm has been implemented as a practical model acquisition system. Here, a quantitative evaluation of the algorithm on synthetic data is presented together with complete reconstructions of challenging real objects. Finally, we show experimentally how, even in the case of highly textured objects, this technique can greatly improve on correspondence-based multiview stereo results. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
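For context, the classic single-viewpoint Lambertian photometric stereo step, which the paper generalizes to multiple views and shiny surfaces, solves I = L(rho*n) by least squares. The light directions and albedo below are hypothetical, and real scenes violate the no-shadow, no-specularity assumptions made here.

```python
import numpy as np

def photometric_stereo(I, L):
    """Classic Lambertian photometric stereo for one pixel: I = L @ (rho*n),
    with I the (m,) intensities under m lights and L the (m, 3) unit light
    directions. Least squares recovers the albedo-scaled normal."""
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    rho = np.linalg.norm(g)       # albedo
    return rho, g / rho           # unit surface normal

L = np.array([[0.0, 0.0, 1.0],    # three hypothetical light directions
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
n_true = np.array([0.0, 0.0, 1.0])
I = 0.8 * L @ n_true              # albedo 0.8, no shadows or specularities
rho, n = photometric_stereo(I, L)
print(round(rho, 3), n)           # → 0.8 [0. 0. 1.]
```

With more than three lights the same least-squares fit averages out noise, which is why the paper's multi-light capture helps.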
31. A Theory of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency.
- Author
-
Mahajan, Dhruv, Ramamoorthi, Ravi, and Curless, Brian
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,PATTERN recognition systems ,OPTICAL reflection ,SPHERICAL harmonics ,LIGHTING ,AMBIENCE (Environment) ,ALGORITHMS - Abstract
This paper develops a theory of frequency domain invariants in computer vision. We derive novel identities using spherical harmonics, which are the angular frequency domain analog to common spatial domain invariants such as reflectance ratios. These invariants are derived from the spherical harmonic convolution framework for reflection from a curved surface. Our identities apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. For this case, we derive a novel identity, independent of the specific lighting configurations or BRDFs, that allows us to directly estimate the fourth image if the other three are available. The identity can also be used as an invariant to detect tampering in the images. Although this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image to detect tampering or image splicing. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
32. Vision Processing for Realtime 3-D Data Acquisition Based on Coded Structured Light.
- Author
-
Chen, S. Y., Li, Y. F., and Jianwei Zhang
- Subjects
COMPUTER vision ,IMAGE processing ,CODING theory ,ARTIFICIAL intelligence ,IMAGE analysis - Abstract
Structured light vision systems have been successfully used for accurate measurement of 3-D surfaces in computer vision. However, their applications are mainly limited to scanning stationary objects so far since tens of images have to be captured for recovering one 3-D scene. This paper presents an idea for real-time acquisition of 3-D surface data by a specially coded vision system. To achieve 3-D measurement for a dynamic scene, the data acquisition must be performed with only a single image. A principle of uniquely color-encoded pattern projection is proposed to design a color matrix for improving the reconstruction efficiency. The matrix is produced by a special code sequence and a number of state transitions. A color projector is controlled by a computer to generate the desired color patterns in the scene. The unique indexing of the light codes is crucial here for color projection since it is essential that each light grid be uniquely identified by incorporating local neighborhoods so that 3-D reconstruction can be performed with only local analysis of a single image. A scheme is presented to describe such a vision processing method for fast 3-D data acquisition. Practical experimental performance is provided to analyze the efficiency of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
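The "unique indexing of the light codes" idea can be illustrated in 1-D: build a symbol sequence in which every window of length k occurs at most once, so a projected stripe can be identified from its local neighborhood alone. The real system uses a 2-D color matrix generated by a special code sequence and state transitions; the greedy 1-D construction below is only a sketch of the uniqueness property.

```python
def unique_window_code(symbols, k):
    """Greedily extend a sequence over `symbols` so that every length-k
    window occurs at most once; stops when no symbol keeps windows unique."""
    seq = list(symbols[:k])
    seen = {tuple(seq)}
    while True:
        for s in symbols:
            w = tuple(seq[-(k - 1):]) + (s,)
            if w not in seen:
                seen.add(w)
                seq.append(s)
                break
        else:
            return seq  # no extension keeps windows unique

code = unique_window_code("RGB", 3)
windows = [tuple(code[i:i + 3]) for i in range(len(code) - 2)]
print(len(windows) == len(set(windows)))  # every 3-window is unique → True
```

Given a camera observation of any three consecutive stripe colors, the window's position in `code` (and hence the projector column) is determined, enabling 3-D reconstruction from a single image.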
33. Unsupervised, Information-Theoretic, Adaptive Image Filtering for Image Restoration.
- Author
-
Awate, Suyash P. and Whitaker, Ross T.
- Subjects
IMAGE reconstruction ,IMAGE processing ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGING systems ,PATTERN recognition systems - Abstract
Image restoration is an important and widely studied problem in computer vision and image processing. Various image filtering strategies have been effective, but invariably make strong assumptions about the properties of the signal and/or degradation. Hence, these methods lack the generality to be easily applied to new applications or diverse image collections. This paper describes a novel unsupervised, information-theoretic, adaptive filter (UINTA) that improves the predictability of pixel intensities from their neighborhoods by decreasing their joint entropy. In this way, UINTA automatically discovers the statistical properties of the signal and can thereby restore a wide spectrum of images. The paper describes the formulation to minimize the joint entropy measure and presents several important practical considerations in estimating neighborhood statistics. It presents a series of results on both real and synthetic data along with comparisons with current state-of-the-art techniques, including novel applications to medical image processing. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
34. Artificial Neural Networks for Document Analysis and Recognition.
- Author
-
Marinai, Simone, Gori, Marco, and Soda, Giovanni
- Subjects
ARTIFICIAL intelligence ,NEURAL circuitry ,ARTIFICIAL neural networks ,IMAGE processing ,DOCUMENT imaging systems ,COMPUTER graphics - Abstract
Artificial neural networks have been extensively applied to document analysis and recognition. Most efforts have been devoted to the recognition of isolated handwritten and printed characters, with widely recognized successful results. However, many other document processing tasks, like preprocessing, layout analysis, character segmentation, word recognition, and signature verification, have also been effectively addressed, with very promising results. This paper surveys the most significant problems in the area of offline document image processing where connectionist-based approaches have been applied. Similarities and differences between approaches belonging to different categories are discussed. Particular emphasis is given to the crucial role of prior knowledge for the conception of both appropriate architectures and learning algorithms. Finally, the paper provides a critical analysis of the reviewed approaches and depicts the most promising research guidelines in the field. In particular, a second generation of connectionist-based models is foreseen, based on appropriate graphical representations of the learning environment. [ABSTRACT FROM AUTHOR]
- Published
- 2005
35. Panoramic Appearance-Based Recognition of Video Contents Using Matching Graphs.
- Author
-
Chu-Song Chen, Wen-Teng Hsieh, Wayne J., and Jiun-Hung Chen
- Subjects
COMPUTER vision ,IMAGE processing ,PATTERN recognition systems ,DATABASES ,ARTIFICIAL intelligence ,IMAGING systems - Abstract
This paper proposes a general scheme for recognizing the contents of a video using a set of panoramas recorded in a database. In essence, a panorama inherently records the appearances of an omni-directional scene from its central point to arbitrary viewing directions and, thus, can serve as a compact representation of an environment. In particular, this paper emphasizes the use of a sequence of successive frames in a video taken with a video camera, instead of a single frame, for visual recognition. The associated recognition task is formulated as a shortest-path searching problem, and a dynamic-programming technique is used to solve it. Experimental results show that our method can effectively recognize a video. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
36. Stereo Reconstruction from Multiperspective Panoramas.
- Author
-
Yin Li, Heung-Yeung Shum, Chi-Keung Tang, and Szeliski, Richard
- Subjects
IMAGE processing ,PANORAMIC photography ,ARTIFICIAL intelligence ,COMPUTER simulation - Abstract
A new approach to computing a panoramic (360 degrees) depth map is presented in this paper. Our approach uses a large collection of images taken by a camera whose motion has been constrained to planar concentric circles. We resample regular perspective images to produce a set of multiperspective panoramas and then compute depth maps directly from these resampled panoramas. Our panoramas sample uniformly in three dimensions: rotation angle, inverse radial distance, and vertical elevation. The use of multiperspective panoramas eliminates the limited overlap present in the original input images and, thus, problems as in conventional multibaseline stereo can be avoided. Our approach differs from stereo matching of single-perspective panoramic images taken from different locations, where the epipolar constraints are sine curves. For our multiperspective panoramas, the epipolar geometry, to the first order approximation, consists of horizontal lines. Therefore, any traditional stereo algorithm can be applied to multiperspective panoramas with little modification. In this paper, we describe two reconstruction algorithms. The first is a cylinder sweep algorithm that uses a small number of resampled multiperspective panoramas to obtain dense 3D reconstruction. The second algorithm, in contrast, uses a large number of multiperspective panoramas and takes advantage of the approximate horizontal epipolar geometry inherent in multiperspective panoramas. It comprises a novel and efficient 1D multibaseline matching technique, followed by tensor voting to extract the depth surface. Experiments show that our algorithms are capable of producing comparable high quality depth maps which can be used for applications such as view interpolation. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
37. Shape and Reflectance Estimation in the Wild.
- Author
-
Oxholm, Geoffrey and Nishino, Ko
- Subjects
LIGHTING ,REFLECTANCE ,IMAGE processing ,MACHINE learning ,ARTIFICIAL intelligence - Abstract
Our world is full of objects with complex reflectances situated in rich illumination environments. Though stunning, the diversity of appearance that arises from this complexity is also daunting. For this reason, past work on geometry recovery has tried to frame the problem into simplistic models of reflectance (such as Lambertian, mirrored, or dichromatic) or illumination (one or more distant point light sources). In this work, we directly tackle the problem of joint reflectance and geometry estimation under known but uncontrolled natural illumination by fully exploiting the surface orientation cues that become embedded in the appearance of the object. Intuitively, salient scene features (such as the sun or stained glass windows) act analogously to the point light sources of traditional geometry estimation frameworks by strongly constraining the possible orientations of the surface patches reflecting them. By jointly estimating the reflectance of the object, which modulates the illumination, the appearance of a surface patch can be used to derive a nonparametric distribution of its possible orientations. If only a single image exists, these strongly constrained surface patches may then be used to anchor the geometry estimation and give context to the less-descriptive regions. When multiple images exist, the distribution of possible surface orientations becomes tighter as additional context is given, though integrating the separate views poses additional challenges. In this paper we introduce two methods, one for the single image case, and another for the case of multiple images. The effectiveness of our methods is evaluated extensively on synthetic and real-world data sets that span the wide range of real-world environments and reflectances that lies between the extremes that have been the focus of past work. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
38. A Generalized Probabilistic Framework for Compact Codebook Creation.
- Author
-
Liu, Lingqiao, Wang, Lei, and Shen, Chunhua
- Subjects
VISUAL perception ,PROBABILISTIC number theory ,IMAGE processing ,ARTIFICIAL intelligence ,BAYESIAN analysis - Abstract
Compact and discriminative visual codebooks are preferred in many visual recognition tasks. In the literature, a number of works have taken the approach of hierarchically merging visual words of an initial large-sized codebook, but implemented this approach with different merging criteria. In this work, we propose a single probabilistic framework to unify these merging criteria, by identifying two key factors: the function used to model the class-conditional distribution and the method used to estimate the distribution parameters. More importantly, by adopting new distribution functions and/or parameter estimation methods, our framework can readily produce a spectrum of novel merging criteria. Three of them are specifically discussed in this paper. For the first criterion, we adopt the multinomial distribution with the Bayesian method; for the second criterion, we integrate the Gaussian distribution with maximum likelihood parameter estimation. For the third criterion, which shows the best merging performance, we propose a max-margin-based parameter estimation method and apply it with the multinomial distribution. Extensive experimental study is conducted to systematically analyze the performance of the above three criteria and compare them with existing ones. As demonstrated, the best criterion within our framework achieves the overall best merging performance among the compared merging criteria developed in the literature. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
39. Cover3.
- Subjects
PERIODICAL publishing ,PATTERN recognition systems ,ARTIFICIAL intelligence ,COMPUTER vision ,IMAGE processing ,PUBLISHING - Published
- 2011
- Full Text
- View/download PDF
40. Cover3.
- Subjects
MAGAZINE covers ,PATTERN perception ,ARTIFICIAL intelligence ,TECHNOLOGY periodical publishing ,PUBLISHING ,IMAGE processing ,IMAGE retrieval ,INFORMATION technology - Published
- 2011
- Full Text
- View/download PDF
41. Cover3.
- Subjects
ARTIFICIAL intelligence ,PATTERN perception ,MACHINE learning ,EDITORS ,IMAGE processing ,PALEOGRAPHY - Published
- 2011
- Full Text
- View/download PDF
42. Deep Mixture of Diverse Experts for Large-Scale Visual Recognition.
- Author
-
Zhao, Tianyi, Chen, Qiuyu, Kuang, Zhenzhong, Yu, Jun, Zhang, Wei, and Fan, Jianping
- Subjects
MACHINE learning ,ARTIFICIAL intelligence ,IMAGE processing ,ATOMIC layer deposition ,X-ray diffraction - Abstract
In this paper, a deep mixture of diverse experts algorithm is developed to achieve more efficient learning of a huge (mixture) network for large-scale visual recognition application. First, a two-layer ontology is constructed to assign large numbers of atomic object classes into a set of task groups according to the similarities of their learning complexities, where certain degrees of inter-group task overlapping are allowed to enable sufficient inter-group message passing. Second, one particular base deep CNN with M+1 outputs is learned for each task group to recognize its M atomic object classes and identify one special class of “not-in-group”, where the network structure (numbers of layers and units in each layer) of the well-designed deep CNNs (such as AlexNet, VGG, GoogleNet, ResNet) is directly used to configure such base deep CNNs. For enhancing the separability of the atomic object classes in the same task group, two approaches are developed to learn more discriminative base deep CNNs: (a) our deep multi-task learning algorithm that can effectively exploit the inter-class visual similarities; (b) our two-layer network cascade approach that can improve the accuracy rates for the hard object classes at certain degrees while effectively maintaining the high accuracy rates for the easy ones. Finally, all these complementary base deep CNNs with diverse but overlapped outputs are seamlessly combined to generate a mixture network with larger outputs for recognizing tens of thousands of atomic object classes. Our experimental results have demonstrated that our deep mixture of diverse experts algorithm can achieve very competitive results on large-scale visual recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
43. Long-Short-Term Features for Dynamic Scene Classification.
- Author
-
Huang, Yuanjun, Cao, Xianbin, Wang, Qi, Zhang, Baochang, Zhen, Xiantong, and Li, Xuelong
- Subjects
ARTIFICIAL intelligence ,ARTIFICIAL neural networks ,IMAGE processing ,SUPPORT vector machines ,MACHINE learning - Abstract
Dynamic scene classification has been extensively studied in computer vision due to its widespread applications. The key to dynamic scene classification lies in jointly characterizing spatial appearance and temporal dynamics to achieve informative representation, which remains an outstanding task in the literature. In this paper, we propose a unified framework to extract spatial and temporal features for dynamic scene representation. More specifically, we deploy two variants of deep convolutional neural networks to encode spatial appearance and short-term dynamics into short-term deep features (STDF). Based on STDF, we propose using the autoregressive moving average model to extract long-term frequency features (LTFF). By combining STDF and LTFF, we establish the long–short-term feature (LSTF) representations of dynamic scenes. The LSTF characterizes both spatial and temporal patterns of dynamic scenes for a comprehensive and informative representation that enables more accurate classification. Extensive experiments on three dynamic scene classification benchmarks have shown that the proposed LSTF achieves high performance and substantially surpasses the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
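The long-term temporal modeling step can be illustrated with a plain autoregressive fit; the paper uses the richer ARMA model on deep feature trajectories, so treat this as a simplified sketch. The synthetic signal below has known AR coefficients, and the model order and noise level are arbitrary choices.

```python
import numpy as np

def ar_features(x, p=2):
    """Fit an order-p autoregressive model x[t] ≈ a1*x[t-1] + ... + ap*x[t-p]
    by least squares and return the coefficients. Stacking such coefficients
    over feature dimensions gives a simple long-term temporal descriptor."""
    X = np.column_stack([x[p - i - 1:len(x) - i - 1] for i in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# Synthetic AR(2) signal with coefficients (0.5, 0.25) plus small noise.
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.5 * x[t - 1] + 0.25 * x[t - 2] + 0.1 * rng.standard_normal()
a = ar_features(x, p=2)
print(a.shape)   # one descriptor entry per lag
```

With enough samples the least-squares fit recovers coefficients close to (0.5, 0.25), which is what makes them usable as stable features of the dynamics.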
44. Sampling at Unknown Locations: Uniqueness and Reconstruction Under Constraints.
- Author
-
Elhami, Golnoosh, Pacholska, Michalina, Haro, Benjamin Bejar, Vetterli, Martin, and Scholefield, Adam
- Subjects
SIGNAL processing ,SIGNAL theory ,IMAGE processing ,WIRELESS sensor networks ,MACHINE learning ,ARTIFICIAL intelligence - Abstract
Traditional sampling results assume that the sample locations are known. Motivated by simultaneous localization and mapping (SLAM) and structure from motion (SfM), we investigate sampling at unknown locations. Without further constraints, the problem is often hopeless. For example, we recently showed that, for polynomial and bandlimited signals, it is possible to find two signals, arbitrarily far from each other, that fit the measurements. However, we also showed that this can be overcome by adding constraints to the sample positions. In this paper, we show that these constraints lead to a uniform sampling of a composite of functions. Furthermore, the formulation retains the key aspects of the SLAM and SfM problems, whilst providing uniqueness, in many cases. We demonstrate this by studying two simple examples of constrained sampling at unknown locations. In the first, we consider sampling a periodic bandlimited signal composite with an unknown linear function. We derive the sampling requirements for uniqueness and present an algorithm that recovers both the bandlimited signal and the linear warping. Furthermore, we prove that, when the requirements for uniqueness are not met, the cases of multiple solutions have measure zero. For our second example, we consider polynomials sampled such that the sampling positions are constrained by a rational function. We previously proved that, if a specific sampling requirement is met, uniqueness is achieved. In addition, we present an alternate minimization scheme for solving the resulting non-convex optimization problem. Finally, fully reproducible simulation results are provided to support our theoretical analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
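The first example in the abstract above (a signal composed with an unknown linear warp of the sample positions) can be illustrated with a toy version of the recovery idea: search over the warp parameter and, for each candidate, fit the signal in closed form by least squares. The one-harmonic model, the grid range, and the parameter values below are all hypothetical simplifications, not the paper's actual algorithm.

```python
import math

def fit_amp(y, a):
    # Closed-form least-squares amplitude c for the model y_n ~ c*cos(2*pi*a*n),
    # plus the residual of that fit.
    basis = [math.cos(2 * math.pi * a * n) for n in range(len(y))]
    num = sum(yi * bi for yi, bi in zip(y, basis))
    den = sum(bi * bi for bi in basis)
    c = num / den
    resid = sum((yi - c * bi) ** 2 for yi, bi in zip(y, basis))
    return c, resid

# Synthetic samples taken at unknown linearly warped positions t_n = a_true * n.
a_true, c_true = 0.037, 1.7
y = [c_true * math.cos(2 * math.pi * a_true * n) for n in range(100)]

# Outer search over the warp slope; inner closed-form signal fit.
candidates = [0.001 + 0.0005 * k for k in range(200)]
a_hat = min(candidates, key=lambda a: fit_amp(y, a)[1])
c_hat, _ = fit_amp(y, a_hat)
```

The grid search stands in for the paper's more careful uniqueness analysis: it recovers both the warp slope and the signal amplitude jointly.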
45. Epipolar Geometry Estimation for Urban Scenes with Repetitive Structures.
- Author
-
Kushnir, Maria and Shimshoni, Ilan
- Subjects
IMAGE recognition (Computer vision) ,CONSTRUCTION ,COMPUTERS in geometry ,ARTIFICIAL intelligence ,COMPUTER algorithms - Abstract
Algorithms for the estimation of epipolar geometry from a pair of images have been very successful in dealing with challenging wide baseline images. In this paper the problem of scenes with repeated structures is addressed, dealing with the common case where the overlap between the images consists mainly of facades of a building. These facades may contain many repeated structures that cannot be matched locally, causing state-of-the-art algorithms to fail. Assuming that the repeated structures lie on a planar surface in an ordered fashion, the goal is to match them. Our algorithm first rectifies the images such that the facade is fronto-parallel. It then clusters similar features in each of the two images and matches the clusters. From these, a set of hypothesized homographies of the facade is generated using local groups of features. For each homography the epipole is recovered, yielding a fundamental matrix. For the best solution, it then decides whether the fundamental matrix has been recovered reliably and, if not, returns only the homography. The algorithm has been tested on a large number of challenging image pairs of buildings from the benchmark ZuBuD database, outperforming several state-of-the-art algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
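The step in the abstract above where a fundamental matrix is obtained from a facade homography and a recovered epipole follows a standard identity from multiple-view geometry, F = [e']_x H, where [e']_x is the cross-product (skew-symmetric) matrix of the epipole. A minimal sketch, with a made-up homography and epipole purely for illustration:

```python
def skew(e):
    # Cross-product matrix [e]_x so that [e]_x v == e x v.
    ex, ey, ez = e
    return [[0.0, -ez, ey],
            [ez, 0.0, -ex],
            [-ey, ex, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, x):
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]

# Hypothetical facade homography H and epipole e (homogeneous coordinates).
H = [[1.1, 0.1, 5.0], [0.0, 0.9, -3.0], [0.0, 0.0, 1.0]]
e = [2.0, -1.0, 0.5]
F = matmul(skew(e), H)

# Sanity check: any correspondence x' = H x satisfies the epipolar
# constraint x'^T F x = 0, since v . (e x v) = 0.
x = [3.0, 4.0, 1.0]
xp = matvec(H, x)
epi = sum(xp[i] * matvec(F, x)[i] for i in range(3))
```

This is why each hypothesized homography, together with one epipole, is enough to propose a full fundamental matrix.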
46. Visual Tracking: An Experimental Survey.
- Author
-
Smeulders, Arnold W. M., Chu, Dung M., Cucchiara, Rita, Calderara, Simone, Dehghan, Afshin, and Shah, Mubarak
- Subjects
PATTERN recognition systems ,IMAGE processing ,BIG data ,LIGHTING ,ARTIFICIAL intelligence ,OBJECT tracking (Computer vision) ,COMPUTER network resources - Abstract
A large variety of trackers has been proposed in the literature during the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem; therefore, it remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
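The survival curves mentioned in the abstract above can be sketched simply: with a per-video quality score for each tracker and no censoring, the Kaplan-Meier estimate reduces to the empirical fraction of videos on which the tracker scores at or above a given threshold. The scores below are invented for illustration.

```python
def f_score(precision, recall):
    # Harmonic mean of precision and recall; 0 when both are 0.
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def survival_curve(scores, thresholds):
    # Fraction of videos with score >= t, for each threshold t.
    n = len(scores)
    return [sum(s >= t for s in scores) / n for t in thresholds]

# Hypothetical per-video F-scores for one tracker.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
curve = survival_curve(scores, [0.0, 0.5, 1.0])
```

Plotting such curves for all nineteen trackers gives a threshold-free view of how gracefully each one degrades across videos.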
47. What Makes a Photograph Memorable?
- Author
-
Isola, Phillip, Xiao, Jianxiong, Parikh, Devi, Torralba, Antonio, and Oliva, Aude
- Subjects
WEB browsing ,PRINT materials ,ARTIFICIAL intelligence ,IMAGE processing ,DATABASES - Abstract
When glancing at a magazine, or browsing the Internet, we are continuously exposed to photographs. Despite this overflow of visual information, humans are extremely good at remembering thousands of pictures along with some of their visual details. But not all images are equal in memory. Some stick in our minds while others are quickly forgotten. In this paper, we focus on the problem of predicting how memorable an image will be. We show that memorability is an intrinsic and stable property of an image that is shared across different viewers, and remains stable across delays. We introduce a database for which we have measured the probability that each picture will be recognized after a single view. We analyze a collection of image features, labels, and attributes that contribute to making an image memorable, and we train a predictor based on global image descriptors. We find that predicting image memorability is a task that can be addressed with current computer vision techniques. While making memorable images is a challenging task in visualization, photography, and education, this work is a first attempt to quantify this useful property of images. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
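The memorability score in the abstract above is, at its core, an empirical probability: the fraction of viewers who correctly recognized an image after a single previous view. A minimal sketch of that measurement, with entirely made-up hit/miss counts:

```python
def memorability(hits, misses):
    # Empirical probability that an image is recognized on repeat exposure.
    return hits / (hits + misses)

# Hypothetical per-image recognition counts (hits, misses) from a memory game.
images = {"sunset": (10, 40), "face": (45, 5), "skyline": (30, 20)}
ranked = sorted(images, key=lambda k: memorability(*images[k]), reverse=True)
```

Ranking images by this score is what makes it possible to ask which features and attributes separate the memorable ones from the forgettable ones.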
48. Interactive Stereoscopic Video Conversion.
- Author
-
Zhang, Zhebin, Zhou, Chen, Wang, Yizhou, and Gao, Wen
- Subjects
IMAGE processing ,SIGNAL processing ,STEREOSCOPIC views ,MOTION analysis ,COMPUTER vision ,ARTIFICIAL intelligence ,COMPUTER algorithms - Abstract
This paper presents a system for converting conventional monocular videos to stereoscopic ones. In the system, an input monocular video is first segmented into shots so as to reduce operations on similar frames. An automatic depth estimation method is proposed to compute the depth maps of the video frames utilizing three monocular depth cues: depth-from-defocus, aerial perspective, and motion. Foreground/background objects can be interactively segmented on selected key frames, and their depth values can be adjusted by users. Such results are propagated from key frames to nonkey frames within each video shot. Equipped with a depth-to-disparity conversion module, the system synthesizes the counterpart (either left or right) view for stereoscopic display by warping the original frames according to their disparity maps. The quality of converted videos is evaluated by human mean opinion scores, and experimental results demonstrate that the proposed conversion method achieves encouraging performance. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
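The depth-to-disparity conversion and warping steps described above can be sketched in one dimension: map each pixel's depth to a horizontal disparity (nearer pixels shift more) and scatter pixels to their shifted positions, leaving holes where no source pixel lands. The linear mapping, the 8-bit depth range, and the maximum disparity below are assumptions for illustration, not the paper's calibrated conversion.

```python
def depth_to_disparity(depth, d_max=16.0, depth_max=255.0):
    # Nearer pixels (smaller depth value here means farther: 255 = nearest plane)
    # receive larger disparity; depth_max maps to zero shift.
    return d_max * (1.0 - depth / depth_max)

def warp_row(row, depths):
    # Forward-warp one scanline into the synthesized view; 0 marks a hole
    # that a real system would inpaint.
    out = [0] * len(row)
    for x, (v, z) in enumerate(zip(row, depths)):
        xs = x + int(round(depth_to_disparity(z)))
        if 0 <= xs < len(row):
            out[xs] = v
    return out
```

With all pixels at the zero-disparity plane the scanline passes through unchanged; a uniformly near row shifts entirely out of this short line and leaves only holes.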
49. Integrating Orientation Cue With EOH-OLBP-Based Multilevel Features for Human Detection.
- Author
-
Ma, Yingdong, Deng, Liang, Chen, Xiankai, and Guo, Ning
- Subjects
IMAGE processing ,SIGNAL processing ,MOTION analysis ,COMPUTER vision ,ARTIFICIAL intelligence ,COMPUTER algorithms ,PATTERN recognition systems - Abstract
Detecting pedestrians efficiently and accurately is a fundamental step for many computer vision applications, such as smart cars and robotics. In this paper, we introduce a pedestrian detection system to extract human objects using an on-board monocular camera. First, we use an experiment to demonstrate that orientation information is critical in human detection. Second, the local binary patterns-based feature, oriented LBP (OLBP), is discussed. The OLBP feature integrates pixel intensity difference with texture orientation information to capture salient object features. Third, a set of edge orientation histogram (EOH) and OLBP-based intrablock and interblock features is presented to describe cell-level and block-level structure information. These multilevel features capture larger-scale structure information, which is more informative for pedestrian localization. Experiments on the Institut national de recherche en informatique et en automatique (INRIA) dataset and the Caltech pedestrian detection benchmark demonstrate that the new pedestrian detection system not only is comparable to existing pedestrian detectors but also runs at a faster speed. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
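The LBP family of features that the abstract above builds on starts from a simple code: compare each of the 8 neighbours of a pixel to its centre and pack the comparison bits into one byte. A minimal sketch of the plain LBP code on a 3x3 patch (the OLBP variant in the paper additionally folds in texture orientation, which is not shown here):

```python
def lbp_code(patch):
    # patch: 3x3 list of intensities; neighbours read clockwise from
    # the top-left corner, compared against the centre pixel.
    c = patch[1][1]
    nb = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
          patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i, v in enumerate(nb):
        if v >= c:
            code |= 1 << i
    return code
```

A flat patch yields the all-ones code 255, while a centre brighter than every neighbour yields 0; histograms of such codes over cells form the texture part of the block-level descriptors.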
50. Robust Target Tracking by Online Random Forests and Superpixels.
- Author
-
Wang, Wei, Wang, Chunping, Liu, Si, Zhang, Tianzhu, and Cao, Xiaochun
- Subjects
COMPUTER vision ,OBJECT tracking (Computer vision) ,IMAGE recognition (Computer vision) ,IMAGE processing ,ARTIFICIAL intelligence - Abstract
This paper presents a robust joint discriminative appearance model-based tracking method using online random forests and mid-level features (superpixels). To achieve superpixel-wise discriminative ability, we propose a joint appearance model that consists of two random forest-based models, i.e., the background-target discriminative model (BTM) and the distractor-target discriminative model (DTM). More specifically, the BTM effectively learns discriminative information between the target object and the background. In contrast, the DTM is used to suppress distracting superpixels, which significantly improves the tracker's robustness and alleviates the drifting problem. A novel online random forest regression algorithm is proposed to build the two models. The BTM and DTM are linearly combined into a joint model to compute a confidence map. Tracking results are estimated using the confidence map, in which the position and scale of the target are estimated sequentially. Furthermore, we design a model updating strategy to adapt to appearance changes over time by discarding degraded trees of the BTM and DTM and initializing new trees as replacements. We test the proposed tracking method on two large tracking benchmarks, the CVPR2013 tracking benchmark and the VOT2014 tracking challenge. Experimental results show that the tracker runs at real-time speed and achieves favorable tracking performance compared with the state-of-the-art methods. The results also suggest that the DTM improves tracking performance significantly and plays an important role in robust tracking. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
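The linear combination of the two forest outputs described above reduces, per superpixel, to a weighted blend of two scores; the superpixel with the highest blended confidence anchors the target estimate. The blending weight and the score values below are hypothetical, chosen only to show the mechanics.

```python
def joint_confidence(p_btm, p_dtm, alpha=0.6):
    # Per-superpixel blend of background-target (BTM) and
    # distractor-target (DTM) scores; alpha is a hypothetical weight.
    return [alpha * b + (1 - alpha) * d for b, d in zip(p_btm, p_dtm)]

# Made-up scores for three superpixels.
conf = joint_confidence([0.9, 0.2, 0.5], [0.8, 0.1, 0.4])
best = max(range(len(conf)), key=conf.__getitem__)
```

Because the DTM term penalizes distractor-like superpixels, a region that only the BTM likes gets pulled down in the joint map, which is the mechanism behind the reduced drifting the abstract reports.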