170 results
Search Results
2. Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects.
- Author
-
Warner, Elisa, Lee, Joonsang, Hsu, William, Syeda-Mahmood, Tanveer, Kahn Jr., Charles E., Gevaert, Olivier, and Rao, Arvind
- Subjects
CLINICAL decision support systems ,DECISION support systems ,MACHINE learning ,ARTIFICIAL intelligence ,IMAGE analysis ,DEEP learning - Abstract
Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing multimodal representation, fusion, translation, alignment, and co-learning, the paper explores the transformative potential of multimodal models for clinical predictions. It also highlights the need for principled assessments and practical implementation of such models, bringing attention to the dynamics between decision support systems and healthcare providers and personnel. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist. We conclude with a discussion on principled innovation and collaborative efforts to further the mission of seamless integration of multimodal ML models into biomedical practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Editorial.
- Author
-
Nielsen, Mads, Niessen, Wiro, and Westin, Carl-Fredrik
- Subjects
ARTIFICIAL intelligence ,IMAGE processing - Abstract
The article introduces the 2009 issue of the "International Journal of Computer Vision."
- Published
- 2009
- Full Text
- View/download PDF
4. Artificial Intelligence for Dunhuang Cultural Heritage Protection: The Project and the Dataset.
- Author
-
Yu, Tianxiu, Lin, Cong, Zhang, Shijie, Wang, Chunxue, Ding, Xiaohong, An, Huili, Liu, Xiaoxiang, Qu, Ting, Wan, Liang, You, Shaodi, Wu, Jian, and Zhang, Jiawan
- Subjects
ARTIFICIAL intelligence ,PROTECTION of cultural property ,CULTURAL property ,SILK Road ,CULTURAL intelligence ,IMAGE reconstruction ,COMPUTER vision - Abstract
In this work, we introduce our project on Dunhuang cultural heritage protection using artificial intelligence. The Dunhuang Mogao Grottoes in China, also known as the Grottoes of the Thousand Buddhas, are a religious and cultural heritage site located on the Silk Road. The grottoes were built from the 4th century to the 14th century. After thousands of years, the decay in the grottoes is serious. In addition, numerous historical records were destroyed throughout the years, making it difficult for archaeologists to reconstruct history. We aim to use modern computer vision and machine learning technologies to address such challenges. First, we propose to use deep networks to perform the restoration automatically. Through our experiments, we find that automated restoration can provide quality comparable to manual restorations by an archaeologist. This can significantly speed up the restoration given the enormous size of the historical paintings. Second, we propose to use detection and retrieval to further analyze the tremendously large number of objects, since it is impractical to label and analyze them all manually. Several state-of-the-art methods are rigorously tested and quantitatively compared under different criteria and across categories. In this work, we created a new dataset, namely AI for Dunhuang, to facilitate the research. Version v1.0 of the dataset comprises data and labels for restoration, style transfer, detection, and retrieval. Specifically, the dataset has 10,000 images for restoration, 3455 for style transfer, and 6147 for property retrieval. Lastly, we propose to use style transfer to link and analyze styles over time, given that the grottoes were built over 1000 years by numerous artists. This enables the analysis and study of art styles across 1000 years and opens up future research on cross-era style analysis. We benchmark representative methods and conduct a comparative study of the results for our solution. 
The dataset will be publicly available along with this paper. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. Special Issue on Generating Realistic Visual Data of Human Behavior.
- Author
-
Alameda-Pineda, Xavier, Ricci, Elisa, Salah, Albert Ali, Sebe, Nicu, and Yan, Shuicheng
- Subjects
HUMAN behavior ,BEHAVIORAL assessment ,FACIAL expression ,HUMAN body ,VISUAL analytics ,ARTIFICIAL intelligence - Abstract
The fast and broad progress in AI has not only enabled great advances in the analysis of human behavior but has also opened new possibilities for generating realistic human-like behavioral data. For human behavior understanding in video, semantic segmentation at the pixel level can be seen as local modeling, whereas aggregation of motion cues is at a more global level of analysis. [Extracted from the article]
- Published
- 2020
- Full Text
- View/download PDF
6. A Viewpoint Invariant, Sparsely Registered, Patch Based, Face Verifier.
- Author
-
Lucey, Simon and Chen, Tsuhan
- Subjects
FACE perception ,COMPUTER algorithms ,COMPUTER vision ,IMAGE processing ,PATTERN recognition systems ,ARTIFICIAL intelligence - Abstract
Sparsely registering a face (i.e., locating 2–3 fiducial points) is considered a much easier task than densely registering one, especially with varying viewpoints. Unfortunately, the converse tends to be true for the task of viewpoint-invariant face verification: the more registration points one has, the better the performance. In this paper we present a novel approach to viewpoint invariant face verification which we refer to as the “patch-whole” algorithm. The algorithm is able to obtain good verification performance with sparsely registered faces. Good performance is achieved by not assuming any alignment between gallery and probe view faces, but instead trying to learn the joint likelihood functions for faces of similar and dissimilar identities. Generalization is encouraged by factorizing the joint gallery and probe appearance likelihood, for each class, into an ensemble of “patch-whole” likelihoods. We make an additional contribution in this paper by reviewing existing approaches to viewpoint-invariant face verification and demonstrating how most of them fall into one of two categories; namely viewpoint-generative or viewpoint-discriminative. This categorization is instructive as it enables us to compare our “patch-whole” algorithm to other paradigms in viewpoint-invariant face verification and also gives deeper insights into why the algorithm performs so well. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
7. What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?
- Author
-
Mayer, Nikolaus, Ilg, Eddy, Fischer, Philipp, Dosovitskiy, Alexey, Brox, Thomas, Hazirbas, Caner, and Cremers, Daniel
- Subjects
DEEP learning ,COMPUTER vision ,VISUAL perception ,ANNOTATIONS ,ARTIFICIAL intelligence - Abstract
The finding that very large networks can be trained efficiently and reliably has led to a paradigm shift in computer vision from engineered solutions to learning formulations. As a result, the research challenge shifts from devising algorithms to creating suitable and abundant training data for supervised learning. How to efficiently create such training data? The dominant data acquisition method in visual recognition is based on web data and manual annotation. Yet, for many computer vision problems, such as stereo or optical flow estimation, this approach is not feasible because humans cannot manually enter a pixel-accurate flow field. In this paper, we promote the use of synthetically generated data for the purpose of training deep networks on such tasks. We suggest multiple ways to generate such data and evaluate the influence of dataset properties on the performance and generalization properties of the resulting networks. We also demonstrate the benefit of learning schedules that use different types of data at selected stages of the training process. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
8. Reconstruction from Projections Using Grassmann Tensors.
- Author
-
Hartley, Richard and Schaffalitzky, Frederik
- Subjects
SET (Computer network protocol) ,PROJECTIVE spaces ,ARTIFICIAL intelligence ,COMPUTER vision ,PATTERN recognition systems ,IMAGE reconstruction - Abstract
In this paper a general procedure is given for reconstruction of a set of feature points in an arbitrary dimensional projective space from their projections into lower dimensional spaces. This extends the methods applied in the well-studied problem of reconstruction of scene points in ℙ³ given their projections in a set of images. In this case, the bifocal, trifocal and quadrifocal tensors are used to carry out this computation. It is shown that similar methods will apply in a much more general context, and hence may be applied to projections from ℙⁿ to ℙᵐ, which have been used in the analysis of dynamic scenes, and in radial distortion correction. For sufficiently many generic projections, reconstruction of the scene is shown to be unique up to projectivity, except in the case of projections onto one-dimensional image spaces (lines), in which case there are two solutions. Projections from ℙⁿ to ℙ² have been considered by Wolf and Shashua (in International Journal of Computer Vision 48(1): 53–67), where they were applied to several different problems in dynamic scene analysis. They analyzed these projections using tensors, but no general way of defining such tensors or computing the projections was given. This paper settles the general problem, showing that tensor definition and retrieval of the projections is always possible. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
9. Semantic Representation and Recognition of Continued and Recursive Human Activities.
- Author
-
Ryoo, M. and Aggarwal, J.
- Subjects
SELF-organizing systems ,KNOWLEDGE representation (Information theory) ,COMPUTER vision ,INTELLIGENT agents ,IMAGE processing ,FIFTH generation computers ,MACHINE theory ,ARTIFICIAL intelligence - Abstract
This paper describes a methodology for automated recognition of complex human activities. The paper proposes a general framework which reliably recognizes high-level human actions and human-human interactions. Our approach is a description-based approach, which enables a user to encode the structure of a high-level human activity as a formal representation. Recognition of human activities is done by semantically matching constructed representations with actual observations. The methodology uses a context-free grammar (CFG) based representation scheme as a formal syntax for representing composite activities. Our CFG-based representation enables us to define complex human activities based on simpler activities or movements. Our system takes advantage of both statistical recognition techniques from computer vision and knowledge representation concepts from traditional artificial intelligence. In the low level of the system, image sequences are processed to extract poses and gestures. Based on the recognition of gestures, the high level of the system hierarchically recognizes composite actions and interactions occurring in a sequence of image frames. The concepts of hallucinations and a probabilistic semantic-level recognition algorithm are introduced to cope with imperfect lower layers. As a result, the system recognizes human activities including ‘fighting’ and ‘assault’, high-level activities that previous systems had difficulty recognizing. The experimental results show that our system reliably recognizes sequences of complex human activities with a high recognition rate. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
10. An Intrinsic Framework for Analysis of Facial Surfaces.
- Author
-
Samir, Chafik, Srivastava, Anuj, Daoudi, Mohamed, and Klassen, Eric
- Subjects
BIOMETRY ,DIFFERENTIAL geometry ,CURVES ,FACE perception ,COMPUTER vision ,PATTERN recognition systems ,ARTIFICIAL intelligence ,IMAGE processing - Abstract
A statistical analysis of shapes of facial surfaces can play an important role in biometric authentication and other face-related applications. The main difficulty in developing such an analysis comes from the lack of a canonical system to represent and compare all facial surfaces. This paper suggests a specific, yet natural, coordinate system on facial surfaces that enables comparisons of their shapes. Here a facial surface is represented as an indexed collection of closed curves, called facial curves, that are level curves of a surface distance function from the tip of the nose. Defining the space of all such representations of faces, this paper studies its differential geometry and endows it with a Riemannian metric. It presents numerical techniques for computing geodesic paths between facial surfaces in that space. This Riemannian framework is then used to: (i) compute distances between faces to quantify differences in their shapes, (ii) find optimal deformations between faces, and (iii) define and compute the average of a given set of faces. Experimental results generated using laser-scanned faces are presented to demonstrate these ideas. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
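The facial-curve representation described in the abstract above admits a compact illustration. The sketch below is a hypothetical stand-in: it collects grid points near one level of a precomputed distance map, rather than extracting true level curves of a geodesic surface distance from a scanned mesh as the paper does.

```python
def level_curve_points(dist_grid, level, tol=0.5):
    """Collect grid points whose distance value lies within tol of `level`.

    A crude discrete stand-in for one 'facial curve', i.e. a level curve
    of the distance function measured from a reference point (the nose tip
    in the paper's setting)."""
    return [(i, j)
            for i, row in enumerate(dist_grid)
            for j, d in enumerate(row)
            if abs(d - level) <= tol]
```

Indexing the curves by their level value then gives the ordered collection of curves that the paper endows with a Riemannian metric.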
11. Detecting Pedestrians Using Patterns of Motion and Appearance.
- Author
-
Viola, Paul, Jones, Michael, and Snow, Daniel
- Subjects
DETECTORS ,IMAGE processing ,COMPUTER vision ,ARTIFICIAL intelligence ,INFORMATION processing ,IMAGING systems ,MOTION - Abstract
This paper describes a pedestrian detection system that integrates image intensity information with motion information. We use a detection style algorithm that scans a detector over two consecutive frames of a video sequence. The detector is trained (using AdaBoost) to take advantage of both motion and appearance information to detect a walking person. Past approaches have built detectors based on motion information or detectors based on appearance information, but ours is the first to combine both sources of information in a single detector. The implementation described runs at about 4 frames/second, detects pedestrians at very small scales (as small as 20 × 15 pixels), and has a very low false positive rate. Our approach builds on the detection work of Viola and Jones. Novel contributions of this paper include: (i) development of a representation of image motion which is extremely efficient, and (ii) implementation of a state-of-the-art pedestrian detection system which operates on low resolution images under difficult conditions (such as rain and snow). [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
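Since the abstract above names AdaBoost as the training procedure, a minimal sketch may be useful. The stumps below threshold raw feature values directly; the actual system uses Haar-like appearance and motion filters computed over frame pairs, which are not reproduced here.

```python
import math

def train_adaboost(X, y, rounds=10):
    """AdaBoost with 1-D threshold stumps. Labels y are +1 / -1.

    Each round picks the (feature, threshold, polarity) stump with the
    lowest weighted error, then re-weights samples to focus on mistakes."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(rounds):
        best = None  # (weighted_error, feature, threshold, polarity)
        for f in range(d):
            for t in sorted({x[f] for x in X}):
                for p in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if (p if xi[f] >= t else -p) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, p)
        err, f, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, p))
        # up-weight misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * yi * (p if xi[f] >= t else -p))
             for wi, xi, yi in zip(w, X, y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict_boost(x, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (p if x[f] >= t else -p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```

On a toy separable set, a single strong stump dominates the vote; in the paper's cascade setting, many such weak classifiers are combined and scanned over image windows.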
12. Image Parsing: Unifying Segmentation, Detection, and Recognition.
- Author
-
Tu, Zhuowen, Chen, Xiangrong, Yuille, Alan, and Zhu, Song-Chun
- Subjects
IMAGE processing ,BAYESIAN analysis ,ALGORITHMS ,COMPUTER vision ,ARTIFICIAL intelligence - Abstract
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns: generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only, then image parsing will correspond to image segmentation (Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657-673)). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
13. Correctness Prediction, Accuracy Improvement and Generalization of Stereo Matching Using Supervised Learning.
- Author
-
Spyropoulos, Aristotle and Mordohai, Philippos
- Subjects
MACHINE learning ,COMPUTER vision ,ARTIFICIAL intelligence ,MACHINE theory ,ESTIMATION theory - Abstract
Machine learning has been instrumental in most areas of computer vision, but has not been applied to the problem of stereo matching with similar frequency or success. In this paper, we present a supervised learning approach by defining a set of features that capture various forms of information about each pixel, and then by using them to predict the correctness of stereo matches based on a random forest. We show highly competitive results in predicting the correctness of matches and in confidence estimation, which allows us to rank pixels according to the reliability of their assigned disparities. Moreover, we show how these confidence values can be used to improve the accuracy of disparity maps by integrating them with an MRF-based stereo algorithm. This is an important distinction from current literature that has mainly focused on sparsification by removing potentially erroneous disparities to generate quasi-dense disparity maps. Finally, we demonstrate domain generalization of our method by applying classifiers to datasets different than those they were trained on with minimal loss of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
14. Video Question Answering with Spatio-Temporal Reasoning.
- Author
-
Jang, Yunseok, Song, Yale, Kim, Chris Dongjoo, Yu, Youngjae, Kim, Youngjin, and Kim, Gunhee
- Subjects
VIDEOS ,ARTIFICIAL intelligence ,QUESTION answering systems ,NATURAL languages ,REASONING ,QUESTIONING - Abstract
Vision and language understanding has emerged as a subject undergoing intense study in Artificial Intelligence. Among many tasks in this line of research, visual question answering (VQA) has been one of the most successful ones, where the goal is to learn a model that understands visual content at region-level details and finds their associations with pairs of questions and answers in natural language form. Despite the rapid progress in the past few years, most existing work in VQA has focused primarily on images. In this paper, we focus on extending VQA to the video domain and contribute to the literature in three important ways. First, we propose three new tasks designed specifically for video VQA, which require spatio-temporal reasoning from videos to answer questions correctly. Next, we introduce a new large-scale dataset for video VQA named TGIF-QA that extends existing VQA work with our new tasks. Finally, we propose a dual-LSTM based approach with both spatial and temporal attention and show its effectiveness over conventional VQA techniques through empirical evaluations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. Unsupervised Learning of Foreground Object Segmentation.
- Author
-
Croitoru, Ioana, Bogolin, Simion-Vlad, and Leordeanu, Marius
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,STUDENT teachers ,LEARNING problems ,IMAGE segmentation - Abstract
Unsupervised learning represents one of the most interesting challenges in computer vision today. The task has immense practical value with many applications in artificial intelligence and emerging technologies, as large quantities of unlabeled images and videos can be collected at low cost. In this paper, we address the unsupervised learning problem in the context of segmenting the main foreground objects in single images. We propose an unsupervised learning system, which has two pathways, the teacher and the student, respectively. The system is designed to learn over several generations of teachers and students. At every generation the teacher performs unsupervised object discovery in videos or collections of images and an automatic selection module picks up good frame segmentations and passes them to the student pathway for training. At every generation multiple students are trained, with different deep network architectures to ensure a better diversity. The students at one iteration help in training a better selection module, forming together a more powerful teacher pathway at the next iteration. In experiments, we show that the improvement in the selection power, the training of multiple students and the increase in unlabeled data significantly improve segmentation accuracy from one generation to the next. Our method achieves top results on three current datasets for object discovery in video, unsupervised image segmentation and saliency detection. At test time, the proposed system is fast, being one to two orders of magnitude faster than published unsupervised methods. We also test the strength of our unsupervised features within a well-known transfer learning setup and achieve competitive performance, proving that our unsupervised approach can be reliably used in a variety of computer vision tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
16. Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection.
- Author
-
Li, Hongyang, Liu, Yu, Wang, Xiaogang, and Ouyang, Wanli
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,OBJECT tracking (Computer vision) ,DEEP learning ,ARTIFICIAL neural networks - Abstract
In this paper, we propose a zoom-out-and-in network for generating object proposals. A key observation is that it is difficult to classify anchors of different sizes with the same set of features. Anchors of different sizes should be placed accordingly based on different depths within a network: smaller boxes on high-resolution layers with a smaller stride, while larger boxes on low-resolution counterparts with a larger stride. Inspired by the conv/deconv structure, we fully leverage the low-level local details and high-level regional semantics from two feature map streams, which are complementary to each other, to identify the objectness in an image. A map attention decision (MAD) unit is further proposed to aggressively search for neuron activations among the two streams and attend to the most contributive ones for the feature learning of the final loss. The unit serves as a decision-maker to adaptively activate maps along certain channels with the sole purpose of optimizing the overall training loss. One advantage of MAD is that the learned weights enforced on each feature channel are predicted on-the-fly based on the input context, which is more suitable than the fixed enforcement of a convolutional kernel. Experimental results on three datasets demonstrate the effectiveness of our proposed algorithm over other state-of-the-art methods, in terms of average recall for region proposal and average precision for object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
17. L_q-Closest-Point to Affine Subspaces Using the Generalized Weiszfeld Algorithm.
- Author
-
Aftab, Khurrum, Hartley, Richard, and Trumpf, Jochen
- Subjects
COMPUTER vision ,PATTERN recognition systems ,IMAGE processing ,ARTIFICIAL intelligence ,VISUAL optics - Abstract
This paper presents a method for finding an L_q-closest-point to a set of affine subspaces, that is, a point for which the sum of the q-th power of orthogonal distances to all the subspaces is minimized, where 1 ≤ q < 2. We give a theoretical proof for the convergence of the proposed algorithm to a unique L_q minimum. The proposed method is motivated by the L_q Weiszfeld algorithm, an extremely simple and rapid averaging algorithm, that finds the L_q mean of a set of given points in a Euclidean space. The proposed algorithm is applied to the triangulation problem in computer vision by finding the L_q-closest-point to a set of lines in 3D. Our experimental results for the triangulation problem confirm that the L_q-closest-point method, for 1 ≤ q < 2, is more robust to outliers than the L_2-closest-point method. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
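The abstract above mentions the L_q Weiszfeld algorithm for the mean of a set of points, which the paper generalizes to affine subspaces. A minimal sketch of the point-set case (not the subspace generalization) follows; the re-weighting exponent q - 2 and the small guard eps are the only implementation choices made here.

```python
import math

def lq_mean(points, q=1.0, iters=200, eps=1e-9):
    """Generalized Weiszfeld iteration for the L_q mean of points in R^d,
    minimizing sum_i ||x - p_i||^q for 1 <= q < 2.

    Each step re-weights every point by w_i = ||x - p_i||^(q - 2) and moves
    x to the weighted average; eps guards against a zero distance."""
    d = len(points[0])
    # start from the arithmetic mean
    x = [sum(p[k] for p in points) / len(points) for k in range(d)]
    for _ in range(iters):
        num, den = [0.0] * d, 0.0
        for p in points:
            dist = math.sqrt(sum((x[k] - p[k]) ** 2 for k in range(d)))
            w = (dist + eps) ** (q - 2)
            for k in range(d):
                num[k] += w * p[k]
            den += w
        x = [num[k] / den for k in range(d)]
    return x
```

With q = 1 this computes the geometric median, which is far less sensitive to an outlying point than the arithmetic mean, matching the robustness claim in the abstract.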
18. Equiangular Basis Vectors: A Novel Paradigm for Classification Tasks.
- Author
-
Shen, Yang, Sun, Xuhao, Wei, Xiu-Shen, Xu, Anqi, and Gao, Lingyan
- Subjects
IMAGE recognition (Computer vision) ,OBJECT recognition (Computer vision) ,COMPUTER vision ,MACHINE learning ,DEEP learning ,ARTIFICIAL intelligence - Abstract
In this paper, we propose Equiangular Basis Vectors (EBVs) as a novel training paradigm of deep learning for image classification tasks. Differing from prominent training paradigms, e.g., k-way classification layers (mapping the learned representations to the label space) and deep metric learning (quantifying sample similarity), our method generates normalized vector embeddings as "predefined classifiers", which act as the fixed learning targets corresponding to different categories. By minimizing the spherical distance between the embedding of an input and its categorical EBV during training, predictions can be obtained by identifying the categorical EBV with the smallest distance during inference. More importantly, by directly adding EBVs corresponding to newly added categories of equal status on the basis of existing EBVs, our method exhibits strong scalability to deal with a large increase of training categories in open-environment machine learning. In experiments, we evaluate EBVs on diverse computer vision tasks with large-scale real-world datasets, including classification on ImageNet-1K, object detection on COCO, semantic segmentation on ADE20K, etc. We further collected a dataset consisting of 100,000 categories to validate the superior performance of EBVs when handling a large number of categories. Comprehensive experiments validate both the effectiveness and scalability of our EBVs. Our method won first place in the 2022 DIGIX Global AI Challenge; the code along with all associated logs is open-source and available at https://github.com/aassxun/Equiangular-Basis-Vectors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
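The classification rule described in the abstract above, nearest predefined vector on the sphere, can be sketched compactly. Random unit vectors below are a cheap stand-in for properly optimized equiangular vectors, and the embedding network is omitted entirely.

```python
import math
import random

def make_ebvs(num_classes, dim, seed=0):
    """Fixed normalized vectors used as per-class learning targets.

    True EBVs are chosen so pairwise angles are (near-)equal; random unit
    vectors in a high-dimensional space are only a cheap stand-in here."""
    rng = random.Random(seed)
    ebvs = []
    for _ in range(num_classes):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        ebvs.append([c / norm for c in v])
    return ebvs

def predict(embedding, ebvs):
    """Pick the class whose EBV is closest on the sphere -- equivalently,
    the one with the largest cosine similarity to the embedding."""
    norm = math.sqrt(sum(c * c for c in embedding))
    unit = [c / norm for c in embedding]
    sims = [sum(u * e for u, e in zip(unit, ebv)) for ebv in ebvs]
    return max(range(len(ebvs)), key=sims.__getitem__)
```

Adding a new category amounts to appending one more fixed vector, which is the scalability property the abstract emphasizes.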
19. A Predual Proximal Point Algorithm Solving a Non Negative Basis Pursuit Denoising Model.
- Author
-
Malgouyres, F. and Zeng, T.
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,PATTERN recognition systems ,IMAGE processing ,IMAGE reconstruction ,IMAGE registration - Abstract
This paper develops an implementation of a Predual Proximal Point Algorithm (PPPA) solving a Non Negative Basis Pursuit Denoising model. The model imposes a constraint on the ℓ2 norm of the residual, instead of penalizing it. The PPPA solves the predual of the problem with a Proximal Point Algorithm (PPA). Moreover, the minimization that needs to be performed at each iteration of the PPA is solved with a dual method. We can prove that these dual variables converge to a solution of the initial problem. Our analysis proves that we turn a constrained non-differentiable convex problem into a short sequence of nice concave maximization problems. By nice, we mean that the functions which are maximized are differentiable and their gradients are Lipschitz. The algorithm is easy to implement, easier to tune and more general than the algorithms found in the literature. In particular, it can be applied to the Basis Pursuit Denoising (BPDN) and the Non Negative Basis Pursuit Denoising (NNBPDN) models, and it does not make any assumption on the dictionary. We prove its convergence to the set of solutions of the model and provide some convergence rates. Experiments on image approximation show that the performance of the PPPA is at the current state of the art for the BPDN. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
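As a rough illustration of the proximal machinery involved, the sketch below applies proximal-gradient (ISTA-style) iterations to the *penalized* non-negative lasso. Note that the paper's model instead constrains the residual norm and solves a predual problem; neither the PPPA nor the constrained formulation is reproduced here.

```python
def prox_nonneg_l1(v, lam):
    """Proximal operator of lam*||x||_1 restricted to x >= 0:
    one-sided soft-thresholding, argmin_{x>=0} 0.5*||x - v||^2 + lam*sum(x)."""
    return [max(vi - lam, 0.0) for vi in v]

def ista_nonneg(A, b, lam=0.1, iters=500):
    """Proximal-gradient sketch for min_{x>=0} 0.5*||Ax - b||^2 + lam*||x||_1.

    The step size 1/||A||_F^2 is a crude but safe bound on the inverse
    Lipschitz constant of the smooth term's gradient."""
    m, n = len(A), len(A[0])
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        # residual r = Ax - b and gradient g = A^T r
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = prox_nonneg_l1([x[j] - step * g[j] for j in range(n)], step * lam)
    return x
```

The prox step is where the non-negativity and sparsity of NNBPDN-type models enter: gradient descent on the data term, followed by one-sided soft-thresholding.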
20. Global Optimization through Rotation Space Search.
- Author
-
Hartley, Richard and Kahl, Fredrik
- Subjects
PATTERN recognition systems ,IMAGE processing ,MATHEMATICAL optimization ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGING systems ,BRANCH & bound algorithms - Abstract
This paper introduces a new algorithmic technique for solving certain problems in geometric computer vision. The main novelty of the method is a branch-and-bound search over rotation space, which is used in this paper to determine camera orientation. By searching over all possible rotations, problems can be reduced to known fixed-rotation problems for which optimal solutions have been previously given. In particular, a method is developed for the estimation of the essential matrix, giving the first guaranteed optimal algorithm for estimating the relative pose using a cost function based on reprojection errors. Recently convex optimization techniques have been shown to provide optimal solutions to many of the common problems in structure from motion. However, they do not apply to problems involving rotations. The search method described in this paper allows such problems to be solved optimally. Apart from the essential matrix, the algorithm is applied to the camera pose problem, providing an optimal algorithm. The approach has been implemented and tested on a number of both synthetically generated and real data sets with good performance. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
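The branch-and-bound idea in the abstract above is easy to sketch in one dimension: score each interval with a lower bound valid for Lipschitz functions and prune intervals that cannot beat the incumbent. The paper runs the analogous search over the 3-D space of rotations with geometrically derived bounds; the 1-D version below is only an illustration.

```python
import heapq

def bnb_minimize(f, lo, hi, lipschitz, tol=1e-4):
    """1-D branch and bound for an L-Lipschitz function f on [lo, hi].

    Each interval [a, b] is scored by the valid lower bound
    f(mid) - L*(b - a)/2; the interval with the smallest bound is split
    first, and the search stops once no remaining bound can improve on
    the best evaluated value by more than tol."""
    def node(a, b):
        m = (a + b) / 2.0
        v = f(m)
        return (v - lipschitz * (b - a) / 2.0, v, a, b)

    root = node(lo, hi)
    best_val, best_x = root[1], (lo + hi) / 2.0
    heap = [root]
    while heap:
        bound, v, a, b = heapq.heappop(heap)
        if bound >= best_val - tol:
            break  # smallest remaining bound: incumbent is tol-optimal
        if v < best_val:
            best_val, best_x = v, (a + b) / 2.0
        m = (a + b) / 2.0
        heapq.heappush(heap, node(a, m))
        heapq.heappush(heap, node(m, b))
    return best_x, best_val
```

The guarantee at termination is the same one the paper exploits: every unexplored region has a certified lower bound within tol of the incumbent, so the returned minimum is globally tol-optimal.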
21. Topology-Invariant Similarity of Nonrigid Shapes.
- Author
-
Bronstein, Alexander, Bronstein, Michael, and Kimmel, Ron
- Subjects
TOPOLOGY ,POLYHEDRA ,LINEAR algebra ,SIMILARITY (Geometry) ,COMPUTER vision ,PATTERN recognition systems ,IMAGE processing ,ARTIFICIAL intelligence - Abstract
This paper explores the problem of similarity criteria between nonrigid shapes. Broadly speaking, such criteria are divided into intrinsic and extrinsic, the first referring to the metric structure of the object and the latter to how it is laid out in the Euclidean space. Both criteria have their advantages and disadvantages: extrinsic similarity is sensitive to nonrigid deformations, while intrinsic similarity is sensitive to topological noise. In this paper, we approach the problem from the perspective of metric geometry. We show that by unifying the extrinsic and intrinsic similarity criteria, it is possible to obtain a stronger topology-invariant similarity, suitable for comparing deformed shapes with different topology. We construct this new joint criterion as a tradeoff between the extrinsic and intrinsic similarity and use it as a set-valued distance. Numerical results demonstrate the efficiency of our approach in cases where using either extrinsic or intrinsic criteria alone would fail. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
22. An Improved FoE Model for Image Deblurring.
- Author
-
Xu, Dahong and Wang, Runsheng
- Subjects
IMAGE reconstruction ,IMAGE processing ,INFORMATION processing ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGING systems - Abstract
Image restoration from a noisy and blurred image is an important task in image processing and computer vision systems. In this paper, an improved Fields of Experts model for deconvolution of isotropic Gaussian blur is developed, where edges are preserved in deconvolution by introducing local prior information. The edges with different local backgrounds in a blurred image are retained since local prior information is adaptively estimated. Experiments indicate that the proposed approach is capable of producing highly accurate solutions and preserving more edges and object boundaries than many other algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
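For context on the deconvolution setting the abstract above addresses, a classical Wiener-filter baseline (not the improved FoE model itself) can be sketched as follows; the noise-to-signal ratio `nsr` is an assumed regularization parameter:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Classical Wiener deconvolution baseline: invert a known blur kernel
    in the Fourier domain, regularized by a noise-to-signal ratio."""
    H = np.fft.fft2(psf, s=blurred.shape)        # blur transfer function
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)      # Wiener filter
    return np.real(np.fft.ifft2(W * G))
```

Edge-preserving priors such as the FoE model are aimed precisely at the artifacts this frequency-domain inverse introduces around edges.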
23. Modeling the World from Internet Photo Collections.
- Author
-
Snavely, Noah, Seitz, Steven, and Szeliski, Richard
- Subjects
SEARCH algorithms ,VISUAL perception ,COMPUTER vision ,IMAGE processing ,ARTIFICIAL intelligence ,VISUAL programming languages (Computer science) ,PATTERN recognition systems - Abstract
There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like “Notre Dame” or “Trevi Fountain.” This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world’s well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
24. Simultaneous Segmentation and Pose Estimation of Humans Using Dynamic Graph Cuts.
- Author
-
Kohli, Pushmeet, Rihan, Jonathan, Bray, Matthieu, and Torr, Philip
- Subjects
STOCHASTIC processes ,RANDOM fields ,COMPUTER vision ,IMAGE processing ,PATTERN recognition systems ,ARTIFICIAL intelligence - Abstract
This paper presents a novel algorithm for performing integrated segmentation and 3D pose estimation of a human body from multiple views. Unlike other state-of-the-art methods, which focus on either segmentation or pose estimation individually, our approach tackles these two tasks together. Our method works by optimizing a cost function based on a Conditional Random Field (CRF). This has the advantage that all information in the image (edges, background and foreground appearances), as well as the prior information on the shape and pose of the subject, can be combined and used in a Bayesian framework. Optimizing such a cost function would normally be computationally infeasible; however, our recent research in dynamic graph cuts allows this to be done much more efficiently than before. We demonstrate the efficacy of our approach on challenging motion sequences. Although we target the human pose inference problem in the paper, our method is completely generic and can be used to segment and infer the pose of any rigid, deformable or articulated object. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
25. Constraints Between Distant Lines in the Labelling of Line Drawings of Polyhedral Scenes.
- Author
-
Cooper, Martin
- Subjects
POLYHEDRAL functions ,MODULAR functions ,LINE drawing ,COMPUTER vision ,IMAGE processing ,ARTIFICIAL intelligence - Abstract
The machine interpretation of line drawings has applications both in vision and geometric modelling. This paper extends the classic technique of assigning semantic labels to lines subject to junction constraints, by introducing new constraints (often between distant lines). These include generic constraints between lines lying on a path in the drawing as well as preference constraints between the labellings of pairs of junctions lying on parallel lines. Such constraints are essential to avoid an exponential number of legal labellings of drawings of objects with non-trihedral vertices. The strength of these constraints is demonstrated by their ability to identify the unique correct labelling of many drawings of polyhedral objects with tetrahedral vertices. These new constraints also allowed us to deduce a general polyhedral junction constraint for the case when there is no limit on the number of faces which can meet at a junction. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
26. Image-Based Rendering Using Image-Based Priors.
- Author
-
Fitzgibbon, Andrew, Wexler, Yonatan, and Zisserman, Andrew
- Subjects
IMAGE processing ,COMPUTER vision ,ARTIFICIAL intelligence ,INFORMATION processing ,IMAGING systems - Abstract
Given a set of images acquired from known viewpoints, we describe a method for synthesizing the image which would be seen from a new viewpoint. In contrast to existing techniques, which explicitly reconstruct the 3D geometry of the scene, we transform the problem to the reconstruction of colour rather than depth. This retains the benefits of geometric constraints, but projects out the ambiguities in depth estimation which occur in textureless regions. On the other hand, regularization is still needed in order to generate high-quality images. The paper’s second contribution is to constrain the generated views to lie in the space of images whose texture statistics are those of the input images. This amounts to an image-based prior on the reconstruction which regularizes the solution, yielding realistic synthetic views. Examples are given of new view generation for cameras interpolated between the acquisition viewpoints, which enables synthetic steadicam stabilization of a sequence with a high level of realism. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
27. Baseline Detection and Localization for Invisible Omnidirectional Cameras.
- Author
-
Ishiguro, Hiroshi, Sogo, Takushi, and Barth, Matthew
- Subjects
CAMERAS ,PHOTOGRAPHIC equipment ,COMPUTER vision ,IMAGE processing ,ARTIFICIAL intelligence - Abstract
Two key problems for camera networks that observe wide areas with many distributed cameras are self-localization and camera identification. Although there are many methods for localizing the cameras, one of the easiest and most desired methods is to estimate camera positions by having the cameras observe each other; hence the term self-localization. If the cameras have a wide viewing field, e.g. an omnidirectional camera, and can observe each other, baseline distances between pairs of cameras and relative locations can be determined. However, if the projection of a camera is relatively small on the image of other cameras and is not readily visible, the baselines cannot be detected. In this paper, a method is proposed to determine the baselines and relative locations of these “invisible” cameras. The method consists of two processes executed simultaneously: (a) statistically detecting the baselines among the cameras, and (b) localizing the cameras by using information from (a) and propagating triangle constraints. Process (b) handles localization in the case where the cameras can observe each other, and it does not require complete observation among the cameras. However, if many cameras cannot observe each other because of poor image resolution, it does not work; the baseline detection by process (a) solves this problem. This methodology is described in detail and results are provided for several scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
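The triangle-constraint propagation in process (b) above builds on a simple geometric step: three pairwise baseline distances fix three camera positions up to a rigid motion and reflection. A minimal sketch (the coordinate-frame choice is illustrative):

```python
import math

def localize_three_cameras(d01, d02, d12):
    """Recover three camera positions from their pairwise baseline
    distances, fixing camera 0 at the origin and camera 1 on the x-axis
    (so the result is unique up to a rigid motion and reflection)."""
    c0 = (0.0, 0.0)
    c1 = (d01, 0.0)
    # Law of cosines gives camera 2's coordinates in this frame.
    x = (d01 ** 2 + d02 ** 2 - d12 ** 2) / (2.0 * d01)
    y = math.sqrt(max(d02 ** 2 - x ** 2, 0.0))
    return c0, c1, (x, y)
```

Propagating such triangles through the network, as the paper describes, extends the localization to cameras beyond the initial triple.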
28. Lucas-Kanade 20 Years On: A Unifying Framework.
- Author
-
Baker, Simon and Matthews, Iain
- Subjects
COMPUTER vision ,ALGORITHMS ,ALGEBRA ,ARTIFICIAL intelligence ,IMAGE processing ,APPROXIMATION theory - Abstract
Since the Lucas-Kanade algorithm was proposed in 1981, image alignment has become one of the most widely used techniques in computer vision. Applications range from optical flow and tracking to layered motion, mosaic construction, and face coding. Numerous algorithms have been proposed and a wide variety of extensions have been made to the original formulation. We present an overview of image alignment, describing most of the algorithms and their extensions in a consistent framework. We concentrate on the inverse compositional algorithm, an efficient algorithm that we recently proposed. We examine which of the extensions to Lucas-Kanade can be used with the inverse compositional algorithm without any significant loss of efficiency, and which cannot. In this paper, Part 1 in a series of papers, we cover the quantity approximated, the warp update rule, and the gradient descent approximation. In future papers, we will cover the choice of the error function, how to allow linear appearance variation, and how to impose priors on the parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
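The Gauss-Newton iteration at the heart of Lucas-Kanade can be sketched for the simplest warp, a one-dimensional translation (this is the classic forward-additive variant, not the inverse compositional algorithm the paper concentrates on):

```python
import numpy as np

def lucas_kanade_translation(template, image, n_iters=50):
    """Estimate a 1-D shift p such that image(x + p) ~= template(x),
    via the forward-additive Lucas-Kanade (Gauss-Newton) iteration."""
    x = np.arange(len(template), dtype=float)
    grad_image = np.gradient(image)
    p = 0.0
    for _ in range(n_iters):
        warped = np.interp(x + p, x, image)       # image warped by current p
        error = template - warped
        grad = np.interp(x + p, x, grad_image)    # Jacobian d(warped)/dp
        H = float(grad @ grad)                    # 1x1 Gauss-Newton Hessian
        if H < 1e-12:
            break
        dp = float(grad @ error) / H              # least-squares update
        p += dp
        if abs(dp) < 1e-8:
            break
    return p
```

The inverse compositional algorithm's efficiency gain comes from evaluating the gradient and Hessian on the template once, instead of re-warping them every iteration as this sketch does.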
29. Polyhedral Object Localization in an Image by Referencing to a Single Model View.
- Author
-
Chung, Ronald and Wong, Hau-San
- Subjects
COMPUTER vision ,IMAGE processing ,PATTERN recognition systems ,ARTIFICIAL intelligence ,THREE-dimensional imaging ,POLYHEDRA - Abstract
Identifying a three-dimensional (3D) object in an image is traditionally dealt with by referencing a 3D model of the object. In the last few years there has been a growing interest in using not a 3D shape but multiple views of the object as the reference. This paper attempts a further step in this direction, using not multiple views but a single clean view as the reference model. The key issue is how to establish correspondences from the model view, where the boundary of the object is explicitly available, to the scene view, where the object can be surrounded by various distracting entities and its boundary disturbed by noise. We propose a solution to the problem, which is based upon a mechanism of predicting correspondences from just four particular initial point correspondences. The object is required to be polyhedral or near-polyhedral. The correspondence mechanism has a computational complexity linear with respect to the total number of visible corners of the object in the model view. The limitation of the mechanism is also analyzed thoroughly in this paper. Experimental results over real images are presented to illustrate the performance of the proposed solution. [ABSTRACT FROM AUTHOR]
- Published
- 2003
- Full Text
- View/download PDF
30. Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images.
- Author
-
Tolias, Giorgos, Avrithis, Yannis, and Jégou, Hervé
- Subjects
COMPUTER vision ,PATTERN recognition systems ,HAMMING codes ,ARTIFICIAL intelligence ,IMAGE processing ,KERNEL operating systems - Abstract
This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the vector of locally aggregated descriptors (VLAD) and matching techniques such as Hamming embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. The representation underpinning this kernel is approximated, yielding large-scale image search that is both precise and scalable, as shown by our experiments on several benchmarks. We show that the same aggregation procedure, originally applied per image, can effectively operate on groups of similar features found across multiple images. This method implicitly performs feature set augmentation, while enjoying savings in memory requirements at the same time. Finally, the proposed method is shown effective for place recognition, outperforming state-of-the-art methods on a large scale landmark recognition benchmark. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
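The "selective" part of a match kernel over binary signatures can be sketched as follows: descriptor pairs vote with a weight that decays with Hamming distance and is cut off beyond a threshold (the `tau`, `alpha`, and `nbits` values are illustrative, not the paper's tuned parameters):

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two binary signatures."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def selective_match_kernel(sigs_query, sigs_db, tau=24, alpha=3, nbits=64):
    """Sum, over all descriptor pairs, a weight that decays with Hamming
    distance and is cut off ('selective') beyond the threshold tau."""
    score = 0.0
    for a in sigs_query:
        for b in sigs_db:
            h = hamming(a, b)
            if h <= tau:
                score += (1.0 - h / nbits) ** alpha
    return score
```

The aggregation step in the paper replaces the inner loop over individual descriptors with a single comparison per visual word, which is where the memory and speed savings come from.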
31. Putting the User in the Loop for Image-Based Modeling.
- Author
-
Kowdle, Adarsh, Chang, Yao-Jen, Gallagher, Andrew, Batra, Dhruv, and Chen, Tsuhan
- Subjects
ACTIVE learning ,IMAGE processing ,PROGRAM transformation ,MARKOV processes ,ARTIFICIAL intelligence - Abstract
We refer to the task of recovering the 3D structure of an object or a scene using 2D images as image-based modeling. In this paper, we formulate the task of recovering the 3D structure as a discrete optimization problem solved via energy minimization. In this standard framework of a Markov random field (MRF) defined over the image we present algorithms that allow the user to intuitively interact with the algorithm. We introduce an algorithm where the user guides the process of image-based modeling to find and model the object of interest by manually interacting with the nodes of the graph. We develop end user applications using this algorithm that allow object of interest 3D modeling on a mobile device and 3D printing of the object of interest. We also propose an alternate active learning algorithm that guides the user input. An initial attempt is made at reconstructing the scene without supervision. Given the reconstruction, an active learning algorithm uses intuitive cues to quantify the uncertainty of the algorithm and suggest regions, querying the user to provide support for the uncertain regions via simple scribbles. These constraints are used to update the unary and the pairwise energies that, when solved, lead to better reconstructions. We show through machine experiments and a user study that the proposed approach intelligently queries the users for constraints, and users achieve better reconstructions of the scene faster, especially for scenes with textureless surfaces lacking strong textural or structural cues that algorithms typically require. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
32. Euler Principal Component Analysis.
- Author
-
Liwicki, Stephan, Tzimiropoulos, Georgios, Zafeiriou, Stefanos, and Pantic, Maja
- Subjects
PRINCIPAL components analysis ,PATTERN recognition systems ,COMPUTER vision ,COMPUTER algorithms ,ARTIFICIAL intelligence - Abstract
Principal Component Analysis (PCA) is perhaps the most prominent learning tool for dimensionality reduction in pattern recognition and computer vision. However, the ℓ2-norm employed by standard PCA is not robust to outliers. In this paper, we propose a kernel PCA method for fast and robust PCA, which we call Euler-PCA (e-PCA). In particular, our algorithm utilizes a robust dissimilarity measure based on the Euler representation of complex numbers. We show that Euler-PCA retains PCA's desirable properties while suppressing outliers. Moreover, we formulate Euler-PCA in an incremental learning framework which allows for efficient computation. In our experiments we apply Euler-PCA to three different computer vision applications, for which our method performs comparably with other state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
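The Euler representation lends itself to a compact sketch: map real features onto the complex unit circle, then run ordinary PCA on the complex data (the `alpha` value and batch SVD here are illustrative; the paper's formulation is incremental):

```python
import numpy as np

def euler_pca(X, n_components=2, alpha=1.9):
    """Map real features in [0, 1] onto the complex unit circle via the
    Euler representation, then run ordinary (complex) PCA on the result."""
    Z = np.exp(1j * alpha * np.pi * X) / np.sqrt(2.0)  # Euler representation
    mu = Z.mean(axis=0)
    Zc = Z - mu
    # SVD of the centered complex data yields the principal subspace.
    _, _, Vh = np.linalg.svd(Zc, full_matrices=False)
    components = Vh[:n_components].conj().T  # (d, k) orthonormal complex basis
    scores = Zc @ components                 # projections onto the subspace
    return scores, components, mu
```

Because the mapping bounds every feature's magnitude on the unit circle, large outlier values can no longer dominate the covariance the way they do under the ℓ2-norm.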
33. Max Margin Learning of Hierarchical Configural Deformable Templates (HCDTs) for Efficient Object Parsing and Pose Estimation.
- Author
-
Long Zhu, Yuanhao Chen, Chenxi Lin, and Yuille, Alan
- Subjects
CONFIGURATION management ,DIGITAL image processing ,ADAPTIVE computing systems ,PARSING (Computer grammar) ,COMPUTER vision ,IMAGE registration ,ARTIFICIAL intelligence - Abstract
In this paper we formulate a hierarchical configural deformable template (HCDT) to model articulated visual objects, such as horses and baseball players, for tasks such as parsing, segmentation, and pose estimation. HCDTs represent an object by an AND/OR graph where the OR nodes act as switches which enable the graph topology to vary adaptively. This hierarchical representation is compositional and the node variables represent positions and properties of subparts of the object. The graph and the node variables are required to obey the summarization principle, which enables an efficient compositional inference algorithm to rapidly estimate the state of the HCDT. We specify the structure of the AND/OR graph of the HCDT by hand and learn the model parameters discriminatively by extending Max-Margin learning to AND/OR graphs. We illustrate the three main aspects of HCDTs (representation, inference, and learning) on the tasks of segmenting, parsing, and pose (configuration) estimation for horses and humans. We demonstrate that the inference algorithm is fast and that max-margin learning is effective. We show that HCDTs give state-of-the-art results for segmentation and pose estimation when compared to other methods on benchmarked datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
34. Learning Photometric Invariance for Object Detection.
- Author
-
Álvarez, Jose M., Gevers, Theo, and López, Antonio M.
- Subjects
DIGITAL image processing ,IMAGING systems ,ROBOT vision ,COLOR ,ARTIFICIAL intelligence - Abstract
Color is a powerful visual cue in many computer vision applications such as image segmentation and object recognition. However, most of the existing color models depend on the imaging conditions that negatively affect the performance of the task at hand. Often, a reflection model (e.g., Lambertian or dichromatic reflectance) is used to derive color invariant models. However, this approach may be too restricted to model real-world scenes in which different reflectance mechanisms can hold simultaneously. Therefore, in this paper, we aim to derive color invariance by learning from color models to obtain diversified color invariant ensembles. First, a photometrical orthogonal and non-redundant color model set is computed composed of both color variants and invariants. Then, the proposed method combines these color models to arrive at a diversified color ensemble yielding a proper balance between invariance (repeatability) and discriminative power (distinctiveness). To achieve this, our fusion method uses a multi-view approach to minimize the estimation error. In this way, the proposed method is robust to data uncertainty and produces properly diversified color invariant ensembles. Further, the proposed method is extended to deal with temporal data by predicting the evolution of observations over time. Experiments are conducted on three different image datasets to validate the proposed method. Both the theoretical and experimental results show that the method is robust against severe variations in imaging conditions. The method is not restricted to a certain reflection model or parameter tuning, and outperforms state-of-the-art detection techniques in the field of object, skin and road recognition. Considering sequential data, the proposed method (extended to deal with future observations) outperforms the other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
35. Active, Foveated, Uncalibrated Stereovision.
- Author
-
Monaco, James, Bovik, Alan, and Cormack, Lawrence
- Subjects
DIGITAL image processing ,PATTERN recognition systems ,COMPUTER vision ,ARTIFICIAL intelligence ,GEOMETRY - Abstract
Biological vision systems have inspired and will continue to inspire the development of computer vision systems. One biological tendency that has never been exploited is the symbiotic relationship between foveation and uncalibrated active, binocular vision systems. The primary goal of any binocular vision system is the correspondence of the two retinal images. For calibrated binocular rigs the search for corresponding points can be restricted to epipolar lines. In an uncalibrated system the precise geometry is unknown. However, the set of possible geometries can be restricted to some reasonable range; and consequently, the search for matching points can be confined to regions delineated by the union of all possible epipolar lines over all possible geometries. We call these regions epipolar spaces. The accuracy and complexity of any correspondence algorithm is directly proportional to the size of these epipolar spaces. Consequently, the introduction of a spatially variant foveation strategy that reduces the average area per epipolar space is highly desirable. This paper provides a set of sampling theorems that offer a path for designing foveation strategies that are optimal with respect to average epipolar area. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
36. Reversible Interpolation of Vectorial Images by an Anisotropic Diffusion-Projection PDE.
- Author
-
Roussos, Anastasios and Maragos, Petros
- Subjects
NUMERICAL analysis ,INTERPOLATION ,IMAGE analysis ,COMPUTER vision ,ARTIFICIAL intelligence ,PATTERN recognition systems - Abstract
In this paper, a nonlinear model for the interpolation of vector-valued images is proposed. This model is based on an anisotropic diffusion PDE and performs an interpolation that is reversible. The interpolation solution is restricted to the subspace of functions that can recover the discrete input image, after an appropriate smoothing and sampling. The proposed nonlinear diffusion flow lies on this subspace while its strength and anisotropy adapt to the local variations and geometry of image structures. The derived method effectively reconstructs the real image structures and yields a satisfactory interpolation result. Compared to classic and other existing PDE-based interpolation methods, our proposed method seems to increase the accuracy of the result and to reduce the undesirable artifacts, such as blurring, ringing, block effects and edge distortion. We present extensive experimental results that demonstrate the potential of the method as applied to graylevel and color images. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
37. Scale Selection for Compact Scale-Space Representation of Vector-Valued Images.
- Author
-
Vanhamel, I., Mihai, C., Sahli, H., Katartzis, A., and Pratikakis, I.
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,PATTERN recognition systems ,INFORMATION processing ,IMAGE quality analysis - Abstract
This paper investigates the scale selection problem for nonlinear diffusion scale-spaces. This topic comprises the notions of localization scale selection and scale space discretization. For the former, we present a new approach. It aims at maximizing the image content's presence by finding the scale that has a maximum correlation with the noise-free image. For the latter, we propose to adapt the optimal diffusion stopping time criterion of Mrázek and Navara in such a way that it may identify multiple scales of importance. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
38. On Local Region Models and a Statistical Interpretation of the Piecewise Smooth Mumford-Shah Functional.
- Author
-
Brox, Thomas and Cremers, Daniel
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,PATTERN recognition systems ,IMAGE analysis ,IMAGING systems - Abstract
The Mumford-Shah functional is a general and quite popular variational model for image segmentation. In particular, it provides the possibility to represent regions by smooth approximations. In this paper, we derive a statistical interpretation of the full (piecewise smooth) Mumford-Shah functional by relating it to recent works on local region statistics. Moreover, we show that this statistical interpretation comes along with several implications. Firstly, one can derive extended versions of the Mumford-Shah functional including more general distribution models. Secondly, it leads to faster implementations. Finally, thanks to the analytical expression of the smooth approximation via Gaussian convolution, the coordinate descent can be replaced by a true gradient descent. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
39. Cooperative Object Segmentation and Behavior Inference in Image Sequences.
- Author
-
Gui, Laura, Thiran, Jean-Philippe, and Paragios, Nikos
- Subjects
IMAGE analysis ,IMAGING systems ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,PATTERN recognition systems - Abstract
In this paper, we propose a general framework for fusing bottom-up segmentation with top-down object behavior inference over an image sequence. This approach is beneficial for both tasks, since it enables them to cooperate so that knowledge relevant to each can aid in the resolution of the other, thus enhancing the final result. In particular, the behavior inference process offers dynamic probabilistic priors to guide segmentation. At the same time, segmentation supplies its results to the inference process, ensuring that they are consistent both with prior knowledge and with new image information. The prior models are learned from training data and they adapt dynamically, based on newly analyzed images. We demonstrate the effectiveness of our framework via particular implementations that we have employed in the resolution of two hand gesture recognition applications. Our experimental results illustrate the robustness of our joint approach to segmentation and behavior inference in challenging conditions involving complex backgrounds and occlusions of the target object. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
40. New Possibilities with Sobolev Active Contours.
- Author
-
Sundaramoorthi, Ganesh, Yezzi, Anthony, Mennucci, Andrea, and Sapiro, Guillermo
- Subjects
SOBOLEV gradients ,CONJUGATE gradient methods ,COMPUTER vision ,ARTIFICIAL intelligence ,IMAGE processing ,PATTERN recognition systems - Abstract
Recently, the Sobolev metric was introduced to define gradient flows of various geometric active contour energies. It was shown that the Sobolev metric outperforms the traditional metric for the same energy in many cases, such as tracking, where the coarse-scale changes of the contour are important. Some interesting properties of Sobolev gradient flows are that they stabilize certain unstable traditional flows, and that the order of the evolution PDEs is reduced when compared with traditional gradient flows of the same energies. In this paper, we explore new possibilities for active contours made possible by Sobolev metrics. The Sobolev method allows one to implement new energy-based active contour models that were not otherwise considered because the traditional minimizing method renders them ill-posed or numerically infeasible. In particular, we exploit the stabilizing and the order-reducing properties of Sobolev gradients to implement the gradient descent of these new energies. We give examples of this class of energies, which include some simple geometric priors and new edge-based energies. We also show that these energies can be quite useful for segmentation and tracking. We also show that the gradient flows using the traditional metric are either ill-posed or numerically difficult to implement, and then show that the flows can be implemented in a stable and numerically feasible manner using the Sobolev gradient. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
41. Contour Grouping Based on Contour-Skeleton Duality.
- Author
-
Adluru, Nagesh and Latecki, Longin
- Subjects
COMPUTER vision ,PATTERN recognition systems ,ROBOT vision ,IMAGE processing ,ARTIFICIAL intelligence ,MATHEMATICAL mappings ,TEXTURE mapping - Abstract
In this paper we present a method for grouping relevant object contours in edge maps by taking advantage of contour-skeleton duality. Regularizing contours and skeletons simultaneously allows us to combine both low-level perceptual constraints and higher-level model constraints in a very effective way. The models are represented using paths in symmetry sets. Skeletons are treated as trajectories of an imaginary virtual robot in a discrete space of “symmetric points” obtained from pairs of edge segments. Boundaries are then defined as the maps obtained by grouping the associated pairs of edge segments along the trajectories. Casting the grouping problem in this manner makes it similar to the problem of Simultaneous Localization and Mapping (SLAM). Hence we adapt the state-of-the-art probabilistic framework, namely Rao-Blackwellized particle filtering, which has been successfully applied to SLAM. We use the framework to maximize the joint posterior over skeletons and contours. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
42. Robust Higher Order Potentials for Enforcing Label Consistency.
- Author
-
Kohli, Pushmeet, Ladický, L’ubor, and Torr, Philip
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,PARTIAL differential equations ,RANDOM fields ,MARKOV processes ,PATTERN recognition systems ,PROBLEM solving ,IMAGE processing ,PIXELS - Abstract
This paper proposes a novel framework for labelling problems which is able to combine multiple segmentations in a principled manner. Our method is based on higher order conditional random fields and uses potentials defined on sets of pixels (image segments) generated using unsupervised segmentation algorithms. These potentials enforce label consistency in image regions and can be seen as a generalization of the commonly used pairwise contrast-sensitive smoothness potentials. The higher order potential functions used in our framework take the form of the Robust Pn model and are more general than the Pn Potts model recently proposed by Kohli et al. We prove that the optimal swap and expansion moves for energy functions composed of these potentials can be computed by solving an st-mincut problem. This enables the use of powerful graph cut based move making algorithms for performing inference in the framework. We test our method on the problem of multi-class object segmentation by augmenting the conventional CRF used for object segmentation with higher order potentials defined on image regions. Experiments on challenging data sets show that integration of higher order potentials quantitatively and qualitatively improves results, leading to much better definition of object boundaries. We believe that this method can be used to yield similar improvements for many other labelling problems. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
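The saturating form of a Robust Pn-style clique potential described in the abstract above can be sketched directly (the `gamma_max` and truncation defaults are illustrative, not the paper's learned parameters):

```python
def robust_pn_potential(labels, gamma_max=1.0, trunc=None):
    """Robust P^n-style potential on one clique: the cost grows linearly
    with the number of variables disagreeing with the dominant label and
    saturates at gamma_max."""
    n = len(labels)
    q = trunc if trunc is not None else max(1, n // 4)  # truncation parameter
    counts = {}
    for l in labels:
        counts[l] = counts.get(l, 0) + 1
    n_deviants = n - max(counts.values())  # variables not taking the best label
    return min(n_deviants * gamma_max / q, gamma_max)
```

The saturation is what makes the potential "robust": unlike the strict Pn Potts model, a few deviating pixels in a segment incur only a partial penalty rather than the full cost.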
43. Modelling Spatio-Temporal Saliency to Predict Gaze Direction for Short Videos.
- Author
-
Marat, Sophie, Ho Phuoc, Tien, Granjon, Lionel, Guyader, Nathalie, Pellerin, Denis, and Guérin-Dugué, Anne
- Subjects
STREAMING technology ,EYE movements ,VIDEOS ,VISUAL perception ,VISION research ,ARTIFICIAL intelligence ,FORECASTING ,VISUAL fields - Abstract
This paper presents a spatio-temporal saliency model that predicts eye movement during video free viewing. This model is inspired by the biology of the first steps of the human visual system. The model extracts two signals from the video stream corresponding to the two main outputs of the retina: parvocellular and magnocellular. Then, both signals are split into elementary feature maps by cortical-like filters. These feature maps are used to form two saliency maps: a static and a dynamic one. These maps are then fused into a spatio-temporal saliency map. The model is evaluated by comparing the salient areas of each frame predicted by the spatio-temporal saliency map to the eye positions of different subjects during a free video viewing experiment with a large database (17000 frames). In parallel, the static and the dynamic pathways are analyzed to understand what is more or less salient and for what type of videos our model is a good or a poor predictor of eye movement. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
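The final stage described above fuses a static and a dynamic saliency map into one spatio-temporal map. A toy sketch of such a fusion, assuming two pre-computed maps; the normalization and the reinforcement term are one plausible fusion rule, not the exact scheme of Marat et al.:

```python
import numpy as np

def fuse_saliency(static_map, dynamic_map, alpha=0.5):
    """Fuse a static and a dynamic saliency map into a spatio-temporal map.

    Each map is min-max normalized to [0, 1]; the fused map is a weighted
    mean plus a multiplicative reinforcement term that boosts locations
    where both pathways agree (an illustrative choice, not the paper's).
    """
    def norm(m):
        m = np.asarray(m, dtype=float)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    s, d = norm(static_map), norm(dynamic_map)
    return alpha * s + (1 - alpha) * d + s * d
```

Locations salient in both maps receive up to twice the weight of locations salient in only one.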
44. Occlusion Boundaries from Motion: Low-Level Detection and Mid-Level Reasoning.
- Author
-
Stein, Andrew and Hebert, Martial
- Subjects
IMAGE processing ,ARTIFICIAL intelligence ,COMPUTER vision ,PATTERN recognition systems ,INFORMATION processing ,GEOGRAPHIC boundaries ,DIGITAL image processing ,THREE-dimensional display systems ,THREE-dimensional imaging - Abstract
The boundaries of objects in an image are often considered a nuisance to be “handled” due to the occlusion they exhibit. Since most, if not all, computer vision techniques aggregate information spatially within a scene, information spanning these boundaries, and therefore from different physical surfaces, is invariably and erroneously considered together. In addition, these boundaries convey important perceptual information about 3D scene structure and shape. Consequently, their identification can benefit many different computer vision pursuits, from low-level processing techniques to high-level reasoning tasks. While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In this paper, we focus on the exploitation of subtle relative-motion cues present at occlusion boundaries. When combined with more standard appearance information, we demonstrate these cues’ utility in detecting occlusion boundaries locally. We also present a novel, mid-level model for reasoning more globally about object boundaries and propagating such local information to extract improved, extended boundaries. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
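A crude low-level motion cue in the spirit of the abstract above is the spatial gradient of the temporal frame difference, which responds where surfaces move relative to each other. This sketch is a stand-in for the authors' detector; the function name and the specific operator are assumptions:

```python
import numpy as np

def motion_boundary_strength(frame_t, frame_t1):
    """Spatial gradient magnitude of the temporal difference between two
    consecutive frames: a crude low-level occlusion-boundary cue."""
    diff = frame_t1.astype(float) - frame_t.astype(float)
    gy, gx = np.gradient(diff)          # gradients along rows and columns
    return np.hypot(gx, gy)             # per-pixel gradient magnitude
```

On a pair of frames where the right half brightens, the response concentrates around the vertical boundary and vanishes far from it.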
45. Spectral Curvature Clustering (SCC).
- Author
-
Chen, Guangliang and Lerman, Gilad
- Subjects
IMAGE processing ,PATTERN recognition systems ,PATTERN perception ,COMPUTER vision ,ARTIFICIAL intelligence ,OPTICAL pattern recognition - Abstract
This paper presents novel techniques for improving the performance of a multi-way spectral clustering framework (Govindu in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp. 1150–1157, 2005; Chen and Lerman, preprint in the supplementary webpage) for segmenting affine subspaces. Specifically, it suggests an iterative sampling procedure to improve the uniform sampling strategy, an automatic scheme of inferring the tuning parameter from the data, a precise initialization procedure for K-means, as well as a simple strategy for isolating outliers. The resulting algorithm, Spectral Curvature Clustering (SCC), requires only linear storage and takes linear running time in the size of the data. It is supported by theory which both justifies its successful performance and guides our practical choices. We compare it with other existing methods on a few artificial instances of affine subspaces. Application of the algorithm to several real-world problems is also discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
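The spectral side of SCC can be illustrated with a plain two-way spectral cut. This sketch replaces SCC's multi-way curvature affinities with a simple Gaussian kernel and splits on the sign of the Fiedler vector, so it is a conceptual stand-in only:

```python
import numpy as np

def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters via the sign of the Fiedler vector
    (second eigenvector) of the symmetric normalized graph Laplacian."""
    X = np.asarray(points, dtype=float)
    # Gaussian affinity matrix (SCC instead uses polar-curvature affinities)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]                # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)
```

SCC generalizes this picture to multi-way affinities measuring how well d+2 points fit a common d-dimensional affine subspace.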
46. Rank Constraints for Homographies over Two Views: Revisiting the Rank Four Constraint.
- Author
-
Chen, Pei and Suter, David
- Subjects
COMPUTER engineering ,COMPUTER vision ,ALGORITHMS ,PATTERN recognition systems ,ARTIFICIAL intelligence ,HOMOGRAPHY (Computer vision) - Abstract
It is well known that one can collect the coefficients of five (or more) homographies between two views into a large, rank-deficient matrix. In principle, this implies that one can refine the accuracy of the estimates of the homography coefficients by exploiting the rank constraint. However, the standard rank-projection approach is impractical for two reasons: it requires many homographies to achieve even a modest gain, and correlations between the errors in the coefficients lead to poor estimates. In this paper we study these problems and provide solutions to each. First, we show that the matrix of homography coefficients can be recast into two parts, each of rank one. This immediately establishes the prospect of realistically (that is, with as few as three or four homographies) exploiting the redundancies of the homographies over two views. We then tackle the remaining issue: correlated coefficients. We compare our approach with the "gold standard", non-linear bundle adjustment initialized from the ground truth estimate (the ideal initialization). The results confirm our theory and show that one can implement rank-constrained projection and come close to the gold standard in effectiveness. Indeed, our algorithm, by itself or further refined by a bundle adjustment stage, may be a practical algorithm: it provides generally better results than the standard DLT (direct linear transformation) algorithm, and even better results than bundle adjustment with the DLT result as the starting point. Our unoptimized version has roughly the same cost as bundle adjustment and yet can generally produce an estimate close to the "gold standard" (as illustrated by comparison with bundle adjustment initialized from the ground truth). Independent of the merits or otherwise of our algorithm, we have illuminated why the naive approach of direct rank-projection is relatively doomed to failure.
Moreover, in revealing further rank constraints not previously known, we have added to the understanding of these issues, and this may pave the way for further improvements. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
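The rank-four constraint itself is easy to state in code: stack the flattened 3x3 homographies as rows of an n x 9 matrix and project onto the nearest rank-4 matrix via truncated SVD. A minimal sketch of exactly the naive rank-projection the paper argues is insufficient on its own:

```python
import numpy as np

def homography_rank_projection(H_list, rank=4):
    """Stack flattened 3x3 homographies into an n x 9 matrix and project
    it onto the nearest matrix of the given rank (Frobenius-optimal,
    via truncated SVD). This is the classical rank-four constraint on
    homographies between two views."""
    A = np.stack([np.asarray(H).ravel() for H in H_list])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s[rank:] = 0.0                      # zero out all but the top `rank` singular values
    return U @ np.diag(s) @ Vt
```

A matrix already of rank four is left unchanged; for noisy estimates, the projection suppresses components outside the four-dimensional span.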
47. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context.
- Author
-
Shotton, Jamie, Winn, John, Rother, Carsten, and Criminisi, Antonio
- Subjects
ARTIFICIAL intelligence ,COMPUTER vision ,STOCHASTIC processes ,PATTERN recognition systems ,IMAGE processing ,PATTERN perception ,CLASSIFICATION ,DATABASES - Abstract
This paper details a new approach for learning a discriminative model of object classes, incorporating texture, layout, and context information efficiently. The learned model is used for automatic visual understanding and semantic segmentation of photographs. Our discriminative model exploits texture-layout filters, novel features based on textons, which jointly model patterns of texture and their spatial layout. Unary classification and feature selection are achieved using shared boosting to give an efficient classifier which can be applied to a large number of classes. Accurate image segmentation is achieved by incorporating the unary classifier in a conditional random field, which (i) captures the spatial interactions between class labels of neighboring pixels, and (ii) improves the segmentation of specific object instances. Efficient training of the model on large datasets is achieved by exploiting both random feature selection and piecewise training methods. High classification and segmentation accuracy is demonstrated on four varied databases: (i) the MSRC 21-class database containing photographs of real objects viewed under general lighting conditions, poses and viewpoints, (ii) the 7-class Corel subset and (iii) the 7-class Sowerby database used in He et al. (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 695–702, June), and (iv) a set of video sequences of television shows. The proposed algorithm gives competitive and visually pleasing results for objects that are highly textured (grass, trees, etc.), highly structured (cars, faces, bicycles, airplanes, etc.), and even articulated (body, cow, etc.). [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
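At the core of TextonBoost are texture-layout features: the proportion of a given texton inside a rectangle offset from the pixel being classified, computed in constant time with an integral image. A minimal sketch (function and argument names are mine, not from the paper):

```python
import numpy as np

def texture_layout_response(texton_map, texton, rect, pos):
    """Fraction of pixels equal to `texton` inside a rectangle `rect`
    (given as top/left/bottom/right offsets) translated to pixel `pos`.

    Uses an integral image so each rectangle sum costs O(1), the trick
    that makes evaluating thousands of such features per pixel feasible.
    """
    mask = (texton_map == texton).astype(float)
    ii = np.pad(mask.cumsum(0).cumsum(1), ((1, 0), (1, 0)))  # integral image
    r0, c0, r1, c1 = rect
    y, x = pos
    H, W = texton_map.shape
    top, left = max(y + r0, 0), max(x + c0, 0)
    bot, right = min(y + r1, H), min(x + c1, W)
    if top >= bot or left >= right:      # rectangle entirely off-image
        return 0.0
    area = (bot - top) * (right - left)
    s = ii[bot, right] - ii[top, right] - ii[bot, left] + ii[top, left]
    return s / area
```

Boosting then selects (texton, rectangle) pairs whose responses, thresholded, best discriminate the object classes.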
48. A Fast Approximation of the Bilateral Filter Using a Signal Processing Approach.
- Author
-
Paris, Sylvain and Durand, Frédo
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,PATTERN recognition systems ,IMAGE processing ,DIGITAL image processing ,COMPUTERS ,COMPUTER graphics - Abstract
The bilateral filter is a nonlinear filter that smoothes a signal while preserving strong edges. It has demonstrated great effectiveness for a variety of problems in computer vision and computer graphics, and fast versions have been proposed. Unfortunately, little is known about the accuracy of such accelerations. In this paper, we propose a new signal-processing analysis of the bilateral filter which complements the recent studies that analyzed it as a PDE or as a robust statistical estimator. The key to our analysis is to express the filter in a higher-dimensional space where the signal intensity is added to the original domain dimensions. Importantly, this signal-processing perspective allows us to develop a novel bilateral filtering acceleration using downsampling in space and intensity. This affords a principled expression of accuracy in terms of bandwidth and sampling. The bilateral filter can be expressed as linear convolutions in this augmented space followed by two simple nonlinearities. This allows us to derive criteria for downsampling the key operations and achieving important acceleration of the bilateral filter. We show that, for the same running time, our method is more accurate than previous acceleration techniques. Typically, we are able to process a 2 megapixel image using our acceleration technique in less than a second, and have the result be visually similar to the exact computation that takes several tens of minutes. The acceleration is most effective with large spatial kernels. Furthermore, this approach extends naturally to color images and cross bilateral filtering. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
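For reference, the exact brute-force bilateral filter that the paper accelerates can be written in a few lines in 1D; Paris and Durand's method replaces this quadratic-time loop with downsampled linear convolution in the joint space-intensity domain. A sketch of the baseline:

```python
import numpy as np

def bilateral_filter_1d(signal, sigma_s=2.0, sigma_r=0.2):
    """Brute-force 1D bilateral filter: each sample becomes a weighted
    average of all samples, weighted by both spatial proximity (sigma_s)
    and intensity similarity (sigma_r), which preserves strong edges."""
    x = np.asarray(signal, dtype=float)
    idx = np.arange(len(x))
    out = np.empty_like(x)
    for i in range(len(x)):
        w = (np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2))      # spatial kernel
             * np.exp(-((x - x[i]) ** 2) / (2 * sigma_r ** 2)))  # range kernel
        out[i] = (w * x).sum() / w.sum()
    return out
```

With a small range sigma, samples across a step edge receive negligible weight, so the edge survives while each flat region is smoothed.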
49. Building Blocks for Computer Vision with Stochastic Partial Differential Equations.
- Author
-
Preusser, Tobias, Scharr, Hanno, Krajsek, Kai, and Kirby, Robert
- Subjects
COMPUTER graphics ,STOCHASTIC processes ,STOCHASTIC partial differential equations ,COMPUTER vision ,ARTIFICIAL intelligence ,BASES (Linear topological spaces) ,IMAGE processing ,RANDOM walks ,ESTIMATION theory - Abstract
We discuss the basic concepts of computer vision with stochastic partial differential equations (SPDEs). In typical approaches based on partial differential equations (PDEs), the end result in the best case is usually one value per pixel, the "expected" value. Error estimates or even full probability density functions (PDFs) are usually not available. This paper provides a framework allowing one to derive such PDFs, turning computer vision approaches into measurements that fulfill scientific standards thanks to full error propagation. We identify the image data with random fields in order to model images and image sequences which carry uncertainty in their gray values, e.g. due to noise in the acquisition process. The noisy behavior of gray values is modeled as stochastic processes which are approximated with the method of generalized polynomial chaos (Wiener-Askey chaos). The Wiener-Askey polynomial chaos is combined with a standard spatial approximation based upon piecewise multi-linear finite elements. We present the basic building blocks needed for computer vision and image processing in this stochastic setting, i.e. we discuss the computation of stochastic moments, projections, gradient magnitudes, edge indicators, structure tensors, etc. Finally we show applications of our framework to derive stochastic analogs of well known PDEs for de-noising and optical flow extraction. These models are discretized with the stochastic Galerkin method. Our selection of SPDE models allows us to draw connections to the classical deterministic models as well as to stochastic image processing not based on PDEs. Several examples guide the reader through the presentation and show the usefulness of the framework. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
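One concrete building block mentioned above, computing stochastic moments from generalized polynomial-chaos coefficients, is simple in the 1-D Hermite (Wiener) chaos case: the mean is c_0 and the variance is the weighted sum of squared higher coefficients, since E[He_k^2] = k! for probabilists' Hermite polynomials under the standard Gaussian measure. A sketch:

```python
import numpy as np
from math import factorial

def gpc_moments(coeffs):
    """Mean and variance of a random variable from its 1-D Hermite
    polynomial-chaos expansion u = sum_k c_k He_k(xi), xi ~ N(0, 1):
    mean = c_0 and var = sum_{k>=1} c_k^2 * k!."""
    c = np.asarray(coeffs, dtype=float)
    mean = c[0]
    var = sum(c[k] ** 2 * factorial(k) for k in range(1, len(c)))
    return mean, var
```

In the paper's setting, such moments would be evaluated per pixel of the random field, yielding an error map alongside the usual expected-value image.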
50. Limits of Learning-Based Superresolution Algorithms.
- Author
-
Lin, Zhouchen, He, Junfeng, Tang, Xiaoou, and Tang, Chi-Keung
- Subjects
COMPUTER vision ,PATTERN recognition systems ,ARTIFICIAL intelligence ,DIGITAL image processing ,IMAGE registration ,IMAGE processing ,IMAGING systems ,FEATURE extraction - Abstract
Learning-based superresolution (SR) is a popular SR technique that uses application-dependent priors to infer the missing details in low resolution images (LRIs). However, its performance still deteriorates quickly when the magnification factor is only moderately large. This leads us to an important problem: "Do limits of learning-based SR algorithms exist?" This paper is the first attempt to shed some light on this problem when the SR algorithms are designed for general natural images. We first define an expected risk for the SR algorithms that is based on the root mean squared error between the superresolved images and the ground truth images. Then, utilizing the statistics of general natural images, we derive a closed-form estimate of the lower bound of the expected risk. The lower bound only involves the covariance matrix and the mean vector of the high resolution images (HRIs) and hence can be computed by sampling real images. We also investigate the sufficient number of samples to guarantee an accurate estimate of the lower bound. By computing the curve of the lower bound w.r.t. the magnification factor, we can estimate the limits of learning-based SR algorithms, at which the lower bound of the expected risk exceeds a relatively large threshold. We perform experiments to validate our theory, and based on our observations we conjecture that the limits may be independent of the size of either the LRIs or the HRIs. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
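The expected risk defined above is based on the RMSE between superresolved and ground-truth images; a Monte-Carlo estimate of that risk over a sample of image pairs is straightforward (a sketch of the risk estimate only, not of the paper's closed-form lower bound):

```python
import numpy as np

def expected_rmse_risk(sr_images, gt_images):
    """Monte-Carlo estimate of the expected risk: the mean, over sampled
    image pairs, of the per-image RMSE between a superresolved image
    and its ground-truth counterpart."""
    sr = np.asarray(sr_images, dtype=float)
    gt = np.asarray(gt_images, dtype=float)
    # RMSE over all pixel axes of each image, then average over images
    per_image = np.sqrt(((sr - gt) ** 2).mean(axis=tuple(range(1, sr.ndim))))
    return per_image.mean()
```

The paper's contribution is lower-bounding this quantity analytically from the HRI mean and covariance, so the bound can be evaluated without running any SR algorithm.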