Author: "Kosecka, Jana" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Kosecka, Jana"' showing total 45 results

Start Over Author "Kosecka, Jana" Publication Type Electronic Resources

45 results on '"Kosecka, Jana"'

1. Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM

Author: Rajabi, Navid, Kosecka, Jana, Rajabi, Navid, and Kosecka, Jana
Abstract: Vision and Language Models (VLMs) continue to demonstrate remarkable zero-shot (ZS) performance across various tasks. However, many probing studies have revealed that even the best-performing VLMs struggle to capture aspects of compositional scene understanding, lacking the ability to properly ground and localize linguistic phrases in images. Recent VLM advancements include scaling up both model and dataset sizes, additional training objectives and levels of supervision, and variations in the model architectures. To characterize the grounding ability of VLMs, such as phrase grounding, referring expressions comprehension, and relationship understanding, Pointing Game has been used as an evaluation metric for datasets with bounding box annotations. In this paper, we introduce a novel suite of quantitative metrics that utilize GradCAM activations to rigorously evaluate the grounding capabilities of pre-trained VLMs like CLIP, BLIP, and ALBEF. These metrics offer an explainable and quantifiable approach for a more detailed comparison of the zero-shot capabilities of VLMs and enable measuring models' grounding uncertainty. This characterization reveals interesting tradeoffs between the size of the model, the dataset size, and their performance., Comment: Accepted to CVPR 2024, Second Workshop on Foundation Models (WFM)
Published: 2024

2. Graph-CoVis: GNN-based Multi-view Panorama Global Pose Estimation

Author: Nejatishahidin, Negar, Hutchcroft, Will, Narayana, Manjunath, Boyadzhiev, Ivaylo, Li, Yuguang, Khosravan, Naji, Kosecka, Jana, Kang, Sing Bing, Nejatishahidin, Negar, Hutchcroft, Will, Narayana, Manjunath, Boyadzhiev, Ivaylo, Li, Yuguang, Khosravan, Naji, Kosecka, Jana, and Kang, Sing Bing
Abstract: In this paper, we address the problem of wide-baseline camera pose estimation from a group of 360$^\circ$ panoramas under upright-camera assumption. Recent work has demonstrated the merit of deep-learning for end-to-end direct relative pose regression in 360$^\circ$ panorama pairs [11]. To exploit the benefits of multi-view logic in a learning-based framework, we introduce Graph-CoVis, which non-trivially extends CoVisPose [11] from relative two-view to global multi-view spherical camera pose estimation. Graph-CoVis is a novel Graph Neural Network based architecture that jointly learns the co-visible structure and global motion in an end-to-end and fully-supervised approach. Using the ZInD [4] dataset, which features real homes presenting wide-baselines, occlusion, and limited visual overlap, we show that our model performs competitively to state-of-the-art approaches.
Published: 2023

3. U2RLE: Uncertainty-Guided 2-Stage Room Layout Estimation

Author: Fayyazsanavi, Pooya, Wan, Zhiqiang, Hutchcroft, Will, Boyadzhiev, Ivaylo, Li, Yuguang, Kosecka, Jana, Kang, Sing Bing, Fayyazsanavi, Pooya, Wan, Zhiqiang, Hutchcroft, Will, Boyadzhiev, Ivaylo, Li, Yuguang, Kosecka, Jana, and Kang, Sing Bing
Abstract: While the existing deep learning-based room layout estimation techniques demonstrate good overall accuracy, they are less effective for distant floor-wall boundary. To tackle this problem, we propose a novel uncertainty-guided approach for layout boundary estimation introducing new two-stage CNN architecture termed U2RLE. The initial stage predicts both floor-wall boundary and its uncertainty and is followed by the refinement of boundaries with high positional uncertainty using a different, distance-aware loss. Finally, outputs from the two stages are merged to produce the room layout. Experiments using ZInD and Structure3D datasets show that U2RLE improves over current state-of-the-art, being able to handle both near and far walls better. In particular, U2RLE outperforms current state-of-the-art techniques for the most distant walls., Comment: To be Appear on CVPR 2023
Published: 2023

4. Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models

Author: Fayyazsanavi, Pooya, Nejatishahidin, Negar, Kosecka, Jana, Fayyazsanavi, Pooya, Nejatishahidin, Negar, and Kosecka, Jana
Abstract: We address the task of American Sign Language fingerspelling translation using videos in the wild. We exploit advances in more accurate hand pose estimation and propose a novel architecture that leverages the transformer based encoder-decoder model enabling seamless contextual word translation. The translation model is augmented by a novel loss term that accurately predicts the length of the finger-spelled word, benefiting both training and inference. We also propose a novel two-stage inference approach that re-ranks the hypotheses using the language model capabilities of the decoder. Through extensive experiments, we demonstrate that our proposed method outperforms the state-of-the-art models on ChicagoFSWild and ChicagoFSWild+ achieving more than 10% relative improvement in performance. Our findings highlight the effectiveness of our approach and its potential to advance fingerspelling recognition in sign language translation. Code is also available at https://github.com/pooyafayyaz/Fingerspelling-PoseNet., Comment: WACV 2024
Published: 2023

5. Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models

Author: Li, Yimeng, Rajabi, Navid, Shrestha, Sulabh, Reza, Md Alimoor, Kosecka, Jana, Li, Yimeng, Rajabi, Navid, Shrestha, Sulabh, Reza, Md Alimoor, and Kosecka, Jana
Abstract: The image annotation stage is a critical and often the most time-consuming part required for training and evaluating object detection and semantic segmentation models. Deployment of the existing models in novel environments often requires detecting novel semantic classes not present in the training data. Furthermore, indoor scenes contain significant viewpoint variations, which need to be handled properly by trained perception models. We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer), all trained on large-scale datasets. We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments, with the ultimate goal of facilitating the training of lightweight models for various downstream tasks. We also propose a multi-view labeling fusion stage, which considers the setting where multiple views of the scenes are available and can be used to identify and rectify single-view inconsistencies. We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset. We evaluate the quality of our labeling process by comparing it with human annotations. Also, we demonstrate the effectiveness of the obtained labels in downstream tasks such as object goal navigation and part discovery. In the context of object goal navigation, we depict enhanced performance using this fusion approach compared to a zero-shot baseline that utilizes large monolithic vision-language pre-trained models.
Published: 2023

6. Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models

Author: Rajabi, Navid, Kosecka, Jana, Rajabi, Navid, and Kosecka, Jana
Abstract: Large vision-and-language models (VLMs) trained to match images with text on large-scale datasets of image-text pairs have shown impressive generalization ability on several vision and language tasks. Several recent works, however, showed that these models lack fine-grained understanding, such as the ability to count and recognize verbs, attributes, or relationships. The focus of this work is to study the understanding of spatial relations. This has been tackled previously using image-text matching (e.g., Visual Spatial Reasoning benchmark) or visual question answering (e.g., GQA or VQAv2), both showing poor performance and a large gap compared to human performance. In this work, we show qualitatively (using explainability tools) and quantitatively (using object detectors) that the poor object localization "grounding" ability of the models is a contributing factor to the poor image-text matching performance. We propose an alternative fine-grained, compositional approach for recognizing and ranking spatial clauses that combines the evidence from grounding noun phrases corresponding to objects and their locations to compute the final rank of the spatial clause. We demonstrate the approach on representative VLMs (such as LXMERT, GPV, and MDETR) and compare and highlight their abilities to reason about spatial relationships., Comment: Accepted to DMLR @ ICLR 2024
Published: 2023

7. Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation

Author: Li, Yimeng, Debnath, Arnab, Stein, Gregory J., Kosecka, Jana, Li, Yimeng, Debnath, Arnab, Stein, Gregory J., and Kosecka, Jana
Abstract: In recent years several learning approaches to point goal navigation in previously unseen environments have been proposed. They vary in the representations of the environments, problem decomposition, and experimental evaluation. In this work, we compare the state-of-the-art Deep Reinforcement Learning based approaches with Partially Observable Markov Decision Process (POMDP) formulation of the point goal navigation problem. We adapt the (POMDP) sub-goal framework proposed by [1] and modify the component that estimates frontier properties by using partial semantic maps of indoor scenes built from images' semantic segmentation. In addition to the well-known completeness of the model-based approach, we demonstrate that it is robust and efficient in that it leverages informative, learned properties of the frontiers compared to an optimistic frontier-based planner. We also demonstrate its data efficiency compared to the end-to-end deep reinforcement learning approaches. We compare our results against an optimistic planner, ANS and DD-PPO on Matterport3D dataset using the Habitat Simulator. We show comparable, though slightly worse performance than the SOTA DD-PPO approach, yet with far fewer data., Comment: arXiv admin note: text overlap with arXiv:2211.07898
Published: 2022

8. Learning-Augmented Model-Based Planning for Visual Exploration

Author: Li, Yimeng, Debnath, Arnab, Stein, Gregory, Kosecka, Jana, Li, Yimeng, Debnath, Arnab, Stein, Gregory, and Kosecka, Jana
Abstract: We consider the problem of time-limited robotic exploration in previously unseen environments where exploration is limited by a predefined amount of time. We propose a novel exploration approach using learning-augmented model-based planning. We generate a set of subgoals associated with frontiers on the current map and derive a Bellman Equation for exploration with these subgoals. Visual sensing and advances in semantic mapping of indoor scenes are exploited for training a deep convolutional neural network to estimate properties associated with each frontier: the expected unobserved area beyond the frontier and the expected timesteps (discretized actions) required to explore it. The proposed model-based planner is guaranteed to explore the whole scene if time permits. We thoroughly evaluate our approach on a large-scale pseudo-realistic indoor dataset (Matterport3D) with the Habitat simulator. We compare our approach with classical and more recent RL-based exploration methods. Our approach surpasses the greedy strategies by 2.1% and the RL-based exploration methods by 8.4% in terms of coverage., Comment: Accepted to IROS 2023
Published: 2022

9. Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene

Author: Shrestha, Sulabh, Li, Yimeng, Kosecka, Jana, Shrestha, Sulabh, Li, Yimeng, and Kosecka, Jana
Abstract: The ability to endow maps of indoor scenes with semantic information is an integral part of robotic agents which perform different tasks such as target driven navigation, object search or object rearrangement. The state-of-the-art methods use Deep Convolutional Neural Networks (DCNNs) for predicting semantic segmentation of an image as useful representation for these tasks. The accuracy of semantic segmentation depends on the availability and the amount of labeled data from the target environment or the ability to bridge the domain gap between test and training environment. We propose RegConsist, a method for self-supervised pre-training of a semantic segmentation model, exploiting the ability of the agent to move and register multiple views in the novel environment. Given the spatial and temporal consistency cues used for pixel level data association, we use a variant of contrastive learning to train a DCNN model for predicting semantic segmentation from RGB views in the target environment. The proposed method outperforms models pre-trained on ImageNet and achieves competitive performance when using models that are trained for exactly the same task but on a different dataset. We also perform various ablation studies to analyze and demonstrate the efficacy of our proposed method.
Published: 2022

10. Using Unmanned Aerial Systems (UAS) for Assessing and Monitoring Fall Hazard Prevention Systems in High-rise Building Projects

Author: Li, Yimeng, Esmaeili, Behzad, Gheisari, Masoud, Kosecka, Jana, Rashidi, Abbas, Li, Yimeng, Esmaeili, Behzad, Gheisari, Masoud, Kosecka, Jana, and Rashidi, Abbas
Abstract: This study develops a framework for unmanned aerial systems (UASs) to monitor fall hazard prevention systems near unprotected edges and openings in high-rise building projects. A three-step machine-learning-based framework was developed and tested to detect guardrail posts from the images captured by UAS. First, a guardrail detector was trained to localize the candidate locations of posts supporting the guardrail. Since images were used in this process collected from an actual job site, several false detections were identified. Therefore, additional constraints were introduced in the following steps to filter out false detections. Second, the research team applied a horizontal line detector to the image to properly detect floors and remove the detections that were not close to the floors. Finally, since the guardrail posts are installed with approximately normal distribution between each post, the space between them was estimated and used to find the most likely distance between the two posts. The research team used various combinations of the developed approaches to monitor guardrail systems in the captured images from a high-rise building project. Comparing the precision and recall metrics indicated that the cascade classifier achieves better performance with floor detection and guardrail spacing estimation. The research outcomes illustrate that the proposed guardrail recognition system can improve the assessment of guardrails and facilitate the safety engineer's task of identifying fall hazards in high-rise building projects.
Published: 2022

11. Object Pose Estimation using Mid-level Visual Representations

Author: Nejatishahidin, Negar, Fayyazsanavi, Pooya, Kosecka, Jana, Nejatishahidin, Negar, Fayyazsanavi, Pooya, and Kosecka, Jana
Abstract: This work proposes a novel pose estimation model for object categories that can be effectively transferred to previously unseen environments. The deep convolutional network models (CNN) for pose estimation are typically trained and evaluated on datasets specifically curated for object detection, pose estimation, or 3D reconstruction, which requires large amounts of training data. In this work, we propose a model for pose estimation that can be trained with small amount of data and is built on the top of generic mid-level representations \cite{taskonomy2018} (e.g. surface normal estimation and re-shading). These representations are trained on a large dataset without requiring pose and object annotations. Later on, the predictions are refined with a small CNN neural network that exploits object masks and silhouette retrieval. The presented approach achieves superior performance on the Pix3D dataset \cite{pix3d} and shows nearly 35\% improvement over the existing models when only 25\% of the training data is available. We show that the approach is favorable when it comes to generalization and transfer to novel environments. Towards this end, we introduce a new pose estimation benchmark for commonly encountered furniture categories on challenging Active Vision Dataset \cite{Ammirato2017ADF} and evaluated the models trained on the Pix3D dataset.
Published: 2022

12. Uncertainty Aware Proposal Segmentation for Unknown Object Detection

Author: Li, Yimeng, Kosecka, Jana, Li, Yimeng, and Kosecka, Jana
Abstract: Recent efforts in deploying Deep Neural Networks for object detection in real world applications, such as autonomous driving, assume that all relevant object classes have been observed during training. Quantifying the performance of these models in settings when the test data is not represented in the training set has mostly focused on pixel-level uncertainty estimation techniques of models trained for semantic segmentation. This paper proposes to exploit additional predictions of semantic segmentation models and quantifying its confidences, followed by classification of object hypotheses as known vs. unknown, out of distribution objects. We use object proposals generated by Region Proposal Network (RPN) and adapt distance aware uncertainty estimation of semantic segmentation using Radial Basis Functions Networks (RBFN) for class agnostic object mask prediction. The augmented object proposals are then used to train a classifier for known vs. unknown objects categories. Experimental results demonstrate that the proposed method achieves parallel performance to state of the art methods for unknown object detection and can also be used effectively for reducing object detectors' false positive rate. Our method is well suited for applications where prediction of non-object background categories obtained by semantic segmentation is reliable., Comment: Accepted to WACV 2022 DNOW Workshop
Published: 2021

13. Diverse Knowledge Distillation (DKD): A Solution for Improving The Robustness of Ensemble Models Against Adversarial Attacks

Author: Mirzaeian, Ali, Kosecka, Jana, Homayoun, Houman, Mohsenin, Tinoosh, Sasan, Avesta, Mirzaeian, Ali, Kosecka, Jana, Homayoun, Houman, Mohsenin, Tinoosh, and Sasan, Avesta
Abstract: This paper proposes an ensemble learning model that is resistant to adversarial attacks. To build resilience, we introduced a training process where each member learns a radically distinct latent space. Member models are added one at a time to the ensemble. Simultaneously, the loss function is regulated by a reverse knowledge distillation, forcing the new member to learn different features and map to a latent space safely distanced from those of existing members. We assessed the security and performance of the proposed solution on image classification tasks using CIFAR10 and MNIST datasets and showed security and performance improvement compared to the state of the art defense methods.
Published: 2020

14. FineHand: Learning Hand Shapes for American Sign Language Recognition

Author: Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Rangwala, Huzefa, Kosecka, Jana, Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Rangwala, Huzefa, and Kosecka, Jana
Abstract: American Sign Language recognition is a difficult gesture recognition problem, characterized by fast, highly articulate gestures. These are comprised of arm movements with different hand shapes, facial expression and head movements. Among these components, hand shape is the vital, often the most discriminative part of a gesture. In this work, we present an approach for effective learning of hand shape embeddings, which are discriminative for ASL gestures. For hand shape recognition our method uses a mix of manually labelled hand shapes and high confidence predictions to train deep convolutional neural network (CNN). The sequential gesture component is captured by recursive neural network (RNN) trained on the embeddings learned in the first stage. We will demonstrate that higher quality hand shape models can significantly improve the accuracy of final video gesture classification in challenging conditions with variety of speakers, different illumination and significant motion blurr. We compare our model to alternative approaches exploiting different modalities and representations of the data and show improved video gesture recognition accuracy on GMU-ASL51 benchmark dataset
Published: 2020

15. Generative Multi-Stream Architecture For American Sign Language Recognition

Author: Huh, Dom, Gurrapu, Sai, Olson, Frederick, Rangwala, Huzefa, Pathak, Parth, Kosecka, Jana, Huh, Dom, Gurrapu, Sai, Olson, Frederick, Rangwala, Huzefa, Pathak, Parth, and Kosecka, Jana
Abstract: With advancements in deep model architectures, tasks in computer vision can reach optimal convergence provided proper data preprocessing and model parameter initialization. However, training on datasets with low feature-richness for complex applications limit and detriment optimal convergence below human performance. In past works, researchers have provided external sources of complementary data at the cost of supplementary hardware, which are fed in streams to counteract this limitation and boost performance. We propose a generative multi-stream architecture, eliminating the need for additional hardware with the intent to improve feature richness without risking impracticability. We also introduce the compact spatio-temporal residual block to the standard 3-dimensional convolutional model, C3D. Our rC3D model performs comparatively to the top C3D residual variant architecture, the pseudo-3D model, on the FASL-RGB dataset. Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
Published: 2020

16. Hierarchical Kinematic Human Mesh Recovery

Author: Georgakis, Georgios, Li, Ren, Karanam, Srikrishna, Chen, Terrence, Kosecka, Jana, Wu, Ziyan, Georgakis, Georgios, Li, Ren, Karanam, Srikrishna, Chen, Terrence, Kosecka, Jana, and Wu, Ziyan
Abstract: We consider the problem of estimating a parametric model of 3D human mesh from a single image. While there has been substantial recent progress in this area with direct regression of model parameters, these methods only implicitly exploit the human body kinematic structure, leading to sub-optimal use of the model prior. In this work, we address this gap by proposing a new technique for regression of human parametric model that is explicitly informed by the known hierarchical structure, including joint interdependencies of the model. This results in a strong prior-informed design of the regressor architecture and an associated hierarchical optimization that is flexible to be used in conjunction with the current standard frameworks for 3D human mesh recovery. We demonstrate these aspects by means of extensive experiments on standard benchmark datasets, showing how our proposed new design outperforms several existing and popular methods, establishing new state-of-the-art results. By considering joint interdependencies, our method is equipped to infer joints even under data corruptions, which we demonstrate by conducting experiments under varying degrees of occlusion., Comment: 17 pages, 8 figures, 5 tables, ECCV 2020
Published: 2020

17. Learning View and Target Invariant Visual Servoing for Navigation

Author: Li, Yimeng, Kosecka, Jana, Li, Yimeng, and Kosecka, Jana
Abstract: The advances in deep reinforcement learning recently revived interest in data-driven learning based approaches to navigation. In this paper we propose to learn viewpoint invariant and target invariant visual servoing for local mobile robot navigation; given an initial view and the goal view or an image of a target, we train deep convolutional network controller to reach the desired goal. We present a new architecture for this task which rests on the ability of establishing correspondences between the initial and goal view and novel reward structure motivated by the traditional feedback control error. The advantage of the proposed model is that it does not require calibration and depth information and achieves robust visual servoing in a variety of environments and targets without any parameter fine tuning. We present comprehensive evaluation of the approach and comparison with other deep learning architectures as well as classical visual servoing methods in visually realistic simulation environment. The presented model overcomes the brittleness of classical visual servoing based methods and achieves significantly higher generalization capability compared to the previous learning approaches., Comment: Accepted to ICRA 2020
Published: 2020

18. Simultaneous Mapping and Target Driven Navigation

Author: Georgakis, Georgios, Li, Yimeng, Kosecka, Jana, Georgakis, Georgios, Li, Yimeng, and Kosecka, Jana
Abstract: This work presents a modular architecture for simultaneous mapping and target driven navigation in indoors environments. The semantic and appearance stored in 2.5D map is distilled from RGB images, semantic segmentation and outputs of object detectors by convolutional neural networks. Given this representation, the mapping module learns to localize the agent and register consecutive observations in the map. The navigation task is then formulated as a problem of learning a policy for reaching semantic targets using current observations and the up-to-date map. We demonstrate that the use of semantic information improves localization accuracy and the ability of storing spatial semantic map aids the target driven navigation policy. The two modules are evaluated separately and jointly on Active Vision Dataset and Matterport3D environments, demonstrating improved performance on both localization and navigation tasks.
Published: 2019

19. Sign Language Recognition Analysis using Multimodal Data

Author: Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Kosecka, Jana, Rangwala, Huzefa, Hosain, Al Amin, Santhalingam, Panneer Selvam, Pathak, Parth, Kosecka, Jana, and Rangwala, Huzefa
Abstract: Voice-controlled personal and home assistants (such as the Amazon Echo and Apple Siri) are becoming increasingly popular for a variety of applications. However, the benefits of these technologies are not readily accessible to Deaf or Hard-ofHearing (DHH) users. The objective of this study is to develop and evaluate a sign recognition system using multiple modalities that can be used by DHH signers to interact with voice-controlled devices. With the advancement of depth sensors, skeletal data is used for applications like video analysis and activity recognition. Despite having similarity with the well-studied human activity recognition, the use of 3D skeleton data in sign language recognition is rare. This is because unlike activity recognition, sign language is mostly dependent on hand shape pattern. In this work, we investigate the feasibility of using skeletal and RGB video data for sign language recognition using a combination of different deep learning architectures. We validate our results on a large-scale American Sign Language (ASL) dataset of 12 users and 13107 samples across 51 signs. It is named as GMUASL51. We collected the dataset over 6 months and it will be publicly released in the hope of spurring further machine learning research towards providing improved accessibility for digital assistants., Comment: conference : IEEE DSAA, 2019, Washington DC
Published: 2019

20. Learning Local RGB-to-CAD Correspondences for Object Pose Estimation

Author: Georgakis, Georgios, Karanam, Srikrishna, Wu, Ziyan, Kosecka, Jana, Georgakis, Georgios, Karanam, Srikrishna, Wu, Ziyan, and Kosecka, Jana
Abstract: We consider the problem of 3D object pose estimation. While much recent work has focused on the RGB domain, the reliance on accurately annotated images limits their generalizability and scalability. On the other hand, the easily available CAD models of objects are rich sources of data, providing a large number of synthetically rendered images. In this paper, we solve this key problem of existing methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation. Our key innovations compared to existing work include removing the need for either real-world textures for CAD models or explicit 3D pose annotations for RGB images. We achieve this through a series of objectives that learn how to select keypoints and enforce viewpoint and modality invariance across RGB images and CAD model renderings. We conduct extensive experiments to demonstrate that the proposed method can reliably estimate object pose in RGB images, as well as generalize to object instances not seen during training., Comment: 10 pages, 6 figures, 4 tables, ICCV 2019
Published: 2018

21. On Evaluation of Embodied Navigation Agents

Author: Anderson, Peter, Chang, Angel, Chaplot, Devendra Singh, Dosovitskiy, Alexey, Gupta, Saurabh, Koltun, Vladlen, Kosecka, Jana, Malik, Jitendra, Mottaghi, Roozbeh, Savva, Manolis, Zamir, Amir R., Anderson, Peter, Chang, Angel, Chaplot, Devendra Singh, Dosovitskiy, Alexey, Gupta, Saurabh, Koltun, Vladlen, Kosecka, Jana, Malik, Jitendra, Mottaghi, Roozbeh, Savva, Manolis, and Zamir, Amir R.
Abstract: Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence. The past two years have seen a surge of creative work on navigation. This creative output has produced a plethora of sometimes incompatible task definitions and evaluation protocols. To coordinate ongoing and future research in this area, we have convened a working group to study empirical methodology in navigation research. The present document summarizes the consensus recommendations of this working group. We discuss different problem statements and the role of generalization, present evaluation measures, and provide standard scenarios that can be used for benchmarking., Comment: Report of a working group on empirical methodology in navigation research. Authors are listed in alphabetical order
Published: 2018

22. Visual Representations for Semantic Target Driven Navigation

Author: Mousavian, Arsalan, Toshev, Alexander, Fiser, Marek, Kosecka, Jana, Wahid, Ayzaan, Davidson, James, Mousavian, Arsalan, Toshev, Alexander, Fiser, Marek, Kosecka, Jana, Wahid, Ayzaan, and Davidson, James
Abstract: What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to the refrigerator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose to using high level semantic and contextual features including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision as observations and use deep network to learn the navigation policy. This choice allows using additional data, from orthogonal sources, to better train different parts of the model the representation extraction is trained on large standard vision datasets while the navigation component leverages large synthetic environments for training. This combination of real and synthetic is possible because equitable feature representations are available in both (e.g., segmentation and detection masks), which alleviates the need for domain adaptation. Both the representation and the navigation policy can be readily applied to real non-synthetic environments as demonstrated on the Active Vision Dataset [1]. Our approach gets successfully to the target in 54% of the cases in unexplored environments, compared to 46% for non-learning based approach, and 28% for the learning-based baseline., Comment: Accepted to ICRA 2019 and ECCV 2018 Workshop on Visual Learning and Embodied Agents in Simulation Environments
Published: 2018

23. Self-supervisory Signals for Object Discovery and Detection

Author: Pot, Etienne, Toshev, Alexander, Kosecka, Jana, Pot, Etienne, Toshev, Alexander, and Kosecka, Jana
Abstract: In robotic applications, we often face the challenge of discovering new objects while having very little or no labelled training data. In this paper we explore the use of self-supervision provided by a robot traversing an environment to learn representations of encountered objects. Knowledge of ego-motion and depth perception enables the agent to effectively associate multiple object proposals, which serve as training data for learning object representations from unlabelled images. We demonstrate the utility of this representation in two ways. First, we can automatically discover objects by performing clustering in the learned embedding space. Each resulting cluster contains examples of one instance seen from various viewpoints and scales. Second, given a small number of labeled images, we can efficiently learn detectors for these labels. In the few-shot regime, these detectors have a substantially higher mAP of 0.22 compared to 0.12 of off-the-shelf standard detectors trained on this limited data. Thus, the proposed self-supervision results in effective environment specific object discovery and detection at no or very small human labeling cost.
Published: 2018

24. Target Driven Instance Detection

Author: Ammirato, Phil, Fu, Cheng-Yang, Shvets, Mykhailo, Kosecka, Jana, Berg, Alexander C., Ammirato, Phil, Fu, Cheng-Yang, Shvets, Mykhailo, Kosecka, Jana, and Berg, Alexander C.
Abstract: While state-of-the-art general object detectors are getting better and better, there are not many systems specifically designed to take advantage of the instance detection problem. For many applications, such as household robotics, a system may need to recognize a few very specific instances at a time. Speed can be critical in these applications, as can the need to recognize previously unseen instances. We introduce a Target Driven Instance Detector(TDID), which modifies existing general object detectors for the instance recognition setting. TDID not only improves performance on instances seen during training, with a fast runtime, but is also able to generalize to detect novel instances.
Published: 2018

25. End-to-end learning of keypoint detector and descriptor for pose invariant 3D matching

Author: Georgakis, Georgios, Karanam, Srikrishna, Wu, Ziyan, Ernst, Jan, Kosecka, Jana, Georgakis, Georgios, Karanam, Srikrishna, Wu, Ziyan, Ernst, Jan, and Kosecka, Jana
Abstract: Finding correspondences between images or 3D scans is at the heart of many computer vision and image retrieval applications and is often enabled by matching local keypoint descriptors. Various learning approaches have been applied in the past to different stages of the matching pipeline, considering detector, descriptor, or metric learning objectives. These objectives were typically addressed separately and most previous work has focused on image data. This paper proposes an end-to-end learning framework for keypoint detection and its representation (descriptor) for 3D depth maps or 3D scans, where the two can be jointly optimized towards task-specific objectives without a need for separate annotations. We employ a Siamese architecture augmented by a sampling layer and a novel score loss function which in turn affects the selection of region proposals. The positive and negative examples are obtained automatically by sampling corresponding region proposals based on their consistency with known 3D pose labels. Matching experiments with depth data on multiple benchmark datasets demonstrate the efficacy of the proposed approach, showing significant improvements over state-of-the-art methods., Comment: 9 pages, 9 figures, 3 tables, CVPR 2018
Published: 2018

26. Dense Piecewise Planar RGB-D SLAM for Indoor Environments

Author: Le, Phi-Hung, Kosecka, Jana, Le, Phi-Hung, and Kosecka, Jana
Abstract: The paper exploits weak Manhattan constraints to parse the structure of indoor environments from RGB-D video sequences in an online setting. We extend the previous approach for single view parsing of indoor scenes to video sequences and formulate the problem of recovering the floor plan of the environment as an optimal labeling problem solved using dynamic programming. The temporal continuity is enforced in a recursive setting, where labeling from previous frames is used as a prior term in the objective function. In addition to recovery of piecewise planar weak Manhattan structure of the extended environment, the orthogonality constraints are also exploited by visual odometry and pose graph optimization. This yields reliable estimates in the presence of large motions and absence of distinctive features to track. We evaluate our method on several challenging indoors sequences demonstrating accurate SLAM and dense mapping of low texture environments. On existing TUM benchmark we achieve competitive results with the alternative approaches which fail in our environments., Comment: International Conference on Intelligent Robots and Systems (IROS) 2017
Published: 2017

27. A Dataset for Developing and Benchmarking Active Vision

Author: Ammirato, Phil, Poirson, Patrick, Park, Eunbyung, Kosecka, Jana, Berg, Alexander C., Ammirato, Phil, Poirson, Patrick, Park, Eunbyung, Kosecka, Jana, and Berg, Alexander C.
Abstract: We present a new public dataset with a focus on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes. We train a fast object category detector for instance detection on our data. Using the dataset we show that, although increasingly accurate and fast, the state of the art for object detection is still severely impacted by object scale, occlusion, and viewing direction all of which matter for robotics applications. We next validate the dataset for simulating active vision, and use the dataset to develop and evaluate a deep-network-based system for next best move prediction for object classification using reinforcement learning. Our dataset is available for download at cs.unc.edu/~ammirato/active_vision_dataset_website/., Comment: To appear at ICRA 2017
Published: 2017

28. Synthesizing Training Data for Object Detection in Indoor Scenes

Author: Georgakis, Georgios, Mousavian, Arsalan, Berg, Alexander C., Kosecka, Jana, Georgakis, Georgios, Mousavian, Arsalan, Berg, Alexander C., and Kosecka, Jana
Abstract: Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training of such models typically requires large amounts of annotated training data which is time consuming and costly to obtain. In this work we explore the ability of using synthetically generated composite images for training state-of-the-art object detectors, especially for object instance detection. We superimpose 2D images of textured object models into images of real environments at variety of locations and scales. Our experiments evaluate different superimposition strategies ranging from purely image-based blending all the way to depth and semantics informed positioning of the object models into real scenes. We demonstrate the effectiveness of these object detector training strategies on two publicly available datasets, the GMU-Kitchens and the Washington RGB-D Scenes v2. As one observation, augmenting some hand-labeled training data with synthetic examples carefully composed onto scenes yields object detectors with comparable performance to using much more hand-labeled data. Broadly, this work charts new opportunities for training detectors for new objects by exploiting existing object model repositories in either a purely automatic fashion or with only a very small number of human-annotated examples., Comment: Added more experiments and link to project webpage
Published: 2017

29. 3D Bounding Box Estimation Using Deep Learning and Geometry

Author: Mousavian, Arsalan, Anguelov, Dragomir, Flynn, John, Kosecka, Jana, Mousavian, Arsalan, Anguelov, Dragomir, Flynn, John, and Kosecka, Jana
Abstract: We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark both on the official metric of 3D orientation estimation and also on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance level segmentation and flat ground priors and sub-category detection. Our discrete-continuous loss also produces state of the art results for 3D viewpoint estimation on the Pascal 3D+ dataset., Comment: To appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Published: 2016

30. Multiview RGB-D Dataset for Object Instance Detection

Author: Georgakis, Georgios, Reza, Md Alimoor, Mousavian, Arsalan, Le, Phi-Hung, Kosecka, Jana, Georgakis, Georgios, Reza, Md Alimoor, Mousavian, Arsalan, Le, Phi-Hung, and Kosecka, Jana
Abstract: This paper presents a new multi-view RGB-D dataset of nine kitchen scenes, each containing several objects in realistic cluttered environments including a subset of objects from the BigBird dataset. The viewpoints of the scenes are densely sampled and objects in the scenes are annotated with bounding boxes and in the 3D point cloud. Also, an approach for detection and recognition is presented, which is comprised of two parts: i) a new multi-view 3D proposal generation method and ii) the development of several recognition baselines using AlexNet to score our proposals, which is trained either on crops of the dataset or on synthetically composited training images. Finally, we compare the performance of the object proposals and a detection baseline to the Washington RGB-D Scenes (WRGB-D) dataset and demonstrate that our Kitchen scenes dataset is more challenging for object detection and recognition. The dataset is available at: http://cs.gmu.edu/~robot/gmu-kitchens.html.
Published: 2016

31. Fast Single Shot Detection and Pose Estimation

Author: Poirson, Patrick, Ammirato, Phil, Fu, Cheng-Yang, Liu, Wei, Kosecka, Jana, Berg, Alexander C., Poirson, Patrick, Ammirato, Phil, Fu, Cheng-Yang, Liu, Wei, Kosecka, Jana, and Berg, Alexander C.
Abstract: For applications in navigation and robotics, estimating the 3D pose of objects is as important as detection. Many approaches to pose estimation rely on detecting or tracking parts or keypoints [11, 21]. In this paper we build on a recent state-of-the-art convolutional network for slidingwindow detection [10] to provide detection and rough pose estimation in a single shot, without intermediate stages of detecting parts or initial bounding boxes. While not the first system to treat pose estimation as a categorization problem, this is the first attempt to combine detection and pose estimation at the same level using a deep learning approach. The key to the architecture is a deep convolutional network where scores for the presence of an object category, the offset for its location, and the approximate pose are all estimated on a regular grid of locations in the image. The resulting system is as accurate as recent work on pose estimation (42.4% 8 View mAVP on Pascal 3D+ [21] ) and significantly faster (46 frames per second (FPS) on a TITAN X GPU). This approach to detection and rough pose estimation is fast and accurate enough to be widely applied as a pre-processing step for tasks including high-accuracy pose estimation, object tracking and localization, and vSLAM.
Published: 2016

32. Semantic Image Based Geolocation Given a Map

Author: Mousavian, Arsalan, Kosecka, Jana, Mousavian, Arsalan, and Kosecka, Jana
Abstract: The problem visual place recognition is commonly used strategy for localization. Most successful appearance based methods typically rely on a large database of views endowed with local or global image descriptors and strive to retrieve the views of the same location. The quality of the results is often affected by the density of the reference views and the robustness of the image representation with respect to viewpoint variations, clutter and seasonal changes. In this work we present an approach for geo-locating a novel view and determining camera location and orientation using a map and a sparse set of geo-tagged reference views. We propose a novel technique for detection and identification of building facades from geo-tagged reference view using the map and geometry of the building facades. We compute the likelihood of camera location and orientation of the query images using the detected landmark (building) identities from reference views, 2D map of the environment, and geometry of building facades. We evaluate our approach for building identification and geo-localization on a new challenging outdoors urban dataset exhibiting large variations in appearance and viewpoint.
Published: 2016

33. Reinforcement Learning for Semantic Segmentation in Indoor Scenes

Author: Reza, Md. Alimoor, Kosecka, Jana, Reza, Md. Alimoor, and Kosecka, Jana
Abstract: Future advancements in robot autonomy and sophistication of robotics tasks rest on robust, efficient, and task-dependent semantic understanding of the environment. Semantic segmentation is the problem of simultaneous segmentation and categorization of a partition of sensory data. The majority of current approaches tackle this using multi-class segmentation and labeling in a Conditional Random Field (CRF) framework or by generating multiple object hypotheses and combining them sequentially. In practical settings, the subset of semantic labels that are needed depend on the task and particular scene and labelling every single pixel is not always necessary. We pursue these observations in developing a more modular and flexible approach to multi-class parsing of RGBD data based on learning strategies for combining independent binary object-vs-background segmentations in place of the usual monolithic multi-label CRF approach. Parameters for the independent binary segmentation models can be learned very efficiently, and the combination strategy---learned using reinforcement learning---can be set independently and can vary over different tasks and environments. Accuracy is comparable to state-of-art methods on a subset of the NYU-V2 dataset of indoor scenes, while providing additional flexibility and modularity.
Published: 2016

34. Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks

Author: Mousavian, Arsalan, Pirsiavash, Hamed, Kosecka, Jana, Mousavian, Arsalan, Pirsiavash, Hamed, and Kosecka, Jana
Abstract: Multi-scale deep CNNs have been used successfully for problems mapping each pixel to a label, such as depth estimation and semantic segmentation. It has also been shown that such architectures are reusable and can be used for multiple tasks. These networks are typically trained independently for each task by varying the output layer(s) and training objective. In this work we present a new model for simultaneous depth estimation and semantic segmentation from a single RGB image. Our approach demonstrates the feasibility of training parts of the model for each task and then fine tuning the full, combined model on both tasks simultaneously using a single loss function. Furthermore we couple the deep CNN with fully connected CRF, which captures the contextual relationships and interactions between the semantic and depth cues improving the accuracy of the final results. The proposed model is trained and evaluated on NYUDepth V2 dataset outperforming the state of the art methods on semantic segmentation and achieving comparable results on the task of depth estimation.
Published: 2016

35. Special issue on robot vision

Author: Kosecka, Jana, Marchand, Eric, Corke, Peter, Kosecka, Jana, Marchand, Eric, and Corke, Peter
Abstract: The International Journal of Robotics Research (IJRR) has a long history of publishing the state-of-the-art in the field of robotic vision. This is the fourth special issue devoted to the topic. Previous special issues were published in 2012 (Volume 31, No. 4), 2010 (Volume 29, Nos 2–3) and 2007 (Volume 26, No. 7, jointly with the International Journal of Computer Vision). In a closely related field was the special issue on Visual Servoing published in IJRR, 2003 (Volume 22, Nos 10–11). These issues nicely summarize the highlights and progress of the past 12 years of research devoted to the use of visual perception for robotics.
Published: 2015

36. Deep Convolutional Features for Image Based Retrieval and Scene Categorization

Author: Mousavian, Arsalan, Kosecka, Jana, Mousavian, Arsalan, and Kosecka, Jana
Abstract: Several recent approaches showed how the representations learned by Convolutional Neural Networks can be repurposed for novel tasks. Most commonly it has been shown that the activation features of the last fully connected layers (fc7 or fc6) of the network, followed by a linear classifier outperform the state-of-the-art on several recognition challenge datasets. Instead of recognition, this paper focuses on the image retrieval problem and proposes a examines alternative pooling strategies derived for CNN features. The presented scheme uses the features maps from an earlier layer 5 of the CNN architecture, which has been shown to preserve coarse spatial information and is semantically meaningful. We examine several pooling strategies and demonstrate superior performance on the image retrieval task (INRIA Holidays) at the fraction of the computational cost, while using a relatively small memory requirements. In addition to retrieval, we see similar efficiency gains on the SUN397 scene categorization dataset, demonstrating wide applicability of this simple strategy. We also introduce and evaluate a novel GeoPlaces5K dataset from different geographical locations in the world for image retrieval that stresses more dramatic changes in appearance and viewpoint.
Published: 2015

37. Detecting Simple Objects in RGB-D Data

Author: GEORGE MASON UNIV FAIRFAX VA, Kosecka, Jana, Zhou, Xing, GEORGE MASON UNIV FAIRFAX VA, Kosecka, Jana, and Zhou, Xing
Abstract: In this paper we present an approach for detection of simple objects in RGB-D data. Object detection in cluttered indoors environments is an important perceptual capability of robotic systems required for object search and pick and deliver tasks. For long term autonomy robots should learn how objects look like and where they appear in an weakly supervised manner. In this work we exploit the depth information to provide evidence about occlusion boundaries and scale of the objects. The depth discontinuities along with image contours computed in the vicinity of the detection window boundary form an {\em objectness} measure, which is used to train an SVM classifier. In the testing stage we exploit the knowledge of the actual size of the object to propose the scale of the detection window significantly pruning the number window candidates to be evaluated. We evaluate our approach for detecting simple objects on NYU RGB-D dataset, illustrate the effectiveness of our approach as well as difficulties with the standard evaluation methodologies., The original document contains color images.
Published: 2013

38. Semantic Segmentation of Urban Environments into Object and Background Categories

Author: GEORGE MASON UNIV FAIRFAX VA, Lerma, Cesar C, Kosecka, Jana, GEORGE MASON UNIV FAIRFAX VA, Lerma, Cesar C, and Kosecka, Jana
Abstract: Advancements in robotic navigation, object search and exploration rest to a large extent on robust, efficient and more advanced semantic understanding of the surrounding environment. Since the choice of most relevant semantic information depends on the task, it is desirable to develop approaches which can be adopted for different tasks at hand and which separate the aspects related to surroundings from object entities. In the proposed work we present an efficient approach for detecting generic objects in urban environments from videos acquired by a moving vehicle by means of semantic segmentation. Compared to traditional approaches for semantic labeling, we strive to detect variety of objects, while avoiding the need for large amounts of training data required for recognizing individual object categories and visual variability within and across the categories. In the proposed approach we exploit the features providing evidence about widely available non-object categories (such as sky, road, buildings) and use informative features which are indicative of the presence of object boundaries to gather the evidence about objects. We formulate the object/non-object semantic segmentation problem in the Conditional Random Field Framework, where the structure of the graph is induced by the minimum spanning tree computed over 3D reconstruction, yielding an efficient algorithm for an exact inference. We carry out extensive experiments on videos of urban environments acquired by a moving vehicle and compare our approach to existing alternatives.
Published: 2013

39. Dynamic RGB-D Mapping

Author: GEORGE MASON UNIV FAIRFAX VA DEPT OF COMPUTER SCIENCE, Paton, Michael, Kosecka, Jana, GEORGE MASON UNIV FAIRFAX VA DEPT OF COMPUTER SCIENCE, Paton, Michael, and Kosecka, Jana
Abstract: Localization and mapping has been an area of great importance and interest to the robotics and computer vision community. It has traditionally been accomplished with range sensors such as lasers and sonars. Recent improvements in processing power coupled with advancements in image matching and motion estimation has allowed development of vision based localization techniques. Despite much progress, there are disadvantages to both range sensing and vision techniques making localization and mapping that is inexpensive and robust hard to attain. With the advent of RGB-D cameras which provide synchronized range and video data, localization and mapping is now able to exploit both range data as well as RGB features. This thesis exploits the strengths of vision and range sensing localization and mapping strategies and proposes novel algorithms using RGB-D cameras. We show how to combine existing strategies and present through evaluation of the resulting algorithms against a dataset of RGB-D benchmarks. Lastly we demonstrate the proposed algorithm on a challenging indoor dataset and demonstrate improvements where either pure range sensing or vision techniques perform poorly., The original document contains color images.
Published: 2012

40. Motion bias and structure distortion induced by intrinsic calibration errors

Author: Zucchelli, Marco, Kosecka, Jana, Zucchelli, Marco, and Kosecka, Jana
Abstract: This article provides an account of sensitivity and robustness of structure and motion recovery with respect to the errors in intrinsic parameters of the camera. We demonstrate both analytically and in simulation, the interplay between measurement and calibration errors and their effect on motion and structure estimates. In particular we show that the calibration errors introduce an additional bias towards the optical axis, which has opposite sign to the bias typically observed by egomotion algorithms. The overall bias causes a distortion of the resulting 3D structure, which we express in a parametric form. The analysis and experiments are carried out in the differential setting for motion and structure estimation from image velocities. While the analytical explanations are derived in the context of linear techniques for motion estimation, we verify our observations experimentally on a variety of optimal and suboptimal motion and structure estimation algorithms. The obtained results illuminate and explain the performance and sensitivity of the differential structure and motion recovery techniques in the presence of calibration errors., QC 20110601
Published: 2008
Full Text: View/download PDF

41. Development of Binocular Stereopsis for Vehicle Lateral Control, Longitudinal Control and Obstacle Detection

Author: Malik, Jitendra, Malik, Jitendra, Kosecka, Jana, Taylor, Camillo J., McLauchlan, Philip, Malik, Jitendra, Malik, Jitendra, Kosecka, Jana, Taylor, Camillo J., and McLauchlan, Philip
Abstract: This nal report describes the application of computer vision techniques to the lateral and longitudinal control of an autonomous highway vehicle. In the part of the project we focused on an analysis of the vehicle's lateral dynamics and the design of an appropriate controller for lateral control and investigated various static feedback strategies where the measurements obtained from vision, namely o set from the centerline and angle between the road tangent and the orientation of the vehicle at some look-ahead distance, are directly used for control. The role of the look-ahead, its relation to the vision processing delay, longitudinal velocity and road geometry was crucial on the design of the control and their experimental evaluation. We carried out a thorough analysis of the e ects of changing various important system parameters like the vehicle velocity, thelookahead range of the vision sensor and the processing delay associated with the perception and control systems. We also present the results of a series of experiments that were designed to provide a systematic comparison of a number of control strategies. The control strategies that were explored include a lead-lag control law, a full- state linear controller and input-output linearizing control law. Each of these control strategies was implemented and tested at highway speeds on our experimental vehicle platform, a Honda Accord LX sedan. For the longitudinal control problem, we investigated the possibility of using stereo vision to provide the range information, in conjunction with a scanning laser radar sensor. The vision based tracking system utilizes a layered architecture wherein the bottom layer computes motion in both images using a simple correlation algorithm, and the upper level performs stereo xation and reconstruction using an algorithm designed for active vision systems. We present some initial results comparing the quality of range measurements provided by a vision system with the laser radar s
Published: 1999

42. Development of Binocular Stereopsis for Vehicle Lateral Control, Longitudinal Control and Obstacle Detection

Author: Malik, Jitendra, Malik, Jitendra, Kosecka, Jana, Taylor, Camillo J., McLauchlan, Philip, Malik, Jitendra, Malik, Jitendra, Kosecka, Jana, Taylor, Camillo J., and McLauchlan, Philip
Abstract: This nal report describes the application of computer vision techniques to the lateral and longitudinal control of an autonomous highway vehicle. In the part of the project we focused on an analysis of the vehicle's lateral dynamics and the design of an appropriate controller for lateral control and investigated various static feedback strategies where the measurements obtained from vision, namely o set from the centerline and angle between the road tangent and the orientation of the vehicle at some look-ahead distance, are directly used for control. The role of the look-ahead, its relation to the vision processing delay, longitudinal velocity and road geometry was crucial on the design of the control and their experimental evaluation. We carried out a thorough analysis of the e ects of changing various important system parameters like the vehicle velocity, thelookahead range of the vision sensor and the processing delay associated with the perception and control systems. We also present the results of a series of experiments that were designed to provide a systematic comparison of a number of control strategies. The control strategies that were explored include a lead-lag control law, a full- state linear controller and input-output linearizing control law. Each of these control strategies was implemented and tested at highway speeds on our experimental vehicle platform, a Honda Accord LX sedan. For the longitudinal control problem, we investigated the possibility of using stereo vision to provide the range information, in conjunction with a scanning laser radar sensor. The vision based tracking system utilizes a layered architecture wherein the bottom layer computes motion in both images using a simple correlation algorithm, and the upper level performs stereo xation and reconstruction using an algorithm designed for active vision systems. We present some initial results comparing the quality of range measurements provided by a vision system with the laser radar s
Published: 1999

43. Development Of Binocular Stereopsis For Vehicle Lateral Control, Longitudinal Control And Obstacle Detection

Author: Malik, Jitendra, Malik, Jitendra, Taylor, Camillo J., Mclauchlan, Philip, Kosecka, Jana, Malik, Jitendra, Malik, Jitendra, Taylor, Camillo J., Mclauchlan, Philip, and Kosecka, Jana
Abstract: This report describes progress in the application of computer vision techniques to the lateral and longitudinal control of an autonomous highway vehicle. A vehicle's lateral dynamics and the design of an appropriate controller for lateral control are investigated. Stereo vision, in conjunction with a scanning laser radar sensor, are studied for providing range information applicable to the longitudinal control problem. The results from the experimental demonstration of this system are reported as part of the National Automated Highway Systems Consortium (NAHSC) Demonstration that took place in San Diego in August 1997.
Published: 1997

44. Development Of Binocular Stereopsis For Vehicle Lateral Control, Longitudinal Control And Obstacle Detection

Author: Malik, Jitendra, Malik, Jitendra, Taylor, Camillo J., Mclauchlan, Philip, Kosecka, Jana, Malik, Jitendra, Malik, Jitendra, Taylor, Camillo J., Mclauchlan, Philip, and Kosecka, Jana
Abstract: This report describes progress in the application of computer vision techniques to the lateral and longitudinal control of an autonomous highway vehicle. A vehicle's lateral dynamics and the design of an appropriate controller for lateral control are investigated. Stereo vision, in conjunction with a scanning laser radar sensor, are studied for providing range information applicable to the longitudinal control problem. The results from the experimental demonstration of this system are reported as part of the National Automated Highway Systems Consortium (NAHSC) Demonstration that took place in San Diego in August 1997.
Published: 1997

45. A framework for modeling and verifying visually guided agents: Design, analysis and experiments

Author: Kosecka, Jana and Kosecka, Jana
Abstract: The successful functioning of robotic agents in a dynamically changing environment requires rich sensory input and advanced sensory-motor control. The robotic agents are typically equipped with multiple sensors and multiple actuators. The interactions between agents and the environment can be characterized by both discrete and continuous models. The accomplishment of various tasks is mediated by complex coordination and interaction between individual sensing and control strategies. It is crucial for the reliable and predictable operation of robotic systems that the design be within a structured methodology which supports analysis and modularity. In this thesis we introduce a framework for modeling and verifying autonomous mobile agents which provides a unified approach to the design and analysis of systems comprising of both continuous and discrete event components. Our framework proposes to model elementary sensing and control strategies by making appropriate finite state machine abstractions which capture the discrete event aspects of continuous models. In order to achieve the desired modularity and flexibility we define a Task Specification Language for composing elementary strategies to form more complex missions. The robustness and reliability of individual control strategies is guaranteed by their design and analysis at the continuous level. The verification of the discrete event interactions between strategies is formulated as a Supervisory Control Theory Problem, where we synthesize a supervisor which serves as a run-time scheduler and monitor of the task. The overall framework is verified in a series of experiments where teams of visually guided mobile agents are engaged in various navigation tasks. Within our framework, we formulate the visually guided navigation problem and propose a relational model of the environment embedded in a finite state machine structure which can then be used for automatic generation of the task specification.
Published: 1996

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Publication Year Range

Publication Type

Database

Publisher

45 results on '"Kosecka, Jana"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources