42 results for "people detection"
Search Results
2. Comparativa entre la técnica de umbralización binaria y el método de Otsu para la detección de personas
- Author
-
Carlos Niño, Dinael Guevara-Ibarra, Sergio Alexander Castro Casadiego, Byron Medina Delgado, and Luis Camargo
- Subjects
memoria ,detección de personas ,Pharmaceutical Science ,people detection ,hits ,lcsh:Technology ,aciertos ,memory ,Otsu method ,umbralización binaria ,Pharmacology (medical) ,time ,Matlab ,comparativa ,lcsh:T ,tiempo ,requerimiento de máquina ,método de Otsu ,Complementary and alternative medicine ,lcsh:TA1-2040 ,binary thresholding ,lcsh:Engineering (General). Civil engineering (General) ,machine requirement ,Python - Abstract
In image detection processes where brightness varies between pixels, techniques are required that yield optimal, adaptive threshold values under such variations. Therefore, a comparison is made between the binary thresholding technique and the adaptive Otsu method, on videos with dynamic and static backgrounds, weighing the algorithm's response time, the memory used, the central processing unit requirement, and the detection hits, in the Python and M (Matlab) languages. The Python implementations give better results in terms of response time and memory space, while Matlab shows the lowest percentage of machine requirement. The Otsu method also improves the hit percentage by 12.89% and 11.3% for videos with dynamic and static backgrounds respectively, with respect to the binary thresholding technique.
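The core of Otsu's method, as compared above, is scanning all possible thresholds for the one that maximizes between-class variance of the grey-level histogram. A minimal NumPy sketch (not the authors' Matlab/Python code; the baseline fixed threshold is shown for contrast):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binary_threshold(gray, t=128):
    """Fixed (non-adaptive) binary thresholding, the baseline technique."""
    return (gray >= t).astype(np.uint8)
```

Unlike the fixed cutoff, the Otsu value adapts to each frame's histogram, which is why it copes better with luminosity variation between pixels.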
- Published
- 2021
- Full Text
- View/download PDF
3. LiDAL: Light Detection and Localization
- Author
-
Safwan Hafeedh Younus, Ahmed Taha Hussein, Jaafar M. H. Elmirghani, Aubida A. Al-Hameed, and Mohammed Thamer Alresheed
- Subjects
Signal Processing (eess.SP) ,General Computer Science ,Infrared ,Computer science ,Photodetector ,Visible light communication ,people detection ,02 engineering and technology ,01 natural sciences ,localization ,law.invention ,020210 optoelectronics & photonics ,law ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Computer vision ,Electrical Engineering and Systems Science - Signal Processing ,Electrical and Electronic Engineering ,Radar ,Background subtraction ,business.industry ,010401 analytical chemistry ,Transmitter ,Detector ,General Engineering ,Ranging ,Reflectivity ,0104 chemical sciences ,VLC systems ,optimum receivers ,counting ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Artificial intelligence ,business ,Optical indoor localization ,lcsh:TK1-9971 ,Visible spectrum - Abstract
In this paper, we present the first indoor light-based detection and localization system that builds on concepts from radio detection and ranging (radar), making use of the expected growth in the use and adoption of visible light communication (VLC), which can provide the infrastructure for our Light Detection and Localization (LiDAL) system. Our system enables active detection, counting, and localization of people, in addition to being fully compatible with existing VLC systems. In order to detect humans (targets), LiDAL uses the visible light spectrum: it sends pulses using a VLC transmitter and analyses the reflected signal collected by a photodetector receiver. Although we examine the use of the visible spectrum here, LiDAL can also be used in the infrared and other parts of the light spectrum. We introduce LiDAL with different transmitter-receiver configurations and optimum and sub-optimum detectors, considering the fluctuation of the received reflected signal from the target in the presence of Gaussian noise. We design an efficient multiple input multiple output (MIMO) LiDAL system with a wide field of view (FOV) single-photodetector receiver, and also a multiple input single output (MISO) LiDAL system with an imaging receiver to eliminate the ambiguity in target detection and localization. We develop models for the human body and its reflections and consider the impact of the color and texture of the clothing as well as the impact of target mobility. A number of detection and localization methods are developed for our LiDAL system, including cross-correlation and a background subtraction method. These methods are used to distinguish a mobile target from the ambient reflections due to background obstacles (furniture) in a realistic indoor environment.
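The radar-style ranging step described above can be sketched as a matched filter: cross-correlate the transmitted pulse with the received waveform and take the lag of the strongest echo. The rectangular pulse, the 40-sample delay, and the noise level below are illustrative assumptions, not parameters from the paper:

```python
import numpy as np

def estimate_delay(tx_pulse, rx_signal):
    """Sample delay of the strongest echo, found by cross-correlating
    (matched filtering) the probe pulse with the received signal."""
    corr = np.correlate(rx_signal, tx_pulse, mode="valid")
    return int(np.argmax(corr))

rng = np.random.default_rng(0)
pulse = np.ones(8)                     # hypothetical VLC probe pulse
rx = np.zeros(200)
rx[40:48] += 0.3 * pulse               # attenuated reflection from a target
rx += rng.normal(0.0, 0.01, size=200)  # Gaussian receiver noise
delay = estimate_delay(pulse, rx)      # sample index of the echo
```

The recovered delay maps to a target range once the propagation speed and sampling rate are fixed.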
- Published
- 2019
- Full Text
- View/download PDF
4. Real-time passenger social distance monitoring with video analytics using deep learning in railway station
- Author
-
Iqbal Ahmad Dahlan, Muhammad Bryan Gutomo Putra, Suhono Harso Supangkat, Fadhil Hidayat, Fetty Fitriyanti Lubis, and Faqih Hamami
- Subjects
DeepSORT algorithm ,Control and Optimization ,Social distancing ,Computer Networks and Communications ,Hardware and Architecture ,People detection ,Signal Processing ,COVID-19 ,People tracking ,CCTV surveillance ,Electrical and Electronic Engineering ,Information Systems - Abstract
At the end of December 2019, the world faced a severe problem: a pandemic caused by coronavirus disease (COVID-19). Railway station authorities must therefore have the capability to reduce the transmission risk under pandemic conditions. Public transport such as a railway station plays a vital role in managing the spread of COVID-19, because it is a center of public mass transportation that can be associated with the acquisition of infectious diseases. This paper implements social distance monitoring with a YOLOv4 object detection model for crowd monitoring using standard CCTV cameras, tracking visitors with the deep simple online and real-time tracking (DeepSORT) algorithm. The system was deployed on CCTV surveillance footage in an actual implementation at Bandung railway station, achieving 96.5% accuracy on people tracking when tested in real-time processing, at around 18 FPS on a computer with an Intel(R) Xeon(R) E3-1231 v3 CPU at 3.40 GHz and 6 GB of RAM.
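Once people are detected and tracked, the social distance check itself reduces to thresholding pairwise distances between detections. A simplified sketch in image coordinates (a deployed system would first map pixels to ground-plane metres via camera calibration; the boxes and threshold below are made up):

```python
import numpy as np

def close_pairs(boxes, min_dist):
    """Flag pairs of detections whose bottom-center points (a rough proxy
    for ground position) are closer than min_dist, in pixel units here."""
    pts = np.array([[(x1 + x2) / 2.0, y2] for x1, y1, x2, y2 in boxes])
    pairs = []
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if np.linalg.norm(pts[i] - pts[j]) < min_dist:
                pairs.append((i, j))
    return pairs

# Hypothetical (x1, y1, x2, y2) boxes from a detector such as YOLOv4:
boxes = [(10, 10, 30, 80), (35, 12, 55, 82), (300, 10, 320, 80)]
violations = close_pairs(boxes, min_dist=50)  # indices of too-close pairs
```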
- Published
- 2022
- Full Text
- View/download PDF
5. Incorporating wheelchair users in people detection
- Author
-
Rafael Martin-Nieto, Alvaro Garcia-Martin, José M. Martínez, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Video Processing & Understanding Lab (VPULab)
- Subjects
Healthcare system ,Telecomunicaciones ,Computer Networks and Communications ,Computer science ,Independent living ,People detection ,020207 software engineering ,02 engineering and technology ,Wheelchair ,Wheelchair users ,Assisted living ,Hardware and Architecture ,Human–computer interaction ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Software - Abstract
A wheelchair users detector is presented to extend people detection, providing a more general solution for detecting people in environments such as houses adapted for independent and assisted living, hospitals, healthcare centers, and senior residences. A wheelchair user model is incorporated in a detector whose detections are afterwards combined with those obtained using traditional people detectors (which we define as standing people detectors). We have trained a model for a classical (DPM) and a modern (Faster-RCNN) detection algorithm to compare their performance. Besides the proposed extension to people detection, a dataset of video sequences has been recorded in a real indoor senior residence environment containing wheelchair users and standing people, and it has been released together with the associated ground truth. This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo) and by the Spanish Government FPU grant programme (Ministerio de Educación, Cultura y Deporte).
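Combining a wheelchair-user detector with a standing-people detector amounts to taking the union of both detectors' box sets while suppressing duplicates. A greedy IoU-based sketch (an illustrative fusion rule, not necessarily the one used in the paper):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def combine(standing, wheelchair, iou_thr=0.5):
    """Union of both detectors' (box, score) outputs; where boxes overlap,
    only the higher-scoring detection is kept (greedy NMS)."""
    merged = []
    for det in sorted(standing + wheelchair, key=lambda d: -d[1]):
        if all(iou(det[0], kept[0]) < iou_thr for kept in merged):
            merged.append(det)
    return merged
```

A person in a wheelchair missed by the standing-people model but found by the wheelchair model survives the merge, which is the point of the extension.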
- Published
- 2018
- Full Text
- View/download PDF
6. Laser-Based People Detection and Obstacle Avoidance for a Hospital Transport Robot
- Author
-
Kuisong Zheng, Feng Wu, and Xiaoping Chen
- Subjects
0209 industrial biotechnology ,Computer science ,people detection ,02 engineering and technology ,Workspace ,lcsh:Chemical technology ,Legibility ,01 natural sciences ,Biochemistry ,Article ,Analytical Chemistry ,obstacle avoidance ,020901 industrial engineering & automation ,Sliding window protocol ,Obstacle avoidance ,Humans ,lcsh:TP1-1185 ,Computer vision ,Electrical and Electronic Engineering ,navigation ,Cluster analysis ,Instrumentation ,automotive_engineering ,Service robot ,Artificial neural network ,business.industry ,Lasers ,service robot ,Deep learning ,010401 analytical chemistry ,Robotics ,Hospitals ,Atomic and Molecular Physics, and Optics ,0104 chemical sciences ,Robot ,Neural Networks, Computer ,Artificial intelligence ,business ,Algorithms - Abstract
This paper describes the development of a laser-based people detection and obstacle avoidance algorithm for a differential-drive robot, which is used for transporting materials along a reference path in hospital domains. Detecting humans from laser data is an important functionality for the safety of navigation in the shared workspace with people. Nevertheless, traditional methods normally utilize machine learning techniques on hand-crafted geometrical features extracted from individual clusters. Moreover, the datasets used to train the models are usually small and need to manually label every laser scan, increasing the difficulty and cost of deploying people detection algorithms in new environments. To tackle these problems, (1) we propose a novel deep learning-based method, which uses the deep neural network in a sliding window fashion to effectively classify every single point of a laser scan. (2) To increase the speed of inference without losing performance, we use a jump distance clustering method to decrease the number of points needed to be evaluated. (3) To reduce the workload of labeling data, we also propose an approach to automatically annotate datasets collected in real scenarios. In general, the proposed approach runs in real-time, performs much better than traditional methods, and can be straightforwardly extended to 3D laser data. Secondly, conventional pure reactive obstacle avoidance algorithms can produce inefficient and oscillatory behaviors in dynamic environments, making pedestrians confused and possibly leading to dangerous reactions. To improve the legibility and naturalness of obstacle avoidance in human crowded environments, we introduce a sampling-based local path planner, similar to the method used in autonomous driving cars. The key idea is to avoid obstacles by switching lanes. We also adopt a simple rule to decrease the number of unnecessary deviations from the reference path. 
Experiments carried out in real-world environments confirmed the effectiveness of the proposed algorithms.
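The jump distance clustering step mentioned above exploits the angular ordering of a laser scan: a new cluster starts whenever two consecutive points are farther apart than a fixed jump. A minimal sketch (the 0.3 m jump value is an assumed example, not the paper's setting):

```python
import math

def jump_distance_clusters(points, jump=0.3):
    """Split an ordered 2D laser scan into clusters: a new cluster starts
    whenever consecutive points are farther apart than `jump` (metres)."""
    clusters = [[points[0]]]
    for prev, cur in zip(points, points[1:]):
        if math.dist(prev, cur) > jump:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters
```

The classifier then only needs to score one window per cluster rather than every scan point, which is how the method cuts inference cost.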
- Published
- 2021
- Full Text
- View/download PDF
7. Analysis of pedestrian activity before and during COVID-19 lockdown, using webcam time-lapse from Cracow and machine learning
- Author
-
Robert Szczepanek
- Subjects
Environmental Impacts ,Cracow ,Coronavirus disease 2019 (COVID-19) ,Pedestrian detection ,Data Mining and Machine Learning ,Observation period ,lcsh:Medicine ,02 engineering and technology ,Pedestrian ,Machine learning ,computer.software_genre ,General Biochemistry, Genetics and Molecular Biology ,Database ,03 medical and health sciences ,Public space ,Published Database ,0202 electrical engineering, electronic engineering, information engineering ,Old town ,Spatial and Geographic Information Science ,OpenCV ,Pedestrian counting ,Movement control ,030304 developmental biology ,0303 health sciences ,business.industry ,General Neuroscience ,Data Science ,People detection ,lcsh:R ,COVID-19 ,General Medicine ,Coupled Natural and Human Systems ,YOLOv3 ,Webcam ,Geography ,020201 artificial intelligence & image processing ,Artificial intelligence ,General Agricultural and Biological Sciences ,business ,computer - Abstract
At the turn of February and March 2020, the COVID-19 pandemic reached Europe. Many countries, including Poland, imposed lockdowns as a method of securing social distance between the potentially infected. Stay-at-home orders and movement control within public space affected not only the tourism industry, but also the everyday life of the inhabitants. Hourly time-lapse images from four HD webcams in Cracow (Poland) are used in this study to estimate how pedestrian activity changed during the COVID-19 lockdown. The collected data cover the period from 9 June 2016 to 19 April 2020 and come from various urban zones: one tourist, one residential, and two mixed. In the first stage of the analysis, a state-of-the-art machine learning algorithm (YOLOv3) is used to detect people. Additionally, a non-standard application of the YOLO method is proposed, oriented to images from HD webcams. This approach (YOLOtiled) is less prone to pedestrian detection errors, with the only drawback being a longer computation time. Splitting the HD image into smaller tiles increases the number of detected pedestrians by over 50%. In the second stage, pedestrian activity before and during the COVID-19 lockdown is analyzed for hourly, daily, and weekly averages. Depending on the type of urban zone, the number of pedestrians decreased by between 33% in residential zones and 85% in tourist zones located in the Old Town. The presented method allows more efficient detection and counting of pedestrians from HD time-lapse webcam images than SSD, YOLOv3, and Faster R-CNN. The result of the research is a published database with the detected number of pedestrians from the four-year observation period for four locations in Cracow.
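The YOLOtiled idea of splitting an HD frame into smaller tiles can be sketched as computing overlapping tile offsets and mapping each tile's detections back to full-image coordinates. The tile size and overlap below are illustrative, not the study's values:

```python
def tile_grid(w, h, tile, overlap):
    """Top-left offsets of overlapping square tiles covering a w x h image;
    edge tiles are shifted inward so the whole frame is covered."""
    step = tile - overlap
    xs = list(range(0, max(w - tile, 0) + 1, step))
    ys = list(range(0, max(h - tile, 0) + 1, step))
    if xs[-1] + tile < w:
        xs.append(w - tile)
    if ys[-1] + tile < h:
        ys.append(h - tile)
    return [(x, y) for y in ys for x in xs]

def to_full_image(box, offset):
    """Map a (x1, y1, x2, y2) detection from tile to image coordinates."""
    x1, y1, x2, y2 = box
    ox, oy = offset
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

Running the detector per tile keeps small, distant pedestrians above the network's effective minimum object size, at the cost of more inference passes per frame.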
- Published
- 2020
- Full Text
- View/download PDF
8. CNN Implementation for Semantic Heads Segmentation Using Top-View Depth Data in Crowded Environment
- Author
-
Rocco Pietrini, Daniele Liciotti, Marina Paolanti, Emanuele Frontoni, and Primo Zingaretti
- Subjects
Source code ,business.industry ,Computer science ,media_common.quotation_subject ,People detection ,CNN ,Top-view ,Python (programming language) ,computer.software_genre ,Scripting language ,RGB color model ,Segmentation ,Computer vision ,Artificial intelligence ,Architecture ,business ,F1 score ,computer ,Implementation ,media_common ,computer.programming_language - Abstract
The paper “Convolutional Networks for Semantic Heads Segmentation using Top-View Depth Data in Crowded Environment” [1] introduces an approach to track and detect people under heavy occlusion, based on CNNs for semantic segmentation using top-view RGB-D visual data. The purpose is the design of a novel U-Net architecture, U-Net 3, which has been modified at the end of each layer compared with the previous versions. In order to evaluate this new architecture, a comparison has been made with other networks from the literature used for semantic segmentation. The implementation is in Python using the Keras API with the TensorFlow library. The input data consist of depth frames from Asus Xtion Pro Live OpenNI recordings (.oni). The dataset used for training and testing the networks has been manually labeled and is freely available, as is the source code. Each of the aforementioned networks has a stand-alone Python script for training and testing. A Python script for on-line prediction on OpenNI recordings (.oni) is also provided. Evaluation of the networks has been made with implementations of different metrics (precision, recall, F1 score, Sørensen-Dice coefficient), included in the network scripts.
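The evaluation metrics listed (precision, recall, F1 score, Sørensen-Dice coefficient) can be computed directly from binary masks; note that for a single binary segmentation, F1 and Dice coincide. A NumPy sketch, independent of the authors' scripts:

```python
import numpy as np

def seg_metrics(pred, gt):
    """Pixel-wise precision, recall, F1 and Dice for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # correctly labeled head pixels
    fp = np.logical_and(pred, ~gt).sum()   # spurious head pixels
    fn = np.logical_and(~pred, gt).sum()   # missed head pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    dice = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1, dice
```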
- Published
- 2019
- Full Text
- View/download PDF
9. Artificial Neural Network for LiDAL Systems
- Author
-
Ahmed Taha Hussein, Aubida A. Al-Hameed, Mohammed T. Alresheedi, Jaafar M. H. Elmirghani, and Safwan Hafeedh Younus
- Subjects
Signal Processing (eess.SP) ,General Computer Science ,Light detection ,Artificial neural network ,Computer science ,business.industry ,MIMO ,General Engineering ,Pattern recognition ,people detection ,Signal ,Neural network ,Set (abstract data type) ,VLC systems ,FOS: Electrical engineering, electronic engineering, information engineering ,counting ,General Materials Science ,Artificial intelligence ,Time domain ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Electrical Engineering and Systems Science - Signal Processing ,business ,ANN ,lcsh:TK1-9971 ,optical indoor localization - Abstract
In this paper, we introduce an intelligent light detection and localization (LiDAL) system that uses artificial neural networks (ANNs). The LiDAL systems of interest are the MIMO LiDAL and MISO IMG LiDAL systems. An ANN trained with the LiDAL system of interest is used to distinguish a human (target) from the background obstacles (furniture) in a realistic indoor environment. In LiDAL systems, the received reflected signals in the time domain have different patterns corresponding to the number of targets and their locations in an indoor environment; likewise, the background obstacles (furniture) appear as a set of time-domain patterns when the transmitted optical signals are reflected from objects. Hence, a trained neural network that can classify and recognize the received signal patterns can distinguish the targets from the background obstacles in a realistic environment. The LiDAL systems with ANN are evaluated in a realistic indoor environment through computer simulation.
- Published
- 2019
- Full Text
- View/download PDF
10. Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information
- Author
-
José M. Martínez, Alvaro Garcia-Martin, Juan C. SanMiguel, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Tratamiento e Interpretación de Vídeo (ING EPS-06)
- Subjects
Computer science ,Physics::Instrumentation and Detectors ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,coarse-to-fine adaptation ,02 engineering and technology ,Biosensing Techniques ,people detection ,lcsh:Chemical technology ,Biochemistry ,Detector adaptation ,Article ,Analytical Chemistry ,pair-wise correlation ,Pair-wise correlation ,0202 electrical engineering, electronic engineering, information engineering ,Image Processing, Computer-Assisted ,Entropy (information theory) ,Humans ,lcsh:TP1-1185 ,Electrical and Electronic Engineering ,Instrumentation ,Ground truth ,Telecomunicaciones ,Training set ,business.industry ,020208 electrical & electronic engineering ,Detector ,People detection ,thresholds ,Video sequence ,Pattern recognition ,Mutual information ,Atomic and Molecular Physics, and Optics ,Coarse to fine ,detector adaptation ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,entropy ,Algorithms - Abstract
Applying people detectors to unseen data is challenging, since pattern distributions, such as viewpoints, motion, poses, backgrounds, occlusions and people sizes, may differ significantly from those of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt people detectors frame by frame during runtime classification, without requiring any additional manually labeled ground truth apart from the offline training of the detection model. The adaptation makes use of the mutual information of multiple detectors, i.e., similarities and dissimilarities between detectors estimated by pair-wise correlating their outputs. Globally, the proposed adaptation discriminates between relevant instants in a video sequence, i.e., it identifies the representative frames for an adaptation of the system. Locally, it identifies the best configuration (i.e., detection threshold) of each detector under analysis, maximizing the mutual information to obtain the detection threshold of each detector. The proposed coarse-to-fine approach does not require training the detectors for each new scenario and uses standard people detector outputs, i.e., bounding boxes. The experimental results demonstrate that the proposed approach outperforms state-of-the-art detectors whose optimal threshold configurations are determined and fixed in advance from offline training data. This work has been partially supported by the Spanish government under the project TEC2014-53176-R (HAVideo).
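The local adaptation step, choosing a detector's threshold by maximizing mutual information with another detector's output, can be sketched on binarized per-window decisions. The scores and candidate thresholds below are toy values, and this is a simplification of the paper's pair-wise correlation scheme:

```python
import numpy as np

def mutual_information(a, b):
    """MI (in nats) between two binary decision sequences."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            pab = np.mean((a == va) & (b == vb))   # joint probability
            pa, pb = np.mean(a == va), np.mean(b == vb)
            if pab > 0:
                mi += pab * np.log(pab / (pa * pb))
    return mi

def best_threshold(scores_a, decisions_b, candidates):
    """Pick the threshold on detector A's scores whose binarized output
    shares the most information with detector B's decisions."""
    return max(candidates,
               key=lambda t: mutual_information(scores_a >= t, decisions_b))
```

A degenerate threshold (everything positive or everything negative) yields zero mutual information, so the maximization naturally avoids it.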
- Published
- 2018
11. Enhancing Multi-Camera People Detection by Online Automatic Parametrization Using Detection Transfer and Self-Correlation Maximization
- Author
-
José M. Martínez, Juan C. SanMiguel, Rafael Martin-Nieto, Alvaro Garcia-Martin, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Video Processing & Understanding Lab
- Subjects
Computer science ,02 engineering and technology ,Multi camera ,lcsh:Chemical technology ,Biochemistry ,Article ,Analytical Chemistry ,Transfer (computing) ,0502 economics and business ,Self-correlation maximization ,0202 electrical engineering, electronic engineering, information engineering ,lcsh:TP1-1185 ,Computer vision ,Electrical and Electronic Engineering ,Instrumentation ,050210 logistics & transportation ,Ground truth ,Telecomunicaciones ,business.industry ,05 social sciences ,Detector ,People detection ,Automatic parametrization ,Maximization ,Atomic and Molecular Physics, and Optics ,Multi-camera ,Task (computing) ,Self correlation ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Parametrization - Abstract
Finding optimal parametrizations for people detectors is a complicated task due to the large number of parameters and the high variability of application scenarios. In this paper, we propose a framework to adapt and improve any detector automatically in multi-camera scenarios where people are observed from various viewpoints. By accurately transferring detector results between camera viewpoints and by self-correlating these transferred results, the best configuration (in this paper, the detection threshold) for each detector-viewpoint pair is identified online, without requiring any additional manually labeled ground truth apart from the offline training of the detection model. Such a configuration consists of establishing the confidence detection threshold present in every people detector, a critical parameter affecting detection performance. The experimental results demonstrate that the proposed framework improves the performance of four different state-of-the-art detectors (DPM, ACF, Faster R-CNN, and YOLO9000) whose Optimal Fixed Thresholds (OFTs) have been determined and fixed during training time using standard datasets. Keywords: self-correlation maximization; multi-camera; people detection; automatic parametrization. This work has been partially supported by the Spanish government under the project TEC2014-53176-R.
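Transferring detector results between camera viewpoints is commonly done through a ground-plane homography; the sketch below maps a point (e.g. the bottom-center of a detection box) from one view into another. The matrix H is a made-up example, in practice it would come from multi-camera calibration, and the paper does not necessarily use this exact mechanism:

```python
import numpy as np

def transfer_point(H, pt):
    """Map an image point through a 3x3 ground-plane homography H
    (camera A pixel coordinates -> camera B pixel coordinates)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Hypothetical homography: scale by 2 plus a translation.
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0, 1.0]])
pt_b = transfer_point(H, (5.0, 5.0))
```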
- Published
- 2018
12. People Detection and Pose Classification Inside a Moving Train Using Computer Vision
- Author
-
Sergio A. Velastin, Diego A. Gómez-Lira, European Commission, and Ministerio de Economía y Competitividad (España)
- Subjects
Informática ,0209 industrial biotechnology ,Computer science ,business.industry ,People monitoring ,People detection ,Posture classification ,02 engineering and technology ,Emergency situations ,On-board surveillance ,Support vector machine ,020901 industrial engineering & automation ,Histogram of oriented gradients ,Boss ,Public transport ,Machine learning ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,False positive rate ,Artificial intelligence ,Performance improvement ,business ,Classifier (UML) - Abstract
This paper has been presented at the 5th International Visual Informatics Conference (IVIC 2017) and is also part of the Image Processing, Computer Vision, Pattern Recognition, and Graphics book sub-series (LNIP, volume 10645). The use of surveillance video cameras in public transport is increasingly regarded as a solution to control vandalism and emergency situations. The widespread use of cameras brings in the problem of managing high volumes of data, resulting in pressure on people and resources. We illustrate a possible step towards automating the monitoring task in the context of a moving train (where popular background removal algorithms will struggle with rapidly changing illumination). We looked at the detection of people in three possible postures: sat down (on a train seat), standing, and sitting (halfway between sat down and standing). We then use the popular Histogram of Oriented Gradients (HOG) descriptor to train Support Vector Machines to detect people in any of the predefined postures. As a case study, we use the public BOSS dataset. We show different ways of training and combining the classifiers, obtaining a sensitivity improvement of about 12% when using a combination of three SVM classifiers instead of a global (all-classes) classifier, at the expense of an increase of 6% in false positive rate. We believe this is the first set of public results on people detection using the BOSS dataset, so future researchers can use our results as a baseline to improve upon. The work described here was carried out as part of the OBSERVE project funded by the Fondecyt Regular Program of Conicyt (Chilean Research Council for Science and Technology) under grant no. 1140209. S.A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, the Ministerio de Economía y Competitividad (COFUND2013-51509), and Banco Santander.
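The combination of three per-posture SVM classifiers can be sketched as a one-vs-rest vote over decision scores, with a rejection threshold for windows containing no person. The class names, scores, and rejection value below are illustrative, not the paper's actual configuration:

```python
def classify_posture(scores, reject=0.0):
    """Combine per-posture SVM decision scores (e.g. from three one-vs-rest
    classifiers for sat_down / sitting / standing): return the best-scoring
    class, or None when every score falls below the rejection threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= reject else None
```

Compared with a single all-classes classifier, per-class decision boundaries can be tuned independently, which is consistent with the sensitivity gain reported above.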
- Published
- 2017
- Full Text
- View/download PDF
13. Detection and classification of the behavior of people in an intelligent building by camera
- Author
-
Henni Sid Ahmed, Jean Caelen, and Belbachir Mohamed Faouzi
- Subjects
intelligent building ,Multimedia ,lcsh:T ,Computer science ,business.industry ,people detection ,computer.software_genre ,lcsh:Technology ,classification ,Control and Systems Engineering ,video analysis ,lcsh:Technology (General) ,lcsh:T1-995 ,Electrical and Electronic Engineering ,business ,computer ,Building automation - Abstract
An intelligent building is an environment that contains a number of sensors and cameras, which aim to provide information on the various actions taken by individuals, and on their status, to be processed by a system for the detection and classification of behaviors. This detection and classification system uses this information as input to provide maximum comfort to the people in the building with optimal energy consumption; for example, if a person exercises in a room, the system will lower the heating. Our goal is to develop a robust and reliable system composed of two fixed cameras in every room of the intelligent building, connected to a computer for the acquisition of video sequences. Using these video sequences as inputs, we use RGB color histograms and LBP textures to represent the different images of the video sequences, and SVMlight (a support vector machine implementation) as the programming tool for the detection and classification of the behavior of people in this intelligent building, in order to give maximum comfort with optimized energy consumption. The classification is performed with k = 1 and k = 11; in our case, we built 11 models in the learning phase using different kernels in order to choose the best models giving the highest classification rate. Finally, in the classification phase, to classify a behavior we compare it to the 11 behaviors, that is to say, we perform 11 classifications and take the behavior with the highest classification rate. This work has been carried out at the University Joseph Fourier in Grenoble, specifically at LIG (Grenoble Informatics Laboratory) in the MULTICOM team, and at the University of Oran USTO, Algeria. Our contribution in this field is the design and implementation of a robust and accurate system that detects and classifies 11 behaviors from cameras in an intelligent building under varying illumination; that is, whatever the lighting, our system must be capable of detecting and classifying the behaviors.
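The LBP texture features mentioned above encode each pixel as an 8-bit pattern of sign comparisons with its 3x3 neighbourhood. A basic NumPy sketch of the plain (non-uniform, non-rotation-invariant) operator:

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP: each interior pixel becomes an 8-bit code built from
    sign comparisons with its eight neighbours."""
    g = np.asarray(gray, dtype=int)
    c = g[1:-1, 1:-1]                        # interior (center) pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(int) << bit)   # set bit if neighbour >= center
    return code
```

A histogram of these codes over a region is the texture descriptor typically fed to the SVM alongside the RGB color histograms.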
- Published
- 2013
- Full Text
- View/download PDF
14. Improving the reliability of 3D people tracking system by means of deep-learning
- Author
-
Matteo Boschini, Matteo Poggi, Stefano Mattoccia, Boschini, Matteo, Poggi, Matteo, and Mattoccia, Stefano
- Subjects
Computer science ,Reliability (computer networking) ,people detection ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Convolutional neural network ,Media Technology ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,0105 earth and related environmental sciences ,business.industry ,Deep learning ,deep learning ,Tracking system ,tracking ,stereo vision ,Computer Graphics and Computer-Aided Design ,Stereopsis ,Analytics ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Transfer of learning ,computer ,Stereo camera ,3D - Abstract
People tracking is a crucial task in most computer vision applications aimed at analyzing specific behaviors in the sensed area. Practical applications include vision analytics, people counting, etc. In order to properly follow the actions of a single subject, a people tracking framework needs to robustly distinguish it from the rest of the surrounding environment, thus allowing proper handling of changing positions, occlusions, and so on. The recent widespread diffusion of deep learning techniques across almost every kind of computer vision application provides a powerful methodology to address recognition. On the other hand, a large amount of data is required to train state-of-the-art Convolutional Neural Networks (CNNs), a problem solved, when possible, by means of transfer learning. In this paper, we propose a novel dataset of nearly 26 thousand samples acquired with a custom stereo camera providing depth according to a fast and accurate stereo algorithm. The dataset includes sequences acquired in different environments with more than 20 different people moving across the sensed area. Once the 26 K images and depth maps of the dataset are labeled, we train a head detection module based on a state-of-the-art deep network on a portion of the dataset and validate it on a different sequence. Finally, we include the head detection module within an existing 3D tracking framework, showing that the proposed approach notably improves people detection and tracking accuracy.
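Plugging a head detection module into a tracking framework requires associating new detections with existing tracks; a common baseline is greedy nearest-neighbour matching with a gating distance. A sketch under that assumption (not necessarily the framework's actual data association):

```python
def associate(tracks, detections, max_dist):
    """Greedy nearest-neighbour matching of new head detections (x, y) to
    existing track positions; returns {track_index: detection_index}.
    Pairs farther apart than max_dist (the gate) are never matched."""
    pairs = sorted(
        (abs(t[0] - d[0]) + abs(t[1] - d[1]), ti, di)  # Manhattan distance
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    matched, used_t, used_d = {}, set(), set()
    for dist, ti, di in pairs:
        if dist <= max_dist and ti not in used_t and di not in used_d:
            matched[ti] = di
            used_t.add(ti)
            used_d.add(di)
    return matched
```

Unmatched detections would spawn new tracks and unmatched tracks would age out, which is the usual track-management loop around this step.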
- Published
- 2016
- Full Text
- View/download PDF
15. Using a Deep Learning Model on Images to Obtain a 2D Laser People Detector for a Mobile Robot
- Author
-
Miguel García-Silvente and Eugenio Aguirre
- Subjects
General Computer Science ,business.industry ,Computer science ,Deep learning ,People detection ,Detector ,2D laser ,Mobile robot ,QA75.5-76.95 ,Laser ,lcsh:QA75.5-76.95 ,law.invention ,Computational Mathematics ,Government (linguistics) ,Work (electrical) ,Human–computer interaction ,law ,Electronic computers. Computer science ,Machine learning ,Mobile robots ,lcsh:Electronic computers. Computer science ,Artificial intelligence ,business - Abstract
Recent improvements in deep learning techniques applied to images allow the detection of people with a high success rate. However, other types of sensors, such as laser rangefinders, remain useful due to their wide field of view and their ability to operate in different environments and lighting conditions. In this work we use deep learning to detect people in images taken by a mobile robot. The masks of the people in the images are used to automatically label a set of samples formed by 2D laser range data, which allows us to detect the legs of people present in the scene. The samples are geometric characteristics of the clusters built from the laser data. Machine learning algorithms are then used to learn a classifier capable of detecting people from 2D laser range data alone. Our people detector is compared to a state-of-the-art classifier, and our proposal achieves a higher F1 value on the test set using an unbalanced dataset. To improve accuracy, the final classifier has been generated from a balanced training set. This final classifier has also been evaluated using a test set on which we obtained very high accuracy values in each class. The contribution of this work is twofold. On the one hand, our proposal performs automatic labeling of the samples so that the dataset can be collected under real operating conditions. On the other hand, the robot can detect people in a wider field of view than with a camera alone, which helps build more robust behaviors., This work has been supported by the Spanish Government TIN2016-76515-R Grant, supported with Feder funds.
- Published
- 2019
- Full Text
- View/download PDF
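The pipeline in entry 15 — cluster the 2D laser scan by range discontinuities, extract geometric features per cluster, then classify leg-like clusters — can be sketched as follows. This is a minimal illustration, not the authors' code: the jump threshold, the feature set, and the rule standing in for the learned classifier are all assumptions.

```python
import math

def cluster_scan(ranges, jump=0.15):
    """Split consecutive laser beams into clusters wherever the range
    jumps by more than `jump` metres (an assumed discontinuity threshold)."""
    clusters, current = [], [0]
    for i in range(1, len(ranges)):
        if abs(ranges[i] - ranges[i - 1]) > jump:
            clusters.append(current)
            current = []
        current.append(i)
    clusters.append(current)
    return clusters

def cluster_features(ranges, cluster, angle_step):
    """Geometric descriptors of one cluster: point count, chord width
    between its end points (metres), and mean range."""
    pts = [(ranges[i] * math.cos(i * angle_step),
            ranges[i] * math.sin(i * angle_step)) for i in cluster]
    width = math.hypot(pts[-1][0] - pts[0][0], pts[-1][1] - pts[0][1])
    mean_r = sum(ranges[i] for i in cluster) / len(cluster)
    return {"n_points": len(cluster), "width": width, "mean_range": mean_r}

def looks_like_leg(feat, min_w=0.05, max_w=0.25):
    """Toy stand-in for the learned classifier: legs are narrow clusters."""
    return min_w <= feat["width"] <= max_w and feat["n_points"] >= 3

# a flat wall at 3 m with a leg-like object at 1 m over five beams
ranges = [3.0] * 20 + [1.0] * 5 + [3.0] * 20
clusters = cluster_scan(ranges)
flags = [looks_like_leg(cluster_features(ranges, c, 0.02)) for c in clusters]
```

On this synthetic scan only the middle cluster is flagged; in the paper, the geometric features feed a classifier trained from image-derived labels rather than a hand-set rule.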
16. Real-time people counting using blob descriptor
- Author
-
Satoshi Yoshinaga, Atsushi Shimada, and Rin-ichiro Taniguchi
- Subjects
Background subtraction ,Artificial neural network ,Computer science ,business.industry ,People detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Density estimation ,Object detection ,Set (abstract data type) ,Visual surveillance ,Simple (abstract algebra) ,General Materials Science ,Computer vision ,Artificial intelligence ,People counting ,business - Abstract
We propose a system for counting the number of pedestrians in real time. This system estimates how many pedestrians there are and where they are in video sequences, using the following procedure. First, candidate regions are segmented into blobs according to background subtraction. Second, a set of features is extracted from each blob and a neural network estimates the number of pedestrians corresponding to each set of features. To realize real-time processing, we use only simple and valid features, together with adaptive background modeling using Parzen density estimation, which realizes fast and accurate object detection in the input images. We also validate the effectiveness of the proposed system through several experiments.
- Published
- 2010
- Full Text
- View/download PDF
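The counting procedure of entry 16 (background subtraction, blob segmentation, per-blob count estimation) can be sketched in a few lines. A simple area ratio stands in for the paper's neural-network regressor, and the subtraction threshold and per-person area are assumed values.

```python
def subtract_background(frame, background, thresh=30):
    """Binary foreground mask: pixels differing from the background model."""
    h, w = len(frame), len(frame[0])
    return [[1 if abs(frame[y][x] - background[y][x]) > thresh else 0
             for x in range(w)] for y in range(h)]

def label_blobs(mask):
    """4-connected components of the foreground mask via iterative flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, blob = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    blob.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                blobs.append(blob)
    return blobs

def count_people(blobs, person_area=4):
    """Stand-in for the neural-network estimator: people per blob ~ area ratio."""
    return sum(max(1, round(len(b) / person_area)) for b in blobs)
```

In the actual system the per-blob features go to a trained network, which handles blobs covering several merged pedestrians better than a fixed area ratio.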
17. SHAPE-BASED INDIVIDUAL/GROUP DETECTION FOR SPORT VIDEOS CATEGORIZATION
- Author
-
Michèle Rombaut, Denis Pellerin, Costas Panagiotakis, Georgios Tziritas, Emmanuel Ramasso, Computer Science Department [Crete] (CSD-UOC), School of Sciences and Engineering [Crete] (SSE-UOC), University of Crete [Heraklion] (UOC), GIPSA - Géométrie, Perception, Images, Geste (GIPSA-GPIG), Département Images et Signal (GIPSA-DIS), Grenoble Images Parole Signal Automatique (GIPSA-lab), Université Stendhal - Grenoble 3-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS), and SIMILAR European Network of Excellence, Greek PENED 2003 project
- Subjects
Team sport ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Stability (learning theory) ,02 engineering and technology ,Transferable belief model ,Motion (physics) ,Activity recognition ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Artificial Intelligence ,video analysis ,team activity recognition ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,business.industry ,People detection ,Process (computing) ,020207 software engineering ,Categorization ,transferable belief model ,people counting ,Jump ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Software - Abstract
International audience; We present a shape-based method for automatic people detection and counting without any assumption or knowledge of camera motion. The proposed method is applied to athletic videos in order to classify them into videos of individual and team sports. Moreover, in the case of team (multi-agent) sports, we propose a shape-deformation-based method for running/hurdling discrimination (activity recognition). Robust, adaptive and independent from the camera motion, the proposed features are combined within the Transferable Belief Model (TBM) framework, providing a two-level (frame and shot) video categorization. The TBM allows us to take into account the imprecision, uncertainty and conflict inherent to the features in the fusion process. We have tested the proposed scheme on a wide variety of athletic videos, such as pole vault, high jump, triple jump, hurdling, running, etc. The experimental results of 97% individual/team sport categorization accuracy, using a dataset of 252 real videos of athletic meetings acquired by moving cameras under varying view angles, indicate the stability and the good performance of the proposed scheme.
- Published
- 2008
- Full Text
- View/download PDF
18. Posture estimation for improved photogrammetric localization of pedestrians in monocular infrared imagery
- Author
-
Toby P. Breckon and Mikolaj E. Kundegorski
- Subjects
Sensor networks ,Temporal fusion ,Passive target positioning ,Intelligent target reporting ,Thermal imaging ,Monocular ,business.industry ,Pedestrian detection ,People detection ,Thermal target tracking ,Support vector machine ,Geography ,Photogrammetry ,Feature (computer vision) ,Position (vector) ,Histogram ,Range (statistics) ,3D pedestrian localization ,Computer vision ,Artificial intelligence ,business ,Temporal filtering - Abstract
Target tracking complexity within conventional video imagery can be fundamentally attributed to the ambiguity associated with the actual 3D scene position of a given tracked object in relation to its observed position in 2D image space. Recent work, within thermal-band infrared imagery, has tackled this challenge head-on by returning to classical photogrammetry as a means of recovering the true 3D position of pedestrian targets. A key limitation in such approaches is the assumption of posture – that the observed pedestrian is at full-height stance within the scene. Whilst prior work has shown the effects of statistical height variation to be negligible, variations in the posture of the target may still pose a significant source of potential error. Here we present a method that addresses this issue via the use of Support Vector Machine (SVM) regression based pedestrian posture estimation operating on Histogram of Oriented Gradients (HOG) feature descriptors. Within an existing tracking framework, we demonstrate improved target localization that is independent of variations in target posture (i.e. behaviour) and within the statistical error bounds of prior work, for pedestrian posture heights varying from 0.4-2.4 m over a distance-to-target range of 7-30 m.
- Published
- 2015
- Full Text
- View/download PDF
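The photogrammetric relation underlying entries 18 and 21 is the pinhole model Z = f·H/h: depth follows from the assumed real-world height H and the observed image height h. The numbers below are illustrative, but they show why a wrong full-height posture assumption inflates the depth estimate.

```python
def depth_from_height(f_px, real_height_m, bbox_height_px):
    """Pinhole photogrammetry: depth Z = f * H / h (metres)."""
    return f_px * real_height_m / bbox_height_px

f = 800.0        # assumed focal length, in pixels
h_obs = 100.0    # observed bounding-box height, in pixels
z_full = depth_from_height(f, 1.75, h_obs)     # full-height stance assumed
z_posture = depth_from_height(f, 1.0, h_obs)   # posture-corrected height (crouching)
```

With the same 100-pixel observation, assuming full height places a crouching pedestrian at 14 m instead of 8 m; the SVM posture regressor of entry 18 supplies the corrected H.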
19. Detección y conteo de personas, a partir de mapas de profundidad cenitales capturados con cámaras TOF
- Author
-
García Jiménez, Raquel, Losada Gutiérrez, Cristina, Luna Vázquez, Carlos Andrés, and Universidad de Alcalá. Escuela Politécnica Superior
- Subjects
PCA ,Telecomunicaciones ,Time-of-flight cameras (ToF cameras) ,Cámaras de tiempo de vuelo ,Detección de personas ,People detection ,Telecommunication - Abstract
The aim of this project is the detection and counting of people, using depth images obtained by a time-of-flight (ToF) sensor located in a zenithal position in an indoor environment. To achieve this, several works in this area have been studied and a solution has been proposed. It is based on the extraction of features from the surface of the person as seen from the sensor, and their classification using principal component analysis (PCA). The developed algorithm has been tested on a database recorded and labelled for this purpose, obtaining a success rate of around 95%., Bachelor's Degree in Communications Electronics Engineering
- Published
- 2015
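The PCA step used in entry 19 to classify surface features can be sketched with numpy on synthetic data. The real features come from zenithal depth maps; everything below (data, dimensions) is an illustrative stand-in.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and top-k principal axes of row-vector data X."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)   # covariance of centred samples
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]       # eigenvalues in decreasing order
    return mu, vecs[:, order[:k]]

def pca_project(X, mu, axes):
    """Project samples onto the retained principal axes."""
    return (X - mu) @ axes

rng = np.random.default_rng(0)
# synthetic 2-D "surface features", stretched along one direction
X = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])
mu, axes = pca_fit(X, 1)
Z = pca_project(X, mu, axes)
```

The projected coordinates Z retain almost all of the variance of the dominant direction, which is what makes a subsequent nearest-class comparison in PCA space workable.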
20. Real time people detection combining appearance and depth image spaces using boosted random ferns
- Author
-
Alberto Sanfeliu, Victor Vaquero, Michael Villamizar, Comisión Interministerial de Ciencia y Tecnología, CICYT (España), European Commission, Ministerio de Economía y Competitividad (España), Institut de Robòtica i Informàtica Industrial, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, and Universitat Politècnica de Catalunya. VIS - Visió Artificial i Sistemes Intel·ligents
- Subjects
Computer science ,Feature vector ,Feature extraction ,02 engineering and technology ,Discriminative model ,Boosted random ferns ,0502 economics and business ,11. Sustainability ,0202 electrical engineering, electronic engineering, information engineering ,Learning ,Computer vision ,Cybernetics::Artificial intelligence::Learning (artificial intelligence) [Classificació INSPEC] ,050210 logistics & transportation ,business.industry ,feature extraction ,05 social sciences ,Detector ,People detection ,Pattern recognition ,object detection ,Object detection ,RGBD ,020201 artificial intelligence & image processing ,Artificial intelligence ,Depth perception ,business ,Informàtica::Robòtica [Àrees temàtiques de la UPC] ,Classifier (UML) - Abstract
Paper presented at the 2nd Iberian Robotics Conference (ROBOT-2015)., This paper presents a robust and real-time method for people detection in urban and crowded environments. Unlike other conventional methods, which either focus on single features or compute multiple and independent classifiers specialized in a particular feature space, the proposed approach creates a synergic combination of appearance and depth cues in a unique classifier. The core of our method is a Boosted Random Ferns classifier that automatically selects the most discriminative local binary features for both the appearance and depth image spaces. Based on this classifier, a fast and robust people detector which maintains high detection rates in spite of environmental changes is created. The proposed method has been validated on a challenging RGB-D database of people in urban scenarios and has shown that it outperforms state-of-the-art approaches in spite of the difficult environmental conditions. As a result, this method is of special interest for real-time robotic applications where people detection is a key matter, such as human-robot interaction or safe navigation of mobile robots., This work has been partially funded by the EU project CargoANTs FP7-SST-2013-605598 and the Spanish CICYT project DPI2013-42458-P.
- Published
- 2015
21. A photogrammetric approach for real-time 3D localization and tracking of pedestrians in monocular infrared imagery
- Author
-
Toby P. Breckon, Mikolaj E. Kundegorski, Burgess, Douglas, Owen, Gari, Rana, Harbinder, Zamboni, Roberto, Kajzar, François, and Szep, Attila A.
- Subjects
Sensor networks ,Temporal fusion ,Passive target positioning ,Monocular ,Intelligent target reporting ,Thermal imaging ,business.industry ,Pedestrian detection ,People detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Thermal target tracking ,Context (language use) ,Kalman filter ,Photogrammetry ,Geography ,Position (vector) ,Global Positioning System ,Computer vision ,3D pedestrian localization ,Instrumentation (computer programming) ,Artificial intelligence ,business ,Temporal filtering - Abstract
Target tracking within conventional video imagery poses a significant challenge that is increasingly being addressed via complex algorithmic solutions. The complexity of this problem can be fundamentally attributed to the ambiguity associated with the actual 3D scene position of a given tracked object in relation to its observed position in 2D image space. We propose an approach that challenges the current trend in complex tracking solutions by addressing this fundamental ambiguity head-on. In contrast to prior work in the field, we leverage the key advantages of thermal-band infrared (IR) imagery for pedestrian localization to show that the robust localization and foreground target separation afforded by such imagery facilitates accurate 3D position estimation to within the error bounds of conventional Global Positioning System (GPS) positioning. This work investigates the accuracy of classical photogrammetry, within the context of current target detection and classification techniques, as a means of recovering the true 3D position of pedestrian targets within the scene. Based on photogrammetric estimation of target position, we then illustrate the efficiency of regular Kalman filter based tracking operating on actual 3D pedestrian scene trajectories. We present both a statistical and an experimental analysis of the associated errors of this approach, in addition to real-time 3D pedestrian tracking using monocular infrared (IR) imagery from a thermal-band camera.
- Published
- 2014
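The "regular Kalman filter based tracking" of entry 21 reduces, per axis, to the standard constant-velocity predict/update loop. The sketch below is a 1-D illustration with assumed noise parameters, not the authors' implementation.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """1-D constant-velocity Kalman filter over position measurements."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # assumed process noise
    R = np.array([[r]])                     # assumed measurement noise
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    estimates = []
    for z in measurements:
        x = F @ x                           # predict
        P = F @ P @ F.T + Q
        y = np.array([[z]]) - H @ x         # innovation
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ y                       # update
        P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates
```

In the paper the filter runs on the photogrammetrically estimated 3D trajectories, i.e. one such loop per spatial coordinate.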
22. A multi-configuration part-based person detector
- Author
-
Thomas Sikora, Alvaro Garcia-Martin, Ruben Heras Evangelio, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Tratamiento e Interpretación de Vídeo (ING EPS-006)
- Subjects
Scheme (programming language) ,Telecomunicaciones ,business.industry ,Computer science ,Detector ,02 engineering and technology ,010501 environmental sciences ,Machine learning ,computer.software_genre ,01 natural sciences ,Task (project management) ,Multi-configuration Body Parts ,People Detection ,0202 electrical engineering, electronic engineering, information engineering ,Part-based Detector ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,State (computer science) ,business ,computer ,0105 earth and related environmental sciences ,computer.programming_language - Abstract
Proceedings of the Special Session on Multimodal Security and Surveillance Analytics 2014, held during the International Conference on Signal Processing and Multimedia Applications (SIGMAP 2014) in Vienna, People detection is a task that has generated great interest in the computer vision community, and especially in the surveillance community. One of the main problems of this task in crowded scenarios is the high number of occlusions arising from persons appearing in groups. In this paper, we address this problem by combining individual body part detectors in a statistically driven way in order to be able to detect persons even in case of failure of any of the body part detections, i.e., we propose a generic scheme to deal with partial occlusions. We demonstrate the validity of our approach and compare it with other state-of-the-art approaches on several public datasets. In our experiments we consider sequences with different complexities in terms of occupation, and therefore with different numbers of people present in the scene, in order to highlight the benefits and difficulties of the approaches considered for evaluation. The results show that our approach improves on the results provided by state-of-the-art approaches, especially in the case of crowded scenes, This work has been done while visiting the Communication Systems Group at the Technische Universität Berlin (Germany) under the supervision of Prof. Dr.-Ing. Thomas Sikora. This work has been partially supported by the Universidad Autónoma de Madrid (“Programa propio de ayudas para estancias breves en España y extranjero para Personal Docente e Investigador en Formación de la UAM”), by the Spanish Government (TEC2011-25995 EventVideo) and by the European Community’s FP7 under grant agreement number 261776 (MOSAIC).
- Published
- 2014
23. Context-adaptive multimodal wireless sensor network for energy-efficient gas monitoring
- Author
-
Vana Jelicic, Michele Magno, Davide Brunelli, Giacomo Paci, and Luca Benini
- Subjects
Engineering ,Brooks–Iyengar algorithm ,Energy management ,Real-time computing ,02 engineering and technology ,people detection ,7. Clean energy ,01 natural sciences ,gas sensor ,0202 electrical engineering, electronic engineering, information engineering ,Mobile wireless sensor network ,ENERGY MANAGEMENT ,Electrical and Electronic Engineering ,Instrumentation ,business.industry ,Node (networking) ,010401 analytical chemistry ,020206 networking & telecommunications ,WIRELESS SENSOR NETWORKS ,0104 chemical sciences ,Key distribution in wireless sensor networks ,Embedded system ,Sensor node ,business ,Wireless sensor network ,energy management ,metal oxide semiconductor ,wireless sensor network ,Efficient energy use - Abstract
We present a wireless sensor network (WSN) for monitoring indoor air quality, which is crucial for people’s comfort, health, and safety because they spend a large percentage of their time in indoor environments. A major concern in such networks is energy efficiency, because gas sensors are power-hungry and the sensor node must operate unattended for several years on a battery power supply. A system with aggressive energy management at the sensor level, node level, and network level is presented. The node is designed with very low sleep current consumption (only 8 μA), and it contains a metal oxide semiconductor gas sensor and a pyroelectric infrared (PIR) sensor. Furthermore, the network is multimodal: it exploits information from auxiliary sensors, such as PIR sensors reporting the presence of people, and from neighboring nodes reporting gas concentration, to modify the behavior of the node and the measuring frequency of the gas concentration. In this way, we reduce the nodes’ activity and energy requirements while simultaneously providing a reliable service. To evaluate our approach and the benefits of the context-aware adaptive sampling, we simulate an application scenario which demonstrates a significant lifetime extension (several years) compared to a continuously driven gas sensor. In March 2012, we deployed the WSN with 36 nodes in a four-story building, and to date its performance has confirmed our models and expectations.
- Published
- 2013
- Full Text
- View/download PDF
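The context-adaptive behavior described in entry 23 is essentially a sampling-interval policy: measure often when the PIR sensor reports presence or a neighbor reports elevated gas, and back off otherwise. A minimal sketch, with interval bounds and the back-off factor as assumptions:

```python
def next_interval(current, presence, neighbour_alarm,
                  base=60, max_interval=3600):
    """Seconds until the next gas measurement: sample fast when the context
    demands it, otherwise back off exponentially up to a ceiling."""
    if presence or neighbour_alarm:
        return base                          # people nearby or alarm: sample fast
    return min(current * 2, max_interval)    # quiet context: stretch the interval
```

Because the power-hungry gas sensor dominates the node's energy budget, stretching its duty cycle in quiet contexts is what yields the multi-year lifetime extension reported above.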
24. A new fuzzy based algorithm for solving stereo vagueness in detecting and tracking people
- Author
-
Rui Paúl, Miguel García-Silvente, Eugenio Aguirre, and Rafael Muñoz-Salinas
- Subjects
BitTorrent tracker ,business.industry ,Applied Mathematics ,People detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Fuzzy control system ,Kalman filter ,Filter (signal processing) ,Tracking (particle physics) ,Stereo vision ,Fuzzy logic ,Theoretical Computer Science ,Stereopsis ,Artificial Intelligence ,People tracking ,Computer vision ,Artificial intelligence ,Particle filter ,business ,Particle filtering ,Software ,Colour information ,Mathematics - Abstract
This paper describes a system capable of detecting and tracking several people using a new approach based on colour, stereo vision and fuzzy logic. Initially, in the people detection phase, two fuzzy systems are used to filter out false positives of a face detector. Then, in the tracking phase, a new fuzzy logic based particle filter (FLPF) is proposed to fuse stereo and colour information, assigning different confidence levels to each of these information sources. Information regarding depth and occlusion is used to create these confidence levels. This way, the system is able to keep track of people in the reference camera image even when either stereo information or colour information is confusing or not reliable. To carry out the tracking, the new FLPF is used, so that several particles are generated while several fuzzy systems compute the possibility that some of the generated particles correspond to the new position of people. Our technique outperforms two well-known tracking approaches, one based on the method of Nummiaro et al. [1] and another based on the Kalman/meanshift tracker method of Comaniciu and Ramesh [2]. All these approaches were tested using several colour-with-distance sequences simulating real-life scenarios. The results show that our system is able to keep track of people in most of the situations where the other trackers fail, as well as to determine the size of their projections in the camera image. In addition, the method is fast enough for real-time applications.
- Published
- 2012
- Full Text
- View/download PDF
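The FLPF of entry 24 is a particle filter whose weights fuse two cues scaled by confidence levels. The skeleton below keeps that structure with fixed confidences and a toy 1-D likelihood standing in for the fuzzy systems; all values are illustrative.

```python
import random
random.seed(1)  # deterministic toy run

def track_step(particles, colour_like, stereo_like, c_conf=0.7, s_conf=0.3):
    """One filter iteration: weight each particle by a confidence-scaled
    blend of the two cues, then resample proportionally to the weights."""
    weights = [c_conf * colour_like(p) + s_conf * stereo_like(p)
               for p in particles]
    return random.choices(particles, weights=weights, k=len(particles))

# toy 1-D state: target at x = 2.0, both cues peak there
target = 2.0
like = lambda p: 1.0 / (1.0 + (p - target) ** 2)
particles = [random.uniform(0.0, 10.0) for _ in range(500)]
particles = track_step(particles, like, like)
estimate = sum(particles) / len(particles)
```

In the paper the confidence factors are not fixed: they are recomputed per frame from depth and occlusion information, which is what lets the tracker lean on whichever cue is currently reliable.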
25. A corpus for benchmarking of people detection algorithms
- Author
-
José M. Martínez, Jesús Bescós, Alvaro Garcia-Martin, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Tratamiento e Interpretación de Vídeo (ING EPS-006)
- Subjects
Surveillance video ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Machine learning ,computer.software_genre ,Corpus ,Artificial Intelligence ,Sequence ,Ground truth ,Telecomunicaciones ,business.industry ,Frame (networking) ,Critical factors ,People detection ,Benchmarking ,Ground-truth ,Signal Processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Data mining ,business ,Algorithm ,computer ,Software ,Dataset - Abstract
This is the author’s version of a work that was accepted for publication in Pattern Recognition Letters. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Pattern Recognition Letters, 33, 2 (2012) DOI: 10.1016/j.patrec.2011.09.038, This paper describes a corpus, dataset and associated ground-truth, for the evaluation of people detection algorithms in surveillance video scenarios, along with the design procedure followed to generate it. Sequences from scenes with different levels of complexity have been manually annotated. Each person present at a scene has been labeled frame by frame, in order to automatically obtain a people detection ground-truth for each sequence. Sequences have been classified into different complexity categories depending on critical factors that typically affect the behavior of detection algorithms. The resulting corpus, which exceeds other public pedestrian datasets in the amount of video sequences and its complexity variability, is freely available for benchmarking and research purposes under a license agreement., This work has been partially supported by the Cátedra UAM-Infoglobal (“Nuevas tecnologías de vídeo aplicadas a sistemas de video-seguridad”), by the Ministerio de Ciencia e Innovación of the Spanish Government (TEC2011-25995 EventVideo: “Estrategias de segmentación, detección y seguimientos de objetos en entornos complejos para la detección de eventos en videovigilancia y monitorización”) and by the Universidad Autónoma de Madrid (“FPI-UAM: Programa propio de ayudas para la Formación de Personal Investigador”).
- Published
- 2012
26. People-background segmentation with unequal error cost
- Author
-
José M. Martínez, Alvaro Garcia-Martin, Andrea Cavallaro, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Tratamiento e Interpretación de Vídeo (ING EPS-006)
- Subjects
Detection confidence map ,Background subtraction ,Telecomunicaciones ,Contextual image classification ,business.industry ,Computer science ,Segmentation-based object categorization ,People detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Scale-space segmentation ,Pattern recognition ,Image segmentation ,Background confidence map ,People-background segmentation ,Segmentation ,Computer vision ,Artificial intelligence ,business - Abstract
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Á. García-Martín, A. Cavallaro, J. M. Martínez, "People-background segmentation with unequal error cost", in 19th IEEE International Conference on Image Processing, ICIP 2012, p. 157 - 160, We address the problem of segmenting a video into two classes of different semantic value, namely background and people, with the goal of guaranteeing that no people (or body parts) are classified as background. Body parts classified as background are given a higher classification error cost (segmentation with bias on background), as opposed to traditional approaches focused on people detection. To generate the people-background segmentation mask, the proposed approach first combines detection confidence maps of body parts and then extends them in order to derive a background mask, which is finally post-processed using morphological operators. Experiments validate the performance of our algorithm in different complex indoor and outdoor scenes with both static and moving cameras., Work partially supported by the Universidad Autónoma de Madrid (“FPI-UAM”) and by the Spanish Government (“TEC2011-25995 EventVideo”). This work was done while the first author was visiting Queen Mary University of London.
- Published
- 2012
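The unequal-error-cost idea of entry 26 can be illustrated by growing the people mask before deriving the background mask, so uncertain border pixels are charged to the (cheaper) background error rather than risking the loss of body parts. The threshold and dilation margin below are assumptions, not the paper's values.

```python
def dilate(mask, iterations=1):
    """Binary 4-neighbour dilation on a list-of-lists mask."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [row[:] for row in mask]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
        mask = out
    return mask

def background_mask(people_conf, thr=0.5, margin=1):
    """Derive the background mask with a bias toward people: the people
    mask is grown by `margin` pixels before being inverted."""
    people = [[1 if c >= thr else 0 for c in row] for row in people_conf]
    people = dilate(people, margin)
    return [[1 - p for p in row] for row in people]
```

The dilation plays the role of the mask-extension step: pixels near a confident people detection are excluded from background even if their own confidence is low.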
27. People detection based on appearance and motion models
- Author
-
Alexander G. Hauptmann, José M. Martínez, Alvaro Garcia-Martin, UAM. Departamento de Tecnología Electrónica y de las Comunicaciones, and Tratamiento e Interpretación de Vídeo (ING EPS-006)
- Subjects
Motion detector ,Implicit Shape Model ,Telecomunicaciones ,business.industry ,Computer science ,People detection ,Detector ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Implicit motion model ,Tracking (particle physics) ,TRECVID ,Object detection ,Motion (physics) ,MoSIFT ,Implicit shape model ,Motion estimation ,Computer vision ,Artificial intelligence ,business - Abstract
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. A. Garcia-Martin, A. Hauptmann, and J. M. Martínez "People detection based on appearance and motion models", in 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2011, p. 256-260, The main contribution of this paper is a new people detection algorithm based on motion information. The algorithm builds a people motion model based on the Implicit Shape Model (ISM) framework and the MoSIFT descriptor. We also propose a detection system that integrates appearance, motion and tracking information. Experimental results over sequences extracted from the TRECVID dataset show that our new people motion detector produces results comparable to the state of the art, and that the proposed multimodal fusion system improves the obtained results by combining the three information sources., This work has been partially supported by the Cátedra UAM-Infoglobal ("Nuevas tecnologías de vídeo aplicadas a sistemas de video-seguridad") and by the Universidad Autónoma de Madrid (“FPI-UAM: Programa propio de ayudas para la Formación de Personal Investigador”)
- Published
- 2011
- Full Text
- View/download PDF
28. Single person pose recognition and tracking
- Author
-
Barbadillo Amor, Javier, Escuela Técnica Superior de Ingenieros Industriales y de Telecomunicación, Telekomunikazio eta Industria Ingeniarien Goi Mailako Eskola Teknikoa, Universidad Pública de Navarra. Departamento de Ingeniería Eléctrica y Electrónica, Nafarroako Unibertsitate Publikoa. Ingeniaritza Elektriko eta Elektronikoa Saila, and Gómez Laso, Miguel Ángel
- Subjects
Pose recognition ,Reconocimiento de personas ,People detection ,People tracking ,Reconocimiento de posturas ,Seguimiento de personas - Abstract
The goal of this thesis is to research the detection and tracking of a single person with just one camera, and to recognize and track human poses in order to improve the performance of an interactive spatial game that uses the horizontal position of the person and the performed poses for its control. Of course, the results achieved with this research are not tied to this application only, but to any application that can be imagined using human pose detection as input. The use of one single camera considerably restricts the depth information, and detection and tracking are more challenging when self-occlusions occur; on the other hand, it has the advantage that only a simple camera is needed to run applications of this kind, so it is interesting for commercial use on laptops or personal computers with a web camera. These types of games controlled by human poses can develop physical aptitudes and coordination while being fun to play, and also relieve you from the load of holding a mouse or stick; it is just you and the screen.
- Published
- 2010
29. Sparsity-driven people localization algorithm: Evaluation in crowded scenes environments
- Author
-
Pierre Vandergheynst, Yannick Boursier, Laurent Jacques, and Alexandre Alahi
- Subjects
Discretization ,Pixel ,business.industry ,People detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Binary number ,Inverse problem ,Grid ,crowd ,Crowds ,Geography ,lts2 ,multi-view ,Computer Science::Computer Vision and Pattern Recognition ,lts4 ,Scalability ,Pattern recognition (psychology) ,Computer vision ,Artificial intelligence ,business ,Sparsity ,Algorithm - Abstract
We propose to evaluate our sparsity-driven people localization framework on crowded, complex scenes. The problem is recast as a linear inverse problem. It relies on deducing an occupancy vector, i.e. the discretized occupancy of people on the ground, from the noisy binary silhouettes observed as foreground pixels in each camera. This inverse problem is regularized by imposing a sparse occupancy vector, i.e. one made of few non-zero elements, while a particular dictionary of silhouettes linearly maps these non-empty grid locations to the multiple silhouettes viewed by the camera network. The proposed approach is (i) generic to any scene of people, i.e. people are located in low- and high-density crowds, (ii) scalable to any number of cameras while already working with a single camera, and (iii) unconstrained in the scene surface to be monitored. Qualitative and quantitative results are presented on the PETS 2009 dataset. The proposed algorithm detects people in high-density crowds, and counts and tracks them given severely degraded foreground silhouettes.
- Published
- 2009
- Full Text
- View/download PDF
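The linear inverse formulation described in the abstract above can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the authors' implementation: the grid size, the random binary silhouette dictionary, and the use of iterative hard thresholding as the sparse solver. Column j of the dictionary holds the stacked silhouette pixels a person at grid cell j would produce across the camera views; the sparsity prior says only a few cells are occupied.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 40 ground-grid cells observed as 100 silhouette pixels.
n_cells, n_pixels, n_people = 40, 100, 3
D = (rng.random((n_pixels, n_cells)) < 0.2).astype(float)  # silhouette dictionary

x_true = np.zeros(n_cells)
x_true[rng.choice(n_cells, n_people, replace=False)] = 1.0
y = D @ x_true  # idealized noiseless foreground observation (overlaps add up)

# Iterative hard thresholding: gradient step on ||y - Dx||^2, then keep only
# the k largest entries, enforcing the sparse-occupancy prior.
step = 1.0 / np.linalg.norm(D, 2) ** 2
x = np.zeros(n_cells)
for _ in range(200):
    x = x + step * D.T @ (y - D @ x)
    keep = np.argsort(np.abs(x))[-n_people:]
    mask = np.zeros(n_cells, bool)
    mask[keep] = True
    x[~mask] = 0.0

detected = set(np.flatnonzero(x > 0.5))
print(sorted(detected))
```

With a clean observation and an overdetermined dictionary, the solver recovers exactly the occupied grid cells; the paper's setting adds noise, multiple cameras, and far larger grids.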
30. A sparsity constrained inverse problem to locate people in a network of cameras
- Author
-
Pierre Vandergheynst, Alexandre Alahi, Laurent Jacques, and Yannick Boursier
- Subjects
Surface (mathematics) ,Pixel ,Computer science ,business.industry ,People detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,020207 software engineering ,02 engineering and technology ,Image segmentation ,Inverse problem ,Object detection ,Dynamic programming ,Constraint (information theory) ,Intelligent Network ,lts2 ,lts4 ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Sparsity - Abstract
A novel approach is presented to locate dense crowds of people in a network of fixed cameras, given severely degraded background-subtracted silhouettes. The problem is formulated as a sparsity-constrained inverse problem using an adaptive dictionary constructed on-line. The framework places no constraint on the number of cameras nor on the surface to be monitored. Even with a single camera, partially occluded and grouped people are correctly detected and segmented. Qualitative results are presented for indoor and outdoor scenes.
- Published
- 2009
- Full Text
- View/download PDF
31. An FPGA-Based People Detection System
- Author
-
Pierre-Olivier Laprise, James J. Clark, and Vinod Nair
- Subjects
MicroBlaze ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,lcsh:TK7800-8360 ,Image processing ,people detection ,computer vision ,lcsh:Telecommunication ,lcsh:TK5101-6720 ,Discrete cosine transform ,Smart camera ,Electrical and Electronic Engineering ,Image sensor ,Field-programmable gate array ,FPGA ,Background subtraction ,business.industry ,lcsh:Electronics ,smart camera ,Motion detection ,computer.file_format ,Frame rate ,JPEG ,Hardware and Architecture ,Signal Processing ,business ,computer ,Computer hardware ,Image compression - Abstract
This paper presents an FPGA-based system for detecting people from video. The system is designed to use JPEG-compressed frames from a network camera. Unlike previous approaches that use techniques such as background subtraction and motion detection, we use a machine-learning-based approach to train an accurate detector. We address the hardware design challenges involved in implementing such a detector, along with JPEG decompression, on an FPGA. We also present an algorithm that efficiently combines JPEG decompression with the detection process. This algorithm carries out the inverse DCT step of JPEG decompression only partially. Therefore, it is computationally more efficient and simpler to implement, and it takes up less space on the chip than the full inverse DCT algorithm. The system is demonstrated on an automated video surveillance application and the performance of both hardware and software implementations is analyzed. The results show that the system can detect people accurately at a rate of about 2.5 frames per second on a Virtex-II 2V1000 using a MicroBlaze processor running at 75 MHz, communicating with dedicated hardware over FSL links.
- Published
- 2005
- Full Text
- View/download PDF
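The partial inverse DCT idea described above can be sketched in a few lines: instead of inverting the full 8x8 JPEG block transform, only the top-left (low-frequency) coefficients are used, which is cheaper and often sufficient for detection features. The orthonormal DCT-II matrix below is standard; the `keep` parameter is an illustrative stand-in for how much of the transform a detector actually needs, not the paper's exact hardware algorithm.

```python
import numpy as np

N = 8
k = np.arange(N)
# Orthonormal 1-D DCT-II basis (rows are basis vectors).
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = 1.0 / np.sqrt(N)

def dct2(block):
    """Forward 2-D DCT of an 8x8 block."""
    return C @ block @ C.T

def partial_idct2(coeffs, keep):
    """Inverse 8x8 DCT using only the top-left keep x keep coefficients.
    keep=8 is the full inverse; a small keep gives the cheap low-frequency
    approximation that the detection stage can work from."""
    c = np.zeros((N, N))
    c[:keep, :keep] = coeffs[:keep, :keep]
    return C.T @ c @ C

block = np.arange(64, dtype=float).reshape(8, 8)
coeffs = dct2(block)
full = partial_idct2(coeffs, keep=8)      # exact reconstruction
approx = partial_idct2(coeffs, keep=2)    # cheap low-frequency approximation
print(np.abs(approx - block).mean())
```

Truncating the coefficient grid skips most of the multiply-accumulate work, which is why the partial transform takes less chip area than the full inverse DCT.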
32. Calibration automatique d’une caméra
- Author
-
Gilliéron, Fanny, Fleuret, François, and Berclaz, Jérôme
- Subjects
multiple view ,people detection ,camera calibration - Abstract
A method used for person tracking consists of evaluating which points of a ground-plane map have the highest probability of containing someone. Given a position on the map, a person is represented as a rectangle in the image from a camera, and one estimates how closely the binary image derived from the real image resembles this synthetic image, the comparison being made by the POM algorithm. This operation is performed with several cameras placed at different locations, filming the same scene from different angles. It is then possible to localize the person at the most probable position consistent with all camera views. To use this method, one must know the transformations giving the positions of the person's head and feet in a camera view, given a point on the map. These transformations can be determined exactly if all camera parameters are known (exact position, focal length, etc.). If all this information is not available with sufficient precision, the camera must be calibrated by auxiliary methods, for example by placing markers on the ground and locating them manually in the camera images, but this work is relatively time-consuming. During this project, we tested an automatic camera calibration method that obtains the desired transformation without hand-placed landmarks. This calibration is based on detecting the position of a person moving throughout the area covered by the cameras; this position is then used in place of the ground markers to estimate the transformation.
33. Fixed Point Probability Field for Occlusion Handling
- Author
-
Fleuret, Francois, Lengagne, Richard, and Fua, Pascal
- Subjects
multi-view environment ,probabilistic framework ,occlusions ,Computer Science::Computer Vision and Pattern Recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,people detection ,visual surveillance - Abstract
In this paper, we show that in a multi-camera context, we can effectively handle occlusions at each time frame independently, even when the only available data comes from the binary output of a fairly primitive motion detector. We start from occupancy probability estimates in a top view and rely on a generative model to yield probability images to be compared with the actual input images. We then refine the estimates so that the probability images match the binary input images as well as possible. We demonstrate the quality of our results on several sequences involving complex occlusions.
34. People Detection with Heterogeneous Features and Explicit Optimization on Computation Time
- Author
-
Ariane Herbulot, Frédéric Lerasle, Alhayat Ali Mekonnen, Cyril Briand, Équipe Robotique, Action et Perception (LAAS-RAP), Laboratoire d'analyse et d'architecture des systèmes (LAAS), Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Équipe Recherche Opérationnelle, Optimisation Combinatoire et Contraintes (LAAS-ROC), ANR-12-CORD-0003,RIDDLE,Robots perceptuels et interactifs dédiés aux environnements quotidiens(2012), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), and Université de Toulouse (UT)
- Subjects
Theoretical computer science ,binary integer programming ,business.industry ,Computer science ,Computation ,People detection ,Detector ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,020207 software engineering ,Feature selection ,02 engineering and technology ,Frame rate ,feature selection ,Discrete optimization ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Algorithm - Abstract
In this paper we present a novel people detector that employs discrete optimization for feature selection. Specifically, we use binary integer programming to mine heterogeneous features, taking both detection performance and computation time explicitly into consideration. The final trained detector exhibits low Miss Rates with a significant boost in frame rate. For example, it achieves a 2.6% lower Miss Rate at 10^-4 FPPW compared to the Dalal and Triggs HOG detector, with a 9.22x speed improvement.
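The trade-off the abstract describes, selecting features for detection gain under a computation-time budget, is a binary integer program. A minimal sketch follows, with made-up per-feature gains and costs; for a handful of candidates, exhaustive enumeration replaces a real BIP solver, which the authors would use at scale:

```python
from itertools import product

# Hypothetical per-feature statistics: (detection gain, computation cost).
features = {
    "hog":  (0.40, 8.0),
    "lbp":  (0.25, 3.0),
    "ccs":  (0.20, 2.0),
    "haar": (0.15, 1.0),
}
budget = 6.0  # total computation time allowed per window (arbitrary units)

names = list(features)
best_gain, best_pick = -1.0, ()
# Enumerate all 2^n binary selection vectors; keep the feasible one with
# the highest total gain.
for bits in product([0, 1], repeat=len(names)):
    pick = [n for n, b in zip(names, bits) if b]
    cost = sum(features[n][1] for n in pick)
    gain = sum(features[n][0] for n in pick)
    if cost <= budget and gain > best_gain:
        best_gain, best_pick = gain, tuple(pick)

print(best_pick, round(best_gain, 2))
```

With these numbers the single strongest feature (hog) is too expensive, so the optimum combines the three cheaper ones, which is exactly the kind of speed-versus-accuracy choice the explicit time term forces.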
35. Sport Players Detection and Tracking With a Mixed Network of Planar and Omnidirectional Cameras
- Author
-
Yannick Boursier, Alexandre Alahi, Laurent Jacques, and Pierre Vandergheynst
- Subjects
Approximation theory ,Pixel ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Inverse problem ,Multi-view ,Object detection ,People Detection ,lts2 ,Shadow ,lts4 ,Computer vision ,Artificial intelligence ,Focus (optics) ,Quantization (image processing) ,Omnidirectional antenna ,business ,Sparsity - Abstract
A generic approach is presented to detect and track people with a network of fixed planar and omnidirectional cameras, given severely degraded foreground silhouettes. The problem is formulated as a sparsity-constrained inverse problem. A dictionary made of atoms representing the silhouette of a person at a given location is used within the problem formulation. A reweighted scheme is considered to better approximate the sparsity prior. Although the framework is generic to any scene, the focus of this paper is to evaluate the performance of the proposed approach on a basketball game. The main challenges come from the players' behavior, their similar appearance, and the mutual occlusions present in the views. In addition, the extracted foreground silhouettes are severely degraded due to the polished floor reflecting the players and the strong shadows present in the scene. We present qualitative and quantitative results on the APIDIS dataset as part of the ICDSC sport challenge.
36. Conditional Random Fields for Multi-Camera Object Detection
- Author
-
Roig Noguera, Gemma, Boix Bosch, Xavier, Ben Shitrit, Horesh, and Fua, Pascal
- Subjects
Multi-camera ,Multi-object detection ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,people detection - Abstract
We formulate a model for multi-class object detection in a multi-camera environment. To our knowledge, this is the first time this problem has been addressed taking different object classes into account simultaneously. Given several images of the scene taken from different angles, our system estimates the ground-plane locations of the objects from the output of several object detectors applied at each viewpoint. We cast the problem as an energy minimization modeled with a Conditional Random Field (CRF). Instead of predicting the presence of an object at each image location independently, we simultaneously predict the labeling of the entire scene. Our CRF is able to take into account occlusions between objects and contextual constraints among them. We propose an effective iterative strategy that renders the underlying optimization problem tractable, and learn the parameters of the model with the max-margin paradigm. We evaluate the performance of our model on several challenging multi-camera pedestrian detection datasets, namely PETS 2009 and the EPFL terrace sequence. We also introduce a new dataset in which multiple classes of objects appear simultaneously in the scene. It is here that we show that our method effectively handles occlusions in the multi-class case.
37. SSDT: Distance Tracking Model Based on Deep Learning
- Author
-
Gupta, Chhaya, Gill, Nasib Singh, and Preeti Gulia
- Subjects
Computer Networks and Communications ,Hardware and Architecture ,Deep Learning ,COVID-19 Social Distancing ,MF-SORT ,Object Detection ,People Detection ,Bounding Box ,Electrical and Electronic Engineering - Abstract
Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus, and population vulnerability increased all over the world due to the lack of effective remedial measures. Vaccines are now available, but in India only 18.8% of the population has been fully vaccinated so far. Therefore, social distancing is the only precautionary norm to avoid the spread of this deadly virus, and the risk of spread can be reduced by adhering to it. The main objective of this work is to provide a framework for tracking social distancing violations among people. This paper proposes a deep-learning-based Smart Social Distancing Tracker (SSDT) model trained on MOT (Multiple Object Tracking) datasets. The proposed model is a hybrid approach that combines YOLOv4 as the object detection model with MF-SORT, a Kalman filter and brute-force feature matching to distinguish people from the background and provide a bounding box around each of them. Further, the results are compared with another model, namely Faster R-CNN, in terms of FPS (frames per second), mAP (mean Average Precision) and training time over the dataset. The results show that the proposed model provides better and more balanced results. The experiment has been carried out in challenging conditions, including occlusion and lighting variations, with an mAP of 97% and a real-time speed of 24 fps. The datasets provide numerous classes, and of all the object classes only the people class has been used. The ultimate goal of the model is to provide a tracking solution that will help different authorities to redesign the layout of public places and reduce risk. The model also computes the distance between two people in an image, and the results confirm that it successfully distinguishes individuals who walk too close or breach the social distancing norm.
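The distance-violation check at the core of such a tracker reduces to pairwise distances between detected bounding boxes. A minimal sketch follows; the boxes and the pixel threshold (standing in for a calibrated 2 m rule) are invented for illustration and are not the SSDT model's actual values:

```python
import math

# Hypothetical detections: bounding boxes as (x, y, w, h) in pixels.
boxes = [(100, 200, 40, 120), (150, 210, 42, 118), (400, 205, 38, 122)]
MIN_DIST = 75.0  # pixel threshold standing in for the calibrated 2 m rule

def centroid(box):
    """Center point of a bounding box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

# Flag every pair of detections whose centroids are closer than the threshold.
violations = []
for i in range(len(boxes)):
    for j in range(i + 1, len(boxes)):
        (xi, yi), (xj, yj) = centroid(boxes[i]), centroid(boxes[j])
        if math.hypot(xi - xj, yi - yj) < MIN_DIST:
            violations.append((i, j))

print(violations)
```

In a real system the pixel threshold would be derived from a ground-plane calibration so that image distance maps to metric distance.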
38. WatchNet++: efficient and accurate depth-based network for detecting people attacks and intrusion
- Author
-
Villamizar, M., Martinez-Gonzalez, A., Canevet, O., and Odobez, J. -M.
- Subjects
deep learning ,video surveillance ,people detection ,head-shoulder detection ,convolutional network - Abstract
We present an efficient and accurate people detection approach based on deep learning to detect people attacks and intrusion in video surveillance scenarios. Unlike other approaches using background segmentation and pre-processing techniques, which are not able to distinguish people from other elements in the scene, we propose WatchNet++, a depth-based, sequential network that localizes people in top-view depth images by predicting human body joints and pairwise connections (links) such as head and shoulders. WatchNet++ comprises a set of prediction stages and up-sampling operations that progressively refine the predictions of joints and links, leading to more accurate localization results. In order to train the network with varied and abundant data, we also present a large synthetic dataset of depth images with human models, which is used to pre-train the network. Subsequently, domain adaptation to real data is done via fine-tuning on a real dataset of depth images with people performing attacks and intrusion. An extensive evaluation of the proposed approach is conducted for the detection of attacks in airlocks and the counting of people indoors and outdoors, showing high detection scores and efficiency. The network runs at 10 and 28 FPS using CPU and GPU, respectively.
39. On the use of a low-cost thermal sensor to improve Kinect people detection in a mobile robot
- Author
-
Basilio Sierra, Loreto Susperregi, Elena Lazkano, Modesto Castrillón, Javier Lorenzo, and José María Martínez-Otzeta
- Subjects
Engineering ,vision ,Support Vector Machine ,Databases, Factual ,people detection ,lcsh:Chemical technology ,algorithms ,Biochemistry ,Thermopile ,Article ,sensor fusion ,computer vision ,hierarchical classification ,mobile robot/platform ,Analytical Chemistry ,Image Processing, Computer-Assisted ,Humans ,Computer vision ,lcsh:TP1-1185 ,Electrical and Electronic Engineering ,Instrumentation ,classifier ,business.industry ,Supervised learning ,Reproducibility of Results ,Robotics ,Mobile robot ,tracking ,Sensor fusion ,Atomic and Molecular Physics, and Optics ,Support vector machine ,Histogram of oriented gradients ,Thermography ,Robot ,Artificial intelligence ,business ,Environmental Monitoring - Abstract
Detecting people is a key capability for robots that operate in populated environments. In this paper, we adopt a hierarchical approach that combines classifiers created using supervised learning in order to identify whether a person is in the view-scope of the robot or not. Our approach makes use of vision, depth and thermal sensors mounted on top of a mobile platform. The sensor suite combines the rich data source offered by a Kinect sensor, which provides vision and depth at low cost, with a thermopile array sensor. Experimental results obtained with a mobile platform on a manufacturing shop floor and in a science museum show that the false positive rate achieved using any single cue is drastically reduced. The performance of our algorithm surpasses other well-known approaches, such as C-4 and the histogram of oriented gradients (HOG). This work was supported by Kutxa Obra Social through the KtBot project, and partially funded by the Institute of Intelligent Systems and Numerical Applications in Engineering (SIANI) and the Computer Science Department at ULPGC. The Basque Government Research Team grant and the University of the Basque Country UPV/EHU, under grant UFI11/45 (BAILab), are acknowledged.
40. People detection in dynamic images
- Author
-
Marco Leo, G. Attolico, Arcangelo Distante, and A. Branca
- Subjects
Surveillance ,Artificial neural network ,business.industry ,Computer science ,People detection ,NULL ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Cognitive neuroscience of visual object recognition ,Wavelet transform ,Motion detection ,Context (language use) ,Pattern recognition ,Classification ,Motion (physics) ,Object detection ,Wavelet ,Computer vision ,Artificial intelligence ,business - Abstract
The main aim of this work is people detection in outdoor environments, in the context of video surveillance for intruder detection in archaeological sites. Our goal is to propose an example-based learning technique to detect people in dynamic scenes. The classification is based purely on people's shape and not on image content. First, motion information is used to detect the objects of interest; Haar wavelets are then used to represent the images, and finally a supervised three-layer neural network classifies the patterns.
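The Haar wavelet representation mentioned above decomposes an image window into an approximation band plus detail bands that respond to edges, which is why it serves as a shape (rather than appearance) descriptor. A minimal one-level 2-D sketch follows; the averaging normalization and the stripe test image are illustrative choices, not the paper's exact pipeline:

```python
import numpy as np

def haar2_level1(img):
    """One level of the 2-D Haar wavelet transform.
    Returns the approximation band (ll) and the detail bands (lh, hl, hh)
    that respond to vertical, horizontal and diagonal structure."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row pairs: average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row pairs: difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

# Vertical stripes: only the band sensitive to vertical edges should fire.
img = np.tile([[0.0, 1.0], [0.0, 1.0]], (2, 2))  # 4x4 alternating columns
ll, lh, hl, hh = haar2_level1(img)
print(ll)  # uniform 0.5: the stripe average
print(lh)  # uniform response: vertical-edge detail band
```

For a detector, the detail-band coefficients over a person-sized window become the feature vector fed to the classifier, so the features encode silhouette edges rather than texture.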
41. HDR imaging for enhancing people detection and tracking in indoor environments
- Author
-
Panagiotis Agrafiotis, Stathopoulou, E. K., Georgopoulos, A., and Doulamis, A.
- Subjects
HDR imaging ,People detection ,People tracking ,Engineering and Technology ,Civil Engineering - Abstract
VISAPP 2015 - 10th International Conference on Computer Vision Theory and Applications; VISIGRAPP Proceedings, Volume 2, 2015, Pages 623-630. Videos and image sequences of indoor environments with challenging illumination conditions often capture either brightly lit or dark scenes, where every single exposure may contain overexposed and/or underexposed regions. High Dynamic Range (HDR) images contain information that standard dynamic range images, often also referred to as low dynamic range images (SDR/LDR), cannot capture. This paper investigates the contribution of HDR imaging to people detection and tracking systems. In order to evaluate this contribution to the accuracy and robustness of pedestrian detection and tracking in challenging indoor visual conditions, two state-of-the-art trackers of different complexity were implemented. To this end, data were collected taking into account the requirements of real-life indoor scenarios, and HDR frames were produced. The algorithms were applied to the SDR data and the corresponding HDR data, and were compared and evaluated for robustness and accuracy in terms of precision and recall. Results show that the use of HDR images enhances the performance of the detection and tracking scheme, making it more robust and reliable.
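The core of producing an HDR frame from bracketed exposures is a weighted merge that trusts mid-gray samples and discounts clipped ones. The toy sketch below assumes a linear camera response and a hat-shaped weighting function (both common simplifications, not necessarily this paper's exact pipeline) and recovers relative radiance for a four-pixel scene:

```python
import numpy as np

# Ground-truth relative scene radiance and three hypothetical exposure times.
radiance = np.array([0.02, 0.2, 1.0, 3.0])
times = np.array([0.25, 1.0, 4.0])

def weight(z):
    """Hat function: peaks at mid-gray, vanishes for clipped pixel values."""
    return 1.0 - np.abs(2.0 * z - 1.0)

num = np.zeros_like(radiance)
den = np.zeros_like(radiance)
for t in times:
    z = np.clip(radiance * t, 0.0, 1.0)   # simulated LDR frame at exposure t
    w = weight(z) + 1e-6                   # tiny floor avoids division by zero
    num += w * (z / t)                     # per-frame radiance estimate
    den += w
hdr = num / den
print(np.round(hdr, 3))
```

Each exposure contributes most where its pixels are well exposed, so the merged frame keeps detail in regions that are clipped in any single exposure, which is precisely why the trackers in the paper benefit.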
42. Principled Detection-by-Classification from Multiple Views
- Author
-
Berclaz, J., Fleuret, F., and Pascal Fua
- Subjects
classification ,bayesian framework ,people detection - Abstract
Machine-learning-based classification techniques have been shown to be effective at detecting objects in complex scenes. However, the final results are often obtained from the alarms produced by the classifiers through post-processing that typically relies on ad hoc heuristics: spatially close alarms are assumed to be triggered by the same target and are grouped together. Here we replace those heuristics with a principled Bayesian approach, which uses knowledge about both the classifier response model and the scene geometry to combine multiple classification answers. We demonstrate its effectiveness for multi-view pedestrian detection. We estimate the marginal probabilities of presence of people at any location in a scene, given the responses of classifiers evaluated in each view. Our approach naturally takes into account both the occlusions and the very low metric accuracy of the classifiers due to their invariance to translation and scale. Results show our method produces one order of magnitude fewer false positives than a method representative of typical state-of-the-art approaches. Moreover, the framework we propose is generic and could be applied to any detection-by-classification task.
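The Bayesian combination of per-view classifier responses can be sketched with a simple response model. The detection and false-alarm probabilities, the prior, and the independence assumption are all illustrative stand-ins for the paper's learned classifier response model:

```python
import math

# Hypothetical response model: in each view, the classifier fires with
# probability P_D at an occupied ground location and P_FA at an empty one.
P_D, P_FA = 0.9, 0.1
PRIOR = 0.01  # prior probability that a given ground location is occupied

def posterior_occupied(alarms):
    """Combine independent per-view binary alarms with Bayes' rule,
    accumulating log-odds and converting back to a probability."""
    log_odds = math.log(PRIOR / (1.0 - PRIOR))
    for fired in alarms:
        if fired:
            log_odds += math.log(P_D / P_FA)
        else:
            log_odds += math.log((1.0 - P_D) / (1.0 - P_FA))
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(posterior_occupied([True, True, True]), 3))    # all views agree
print(round(posterior_occupied([True, False, False]), 3))  # likely false alarm
```

A single alarm barely moves the low prior, while agreement across views overwhelms it, which is how a principled model suppresses isolated false positives without ad hoc grouping heuristics.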