We start with a brief overview of our work in speech recognition and understanding, which led from monomodal (speech-only) human-machine dialog to multimodal human-machine interaction and assistance. Our work in speech communication initially had the goal of developing a complete system for question answering by spoken dialog [7,15]. This goal was achieved in various projects funded by the German Research Foundation [14] and the German Federal Ministry of Education and Research [16]. Problems of multilingual communication were considered in projects supported by the European Union [2,4,10]. In the Verbmobil project the speech-to-speech translation problem was investigated, and it turned out that prosody and the recognition of emotion were important and extremely useful, if not indispensable, for disambiguating utterances and for influencing the dialog strategy [3,17]. Multimodal and multimedia aspects of human-machine communication became a topic in the follow-up projects Embassi [11], SmartKom [1], FORSIP [12], and SmartWeb [9]. The SmartWeb project [19], which involves 17 partners from companies, research institutes, and universities, has the general goal of providing the foundations for multimodal human-machine communication with distributed semantic web services using different mobile devices: hand-held, mounted in a car, or mounted on a motorcycle. It uses speech and video signals as well as signals from other sensors, e.g. ECG or skin resistance.

A special problem in human-machine interaction and assistance is the question of whether the user is speaking to the machine or not, that is, the distinction between on-talk and off-talk. It is shown how on-/off-talk can be classified by combining prosodic and image features. Using additional sensors, the user state in general is estimated to give further cues to the dialog control. This may be used, for example, to avoid input from the dialog system in a situation where the driver is under stress.

In other projects the special problem of processing children's speech was considered [20]. Among other things, it was investigated whether a manual correction of the automatically computed fundamental frequency (F0) and of word boundaries would have a positive effect on the automatic classification of the four classes anger, motherese, emphatic, and neutral; this was not the case, leading to the conclusion that at present there is no need for improved F0 algorithms in emotion recognition. The word accuracy (WA) of native and non-native English-speaking children was also investigated; it was shown that non-native speakers (aged 10-15) achieve about the same WA as children aged 6-7 when a speech recognizer trained on native children's speech is used. The recognizer was also used to develop an automatic scoring of the pronunciation quality of children learning English.

A special problem is posed by speech impairments, which may be congenital (e.g. cleft lip and palate) or acquired through disease (e.g. cancer of the larynx). Impairments are treated, among other approaches, with speech training by speech therapists, who score the speech quality subjectively according to various criteria. The idea is that the WA of an automatic speech recognizer should be highly correlated with the human rating. Using speech samples from laryngectomees, it is shown that the machine rating is about as good as the rating of five human experts and can also be obtained via telephone. This opens the possibility of an objective and standardized rating of speech quality.
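To illustrate the idea behind this last point, the following minimal sketch (not the authors' actual evaluation code) computes WA per speaker from a Levenshtein alignment of recognizer output against the reference transcript and then correlates the WA values with averaged human expert ratings; the speaker data and the 5-point rating scale are hypothetical placeholders.

```python
# Sketch: correlate recognizer word accuracy (WA) with human expert ratings.
# All speaker IDs, transcripts, and ratings below are made-up examples.
from scipy.stats import spearmanr

def word_accuracy(reference: str, hypothesis: str) -> float:
    """WA = (N - S - D - I) / N * 100, edits taken from a Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimal number of edits between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return 100.0 * (len(ref) - dp[len(ref)][len(hyp)]) / max(len(ref), 1)

# Hypothetical data: (reference text, recognizer output, mean expert score).
speakers = {
    "spk01": ("the patient reads a standard text", "the patient reads a standard text", 1.5),
    "spk02": ("the patient reads a standard text", "the patient reeds standard test", 2.8),
    "spk03": ("the patient reads a standard text", "patient weeds a sandar text", 4.1),
}

wa_scores = [word_accuracy(ref, hyp) for ref, hyp, _ in speakers.values()]
expert_scores = [rating for _, _, rating in speakers.values()]

# A strong (here negative) rank correlation would support using WA as an
# objective, standardized substitute for the subjective expert rating.
rho, p_value = spearmanr(wa_scores, expert_scores)
print(f"Spearman correlation between WA and expert rating: {rho:.2f} (p={p_value:.3f})")
```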