248 results for "Juho Kannala"
Search Results
202. Learning to Drive (L2D) as a Low-Cost Benchmark for Real-World Reinforcement Learning
- Author
-
Ari Viitala, Rinu Boney, Yi Zhao, Alexander Ilin, and Juho Kannala
- Subjects
FOS: Computer and information sciences, Computer Science - Robotics, Robotics (cs.RO)
- Abstract
We present Learning to Drive (L2D), a low-cost benchmark for real-world reinforcement learning (RL). L2D involves a simple and reproducible experimental setup in which an RL agent has to learn to drive a Donkey car around three miniature tracks, given only monocular image observations and the speed of the car. The agent has to learn to drive from disengagements, which occur when it drives off the track. We present and open-source our training pipeline, which makes it straightforward to apply any existing RL algorithm to the task of autonomous driving with a Donkey car. We test imitation learning and state-of-the-art model-free and model-based algorithms on the proposed L2D benchmark. Our results show that existing RL algorithms can learn to drive the car from scratch in less than five minutes of interaction. We demonstrate that RL algorithms can learn from sparse and noisy disengagement signals to drive even faster than imitation learning and a human operator.
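The interaction protocol described above maps naturally onto a gym-style control loop. The sketch below is illustrative only: the "L2D-v0" environment id and the exact observation and action contents are assumptions, not the released pipeline's actual API.

```python
import gym

# "L2D-v0" is a hypothetical environment id standing in for the open-sourced
# Donkey-car pipeline; observations are assumed to be (image, speed) pairs.
env = gym.make("L2D-v0")
obs = env.reset()
done = False
while not done:
    # a learned policy would choose steering/throttle here;
    # we sample random actions purely for illustration
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    # the episode ends at a disengagement (the car leaves the track),
    # which is the sparse, noisy signal the agent learns from
env.close()
```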
- Published
- 2020
- Full Text
- View/download PDF
203. Multi-View Stereo by Temporal Nonparametric Fusion
- Author
-
Arno Solin, Juho Kannala, and Yuxin Hou
- Subjects
FOS: Computer and information sciences, Hyperparameter, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Nonparametric statistics, Inference, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Bottleneck, symbols.namesake, 0202 electrical engineering, electronic engineering, information engineering, symbols, Leverage (statistics), 020201 artificial intelligence & image processing, Artificial intelligence, business, Gaussian process, Decoding methods, 0105 earth and related environmental sciences
- Abstract
We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model can leverage information from previous latent-space encodings of the scene. The model takes pairs of images and poses, which are passed through an encoder-decoder network for disparity estimation. The novelty lies in soft-constraining the bottleneck layer with a nonparametric Gaussian process prior. We propose a pose-kernel structure that encourages similar poses to have resembling latent spaces. The flexibility of the Gaussian process (GP) prior provides adapting memory for fusing information from previous views. We train the encoder-decoder and the GP hyperparameters jointly, end-to-end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and demonstrate how our scheme can run in real-time on smart devices. (ICCV 2019)
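As a concrete illustration of the pose-kernel idea, the sketch below computes a GP covariance between two camera poses from their translation distance and relative rotation angle. The kernel form and the hyperparameters are plausible assumptions, not necessarily those used in the paper.

```python
import numpy as np

def pose_kernel(t1, R1, t2, R2, ell_t=1.0, ell_r=0.5, sigma2=1.0):
    """Covariance between poses (t1, R1) and (t2, R2): nearby positions with
    similar orientations get high covariance, so the GP prior encourages
    their latent encodings to agree."""
    d_t = np.linalg.norm(t1 - t2)                    # translation distance
    cos_th = (np.trace(R1.T @ R2) - 1.0) / 2.0       # relative rotation angle
    theta = np.arccos(np.clip(cos_th, -1.0, 1.0))
    return sigma2 * np.exp(-d_t**2 / (2 * ell_t**2)) * np.exp(-theta**2 / (2 * ell_r**2))
```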
- Published
- 2019
- Full Text
- View/download PDF
204. Automated Structure Discovery in Atomic Force Microscopy
- Author
-
Benjamin Alldritt, Prokop Hapala, Niko Oinonen, Fedor Urtev, Ondrej Krejci, Filippo Federici Canova, Juho Kannala, Fabian Schulz, Peter Liljeroth, Adam S. Foster, Department of Applied Physics, Department of Computer Science, Surfaces and Interfaces at the Nanoscale, Professorship Kannala Juho, Atomic Scale Physics, Aalto-yliopisto, and Aalto University
- Subjects
Condensed Matter::Quantum Gases, Physics - Instrumentation and Detectors, Condensed Matter - Mesoscale and Nanoscale Physics, SciAdv r-articles, FOS: Physical sciences, Instrumentation and Detectors (physics.ins-det), Computational Physics (physics.comp-ph), Condensed Matter Physics, Mesoscale and Nanoscale Physics (cond-mat.mes-hall), Physics::Atomic and Molecular Clusters, Physics::Atomic Physics, Physics - Computational Physics, Research Articles, Research Article, Surface Chemistry
- Abstract
We develop a deep learning method that predicts atomic structure directly from experimental atomic force microscopy images. Atomic force microscopy (AFM) with molecule-functionalized tips has emerged as the primary experimental technique for probing the atomic structure of organic molecules on surfaces. Most experiments have been limited to nearly planar aromatic molecules due to difficulties with interpretation of highly distorted AFM images originating from nonplanar molecules. Here, we develop a deep learning infrastructure that matches a set of AFM images with a unique descriptor characterizing the molecular configuration, allowing us to predict the molecular structure directly. We apply this methodology to resolve several distinct adsorption configurations of 1S-camphor on Cu(111) based on low-temperature AFM measurements. This approach will open the door to applying high-resolution AFM to a large variety of systems, for which routine atomic and chemical structural resolution on the level of individual objects/molecules would be a major breakthrough.
- Published
- 2019
205. Digging Deeper into Egocentric Gaze Prediction
- Author
-
Ali Borji, Hamed R. Tavakoli, Esa Rahtu, and Juho Kannala
- Subjects
FOS: Computer and information sciences, ta113, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Optical flow, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Gaze, Visualization, Activity recognition, Fixation (visual), 0202 electrical engineering, electronic engineering, information engineering, Task analysis, 020201 artificial intelligence & image processing, Artificial intelligence, Vanishing point, business, 0105 earth and related environmental sciences
- Abstract
This paper digs deeper into factors that influence egocentric gaze. Instead of training deep models for this purpose in a blind manner, we propose to inspect factors that contribute to gaze guidance during daily tasks. Bottom-up saliency and optical flow are assessed against strong spatial prior baselines. Task-specific cues such as the vanishing point, manipulation point, and hand regions are analyzed as representatives of top-down information. We also look into the contribution of these factors by investigating a simple recurrent neural model for egocentric gaze prediction. First, deep features are extracted for all input video frames. Then, a gated recurrent unit is employed to integrate information over time and to predict the next fixation. We also propose an integrated model that combines the recurrent model with several top-down and bottom-up cues. Extensive experiments over multiple datasets reveal that (1) spatial biases are strong in egocentric videos, (2) bottom-up saliency models perform poorly in predicting gaze and underperform spatial biases, (3) deep features perform better than traditional features, (4) as opposed to hand regions, the manipulation point is a strongly influential cue for gaze prediction, (5) combining the proposed recurrent model with bottom-up cues, vanishing points and, in particular, the manipulation point results in the best gaze prediction accuracy over egocentric videos, (6) knowledge transfer works best for cases where the tasks or sequences are similar, and (7) task and activity recognition can benefit from gaze prediction. Our findings suggest that (1) there should be more emphasis on hand-object interaction and (2) the egocentric vision community should consider larger datasets including diverse stimuli and more subjects. (Presented at WACV 2019)
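The recurrent model described above (per-frame deep features integrated by a GRU that regresses the next fixation) can be sketched as follows; the feature dimension and hidden size are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class RecurrentGazePredictor(nn.Module):
    """Sketch of the recurrent gaze model: per-frame deep features are
    integrated over time by a GRU, and the next fixation is regressed
    as a 2D image coordinate."""
    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # (x, y) of the next fixation

    def forward(self, feats):              # feats: (batch, time, feat_dim)
        out, _ = self.gru(feats)
        return self.head(out[:, -1])       # predict fixation after the last frame
```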
- Published
- 2019
206. Mask-RCNN and U-net Ensembled for Nuclei Segmentation
- Author
-
Saad Ullah Akram, Juho Kannala, and Aarno Oskar Vuola
- Subjects
FOS: Computer and information sciences, 0303 health sciences, Ideal (set theory), Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Machine learning, computer.software_genre, Convolutional neural network, Task (project management), 03 medical and health sciences, Margin (machine learning), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Segmentation, Artificial intelligence, Nuclei segmentation, business, computer, 030304 developmental biology
- Abstract
Nuclei segmentation is both an important and in some ways ideal task for modern computer vision methods, e.g. convolutional neural networks. While recent developments in theory and open-source software have made these tools easier to implement, expert knowledge is still required to choose the right model architecture and training setup. We compare two popular segmentation frameworks, U-Net and Mask-RCNN, on the nuclei segmentation task and find that they have different strengths and failure modes. To get the best of both worlds, we develop an ensemble model that combines their predictions, outperforms both models by a significant margin, and should be considered when aiming for the best nuclei segmentation performance. (To appear in IEEE International Symposium on Biomedical Imaging (ISBI) 2019)
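One simple way to combine the two frameworks' predictions is sketched below: Mask-RCNN instance masks are refined against a U-Net foreground probability map. This fusion rule is an assumption for illustration, not necessarily the rule used in the paper.

```python
import numpy as np

def ensemble_nuclei(mrcnn_masks, unet_prob, prob_thr=0.5, keep_thr=0.5):
    """Illustrative fusion: each Mask-RCNN instance is trimmed to pixels the
    U-Net also deems foreground, and instances the U-Net strongly disagrees
    with are dropped."""
    unet_fg = unet_prob > prob_thr
    fused = []
    for mask in mrcnn_masks:               # each mask: boolean array (H, W)
        refined = mask & unet_fg
        # keep the instance only if the two models mostly agree on it
        if refined.sum() >= keep_thr * max(mask.sum(), 1):
            fused.append(refined)
    return fused
```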
- Published
- 2019
207. Semantic Matching by Weakly Supervised 2D Point Set Registration
- Author
-
Juho Kannala, Zakaria Laskar, Hamed R. Tavakoli, Professorship Kannala Juho, Professorship Kaski Samuel, Department of Computer Science, Aalto-yliopisto, and Aalto University
- Subjects
FOS: Computer and information sciences, ta113, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Geometric transformation, Computer Science - Computer Vision and Pattern Recognition, Pattern recognition, Point set registration, 02 engineering and technology, Function (mathematics), 010501 environmental sciences, 01 natural sciences, Convolutional neural network, Image (mathematics), Set (abstract data type), Transformation (function), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, 0105 earth and related environmental sciences, Semantic matching
- Abstract
In this paper we address the problem of establishing correspondences between different instances of the same object. The problem is posed as finding the geometric transformation that aligns a given image pair. We use a convolutional neural network (CNN) to directly regress the parameters of the transformation model. The alignment problem is defined in a setting where an unordered set of semantic key-points per image is available, but without correspondence information. To this end we propose a novel loss function based on cyclic consistency that solves this 2D point set registration problem by inferring the optimal geometric transformation model parameters. We train and test our approach on the standard benchmark dataset Proposal-Flow (PF-PASCAL). The proposed approach achieves state-of-the-art results, demonstrating the effectiveness of the method. In addition, we show our approach further benefits from additional training samples in PF-PASCAL generated by using category-level information. (Accepted to WACV 2019)
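The sketch below illustrates the cyclic-consistency idea on affine transforms: warping key-points forward and then back should return them to their start. It is an illustration of the principle under simplified assumptions (affine model, ordered points), not the paper's exact loss.

```python
import torch

def cycle_consistency_loss(T_ab, T_ba, pts_a):
    """T_ab, T_ba: (B, 2, 3) affine transforms predicted in both directions;
    pts_a: (B, N, 2) key-points in image A. Penalizes points that do not
    return to their original location after a forward-backward warp."""
    ones = torch.ones(pts_a.shape[0], pts_a.shape[1], 1,
                      dtype=pts_a.dtype, device=pts_a.device)
    homog = torch.cat([pts_a, ones], dim=-1)            # (B, N, 3)
    pts_b = homog @ T_ab.transpose(1, 2)                # warp A -> B
    homog_b = torch.cat([pts_b, ones], dim=-1)
    pts_a_back = homog_b @ T_ba.transpose(1, 2)         # warp back B -> A
    return ((pts_a_back - pts_a) ** 2).sum(-1).mean()   # zero for a perfect cycle
```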
- Published
- 2019
- Full Text
- View/download PDF
208. Interpolated Adversarial Training: Achieving Robust Neural Networks Without Sacrificing Too Much Accuracy
- Author
-
Juho Kannala, Yoshua Bengio, Alex Lamb, Vikas Verma, Professorship Kannala Juho, University of Montreal, Department of Computer Science, Aalto-yliopisto, and Aalto University
- Subjects
TheoryofComputation_MISCELLANEOUS, Artificial neural network, Computer science, business.industry, Deep learning, Adversary, Training methods, Machine learning, computer.software_genre, Adversarial system, Standard error, Robustness (computer science), Artificial intelligence, business, computer, Interpolation
- Abstract
Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods for improving adversarial robustness (such as adversarial training) greatly hurt generalization performance on unperturbed data. This could have a major impact on how adversarial robustness affects real-world systems (i.e. many may opt to forego robustness if doing so improves accuracy on unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods within the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with Interpolated Adversarial Training we retain adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in standard error for the robust model is reduced from 178.1% to just 45.5%.
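A minimal sketch of one training step in this spirit: craft adversarial examples, then apply Mixup-style interpolation before computing the loss. FGSM is used here for brevity and the handling of clean vs. adversarial batches is simplified relative to the paper.

```python
import torch
import torch.nn.functional as F

def interpolated_adv_step(model, x, y, eps=8 / 255, alpha=1.0):
    """One simplified Interpolated Adversarial Training step: FGSM attack
    followed by Mixup on the adversarial batch."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = (x + eps * grad.sign()).detach().clamp(0, 1)   # adversarial examples

    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x_adv.size(0))
    x_mix = lam * x_adv + (1 - lam) * x_adv[perm]          # Mixup interpolation
    logits = model(x_mix)
    return lam * F.cross_entropy(logits, y) + (1 - lam) * F.cross_entropy(logits, y[perm])
```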
- Published
- 2019
209. Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
- Author
-
Juha Ylioinas, Xiaotian Li, Juho Kannala, Jakob Verbeek, Aalto University, Apprentissage de modèles à partir de données massives (Thoth ), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK ), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), Center for Machine Vision Research (CMV), University of Oulu, ANR-16-CE23-0006,Deep_in_France,Réseaux de neurones profonds pour l'apprentissage(2016), and ANR-11-LABX-0025,PERSYVAL-lab,Systemes et Algorithmes Pervasifs au confluent des mondes physique et numérique(2011)
- Subjects
FOS: Computer and information sciences, 0209 industrial biotechnology, Computer science, Computer Vision and Pattern Recognition (cs.CV), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, Initialization, 02 engineering and technology, RANSAC, Convolutional neural network, Image (mathematics), 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, ComputingMethodologies_COMPUTERGRAPHICS, Ground truth, camera relocalization, Pixel, business.industry, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], Robotics, scene coordinate regression, Function (mathematics), deep neural networks, 020201 artificial intelligence & image processing, Artificial intelligence, business
- Abstract
Image-based camera relocalization is an important problem in computer vision and robotics. Recent works utilize convolutional neural networks (CNNs) to regress, for each pixel in a query image, its corresponding 3D world coordinate in the scene. The final pose is then solved via a RANSAC-based optimization scheme using the predicted coordinates. Usually the CNN is trained with ground-truth scene coordinates, but it has also been shown that the network can discover 3D scene geometry automatically by minimizing a single-view reprojection loss. However, due to the deficiencies of the reprojection loss, the network needs to be carefully initialized. In this paper, we present a new angle-based reprojection loss, which resolves the issues of the original reprojection loss. With this new loss function, the network can be trained without careful initialization, and the system achieves more accurate results. The new loss also enables us to utilize available multi-view constraints, which further improve performance. (ECCV 2018 Workshop: Geometry Meets Deep Learning)
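An angle-based objective of this flavor can be written as below: instead of pixel-space reprojection error, it penalizes the angle between the observed viewing ray and the ray toward the predicted scene coordinate. This is an illustrative formulation; see the paper for the exact loss.

```python
import torch

def angle_reprojection_loss(pred_world, cam_center, ray_dirs):
    """pred_world: (N, 3) predicted scene coordinates; cam_center: (3,) camera
    center; ray_dirs: (N, 3) unit viewing rays (world frame) through the
    corresponding pixels. Zero when predicted points lie on their rays."""
    v = pred_world - cam_center                          # camera -> predicted point
    v = v / v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    cos_angle = (v * ray_dirs).sum(-1)                   # cosine of the ray angle
    return (1.0 - cos_angle).mean()
```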
- Published
- 2018
- Full Text
- View/download PDF
210. Deep Learning Based Speed Estimation for Constraining Strapdown Inertial Navigation on Smartphones
- Author
-
Santiago Cortes, Juho Kannala, Arno Solin, Department of Computer Science, Professorship Solin A., Professorship Kannala Juho, Aalto-yliopisto, and Aalto University
- Subjects
FOS: Computer and information sciences, Physics::General Physics, Inertial frame of reference, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 02 engineering and technology, Accelerometer, 01 natural sciences, law.invention, Computer Science::Robotics, Odometry, law, Inertial measurement unit, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Inertial navigation system, ta113, business.industry, Deep learning, 020208 electrical & electronic engineering, 010401 analytical chemistry, Gyroscope, 0104 chemical sciences, Artificial intelligence, business, Mobile device
- Abstract
Strapdown inertial navigation systems are sensitive to the quality of the data provided by the accelerometer and gyroscope. Low-grade IMUs in handheld smart devices pose a problem for inertial odometry on these devices. We propose a scheme for constraining the inertial odometry problem by complementing non-linear state estimation with a CNN-based deep-learning model that infers the momentary speed from a window of IMU samples. We show the feasibility of the model using a wide range of data from an iPhone, and present proof-of-concept results for how the model can be combined with an inertial navigation system for three-dimensional inertial navigation. (To appear in IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2018)
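A speed regressor over an IMU window could look like the sketch below: a small 1-D CNN over six sensor channels. Layer sizes and the window length are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class IMUSpeedNet(nn.Module):
    """Sketch of a 1-D CNN regressing momentary speed from a window of IMU
    samples (6 channels: 3-axis accelerometer + 3-axis gyroscope)."""
    def __init__(self, channels=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, imu_window):         # (batch, 6, window_length)
        return self.net(imu_window)        # predicted speed (m/s)
```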
- Published
- 2018
211. Automated tracking of tumor-stroma morphology in microtissues identifies functional targets within the tumor microenvironment for therapeutic intervention
- Author
-
Hannu-Pekka Schukov, Matthias Nees, Janne Heikkilä, Malin Åkerfelt, Sean Robinson, Raija Sormunen, Neslihan Bayramoglu, Johannes Virtanen, Juho Kannala, Mika Kaakinen, Mervi Toriseva, Ville Härmä, Lauri Eklund, Turku Centre for Biotechnology, University of Turku-Åbo Academy University, University of Turku, Laboratoire de Biologie à Grande Échelle (BGE - UMR S1038), Institut de Recherche Interdisciplinaire de Grenoble (IRIG), Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Direction de Recherche Fondamentale (CEA) (DRF (CEA)), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019]), VTT Technical Research Centre of Finland (VTT), Machine Vision Group (MVG), University of Oulu, Center for Machine Vision Research (CMV), Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])-Institut de Recherche Interdisciplinaire de Grenoble (IRIG), and Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Commissariat à l'énergie atomique et aux énergies alternatives (CEA)
- Subjects
Male, [SDV]Life Sciences [q-bio], focal adhesion kinase (FAK), Cell Culture Techniques, Cell Communication, Prostate cancer, 0302 clinical medicine, Cell Movement, Tumor Microenvironment, ComputingMilieux_MISCELLANEOUS, 0303 health sciences, Microscopy, Confocal, 3D co-culture, invasion, University hospital, 3. Good health, Oncology, Cell Tracking, Research centre, 030220 oncology & carcinogenesis, Collagen, Algorithms, Research Paper, Cancer associated fibroblast, ta3111, Models, Biological, Time-Lapse Imaging, Cell Line, 03 medical and health sciences, Microscopy, Electron, Transmission, SDG 3 - Good Health and Well-being, Cell Line, Tumor, medicine, Humans, Tumor growth, Tumor stroma, Protein Kinase Inhibitors, Cell Proliferation, 030304 developmental biology, ta113, Tumor histology, Tumor microenvironment, cancer associated fibroblast (CAF), business.industry, phenotypic screening, ta1182, Granulocyte-Macrophage Colony-Stimulating Factor, Prostatic Neoplasms, Fibroblasts, ta3122, medicine.disease, Coculture Techniques, Focal Adhesion Protein-Tyrosine Kinases, Cancer research, business
- Abstract
Cancer-associated fibroblasts (CAFs) constitute an important part of the tumor microenvironment and promote invasion via paracrine functions and physical impact on the tumor. Although the importance of including CAFs into three-dimensional (3D) cell cultures has been acknowledged, computational support for quantitative live-cell measurements of complex cell cultures has been lacking. Here, we have developed a novel automated pipeline to model tumor-stroma interplay, track motility and quantify morphological changes of 3D co-cultures, in real-time live-cell settings. The platform consists of microtissues from prostate cancer cells, combined with CAFs in extracellular matrix that allows biochemical perturbation. Tracking of fibroblast dynamics revealed that CAFs guided the way for tumor cells to invade and increased the growth and invasiveness of tumor organoids. We utilized the platform to determine the efficacy of inhibitors in prostate cancer and the associated tumor microenvironment as a functional unit. Interestingly, certain inhibitors selectively disrupted tumor-CAF interactions, e.g. focal adhesion kinase (FAK) inhibitors specifically blocked tumor growth and invasion concurrently with fibroblast spreading and motility. This complex phenotype was not detected in other standard in vitro models. These results highlight the advantage of our approach, which recapitulates tumor histology and can significantly improve cancer target validation in vitro.
- Published
- 2015
- Full Text
- View/download PDF
212. Full-Frame Scene Coordinate Regression for Image-Based Localization
- Author
-
Juho Kannala, Juha Ylioinas, and Xiaotian Li
- Subjects
FOS: Computer and information sciences, ta113, Pixel, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Frame (networking), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Computer Science - Computer Vision and Pattern Recognition, Context (language use), 02 engineering and technology, RANSAC, Overfitting, Convolutional neural network, Random forest, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business
- Abstract
Image-based localization, or camera relocalization, is a fundamental problem in computer vision and robotics: estimating the camera pose from an image. Recent state-of-the-art approaches use learning-based methods, such as random forests (RFs) and convolutional neural networks (CNNs), to regress, for each pixel in the image, its corresponding position in the scene's world coordinate frame, and solve the final pose via a RANSAC-based optimization scheme using the predicted correspondences. In this paper, instead of operating in a patch-based manner, we propose to perform the scene coordinate regression in a full-frame manner, which makes the computation efficient at test time and, more importantly, adds more global context to the regression process to improve robustness. To do so, we adopt a fully convolutional encoder-decoder neural network architecture which accepts a whole image as input and produces scene coordinate predictions for all pixels in the image. However, using more global context is prone to overfitting. To alleviate this issue, we propose to use data augmentation to generate more data for training; in addition to the data augmentation in 2D image space, we also augment the data in 3D space. We evaluate our approach on the publicly available 7-Scenes dataset, and experiments show that it has better scene coordinate predictions and achieves state-of-the-art results in localization, with improved robustness on the hardest frames (e.g., frames with repeated structures). (RSS 2018)
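A full-frame coordinate regressor of the kind described above can be sketched as a fully convolutional encoder-decoder producing a 3-channel coordinate map. Depths and channel counts are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class FullFrameCoordNet(nn.Module):
    """Sketch: map a whole RGB image to per-pixel scene coordinates."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),  # (x, y, z) per pixel
        )

    def forward(self, image):              # (B, 3, H, W), H and W divisible by 4
        return self.decoder(self.encoder(image))   # (B, 3, H, W) scene coordinates
```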
- Published
- 2018
213. Real-time human pose estimation with convolutional neural networks
- Author
-
Esa Rahtu, Marko Linna, and Juho Kannala
- Subjects
ta113, Person detection, business.industry, Computer science, Human pose estimation, Pattern recognition, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Convolutional neural networks, Artificial intelligence, business, Pose, 0105 earth and related environmental sciences
- Abstract
In this paper, we present a method for real-time multi-person human pose estimation from video by utilizing convolutional neural networks. Our method is aimed at use-case-specific applications, where good accuracy is essential and variation of the background and poses is limited. This enables us to use a generic network architecture, which is both accurate and fast. We divide the problem into two phases: (1) pre-training and (2) fine-tuning. In pre-training, the network is learned with highly diverse input data from publicly available datasets, while in fine-tuning we train with application-specific data, which we record with Kinect. Our method differs from most of the state-of-the-art methods in that we consider the whole system, including the person detector, the pose estimator and an automatic way to record application-specific training material for fine-tuning. Our method is considerably faster than many of the state-of-the-art methods and can be thought of as a replacement for Kinect in restricted environments. It can be used for tasks such as gesture control, games, person tracking, action recognition and action tracking. We achieved an accuracy of 96.8% (PCK@0.2) with application-specific data.
- Published
- 2018
214. Robust Gyroscope-Aided Camera Self-Calibration
- Author
-
Juho Kannala, Santiago Cortes Reina, and Arno Solin
- Subjects
FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Initialization, 02 engineering and technology, law.invention, Computer Science - Robotics, Extended Kalman filter, law, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Differentiable function, ta113, business.industry, Distortion (optics), 020206 networking & telecommunications, Gyroscope, Image stabilization, 020201 artificial intelligence & image processing, Artificial intelligence, business, Monocular vision, Robotics (cs.RO), Camera resectioning
- Abstract
Camera calibration for estimating the intrinsic parameters and lens distortion is a prerequisite for various monocular vision applications including feature tracking and video stabilization. This application paper proposes a model for estimating the parameters on the fly by fusing gyroscope and camera data, both readily available in modern-day smartphones. The model is based on joint estimation of visual feature positions, camera parameters, and the camera pose, the movement of which is assumed to follow the movement predicted by the gyroscope. The camera movement is assumed to be free, but continuous and differentiable, and individual features are assumed to stay stationary. The estimation is performed online using an extended Kalman filter, and it is shown to outperform existing methods in robustness and insensitivity to initialization. We demonstrate the method using simulated data and empirical data from an iPad. (In Proceedings of the International Conference on Information Fusion, FUSION 2018)
- Published
- 2018
- Full Text
- View/download PDF
215. Fast motion deblurring for feature detection and matching using inertial measurements
- Author
-
Simo Särkkä, Janne Mustaniemi, Jiri Matas, Juho Kannala, and Janne Heikkilä
- Subjects
FOS: Computer and information sciences, 0209 industrial biotechnology, Deblurring, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Iterative reconstruction, Simultaneous localization and mapping, 020901 industrial engineering & automation, Robustness (computer science), Distortion, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, ta113, business.industry, Motion blur, Rolling shutter, Computer Science::Graphics, Kernel (image processing), Computer Science::Computer Vision and Pattern Recognition, 020201 artificial intelligence & image processing, Deconvolution, Artificial intelligence, business
- Abstract
Many computer vision and image processing applications rely on local features. It is well known that motion blur decreases the performance of traditional feature detectors and descriptors. We propose an inertial-based deblurring method for improving the robustness of existing feature detectors and descriptors against motion blur. Unlike most deblurring algorithms, the method can handle spatially variant blur and rolling shutter distortion. Furthermore, it is capable of running in real time, contrary to state-of-the-art algorithms. The limitations of inertial-based blur estimation are taken into account by validating the blur estimates using image data. The evaluation shows that when the method is used with a traditional feature detector and descriptor, it increases the number of detected keypoints, provides higher repeatability and improves the localization accuracy. We also demonstrate that such features lead to more accurate and complete results in 3D visual reconstruction.
- Published
- 2018
216. Accurate 3-D Reconstruction with RGB-D Cameras using Depth Map Fusion and Pose Refinement
- Author
-
Juho Kannala, Janne Heikkilä, and Markus Ylimäki
- Subjects
FOS: Computer and information sciences, ta113, Sequence, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Point cloud, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020206 networking & telecommunications, 02 engineering and technology, Iterative reconstruction, Depth map, 0202 electrical engineering, electronic engineering, information engineering, RGB color model, 020201 artificial intelligence & image processing, Computer vision, Noise (video), Artificial intelligence, business
- Abstract
Depth map fusion is an essential part of both stereo and RGB-D based 3-D reconstruction pipelines. Whether produced with passive stereo reconstruction or an active depth sensor, such as Microsoft Kinect, depth maps contain noise and may have poor initial registration. In this paper, we introduce a method which is capable of handling outliers and, especially, even significant registration errors. The proposed method first fuses a sequence of depth maps into a single non-redundant point cloud, merging redundant points by giving more weight to more certain measurements. Then, the original depth maps are re-registered to the fused point cloud to refine the original camera extrinsic parameters. The fusion is then performed again with the refined extrinsic parameters. This procedure is repeated until the result is satisfactory or no significant changes happen between iterations. The method is robust to outliers and erroneous depth measurements, as well as to significant depth map registration errors due to inaccurate initial camera poses. (Accepted to ICPR 2018)
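The fuse-then-refine iteration reads naturally as a loop. In the sketch below, fuse_depth_maps, register_to_cloud and poses_changed are hypothetical helpers standing in for the paper's weighted fusion, re-registration and convergence test.

```python
def fuse_and_refine(depth_maps, poses, max_iters=10):
    """High-level sketch of the iterative fusion/refinement procedure."""
    for _ in range(max_iters):
        cloud = fuse_depth_maps(depth_maps, poses)       # weighted, non-redundant merge
        new_poses = [register_to_cloud(d, cloud) for d in depth_maps]
        if not poses_changed(poses, new_poses):          # stop when refinement stalls
            return cloud, new_poses
        poses = new_poses
    return fuse_depth_maps(depth_maps, poses), poses
```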
- Published
- 2018
217. Learning Image Relations with Contrast Association Networks
- Author
-
Zhirong Yang, Samuel Kaski, Juho Kannala, and Yao Lu
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial neural network, Relation (database), business.industry, Computer science, Association (object-oriented programming), Computer Vision and Pattern Recognition (cs.CV), Optical flow, Computer Science - Computer Vision and Pattern Recognition, Inference, Contrast (statistics), 02 engineering and technology, Image (mathematics), Machine Learning (cs.LG), 03 medical and health sciences, 0302 clinical medicine, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Representation (mathematics), 030217 neurology & neurosurgery
- Abstract
Inferring the relations between two images is an important class of tasks in computer vision. Examples of such tasks include computing optical flow and stereo disparity. We treat relation inference as a machine learning problem and tackle it with neural networks. A key to the problem is learning a representation of relations. We propose a new neural network module, the contrast association unit (CAU), which explicitly models the relations between two sets of input variables. Due to the non-negativity of the weights in the CAU, we adopt a multiplicative update algorithm for learning these weights. Experiments show that neural networks with CAUs are more effective in learning five fundamental image transformations than conventional neural networks.
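Multiplicative updates preserve non-negativity because the weights are only ever scaled by a non-negative ratio. The sketch below shows the generic form of such an update (the gradient split into positive and negative parts, as in NMF-style optimization); the exact factors for CAUs in the paper may differ.

```python
import numpy as np

def multiplicative_update(W, grad_pos, grad_neg, eps=1e-12):
    """Generic multiplicative update for non-negative weights: with
    grad = grad_pos - grad_neg (both element-wise non-negative),
    W stays non-negative after every step."""
    return W * (grad_neg / (grad_pos + eps))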
- Published
- 2017
218. Image-based Localization using Hourglass Networks
- Author
-
Juho Kannala, Iaroslav Melekhov, Juha Ylioinas, and Esa Rahtu
- Subjects
FOS: Computer and information sciences, Computer science, Decoding, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Convolutional codes, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, Convolutional neural network, Convolution, law.invention, law, 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Computer architecture, 0105 earth and related environmental sciences, ta113, Orientation (computer vision), business.industry, Motion blur, Cameras, Solid modeling, Three-dimensional displays, 020201 artificial intelligence & image processing, Hourglass, Artificial intelligence, business
- Abstract
In this paper, we propose an encoder-decoder convolutional neural network (CNN) architecture for estimating camera pose (orientation and location) from a single RGB image. The architecture has an hourglass shape consisting of a chain of convolution and up-convolution layers followed by a regression part. The up-convolution layers are introduced to preserve the fine-grained information of the input image. Following common practice, we train our model in an end-to-end manner utilizing transfer learning from large-scale classification data. The experiments demonstrate the performance of the approach on data exhibiting different lighting conditions, reflections, and motion blur. The results indicate a clear improvement over the previous state of the art, even when compared to methods that utilize a sequence of test frames instead of a single frame. (Camera-ready version for ICCVW 2017)
- Published
- 2017
- Full Text
- View/download PDF
219. Deep learning for magnification independent breast cancer histopathology image classification
- Author
-
Juho Kannala, Neslihan Bayramoglu, and Janne Heikkilä
- Subjects
medicine.medical_specialty, Computer science, Magnification, 02 engineering and technology, Malignancy, Convolutional neural network, 030218 nuclear medicine & medical imaging, Set (abstract data type), 03 medical and health sciences, 0302 clinical medicine, Breast cancer, Microscopy, 0202 electrical engineering, electronic engineering, information engineering, medicine, Computer vision, Medical diagnosis, ta113, Contextual image classification, business.industry, Deep learning, Digital imaging, Cancer, medicine.disease, 020201 artificial intelligence & image processing, Histopathology, Artificial intelligence, business
- Abstract
Microscopic analysis of breast tissue is necessary for a definitive diagnosis of breast cancer, which is the most common cancer among women. Pathology examination requires time-consuming scanning through tissue images under different magnification levels to find clinical assessment clues that produce correct diagnoses. Advances in digital imaging techniques allow assessment of pathology images using computer vision and machine learning methods, which could automate some of the tasks in the diagnostic pathology workflow. Such automation could be beneficial for obtaining fast and precise quantification, reducing observer variability, and increasing objectivity. In this work, we propose to classify breast cancer histopathology images independent of their magnification using convolutional neural networks (CNNs). We propose two different architectures: a single-task CNN used to predict malignancy, and a multi-task CNN used to predict both malignancy and image magnification level simultaneously. Evaluations and comparisons with previous results are carried out on the BreaKHis dataset. Experimental results show that our magnification-independent CNN approach improves on the performance of the magnification-specific model. Our results on this limited set of training data are comparable with previous state-of-the-art results obtained by hand-crafted features. However, unlike previous methods, our approach has the potential to directly benefit from additional training data, and such data could be captured at the same or different magnification levels than the previous data.
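The multi-task idea is a shared backbone with two classification heads; training minimizes the sum of the two cross-entropy losses. The backbone and layer sizes below are illustrative assumptions, not the architecture used in the paper (BreaKHis has four magnification levels: 40x, 100x, 200x, 400x).

```python
import torch
import torch.nn as nn

class MultiTaskHistoNet(nn.Module):
    """Sketch of a multi-task CNN: one head for malignancy, one for
    magnification level, over a shared convolutional backbone."""
    def __init__(self, n_magnifications=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.malignancy = nn.Linear(64, 2)               # benign vs. malignant
        self.magnification = nn.Linear(64, n_magnifications)

    def forward(self, x):
        h = self.backbone(x)
        return self.malignancy(h), self.magnification(h)
```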
- Published
- 2017
220. Terrain navigation in the magnetic landscape: Particle filtering for indoor positioning
- Author
-
Simo Särkkä, Arno Solin, Juho Kannala, and Esa Rahtu
- Subjects
Matching (statistics), ta213, business.industry, Monte Carlo method, Probabilistic logic, Terrain, Map matching, Computer Science::Robotics, symbols.namesake, Geography, Kriging, symbols, Computer vision, Artificial intelligence, business, Particle filter, Gaussian process
- Abstract
Variations in the ambient magnetic field can be used as features in indoor positioning and navigation. We describe a technique for map matching in which pedestrian movement is matched to a map of the magnetic landscape. The map matching algorithm is based on a particle filter, a recursive Monte Carlo method, and follows the classical terrain matching framework used in aircraft positioning and navigation. A recent probabilistic Gaussian process regression based method for modeling the ambient magnetic field is employed in the framework. The feasibility of this terrain matching approach is demonstrated in a simple real-life indoor positioning example, where both the mapping and the positioning are done using a smartphone device.
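One propagate-weigh-resample step of such a particle filter is sketched below. It is a simplified illustration: field_map is a hypothetical callable (position to predicted field magnitude, e.g. a GP posterior mean), and the motion and observation models are reduced to step length, heading, and Gaussian noise.

```python
import numpy as np

def particle_filter_step(particles, weights, step, heading, field_obs,
                         field_map, motion_noise=0.1, obs_noise=1.0):
    """One magnetic map-matching step: propagate particles with the
    pedestrian's step, re-weight by agreement with the magnetic map,
    and resample when the effective sample size collapses."""
    n = len(particles)
    moves = step * np.column_stack([np.cos(heading), np.sin(heading)])
    particles = particles + moves + motion_noise * np.random.randn(n, 2)
    pred = np.array([field_map(p) for p in particles])
    weights = weights * np.exp(-0.5 * ((field_obs - pred) / obs_noise) ** 2)
    weights = weights / weights.sum()
    if 1.0 / np.sum(weights ** 2) < n / 2:               # effective sample size test
        idx = np.random.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights
```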
- Published
- 2016
- Full Text
- View/download PDF
221. Joint cell segmentation and tracking using cell proposals
- Author
-
Lauri Eklund, Saad Ullah Akram, Juho Kannala, and Janne Heikkilä
- Subjects
ta113, 0301 basic medicine, Segmentation-based object categorization, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Volume (computing), Scale-space segmentation, Cell segmentation, Image segmentation, Tracking (particle physics), 030218 nuclear medicine & medical imaging, 03 medical and health sciences, cell proposals, 030104 developmental biology, 0302 clinical medicine, cell tracking, joint segmentation and tracking, Path (graph theory), Graph (abstract data type), Computer vision, Artificial intelligence, business, Joint (audio engineering), cell segmentation
- Abstract
Time-lapse microscopy imaging has advanced rapidly in the last few decades and is producing large volumes of data in cell and developmental biology. This has increased the importance of automated analyses, which depend heavily on cell segmentation and tracking, as these are the initial stages when computing most biologically important cell properties. In this paper, we propose a novel joint cell segmentation and tracking method for fluorescence microscopy sequences, which generates a large set of cell proposals, creates a graph representing different cell events, and then iteratively finds the most probable path within this graph, providing cell segmentations and tracks. We evaluate our method on three datasets from the ISBI Cell Tracking Challenge and show that our greedy non-optimal joint solution results in improved performance compared with state-of-the-art methods.
- Published
- 2016
- Full Text
- View/download PDF
222. Siamese network features for image matching
- Author
-
Esa Rahtu, Iaroslav Melekhov, and Juho Kannala
- Subjects
Matching (statistics), Feature vector, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, 0202 electrical engineering, electronic engineering, information engineering, Training, Computer vision, Visual Word, Image retrieval, Feature detection (computer vision), Mathematics, ta113, Contextual image classification, Image matching, business.industry, Template matching, Network architecture, 020206 networking & telecommunications, Pattern recognition, Automatic image annotation, 020201 artificial intelligence & image processing, Artificial intelligence, Euclidean distance, business, Neural networks
- Abstract
Finding matching images across large datasets plays a key role in many computer vision applications such as structure-from-motion (SfM), multi-view 3D reconstruction, image retrieval, and image-based localisation. In this paper, we propose finding matching and non-matching pairs of images by representing them with neural network based feature vectors, whose similarity is measured by Euclidean distance. The feature vectors are obtained with convolutional neural networks which are learnt from labeled examples of matching and non-matching image pairs by using a contrastive loss function in a Siamese network architecture. Previously, the Siamese architecture has been utilised in facial image verification and in matching local image patches, but not yet in generic image retrieval or whole-image matching. Our experimental results show that the proposed features improve matching performance compared to baseline features obtained with networks trained for an image classification task. The features generalize well and improve the matching of images of new landmarks which are not seen at training time, despite the fact that the labeling of matching and non-matching pairs is imperfect in our training data. The results are promising for image retrieval applications, and there is potential for further improvement by utilising more training image pairs with more accurate ground truth labels.
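The contrastive loss over Euclidean distance named above has a standard form, sketched below: matching pairs are pulled together, non-matching pairs are pushed beyond a margin. The margin value is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feat_a, feat_b, label, margin=1.0):
    """Contrastive loss for a Siamese network: label=1 for matching pairs
    (minimize distance), label=0 for non-matching pairs (push distance
    beyond the margin)."""
    d = F.pairwise_distance(feat_a, feat_b)
    return (label * d.pow(2) +
            (1 - label) * F.relu(margin - d).pow(2)).mean()
```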
- Published
- 2016
223. Inertial-Based Scale Estimation for Structure from Motion on Mobile Devices
- Author
-
Juho Kannala, Jiri Matas, Simo Särkkä, Janne Heikkilä, and Janne Mustaniemi
- Subjects
FOS: Computer and information sciences, 0209 industrial biotechnology, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Hardware, 020901 industrial engineering & automation, Angular velocity, Match moving, Inertial measurement unit, Robustness (computer science), 0202 electrical engineering, electronic engineering, information engineering, Structure from motion, Computer vision, Time domain, Visualization, ta113, Ground truth, business.industry, Motion blur, Rolling shutter, Cameras, Scale factor, Frequency domain, Image reconstruction, 020201 artificial intelligence & image processing, Artificial intelligence, business, Estimation
- Abstract
Structure from motion algorithms have an inherent limitation: the reconstruction can only be determined up to an unknown scale factor. Modern mobile devices are equipped with an inertial measurement unit (IMU), which can be used for estimating the scale of the reconstruction. We propose a method that recovers the metric scale given inertial measurements and camera poses. In the process, we also perform a temporal and spatial alignment of the camera and the IMU, so our solution can be easily combined with any existing visual reconstruction software. The method can cope with noisy camera pose estimates, typically caused by motion blur or rolling shutter artifacts, by utilizing a Rauch-Tung-Striebel (RTS) smoother. Furthermore, the scale estimation is performed in the frequency domain, which provides more robustness to inaccurate sensor time stamps and noisy IMU samples than the previously used time domain representation. In contrast to previous methods, our approach has no parameters that need to be tuned to achieve good performance. In the experiments, we show that the algorithm outperforms the state of the art in both accuracy and convergence speed of the scale estimate. The accuracy of the scale is around 1% from the ground truth depending on the recording. We also demonstrate that our method can improve the scale accuracy of Project Tango's built-in motion tracking.
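The core of frequency-domain scale estimation can be sketched as a least-squares gain between the spectra of the (scaleless) visual acceleration and the IMU acceleration. This is a simplified illustration assuming already aligned, equal-length, single-axis signals; the band limits are illustrative.

```python
import numpy as np

def estimate_scale(visual_acc, imu_acc, fs, band=(0.5, 10.0)):
    """Estimate the metric scale s such that s * visual_acc ≈ imu_acc,
    comparing spectra only within a frequency band (ignoring drift at
    low frequencies and noise at high frequencies)."""
    freqs = np.fft.rfftfreq(len(imu_acc), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    V = np.abs(np.fft.rfft(visual_acc))[mask]
    A = np.abs(np.fft.rfft(imu_acc))[mask]
    return float(V @ A / (V @ V))          # least-squares gain
```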
- Published
- 2016
- Full Text
- View/download PDF
224. Cell proposal network for microscopy image analysis
- Author
-
Juho Kannala, Janne Heikkilä, Lauri Eklund, and Saad Ullah Akram
- Subjects
0301 basic medicine, Computer science, Feature extraction, Cell proposals, computer.software_genre, Convolutional neural network, 030218 nuclear medicine & medical imaging, Image (mathematics), 03 medical and health sciences, 0302 clinical medicine, Segmentation, ta113, Cell detection, Fully convolutional network, business.industry, Deep learning, Image segmentation, Fluorescence, Variable (computer science), 030104 developmental biology, Cell tracking, Key (cryptography), Artificial intelligence, Data mining, business, computer
- Abstract
Robust cell detection plays a key role in the development of reliable methods for automated analysis of microscopy images. It is a challenging problem due to low contrast, variable fluorescence, weak boundaries, conjoined and overlapping cells, causing most cell detection methods to fail in difficult situations. One approach for overcoming these challenges is to use cell proposals, which enable the use of more advanced features from ambiguous regions and/or information from adjacent frames to make better decisions. However, most current methods rely on simple proposal generation and scoring methods, which limits the performance they can reach. In this paper, we propose a convolutional neural network based method which generates cell proposals to facilitate cell detection, segmentation and tracking. We compare our method against commonly used proposal generation and scoring methods and show that our method generates significantly better proposals, and achieves higher final recall and average precision.
- Published
- 2016
225. Adaptive Kalman filtering and smoothing for gravitation tracking in mobile systems
- Author
-
Esa Rahtu, Simo Särkkä, Ville Tolvanen, and Juho Kannala
- Subjects
business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Context (language use), Kalman filter, Filter (signal processing), Invariant extended Kalman filter, Extended Kalman filter, Computer vision, Fast Kalman filter, Artificial intelligence, business, Alpha beta filter, Smoothing, Mathematics
- Abstract
This paper is concerned with inertial-sensor-based tracking of the gravitation direction in mobile devices such as smartphones. Although this tracking problem is a classical one, choosing a good state-space for it is not entirely trivial. Even though a quaternion-based representation tends to work well for many other orientation-related tasks, its use is not always advisable for gravitation tracking. In this paper we present a convenient linear, quaternion-free state-space model for gravitation tracking. We also discuss the efficient implementation of the Kalman filter and smoother for the model. Furthermore, we propose an adaptation mechanism for the Kalman filter which is able to filter out shot noise, similarly to what has been proposed in the context of adaptive and robust Kalman filtering. We compare the proposed approach to other approaches using measurement data collected with a smartphone.
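A minimal sketch of one step of such a filter, under simplifying assumptions: the state is the gravity vector in the device frame, the gyroscope supplies the incremental rotation for the prediction, and the accelerometer is the (shot-noise-prone) measurement. The gating rule that inflates the measurement noise mimics the adaptive shot-noise rejection; the exact mechanism in the paper may differ.

```python
import numpy as np

def gravity_kf_step(g, P, R_gyro, acc, Q=1e-4, R_meas=1e-2, gate=3.0):
    """One Kalman filter step for gravity tracking. g: (3,) gravity estimate,
    P: (3, 3) covariance, R_gyro: (3, 3) incremental device rotation from
    gyro integration, acc: (3,) accelerometer reading."""
    # predict: gravity rotates with the inverse of the device rotation
    g = R_gyro.T @ g
    P = R_gyro.T @ P @ R_gyro + Q * np.eye(3)
    # adaptive update: distrust large innovations (e.g. sudden shakes)
    innov = acc - g
    S = P + R_meas * np.eye(3)
    if innov @ np.linalg.solve(S, innov) > gate ** 2:
        S = P + 100 * R_meas * np.eye(3)   # shot noise suspected: inflate R
    K = P @ np.linalg.inv(S)
    g = g + K @ innov
    P = (np.eye(3) - K) @ P
    return g, P
```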
- Published
- 2015
- Full Text
- View/download PDF
226. Unsupervised learning of overcomplete face descriptors
- Author
-
Abdenour Hadid, Matti Pietikäinen, Juho Kannala, and Juha Ylioinas
- Subjects
Discriminative model, Computer science, Feature (computer vision), business.industry, Dimensionality reduction, Face (geometry), Feature extraction, Three-dimensional face recognition, Unsupervised learning, Pattern recognition, Artificial intelligence, business, Facial recognition system
- Abstract
The current state of the art indicates that a very discriminative unsupervised face representation can be constructed by encoding overlapping multi-scale face image patches at facial landmarks. There are even suggestions (albeit subtle) that, if the representation is fixed as such, the underlying features may no longer carry as much meaning. In spite of the effectiveness of this strategy, we argue that one may still afford to improve, especially at the feature level. In this paper, we investigate the role of overcompleteness in features for building unsupervised face representations. In our approach, we first learn an overcomplete basis from a set of sampled face image patches. Then, we use this basis to produce features that are further encoded using the Bag-of-Features (BoF) approach. Using our method, without an extensive use of facial landmarks, one is able to construct a single-scale representation reaching state-of-the-art performance in face recognition and age estimation, following the protocols of the LFW, FERET, and Adience benchmarks. Furthermore, we make several interesting findings related, for example, to the positive impact of applying a soft feature encoding scheme preceding standard dimensionality reduction. To make the encoding faster, we propose a novel method for approximate soft-assignment which we show to perform better than its hard-assigned counterpart.
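Soft-assignment BoF encoding can be illustrated as below: each local feature is softly assigned to codebook atoms with weights that decay with distance, and the assignments are pooled into the image descriptor. This is an illustrative form, not the approximation proposed in the paper.

```python
import numpy as np

def soft_assign_bof(features, codebook, beta=10.0):
    """features: (N, D) local descriptors; codebook: (K, D) atoms.
    Returns a K-dimensional pooled descriptor of soft assignments."""
    # squared distances between every feature and every codebook atom
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-beta * d2)
    w = w / w.sum(axis=1, keepdims=True)   # soft assignment per feature
    return w.mean(axis=0)                  # pooled BoF descriptor
```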
- Published
- 2015
- Full Text
- View/download PDF
227. NOVEL FEATURE DESCRIPTOR BASED ON MICROSCOPY IMAGE STATISTICS
- Author
-
Neslihan Bayramoglu, Janne Heikkilä, Malin Åkerfelt, Mika Kaakinen, Matthias Nees, Lauri Eklund, and Juho Kannala
- Subjects
Pixel, business.industry, GLOH, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Filter (signal processing), ta3111, Pipeline (software), Independent component analysis, Image (mathematics), Computer Science::Computer Vision and Pattern Recognition, Histogram, Statistics, Computer vision, Artificial intelligence, Representation (mathematics), business, Mathematics
- Abstract
In this paper, we propose a novel feature description algorithm based on image statistics. The pipeline first performs independent component analysis on training image patches to obtain basis vectors (filters) for a lower-dimensional representation. Then, for a given image, a set of filter responses at each pixel is computed. Finally, a histogram representation, which considers the signs and magnitudes of the responses as well as the number of filters, is applied to local image patches. We apply this idea to a microscopy image pixel identification system based on a learning framework. Experimental results show that the proposed algorithm performs better than state-of-the-art descriptors on biomedical images from different microscopy modalities.
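The pipeline can be sketched as follows: learn ICA filters from training patches, then build a histogram whose bins are indexed by the signs of the filter responses and whose votes are weighted by their magnitudes. The binning scheme here is a simplified reading of the description above, not the paper's exact encoding.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_filters(patches, n_filters=8):
    """patches: (n_samples, patch_dim) flattened training patches.
    Returns (n_filters, patch_dim) ICA basis filters."""
    ica = FastICA(n_components=n_filters, max_iter=1000)
    ica.fit(patches)
    return ica.components_

def describe_region(patch_stack, filters):
    """Histogram descriptor over a local region: each pixel-centered patch
    votes, weighted by response magnitude, into a bin indexed by the signs
    of its filter responses (2**n_filters bins)."""
    n = filters.shape[0]
    hist = np.zeros(2 ** n)
    for patch in patch_stack:              # one flattened patch per pixel
        r = filters @ patch
        code = int(((r > 0) * (2 ** np.arange(n))).sum())
        hist[code] += np.abs(r).sum()
    return hist / max(hist.sum(), 1e-12)
```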
- Published
- 2015
228. DT-SLAM: Deferred Triangulation for Robust SLAM
- Author
-
Kihwan Kim, Janne Heikkilä, C Daniel Herrera, Juho Kannala, and Kari Pulli
- Subjects
Robustness (computer science), business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Key frame, Monocular slam, Computer vision, Bundle adjustment, Artificial intelligence, business
- Abstract
Obtaining a good baseline between different video frames is one of the key elements in vision-based monocular SLAM systems. However, if the video frames contain only a few 2D feature correspondences with a good baseline, or the camera only rotates without sufficient translation in the beginning, tracking and mapping become unstable. We introduce a real-time visual SLAM system that incrementally tracks individual 2D features and estimates camera pose using matched 2D features, regardless of the length of the baseline. Triangulating 2D features into 3D points is deferred until key frames with sufficient baseline for the features are available. Our method can also deal with pure rotational motions and fuse the two types of measurements in a bundle adjustment step. Adaptive criteria for key frame selection are also introduced for efficient optimization and for dealing with multiple maps. We demonstrate that our SLAM system improves camera pose estimates and robustness, even with purely rotational motions.
- Published
- 2014
- Full Text
- View/download PDF
229. An In-depth Examination of Local Binary Descriptors in Unconstrained Face Recognition
- Author
-
Abdenour Hadid, Juho Kannala, Matti Pietikäinen, and Juha Ylioinas
- Subjects
business.industry, Computer science, Rank (computer programming), Pattern recognition, Machine learning, computer.software_genre, Facial recognition system, Domain (software engineering), Identification (information), Face (geometry), Three-dimensional face recognition, Artificial intelligence, business, Face detection, Protocol (object-oriented programming), computer
- Abstract
Automatic face recognition in unconstrained conditions is a difficult task which has recently attracted increasing attention. In this domain, face verification methods have improved significantly since the release of the Labeled Faces in the Wild database, but the related problem of face identification is still lacking attention, partly because of the shortage of representative databases. Only recently, two new datasets called Remote Face and Point-and-Shoot Challenge were published, providing appropriate benchmarks for the research community to investigate face recognition in challenging imaging conditions, in both verification and identification modes. In this paper we provide an in-depth examination of three local binary description methods in unconstrained face recognition, evaluating them on these two recently published datasets. In detail, we investigate three well-established methods separately and fuse them at rank and score levels. We use a well-defined evaluation protocol, allowing a fair comparison of our results in future examinations.
- Published
- 2014
- Full Text
- View/download PDF
230. Understanding Objects in Detail with Fine-grained Attributes
- Author
-
Ross Girshick, Juho Kannala, Siddharth Mahendran, Naomi Saphra, Andrea Vedaldi, Esa Rahtu, Stavros Tsogkas, Subhransu Maji, Iasonas Kokkinos, David J. Weiss, Ben Taskar, Karen Simonyan, Matthew B. Blaschko, Sammy Mohamed, Department of Engineering Science, University of Oxford [Oxford], Center for Imaging Science (CIS), Johns Hopkins University (JHU), Organ Modeling through Extraction, Representation and Understanding of Medical Image Content (GALEN), Ecole Centrale Paris-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Centre de vision numérique (CVN), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec, Toyota Technological Institute, Toyota Technological Institute at Chicago [Chicago] (TTIC), Department of Computer Science, University of Chicago, University of Chicago, Center for Machine Vision Research (CMV), University of Oulu, Machine Vision Group (MVG), Google Inc., Department of Computer Science & Engineering (CSE), University of Washington [Seattle], Stony Brook University [SUNY] (SBU), State University of New York (SUNY), University of Oxford, Inria Saclay - Ile de France, and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Ecole Centrale Paris
- Subjects
Relation (database) ,business.industry ,Computer science ,Cognitive neuroscience of visual object recognition ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,Cascade algorithm ,Object (computer science) ,Crowdsourcing ,Machine learning ,computer.software_genre ,Object detection ,Object-class detection ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Artificial intelligence ,Data mining ,business ,computer - Abstract
We study the problem of understanding objects in detail, that is, recognizing a wide array of fine-grained object attributes. To this end, we introduce a dataset of 7,413 airplanes annotated in detail with parts and their attributes, leveraging images donated by airplane spotters and crowd-sourcing both the design and the collection of the detailed annotations. We provide a number of insights that should help researchers design fine-grained datasets for other basic-level categories. We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object. We note that the prediction of certain attributes can benefit substantially from accurate part detection. We also show that, in contrast to previous results in object detection, employing a large number of part templates can improve detection accuracy at the expense of detection speed. We finally propose a coarse-to-fine approach that speeds up detection through a hierarchical cascade algorithm.
- Published
- 2014
- Full Text
- View/download PDF
231. Generating Object Segmentation Proposals Using Global and Local Search
- Author
-
Juho Kannala, Pekka Rantalankila, and Esa Rahtu
- Subjects
Hierarchy (mathematics) ,business.industry ,Cut ,Graph (abstract data type) ,Pattern recognition ,Segmentation ,Local search (optimization) ,Graph theory ,Artificial intelligence ,Image segmentation ,business ,Object detection ,Mathematics - Abstract
We present a method for generating object segmentation proposals from groups of superpixels. The goal is to propose accurate segmentations for all objects of an image. The proposed object hypotheses can be used as input to object detection systems and thereby improve efficiency by replacing exhaustive search. The segmentations are generated in a class-independent manner, and therefore the computational cost of the approach is independent of the number of object classes. Our approach combines global and local search in the space of sets of superpixels. The local search is implemented by greedily merging adjacent pairs of superpixels to build a bottom-up segmentation hierarchy; the regions from this hierarchy directly provide part of our region proposals. The global search provides the other part by performing a set of graph cut segmentations on a superpixel graph obtained from an intermediate level of the hierarchy. The parameters of the graph cut problems are learnt so that they provide complementary sets of regions. Experiments on Pascal VOC images show that we reach state-of-the-art performance at a greatly reduced computational cost.
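The local-search half can be sketched in a few lines: greedily merge the most similar adjacent superpixels and emit every intermediate region as a proposal. Region ids and mean-colour features are illustrative, and the graph-cut half is omitted:

    import heapq
    import numpy as np

    def greedy_proposals(feats, neighbors):
        # feats: {region_id: mean-colour vector}; neighbors: {region_id: set of ids}
        members = {r: frozenset([r]) for r in feats}
        proposals = list(members.values())
        heap = [(np.linalg.norm(feats[a] - feats[b]), a, b)
                for a in feats for b in neighbors[a] if a < b]
        heapq.heapify(heap)
        alive, nid = set(feats), max(feats) + 1
        while heap:
            _, a, b = heapq.heappop(heap)
            if a not in alive or b not in alive:
                continue                            # stale heap entry
            feats[nid] = (feats[a] + feats[b]) / 2.0
            members[nid] = members[a] | members[b]
            neighbors[nid] = (neighbors[a] | neighbors[b]) - {a, b}
            proposals.append(members[nid])          # each merge is one proposal
            for c in neighbors[nid]:
                neighbors[c] = (neighbors[c] - {a, b}) | {nid}
                heapq.heappush(heap, (np.linalg.norm(feats[nid] - feats[c]), c, nid))
            alive -= {a, b}; alive.add(nid); nid += 1
        return proposals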
- Published
- 2014
- Full Text
- View/download PDF
232. Learning local image descriptors using binary decision trees
- Author
-
Juho Kannala, Abdenour Hadid, Juha Ylioinas, and Matti Pietikäinen
- Subjects
Pixel ,Computer science ,business.industry ,Binary decision diagram ,Visual descriptors ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Decision tree ,Pattern recognition ,Machine learning ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Categorization ,Unsupervised learning ,Entropy (information theory) ,Binary code ,Artificial intelligence ,business ,computer - Abstract
In this paper we propose a unified framework for learning local image descriptors that describe pixel neighborhoods using binary codes. The descriptors are constructed using binary decision trees which are learnt from a set of training image patches. Our framework generalizes several previously proposed binary descriptors, such as BRIEF, LBP, and their variants, and provides a principled way to learn new constructions that have not been previously studied. Further, the proposed framework can utilize both labeled and unlabeled training data, and hence fits both supervised and unsupervised learning scenarios. We evaluate our framework using varying levels of supervision in the learning phase. The experiments show that our descriptor constructions perform comparably to benchmark descriptors in two different applications, namely texture categorization and age group classification from facial images.
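A toy version of the unsupervised end of this idea: choose pixel-pair comparisons whose binary outcomes have maximal entropy over training patches, then concatenate the bits into a code (BRIEF- and LBP-style tests arise as special cases). The trees described above additionally condition each test on the outcomes of earlier ones; this flat sketch with hypothetical names omits that:

    import numpy as np

    def learn_pair_tests(patches, n_bits=8, n_candidates=200, seed=0):
        # patches: (N, k*k) vectorized training patches
        rng = np.random.default_rng(seed)
        tests = []
        for _ in range(n_bits):
            best, best_h = None, -1.0
            for _ in range(n_candidates):
                p, q = rng.choice(patches.shape[1], size=2, replace=False)
                f = np.mean(patches[:, p] > patches[:, q])
                h = 0.0 if f in (0.0, 1.0) else -(f*np.log2(f) + (1-f)*np.log2(1-f))
                if h > best_h:                  # prefer balanced, informative bits
                    best, best_h = (p, q), h
            tests.append(best)
        return tests

    def binary_code(patch, tests):
        return sum(int(patch[p] > patch[q]) << i for i, (p, q) in enumerate(tests))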
- Published
- 2014
- Full Text
- View/download PDF
233. Scandinavian Conference on Image Analysis (SCIA)
- Author
-
Matthew B. Blaschko, Juho Kannala, and Esa Rahtu
- Published
- 2013
234. A learned joint depth and intensity prior using Markov Random fields
- Author
-
Peter Sturm, C Daniel Herrera, Janne Heikkilä, Juho Kannala, Center for Machine Vision Research (CMV), University of Oulu, Sustainability transition, environment, economy and local policy (STEEP), Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), and Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)-Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP )-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Markov random field ,Random field ,Markov chain ,Computer science ,business.industry ,Inpainting ,Markov process ,Sampling (statistics) ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,020207 software engineering ,Pattern recognition ,02 engineering and technology ,symbols.namesake ,Generative model ,0202 electrical engineering, electronic engineering, information engineering ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Intensity (heat transfer) - Abstract
We present a joint prior that takes intensity and depth information into account. The prior is defined using a flexible Field-of-Experts model and is learned from a database of natural images. It is a generative model and has an efficient method for sampling. We use sampling from the model to perform inpainting and upsampling of depth maps when intensity information is available. We show that including the intensity information in the prior improves the results obtained from the model. We also compare to another two-channel inpainting approach and show superior results.
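As a crude stand-in for sampling the learned prior (an illustration only, not the Field-of-Experts model), intensity-guided diffusion already shows why the intensity channel helps depth inpainting:

    import numpy as np

    def guided_depth_inpaint(depth, intensity, missing, iters=200, sigma=0.05):
        # missing: boolean mask of pixels to fill; borders wrap via np.roll (sketch)
        d = depth.copy()
        d[missing] = depth[~missing].mean()
        for _ in range(iters):
            num = np.zeros_like(d); den = np.zeros_like(d)
            for shift in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                nd = np.roll(d, shift, axis=(0, 1))
                ni = np.roll(intensity, shift, axis=(0, 1))
                w = np.exp(-(intensity - ni)**2 / (2 * sigma**2))
                num += w * nd; den += w          # neighbours with similar
            d[missing] = (num / den)[missing]    # intensity get more weight
        return d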
- Published
- 2013
- Full Text
- View/download PDF
235. Joint depth and color camera calibration with distortion correction
- Author
-
C Daniel Herrera, Janne Heikkilä, and Juho Kannala
- Subjects
Computer science ,business.industry ,Applied Mathematics ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Astrophysics::Instrumentation and Methods for Astrophysics ,Planar ,Computational Theory and Mathematics ,Artificial Intelligence ,Camera auto-calibration ,Computer Science::Computer Vision and Pattern Recognition ,Distortion ,Pinhole camera model ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Noise (video) ,business ,Software ,Camera resectioning - Abstract
We present an algorithm that simultaneously calibrates two color cameras, a depth camera, and the relative pose between them. The method is designed to be accurate, practical, and applicable to a wide range of sensors. It requires only a planar surface to be imaged from various poses. The calibration does not use depth discontinuities in the depth image, which makes it flexible and robust to noise. We apply this calibration to a Kinect device and present a new depth distortion model for the depth sensor. Our experiments show improved accuracy with respect to the manufacturer's calibration.
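The colour-camera part of such a pipeline reduces to standard planar calibration; a minimal OpenCV sketch (the joint depth-distortion estimation of the paper is not shown, and the board geometry is an assumption):

    import cv2
    import numpy as np

    def calibrate_color_camera(gray_frames, pattern=(9, 6), square=0.025):
        # gray_frames: grayscale views of a checkerboard with the given
        # inner-corner count and square size (metres); both values assumed
        objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
        objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square
        obj_pts, img_pts = [], []
        for gray in gray_frames:
            found, corners = cv2.findChessboardCorners(gray, pattern)
            if found:
                obj_pts.append(objp)
                img_pts.append(corners)
        rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
            obj_pts, img_pts, gray_frames[0].shape[::-1], None, None)
        return rms, K, dist          # reprojection RMS, intrinsics, distortion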
- Published
- 2012
236. Multi-View Surface Reconstruction by Quasi-Dense Wide Baseline Matching
- Author
-
Pekka Koskenkorva, Sami S. Brandt, Juho Kannala, and Markus Ylimäki
- Subjects
Matching (statistics) ,business.industry ,Computer science ,Computer vision ,Artificial intelligence ,Baseline (configuration management) ,business ,Surface reconstruction - Published
- 2011
- Full Text
- View/download PDF
237. Generating dense depth maps using a patch cloud and local planar surface models
- Author
-
C Daniel Herrera, Juho Kannala, and Janne Heikkilä
- Subjects
Surface (mathematics) ,Pixel ,Plane (geometry) ,business.industry ,Computer science ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Iterative reconstruction ,Planar ,Depth map ,Cut ,Computer vision ,Artificial intelligence ,business ,Surface reconstruction ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
Patch cloud based multi-view stereo methods have proven to be an accurate and scalable approach for scene reconstruction. Their applicability, however, is limited due to the semi-dense nature of their reconstruction. We propose a method to generate a dense depth map from a patch cloud by assuming a planar surface model for non-reconstructed areas. We use local evidence to estimate the best fitting plane around missing areas. We then apply a graph cut optimization to select the best plane for each pixel. We demonstrate our approach with a challenging scene containing planar and non-planar surfaces.
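A sketch of the two local ingredients, fitting a plane around a hole and back-projecting a pixel onto the chosen plane (the graph-cut selection step is omitted; names and thresholds are illustrative):

    import numpy as np

    def fit_plane_ransac(pts, iters=500, tol=0.01, seed=0):
        # pts: (N, 3) patch-cloud points near the missing area; plane: n.x = d
        rng = np.random.default_rng(seed)
        best, best_inliers = None, 0
        for _ in range(iters):
            s = pts[rng.choice(len(pts), 3, replace=False)]
            n = np.cross(s[1] - s[0], s[2] - s[0])
            if np.linalg.norm(n) < 1e-9:
                continue                          # degenerate sample
            n /= np.linalg.norm(n)
            d = n @ s[0]
            inliers = np.sum(np.abs(pts @ n - d) < tol)
            if inliers > best_inliers:
                best, best_inliers = (n, d), inliers
        return best

    def depth_on_plane(n, d, K_inv, u, v):
        # depth where the ray through pixel (u, v) meets the plane
        ray = K_inv @ np.array([u, v, 1.0])       # ray has unit z-component
        return d / (n @ ray)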
- Published
- 2011
- Full Text
- View/download PDF
238. Quasi-dense Wide Baseline Matching for Three Views
- Author
-
Juho Kannala, Sami S. Brandt, and Pekka Koskenkorva
- Subjects
Set (abstract data type) ,Matching (statistics) ,Stereopsis ,Image texture ,Pixel ,Computer science ,business.industry ,Computer vision ,Pattern recognition ,Iterative reconstruction ,Artificial intelligence ,business - Abstract
This paper proposes a method for computing a quasi-dense set of matching points between three views of a scene. The method takes a sparse set of seed matches between pairs of views as input and then propagates the seeds to neighboring regions. The proposed method is based on the best-first match propagation strategy, which is here extended from two-view matching to the case of three views. The results show that utilizing the three-view constraint during correspondence growing improves the accuracy of matching and reduces the occurrence of outliers. In particular, compared with two-view stereo, our method is more robust to repetitive texture. Since the proposed approach is able to produce high-quality depth maps from only three images, it could be used in multi-view stereo systems that fuse depth maps from multiple views.
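The three-view constraint can be enforced with a simple epipolar-transfer test while growing matches; a sketch assuming known fundamental matrices F13 and F23 and homogeneous pixel coordinates:

    import numpy as np

    def point_line_distance(line, x):
        # distance from homogeneous point x to homogeneous line
        return abs(line @ x) / np.hypot(line[0], line[1])

    def consistent_triplet(F13, F23, x1, x2, x3, tol=1.5):
        # x3 must lie near both epipolar lines transferred from views 1 and 2;
        # candidate matches failing the test are rejected during propagation
        return (point_line_distance(F13 @ x1, x3) < tol and
                point_line_distance(F23 @ x2, x3) < tol)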
- Published
- 2010
- Full Text
- View/download PDF
239. Uncalibrated non-rigid factorisation with automatic shape basis selection
- Author
-
Anders Heyden, Sami S. Brandt, Juho Kannala, and Pekka Koskenkorva
- Subjects
Harris affine region detector ,business.industry ,Affine shape adaptation ,Affine coordinate system ,Affine involution ,Affine combination ,Affine hull ,Affine space ,Computer vision ,Artificial intelligence ,Affine transformation ,business ,Algorithm ,Mathematics - Abstract
We propose an extension of the non-rigid factorisation method that solves the affine structure and motion of a deformable object, where the shape basis is selected automatically. In contrast to earlier approaches, we assume a general uncalibrated affine camera model, whereas most previous approaches assume a special case such as an orthographic, weak-perspective, or paraperspective camera model. In general, there is a global affine ambiguity in the shape bases. It turns out that a natural way of selecting the shape bases is to pick the bases that are statistically as independent as possible. The independent bases can be found by independent subspace analysis (ISA), which leads to the minimisation of mutual information between the basis shapes. After selecting the shape basis by ISA, only the in-the-subspace affine ambiguities remain from the general affine ambiguity. To solve the remaining unknowns of the general affine transformation, we propose an iterative method that recovers the block structure of the factored motion matrix. Experiments with synthetic structure and real face-expression data in 2D and 3D show promising results.
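A compressed numerical sketch: rank-3K factorisation by SVD, followed by an independence step on the basis coordinates. ISA proper groups components into subspaces; plain ICA stands in for it here, and all names are illustrative:

    import numpy as np
    from sklearn.decomposition import FastICA

    def factorise_nonrigid(W, K):
        # W: (2F, P) centred measurement matrix; K: number of basis shapes
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        r = 3 * K
        M = U[:, :r] * np.sqrt(s[:r])             # affine motion (2F, 3K)
        B = np.sqrt(s[:r])[:, None] * Vt[:r]      # shape bases   (3K, P)
        # reduce the affine ambiguity by making the bases independent
        ica = FastICA(n_components=r, random_state=0)
        B_ind = ica.fit_transform(B.T).T
        A = B @ np.linalg.pinv(B_ind)             # change of basis: B ~ A @ B_ind
        return M @ A, B_ind                       # W ~ (M @ A) @ B_ind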
- Published
- 2009
- Full Text
- View/download PDF
240. Geometric Camera Calibration
- Author
-
Juho Kannala, Janne Heikkilä, and Sami S. Brandt
- Subjects
Calibration (statistics) ,Camera matrix ,Computer science ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Astrophysics::Instrumentation and Methods for Astrophysics ,Triangulation (computer vision) ,Camera auto-calibration ,Computer Science::Computer Vision and Pattern Recognition ,Computer graphics (images) ,Pinhole camera model ,Computer vision ,Artificial intelligence ,Three-CCD camera ,business ,Stereo camera ,Camera resectioning - Abstract
Geometric camera calibration is a prerequisite for making accurate geometric measurements from image data, and hence it is a fundamental task in computer vision. This article discusses the camera models and calibration methods used in the field. The emphasis is on conventional calibration methods in which the parameters of the camera model are determined using images of a calibration object whose geometric properties are known. The presented techniques are illustrated with real calibration examples in which several different kinds of cameras are calibrated using a planar calibration object. Keywords: camera calibration; camera model; computer vision; photogrammetry; central camera; omnidirectional vision; catadioptric camera; fish-eye camera
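As an example of the non-pinhole models mentioned, a generic radially symmetric (fish-eye) projection replaces r = f*tan(theta) with a polynomial in the incidence angle; a sketch with assumed coefficients:

    import numpy as np

    def project_generic(X, K, k=(1.0, 0.0, 0.0)):
        # X: 3D point in camera coordinates; K: intrinsic matrix
        # r(theta) = k1*theta + k2*theta^3 + k3*theta^5  (pinhole: r = f*tan(theta))
        x, y, z = X
        theta = np.arctan2(np.hypot(x, y), z)     # angle from the optical axis
        phi = np.arctan2(y, x)                    # azimuth in the image plane
        r = sum(ki * theta**(2*i + 1) for i, ki in enumerate(k))
        u = K[0, 0] * r * np.cos(phi) + K[0, 2]
        v = K[1, 1] * r * np.sin(phi) + K[1, 2]
        return u, v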
- Published
- 2008
- Full Text
- View/download PDF
241. Object recognition and segmentation by non-rigid quasi-dense matching
- Author
-
Sami S. Brandt, Juho Kannala, Esa Rahtu, and Janne Heikkilä
- Subjects
Matching (statistics) ,Segmentation-based object categorization ,business.industry ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Cognitive neuroscience of visual object recognition ,Scale-space segmentation ,Image registration ,Pattern recognition ,Image segmentation ,Computer Science::Computer Vision and Pattern Recognition ,Computer vision ,Segmentation ,Affine transformation ,Artificial intelligence ,business ,ComputingMethodologies_COMPUTERGRAPHICS ,Mathematics - Abstract
In this paper, we present a non-rigid quasi-dense matching method and its application to object recognition and segmentation. The matching method is based on the match propagation algorithm, which is here extended by using local image gradients to adapt the propagation to smooth non-rigid deformations of the imaged surfaces. The adaptation is based entirely on the local properties of the images, and the method can hence be used for non-rigid image registration where global geometric constraints are not available. Our approach to object recognition and segmentation is built directly on the quasi-dense matching. The quasi-dense pixel matches between the model and test images are grouped into geometrically consistent groups using a method that utilizes the local affine transformation estimates obtained during the propagation. The number and quality of geometrically consistent matches is used as the recognition criterion, and the locations of the matching pixels directly provide the segmentation. The experiments demonstrate that our approach is able to deal with extensive background clutter, partial occlusion, large scale and viewpoint changes, and notable geometric deformations.
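One way to sketch the grouping stage: matches on the same object surface carry similar local affine estimates, so density-based clustering of those estimates separates the object from clutter. The paper's grouping is more elaborate; scikit-learn's DBSCAN is used here for brevity:

    import numpy as np
    from sklearn.cluster import DBSCAN

    def group_matches(affine_params, eps=0.5, min_samples=10):
        # affine_params: (N, 6) per-match local affine estimates from propagation
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(affine_params)
        groups = [np.flatnonzero(labels == l) for l in set(labels) if l != -1]
        # recognition score: size of the largest consistent group; the matched
        # pixels of that group directly give the segmentation
        return max(groups, key=len) if groups else np.array([], dtype=int)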
- Published
- 2008
- Full Text
- View/download PDF
242. Quasi-Dense Wide Baseline Matching Using Match Propagation
- Author
-
Sami S. Brandt and Juho Kannala
- Subjects
Transformation (function) ,Matching (graph theory) ,business.industry ,Geometric transformation ,Pattern recognition ,Covariant transformation ,Algorithm design ,Solid modeling ,Affine transformation ,Artificial intelligence ,business ,Pose ,Mathematics - Abstract
In this paper we propose extensions to the match propagation algorithm, a technique for computing quasi-dense point correspondences between two views. The extensions make match propagation applicable to wide baseline matching, i.e., to cases where the camera pose can vary a lot between the views. Our first extension is to use a local affine model for the geometric transformation between the images. The estimate of the local transformation is obtained from affine covariant interest regions which are used as seed matches. The second extension is to use the second-order intensity moments to adapt the current estimate of the local affine transformation during the propagation. This allows a single seed match to propagate into regions where the local transformation between the views differs from the initial one. Experiments with real data show that the proposed techniques improve both the quality and the coverage of the quasi-dense disparity map.
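The underlying best-first propagation is easy to sketch. This minimal two-view version scores candidates by ZNCC over square windows and propagates the same displacement to neighbours; the extensions above (affine covariant seeds and moment-based affine adaptation) are omitted:

    import heapq
    import numpy as np

    def zncc(a, b):
        a = a - a.mean(); b = b - b.mean()
        return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    def propagate(im1, im2, seeds, w=2, tau=0.8):
        # seeds: [((x1, y1), (x2, y2)), ...] integer pixel matches
        def patch(im, p):
            x, y = p
            return im[y - w:y + w + 1, x - w:x + w + 1].astype(float)
        def inside(im, p):
            return w <= p[0] < im.shape[1] - w and w <= p[1] < im.shape[0] - w
        heap = [(-zncc(patch(im1, p1), patch(im2, p2)), p1, p2)
                for p1, p2 in seeds if inside(im1, p1) and inside(im2, p2)]
        heapq.heapify(heap)
        used1, used2, matches = set(), set(), []
        while heap:                                 # always expand the best match
            score, p1, p2 = heapq.heappop(heap)
            if -score < tau or p1 in used1 or p2 in used2:
                continue
            used1.add(p1); used2.add(p2); matches.append((p1, p2))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    q1 = (p1[0] + dx, p1[1] + dy)
                    q2 = (p2[0] + dx, p2[1] + dy)   # full method searches a window
                    if inside(im1, q1) and inside(im2, q2) \
                            and q1 not in used1 and q2 not in used2:
                        heapq.heappush(
                            heap, (-zncc(patch(im1, q1), patch(im2, q2)), q1, q2))
        return matches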
- Published
- 2007
- Full Text
- View/download PDF
243. Object localization by subspace clustering of local descriptors
- Author
-
Juho Kannala, Stéphane Girard, Cordelia Schmid, Charles Bouveyron, Learning and recognition in vision (LEAR), Laboratoire d'informatique GRAphique, VIsion et Robotique de Grenoble (GRAVIR - IMAG), Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique de Grenoble (INPG)-Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Machine Vision Group (MVG), University of Oulu, Modelling and Inference of Complex and Structured Stochastic Systems [?-2006] (MISTIS [?-2006]), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Prem Kalra and Shmuel Peleg, and Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National Polytechnique de Grenoble (INPG)-Centre National de la Recherche Scientifique (CNRS)-Inria Grenoble - Rhône-Alpes
- Subjects
Clustering high-dimensional data ,Fuzzy clustering ,business.industry ,Correlation clustering ,[INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV] ,Pattern recognition ,02 engineering and technology ,01 natural sciences ,010104 statistics & probability ,ComputingMethodologies_PATTERNRECOGNITION ,CURE data clustering algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Canopy clustering algorithm ,FLAME clustering ,020201 artificial intelligence & image processing ,Artificial intelligence ,0101 mathematics ,business ,Cluster analysis ,k-medians clustering ,Mathematics - Abstract
This paper presents a probabilistic approach to object localization which combines subspace clustering with the selection of discriminative clusters. Clustering is often a key step in object recognition and is penalized by the high dimensionality of the descriptors: local descriptors such as SIFT, which have shown excellent results in recognition, are high-dimensional and live in different low-dimensional subspaces. We therefore use a subspace clustering method called High-Dimensional Data Clustering (HDDC) which overcomes the curse of dimensionality. Furthermore, in many cases only a few of the clusters are useful for discriminating the object. We thus evaluate the discriminative capacity of each cluster and use it to compute the probability that a local descriptor belongs to the object. Experimental results demonstrate the effectiveness of our probabilistic approach for object localization and show that subspace clustering gives better results than standard clustering methods. Furthermore, our approach outperforms existing results on the Pascal 2005 dataset.
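A simplified rendering of the two steps, per-cluster low-dimensional modelling and discriminative-cluster weighting. HDDC proper fits per-cluster Gaussians with cluster-specific intrinsic dimensions; plain k-means plus per-cluster PCA stands in for it here:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    def subspace_cluster(descriptors, n_clusters=20, subspace_dim=8):
        # descriptors: (N, 128) e.g. SIFT; one PCA subspace per cluster
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(descriptors)
        models = []
        for c in range(n_clusters):
            X = descriptors[km.labels_ == c]
            d = min(subspace_dim, max(1, len(X) - 1))
            models.append(PCA(n_components=d).fit(X))
        return km, models

    def cluster_object_probability(pos_count, neg_count):
        # discriminative capacity: fraction of a cluster's training descriptors
        # that fall on the object rather than on the background
        return pos_count / (pos_count + neg_count + 1e-9)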
- Published
- 2006
- Full Text
- View/download PDF
244. Affine registration with multi-scale autoconvolution
- Author
-
Esa Rahtu, Janne Heikkilä, and Juho Kannala
- Subjects
business.industry ,Template matching ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Probabilistic logic ,Image registration ,Point set registration ,Pattern recognition ,Grayscale ,Convolution ,Computer vision ,Affine transformation ,Artificial intelligence ,business ,Mathematics - Abstract
In this paper we propose a novel method for recovering the affine transformation parameters between two images. Registration is achieved without separate feature extraction by directly utilizing the intensity distribution of the images. The method can also be used for matching point sets under affine transformations. Our approach is based on the same probabilistic interpretation of the image function as the recently introduced multi-scale autoconvolution (MSA) transform. Here we describe how the framework may be used in image registration and present two variants of the method for practical implementation. The proposed method is evaluated on binary and grayscale images and compared with other non-feature-based registration methods. The experiments show that the new method can efficiently align images of isolated objects and is relatively robust.
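The probabilistic reading of MSA admits a compact Monte Carlo estimator: treat the normalized intensity as a density over pixel locations, draw point triples from it, and average the image value at their affine combinations (an illustrative sketch; out-of-support samples contribute zero):

    import numpy as np

    def msa_value(f, alpha, beta, n=20000, seed=0):
        # f: non-negative grayscale image; the returned value is an affine
        # invariant of f for the chosen (alpha, beta)
        rng = np.random.default_rng(seed)
        h, w = f.shape
        p = f.ravel() / f.sum()
        idx = rng.choice(h * w, size=(3, n), p=p)          # x0, x1, x2 ~ f
        pts = np.stack([idx // w, idx % w]).astype(float)  # (2, 3, n): row, col
        y = alpha * pts[:, 1] + beta * pts[:, 2] + (1 - alpha - beta) * pts[:, 0]
        inside = (y[0] >= 0) & (y[0] <= h - 1) & (y[1] >= 0) & (y[1] <= w - 1)
        vals = np.zeros(n)
        yi = np.round(y[:, inside]).astype(int)
        vals[inside] = f[yi[0], yi[1]]
        return vals.mean()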
- Published
- 2005
- Full Text
- View/download PDF
245. Robust and accurate multi-view reconstruction by prioritized matching
- Author
-
Markus Ylimaki, Juho Kannala, Jukka Holappa, Janne Heikkilä, and Sami Brandt
246. Deep Automodulators
- Author
-
Ari Heljakka, Yuxin Hou, Juho Kannala, and Arno Solin
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
We introduce a new category of generative autoencoders called automodulators. These networks can faithfully reproduce individual real-world input images like regular autoencoders, but can also generate a fused sample from an arbitrary combination of several such images, allowing instantaneous 'style-mixing' and other new applications. An automodulator decouples the data flow of decoder operations from its statistical properties and uses the latent vector to modulate the former by the latter, with a principled approach for mutual disentanglement of decoder layers. Prior work has explored similar decoder architectures with GANs, but their focus has been on random sampling, whereas a corresponding autoencoder can also operate on real input images. For the first time, we show how to train such a general-purpose model with sharp outputs in high resolution, using novel training techniques, demonstrated on four image datasets. Besides style-mixing, we show state-of-the-art results in autoencoder comparison, and visual image quality nearly indistinguishable from state-of-the-art GANs. We expect the automodulator variants to become a useful building block for image applications and other data domains., Comment: To appear in Advances in Neural Information Processing Systems (NeurIPS 2020)
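The core modulation idea can be sketched with one decoder block: normalize away the feature statistics, then re-inject scale and shift predicted from the latent code. This is a PyTorch sketch in the AdaIN style; layer sizes and names are illustrative, not the released model:

    import torch
    import torch.nn as nn

    class ModulatedBlock(nn.Module):
        def __init__(self, channels, z_dim):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
            self.norm = nn.InstanceNorm2d(channels, affine=False)
            self.style = nn.Linear(z_dim, 2 * channels)

        def forward(self, x, z):
            # strip the statistics of the data flow...
            h = self.norm(torch.relu(self.conv(x)))
            # ...and modulate them from the latent code instead
            scale, shift = self.style(z).chunk(2, dim=1)
            return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

    # style mixing: feed z from image A to coarse blocks, z from image B to fine ones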
247. Conditional image sampling by deep automodulators
- Author
-
Ari Heljakka, Yuxin Hou, Juho Kannala, and Arno Solin
248. Iterative path reconstruction for large-scale inertial navigation on smartphones
- Author
-
Santiago Cortes Reina, Yuxin Hou, Juho Kannala, and Arno Solin
- Subjects
FOS: Computer and information sciences ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Modern smartphones have all the sensing capabilities required for accurate and robust navigation and tracking. In specific environments some data streams may be absent, less reliable, or flat out wrong. In particular, the GNSS signal can become flawed or silent inside buildings or in streets with tall buildings. In this application paper, we aim to advance the current state-of-the-art in motion estimation using inertial measurements in combination with partial GNSS data on standard smartphones. We show how iterative estimation methods help refine the positioning path estimates in retrospective use cases that can cover both fixed-interval and fixed-lag scenarios. We compare estimation results provided by global iterated Kalman filtering methods to those of a visual-inertial tracking scheme (Apple ARKit). The practical applicability is demonstrated on real-world use cases on empirical data acquired from both smartphones and tablet devices., Comment: To appear in Proceedings FUSION 2019
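A toy fixed-interval version of such retrospective estimation: a forward Kalman filter over sparse position fixes followed by a Rauch-Tung-Striebel backward pass. This uses a 1-D constant-velocity model with assumed noise levels; the paper's estimator, which fuses full inertial data, is far richer:

    import numpy as np

    def smooth_positions(zs, dt=1.0, q=0.5, r=5.0):
        # zs: position fix or None per time step (GNSS may be silent indoors)
        F = np.array([[1.0, dt], [0.0, 1.0]])
        Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
        H = np.array([[1.0, 0.0]])
        x, P = np.zeros(2), np.eye(2) * 100.0
        xs, Ps, xps, Pps = [], [], [], []
        for z in zs:                                   # forward Kalman filter
            xp, Pp = F @ x, F @ P @ F.T + Q
            xps.append(xp); Pps.append(Pp)
            if z is not None:
                S = H @ Pp @ H.T + r
                K = Pp @ H.T / S
                x = xp + (K * (z - H @ xp)).ravel()
                P = (np.eye(2) - K @ H) @ Pp
            else:
                x, P = xp, Pp                          # predict through outages
            xs.append(x); Ps.append(P)
        for k in range(len(zs) - 2, -1, -1):           # backward RTS smoother
            G = Ps[k] @ F.T @ np.linalg.inv(Pps[k + 1])
            xs[k] = xs[k] + G @ (xs[k + 1] - xps[k + 1])
            Ps[k] = Ps[k] + G @ (Ps[k + 1] - Pps[k + 1]) @ G.T
        return [s[0] for s in xs]                      # smoothed positions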