3,360 results
Search Results
2. Structure of Multiple Mirror System From Kaleidoscopic Projections of Single 3D Point.
- Author
Takahashi, Kosuke and Nobuhara, Shohei
- Subjects
GRAPHICAL projection, IMAGING systems, CAMERA calibration, PARAMETER estimation, PROBLEM solving
- Abstract
This paper proposes a novel algorithm of discovering the structure of a kaleidoscopic imaging system that consists of multiple planar mirrors and a camera. The kaleidoscopic imaging system can be recognized as the virtual multi-camera system and has strong advantages in that the virtual cameras are strictly synchronized and have the same intrinsic parameters. In this paper, we focus on the extrinsic calibration of the virtual multi-camera system. The problems to be solved in this paper are two-fold. The first problem is to identify to which mirror chamber each of the 2D projections of mirrored 3D points belongs. The second problem is to estimate all mirror parameters, i.e., normals, and distances of the mirrors. The key contribution of this paper is to propose novel algorithms for these problems using a single 3D point of unknown geometry by utilizing a kaleidoscopic projection constraint, which is an epipolar constraint on mirror reflections. We demonstrate the performance of the proposed algorithm of chamber assignment and estimation of mirror parameters with qualitative and quantitative evaluations using synthesized and real data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
3. Multiview Rectification of Folded Documents.
- Author
You, Shaodi, Matsushita, Yasuyuki, Sinha, Sudipta, Bou, Yusuke, and Ikeuchi, Katsushi
- Subjects
IMAGE recognition (Computer vision), THREE-dimensional imaging, TEXT recognition, SCANNING systems, DIGITAL image processing, SURFACE geometry
- Abstract
Digitally unwrapping images of paper sheets is crucial for accurate document scanning and text recognition. This paper presents a method for automatically rectifying curved or folded paper sheets from a few images captured from multiple viewpoints. Prior methods either need expensive 3D scanners or model deformable surfaces using over-simplified parametric representations. In contrast, our method uses regular images and is based on general developable surface models that can represent a wide variety of paper deformations. Our main contribution is a new robust rectification method based on ridge-aware 3D reconstruction of a paper sheet and unwrapping the reconstructed surface using properties of developable surfaces via $\ell_1$ conformal mapping. We present results on several examples including book pages, folded letters and shopping receipts. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
4. A Survey on Deep Learning Techniques for Stereo-Based Depth Estimation.
- Author
Laga, Hamid, Jospin, Laurent Valentin, Boussaid, Farid, and Bennamoun, Mohammed
- Subjects
DEEP learning, COMPUTER vision, MACHINE learning, AUGMENTED reality, LEARNING communities, AUTONOMOUS vehicles
- Abstract
Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed through matching hand-crafted features across multiple images. Despite the extensive amount of research, these traditional techniques still suffer in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by their growing success in solving various 2D and 3D vision problems, deep learning for stereo-based depth estimation has attracted a growing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this paper, we provide a comprehensive survey of this new and continuously growing field of research, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for deep learning-based stereo for depth estimation research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. State of the Journal Editorial.
- Author
Dickinson, Sven
- Abstract
Presents the state of the journal review for this issue of the publication. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
6. Pharmacological, Non-Pharmacological Policies and Mutation: An Artificial Intelligence Based Multi-Dimensional Policy Making Algorithm for Controlling the Casualties of the Pandemic Diseases.
- Author
Tutsoy, Onder
- Subjects
ARTIFICIAL intelligence, PANDEMICS, PARAMETRIC modeling, ALGORITHMS, VACCINATION policies, MULTIDIMENSIONAL databases
- Abstract
Fighting against the pandemic diseases with unique characters requires new sophisticated approaches like the artificial intelligence. This paper develops an artificial intelligence algorithm to produce multi-dimensional policies for controlling and minimizing the pandemic casualties under the limited pharmacological resources. In this respect, a comprehensive parametric model with a priority and age-specific vaccination policy and a variety of non-pharmacological policies are introduced. This parametric model is utilized for constructing an artificial intelligence algorithm by following the exact analogy of the model-based solution. Also, this parametric model is manipulated by the artificial intelligence algorithm to seek for the best multi-dimensional non-pharmacological policies that minimize the future pandemic casualties as desired. The role of the pharmacological and non-pharmacological policies on the uncertain future casualties are extensively addressed on the real data. It is shown that the developed artificial intelligence algorithm is able to produce efficient policies which satisfy the particular optimization targets such as focusing on minimization of the death casualties more than the infected casualties or considering the curfews on the people age over 65 rather than the other non-pharmacological policies. The paper finally analyses a variety of the mutant virus cases and the corresponding non-pharmacological policies aiming to reduce the morbidity and mortality rates. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Introduction to the Special Section of CVPR 2017.
- Author
Liu, Yanxi, Rehg, James M., Taylor, Camillo J., and Wu, Ying
- Subjects
PATTERN recognition systems, COMPUTER vision
- Abstract
The papers in this special section were presented at the Computer Vision and Pattern Recognition conference. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. Detailed Avatar Recovery From Single Image.
- Author
Zhu, Hao, Zuo, Xinxin, Yang, Haotian, Wang, Sen, Cao, Xun, and Yang, Ruigang
- Subjects
ARTIFICIAL neural networks, AVATARS (Virtual reality), HUMAN body
- Abstract
This paper presents a novel framework to recover detailed avatar from a single image. It is a challenging task due to factors such as variations in human shapes, body poses, texture, and viewpoints. Prior methods typically attempt to recover the human body shape using a parametric-based template that lacks the surface details. As such resulting body shape appears to be without clothing. In this paper, we propose a novel learning-based framework that combines the robustness of the parametric model with the flexibility of free-form 3D deformation. We use the deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation (HMD) framework, utilizing the constraints from body joints, silhouettes, and per-pixel shading information. Our method can restore detailed human body shapes with complete textures beyond skinned models. Experiments demonstrate that our method has outperformed previous state-of-the-art approaches, achieving better accuracy in terms of both 2D IoU number and 3D metric distance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
9. Recent Advances in Large Margin Learning.
- Author
Guo, Yiwen and Zhang, Changshui
- Subjects
ARTIFICIAL neural networks, MACHINE learning, SUPPORT vector machines
- Abstract
This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs) that are probably the most prominent machine learning models for large-scale data in the community over the past decade. We generalize the formulation of classification margins from classical research to latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and introduce recent efforts in enlarging the margins for DNNs comprehensively. Since the viewpoint of different methods is discrepant, we categorize them into groups for ease of comparison and discussion in the paper. Hopefully, our discussions and overview inspire new research work in the community that aim to improve the performance of DNNs, and we also point to directions where the large margin principle can be verified to provide theoretical evidence why certain regularizations for DNNs function well in practice. We managed to shorten the paper such that the crucial spirit of large margin learning and related methods are better emphasized. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
10. Deep Learning-Based Multi-Focus Image Fusion: A Survey and a Comparative Study.
- Subjects
IMAGE fusion, IMAGE processing, DEEP learning, GENERATIVE adversarial networks
- Abstract
Multi-focus image fusion (MFIF) is an important area in image processing. Since 2017, deep learning has been introduced to the field of MFIF and various methods have been proposed. However, there is a lack of survey papers that discuss deep learning-based MFIF methods in detail. In this study, we fill this gap by giving a detailed survey on deep learning-based MFIF algorithms, including methods, datasets and evaluation metrics. To the best of our knowledge, this is the first survey paper that focuses on deep learning-based approaches in the field of MFIF. Besides, extensive experiments have been conducted to compare the performance of deep learning-based MFIF algorithms with conventional MFIF approaches. By analyzing qualitative and quantitative results, we give some observations on the current status of MFIF and discuss some future prospects of this field. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
11. Stopping Criterion Design for Recursive Bayesian Classification: Analysis and Decision Geometry.
- Author
Kocanaogullari, Aziz, Akcakaya, Murat, and Erdogmus, Deniz
- Subjects
BAYESIAN analysis, DECISION making, COMPUTER interfaces, GEOMETRY, THRESHOLDING algorithms, TRACKING radar
- Abstract
Systems that are based on recursive Bayesian updates for classification limit the cost of evidence collection through certain stopping/termination criteria and accordingly enforce decision making. Conventionally, two termination criteria based on pre-defined thresholds over (i) the maximum of the state posterior distribution; and (ii) the state posterior uncertainty are commonly used. In this paper, we propose a geometric interpretation over the state posterior progression and accordingly we provide a point-by-point analysis over the disadvantages of using such conventional termination criteria. For example, through the proposed geometric interpretation we show that confidence thresholds defined over maximum of the state posteriors suffer from stiffness that results in unnecessary evidence collection whereas uncertainty based thresholding methods are fragile to number of categories and terminate prematurely if some state candidates are already discovered to be unfavorable. Moreover, both types of termination methods neglect the evolution of posterior updates. We then propose a new stopping/termination criterion with a geometrical insight to overcome the limitations of these conventional methods and provide a comparison in terms of decision accuracy and speed. We validate our claims using simulations and using real experimental data obtained through a brain computer interfaced typing system. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Imbalance Problems in Object Detection: A Review.
- Author
Oksuz, Kemal, Cam, Baris Can, Kalkan, Sinan, and Akbas, Emre
- Subjects
OBJECT recognition (Computer vision), FEATURE extraction, DEEP learning, TAXONOMY
- Abstract
In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce a problem-based taxonomy. Following this taxonomy, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which catalogs papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: https://github.com/kemaloksuz/ObjectDetectionImbalance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
13. LAEO-Net++: Revisiting People Looking at Each Other in Videos.
- Author
Marin-Jimenez, Manuel J., Kalogeiton, Vicky, Medina-Suarez, Pablo, and Zisserman, Andrew
- Subjects
VIDEOS, SOCIAL interaction, SOCIAL networks, BASE pairs, TASK analysis, MAGNETIC recording heads
- Abstract
Capturing the ‘mutual gaze’ of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this purpose, we propose LAEO-Net++, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net++ takes spatio-temporal tracks as input and reasons about the whole track. It consists of three branches, one for each character's tracked head and one for their relative position. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. A thorough experimental evaluation demonstrates the ability of LAEO-Net++ to successfully determine if two people are LAEO and the temporal window where it happens. Our model achieves state-of-the-art results on the existing TVHID-LAEO video dataset, significantly outperforming previous approaches. Finally, we apply LAEO-Net++ to a social network, where we automatically infer the social relationship between pairs of people based on the frequency and duration that they LAEO, and show that LAEO can be a useful tool for guided search of human interactions in videos. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
14. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors.
- Subjects
ARTIFICIAL intelligence, PERIODICAL publishing
- Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
15. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors.
- Subjects
ARTIFICIAL intelligence, PERIODICAL publishing
- Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
16. Geometry-Guided Street-View Panorama Synthesis From Satellite Imagery.
- Author
Shi, Yujiao, Campbell, Dylan, Yu, Xin, and Li, Hongdong
- Subjects
REMOTE-sensing images, PIXELS, GENERATIVE adversarial networks, PANORAMAS, LANDSAT satellites, GEOSTATIONARY satellites
- Abstract
This paper presents a new approach for synthesizing a novel street-view panorama given a satellite image, as if captured from the geographical location at the center of the satellite image. Existing works approach this as an image generation problem, adopting generative adversarial networks to implicitly learn the cross-view transformations, but ignore the geometric constraints. In this paper, we make the geometric correspondences between the satellite and street-view images explicit so as to facilitate the transfer of information between domains. Specifically, we observe that when a 3D point is visible in both views, and the height of the point relative to the camera is known, there is a deterministic mapping between the projected points in the images. Motivated by this, we develop a novel satellite to street-view projection (S2SP) module which learns the height map and projects the satellite image to the ground-level viewpoint, explicitly connecting corresponding pixels. With these projected satellite images as input, we next employ a generator to synthesize realistic street-view panoramas that are geometrically consistent with the satellite images. Our S2SP module is differentiable and the whole framework is trained in an end-to-end manner. Extensive experimental results on two cross-view benchmark datasets demonstrate that our method generates more accurate and consistent images than existing approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
17. Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond.
- Author
Liu, Risheng, Gao, Jiaxin, Zhang, Jin, Meng, Deyu, and Lin, Zhouchen
- Subjects
BILEVEL programming, COMPUTER vision, REINFORCEMENT learning, AUTOMATIC differentiation, VISUAL fields, DEEP learning, MACHINE learning
- Abstract
Bi-Level Optimization (BLO) is originated from the area of economic game theory and then introduced into the optimization community. BLO is able to handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In machine learning and computer vision fields, despite the different motivations and mechanisms, a lot of complex problems, such as hyper-parameter optimization, multi-task and meta learning, neural architecture search, adversarial learning and deep reinforcement learning, actually all contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions and their convergence and complexity properties. Last but not least, we discuss the potentials of our unified BLO framework for designing new algorithms and point out some promising directions for future research. A list of important papers discussed in this survey, corresponding codes, and additional resources on BLOs are publicly available at: https://github.com/vis-opt-group/BLO. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
18. Reversible Data Hiding By Using CNN Prediction and Adaptive Embedding.
- Author
Hu, Runwen and Xiang, Shijun
- Subjects
REVERSIBLE data hiding (Computer science), GLOBAL optimization, FORECASTING
- Abstract
In the field of reversible data hiding (RDH), how to predict an image and embed a message into the image with smaller distortion are two important aspects. In this paper, we propose a novel and efficient RDH method by innovating an intelligent predictor and an adaptive embedding way. In the prediction stage, we first constructed a convolutional neural network (CNN) based predictor by reasonably dividing an image into four parts. In such a way, each part can be predicted by using the other three parts as the context for the improvement of the prediction performance. Compared with existing predictors, the proposed CNN predictor can use more neighboring pixels for the prediction by exploiting its multi-receptive fields and global optimization capacities. In the embedding stage, we also developed a prediction-error-ordering (PEO) based adaptive embedding strategy, which can better adapt image content and thus efficiently reduce the embedding distortion by elaborately and luminously applying background complexity to select and pair those smaller prediction errors for data hiding. With the proposed CNN prediction and embedding ways, the RDH method presented in this paper provides satisfactory results in improving the visual quality of data hidden images, e.g., the average PSNR value for the Kodak benchmark dataset can reach as high as 63.59 dB with an embedding capacity of 10,000 bits. Extensive experimental results have shown that the RDH method proposed in this paper is superior to those existing state-of-the-art works. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. Adaptive Graph Auto-Encoder for General Data Clustering.
- Author
Li, Xuelong, Zhang, Hongyuan, and Zhang, Rui
- Subjects
WEIGHTED graphs, TASK analysis, FUZZY clustering technique, TANNER graphs
- Abstract
Graph-based clustering plays an important role in the clustering area. Recent studies about graph neural networks (GNN) have achieved impressive success on graph-type data. However, in general clustering tasks, the graph structure of data does not exist such that GNN can not be applied to clustering directly and the strategy to construct a graph is crucial for performance. Therefore, how to extend GNN into general clustering tasks is an attractive problem. In this paper, we propose a graph auto-encoder for general data clustering, AdaGAE, which constructs the graph adaptively according to the generative perspective of graphs. The adaptive process is designed to induce the model to exploit the high-level information behind data and utilize the non-euclidean structure sufficiently. Importantly, we find that the simple update of the graph will result in severe degeneration, which can be concluded as better reconstruction means worse update. We provide rigorous analysis theoretically and empirically. Then we further design a novel mechanism to avoid the collapse. Via extending the generative graph models to general type data, a graph auto-encoder with a novel decoder is devised and the weighted graphs can be also applied to GNN. AdaGAE performs well and stably in different scale and type datasets. Besides, it is insensitive to the initialization of parameters and requires no pretraining. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
20. Privacy Preserving Defense For Black Box Classifiers Against On-Line Adversarial Attacks.
- Author
Theagarajan, Rajkumar and Bhanu, Bir
- Subjects
DEEP learning, PRIVACY, IMAGE recognition (Computer vision)
- Abstract
Deep learning models have been shown to be vulnerable to adversarial attacks. Adversarial attacks are imperceptible perturbations added to an image such that the deep learning model misclassifies the image with a high confidence. Existing adversarial defenses validate their performance using only the classification accuracy. However, classification accuracy by itself is not a reliable metric to determine if the resulting image is “adversarial-free”. This is a foundational problem for online image recognition applications where the ground-truth of the incoming image is not known and hence we cannot compute the accuracy of the classifier or validate if the image is “adversarial-free” or not. This paper proposes a novel privacy preserving framework for defending Black box classifiers from adversarial attacks using an ensemble of iterative adversarial image purifiers whose performance is continuously validated in a loop using Bayesian uncertainties. The proposed approach can convert a single-step black box adversarial defense into an iterative defense and proposes three novel privacy preserving Knowledge Distillation (KD) approaches that use prior meta-information from various datasets to mimic the performance of the Black box classifier. Additionally, this paper proves the existence of an optimal distribution for the purified images that can reach a theoretical lower bound, beyond which the image can no longer be purified. Experimental results on six public benchmark datasets namely: 1) Fashion-MNIST, 2) CIFAR-10, 3) GTSRB, 4) MIO-TCD, 5) Tiny-ImageNet, and 6) MS-Celeb show that the proposed approach can consistently detect adversarial examples and purify or reject them against a variety of adversarial attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. Editorial.
- Abstract
Presents the introductory editorial for this issue of the publication. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
22. Intrinsic Grassmann Averages for Online Linear, Robust and Nonlinear Subspace Learning.
- Author
Chakraborty, Rudrasis, Yang, Liu, Hauberg, Soren, and Vemuri, Baba C.
- Subjects
PRINCIPAL components analysis, GRASSMANN manifolds, MACHINE learning
- Abstract
Principal component analysis (PCA) and Kernel principal component analysis (KPCA) are fundamental methods in machine learning for dimensionality reduction. The former is a technique for finding this approximation in finite dimensions and the latter is often in an infinite dimensional reproducing Kernel Hilbert-space (RKHS). In this paper, we present a geometric framework for computing the principal linear subspaces in both (finite and infinite) situations as well as for the robust PCA case, that amounts to computing the intrinsic average on the space of all subspaces: the Grassmann manifold. Points on this manifold are defined as the subspaces spanned by $K$-tuples of observations. The intrinsic Grassmann average of these subspaces is shown to coincide with the principal components of the observations when they are drawn from a Gaussian distribution. We show similar results in the RKHS case and provide an efficient algorithm for computing the projection onto this average subspace. The result is a method akin to KPCA which is substantially faster. Further, we present a novel online version of the KPCA using our geometric framework. Competitive performance of all our algorithms is demonstrated on a variety of real and synthetic data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
23. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors.
- Subjects
ARTIFICIAL intelligence, PERIODICAL publishing
- Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
24. Cover 3.
- Subjects
PERIODICAL publishing
- Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
25. VideoDG: Generalizing Temporal Relations in Videos to Novel Domains.
- Author
Yao, Zhiyu, Wang, Yunbo, Wang, Jianmin, Yu, Philip S., and Long, Mingsheng
- Subjects
VIDEOS, DATA augmentation, GENERALIZATION
- Abstract
This paper introduces video domain generalization where most video classification networks degenerate due to the lack of exposure to the target domains of divergent distributions. We observe that the global temporal features are less generalizable, due to the temporal domain shift that videos from other unseen domains may have an unexpected absence or misalignment of the temporal relations. This finding has motivated us to solve video domain generalization by effectively learning the local-relation features of different timescales that are more generalizable, and exploiting them along with the global-relation features to maintain the discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture named the Adversarial Pyramid Network, which improves the generalizability of video features by capturing the local-relation, global-relation, and cross-relation features progressively. On the basis of pyramid features, the second contribution is a new and robust approach of adversarial data augmentation that can bridge different video domains by improving the diversity and quality of augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms the combinations of previous video classification models and existing domain generalization methods on all benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. PSGAN++: Robust Detail-Preserving Makeup Transfer and Removal.
- Author
Liu, Si, Jiang, Wentao, Gao, Chen, He, Ran, Feng, Jiashi, Li, Bo, and Yan, Shuicheng
- Subjects
GENERATIVE adversarial networks, REFERENCE sources
- Abstract
In this paper, we address the makeup transfer and removal tasks simultaneously, which aim to transfer the makeup from a reference image to a source image and remove the makeup from the with-makeup image respectively. Existing methods have achieved much advancement in constrained scenarios, but it is still very challenging for them to transfer makeup between images with large pose and expression differences, or handle makeup details like blush on cheeks or highlight on the nose. In addition, they are hardly able to control the degree of makeup during transferring or to transfer a specified part in the input face. These defects limit the application of previous makeup transfer methods to real-world scenarios. In this work, we propose a Pose and expression robust Spatial-aware GAN (abbreviated as PSGAN++). PSGAN++ is capable of performing both detail-preserving makeup transfer and effective makeup removal. For makeup transfer, PSGAN++ uses a Makeup Distill Network (MDNet) to extract makeup information, which is embedded into spatial-aware makeup matrices. We also devise an Attentive Makeup Morphing (AMM) module that specifies how the makeup in the source image is morphed from the reference image, and a makeup detail loss to supervise the model within the selected makeup detail area. On the other hand, for makeup removal, PSGAN++ applies an Identity Distill Network (IDNet) to embed the identity information from with-makeup images into identity matrices. Finally, the obtained makeup/identity matrices are fed to a Style Transfer Network (STNet) that is able to edit the feature maps to achieve makeup transfer or removal. To evaluate the effectiveness of our PSGAN++, we collect a Makeup Transfer In the Wild (MT-Wild) dataset that contains images with diverse poses and expressions and a Makeup Transfer High-Resolution (MT-HR) dataset that contains high-resolution images. 
Experiments demonstrate that PSGAN++ not only achieves state-of-the-art results with fine makeup details even in cases of large pose/expression differences but also can perform partial or degree-controllable makeup transfer. Both the code and the newly collected datasets will be released at https://github.com/wtjiang98/PSGAN. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
27. Low-Rank Riemannian Optimization for Graph-Based Clustering Applications.
- Author
-
Douik, Ahmed and Hassibi, Babak
- Subjects
RIEMANNIAN manifolds ,RIEMANNIAN geometry ,STATISTICS ,STOCHASTIC matrices ,MACHINE learning ,PROBLEM solving - Abstract
With the abundance of data, machine learning applications have attracted increased attention in the last decade. An attractive approach to robustify the statistical analysis is to preprocess the data through clustering. This paper develops a low-complexity Riemannian optimization framework for solving optimization problems on the set of positive semidefinite stochastic matrices. The low-complexity feature of the proposed algorithms stems from factorizing the optimization variable as $\mathbf{X}=\mathbf{Y}\mathbf{Y}^{\mathrm{T}}$ and deriving conditions on the number of columns of $\mathbf{Y}$ under which the factorization yields a satisfactory solution. The paper further investigates the embedded and quotient geometries of the resulting Riemannian manifolds. In particular, the paper explicitly derives the tangent space, Riemannian gradients and Hessians, and a retraction operator allowing the design of efficient first and second-order optimization methods for the graph-based clustering applications of interest. The numerical results reveal that the resulting algorithms present a clear complexity advantage as compared with state-of-the-art Euclidean and Riemannian approaches for graph clustering applications. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. State of the Journal.
- Author
-
Dickinson, Sven
- Subjects
SCHOLARLY publishing ,PERIODICAL publishing ,PERIODICAL articles - Abstract
An introduction is presented in which the editor discusses the status of the journal, including the challenge of reducing the average time from paper submission to paper acceptance.
- Published
- 2018
- Full Text
- View/download PDF
29. Cover 3.
- Subjects
PERIODICAL publishing - Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. Instance-Dependent Positive and Unlabeled Learning With Labeling Bias Estimation.
- Author
-
Gong, Chen, Wang, Qizhou, Liu, Tongliang, Han, Bo, You, Jane, Yang, Jian, and Tao, Dacheng
- Subjects
ESTIMATION bias ,MAXIMUM likelihood statistics ,MATHEMATICAL optimization ,RANDOM variables ,PRODUCTION scheduling - Abstract
This paper studies instance-dependent Positive and Unlabeled (PU) classification, where whether a positive example will be labeled (indicated by $s$) is not only related to the class label $y$, but also depends on the observation $\mathbf{x}$. Therefore, the labeling probability on positive examples is not uniform as previous works assumed, but is biased to some simple or critical data points. To depict the above dependency relationship, a graphical model is built in this paper which further leads to a maximization problem on the induced likelihood function regarding $P(s,y|\mathbf{x})$. By utilizing the well-known EM and Adam optimization techniques, the labeling probability of any positive example $P(s=1|y=1,\mathbf{x})$ as well as the classifier induced by $P(y|\mathbf{x})$ can be acquired. Theoretically, we prove that the critical solution always exists, and is locally unique for linear model if some sufficient conditions are met. Moreover, we upper bound the generalization error for both linear logistic and non-linear network instantiations of our algorithm, with the convergence rate of expected risk to empirical risk as $\mathcal{O}(1/\sqrt{k}+1/\sqrt{n-k}+1/\sqrt{n})$ ($k$ and $n$ are the sizes of positive set and the entire training set, respectively). Empirically, we compare our method with state-of-the-art instance-independent and instance-dependent PU algorithms on a wide range of synthetic, benchmark and real-world datasets, and the experimental results firmly demonstrate the advantage of the proposed method over the existing PU approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Heterogeneous Hypergraph Variational Autoencoder for Link Prediction.
- Author
-
Fan, Haoyi, Zhang, Fengbin, Wei, Yuxuan, Li, Zuoyong, Zou, Changqing, Gao, Yue, and Dai, Qionghai
- Subjects
PROBABILISTIC generative models ,RECOMMENDER systems ,INFORMATION networks ,MULTILEVEL models ,FORECASTING ,DEEP learning - Abstract
Link prediction aims at inferring missing links or predicting future ones based on the currently observed network. This topic is important for many applications such as social media, bioinformatics and recommendation systems. Most existing methods focus on homogeneous settings and consider only low-order pairwise relations while ignoring either the heterogeneity or high-order complex relations among different types of nodes, which tends to lead to a sub-optimal embedding result. This paper presents a method named Heterogeneous Hypergraph Variational Autoencoder (HeteHG-VAE) for link prediction in heterogeneous information networks (HINs). It first maps a conventional HIN to a heterogeneous hypergraph with a certain kind of semantics to capture both the high-order semantics and complex relations among nodes, while preserving the low-order pairwise topology information of the original HIN. Then, deep latent representations of nodes and hyperedges are learned by a Bayesian deep generative framework from the heterogeneous hypergraph in an unsupervised manner. Moreover, a hyperedge attention module is designed to learn the importance of different types of nodes in each hyperedge. The major merit of HeteHG-VAE lies in its ability of modeling multi-level relations in heterogeneous settings. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Differentiated Explanation of Deep Neural Networks With Skewed Distributions.
- Author
-
Fu, Weijie, Wang, Meng, Du, Mengnan, Liu, Ninghao, Hao, Shijie, and Hu, Xia
- Subjects
SKEWNESS (Probability theory) ,EXPLANATION - Abstract
Over the last decade, deep neural networks (DNNs) have been regarded as black-box methods, and their decisions have been criticized for their lack of explainability. Existing attempts based on local explanations offer each input a visual saliency map, where the supporting features that contribute to the decision are emphasized with high relevance scores. In this paper, we improve the saliency map based on differentiated explanations, in which the saliency map not only distinguishes the supporting features from backgrounds but also shows the different degrees of importance of the various parts within the supporting features. To do this, we propose to learn a differentiated relevance estimator called DRE, where a carefully-designed distribution controller is introduced to guide the relevance scores towards right-skewed distributions. DRE can be directly optimized under pure classification losses, enabling higher faithfulness of explanations and avoiding non-trivial hyper-parameter tuning. The experimental results on three real-world datasets demonstrate that our differentiated explanations significantly improve the faithfulness with high explainability. Our code and trained models are available at https://github.com/fuweijie/DRE. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Not All Samples are Trustworthy: Towards Deep Robust SVP Prediction.
- Author
-
Xu, Qianqian, Yang, Zhiyong, Jiang, Yangbangyan, Cao, Xiaochun, Yao, Yuan, and Huang, Qingming
- Subjects
QUALITY control ,FORECASTING ,NOISE measurement ,COMPUTER vision ,OUTLIER detection ,CROWDSOURCING - Abstract
In this paper, we study the problem of estimating subjective visual properties (SVP) for images, which is an emerging task in Computer Vision. Generally speaking, collecting SVP datasets involves a crowdsourcing process where annotations are obtained from a wide range of online users. Since the process is done without quality control, SVP datasets are known to suffer from noise. This leads to the issue that not all samples are trustworthy. Facing this problem, we need to develop robust models for learning SVP from noisy crowdsourced annotations. In this paper, we construct two general robust learning frameworks for this application. Specifically, in the first framework, we propose a probabilistic framework to explicitly model the sparse unreliable patterns that exist in the dataset. It is noteworthy that we then provide an alternative framework that could reformulate the sparse unreliable patterns as a “contraction” operation over the original loss function. The latter framework leverages not only efficient end-to-end training but also rigorous theoretical analyses. To apply these frameworks, we further provide two models as implementations of the frameworks, where the sparse noise parameters could be interpreted with the HodgeRank theory. Finally, extensive theoretical and empirical studies show the effectiveness of our proposed framework. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. Geometry-Aware Generation of Adversarial Point Clouds.
- Author
-
Wen, Yuxin, Lin, Jiehong, Chen, Ke, Chen, C. L. Philip, and Jia, Kui
- Subjects
POINT cloud ,SURFACE roughness ,ORTHOGONAL matching pursuit ,SURFACE properties ,THREE-dimensional imaging ,SOURCE code - Abstract
Machine learning models have been shown to be vulnerable to adversarial examples. While most of the existing methods for adversarial attack and defense work on the 2D image domain, a few recent attempts have been made to extend them to 3D point cloud data. However, adversarial results obtained by these methods typically contain point outliers, which are both noticeable and easy to defend against using the simple techniques of outlier removal. Motivated by the different mechanisms by which humans perceive 2D images and 3D shapes, in this paper we propose the new design of geometry-aware objectives, whose solutions favor (the discrete versions of) the desired surface properties of smoothness and fairness. To generate adversarial point clouds, we use a targeted attack misclassification loss that supports continuous pursuit of increasingly malicious signals. Regularizing the targeted attack loss with our proposed geometry-aware objectives results in our proposed method, Geometry-Aware Adversarial Attack ($GeoA^3$). The results of $GeoA^3$ tend to be more harmful, arguably harder to defend against, and of the key adversarial characterization of being imperceptible to humans. While the main focus of this paper is to learn to generate adversarial point clouds, we also present a simple but effective algorithm termed $Geo_{+}A^3$-IterNormPro, with Iterative Normal Projection (IterNormPro) that solves a new objective function $Geo_{+}A^3$, towards surface-level adversarial attacks via generation of adversarial point clouds. We quantitatively evaluate our methods on both synthetic and physical objects in terms of attack success rate and geometric regularity. For a qualitative evaluation, we conduct subjective studies by collecting human preferences from Amazon Mechanical Turk. Comparative results in comprehensive experiments confirm the advantages of our proposed methods. 
Our source codes are publicly available at https://github.com/Yuxin-Wen/GeoA3. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
35. Guest Editorial: Special Section on CVPR 2014.
- Author
-
Basri, Ronen, Fermuller, Cornelia, Martinez, Aleix M., and Vidal, Rene
- Subjects
COMPUTER vision ,NEWSPAPER sections, columns, etc. ,PERIODICAL articles ,LIGHT sources ,ALGORITHMS - Abstract
The papers in this special section were presented at the IEEE Computer Vision and Pattern Recognition (CVPR), June, 2014, jointly sponsored by the IEEE and the Computer Vision Foundation. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
36. Affine Invariants of Vector Fields.
- Author
-
Kostkova, Jitka, Suk, Tomas, and Flusser, Jan
- Subjects
DIGITAL images ,FLUID mechanics ,PATTERN matching - Abstract
Vector fields are a special kind of multidimensional data, which are in a certain sense similar to digital color images, but are distinct from them in several aspects. In each pixel, the field is assigned to a vector that shows the direction and the magnitude of the quantity, which has been measured. To detect the patterns of interest in the field, special matching methods must be developed. In this paper, we propose a method for the description and matching of vector field patterns under an unknown affine transformation of the field. Unlike digital images, transformations of vector fields act not only on the spatial coordinates but also on the field values, which makes the detection different from the image case. To measure the similarity between the template and the field patch, we propose original invariants with respect to total affine transformation. They are designed from the vector field moments. It is demonstrated by experiments on real data from fluid mechanics that they perform significantly better than potential competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Guest Editors’ Introduction to the Special Section on Compact and Efficient Feature Representation and Learning in Computer Vision.
- Author
-
Liu, Li, Pietikainen, Matti, Chen, Jie, Zhao, Guoying, Wang, Xiaogang, and Chellappa, Rama
- Subjects
COMPUTER vision ,IMAGE representation ,INFORMATION science ,BIOMETRIC identification ,ARTIFICIAL intelligence ,COMPUTERS - Abstract
The article offers information related to compact and efficient feature representation and learning in computer vision. It mentions the problems regarding representation and learning of computer vision such as image classification, object recognition, action recognition, and object tracking and also mentions its goal to publish high quality papers that bring a clear picture of the state of the art along this direction.
- Published
- 2019
- Full Text
- View/download PDF
38. The Perils and Pitfalls of Block Design for EEG Classification Experiments.
- Author
-
Li, Ren, Johansen, Jared S., Ahmed, Hamad, Ilyevsky, Thomas V., Wilbur, Ronnie B., Bharadwaj, Hari M., and Siskind, Jeffrey Mark
- Abstract
A recent paper claims to classify brain processing evoked in subjects watching ImageNet stimuli as measured with EEG and to employ a representation derived from this processing to construct a novel object classifier. That paper, together with a series of subsequent papers, claims to achieve successful results on a wide variety of computer-vision tasks, including object classification, transfer learning, and generation of images depicting human perception and thought using brain-derived representations measured through EEG. Our novel experiments and analyses demonstrate that their results crucially depend on the block design that they employ, where all stimuli of a given class are presented together, and fail with a rapid-event design, where stimuli of different classes are randomly intermixed. The block design leads to classification of arbitrary brain states based on block-level temporal correlations that are known to exist in all EEG data, rather than stimulus-related activity. Because every trial in their test sets comes from the same block as many trials in the corresponding training sets, their block design thus leads to classifying arbitrary temporal artifacts of the data instead of stimulus-related activity. This invalidates all subsequent analyses performed on this data in multiple published papers and calls into question all of the reported results. We further show that a novel object classifier constructed with a random codebook performs as well as or better than a novel object classifier constructed with the representation extracted from EEG data, suggesting that the performance of their classifier constructed with a representation extracted from EEG data does not benefit from the brain-derived representation. Together, our results illustrate the far-reaching implications of the temporal autocorrelations that exist in all neuroimaging data for classification experiments. 
Further, our results calibrate the underlying difficulty of the tasks involved and caution against overly optimistic, but incorrect, claims to the contrary. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
39. Cover 3.
- Subjects
PERIODICAL publishing - Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
40. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos.
- Author
-
Yuan, Yitian, Ma, Lin, Wang, Jingwen, Liu, Wei, and Zhu, Wenwu
- Subjects
TASK analysis - Abstract
Temporal sentence grounding in videos aims to localize one target video segment, which semantically corresponds to a given sentence. Unlike previous methods mainly focusing on matching semantics between the sentence and different video segments, in this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which leverages the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence-relevant video contents over time. The proposed SCDM also performs dynamically with respect to the diverse video contents so as to establish a precise semantic alignment between sentence and video. By coupling the proposed SCDM with a hierarchical temporal convolutional architecture, video segments with various temporal scales are composed and localized. Besides, more fine-grained clip-level actionness scores are also predicted with the SCDM-coupled temporal convolution on the bottom layer of the overall architecture, which are further used to adjust the temporal boundaries of the localized segments and thereby lead to more accurate grounding results. Experimental results on benchmark datasets demonstrate that the proposed model can improve the temporal grounding accuracy consistently, and further investigation experiments also illustrate the advantages of SCDM on stabilizing the model training and associating relevant video contents for temporal sentence grounding. Our code for this paper is available at https://github.com/yytzsy/SCDM-TPAMI. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
41. Retrieving Similar Styles to Parse Clothing.
- Author
-
Yamaguchi, Kota, Kiapour, M. Hadi, Ortiz, Luis E., and Berg, Tamara L.
- Subjects
CLOTHING & dress ,DATABASES ,FASHION ,POSE estimation (Computer vision) ,T-shirts ,SWEATSHIRTS ,SNEAKERS - Abstract
Clothing recognition is a societally and commercially important yet extremely challenging problem due to large variations in clothing appearance, layering, style, and body shape and pose. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to recognize clothing items in the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse-masks (Paper Doll item transfer) from retrieved examples. We evaluate our approach extensively and show significant improvements over previous state-of-the-art for both localization (clothing parsing given weak supervision in the form of tags) and detection (general clothing parsing). Our experimental results also indicate that the general pose estimation problem can benefit from clothing parsing. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
42. Cover 3.
- Subjects
PERIODICAL publishing - Abstract
These instructions give guidelines for preparing papers for this publication. Presents information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
43. Learning Visual Instance Retrieval from Failure: Efficient Online Local Metric Adaptation from Negative Samples.
- Author
-
Zhou, Jiahuan and Wu, Ying
- Subjects
VISUAL learning ,FEATURE extraction ,IMAGE retrieval ,GLOBAL method of teaching - Abstract
Existing visual instance retrieval (VIR) approaches attempt to learn a faithful global matching metric or discriminative feature embedding offline to cover enormous visual appearance variations, so as to directly use it online on various unseen probes for retrieval. However, their requirement for a huge set of positive training pairs is very demanding in practice and the performance is largely constrained for the unseen testing samples due to the severe data shifting issue. In contrast, this paper advocates a different paradigm: part of the learning can be performed online but with nominal costs, so as to achieve online metric adaptation for different query probes. By exploiting easily-available negative samples, we propose a novel solution to achieve the optimal local metric adaptation effectively and efficiently. The insight of our method is that the local hard negative samples can actually provide tight constraints to fine-tune the metric locally. Our local metric adaptation method is generally applicable on top of any offline-learned baselines. In addition, this paper gives in-depth theoretical analyses of the proposed method to guarantee the reduction of the classification error both asymptotically and practically. Extensive experiments on various VIR tasks have confirmed the effectiveness and superiority of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
44. Guest Editors' Introduction to the Special Section on Computational Photography.
- Author
-
Chakrabarti, Ayan, Sunkavalli, Kalyan, and Forsyth, David A.
- Subjects
COMPUTATIONAL photography ,OPTICAL flow ,ARTIFICIAL intelligence ,PIXELS ,COMPUTER graphics - Abstract
An introduction is presented in which the editor discusses articles in the issue on topics including novel computational methods that exploit non-traditional sensors, novel sensors and acquisition systems, and the use of visual measurements made by unconventional sensors.
- Published
- 2020
- Full Text
- View/download PDF
45. Guest Editorial: Non-Euclidean Machine Learning.
- Author
-
Zafeiriou, Stefanos, Bronstein, Michael, Cohen, Taco, Vinyals, Oriol, Song, Le, Leskovec, Jure, Lio, Pietro, Bruna, Joan, and Gori, Marco
- Subjects
MACHINE learning ,DEEP learning ,ARTIFICIAL intelligence ,COMPUTER vision ,COMPUTATIONAL intelligence ,COMPUTER engineering - Abstract
An editorial is presented on addressing the need for bringing together leading efforts in non-Euclidean deep learning across all communities. Topics include Convolutional Neural Networks (CNNs) relying on classical signal processing models limiting the applicability to data with underlying Euclidean grid-like structure; and multiple levels of abstraction being introduced similarly to multi-layer networks.
- Published
- 2022
- Full Text
- View/download PDF
46. Learnable Pooling in Graph Convolutional Networks for Brain Surface Analysis.
- Author
-
Gopinath, Karthik, Desrosiers, Christian, and Lombaert, Herve
- Subjects
SURFACE analysis ,NON-Euclidean geometry ,ALZHEIMER'S disease ,THREE-dimensional imaging ,MACHINE learning - Abstract
Brain surface analysis is essential to neuroscience; however, the complex geometry of the brain cortex hinders computational methods for this task. The difficulty arises from a discrepancy between 3D imaging data, which is represented in Euclidean space, and the non-Euclidean geometry of the highly-convoluted brain surface. Recent advances in machine learning have enabled the use of neural networks for non-Euclidean spaces. These facilitate the learning of surface data, yet pooling strategies often remain constrained to a single fixed graph. This paper proposes a new learnable graph pooling method for processing multiple surface-valued data to output subject-based information. The proposed method innovates by learning an intrinsic aggregation of graph nodes based on graph spectral embedding. We illustrate the advantages of our approach with in-depth experiments on two large-scale benchmark datasets. The ablation study in the paper illustrates the impact of various factors affecting our learnable pooling method. The flexibility of the pooling strategy is evaluated on four different prediction tasks, namely, subject-sex classification, regression of cortical region sizes, classification of Alzheimer’s disease stages, and brain age regression. Our experiments demonstrate the superiority of our learnable pooling approach compared to other pooling techniques for graph convolutional networks, with results improving the state-of-the-art in brain surface analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
47. MannequinChallenge: Learning the Depths of Moving People by Watching Frozen People.
- Author
-
Li, Zhengqi, Dekel, Tali, Cole, Forrester, Tucker, Richard, Snavely, Noah, Liu, Ce, and Freeman, William T.
- Subjects
PARALLAX ,MONOCULARS ,STREAMING video & television ,CAMERAS ,IMAGE reconstruction - Abstract
We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving (right). Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects’ motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene (left). Because people are stationary, geometric constraints hold, thus training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes to guide the depth prediction. We evaluate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and demonstrate various 3D effects produced using our predicted depth. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
48. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey.
- Author
-
Jing, Longlong and Tian, Yingli
- Subjects
VISUAL learning ,SUPERVISED learning ,DEEP learning ,COMPUTER vision - Abstract
Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, are proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audios, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. The paper concludes by listing a set of promising future directions for self-supervised visual feature learning. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
49. Adversarial Attacks on Time Series.
- Author
-
Karim, Fazle, Majumdar, Somshubra, and Darabi, Houshang
- Subjects
TIME series analysis ,NEAREST neighbor analysis (Statistics) ,DEEP learning - Abstract
Time series classification models have been garnering significant importance in the research community. However, not much research has been done on generating adversarial samples for these models. These adversarial samples can become a security concern. In this paper, we propose utilizing an adversarial transformation network (ATN) on a distilled model to attack various time series classification models. The proposed attack on the classification model utilizes a distilled model as a surrogate that mimics the behavior of the attacked classical time series classification models. Our proposed methodology is applied to 1-nearest neighbor dynamic time warping (1-NN DTW) and a fully convolutional network (FCN), both of which are trained on 42 University of California Riverside (UCR) datasets. In this paper, we show both models were susceptible to attacks on all 42 datasets. When compared to the Fast Gradient Sign Method, the proposed attack generates a larger fraction of successful adversarial black-box attacks. A simple defense mechanism is successfully devised to reduce the fraction of successful adversarial samples. Finally, we recommend that future researchers who develop time series classification models incorporate adversarial data samples into their training data sets to improve resilience to adversarial samples. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
50. Auto-Pytorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL.
- Author
-
Zimmer, Lucas, Lindauer, Marius, and Hutter, Frank
- Subjects
DEEP learning ,COMPUTER architecture ,MACHINE learning ,TASK analysis - Abstract
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we introduce Auto-PyTorch, which brings together the best of these two worlds by jointly and robustly optimizing the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch achieves state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data. To thoroughly study our assumptions on how to design such an AutoDL system, we additionally introduce a new benchmark on learning curves for DNNs, dubbed LCBench, and run extensive ablation studies of the full Auto-PyTorch on typical AutoML benchmarks, eventually showing that Auto-PyTorch performs better than several state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF