Author: "So, Kanazawa" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"So, Kanazawa"' showing total 51,098 results

1. Decentralized Diffusion Models

Author: McAllister, David, Tancik, Matthew, Song, Jiaming, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Machine Learning
Abstract: Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up infrastructure costs and straining power systems. We propose Decentralized Diffusion Models, a scalable framework for distributing diffusion model training across independent clusters or datacenters by eliminating the dependence on a centralized, high-bandwidth networking fabric. Our method trains a set of expert diffusion models over partitions of the dataset, each in full isolation from one another. At inference time, the experts ensemble through a lightweight router. We show that the ensemble collectively optimizes the same objective as a single model trained over the whole dataset. This means we can divide the training burden among a number of "compute islands," lowering infrastructure costs and improving resilience to localized GPU failures. Decentralized diffusion models empower researchers to take advantage of smaller, more cost-effective and more readily available compute like on-demand GPU nodes rather than central integrated systems. We conduct extensive experiments on ImageNet and LAION Aesthetics, showing that decentralized diffusion models FLOP-for-FLOP outperform standard diffusion models. We finally scale our approach to 24 billion parameters, demonstrating that high-quality diffusion models can now be trained with just eight individual GPU nodes in less than a week., Comment: Project webpage: https://decentralizeddiffusion.github.io/
Published: 2025

2. Non-Fermi liquid transport and strong mass enhancement near the nematic quantum critical point in FeSe$_x$Te$_{1-x}$ thin films

Author: Sato, Yuki, Nagahama, Soma, Belopolski, Ilya, Yoshimi, Ryutaro, Kawamura, Minoru, Tsukazaki, Atsushi, Yamada, Akiyoshi, Tokunaga, Masashi, Kanazawa, Naoya, Takahashi, Kei S., Onuki, Yoshichika, Kawasaki, Masashi, and Tokura, Yoshinori
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Superconductivity
Abstract: Unconventional superconductivity is often accompanied by non-Fermi liquid (NFL) behavior, which emerges near a quantum critical point (QCP) - a point where an electronic ordered phase is terminated at absolute zero under non-thermal parameters. While nematic orders, characterized by broken rotational symmetry, are sometimes found in unconventional superconductors, the role of nematic fluctuations in driving NFL transport behavior remains unclear. Here, we investigated electrical and thermoelectric transport properties in FeSe$_x$Te$_{1-x}$ thin films and observed hallmark NFL behavior: temperature-linear resistivity and logarithmic divergence of thermoelectricity at low temperatures. Notably, the thermoelectricity peaks sharply at the nematic QCP ($x$ = 0.45), highlighting the dominant role of nematic fluctuations in the NFL transport. Furthermore, we found that the pair-breaking mechanism in the superconducting phase crosses over from orbital- to Pauli-limited effects, indicating the mass enhancement near the nematic critical regime. These findings reveal the profound impact of nematic fluctuations on both normal-state transport and superconducting properties., Comment: 20 pages, 4 figures, and supplemental material
Published: 2024

3. Reconstructing People, Places, and Cameras

Author: Müller, Lea, Choi, Hongsuk, Zhang, Anthony, Yi, Brent, Malik, Jitendra, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present "Humans and Structure from Motion" (HSfM), a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system from a sparse set of uncalibrated multi-view images featuring people. Our approach combines data-driven scene reconstruction with the traditional Structure-from-Motion (SfM) framework to achieve more accurate scene reconstruction and camera estimation, while simultaneously recovering human meshes. In contrast to existing scene reconstruction and SfM methods that lack metric scale information, our method estimates approximate metric scale by leveraging a human statistical model. Furthermore, it reconstructs multiple human meshes within the same world coordinate system alongside the scene point cloud, effectively capturing spatial relationships among individuals and their positions in the environment. We initialize the reconstruction of humans, scenes, and cameras using robust foundational models and jointly optimize these elements. This joint optimization synergistically improves the accuracy of each component. We compare our method to existing approaches on two challenging benchmarks, EgoHumans and EgoExo4D, demonstrating significant improvements in human localization accuracy within the world coordinate frame (reducing error from 3.51m to 1.04m in EgoHumans and from 2.9m to 0.56m in EgoExo4D). Notably, our results show that incorporating human data into the SfM pipeline improves camera pose estimation (e.g., increasing RRA@15 by 20.3% on EgoHumans). Additionally, qualitative results show that our approach improves overall scene reconstruction quality. Our code is available at: muelea.github.io/hsfm., Comment: Project website: muelea.github.io/hsfm
Published: 2024

4. Locomotion on a lubricating fluid with spatial viscosity variations

Author: Kanazawa, Takahiro and Ishimoto, Kenta
Subjects: Physics - Fluid Dynamics, Condensed Matter - Soft Condensed Matter, Physics - Biological Physics
Abstract: We studied locomotion of a crawler on a thin Newtonian fluid film whose viscosity varied spatially. We first derived a general locomotion velocity formula with fluid viscosity variations via the lubrication theory. For further analysis, the surface of the crawler was described by a combination of transverse and longitudinal travelling waves and we analysed the time-averaged locomotion behaviours under two scenarios: (i) a sharp viscosity interface and (ii) a linear viscosity gradient. Using the asymptotic expansions of small surface deformations and the method of multiple time-scale analysis, we derived an explicit form of the average velocity that captures nonlinear, accumulative interactions between the crawler and the spatially varying environment. (i) In the case of a viscosity interface, the time-averaged speed of the crawler is always slower than that in the uniform viscosity, for both the transverse and longitudinal wave cases. Notably, the speed reduction is most significant when the crawler's front enters a more viscous layer and the crawler's rear exits from the same layer. (ii) In the case of a viscosity gradient, the crawler's speed becomes slower for the transverse wave, while for the longitudinal wave, the corrections are of a higher order compared with the uniform viscosity case. As an application of the derived locomotion velocity formula, we also analysed the impact of substrate topography on the average speed. Our analysis illustrates the fundamental importance of interactions between a locomotor and its environment, and of separating the time scales behind the locomotion., Comment: 26 pages, 8 figures
Published: 2024

5. MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos

Author: Li, Zhengqi, Tucker, Richard, Cole, Forrester, Wang, Qianqian, Jin, Linyi, Ye, Vickie, Kanazawa, Angjoo, Holynski, Aleksander, and Snavely, Noah
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a system that allows for accurate, fast, and robust estimation of camera parameters and depth maps from casual monocular videos of dynamic scenes. Most conventional structure from motion and monocular SLAM techniques assume input videos that feature predominantly static scenes with large amounts of parallax. Such methods tend to produce erroneous estimates in the absence of these conditions. Recent neural network-based approaches attempt to overcome these challenges; however, such methods are either computationally expensive or brittle when run on dynamic videos with uncontrolled camera motion or unknown field of view. We demonstrate the surprising effectiveness of a deep visual SLAM framework: with careful modifications to its training and inference schemes, this system can scale to real-world videos of complex dynamic scenes with unconstrained camera paths, including videos with little camera parallax. Extensive experiments on both synthetic and real videos demonstrate that our system is significantly more accurate and robust at camera pose and depth estimation when compared with prior and concurrent work, with faster or comparable running times. See interactive results on our project page: https://mega-sam.github.io/, Comment: Project page: https://mega-sam.github.io/
Published: 2024

6. Does the square-root price impact law belong to the strict universal scalings?: quantitative support by a complete survey of the Tokyo stock exchange market

Author: Sato, Yuki and Kanazawa, Kiyoshi
Subjects: Quantitative Finance - Trading and Market Microstructure, Condensed Matter - Statistical Mechanics, Economics - General Economics, Quantitative Finance - Portfolio Management, Quantitative Finance - Risk Management
Abstract: Universal power laws have been scrutinised in physics and beyond, and a long-standing debate exists in econophysics regarding the strict universality of the nonlinear price impact, commonly referred to as the square-root law (SRL). The SRL posits that the average price impact $I$ follows a power law with respect to transaction volume $Q$, such that $I(Q) \propto Q^{\delta}$ with $\delta \approx 1/2$. Some researchers argue that the exponent $\delta$ should be system-specific, without universality. Conversely, others contend that $\delta$ should be exactly $1/2$ for all stocks across all countries, implying universality. However, resolving this debate requires high-precision measurements of $\delta$ with errors of around $0.1$ across hundreds of stocks, which has been extremely challenging due to the scarcity of large microscopic datasets -- those that enable tracking the trading behaviour of all individual accounts. Here we conclusively support the universality hypothesis of the SRL by a complete survey of all trading accounts for all liquid stocks on the Tokyo Stock Exchange (TSE) over eight years. Using this comprehensive microscopic dataset, we show that the exponent $\delta$ is equal to $1/2$ within statistical errors at both the individual stock level and the individual trader level. Additionally, we rejected two prominent models supporting the nonuniversality hypothesis: the Gabaix-Gopikrishnan-Plerou-Stanley and the Farmer-Gerig-Lillo-Waelbroeck models. Our work provides exceptionally high-precision evidence for the universality hypothesis in social science and could prove useful in evaluating the price impact by large investors -- an important topic even among practitioners., Comment: 28 pages, 16 figures
Published: 2024

7. Remote Life Support Robot Interface System for Global Task Planning and Local Action Expansion Using Foundation Models

Author: Obinata, Yoshiki, Jia, Haoyu, Kawaharazuka, Kento, Kanazawa, Naoaki, and Okada, Kei
Subjects: Computer Science - Robotics
Abstract: Robot systems capable of executing tasks based on language instructions have been actively researched. It is challenging to convey uncertain information that can only be determined on-site with a single language instruction to the robot. In this study, we propose a system that includes ambiguous parts as template variables in language instructions to communicate the information to be collected and the options to be presented to the robot for predictable uncertain events. This study implements prompt generation for each robot action function based on template variables to collect information, and a feedback system for presenting and selecting options based on template variables for user-to-robot communication. The effectiveness of the proposed system was demonstrated through its application to real-life support tasks performed by the robot., Comment: Accepted to 2024 IEEE-RAS International Conference on Humanoids Robots (Humanoids 2024)
Published: 2024

8. Wormhole-Induced ALP Dark Matter

Author: Cheong, Dhong Yeon, Hamaguchi, Koichi, Kanazawa, Yoshiki, Lee, Sung Mook, Nagata, Natsumi, and Park, Seong Chan
Subjects: High Energy Physics - Phenomenology, Astrophysics - Cosmology and Nongalactic Astrophysics, General Relativity and Quantum Cosmology, High Energy Physics - Theory
Abstract: Non-perturbative gravitational effects induce explicit global symmetry breaking terms within axion models. These exponentially suppressed terms in the potential give a mass contribution to the axion-like particles (ALPs). In this work we investigate this scenario with a scalar field charged under a global $U(1)$ symmetry and having a non-minimal coupling to gravity. Given the exponential dependence, the ALP can retain a mass spanning a wide range, which can act as a dark matter component. We specify pre-inflationary and post-inflationary production mechanisms of these ALPs, with the former from the misalignment mechanism and the latter from both the misalignment and cosmic-string decay. We identify the allowed parameter ranges that explain the dark matter abundance for both a general inflation case and a case where the radial mode scalar drives inflation, each in metric and Palatini formalisms. We show that the ALP can be the dominant component of the dark matter in a wide range of its mass, $m_{a} \in [10^{-21}~\mathrm{eV},\, \mathrm{TeV}]$, depending on the inflationary scenario and the $U(1)$ breaking scale. These results indicate that ALPs can be responsible for our dark matter abundance within a setup purely from non-perturbative gravitational effects., Comment: 25 pages, 5 figures, 1 table
Published: 2024

9. SOAR: Self-Occluded Avatar Recovery from a Single Video In the Wild

Author: Pan, Zhuoyang, Kanazawa, Angjoo, and Gao, Hang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Self-occlusion is common when capturing people in the wild, where the performer does not follow predefined motion scripts. This challenges existing monocular human reconstruction systems that assume full body visibility. We introduce Self-Occluded Avatar Recovery (SOAR), a method for complete human reconstruction from partial observations where parts of the body are entirely unobserved. SOAR leverages a structural normal prior and a generative diffusion prior to address such an ill-posed reconstruction problem. For the structural normal prior, we model the human with a reposable surfel model with well-defined and easily readable shapes. For the generative diffusion prior, we perform an initial reconstruction and refine it using score distillation. On various benchmarks, we show that SOAR performs favorably compared with state-of-the-art reconstruction and generation methods, and on par with concurrent works. Additional video results and code are available at https://soar-avatar.github.io/.
Published: 2024

10. Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization

Author: Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: State recognition of the environment and objects, such as the open/closed state of doors and the on/off of lights, is indispensable for robots that perform daily life support and security tasks. Until now, state recognition methods have been based on training neural networks from manual annotations, preparing special sensors for the recognition, or manually programming to extract features from point clouds or raw images. In contrast, we propose a robotic state recognition method using a pre-trained vision-language model, which is capable of Image-to-Text Retrieval (ITR) tasks. We prepare several kinds of language prompts in advance, calculate the similarity between these prompts and the current image by ITR, and perform state recognition. By applying the optimal weighting to each prompt using black-box optimization, state recognition can be performed with higher accuracy. Experiments show that this theory enables a variety of state recognitions by simply preparing multiple prompts without retraining neural networks or manual programming. In addition, since only prompts and their weights need to be prepared for each recognizer, there is no need to prepare multiple models, which facilitates resource management. It is possible to recognize the open/closed state of transparent doors, the state of whether water is running or not from a faucet, and even the qualitative state of whether a kitchen is clean or not, which have been challenging so far, through language., Comment: Accepted at Humanoids2024
Published: 2024

11. Agent-to-Sim: Learning Interactive Behavior Models from Casual Longitudinal Videos

Author: Yang, Gengshan, Bajcsy, Andrea, Saito, Shunsuke, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Robotics
Abstract: We present Agent-to-Sim (ATS), a framework for learning interactive behavior models of 3D agents from casual longitudinal video collections. Different from prior works that rely on marker-based tracking and multiview cameras, ATS learns natural behaviors of animal and human agents non-invasively through video observations recorded over a long time-span (e.g., a month) in a single environment. Modeling 3D behavior of an agent requires persistent 3D tracking (e.g., knowing which point corresponds to which) over a long time period. To obtain such data, we develop a coarse-to-fine registration method that tracks the agent and the camera over time through a canonical 3D space, resulting in a complete and persistent spacetime 4D representation. We then train a generative model of agent behaviors using paired data of perception and motion of an agent queried from the 4D reconstruction. ATS enables real-to-sim transfer from video recordings of an agent to an interactive behavior simulator. We demonstrate results on pets (e.g., cat, dog, bunny) and human given monocular RGBD videos captured by a smartphone., Comment: Project page: https://gengshan-y.github.io/agent2sim-www/
Published: 2024

12. Estimating Body and Hand Motion in an Ego-sensed World

Author: Yi, Brent, Ye, Vickie, Zheng, Maya, Li, Yunqi, Müller, Lea, Pavlakos, Georgios, Ma, Yi, Malik, Jitendra, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture a device wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve hand estimation: the resulting kinematic and temporal constraints can reduce world-frame errors in single-frame estimates by 40%. Project page: https://egoallo.github.io/, Comment: Project page: https://egoallo.github.io/
Published: 2024

13. Real-World Cooking Robot System from Recipes Based on Food State Recognition Using Foundation Models and PDDL

Author: Kanazawa, Naoaki, Kawaharazuka, Kento, Obinata, Yoshiki, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Although there is a growing demand for cooking behaviours as one of the expected tasks for robots, a series of cooking behaviours based on new recipe descriptions by robots in the real world has not yet been realised. In this study, we propose a robot system that integrates real-world executable robot cooking behaviour planning using the Large Language Model (LLM) and classical planning of PDDL descriptions, and food ingredient state recognition learning from a small number of data using the Vision-Language model (VLM). We succeeded in experiments in which PR2, a dual-armed wheeled robot, performed cooking from arranged new recipes in a real-world environment, and confirmed the effectiveness of the proposed system., Comment: Accepted at Advanced Robotics, website - https://kanazawanaoaki.github.io/cook-from-recipe-pddl/
Published: 2024

14. Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization

Author: Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: In order for robots to autonomously navigate and operate in diverse environments, it is essential for them to recognize the state of their environment. On the other hand, the environmental state recognition has traditionally involved distinct methods tailored to each state to be recognized. In this study, we perform a unified environmental state recognition for robots through the spoken language with pre-trained large-scale vision-language models. We apply Visual Question Answering and Image-to-Text Retrieval, which are tasks of Vision-Language Models. We show that with our method, it is possible to recognize not only whether a room door is open/closed, but also whether a transparent door is open/closed and whether water is running in a sink, without training neural networks or manual programming. In addition, the recognition accuracy can be improved by selecting appropriate texts from the set of prepared texts based on black-box optimization. For each state recognition, only the text set and its weighting need to be changed, eliminating the need to prepare multiple different models and programs, and facilitating the management of source code and computer resource. We experimentally demonstrate the effectiveness of our method and apply it to the recognition behavior on a mobile robot, Fetch., Comment: Accepted at Advanced Robotics, website - https://haraduka.github.io/vlm-bbo/
Published: 2024

15. Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

Author: Kerr, Justin, Kim, Chung Min, Wu, Mingxuan, Yi, Brent, Wang, Qianqian, Goldberg, Ken, and Kanazawa, Angjoo
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: Humans can learn to manipulate new objects by simply watching others; providing robots with the ability to learn from such demonstrations would enable a natural interface specifying new behaviors. This work develops Robot See Robot Do (RSRD), a method for imitating articulated object manipulation from a single monocular RGB human demonstration given a single static multi-view object scan. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video with differentiable rendering. This analysis-by-synthesis approach uses part-centric feature fields in an iterative optimization which enables the use of geometric regularizers to recover 3D motions from only a single video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. By representing demonstrations as part-centric trajectories, RSRD focuses on replicating the demonstration's intended behavior while considering the robot's own morphological limits, rather than attempting to reproduce the hand's motion. We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot. Each phase of RSRD achieves an average of 87% success rate, for a total end-to-end success rate of 60% across 90 trials. Notably, this is accomplished using only feature fields distilled from large pretrained vision models -- without any task-specific training, fine-tuning, dataset collection, or annotation. Project page: https://robot-see-robot-do.github.io, Comment: CoRL 2024, Project page: https://robot-see-robot-do.github.io
Published: 2024

16. gsplat: An Open-Source Library for Gaussian Splatting

Author: Ye, Vickie, Li, Ruilong, Kerr, Justin, Turkulainen, Matias, Yi, Brent, Pan, Zhuoyang, Seiskari, Otto, Ye, Jianbo, Hu, Jeffrey, Tancik, Matthew, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: gsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized CUDA kernels. gsplat offers numerous features that enhance the optimization of Gaussian Splatting models, which include optimization improvements for speed, memory, and convergence times. Experimental results demonstrate that gsplat achieves up to 10% less training time and 4x less memory than the original implementation. Utilized in several research projects, gsplat is actively maintained on GitHub. Source code is available at https://github.com/nerfstudio-project/gsplat under Apache License 2.0. We welcome contributions from the open-source community., Comment: 17 pages, 2 figures, JMLR MLOSS
Published: 2024

17. Maximum Persistent Betti Numbers of Čech Complexes

Author: Edelsbrunner, Herbert, Kahle, Matthew, and Kanazawa, Shu
Subjects: Mathematics - Combinatorics, 52C45
Abstract: This note proves that only a linear number of holes in a Čech complex of $n$ points in $\mathbb{R}^d$ can persist over an interval of constant length. The proof uses a packing argument supported by relating the Čech complexes with corresponding snap complexes over the cells in a partition of space. The bound also applies to Alpha complexes and Vietoris-Rips complexes., Comment: 8 pages, 2 figures
Published: 2024

18. Synergy and Synchrony in Couple Dances

Author: Maluleke, Vongani, Müller, Lea, Rajasegaran, Jathushan, Pavlakos, Georgios, Ginosar, Shiry, Kanazawa, Angjoo, and Malik, Jitendra
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper asks to what extent social interaction influences one's behavior. We study this in the setting of two dancers dancing as a couple. We first consider a baseline in which we predict a dancer's future moves conditioned only on their past motion without regard to their partner. We then investigate the advantage of taking social information into account by conditioning also on the motion of their dancing partner. We focus our analysis on Swing, a dance genre with tight physical coupling for which we present an in-the-wild video dataset. We demonstrate that single-person future motion prediction in this context is challenging. Instead, we observe that prediction greatly benefits from considering the interaction partners' behavior, resulting in surprisingly compelling couple dance synthesis results (see supp. video). Our contributions are a demonstration of the advantages of socially conditioned future motion prediction and an in-the-wild, couple dance video dataset to enable future research in this direction. Video results are available on the project website: https://von31.github.io/synNsync
Published: 2024

19. WhisperMask: A Noise Suppressive Mask-Type Microphone for Whisper Speech

Author: Hiraki, Hirotaka, Kanazawa, Shusuke, Miura, Takahiro, Yoshida, Manabu, Mochimaru, Masaaki, and Rekimoto, Jun
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, H.5.2
Abstract: Whispering is a common privacy-preserving technique in voice-based interactions, but its effectiveness is limited in noisy environments. In conventional hardware- and software-based noise reduction approaches, isolating whispered speech from ambient noise and other speech sounds remains a challenge. We thus propose WhisperMask, a mask-type microphone featuring a large diaphragm with low sensitivity, making the wearer's voice significantly louder than the background noise. We evaluated WhisperMask using three key metrics: signal-to-noise ratio, quality of recorded voices, and speech recognition rate. Across all metrics, WhisperMask consistently outperformed traditional noise-suppressing microphones and software-based solutions. Notably, WhisperMask showed a 30% higher recognition accuracy for whispered speech recorded in an environment with 80 dB background noise compared with the pin microphone and earbuds. Furthermore, while a denoiser decreased the whispered speech recognition rate of these two microphones by approximately 20% at 30-60 dB noise, WhisperMask maintained a high performance even without denoising, surpassing the other microphones' performances by a significant margin. WhisperMask's design renders the wearer's voice as the dominant input and effectively suppresses background noise without relying on signal processing. This device allows for reliable voice interactions, such as phone calls and voice commands, in a wide range of noisy real-world scenarios while preserving user privacy., Comment: 14 pages, 14 figures
Published: 2024

20. Reflex-Based Open-Vocabulary Navigation without Prior Knowledge Using Omnidirectional Camera and Multiple Vision-Language Models

Author: Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Tsukamoto, Naoto, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control
Abstract: Various robot navigation methods have been developed, but they are mainly based on Simultaneous Localization and Mapping (SLAM), reinforcement learning, etc., which require prior map construction or learning. In this study, we consider the simplest method that does not require any map construction or learning, and execute open-vocabulary navigation of robots without any prior knowledge to do this. We applied an omnidirectional camera and pre-trained vision-language models to the robot. The omnidirectional camera provides a uniform view of the surroundings, thus eliminating the need for complicated exploratory behaviors including trajectory generation. By applying multiple pre-trained vision-language models to this omnidirectional image and incorporating reflective behaviors, we show that navigation becomes simple and does not require any prior setup. Interesting properties and limitations of our method are discussed based on experiments with the mobile robot Fetch., Comment: Accepted at Advanced Robotics, website - https://haraduka.github.io/omnidirectional-vlm/
Published: 2024
Full Text: View/download PDF

21. Dynamical phase transitions in single particle Brownian motion without drift

Author: Kanazawa, Takahiro, Kawaguchi, Kyogo, and Adachi, Kyosuke
Subjects: Condensed Matter - Statistical Mechanics
Abstract: Dynamical phase transitions (DPTs) arise from qualitative changes in the long-time behavior of stochastic trajectories, often observed in systems with kinetic constraints or driven out of equilibrium. Here we demonstrate that first-order DPTs can occur even in the large deviations of a single Brownian particle without drift, but only when the system's dimensionality exceeds four. These DPTs are accompanied by temporal phase separations in the trajectories and exhibit dimension-dependent order due to the threshold behavior for bound state formation in Schrödinger operators. We also discover second-order DPTs in one-dimensional Brownian motion, characterized by universal exponents in the rate function of dynamical observables. Our results establish a novel framework linking classical DPTs to quantum phase transitions., Comment: 6 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2407.14090
Published: 2024

22. Comment on 'Reconsidering the nonlinear emergent inductance: time-varying Joule heating and its impact on the AC electrical response'

Author: Yokouchi, Tomoyuki, Kitaori, Aki, Yamaguchi, Daiki, Kanazawa, Naoya, Hirschberger, Max, Nagaosa, Naoto, and Tokura, Yoshinori
Subjects: Condensed Matter - Materials Science
Abstract: When non-collinear spin textures are driven by current, an emergent electric field arises due to the emergent electromagnetic induction. So far, this phenomenon has been reported in several materials, manifesting the current-nonlinear imaginary part of the complex impedance. Recently, Furuta et al. proposed a time-varying temperature increase due to Joule heating as an alternative explanation for these current-nonlinear complex impedances [arXiv:2407.00309v1]. In this study, we re-examine the nonlinear complex impedance in GdRuAl12 and YMn6Sn6, specifically addressing the impact of the time-varying temperature increase. Our findings reveal that the magnetic-field angle, frequency, and temperature dependence of nonlinear complex impedances in these two materials cannot be explained by the time-varying temperature increase. Instead, these dependencies of the imaginary part of the nonlinear impedance are consistent with the expected behaviour in the theory of emergent electromagnetic induction. Moreover, we observe a significant real part of the nonlinear complex impedance, likely resulting from the dissipation associated with the current-driven motion of helices and domain walls. Our findings highlight the diverse current-nonlinear transport phenomena of spin dynamical origin in helimagnets., Comment: Comment on arXiv:2407.00309
Published: 2024

23. Universality in the dynamical phase transitions of Brownian motion

Author: Kanazawa, Takahiro, Kawaguchi, Kyogo, and Adachi, Kyosuke
Subjects: Condensed Matter - Statistical Mechanics
Abstract: We study the dynamical phase transitions (DPTs) appearing for a single Brownian particle without drift. We first explore how first-order DPTs in large deviations can be found even for a single Brownian particle without any force upon raising the dimension to higher than four. The DPTs accompany temporal phase separations in their dynamical paths, which we numerically confirm by fitting to scaling functions. We next investigate how second-order DPTs can appear in one-dimensional free Brownian motion by choosing the observable, which essentially captures the localization transition of the trajectories. We discuss and confirm that the DPTs predicted for high dimensions can also be found when considering many Brownian particles at lower dimensions., Comment: 16 pages, 8 figures
Published: 2024

24. Shape of Motion: 4D Reconstruction from a Single Video

Author: Wang, Qianqian, Ye, Vickie, Gao, Hang, Austin, Jake, Li, Zhengqi, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. We tackle the under-constrained nature of the problem with two key insights: First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases. Each point's motion is expressed as a linear combination of these bases, facilitating soft decomposition of the scene into multiple rigidly-moving groups. Second, we utilize a comprehensive set of data-driven priors, including monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals, resulting in a globally consistent representation of the dynamic scene. Experiments show that our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes. Project Page: https://shape-of-motion.github.io/
Published: 2024

25. Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections

Author: Xu, Congrong, Kerr, Justin, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance feature embeddings in Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it for unconstrained image collections is non-trivial due to the substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, enhances training speed by 150 times compared to NeRF-based methods, and achieves a similar rendering speed to 3DGS. Additional video results and code integrated into Nerfstudio are available at https://kevinxu02.github.io/splatfactow/., Comment: 9 pages
Published: 2024

26. LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Author: LLM-jp, Aizawa, Akiko, Aramaki, Eiji, Chen, Bowen, Cheng, Fei, Deguchi, Hiroyuki, Enomoto, Rintaro, Fujii, Kazuki, Fukumoto, Kensuke, Fukushima, Takuya, Han, Namgi, Harada, Yuto, Hashimoto, Chikara, Hiraoka, Tatsuya, Hisada, Shohei, Hosokawa, Sosuke, Jie, Lu, Kamata, Keisuke, Kanazawa, Teruhito, Kanezashi, Hiroki, Kataoka, Hiroshi, Katsumata, Satoru, Kawahara, Daisuke, Kawano, Seiya, Keyaki, Atsushi, Kiryu, Keisuke, Kiyomaru, Hirokazu, Kodama, Takashi, Kubo, Takahiro, Kuga, Yohei, Kumon, Ryoma, Kurita, Shuhei, Kurohashi, Sadao, Li, Conglong, Maekawa, Taiki, Matsuda, Hiroshi, Miyao, Yusuke, Mizuki, Kentaro, Mizuki, Sakae, Murawaki, Yugo, Mousterou, Akim, Nakamura, Ryo, Nakamura, Taishi, Nakayama, Kouta, Nakazato, Tomoka, Niitsuma, Takuro, Nishitoba, Jiro, Oda, Yusuke, Ogawa, Hayato, Okamoto, Takumi, Okazaki, Naoaki, Oseki, Yohei, Ozaki, Shintaro, Ryu, Koki, Rzepka, Rafal, Sakaguchi, Keisuke, Sasaki, Shota, Sekine, Satoshi, Suda, Kohei, Sugawara, Saku, Sugiura, Issa, Sugiyama, Hiroaki, Suzuki, Hisami, Suzuki, Jun, Suzumura, Toyotaro, Tachibana, Kensuke, Takagi, Yu, Takami, Kyosuke, Takeda, Koichi, Takeshita, Masashi, Tanaka, Masahiro, Taura, Kenjiro, Tolmachev, Arseny, Ueda, Nobuhiro, Wan, Zhen, Yada, Shuntaro, Yahata, Sakiko, Yamamoto, Yuya, Yamauchi, Yusuke, Yanaka, Hitomi, Yokota, Rio, and Yoshino, Koichiro
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
Published: 2024

27. Rethinking Score Distillation as a Bridge Between Image Distributions

Author: McAllister, David, Ge, Songwei, Huang, Jia-Bin, Jacobs, David W., Efros, Alexei A., Holynski, Aleksander, and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods' characteristic artifacts are caused by (1) linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors., Comment: NeurIPS 2024. Project webpage: https://sds-bridge.github.io/
Published: 2024

28. Event prediction and causality inference despite incomplete information

Author: Lam, Harrison, Chen, Yuanjie, Kanazawa, Noboru, Chowdhury, Mohammad, Battista, Anna, and Waldert, Stephan
Subjects: Computer Science - Machine Learning
Abstract: We explored the challenge of predicting and explaining the occurrence of events within sequences of data points. Our focus was particularly on scenarios in which unknown triggers causing the occurrence of events may consist of non-consecutive, masked, noisy data points. This scenario is akin to an agent tasked with learning to predict and explain the occurrence of events without understanding the underlying processes or having access to crucial information. Such scenarios are encountered across various fields, such as genomics, hardware and software verification, and financial time series prediction. We combined analytical, simulation, and machine learning (ML) approaches to investigate, quantify, and provide solutions to this challenge. We deduced and validated equations generally applicable to any variation of the underlying challenge. Using these equations, we (1) described how the level of complexity changes with various parameters (e.g., number of apparent and hidden states, trigger length, confidence, etc.) and (2) quantified the data needed to successfully train an ML model. We then (3) proved our ML solution learns and subsequently identifies unknown triggers and predicts the occurrence of events. If the complexity of the challenge is too high, our ML solution can identify trigger candidates to be used to interactively probe the system under investigation to determine the true trigger in a way considerably more efficient than brute force methods. By sharing our findings, we aim to assist others grappling with similar challenges, enabling estimates on the complexity of their problem, the data required and a solution to solve it., Comment: 16 pages, 8 figures, 1 table
Published: 2024

29. Qudit-Generalization of the Qubit Echo and Its Application to a Qutrit-Based Toffoli Gate

Author: Iiyama, Yutaro, Jang, Wonho, Kanazawa, Naoki, Sawada, Ryu, Onodera, Tamiya, and Terashi, Koji
Subjects: Quantum Physics
Abstract: The fidelity of certain gates on noisy quantum computers may be improved when they are implemented using more than two levels of the involved transmons. The main impediments to achieving this potential are the dynamic gate phase errors that cannot be corrected via calibration. The standard tool for countering such phase errors in two-level qubits is the echo protocol, often referred to as the dynamical decoupling sequence, where the evolution of a qubit is punctuated by an even number of X gates. We introduce basis cycling, which is a direct generalization of the qubit echo to general qudits, and provide an analytic framework for designing gate sequences to produce desired effects using this technique. We then apply basis cycling to a Toffoli gate decomposition incorporating a qutrit and obtain CCZ gate fidelity values up to 93.8±0.1%, measured by quantum process tomography, on IBM quantum computers. The gate fidelity remains stable without recalibration even while the resonant frequency of the qutrit fluctuates, highlighting the dynamical nature of phase error cancellation through basis cycling. Our results demonstrate that one of the biggest difficulties in implementing qudit-based gate decompositions on superconducting quantum computers can be systematically overcome when certain conditions are met, and thus open a path toward fulfilling the promise of qudits as circuit optimization agents., Comment: 17 pages, 9 figures
Published: 2024

30. Self-Supervised Learning of Visual Servoing for Low-Rigidity Robots Considering Temporal Body Changes

Author: Kawaharazuka, Kento, Kanazawa, Naoaki, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics
Abstract: In this study, we investigate object grasping by visual servoing in a low-rigidity robot. It is difficult for a low-rigidity robot to handle its own body as intended compared to a rigid robot, and calibration between vision and body takes some time. In addition, the robot must constantly adapt to changes in its body, such as the change in camera position and change in joints due to aging. Therefore, we develop a method for a low-rigidity robot to autonomously learn visual servoing of its body. We also develop a mechanism that can adaptively change its visual servoing according to temporal body changes. We apply our method to a low-rigidity 6-axis arm, MyCobot, and confirm its effectiveness by conducting object grasping experiments based on visual servoing., Comment: Accepted at IEEE Robotics and Automation Letters
Published: 2024
Full Text: View/download PDF

31. Toon3D: Seeing Cartoons from New Perspectives

Author: Weber, Ethan, Peterlinz, Riley, Mathur, Rohan, Warburg, Frederik, Efros, Alexei A., and Kanazawa, Angjoo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We recover the underlying 3D structure from images of cartoons and anime depicting the same scene. This is an interesting problem domain because images in creative media are often depicted without explicit geometric consistency for storytelling and creative expression; they are only 3D in a qualitative sense. While humans can easily perceive the underlying 3D scene from these images, existing Structure-from-Motion (SfM) methods that assume 3D consistency fail catastrophically. We present Toon3D for reconstructing geometrically inconsistent images. Our key insight is to deform the input images while recovering camera poses and scene geometry, effectively explaining away geometrical inconsistencies to achieve consistency. This process is guided by the structure inferred from monocular depth predictions. We curate a dataset with multi-view imagery from cartoons and anime that we annotate with reliable sparse correspondences using our user-friendly annotation tool. Our recovered point clouds can be plugged into novel-view synthesis methods to experience cartoons from viewpoints never drawn before. We evaluate against classical and recent learning-based SfM methods, where Toon3D is able to obtain more reliable camera poses and scene geometry., Comment: Please see our project page: https://toon3d.studio
Published: 2024

32. NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry

Author: Khandelwal, Yash, Arvind, Mayur, Kumar, Sriram, Gupta, Ashish, Danisetty, Sachin Kumar, Bagad, Piyush, Madan, Anish, Lunayach, Mayank, Annavajjala, Aditya, Maiti, Abhishek, Jain, Sansiddh, Dalmia, Aman, Deka, Namrata, White, Jerome, Doshi, Jigar, Kanazawa, Angjoo, Panicker, Rahul, Raval, Alpan, Rana, Srinivas, and Tapaswi, Makarand
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Malnutrition among newborns is a top public health concern in developing countries. Identification and subsequent growth monitoring are key to successful interventions. However, this is challenging in rural communities where health systems tend to be inaccessible and under-equipped, with poor adherence to protocol. Our goal is to equip health workers and public health systems with a solution for contactless newborn anthropometry in the community. We propose NurtureNet, a multi-task model that fuses visual information (a video taken with a low-cost smartphone) with tabular inputs to regress multiple anthropometry estimates including weight, length, head circumference, and chest circumference. We show that visual proxy tasks of segmentation and keypoint prediction further improve performance. We establish the efficacy of the model through several experiments and achieve a relative error of 3.9% and mean absolute error of 114.3 g for weight estimation. Model compression to 15 MB also allows offline deployment to low-cost smartphones., Comment: Accepted at CVPM Workshop at CVPR 2024
Published: 2024

33. Only Children by Choice vs. Only Children by Circumstances: Why Do Some Women Have Only One Child?

Author: Kanazawa, Satoshi and Awata, Yoko
Published: 2025
Full Text: View/download PDF

34. Aggregate productivity slowdown and share of temporary workers

Author: Kanazawa, Nobuyuki
Published: 2025
Full Text: View/download PDF

35. Conventional magnetic resonance imaging key features for distinguishing pathologically confirmed corticobasal degeneration from its mimics: a retrospective analysis of the J-VAC study

Author: Sakurai, Keita, Tokumaru, Aya M., Yoshida, Mari, Saito, Yuko, Wakabayashi, Koichi, Komori, Takashi, Hasegawa, Masato, Ikeuchi, Takeshi, Hayashi, Yuichi, Shimohata, Takayoshi, Murayama, Shigeo, Iwasaki, Yasushi, Uchihara, Toshiki, Sakai, Motoko, Yabe, Ichiro, Tanikawa, Satoshi, Takigawa, Hiroshi, Adachi, Tadashi, Hanajima, Ritsuko, Fujimura, Harutoshi, Hayashi, Kentaro, Sugaya, Keizo, Hasegawa, Kazuko, Sano, Terunori, Takao, Masaki, Yokota, Osamu, Miki, Tomoko, Kobayashi, Michio, Arai, Nobutaka, Ohkubo, Takuya, Yokota, Takanori, Mori, Keiko, Ito, Masumi, Ishida, Chiho, Idezuka, Jiro, Toyoshima, Yasuko, Kanazawa, Masato, Aoki, Masashi, Hasegawa, Takafumi, Watanabe, Hirohisa, Hashizume, Atsushi, Niwa, Hisayoshi, Yasui, Keizo, Ito, Keita, Washimi, Yukihiko, Kubota, Akatsuki, Toda, Tatsushi, Nakashima, Kenji, and Aiba, Ikuko
Published: 2024
Full Text: View/download PDF

36. NeRF-XL: Scaling NeRFs with Multiple GPUs

Author: Li, Ruilong, Fidler, Sanja, Kanazawa, Angjoo, Williams, Francis, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
Published: 2025
Full Text: View/download PDF

37. Curse of Dimensionality on Persistence Diagrams

Author: Hiraoka, Yasuaki, Imoto, Yusuke, Kanazawa, Shu, and Liu, Enhao
Subjects: Mathematics - Statistics Theory, Mathematics - Algebraic Topology, Mathematics - Probability, 62R40 (Primary) 55N31, 60B15, 60B20 (Secondary)
Abstract: The stability of persistent homology has led to wide applications of the persistence diagram as a trusted topological descriptor in the presence of noise. However, with the increasing demand for high-dimension and low-sample-size data processing in modern science, it is questionable whether persistence diagrams retain their reliability in the presence of high-dimensional noise. This work aims to study the reliability of persistence diagrams in the high-dimension low-sample-size data setting. By analyzing the asymptotic behavior of persistence diagrams for high-dimensional random data, we show that persistence diagrams are no longer reliable descriptors of low-sample-size data under high-dimensional noise perturbations. We refer to this loss of reliability of persistence diagrams in such data settings as the curse of dimensionality on persistence diagrams. Next, we investigate the possibility of using normalized principal component analysis as a method for reducing the dimensionality of the high-dimensional observed data to resolve the curse of dimensionality. We show that this method can mitigate the curse of dimensionality on persistence diagrams. Our results shed some new light on the challenges of processing high-dimension low-sample-size data by persistence diagrams and provide a starting point for future research in this area.
Published: 2024

38. NeRF-XL: Scaling NeRFs with Multiple GPUs

Author: Li, Ruilong, Fidler, Sanja, Kanazawa, Angjoo, and Williams, Francis
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Graphics
Abstract: We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improvements in reconstruction quality as additional computational resources (GPUs) are used in training. NeRF-XL remedies these issues and enables the training and rendering of NeRFs with an arbitrary number of parameters by simply using more hardware. At the core of our method lies a novel distributed training and rendering formulation, which is mathematically equivalent to the classic single-GPU case and minimizes communication between GPUs. By unlocking NeRFs with arbitrarily large parameter counts, our approach is the first to reveal multi-GPU scaling laws for NeRFs, showing improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs. We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25 km² city area., Comment: Webpage: https://research.nvidia.com/labs/toronto-ai/nerfxl/
Published: 2024

39. Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind

Author: Plizzari, Chiara, Goel, Shubham, Perrett, Toby, Chalk, Jacob, Kanazawa, Angjoo, and Damen, Dima
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind - 3D tracking active objects using observations captured through an egocentric camera. We introduce Lift, Match and Keep (LMK), a method which lifts partial 2D observations to 3D world coordinates, matches them over time using visual appearance, 3D location and interactions to form object tracks, and keeps these object tracks even when they go out-of-view of the camera - hence keeping in mind what is out of sight. We test LMK on 100 long videos from EPIC-KITCHENS. Our results demonstrate that spatial cognition is critical for correctly locating objects over short and long time scales. E.g., for one long egocentric video, we estimate the 3D location of 50 active objects. Of these, 60% can be correctly positioned in 3D after 2 minutes of leaving the camera view., Comment: 21 pages including references and appendix. Project Webpage: http://dimadamen.github.io/OSNOM/
Published: 2024

40. The More You See in 2D, the More You Perceive in 3D

Author: Han, Xinyang, Gao, Zelin, Kanazawa, Angjoo, Goel, Shubham, and Gandelsman, Yossi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Humans can infer 3D structure from 2D images of an object based on past experience and improve their 3D understanding as they see more images. Inspired by this behavior, we introduce SAP3D, a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images. Given a few unposed images of an object, we adapt a pre-trained view-conditioned diffusion model together with the camera poses of the images via test-time fine-tuning. The adapted diffusion model and the obtained camera poses are then utilized as instance-specific priors for 3D reconstruction and novel view synthesis. We show that as the number of input images increases, the performance of our approach improves, bridging the gap between optimization-based prior-less 3D reconstruction methods and single-image-to-3D diffusion-based methods. We demonstrate our system on real images as well as standard synthetic benchmarks. Our ablation studies confirm that this adaption behavior is key for more accurate 3D understanding., Comment: Project page: https://sap3d.github.io/
Published: 2024

41. Current-induced magnon trapping in spin torque oscillation

Author: Makiuchi, Takahiko, Kanazawa, Naoki, and Saitoh, Eiji
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Strongly Correlated Electrons
Abstract: Spin torque nano-oscillators realized by magnetization dynamics trapped in a current-induced potential are reported. We fabricated Ni₈₁Fe₁₉/Pt nanostructures and measured current-induced microwave emission from the structures. The result shows an increase in the magnitude and spectral narrowing of the microwave emission. We demonstrate that the current-induced magnetic field suppresses magnon radiation loss and significantly reduces the linewidth and the threshold current required for the spin torque oscillation., Comment: 5 pages, 4 figures
Published: 2024

42. Learning-Based Wiping Behavior of Low-Rigidity Robots Considering Various Surface Materials and Task Definitions

Author: Kawaharazuka, Kento, Kanazawa, Naoaki, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics
Abstract: Wiping behavior is a task of tracing the surface of an object while feeling the force with the palm of the hand. It is necessary to adjust the force and posture appropriately considering the various contact conditions felt by the hand. Several studies have been conducted on the wiping motion; however, these studies have only dealt with a single surface material, and have only considered the application of the amount of appropriate force, lacking intelligent movements to ensure that the force is applied either evenly to the entire surface or to a certain area. Depending on the surface material, the hand posture and pressing force should be varied appropriately, and this is highly dependent on the definition of the task. Also, most of the movements are executed by high-rigidity robots that are easy to model, and few movements are executed by robots that are low-rigidity but therefore have a small risk of damage due to excessive contact. So, in this study, we develop a method of motion generation based on the learned prediction of contact force during the wiping motion of a low-rigidity robot. We show that MyCobot, which is made of low-rigidity resin, can appropriately perform wiping behaviors on a plane with multiple surface materials based on various task definitions., Comment: Accepted at Humanoids2022
Published: 2024
Full Text: View/download PDF

43. Continuous Object State Recognition for Cooking Robots Using Pre-Trained Vision-Language Models and Black-box Optimization

Author: Kawaharazuka, Kento, Kanazawa, Naoaki, Obinata, Yoshiki, Okada, Kei, and Inaba, Masayuki
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: The state recognition of the environment and objects by robots is generally based on the judgement of the current state as a classification problem. On the other hand, state changes of food in cooking happen continuously and need to be captured not only at a certain time point but also continuously over time. In addition, the state changes of food are complex and cannot be easily described by manual programming. Therefore, we propose a method to recognize the continuous state changes of food for cooking robots through the spoken language using pre-trained large-scale vision-language models. By using models that can compute the similarity between images and texts continuously over time, we can capture the state changes of food while cooking. We also show that by adjusting the weighting of each text prompt based on fitting the similarity changes to a sigmoid function and then performing black-box optimization, more accurate and robust continuous state recognition can be achieved. We demonstrate the effectiveness and limitations of this method by performing the recognition of water boiling, butter melting, egg cooking, and onion stir-frying., Comment: accepted at IEEE Robotics and Automation Letters (RA-L), website - https://haraduka.github.io/continuous-state-recognition/
Published: 2024
Full Text: View/download PDF

44. Dynamics of measurement-induced state transitions in superconducting qubits

Author: Hirasaki, Yuta, Daimon, Shunsuke, Kanazawa, Naoki, Itoko, Toshinari, Tokunari, Masao, and Saitoh, Eiji
Subjects: Quantum Physics
Abstract: We have investigated temporal fluctuation of superconducting qubits via the time-resolved measurement for an IBM Quantum system. We found that the qubit error rate abruptly changes during specific time intervals. Each high error state persists for several tens of seconds, and exhibits an on-off behavior. The observed temporal instability can be attributed to qubit transitions induced by a measurement stimulus. Resonant transition between fluctuating dressed states of the qubits coupled with high-frequency resonators can be responsible for the error-rate change., Comment: 6 pages, 7 figures
Published: 2024

45. Conversion map from quantitative parameter mapping to myelin water fraction: comparison with R1·R2* and myelin water fraction in white matter

Author: Kitano, Shun, Kanazawa, Yuki, Harada, Masafumi, Taniguchi, Yo, Hayashi, Hiroaki, Matsumoto, Yuki, Ito, Kosuke, Bito, Yoshitaka, and Haga, Akihiro
Published: 2024
Full Text: View/download PDF

46. Large deviation principle for persistence diagrams of random cubical filtrations

Author: Kanazawa, Shu, Hiraoka, Yasuaki, Miyanaga, Jun, and Tsunoda, Kenkichi
Published: 2024
Full Text: View/download PDF

47. Regional Cerebral Oxygen Saturation and Estimated Oxygen Extraction Ratio as Predictive Markers of Major Adverse Events in Infants with Congenital Heart Disease

Author: Kimura, Satoshi, Shimizu, Kazuyoshi, Izumi, Kaoru, Kanazawa, Tomoyuki, Mizuno, Keiichiro, Iwasaki, Tatsuo, and Morimatsu, Hiroshi
Published: 2024
Full Text: View/download PDF

48. Investigation of the effectiveness of preoperative intubation simulation using a custom-made simulator for pediatric patients with difficult airway: a pilot study

Author: Kanazawa, Tomoyuki
Published: 2024
Full Text: View/download PDF

49. Characterization of carotid plaques using chemical exchange saturation transfer imaging

Author: Kanematsu, Yasuhisa, Kanazawa, Yuki, Shimada, Kenji, Korai, Masaaki, Miyamoto, Takeshi, Sogabe, Shu, Ishihara, Manabu, Yamaguchi, Izumi, Oya, Takeshi, Yamamoto, Nobuaki, Yamamoto, Yuki, Miyoshi, Mitsuharu, Harada, Masafumi, and Takagi, Yasushi
Published: 2024
Full Text: View/download PDF

50. Psychiatrists’ Perspectives on Advantages, Disadvantages and Challenging for Promotion Related to Telemedicine: Japan’s Clinical Experience During COVID-19 Pandemic

Author: Kinoshita, Shotaro, Kitazawa, Momoko, Abe, Yoshinari, Suda, Akira, Nakamae, Takashi, Kanazawa, Tetsufumi, Tomita, Hiroaki, Hishimoto, Akitoyo, and Kishimoto, Taishiro
Published: 2024
Full Text: View/download PDF
