311,664 results for "A. Yuki"
2. The Microlensing Event Rate and Optical Depth from MOA-II 9 year Survey toward the Galactic Bulge
- Author
Nunota, Kansuke, Sumi, Takahiro, Koshimoto, Naoki, Rattenbury, Nicholas J., Abe, Fumio, Barry, Richard, Bennett, David P., Bhattacharya, Aparna, Fukui, Akihiko, Hamada, Ryusei, Hamada, Shunya, Hamasaki, Naoto, Hirao, Yuki, Silva, Stela Ishitani, Itow, Yoshitaka, Matsubara, Yutaka, Miyazaki, Shota, Muraki, Yasushi, Nagai, Tsutsumi, Olmschenk, Greg, Ranc, Clement, Satoh, Yuki K., Suzuki, Daisuke, Tristram, Paul J., Vandorou, Aikaterini, and Yama, Hibiki
- Subjects
Astrophysics - Astrophysics of Galaxies
- Abstract
We present measurements of the microlensing optical depth and event rate toward the Galactic bulge using the dataset from the 2006–2014 MOA-II survey, which covers 22 bulge fields spanning $\sim 42$ deg$^2$ between $-5^\circ < l < 10^\circ$ and $-7^\circ < b < -1^\circ$. In the central region with $|l| < 5^\circ$, we estimate an optical depth of $\tau = (1.75 \pm 0.04) \times 10^{-6} \exp[(0.34 \pm 0.02)(3^\circ - |b|)]$ and an event rate of $\Gamma = (16.08 \pm 0.28) \times 10^{-6} \exp[(0.44 \pm 0.02)(3^\circ - |b|)]$ star$^{-1}$ yr$^{-1}$, using a sample of 3525 microlensing events with Einstein radius crossing times $t_{\rm E} < 760$ days and source star magnitude $I_s$ […]. Our results are consistent with the latest measurements from the OGLE-IV 8-year dataset (Mróz et al. 2019). However, they are inconsistent with a prediction based on Galactic models, especially in the central region with $|b| < 3^\circ$. These results can be used to improve the Galactic bulge model, and the more central regions can be probed further by future microlensing experiments such as the PRime-focus Infrared Microlensing Experiment (PRIME) and the Nancy Grace Roman Space Telescope.
- Published
- 2024
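For the MOA-II entry above, the fitted latitude dependence can be evaluated directly. This is a minimal sketch that uses only the central values of the quoted fits (the ± uncertainties are ignored) and assumes $b$ is given in degrees for fields with $|l| < 5^\circ$; the function and constant names are illustrative, not from the paper.

```python
import math

# Central values of the MOA-II fits quoted above (valid for |l| < 5 deg);
# the +- uncertainties are deliberately ignored in this sketch.
TAU_0 = 1.75e-6      # optical depth at |b| = 3 deg
TAU_SLOPE = 0.34     # exponential slope, per degree of |b|
GAMMA_0 = 16.08e-6   # event rate at |b| = 3 deg, per star per year
GAMMA_SLOPE = 0.44   # exponential slope, per degree of |b|


def optical_depth(b_deg: float) -> float:
    """tau(b) = TAU_0 * exp[TAU_SLOPE * (3 - |b|)], with b in degrees."""
    return TAU_0 * math.exp(TAU_SLOPE * (3.0 - abs(b_deg)))


def event_rate(b_deg: float) -> float:
    """Gamma(b) in events star^-1 yr^-1, same exponential form."""
    return GAMMA_0 * math.exp(GAMMA_SLOPE * (3.0 - abs(b_deg)))


if __name__ == "__main__":
    for b in (-1.5, -3.0, -5.0):
        print(f"b = {b:5.1f} deg: tau = {optical_depth(b):.3e}, "
              f"Gamma = {event_rate(b):.3e} /star/yr")
```

With these central values, moving one degree closer to the plane multiplies $\tau$ by $e^{0.34} \approx 1.40$ and $\Gamma$ by $e^{0.44} \approx 1.55$.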
3. Construction and Analysis of Impression Caption Dataset for Environmental Sounds
- Author
Okamoto, Yuki, Nagase, Ryotaro, Okamoto, Minami, Saito, Yuki, Imoto, Keisuke, Fukumori, Takahiro, and Yamashita, Yoichi
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Several datasets describing the content and order of occurrence of sounds have been released for converting between environmental sounds and text. However, very few texts include information on the impressions humans feel when hearing environmental sounds, such as "sharp" or "gorgeous." In this study, we constructed a dataset of impression captions for environmental sounds that describe the impressions humans have when hearing these sounds. We used ChatGPT to generate impression captions, and humans then selected the most appropriate captions for each sound. Our dataset consists of 3,600 impression captions for environmental sounds. To evaluate how appropriate the impression captions are for the environmental sounds, we conducted subjective and objective evaluations. Our evaluation results indicate that appropriate impression captions for environmental sounds can be generated.
- Published
- 2024
4. The Ni isotopic composition of Ryugu reveals a common accretion region for carbonaceous chondrites
- Author
Spitzer, Fridolin, Kleine, Thorsten, Burkhardt, Christoph, Hopp, Timo, Yokoyama, Tetsuya, Abe, Yoshinari, Aléon, Jérôme, Alexander, Conel M. O'D., Amari, Sachiko, Amelin, Yuri, Bajo, Ken-ichi, Bizzarro, Martin, Bouvier, Audrey, Carlson, Richard W., Chaussidon, Marc, Choi, Byeon-Gak, Dauphas, Nicolas, Davis, Andrew M., Di Rocco, Tommaso, Fujiya, Wataru, Fukai, Ryota, Gautam, Ikshu, Haba, Makiko K., Hibiya, Yuki, Hidaka, Hiroshi, Homma, Hisashi, Hoppe, Peter, Huss, Gary R., Ichida, Kiyohiro, Iizuka, Tsuyoshi, Ireland, Trevor R., Ishikawa, Akira, Itoh, Shoichi, Kawasaki, Noriyuki, Kita, Noriko T., Kitajima, Kouki, Komatani, Shintaro, Krot, Alexander N., Liu, Ming-Chang, Masuda, Yuki, Morita, Mayu, Moynier, Fréderic, Motomura, Kazuko, Nakai, Izumi, Nagashima, Kazuhide, Nguyen, Ann, Nittler, Larry, Onose, Morihiko, Pack, Andreas, Park, Changkun, Piani, Laurette, Qin, Liping, Russell, Sara S., Sakamoto, Naoya, Schönbächler, Maria, Tafla, Lauren, Tang, Haolan, Terada, Kentaro, Terada, Yasuko, Usui, Tomohiro, Wada, Sohei, Wadhwa, Meenakshi, Walker, Richard J., Yamashita, Katsuyuki, Yin, Qing-Zhu, Yoneda, Shigekazu, Young, Edward D., Yui, Hiroharu, Zhang, Ai-Cheng, Nakamura, Tomoki, Naraoka, Hiroshi, Noguchi, Takaaki, Okazaki, Ryuji, Sakamoto, Kanako, Yabuta, Hikaru, Abe, Masanao, Miyazaki, Akiko, Nakato, Aiko, Nishimura, Masahiro, Okada, Tatsuaki, Yada, Toru, Yogata, Kasumi, Nakazawa, Satoru, Saiki, Takanao, Tanaka, Satoshi, Terui, Fuyuto, Tsuda, Yuichi, Watanabe, Sei-ichiro, Yoshikawa, Makoto, Tachibana, Shogo, and Yurimoto, Hisayoshi
- Subjects
Astrophysics - Earth and Planetary Astrophysics
- Abstract
The isotopic compositions of samples returned from Cb-type asteroid Ryugu and Ivuna-type (CI) chondrites are distinct from those of other carbonaceous chondrites, which has led to the suggestion that Ryugu and CI chondrites formed in a different region of the accretion disk, possibly around the orbits of Uranus and Neptune. We show that, as for Fe, Ryugu and CI chondrites also have indistinguishable Ni isotope anomalies, which differ from those of other carbonaceous chondrites. We propose that this unique Fe and Ni isotopic composition reflects different accretion efficiencies of small FeNi metal grains among the carbonaceous chondrite parent bodies. The CI chondrites incorporated these grains more efficiently, possibly because they formed at the end of the disk's lifetime, when planetesimal formation was also triggered by photoevaporation of the disk. Isotopic variations among carbonaceous chondrites may thus reflect fractionation of distinct dust components from a common reservoir, implying that CI chondrites and Ryugu may have formed in the same region of the accretion disk as other carbonaceous chondrites., Comment: Published open access in Science Advances
- Published
- 2024
5. Surface structure of the 3x3-Si phase on Al(111), studied by the multiple usages of positron diffraction and core-level photoemission spectroscopy
- Author
Sato, Yusuke, Fukaya, Yuki, Nakano, Akito, Hoshi, Takeo, Lee, Chi-Cheng, Yoshimi, Kazuyoshi, Ozaki, Taisuke, Nakashima, Takeru, Ando, Yasunobu, Aoyama, Hiroaki, Abukawa, Tadashi, Tsujikawa, Yuki, Horio, Masafumi, Niibe, Masahito, Komori, Fumio, and Matsuda, Iwao
- Subjects
Condensed Matter - Materials Science
- Abstract
The structure of an Al(111)3x3-Si surface was examined by combining data from positron diffraction and core-level photoemission spectroscopy. Analysis of the diffraction rocking curves indicated that the overlayer had a flat honeycomb lattice structure. Simulations of Si core-level spectra based on first-principles calculations indicated that one of the Si atoms in the unit cell was replaced by an Al atom. The surface superstructure was thus a two-dimensional layer of Al-embedded silicene on Al(111).
- Published
- 2024
6. A Fashion Item Recommendation Model in Hyperbolic Space
- Author
Shimizu, Ryotaro, Wang, Yu, Kimura, Masanari, Hirakawa, Yuki, Wada, Takashi, Saito, Yuki, and McAuley, Julian
- Subjects
Computer Science - Information Retrieval, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
In this work, we propose a fashion item recommendation model that incorporates hyperbolic geometry into user and item representations. Using hyperbolic space, our model aims to capture implicit hierarchies among items based on their visual data and users' purchase history. During training, we apply a multi-task learning framework that considers both hyperbolic and Euclidean distances in the loss function. Our experiments on three data sets show that our model performs better than previous models trained in Euclidean space only, confirming the effectiveness of our model. Our ablation studies show that multi-task learning plays a key role, and removing the Euclidean loss substantially deteriorates the model performance., Comment: This work was presented at the CVFAD Workshop at CVPR 2024
- Published
- 2024
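The fashion recommendation entry above combines hyperbolic and Euclidean distances during training. The abstract does not say which hyperbolic model is used, so this sketch assumes the common Poincaré ball and shows its standard distance next to the Euclidean one, blended by a weighted sum. The weight `alpha` and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance in the unit Poincare ball (curvature -1).

    d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    Both points must lie strictly inside the unit ball.
    """
    uu, vv = float(np.dot(u, u)), float(np.dot(v, v))
    if uu >= 1.0 or vv >= 1.0:
        raise ValueError("points must satisfy |x| < 1")
    diff = float(np.dot(u - v, u - v))
    return float(np.arccosh(1.0 + 2.0 * diff / ((1.0 - uu) * (1.0 - vv))))


def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Ordinary L2 distance between the same embedding vectors."""
    return float(np.linalg.norm(u - v))


def combined_distance(u: np.ndarray, v: np.ndarray, alpha: float = 0.5) -> float:
    """Hypothetical multi-task term: weighted sum of both distances."""
    return alpha * poincare_distance(u, v) + (1.0 - alpha) * euclidean_distance(u, v)
```

Near the boundary the hyperbolic distance grows much faster than the Euclidean one: from the origin to a point at radius $r$ it equals $2\,\mathrm{artanh}(r)$, which diverges as $r \to 1$. This leaves room to place many fine-grained items far apart while keeping them close to a few coarse ones near the center, which is the usual motivation for hyperbolic embeddings of hierarchies.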
7. GazeSearch: Radiology Findings Search Benchmark
- Author
Pham, Trong Thang, Nguyen, Tien-Phat, Ikebe, Yuki, Awasthi, Akash, Deng, Zhigang, Wu, Carol C., Nguyen, Hien, and Le, Ngan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Medical eye-tracking data is an important information source for understanding how radiologists visually interpret medical images. This information improves not only the accuracy of deep learning models for X-ray analysis but also their interpretability, enhancing transparency in decision-making. However, current eye-tracking data is dispersed, unprocessed, and ambiguous, making it difficult to derive meaningful insights. There is therefore a need for a new dataset with more focused, purposeful eye-tracking data that improves its utility for diagnostic applications. In this work, we propose a refinement method inspired by the target-present visual search challenge, in which a specific finding is present and fixations are guided to locate it. After refining the existing eye-tracking datasets, we transform them into a curated visual search dataset for radiology findings, called GazeSearch, in which each fixation sequence is purposefully aligned to the task of locating a particular finding. Subsequently, we introduce a scan path prediction baseline, called ChestSearch, specifically tailored to GazeSearch. Finally, we employ the newly introduced GazeSearch as a benchmark to evaluate the performance of current state-of-the-art methods, offering a comprehensive assessment of visual search in the medical imaging domain., Comment: Accepted at WACV 2025
- Published
- 2024
8. Magical Experience with Full-body Action
- Author
Ye, Bowen, Enzaki, Yuki, and Iwata, Hiroo
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
This paper presents a system that generates a magical experience through full-body motion. The system consists of a locomotion interface and a spatial immersive display. We developed a virtual experience system named the Magical Experience Generator, equipped with a Magical Experience Controller. This system provides a physical movement experience along with magic-like interactions in a virtual space. We developed content inspired by the Japanese story "The Man Who Made Flowers Bloom" using Unity as the system's environment. The locomotion interface records the participant's walking trajectory and hand movements, representing their actions in the virtual space., Comment: Part of proceedings of 6th International Conference AsiaHaptics 2024
- Published
- 2024
9. Development of fall prevention training device that can provide external disturbance to the ankle with pneumatic gel muscles (PGM) while walking
- Author
Isoshima, Keigo, Tada, Mitsunori, Maeda, Noriaki, Tashiro, Tsubasa, Arima, Satoshi, Nagao, Takumi, Tamura, Yuki, and Kurita, Yuichi
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Although average life expectancy in Japan has been increasing in recent years, the large gap between healthy life expectancy and average life expectancy remains an unresolved problem. Among the factors that lead to the need for nursing care, injuries due to falls account for a notable share of the total. In this paper, we developed boots that can apply external disturbances to the ankle with pneumatic gel muscles (PGM) while walking. The device is smaller and more wearable than conventional devices; to evaluate the effectiveness of fall prevention training with it, we used the angular velocity and acceleration of the heel as evaluation indices. We confirmed that the developed system provides sufficient training intensity to significantly affect the gait waveform., Comment: Part of proceedings of 6th International Conference AsiaHaptics 2024
- Published
- 2024
10. An emotional expression system with vibrotactile feedback during the robot's speech
- Author
Konishi, Yuki and Tanaka, Yoshihiro
- Subjects
Computer Science - Human-Computer Interaction, Computer Science - Robotics
- Abstract
This study aimed to develop a system that provides vibrotactile feedback corresponding to the emotional content of text when a communication robot speaks. We used OpenAI's "GPT-4o Mini" for emotion estimation, extracting valence and arousal values from the text. The amplitude and frequency of vibrotactile stimulation using sine waves were controlled on the basis of estimated emotional values. We assembled a palm-sized tactile display to present these vibrotactile stimuli. In the experiment, participants listened to the robot's speech while holding the device and then evaluated their psychological state. The results suggested that the communication accompanied by the vibrotactile feedback could influence psychological states and intimacy levels., Comment: Part of proceedings of 6th International Conference AsiaHaptics 2024
- Published
- 2024
11. This took us a Weyl: synthesis of a semimetallic Weyl ferromagnet with point Fermi surface
- Author
Belopolski, Ilya, Watanabe, Ryota, Sato, Yuki, Yoshimi, Ryutaro, Kawamura, Minoru, Nagahama, Soma, Zhao, Yilin, Shao, Sen, Jin, Yuanjun, Kato, Yoshihiro, Okamura, Yoshihiro, Zhang, Xiao-Xiao, Fujishiro, Yukako, Takahashi, Youtarou, Hirschberger, Max, Tsukazaki, Atsushi, Takahashi, Kei S., Chiu, Ching-Kai, Chang, Guoqing, Kawasaki, Masashi, Nagaosa, Naoto, and Tokura, Yoshinori
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
- Abstract
Quantum materials governed by emergent topological fermions have become a cornerstone of physics. Dirac fermions in graphene form the basis for moiré quantum matter, and Dirac fermions in magnetic topological insulators enabled the discovery of the quantum anomalous Hall effect. In contrast, there are few materials whose electromagnetic response is dominated by emergent Weyl fermions. Nearly all known Weyl materials are overwhelmingly metallic, and are largely governed by irrelevant, conventional electrons. Here we theoretically predict and experimentally observe a semimetallic Weyl ferromagnet in van der Waals (Cr,Bi)$_2$Te$_3$. In transport, we find a record bulk anomalous Hall angle $> 0.5$ along with non-metallic conductivity, a regime sharply distinct from conventional ferromagnets. Together with symmetry analysis, our data suggest a semimetallic Fermi surface composed of two Weyl points, with a giant separation $> 75\%$ of the linear dimension of the bulk Brillouin zone, and no other electronic states. Using state-of-the-art crystal synthesis techniques, we widely tune the electronic structure, allowing us to annihilate the Weyl state and visualize a unique topological phase diagram exhibiting broad Chern insulating, Weyl semimetallic and magnetic semiconducting regions. Our observation of a semimetallic Weyl ferromagnet offers an avenue toward novel correlated states and non-linear phenomena, as well as zero-magnetic-field Weyl spintronic and optical devices., Comment: Nature, in press
- Published
- 2024
12. A Bayesian nonparametric approach to mediation and spillover effects with multiple mediators in cluster-randomized trials
- Author
Ohnishi, Yuki and Li, Fan
- Subjects
Statistics - Methodology
- Abstract
Cluster randomized trials (CRTs) with multiple unstructured mediators present significant methodological challenges for causal inference due to within-cluster correlation, interference among units, and the complexity introduced by multiple mediators. Existing causal mediation methods often fall short in simultaneously addressing these complexities, particularly in disentangling mediator-specific effects under interference that are central to studying complex mechanisms. To address this gap, we propose new causal estimands for spillover mediation effects that differentiate the roles of each individual's own mediator and the spillover effects resulting from interactions among individuals within the same cluster. We establish identification results for each estimand and, to flexibly model the complex data structures inherent in CRTs, we develop a new Bayesian nonparametric prior, the Nested Dependent Dirichlet Process Mixture, designed to flexibly capture the outcome and mediator surfaces at different levels. We conduct extensive simulations across various scenarios to evaluate the frequentist performance of our methods, compare them with a Bayesian parametric counterpart, and illustrate our new methods in an analysis of a completed CRT., Comment: 71 pages
- Published
- 2024
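The Nested Dependent Dirichlet Process Mixture in the entry above is not specified in the abstract, but its basic building block, the Dirichlet process, is commonly represented by Sethuraman's stick-breaking construction. The sketch below draws truncated stick-breaking weights for a single, plain (non-nested, non-dependent) DP mixture of Gaussians; the truncation level `K`, concentration `alpha`, and base measure $\mathcal{N}(0, 5^2)$ are illustrative choices, not the paper's prior.

```python
import numpy as np


def stick_breaking_weights(alpha: float, K: int, rng: np.random.Generator) -> np.ndarray:
    """Truncated stick-breaking weights for a Dirichlet process DP(alpha, G0).

    v_k ~ Beta(1, alpha);  w_k = v_k * prod_{j<k} (1 - v_j).
    The last stick is set to v_K = 1 so the K weights sum exactly to 1.
    """
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: assign all remaining mass to the final component
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_dp_mixture(alpha: float, K: int, n: int, rng: np.random.Generator):
    """Draw n points from a truncated DP mixture of unit-variance Gaussians.

    Component means are drawn from the base measure G0 = N(0, 5^2).
    Returns the observations and their cluster labels.
    """
    w = stick_breaking_weights(alpha, K, rng)
    atoms = rng.normal(0.0, 5.0, size=K)      # component means drawn from G0
    labels = rng.choice(K, size=n, p=w)       # cluster assignment per point
    return rng.normal(atoms[labels], 1.0), labels
```

Smaller `alpha` concentrates mass on the first few sticks (few clusters); larger `alpha` spreads it out. Nested and dependent variants build on this by letting the weights or atoms themselves vary across levels or covariates.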
13. Single-layer spin-orbit-torque magnetization switching due to spin Berry curvature generated by minute spontaneous atomic displacement in a Weyl oxide
- Author
Horiuchi, Hiroto, Araki, Yasufumi, Wakabayashi, Yuki K., Ieda, Jun'ichi, Yamanouchi, Michihiko, Kaneta-Takada, Shingo, Taniyasu, Yoshitaka, Yamamoto, Hideki, Krockenberger, Yoshiharu, Tanaka, Masaaki, and Ohya, Shinobu
- Subjects
Condensed Matter - Materials Science, Physics - Applied Physics
- Abstract
Spin Berry curvature characterizes the band topology as the spin counterpart of Berry curvature and is crucial in generating novel spintronics functionalities. Breaking crystalline inversion symmetry is expected to enhance the spin Berry curvature significantly; this enhancement would increase the intrinsic spin Hall effect in ferromagnetic materials and, thus, the spin-orbit torques (SOTs). However, this intriguing approach has not yet been applied to devices; SOT magnetization switching generally relies on the extrinsic spin Hall effect in ferromagnet/heavy-metal bilayers. Here, SOT-induced partial magnetization switching is demonstrated in a single layer of the single-crystalline Weyl oxide SrRuO$_3$ (SRO) with a small current density of $\sim 3.1 \times 10^6$ A cm$^{-2}$. Detailed analysis of the crystal structure in the seemingly perfect periodic lattice of the SRO film reveals barely discernible oxygen octahedral rotations with angles of $\sim 5^\circ$ near the interface with the substrate. Tight-binding calculations indicate that a large spin Hall conductivity is induced around small gaps generated at band crossings by the synergy of inherent spin-orbit coupling and band inversion due to the rotations, causing magnetization reversal. Our results indicate that a minute atomic displacement in single-crystal films can induce strong intrinsic SOTs that are useful for spin-orbitronics devices., Comment: 39 pages, 5 figures in the main text, 9 figures in Supporting Information
- Published
- 2024
14. Clear Reduction in Spin Susceptibility and Superconducting Spin Rotation for $H \parallel a$ in the Early-Stage Sample of Spin-Triplet Superconductor UTe$_2$
- Author
Kitagawa, Shunsaku, Nakanishi, Kousuke, Matsumura, Hiroki, Takahashi, Yuki, Ishida, Kenji, Tokunaga, Yo, Sakai, Hironori, Kambe, Shinsaku, Nakamura, Ai, Shimizu, Yusei, Homma, Yoshiya, Li, Dexin, Honda, Fuminori, Miyake, Atsushi, and Aoki, Dai
- Subjects
Condensed Matter - Superconductivity, Condensed Matter - Strongly Correlated Electrons
- Abstract
We report a re-measurement of the $a$-axis spin susceptibility component in an early-stage sample of the spin-triplet superconductor UTe$_2$ with a transition temperature of $T_{\rm SC}$ = 1.6 K. Using Knight-shift measurements along the $b$ axis and at a 10-degree tilt from the $b$ axis toward the $a$ axis, we accurately determined the $a$-axis component without directly measuring the $a$-axis Knight shift. Our results reveal a decrease of approximately 3\% in the $a$-axis spin susceptibility in the superconducting state under an $a$-axis magnetic field $\mu_0 H_a \sim 0.1$ T, indicating that the spin susceptibility decreases similarly in both early-stage samples and ultraclean samples with $T_{\rm SC}$ = 2.1 K. We attribute the previously reported absence of a Knight-shift reduction to missing signal from the superconducting region and to the detection of residual signals from the non-superconducting region instead. We also found that the decrease in the $a$-axis spin susceptibility is immediately suppressed as the $a$-axis magnetic field increases and is estimated to be completely suppressed at around 1.5 T due to superconducting spin rotation., Comment: 5 pages, 3 figures
- Published
- 2024
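The indirect determination of the $a$-axis component in the UTe$_2$ entry above can be illustrated under the common assumption that, for a field rotated by an angle $\theta$ from $b$ toward $a$, the Knight shift follows $K(\theta) = K_b\cos^2\theta + K_a\sin^2\theta$. This relation is an assumption of the sketch (the abstract does not state the analysis formula), and the function names are hypothetical.

```python
import math


def knight_shift_at_angle(k_a: float, k_b: float, theta_deg: float) -> float:
    """Assumed angular dependence: K(theta) = K_b cos^2(theta) + K_a sin^2(theta)."""
    t = math.radians(theta_deg)
    return k_b * math.cos(t) ** 2 + k_a * math.sin(t) ** 2


def infer_k_a(k_tilted: float, k_b: float, theta_deg: float = 10.0) -> float:
    """Invert the assumed relation for K_a, given K along b and K at tilt theta."""
    t = math.radians(theta_deg)
    return (k_tilted - k_b * math.cos(t) ** 2) / math.sin(t) ** 2


def error_amplification(theta_deg: float = 10.0) -> float:
    """Factor by which an error in K(theta) propagates into K_a: 1 / sin^2(theta)."""
    return 1.0 / math.sin(math.radians(theta_deg)) ** 2
```

Under this assumption, a 10° tilt gives $\sin^2\theta \approx 0.030$, so any error in the tilted-field shift enters the inferred $K_a$ amplified by a factor of about 33; the inversion is only as good as the precision of the $b$-axis and tilted measurements.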
15. Music Foundation Model as Generic Booster for Music Downstream Tasks
- Author
Liao, WeiHsiang, Takida, Yuhta, Ikemiya, Yukara, Zhong, Zhi, Lai, Chieh-Hsin, Fabbro, Giorgio, Shimada, Kazuki, Toyama, Keisuke, Cheuk, Kinwai, Martínez-Ramírez, Marco A., Takahashi, Shusuke, Uhlich, Stefan, Akama, Taketo, Choi, Woosung, Koyama, Yuichiro, and Mitsufuji, Yuki
- Subjects
Computer Science - Sound, Computer Science - Information Retrieval, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
We demonstrate the efficacy of using intermediate representations from a single foundation model to enhance various music downstream tasks. We introduce SoniDo, a music foundation model (MFM) designed to extract hierarchical features from target music samples. By leveraging hierarchical intermediate features, SoniDo constrains the information granularity, leading to improved performance across various downstream tasks including both understanding and generative tasks. We specifically evaluated this approach on representative tasks such as music tagging, music transcription, music source separation, and music mixing. Our results reveal that the features extracted from foundation models provide valuable enhancements in training downstream task models. This highlights the capability of using features extracted from music foundation models as a booster for downstream tasks. Our approach not only benefits existing task-specific models but also supports music downstream tasks constrained by data scarcity. This paves the way for more effective and accessible music processing solutions., Comment: 41 pages with 14 figures
- Published
- 2024
16. All-frequency Full-body Human Image Relighting
- Author
Tajima, Daichi, Kanamori, Yoshihiro, and Endo, Yuki
- Subjects
Computer Science - Graphics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Relighting of human images enables post-photography editing of lighting effects in portraits. The current mainstream approach uses neural networks to approximate lighting effects without explicitly accounting for the principle of physical shading. As a result, it often has difficulty representing high-frequency shadows and shading. In this paper, we propose a two-stage relighting method that can reproduce physically-based shadows and shading from low to high frequencies. The key idea is to approximate an environment light source with a fixed number of area light sources. The first stage employs supervised inverse rendering from a single image using neural networks and calculates physically-based shading. The second stage then calculates the shadow for each area light and sums the results to render the final image. We propose to make soft shadow mapping differentiable for the area-light approximation of environment lighting. We demonstrate that our method can plausibly reproduce all-frequency shadows and shading caused by environment illumination, which have been difficult to reproduce using existing methods., Comment: project page: https://github.com/majita06/All-frequency_Full-body_Human_Image_Relighting
- Published
- 2024
17. A Practical Style Transfer Pipeline for 3D Animation: Insights from Production R&D
- Author
Todo, Hideki, Koyama, Yuki, Sakai, Kunihiro, Komiya, Akihiro, and Kato, Jun
- Subjects
Computer Science - Graphics
- Abstract
Our animation studio has developed a practical style transfer pipeline for creating stylized 3D animation, which is suitable for complex real-world production. This paper presents the insights from our development process, where we explored various options to balance quality, artist control, and workload, leading to several key decisions. For example, we chose patch-based texture synthesis over machine learning for better control and to avoid training data issues. We also addressed specifying style exemplars, managing multiple colors within a scene, controlling outlines and shadows, and reducing temporal noise. These insights were used to further refine our pipeline, ultimately enabling us to produce an experimental short film showcasing various styles.
- Published
- 2024
18. Neural Network Matrix Product Operator: A Multi-Dimensionally Integrable Machine Learning Potential
- Author
Hino, Kentaro and Kurashige, Yuki
- Subjects
Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Condensed Matter - Statistical Mechanics, Physics - Chemical Physics, Quantum Physics
- Abstract
A neural network-based machine learning potential energy surface (PES) expressed in a matrix product operator (NN-MPO) is proposed. The MPO form enables efficient evaluation of the high-dimensional integrals that arise in solving the time-dependent and time-independent Schrödinger equations and effectively overcomes the so-called curse of dimensionality. This starkly contrasts with other neural network-based machine learning PES methods, such as multi-layer perceptrons (MLPs), where evaluating high-dimensional integrals is not straightforward due to the fully connected topology in their backbone architecture. Nevertheless, the NN-MPO retains the high representational capacity of neural networks. NN-MPO can achieve spectroscopic accuracy with a test mean absolute error (MAE) of 3.03 cm$^{-1}$ for a fully coupled six-dimensional ab initio PES, using only 625 training points distributed across a 0 to 17,000 cm$^{-1}$ energy range. Our Python implementation is available at https://github.com/KenHino/Pompon., Comment: 11 pages, 10 figures
- Published
- 2024
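Why an MPO form makes high-dimensional integrals tractable, as claimed in the NN-MPO entry above, can be seen on a toy example. This sketch does not reproduce NN-MPO (whose cores come from neural networks); it simply assumes a function already in tensor-train (MPO-like) form, $f(x_1,\dots,x_d) = G_1(x_1)G_2(x_2)\cdots G_d(x_d)$, on a grid, and shows that a weighted sum over all $n^d$ grid points equals a product of $d$ small matrices obtained by contracting each core separately. Grid size, rank, and the random cores are illustrative.

```python
import itertools

import numpy as np


def random_tt_cores(d: int, n: int, r: int, rng: np.random.Generator):
    """Random tensor-train cores; core k has shape (r_{k-1}, n, r_k), r_0 = r_d = 1."""
    ranks = [1] + [r] * (d - 1) + [1]
    return [rng.normal(size=(ranks[k], n, ranks[k + 1])) for k in range(d)]


def tt_value(cores, idx) -> float:
    """Evaluate f at one grid point by multiplying the selected core slices."""
    out = np.ones((1, 1))
    for core, i in zip(cores, idx):
        out = out @ core[:, i, :]
    return float(out[0, 0])


def tt_sum(cores, weights=None) -> float:
    """Weighted sum over all n^d grid points (a product quadrature rule).

    Each core is contracted over its own grid index first; the resulting
    r x r matrices are then multiplied. Cost is O(d n r^2) instead of O(n^d).
    """
    out = np.ones((1, 1))
    for core in cores:
        w = np.ones(core.shape[1]) if weights is None else weights
        out = out @ np.einsum("anb,n->ab", core, w)
    return float(out[0, 0])


def brute_force_sum(cores) -> float:
    """Reference: visit every grid point explicitly (exponential in d)."""
    n = cores[0].shape[1]
    return sum(tt_value(cores, idx)
               for idx in itertools.product(range(n), repeat=len(cores)))
```

For $d = 6$ and $n = 8$, `tt_sum` needs six small matrix products, whereas the brute-force loop visits $8^6 = 262{,}144$ points; both return the same value up to rounding. A fully connected network offers no such factorization, which is the contrast the abstract draws with MLP-based potentials.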
19. Giant Seebeck Effect in PEDOT Materials with Molecular Strain
- Author
Osada, Yuki, Takagi, Ryo, Arimatsu, Hideki, and Fujima, Takuya
- Subjects
Condensed Matter - Materials Science, Physics - Applied Physics
- Abstract
Poly(3,4-ethylenedioxythiophene) (PEDOT) has attracted attention as a thermoelectric material for room-temperature use because of its flexibility and non-toxicity. However, PEDOT reportedly generates insufficient thermoelectric power for practical use. This work attempted to improve the Seebeck coefficient by introducing molecular strain into PEDOT molecules, loading polystyrene sulfonate (PSS)-free PEDOT onto a polyethylene terephthalate (PET) fiber. Raman spectroscopy revealed that PEDOT materials with significant compression of the C$_\alpha$-C$_\alpha$ bond and extension of the C$_\alpha$=C$_\beta$ bond exhibit Seebeck coefficients two orders of magnitude larger than usual. Furthermore, strain in the C$_\beta$-C$_\beta$ bond correlated strongly with the Seebeck coefficient, which varied over a broad range from -2100 to 3100 μV K$^{-1}$. This variation indicates that the molecular strain forms a sharp peak or valley in the density of states (DOS) near the Fermi level, which shifts gradually with the C$_\beta$-C$_\beta$ strain. This molecular-strain-induced giant Seebeck effect is expected to be applicable to other polythiophene molecules., Comment: 10 pages, 7 figures
- Published
- 2024
20. On computational complexity of unitary and state design properties
- Author
Nakata, Yoshifumi, Takeuchi, Yuki, Kliesch, Martin, and Darmawan, Andrew
- Subjects
Quantum Physics, Condensed Matter - Statistical Mechanics, High Energy Physics - Theory, Mathematical Physics
- Abstract
We study unitary and state $t$-designs from a computational complexity theory perspective. First, we address the problems of computing frame potentials that characterize (approximate) $t$-designs. We provide a quantum algorithm for computing the frame potential and show that 1. exact computation can be achieved by a single query to a $\# \textsf{P}$-oracle and is $\# \textsf{P}$-hard, 2. for state vectors, it is $\textsf{BQP}$-complete to decide whether the frame potential is larger than or smaller than certain values, if the promise gap between the two values is inverse-polynomial in the number of qubits, and 3. both for state vectors and unitaries, it is $\textsf{PP}$-complete if the promise gap is exponentially small. As the frame potential is closely related to the out-of-time-ordered correlators (OTOCs), our result implies that computing the OTOCs with exponential accuracy is also hard. Second, we address promise problems to decide whether a given set is a good or bad approximation to a $t$-design and show that this problem is in $\textsf{PP}$ for any constant $t$ and is $\textsf{PP}$-hard for $t=1,2$ and $3$. Remarkably, this is the case even if a given set is promised to be either exponentially close to or worse than constant away from a $1$-design. Our results illustrate the computationally hard nature of unitary and state designs., Comment: 22 pages, 1 figure, and 1 table
- Published
- 2024
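For tiny instances, the frame potential in the entry above can simply be computed by brute force, which makes the quantity concrete even though the paper's point is that the problem is hard in general. This minimal sketch for states uses the standard definition $F^{(t)} = \frac{1}{|E|^2}\sum_{\psi,\phi\in E}|\langle\psi|\phi\rangle|^{2t}$ for a uniformly weighted ensemble $E$ and the Haar value $1/\binom{d+t-1}{t}$; an ensemble is a state $t$-design exactly when the two agree. The six single-qubit Pauli eigenstates serve as a known example of a 3-design that is not a 4-design.

```python
from math import comb

import numpy as np


def state_frame_potential(states: np.ndarray, t: int) -> float:
    """F^(t) = (1/|E|^2) * sum_{psi, phi} |<psi|phi>|^(2t), uniform weights.

    `states` has shape (N, d); each row is a normalized state vector.
    """
    overlaps = np.abs(states.conj() @ states.T) ** 2  # |<psi_i|psi_j>|^2
    return float(np.mean(overlaps ** t))


def haar_frame_potential(d: int, t: int) -> float:
    """Haar (minimal) value 1 / C(d + t - 1, t); equality characterizes a t-design."""
    return 1.0 / comb(d + t - 1, t)


# Eigenstates of Pauli Z, X, Y: the single-qubit "octahedron" ensemble.
s = 1 / np.sqrt(2)
PAULI_STATES = np.array([
    [1, 0], [0, 1],              # Z eigenstates
    [s, s], [s, -s],             # X eigenstates
    [s, 1j * s], [s, -1j * s],   # Y eigenstates
], dtype=complex)

if __name__ == "__main__":
    for t in range(1, 5):
        print(t, round(state_frame_potential(PAULI_STATES, t), 4),
              round(haar_frame_potential(2, t), 4))
```

The values agree for $t = 1, 2, 3$ and differ at $t = 4$ ($7.5/36 \approx 0.208$ versus $0.2$). Brute force like this scales with $|E|^2$ and with the Hilbert-space dimension $2^n$, so it is only feasible for a few qubits.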
21. Quarkyonic matter pieces together the hyperon puzzle
- Author
Fujimoto, Yuki, Kojo, Toru, and McLerran, Larry
- Subjects
Nuclear Theory, Astrophysics - High Energy Astrophysical Phenomena, High Energy Physics - Phenomenology
- Abstract
Matter composed of hyperons has been hypothesized to occur in neutron stars at densities slightly above the nuclear saturation density and in many descriptions gives rise to a significant softening in the equation of state (EoS). This softening would be at odds with the constraints from neutron star observations and ab initio nuclear matter computations at low density. This inconsistency is known as the hyperon puzzle. We show that Quarkyonic Matter models, which take into account the quark substructure of baryons, can mitigate the hyperon puzzle. We demonstrate two important consequences of the quark substructure effects. First, the hyperon threshold is shifted to a higher density as neutrons preoccupy the phase space for down quarks, preventing the emergence of hyperons at low energy. Second, the softening in the EoS becomes mild even above the hyperon threshold density because little phase space is available for low-energy hyperons; increasing hyperon density quickly drives hyperons into the relativistic regime. In this work, we illustrate these two effects for matter composed of charge-neutral baryons, using the ideal dual Quarkyonic (IdylliQ) model for three flavors. This model incorporates the Quarkyonic duality and allows us to manifestly express the quark Pauli blocking constraints in terms of the baryon occupation probability. The extension to neutron star matter is also briefly discussed., Comment: 20 pages, 6 figures
- Published
- 2024
22. Component Modularized Design of Musculoskeletal Humanoid Platform Musashi to Investigate Learning Control Systems
- Author
Kawaharazuka, Kento, Makino, Shogo, Tsuzuki, Kei, Onitsuka, Moritaka, Nagamatsu, Yuya, Shinjo, Koki, Makabe, Tasuku, Asano, Yuki, Okada, Kei, Kawasaki, Koji, and Inaba, Masayuki
- Subjects
Computer Science - Robotics
- Abstract
To develop Musashi as a musculoskeletal humanoid platform to investigate learning control systems, we aimed for a body with flexible musculoskeletal structure, redundant sensors, and easily reconfigurable structure. For this purpose, we develop joint modules that can directly measure joint angles, muscle modules that can realize various muscle routes, and nonlinear elastic units with soft structures, etc. Next, we develop MusashiLarm, a musculoskeletal platform composed of only joint modules, muscle modules, generic bone frames, muscle wire units, and a few attachments. Finally, we develop Musashi, a musculoskeletal humanoid platform which extends MusashiLarm to the whole body design, and conduct several basic experiments and learning control experiments to verify the effectiveness of its concept., Comment: Accepted at IROS2019
- Published
- 2024
23. Role of Ion Milling Angle in Determining Conducting and Insulating States on SrTiO3 Surfaces
- Author
-
Wakabayashi, Yuki K., Krockenberger, Yoshiharu, Takiguchi, Kosuke, Yamamoto, Hideki, and Taniyasu, Yoshitaka
- Subjects
Condensed Matter - Materials Science - Abstract
SrTiO3 (STO), a promising wide-bandgap semiconductor for high-k capacitors and photocatalysis, requires precise surface control for device fabrication. This study investigates the impact of ion milling on STO's surface conductivity. We find that ion milling at incident angles below 10 degrees preserves the insulating state, while ion milling at larger angles induces a conducting surface with high electron mobility (5000-11000 cm²/(V·s)). This transition is attributed to the milling penetration depth exceeding the STO lattice constant (3.905 Å). Our results provide valuable insights for optimizing STO-based device fabrication, enabling precise control over surface properties while maintaining desired insulating characteristics.
- Published
- 2024
24. Trajectory Flow Matching with Applications to Clinical Time Series Modeling
- Author
-
Zhang, Xi, Pu, Yuan, Kawamura, Yuki, Loza, Andrew, Bengio, Yoshua, Shung, Dennis L., and Tong, Alexander
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning - Abstract
Modeling stochastic and irregularly sampled time series is a challenging problem found in a wide range of applications, especially in medicine. Neural stochastic differential equations (Neural SDEs) are an attractive modeling technique for this problem, which parameterize the drift and diffusion terms of an SDE with neural networks. However, current algorithms for training Neural SDEs require backpropagation through the SDE dynamics, greatly limiting their scalability and stability. To address this, we propose Trajectory Flow Matching (TFM), which trains a Neural SDE in a simulation-free manner, bypassing backpropagation through the dynamics. TFM leverages the flow matching technique from generative modeling to model time series. In this work we first establish necessary conditions for TFM to learn time series data. Next, we present a reparameterization trick which improves training stability. Finally, we adapt TFM to the clinical time series setting, demonstrating improved performance on three clinical time series datasets both in terms of absolute performance and uncertainty prediction., Comment: NeurIPS 2024 Spotlight
- Published
- 2024
25. Self-Shielding Enhanced Organics Synthesis in an Early Reduced Earth's Atmosphere
- Author
-
Yoshida, Tatsuya, Koyama, Shungo, Nakamura, Yuki, Terada, Naoki, and Kuramoto, Kiyoshi
- Subjects
Astrophysics - Earth and Planetary Astrophysics ,Physics - Atmospheric and Oceanic Physics - Abstract
Earth is expected to have acquired a reduced proto-atmosphere enriched in H2 and CH4 through the accretion of building blocks that contain metallic Fe and/or the gravitational trapping of surrounding nebula gas. Such an early, wet, reduced atmosphere that covers a proto-ocean would then ultimately evolve toward oxidized chemical compositions through photochemical processes that involve reactions with H2O-derived oxidant radicals and the selective escape of hydrogen to space. During this time, atmospheric CH4 could be photochemically reprocessed to generate not only C-bearing oxides but also organics. However, the branching ratio between organic matter formation and oxidation remains unknown despite its significance for the abiotic chemical evolution of early Earth. Here, we show via numerical analyses that UV absorption by gaseous hydrocarbons such as C2H2 and C3H4 significantly suppresses H2O photolysis and the subsequent CH4 oxidation during the photochemical evolution of a wet proto-atmosphere enriched in H2 and CH4. As a result, nearly half of the initial CH4 is converted to heavier organics, along with the deposition of prebiotically essential molecules such as HCN and H2CO on the surface of a primordial ocean over a geological timescale of order 10-100 Myr. Our results suggest that the accumulation of organics and prebiotically important molecules in the proto-ocean could produce a soup enriched in various organics, which might have eventually led to the emergence of living organisms., Comment: 35 pages, 12 figures
- Published
- 2024
26. Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
- Author
-
Lee, Junwon, Tailleur, Modan, Heller, Laurie M., Choi, Keunwoo, Lagrange, Mathieu, McFee, Brian, Imoto, Keisuke, and Okamoto, Yuki
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Despite significant advancements in neural text-to-audio generation, challenges persist in controllability and evaluation. This paper addresses these issues through the Sound Scene Synthesis challenge held as part of the Detection and Classification of Acoustic Scenes and Events 2024. We present an evaluation protocol combining an objective metric, namely Fréchet Audio Distance, with perceptual assessments, utilizing a structured prompt format to enable diverse captions and effective evaluation. Our analysis reveals varying performance across sound categories and model architectures, with larger models generally excelling but innovative lightweight approaches also showing promise. The strong correlation between objective metrics and human ratings validates our evaluation approach. We discuss outcomes in terms of audio quality, controllability, and architectural considerations for text-to-audio synthesizers, providing direction for future research., Comment: accepted to NeurIPS 2024 Workshop: Audio Imagination
- Published
- 2024
27. JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
- Author
-
Onohara, Shota, Miyai, Atsuyuki, Imajuku, Yuki, Egashira, Kazuki, Baek, Jeonghun, Yue, Xiang, Neubig, Graham, and Aizawa, Kiyoharu
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context. To facilitate comprehensive culture-aware evaluation, JMMMU features two complementary subsets: (i) a culture-agnostic (CA) subset, where culture-independent subjects (e.g., Math) are selected and translated into Japanese, enabling one-to-one comparison with its English counterpart MMMU; and (ii) a culture-specific (CS) subset, comprising newly crafted subjects that reflect the Japanese cultural context. Using the CA subset, we observe a performance drop in many LMMs when they are evaluated in Japanese, which is purely attributable to language variation. Using the CS subset, we reveal their inadequate Japanese cultural understanding. Further, by combining both subsets, we identify that some LMMs perform well on the CA subset but not on the CS subset, exposing a shallow understanding of the Japanese language that lacks depth in cultural understanding. We hope this work will not only help advance LMM performance in Japanese but also serve as a guideline to create high-standard, culturally diverse benchmarks for multilingual LMM development. The project page is https://mmmu-japanese-benchmark.github.io/JMMMU/., Comment: Project page: https://mmmu-japanese-benchmark.github.io/JMMMU/
- Published
- 2024
28. A method to detect the VUV photons from cooled $^{229}$Th:CaF$_2$ crystals
- Author
-
Guan, Ming, Bartokos, Michael, Beeks, Kjeld, Fukunaga, Yuta, Hiraki, Takahiro, Masuda, Takahiko, Miyamoto, Yuki, Okage, Ryoichiro, Okai, Koichi, Sasao, Noboru, Schaden, Fabian, Schumm, Thorsten, Shimizu, Koutaro, Takatori, Sayuri, Yoshimi, Akihiro, and Yoshimura, Koji
- Subjects
Physics - Instrumentation and Detectors ,Nuclear Experiment - Abstract
Thorium-229, with its exceptionally low-energy nuclear excited state, is a key candidate for developing nuclear clocks. $^{229}$Th-doped CaF$_2$ crystals, benefiting from calcium fluoride's wide band gap, show great promise as solid-state nuclear clock materials. These crystals are excited by vacuum ultraviolet (VUV) lasers, which over time cause radiation damage. Cooling the crystals can mitigate this damage but introduces a challenge: photoabsorption. This occurs when residual gas molecules condense on the crystal surface, absorbing VUV photons and deteriorating detection efficiency. To solve this, we developed a cooling technique using a copper shield to surround the crystal, acting as a cold trap. This prevents ice-layer formation, even at temperatures below $-100\,^\circ$C, preserving high VUV photon detection efficiency. Our study details the experimental cooling setup and demonstrates the effectiveness of the copper shield in maintaining crystal performance, a critical improvement for future solid-state nuclear clocks operating at cryogenic temperatures., Comment: 5 figures
- Published
- 2024
29. AI-Driven Approaches for Glaucoma Detection -- A Comprehensive Review
- Author
-
Hagiwara, Yuki, Ciora, Octavia-Andreea, Monnet, Maureen, Lancho, Gino, and Lorenz, Jeanette Miriam
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
The diagnosis of glaucoma plays a critical role in the management and treatment of this vision-threatening disease. Glaucoma is a group of eye diseases that cause blindness by damaging the optic nerve at the back of the eye. Often called the "silent thief of sight", it exhibits no symptoms during the early stages. Therefore, early detection is crucial to prevent vision loss. With the rise of Artificial Intelligence (AI), particularly Deep Learning (DL) techniques, Computer-Aided Diagnosis (CADx) systems have emerged as promising tools to assist clinicians in accurately diagnosing glaucoma early. This paper aims to provide a comprehensive overview of AI techniques utilized in CADx systems for glaucoma diagnosis. Through a detailed analysis of current literature, we identify key gaps and challenges in these systems, emphasizing the need for improved safety, reliability, interpretability, and explainability. By identifying research gaps, we aim to advance the field of CADx systems, especially for the early diagnosis of glaucoma, in order to prevent any potential loss of vision.
- Published
- 2024
30. OpenMU: Your Swiss Army Knife for Music Understanding
- Author
-
Zhao, Mengjie, Zhong, Zhi, Mao, Zhuoyuan, Yang, Shiqi, Liao, Wei-Hsiang, Takahashi, Shusuke, Wakaki, Hiromi, and Mitsufuji, Yuki
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
We present OpenMU-Bench, a large-scale benchmark suite for addressing the data scarcity issue in training multimodal language models to understand music. To construct OpenMU-Bench, we leveraged existing datasets and bootstrapped new annotations. OpenMU-Bench also broadens the scope of music understanding by including lyrics understanding and music tool usage. Using OpenMU-Bench, we trained our music understanding model, OpenMU, with extensive ablations, demonstrating that OpenMU outperforms baseline models such as MU-Llama. Both OpenMU and OpenMU-Bench are open-sourced to facilitate future research in music understanding and to enhance creative music production efficiency., Comment: Resources: https://github.com/mzhaojp22/openmu
- Published
- 2024
31. Wireless Link Quality Estimation Using LSTM Model
- Author
-
Kanto, Yuki and Watabe, Kohei
- Subjects
Computer Science - Networking and Internet Architecture ,Computer Science - Machine Learning - Abstract
In recent years, various services have been provided through high-speed and high-capacity wireless networks on mobile communication devices, necessitating stable communication regardless of indoor or outdoor environments. To achieve stable communication, it is essential to implement proactive measures, such as switching to an alternative path and ensuring data buffering before the communication quality becomes unstable. The technology of Wireless Link Quality Estimation (WLQE), which predicts the communication quality of wireless networks in advance, plays a crucial role in this context. In this paper, we propose a novel WLQE model for estimating the communication quality of wireless networks by leveraging sequential information. Our proposed method is based on Long Short-Term Memory (LSTM), enabling highly accurate estimation by considering the sequential information of link quality. We conducted a comparative evaluation with the conventional model, stacked autoencoder-based link quality estimator (LQE-SAE), using a dataset recorded in real-world environmental conditions. Our LSTM-based LQE model demonstrates its superiority, achieving a 4.0% higher accuracy and a 4.6% higher macro-F1 score than the LQE-SAE model in the evaluation., Comment: This paper was submitted to IEEE Network Operations and Management Symposium
- Published
- 2024
32. Differentially Private Covariate Balancing Causal Inference
- Author
-
Ohnishi, Yuki and Awan, Jordan
- Subjects
Statistics - Methodology ,Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
Differential privacy is the leading mathematical framework for privacy protection, providing a probabilistic guarantee that safeguards individuals' private information when publishing statistics from a dataset. This guarantee is achieved by applying a randomized algorithm to the original data, which introduces unique challenges in data analysis by distorting inherent patterns. In particular, causal inference using observational data in privacy-sensitive contexts is challenging because it requires covariate balance between treatment groups, yet checking the true covariates is prohibited to prevent leakage of sensitive information. In this article, we present a differentially private two-stage covariate balancing weighting estimator to infer causal effects from observational data. Our algorithm produces both point and interval estimators with statistical guarantees, such as consistency and rate optimality, under a given privacy budget., Comment: 30 pages
- Published
- 2024
33. Mitigating Embedding Collapse in Diffusion Models for Categorical Data
- Author
-
Nguyen, Bac, Lai, Chieh-Hsin, Takida, Yuhta, Murata, Naoki, Uesaka, Toshimitsu, Ermon, Stefano, and Mitsufuji, Yuki
- Subjects
Computer Science - Machine Learning - Abstract
Latent diffusion models have enabled continuous-state diffusion models to handle a variety of datasets, including categorical data. However, most methods rely on fixed pretrained embeddings, limiting the benefits of joint training with the diffusion model. While jointly learning the embedding (via reconstruction loss) and the latent diffusion model (via score matching loss) could enhance performance, our analysis shows that end-to-end training risks embedding collapse, degrading generation quality. To address this issue, we introduce CATDM, a continuous diffusion framework within the embedding space that stabilizes training. We propose a novel objective combining the joint embedding-diffusion variational lower bound with a Consistency-Matching (CM) regularizer, alongside a shifted cosine noise schedule and random dropping strategy. The CM regularizer ensures the recovery of the true data distribution. Experiments on benchmarks show that CATDM mitigates embedding collapse, yielding superior results on FFHQ, LSUN Churches, and LSUN Bedrooms. In particular, CATDM achieves an FID of 6.81 on ImageNet $256\times256$ with 50 steps. It outperforms non-autoregressive models in machine translation and is on a par with previous methods in text generation.
- Published
- 2024
34. Integrating Temporal Representations for Dynamic Memory Retrieval and Management in Large Language Models
- Author
-
Hou, Yuki, Tamoto, Haruki, and Miyashita, Homei
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Conventional dialogue agents often struggle with effective memory recall, leading to redundant retrieval and inadequate management of unique user associations. To address this, we propose SynapticRAG, a novel approach integrating synaptic dynamics into Retrieval-Augmented Generation (RAG). SynapticRAG integrates temporal representations into memory vectors, mimicking biological synapses by differentiating events based on occurrence times and dynamically updating memory significance. This model employs temporal scoring for memory connections and a synaptic-inspired propagation control mechanism. Experiments across English, Japanese, and Chinese datasets demonstrate SynapticRAG's superiority over existing methods, including traditional RAG, with up to 14.66% improvement in memory retrieval accuracy. Our approach advances context-aware dialogue AI systems by enhancing long-term context maintenance and specific information extraction from conversations.
- Published
- 2024
35. Regularization of matrices in the covariant derivative interpretation of matrix models
- Author
-
Hattori, Keiichiro, Mizuno, Yuki, and Tsuchiya, Asato
- Subjects
High Energy Physics - Theory - Abstract
We study regularization of matrices in the covariant derivative interpretation of matrix models, a typical example of which is the type IIB matrix model. The covariant derivative interpretation provides a possible way in which curved spacetimes are described by matrices, which are viewed as differential operators. One needs to regularize the operators as matrices with finite size in order to apply the interpretation to nonperturbative calculations such as numerical simulations. We develop a regularization of the covariant derivatives in two dimensions by using the Berezin-Toeplitz quantization. As examples, we examine the cases of $S^2$ and $T^2$ in detail., Comment: 32 pages, 2 figures
- Published
- 2024
36. Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation
- Author
-
Shimizu, Ryotaro, Wada, Takashi, Wang, Yu, Kruse, Johannes, O'Brien, Sean, HtaungKham, Sai, Song, Linxin, Yoshikawa, Yuya, Saito, Yuki, Tsung, Fugee, Goto, Masayuki, and McAuley, Julian
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Information Retrieval - Abstract
Recent research on explainable recommendation generally frames the task as a standard text generation problem, and evaluates models simply based on the textual similarity between the predicted and ground-truth explanations. However, this approach fails to consider one crucial aspect of the systems: whether their outputs accurately reflect the users' (post-purchase) sentiments, i.e., whether and why they would like and/or dislike the recommended items. To shed light on this issue, we introduce new datasets and evaluation methods that focus on the users' sentiments. Specifically, we construct the datasets by explicitly extracting users' positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users' sentiments, and 2) accurately identify both positive and negative opinions of users on the target items. We benchmark several recent models on our datasets and demonstrate that achieving strong performance on existing metrics does not ensure that the generated explanations align well with the users' sentiments. Lastly, we find that existing models can provide more sentiment-aware explanations when the users' (predicted) ratings for the target items are directly fed into the models as input. We will release our code and datasets upon acceptance.
- Published
- 2024
37. EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View
- Author
-
Wang, Zhaorong, Kanamori, Yoshihiro, and Endo, Yuki
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics - Abstract
Generalizable neural radiance field (NeRF) enables neural-based digital human rendering without per-scene retraining. When combined with human prior knowledge, high-quality human rendering can be achieved even with sparse input views. However, the inference of these methods is still slow, as a large number of neural network queries on each ray are required to ensure the rendering quality. Moreover, occluded regions often suffer from artifacts, especially when the input views are sparse. To address these issues, we propose a generalizable human NeRF framework that achieves high-quality and real-time rendering with sparse input views by extensively leveraging human prior knowledge. We accelerate the rendering with a two-stage sampling reduction strategy: first constructing boundary meshes around the human geometry to reduce the number of ray samples for sampling guidance regression, and then volume rendering using fewer guided samples. To improve rendering quality, especially in occluded regions, we propose an occlusion-aware attention mechanism to extract occlusion information from the human priors, followed by an image space refinement network to improve rendering quality. Furthermore, for volume rendering, we adopt a signed ray distance function (SRDF) formulation, which allows us to propose an SRDF loss at every sample position to improve the rendering quality further. Our experiments demonstrate that our method outperforms the state-of-the-art methods in rendering quality and has a competitive rendering speed compared with speed-prioritized novel view synthesis methods., Comment: project page: https://github.com/LarsPh/EG-HumanNeRF
- Published
- 2024
38. The Physical Origin of Extreme Emission Line Galaxies at High Redshifts: Strong [O III] Emission Lines Produced by Obscured AGNs
- Author
-
Zhu, Chenghao, Harikane, Yuichi, Ouchi, Masami, Ono, Yoshiaki, Onodera, Masato, Tang, Shenli, Isobe, Yuki, Matsuoka, Yoshiki, Kawaguchi, Toshihiro, Umeda, Hiroya, Nakajima, Kimihiko, Liang, Yongming, Xu, Yi, Zhang, Yechi, Sun, Dongsheng, Shimasaku, Kazuhiro, Greene, Jenny, Iwasawa, Kazushi, Kohno, Kotaro, Nagao, Tohru, Schulze, Andreas, Shibuya, Takatoshi, Hilmi, Miftahul, and Schramm, Malte
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
We present deep Subaru/FOCAS spectra for two extreme emission line galaxies (EELGs) at $z\sim 1$ with strong [O III]$\lambda$5007 emission lines, exhibiting equivalent widths (EWs) of $2905^{+946}_{-578}$ Å and $2000^{+188}_{-159}$ Å, comparable to those of EELGs at high redshifts that are now routinely identified with JWST spectroscopy. Adding to our sample an EELG at $z\sim 2$ with a similarly large [O III] EW ($2508^{+1487}_{-689}$ Å) found in the JWST CEERS survey, we explore the physical origins of the large [O III] EWs of these three galaxies with the Subaru spectra and various public data, including JWST/NIRSpec, NIRCam, and MIRI data. While no clear signatures of AGN are identified by the optical line diagnostics, we find that both of the two galaxies covered by the MIRI data show a strong near-infrared excess in their spectral energy distributions (SEDs), indicating obscured AGN. Because none of the three galaxies shows clear broad H$\beta$ lines, the upper limits on the flux ratios of broad H$\beta$ to [O III] lines are small ($\lesssim 0.15$), comparable to those of Seyfert 1.8-2.0 galaxies. We conduct Cloudy modeling with the stellar and AGN incident spectra, allowing a wide range of parameters including metallicities and ionization parameters. We find that the large [O III] EWs are not self-consistently reproduced by the spectra of stars or unobscured AGN, but rather by obscured AGN that efficiently produce O$^{++}$ ionizing photons, with weak nuclear and stellar continua that are consistent with the SED shapes., Comment: submitted to ApJ
- Published
- 2024
39. Learning to Ground VLMs without Forgetting
- Author
-
Bhowmik, Aritra, Derakhshani, Mohammad Mahdi, Koelma, Dennis, Oswald, Martin R., Asano, Yuki M., and Snoek, Cees G. M.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Spatial awareness is key to enabling embodied multimodal AI systems. Yet, without vast amounts of spatial supervision, current Visual Language Models (VLMs) struggle with this task. In this paper, we introduce LynX, a framework that equips pretrained VLMs with visual grounding ability without forgetting their existing image and language understanding skills. To this end, we propose a Dual Mixture of Experts module that modifies only the decoder layer of the language model, using one frozen Mixture of Experts (MoE) pre-trained on image and language understanding and another learnable MoE for new grounding capabilities. This allows the VLM to retain previously learned knowledge and skills, while acquiring what is missing. To train the model effectively, we generate a high-quality synthetic dataset we call SCouT, which mimics human reasoning in visual grounding. This dataset provides rich supervision signals, describing a step-by-step multimodal reasoning process, thereby simplifying the task of visual grounding. We evaluate LynX on several object detection and visual grounding datasets, demonstrating strong performance in object detection, zero-shot localization and grounded reasoning while maintaining its original image and language understanding capabilities on seven standard benchmark datasets.
- Published
- 2024
40. TULIP: Token-length Upgraded CLIP
- Author
-
Najdenkoska, Ivona, Derakhshani, Mohammad Mahdi, Asano, Yuki M., van Noord, Nanne, Worring, Marcel, and Snoek, Cees G. M.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We address the challenge of representing long captions in vision-language models, such as CLIP. By design these models are limited by fixed, absolute positional encodings, restricting inputs to a maximum of 77 tokens and hindering performance on tasks requiring longer descriptions. Although recent work has attempted to overcome this limit, their proposed approaches struggle to model token relationships over longer distances and simply extend to a fixed new token length. Instead, we propose a generalizable method, named TULIP, able to upgrade the token length to any length for CLIP-like models. We do so by improving the architecture with relative position encodings, followed by a training procedure that (i) distills the original CLIP text encoder into an encoder with relative position encodings and (ii) enhances the model for aligning longer captions with images. By effectively encoding captions longer than the default 77 tokens, our model outperforms baselines on cross-modal tasks such as retrieval and text-to-image generation.
- Published
- 2024
41. Theoretical study on $\Lambda\alpha$ and $\Xi\alpha$ correlation functions
- Author
-
Jinno, Asanosuke, Kamiya, Yuki, Hyodo, Tetsuo, and Ohnishi, Akira
- Subjects
Nuclear Theory - Abstract
We examine the $\Lambda$-${}^4\mathrm{He}$ ($\alpha$) and $\Xi \alpha$ momentum correlations in high-energy collisions to further elucidate the properties of hyperon-nucleon interactions. For the $\Lambda\alpha$ system, we compare $\Lambda\alpha$ potential models with different strengths at short range. We find that the difference among the models is visible in the momentum correlation from a small-size source. This indicates that the $\Lambda\alpha$ momentum correlation can constrain the properties of the $\Lambda N$ interaction at short range, which plays an essential role in dense nuclear matter. For the $\Xi\alpha$ system, we employ the folding $\Xi \alpha$ potential based on the lattice QCD $\Xi N$ interactions. The $\Xi \alpha$ potential supports a Coulomb-assisted bound state of ${}^5_{\Xi}\mathrm{H}$ in the $\Xi^-\alpha$ channel, while the $\Xi^0\alpha$ channel is unbound. To examine the sensitivity of the correlation function to the nature of the $\Xi \alpha$ potential, we vary the potential strength to simulate stronger and weaker interactions. The resulting $\Xi^-\alpha$ correlation function clearly reflects the bound-state nature of this channel., Comment: 13 pages, 4 figures, Proceedings of Hadron interactions with strangeness and charm, 26-28 Jun 2024
- Published
- 2024
42. Distillation of Discrete Diffusion through Dimensional Correlations
- Author
-
Hayakawa, Satoshi, Takida, Yuhta, Imaizumi, Masaaki, Wakaki, Hiromi, and Mitsufuji, Yuki
- Subjects
Computer Science - Machine Learning ,Mathematics - Numerical Analysis ,Statistics - Machine Learning - Abstract
Diffusion models have demonstrated exceptional performance in various fields of generative modeling. While they often outperform competitors, including VAEs and GANs, in sample quality and diversity, they suffer from slow sampling speed due to their iterative nature. Recently, distillation techniques and consistency models have been mitigating this issue in continuous domains, but discrete diffusion models face specific challenges towards faster generation. Most notably, in the current literature, correlations between different dimensions (pixels, locations) are ignored by both the modeling and the loss functions, due to computational limitations. In this paper, we propose "mixture" models in discrete diffusion that are capable of treating dimensional correlations while remaining scalable, and we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: first, that dimensionally independent models can approximate the data distribution well if they are allowed to conduct many sampling steps, and second, that our loss functions enable mixture models to distill such many-step conventional models into just a few steps by learning the dimensional correlations. We empirically demonstrate that our proposed method for discrete diffusion works in practice by distilling a continuous-time discrete diffusion model pretrained on the CIFAR-10 dataset., Comment: To be presented at Machine Learning and Compression Workshop @ NeurIPS 2024
- Published
- 2024
43. $\textit{Jump Your Steps}$: Optimizing Sampling Schedule of Discrete Diffusion Models
- Author
-
Park, Yong-Hyun, Lai, Chieh-Hsin, Hayakawa, Satoshi, Takida, Yuhta, and Mitsufuji, Yuki
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Diffusion models have seen notable success in continuous domains, leading to the development of discrete diffusion models (DDMs) for discrete variables. Despite recent advances, DDMs face the challenge of slow sampling speeds. While parallel sampling methods like $\tau$-leaping accelerate this process, they introduce $\textit{Compounding Decoding Error}$ (CDE), where discrepancies arise between the true distribution and the approximation from parallel token generation, leading to degraded sample quality. In this work, we present $\textit{Jump Your Steps}$ (JYS), a novel approach that optimizes the allocation of discrete sampling timesteps by minimizing CDE without extra computational cost. More precisely, we derive a practical upper bound on CDE and propose an efficient algorithm for searching for the optimal sampling schedule. Extensive experiments across image, music, and text generation show that JYS significantly improves sampling quality, establishing it as a versatile framework for enhancing DDM performance for fast sampling.
- Published
- 2024
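Why the allocation of sampling steps matters for compounding decoding error can be seen in a deliberately simplified setting: a schedule assigns fixed token positions to each step, tokens within a step are drawn independently from their exact conditional marginals, and the error is measured as KL(p || q) against the true joint. This is an assumed toy model, not the paper's continuous-time formulation, its CDE bound, or its schedule-search algorithm; the joint distribution and function names are hypothetical.

```python
import itertools
import math

# Toy joint over 3 binary tokens: tokens 0 and 1 are always equal, token 2 is an
# independent fair coin. p(a, a, c) = 0.25 for all a, c; p(x) = 0 otherwise.
JOINT = {
    x: (0.25 if x[0] == x[1] else 0.0)
    for x in itertools.product((0, 1), repeat=3)
}


def conditional_marginal(joint, i, value, fixed):
    """p(x_i = value | x_j = fixed[j] for every j in fixed), read off the joint table."""
    num = den = 0.0
    for x, px in joint.items():
        if all(x[j] == v for j, v in fixed.items()):
            den += px
            if x[i] == value:
                num += px
    return num / den if den > 0 else 0.0


def schedule_distribution(joint, schedule):
    """q(x) when each group of positions in `schedule` is decoded in one parallel step.

    Tokens inside a group are sampled independently, each from its exact conditional
    marginal given the tokens decoded in earlier groups. q is only evaluated on the
    support of p, which is all KL(p || q) needs.
    """
    q = {}
    for x, px in joint.items():
        if px == 0:
            continue
        prob, fixed = 1.0, {}
        for group in schedule:
            for i in group:
                prob *= conditional_marginal(joint, i, x[i], fixed)
            fixed.update({i: x[i] for i in group})
        q[x] = prob
    return q


def kl(p, q):
    """KL(p || q) in nats over the support of p."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


# Same three tokens, different allocations of positions to steps.
for schedule in ([[0, 1, 2]], [[0, 1], [2]], [[0, 2], [1]], [[0], [1], [2]]):
    error = kl(JOINT, schedule_distribution(JOINT, schedule))
    print(schedule, round(error, 4))
# Prints 0.6931, 0.6931, 0.0, 0.0 for the four schedules, in that order.
```

Both two-step schedules cost the same, yet only the one that keeps the correlated tokens 0 and 1 in different steps is error-free, which is the kind of trade-off a schedule optimizer exploits.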
44. TVBench: Redesigning Video-Language Evaluation
- Author
-
Cores, Daniel, Dorkenwald, Michael, Mucientes, Manuel, Snoek, Cees G. M., and Asano, Yuki M.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Large language models have demonstrated impressive performance when integrated with vision models, even enabling video understanding. However, evaluating these video models presents its own challenges, for which several benchmarks have been proposed. In this paper, we show that the most widely used video-language benchmarks can be solved without much temporal reasoning. We identified three main issues in existing datasets: (i) static information from single frames is often sufficient to solve the tasks; (ii) the text of the questions and candidate answers is overly informative, allowing models to answer correctly without relying on any visual input; and (iii) world knowledge alone can answer many of the questions, making the benchmarks a test of knowledge replication rather than visual reasoning. In addition, we found that open-ended question-answering benchmarks for video understanding suffer from similar issues, while automatic evaluation with LLMs is unreliable, making open-ended evaluation an unsuitable alternative. As a solution, we propose TVBench, a novel open-source video multiple-choice question-answering benchmark, and demonstrate through extensive evaluations that it requires a high level of temporal understanding. Surprisingly, we find that most recent state-of-the-art video-language models perform close to random chance on TVBench, with only Gemini-Pro and Tarsier clearly surpassing this baseline.
- Published
- 2024
45. G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving
- Author
-
Murata, Naoki, Lai, Chieh-Hsin, Takida, Yuhta, Uesaka, Toshimitsu, Nguyen, Bac, Ermon, Stefano, and Mitsufuji, Yuki
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Recent literature has effectively utilized diffusion models trained on continuous variables as priors for solving inverse problems. Notably, discrete diffusion models with discrete latent codes have shown strong performance, particularly in modalities suited for discrete compressed representations, such as image and motion generation. However, their discrete and non-differentiable nature has limited their application to inverse problems formulated in continuous spaces. This paper presents a novel method for addressing linear inverse problems by leveraging image-generation models based on discrete diffusion as priors. We overcome these limitations by approximating the true posterior distribution with a variational distribution constructed from categorical distributions and continuous relaxation techniques. Furthermore, we employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states, demonstrating that our method performs comparably to continuous diffusion techniques. To the best of our knowledge, this is the first approach to use discrete diffusion model-based priors for solving image inverse problems.
- Published
- 2024
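The abstract mentions building a variational posterior from categorical distributions with a continuous relaxation but does not name the relaxation. Gumbel-softmax (the Concrete distribution) is one common choice and is assumed here purely for illustration; the shapes (4 latent positions, a codebook of 8 entries) are hypothetical, and this is not presented as G2D2's implementation.

```python
import numpy as np


def gumbel_softmax(logits, temperature, rng):
    """Relaxed sample from Categorical(softmax(logits)) along the last axis.

    Gumbel(0, 1) noise is added to the logits and a temperature-scaled softmax is
    applied, giving a result that is differentiable in the logits. As temperature -> 0
    the output approaches the one-hot vector at argmax(logits + noise), which is an
    exact categorical sample (the Gumbel-max trick).
    """
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical setup: 4 latent token positions, each choosing among 8 codebook entries.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))
soft_codes = gumbel_softmax(logits, temperature=0.5, rng=rng)
print(soft_codes.sum(axis=-1))  # each row sums to 1
```

Lowering the temperature sharpens each row toward one-hot without changing which entry wins for a fixed noise draw, so the temperature trades gradient smoothness against fidelity to the discrete codes.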
46. Do better language models have crisper vision?
- Author
-
Ruthardt, Jona, Burghouts, Gertjan J., Belongie, Serge, and Asano, Yuki M.
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
How well do text-only Large Language Models (LLMs) grasp the visual world? As LLMs are increasingly used in computer vision, addressing this question becomes both fundamental and pertinent. However, existing studies have primarily focused on limited scenarios, such as their ability to generate visual content or cluster multimodal data. To this end, we propose the Visual Text Representation Benchmark (ViTeRB) to isolate key properties that make language models well-aligned with the visual world. With this, we identify large-scale decoder-based LLMs as ideal candidates for representing text in vision-centric contexts, counter to the current practice of utilizing text encoders. Building on these findings, we propose ShareLock, an ultra-lightweight CLIP-like model. By leveraging precomputable frozen features from strong vision and language models, ShareLock achieves an impressive 51% accuracy on ImageNet despite utilizing just 563k image-caption pairs. Moreover, training requires only 1 GPU hour (or 10 hours including the precomputation of features) - orders of magnitude less than prior methods. Code will be released.
- Published
- 2024
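A "CLIP-like" model trained on precomputed frozen features suggests a contrastive objective over matched image-caption pairs. The sketch below computes the standard symmetric InfoNCE (CLIP) loss with small projection matrices as the only trainable parts; the linear projections, the fixed logit scale, and the random stand-in features are assumptions, since the abstract does not describe ShareLock's heads or loss in detail.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def log_softmax(z, axis):
    """Numerically stable log-softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def clip_loss(image_feats, text_feats, w_img, w_txt, logit_scale):
    """Symmetric InfoNCE loss for a batch of matched (image, caption) pairs.

    image_feats (n, d_img) and text_feats (n, d_txt) are precomputed features from
    frozen encoders; w_img and w_txt are the (assumed) trainable projections into a
    shared space. Pair i is the positive at row i and column i; all other entries
    in that row or column act as negatives.
    """
    img = l2_normalize(image_feats @ w_img)
    txt = l2_normalize(text_feats @ w_txt)
    logits = logit_scale * img @ txt.T  # (n, n) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


# Random stand-ins for frozen vision and language features (hypothetical sizes).
rng = np.random.default_rng(0)
n, d_img, d_txt, d_shared = 8, 16, 12, 4
image_feats = rng.normal(size=(n, d_img))
text_feats = rng.normal(size=(n, d_txt))
w_img = 0.1 * rng.normal(size=(d_img, d_shared))
w_txt = 0.1 * rng.normal(size=(d_txt, d_shared))
print(clip_loss(image_feats, text_feats, w_img, w_txt, logit_scale=10.0))
```

Because the encoders stay frozen, their features can be computed once and cached, so each training step only touches the small projection matrices, which is consistent with the low training cost the abstract reports.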
47. GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
- Author
-
Mirza, M. Jehanzeb, Zhao, Mengjie, Mao, Zhuoyuan, Doveh, Sivan, Lin, Wei, Gavrikov, Paul, Dorkenwald, Michael, Yang, Shiqi, Jha, Saurav, Wakaki, Hiromi, Mitsufuji, Yuki, Possegger, Horst, Feris, Rogerio, Karlinsky, Leonid, and Glass, James
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this work, we propose a novel method (GLOV) enabling Large Language Models (LLMs) to act as implicit Optimizers for Vision-Language Models (VLMs) to enhance downstream vision tasks. GLOV meta-prompts an LLM with the downstream task description, querying it for suitable VLM prompts (e.g., for zero-shot classification with CLIP). These prompts are ranked according to a purity measure obtained through a fitness function. In each optimization step, the ranked prompts are fed as in-context examples (with their accuracies) to equip the LLM with knowledge of the type of text prompts preferred by the downstream VLM. Furthermore, we explicitly steer the LLM generation process in each optimization step by adding an offset difference vector, computed from the embeddings of the positive and negative solutions found by the LLM in previous optimization steps, to an intermediate layer of the network for the next generation step. This offset vector steers the LLM generation toward the type of language preferred by the downstream VLM, resulting in enhanced performance on the downstream vision tasks. We comprehensively evaluate GLOV on 16 diverse datasets using two families of VLMs, i.e., dual-encoder (e.g., CLIP) and encoder-decoder (e.g., LLaVA) models, showing that the discovered solutions can enhance recognition performance by up to 15.0% and 57.5% (3.8% and 21.6% on average) for these models., Comment: Code: https://github.com/jmiemirza/GLOV
- Published
- 2024
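The two mechanisms in the abstract, ranking scored prompts into in-context examples and adding an embedding difference vector to an intermediate layer, can be sketched as plain functions. Everything here is a hypothetical illustration: accuracy stands in for the fitness score, the meta-prompt wording is invented, and the choice of layer, how the solution embeddings are obtained, and the steering strength `alpha` are not specified by the abstract. The LLM and VLM calls are omitted entirely; see the linked repository for the authors' code.

```python
import numpy as np


def build_meta_prompt(task_description, scored_prompts, top_k):
    """Rank (prompt, accuracy) pairs by fitness and list the best as in-context examples."""
    ranked = sorted(scored_prompts, key=lambda item: item[1], reverse=True)[:top_k]
    examples = "\n".join(f"{acc:.1f}%: {prompt}" for prompt, acc in ranked)
    return (
        f"{task_description}\n"
        "Prompts tried so far, best first:\n"
        f"{examples}\n"
        "Propose a better prompt."
    )


def steering_offset(pos_embeddings, neg_embeddings):
    """Difference between the mean embeddings of positive (high-fitness) and negative
    (low-fitness) solutions from earlier steps; shapes (n_pos, d) and (n_neg, d)."""
    return pos_embeddings.mean(axis=0) - neg_embeddings.mean(axis=0)


def steer_hidden_states(hidden, offset, alpha):
    """Add the scaled offset to every position of an intermediate-layer activation of
    shape (seq_len, d); alpha is a hypothetical steering strength."""
    return hidden + alpha * offset


print(build_meta_prompt(
    "Write prompts for zero-shot classification of bird species with CLIP.",
    [("a photo of a {}.", 60.0), ("{}", 55.0), ("a photo of a {}, a type of bird.", 62.5)],
    top_k=2,
))
```

Adding the same vector at every position is the simplest form of activation steering; restricting it to particular positions or scaling it per step are equally plausible variants that the abstract does not settle.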
48. VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression
- Author
-
Chae, Yunkee, Choi, Woosung, Takida, Yuhta, Koo, Junghyun, Ikemiya, Yukara, Zhong, Zhi, Cheuk, Kin Wai, Martínez-Ramírez, Marco A., Lee, Kyogu, Liao, Wei-Hsiang, and Mitsufuji, Yuki
- Subjects
Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Recent state-of-the-art neural audio compression models have progressively adopted residual vector quantization (RVQ). Despite this success, these models employ a fixed number of codebooks per frame, which can be suboptimal in terms of rate-distortion tradeoff, particularly in scenarios with simple input audio, such as silence. To address this limitation, we propose variable bitrate RVQ (VRVQ) for audio codecs, which allows for more efficient coding by adapting the number of codebooks used per frame. Furthermore, we propose a gradient estimation method for the non-differentiable masking operation that transforms from the importance map to the binary importance mask, improving model training via a straight-through estimator. We demonstrate that the proposed training framework achieves superior results compared to the baseline method and shows further improvement when applied to the current state-of-the-art codec., Comment: Accepted at NeurIPS 2024 Workshop on Machine Learning and Compression
- Published
- 2024
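The core idea of variable-bitrate RVQ, letting each frame use only its first n_t codebooks so that simple frames such as silence cost fewer codes, can be sketched with a plain nearest-neighbour residual quantizer. The mapping from importance map to codebook count below (ceil of importance times the number of codebooks) is a hypothetical stand-in, and the straight-through estimator the paper uses to train through the binary mask is not shown, since NumPy has no automatic differentiation.

```python
import numpy as np


def codebooks_from_importance(importance, n_codebooks):
    """Hypothetical mapping from a per-frame importance in [0, 1] to a codebook count."""
    return np.clip(np.ceil(importance * n_codebooks), 0, n_codebooks).astype(int)


def rvq_encode(frames, codebooks, num_active):
    """Residual vector quantization with a per-frame number of active codebooks.

    frames: (T, d) latent frames; codebooks: list of (K, d) arrays applied in order;
    num_active: (T,) ints, how many codebooks each frame uses (0..len(codebooks)).
    Returns (codes, reconstruction); stages a frame does not use get code -1.
    """
    codes = np.full((frames.shape[0], len(codebooks)), -1, dtype=int)
    recon = np.zeros_like(frames)
    residual = frames.copy()
    for q, codebook in enumerate(codebooks):
        active = num_active > q  # binary mask: frames that still use stage q
        # Nearest codeword for each residual under squared Euclidean distance.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx] * active[:, None]  # zero contribution for inactive frames
        codes[active, q] = idx[active]
        recon += chosen
        residual -= chosen
    return codes, recon


frames = np.array([[1.0, 1.0], [0.2, 0.0], [0.0, 0.0]])  # last frame ~ silence
codebooks = [np.array([[1.0, 1.0], [0.0, 0.0]]), np.array([[0.2, 0.0], [0.0, 0.0]])]
num_active = codebooks_from_importance(np.array([0.5, 1.0, 0.0]), len(codebooks))
codes, recon = rvq_encode(frames, codebooks, num_active)
print(codes)  # rows: [0, -1], [1, 0], [-1, -1]
```

Here the silent frame spends no codes at all and the first frame needs only one stage, while the second frame uses both, which is the rate saving a fixed-depth RVQ cannot make.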
49. A Stretchable Electrostatic Tactile Surface
- Author
-
Takayanagi, Naoto, Matsuhisa, Naoji, Hashimoto, Yuki, and Sugiura, Yuta
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Tactile sensation is essential for humans to recognize objects. Various easy-to-configure devices for tactile presentation by electrostatic force have been developed, but none of them is currently stretchable. Since such a device may be worn over the joints of a human body or robot, it is extremely important that the device itself be stretchable. In this study, we propose a stretchable electrostatic tactile surface comprising a stretchable transparent electrode and a stretchable insulating film, which can be stretched by up to 50%. This means that, when attached to the human body, the surface can follow the expansion and contraction caused by joint movements. The surface can also provide tactile information in response to deformation such as pushing and pulling. As a basic investigation, we measured the lower limit of perceivable voltage for different surface configurations and evaluated the surface in stretched and contracted states. We also investigated and modeled the relationship between voltage and perceived intensity., Comment: 7 pages, 9 figures
- Published
- 2024
50. Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
- Author
-
Hiranaka, Ayano, Chen, Shang-Fu, Lai, Chieh-Hsin, Kim, Dongjun, Murata, Naoki, Shibuya, Takashi, Liao, Wei-Hsiang, Sun, Shao-Hua, and Mitsufuji, Yuki
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction
- Abstract
Controllable generation through Stable Diffusion (SD) fine-tuning aims to improve fidelity, safety, and alignment with human guidance. Existing methods for reinforcement learning from human feedback usually rely on predefined heuristic reward functions or on pretrained reward models built on large-scale datasets, limiting their applicability to scenarios where collecting such data is costly or difficult. To utilize human feedback effectively and efficiently, we develop a framework, HERO, which leverages online human feedback collected on the fly during model learning. Specifically, HERO features two key mechanisms: (1) Feedback-Aligned Representation Learning, an online training method that captures human feedback and provides informative learning signals for fine-tuning, and (2) Feedback-Guided Image Generation, which generates images from SD's refined initialization samples, enabling faster convergence toward the evaluator's intent. We demonstrate that HERO is 4x more efficient in online feedback than the best existing method for body part anomaly correction. Additionally, experiments show that HERO can effectively handle tasks like reasoning, counting, personalization, and reducing NSFW content with only 0.5K online feedback.
- Published
- 2024
Discovery Service for Jio Institute Digital Library