Author: "Zhilin, A." - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Zhilin, A."' returned 26,667 results.

1. Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering

Author: Zhang, Zhilin, Wang, Jie, Zhu, Ruiqi, and Gong, Xiaoliang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical Visual Question Answering (MedVQA) has gained increasing attention at the intersection of computer vision and natural language processing. Its capability to interpret radiological images and deliver precise answers to clinical inquiries positions MedVQA as a valuable tool for supporting diagnostic decision-making for physicians and alleviating the workload on radiologists. While recent approaches focus on using unified pre-trained large models for multi-modal fusion like cross-modal Transformers, research on more efficient fusion methods remains relatively scarce within this discipline. In this paper, we introduce a novel fusion model that integrates Orthogonality loss, Multi-head attention and Bilinear Attention Network (OMniBAN) to achieve high computational efficiency and strong performance without the need for pre-training. We conduct comprehensive experiments and clarify aspects of how to enhance bilinear attention fusion to achieve performance comparable to that of large models. Experimental results show that OMniBAN outperforms traditional models on key MedVQA benchmarks while maintaining a lower computational cost, which indicates its potential for efficient clinical application in radiology and pathology image question answering.
Published: 2024

2. Diverging Preferences: When do Annotators Disagree and do Models Know?

Author: Zhang, Michael JQ, Wang, Zhilin, Hwang, Jena D., Dong, Yi, Delalleau, Olivier, Choi, Yejin, Choi, Eunsol, Ren, Xiang, and Pyatkin, Valentina
Subjects: Computer Science - Computation and Language
Abstract: We examine diverging preferences in human-labeled preference datasets. We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes -- task underspecification, response style, refusals, and annotation errors. We find that the majority of disagreements are at odds with standard reward modeling approaches, which are designed with the assumption that annotator disagreement is noise. We then explore how these findings impact two areas of LLM development: reward modeling and evaluation. In our experiments, we demonstrate how standard reward modeling methods, like the Bradley-Terry model, fail to differentiate whether a given preference judgment is the result of unanimous agreement among annotators or the majority opinion among diverging user preferences. We also find that these tendencies are echoed by popular LLM-as-Judge evaluation methods, which consistently identify a winning response in cases of diverging preferences. These findings highlight remaining challenges in LLM evaluations, which are greatly influenced by divisive features like response style, and in developing pluralistically aligned LLMs. To address these issues, we develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
Published: 2024
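
The Bradley-Terry limitation described in the abstract above can be illustrated with a minimal sketch. It assumes a reward model that outputs scalar scores and a training label built by majority vote; the helpers `majority_label` and `vote_share` are hypothetical, and the soft-label variant is one common alternative, not necessarily the method the authors propose. Under a hard majority label, a unanimous 5-0 judgment and a split 3-2 judgment yield exactly the same loss.

```python
import math

def bt_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that response a is preferred over response b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def bt_loss(r_a: float, r_b: float, target: float) -> float:
    """Binary cross-entropy between P(a > b) and a target label in [0, 1]."""
    p = bt_prob(r_a, r_b)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def majority_label(votes_a: int, votes_b: int) -> float:
    """Hypothetical hard label: 1.0 if most annotators preferred response a."""
    return 1.0 if votes_a > votes_b else 0.0

def vote_share(votes_a: int, votes_b: int) -> float:
    """Hypothetical soft label: fraction of annotators who preferred response a."""
    return votes_a / (votes_a + votes_b)

r_a, r_b = 1.2, 0.4  # arbitrary example reward-model scores

for votes in [(5, 0), (3, 2)]:
    hard = bt_loss(r_a, r_b, majority_label(*votes))
    soft = bt_loss(r_a, r_b, vote_share(*votes))
    print(votes, f"hard={hard:.4f}", f"soft={soft:.4f}")
```

The hard-label column is identical for both vote splits, which is the sense in which the standard objective cannot tell unanimity from a narrow majority.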

3. Federated Neural Nonparametric Point Processes

Author: Chen, Hui, Liu, Hengyu, Li, Yaqiong, Fan, Xuhui, Zhao, Zhilin, Zhou, Feng, Quinn, Christopher John, and Cao, Longbing
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Temporal point processes (TPPs) are effective for modeling event occurrences over time, but they struggle with sparse and uncertain events in federated systems, where privacy is a major concern. To address this, we propose FedPP, a Federated neural nonparametric Point Process model. On the client side, FedPP integrates neural embeddings into Sigmoidal Gaussian Cox Processes (SGCPs), a flexible and expressive class of TPPs, allowing it to generate highly flexible intensity functions that capture client-specific event dynamics and uncertainties while efficiently summarizing historical records. For global aggregation, FedPP introduces a divergence-based mechanism that communicates the distributions of SGCPs' kernel hyperparameters between the server and clients, while keeping client-specific parameters local to ensure privacy and personalization. FedPP effectively captures event uncertainty and sparsity, and extensive experiments demonstrate its superior performance in federated settings, particularly with KL divergence and Wasserstein distance-based global aggregation.
Published: 2024
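
The abstract does not state the form of the hyperparameter distributions being communicated. Assuming, purely for illustration, that each party summarizes a kernel hyperparameter as a univariate Gaussian (mean, standard deviation), the two divergences named above have the closed forms sketched below; the `client` and `server` values are hypothetical.

```python
import math

def kl_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(P || Q) for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def w2_gaussian(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """2-Wasserstein distance between two univariate Gaussians."""
    return math.sqrt((mu_p - mu_q) ** 2 + (sigma_p - sigma_q) ** 2)

client = (0.5, 0.2)  # hypothetical client hyperparameter distribution (mean, std)
server = (0.0, 0.4)  # hypothetical server hyperparameter distribution (mean, std)

print("KL:", kl_gaussian(*client, *server))
print("W2:", w2_gaussian(*client, *server))
```

KL is asymmetric and diverges as either standard deviation shrinks toward zero, while W2 is a symmetric metric that stays finite; that difference is one reason a system might support both as aggregation criteria.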

4. HelpSteer2-Preference: Complementing Ratings with Preferences

Author: Wang, Zhilin, Bukharin, Alexander, Delalleau, Olivier, Egert, Daniel, Shen, Gerald, Zeng, Jiaqi, Kuchaiev, Oleksii, and Dong, Yi
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Reward models are critical for aligning models to follow instructions, and are typically trained following one of two popular paradigms: Bradley-Terry style or Regression style. However, there is a lack of evidence that either approach is better than the other, when adequately matched for data. This is primarily because these approaches require data collected in different (but incompatible) formats, meaning that adequately matched data is not available in existing public datasets. To tackle this problem, we release preference annotations (designed for Bradley-Terry training) to complement existing ratings (designed for Regression style training) in the HelpSteer2 dataset. To improve data interpretability, preference annotations are accompanied by human-written justifications. Using this data, we conduct the first head-to-head comparison of Bradley-Terry and Regression models when adequately matched for data. Based on insights derived from such a comparison, we propose a novel approach to combine Bradley-Terry and Regression reward modeling. A Llama-3.1-70B-Instruct model tuned with this approach scores 94.1 on RewardBench, ranking first among more than 140 reward models as of 1 Oct 2024. We also demonstrate the effectiveness of this reward model at aligning models to follow instructions in RLHF. We open-source this dataset (CC-BY-4.0 license) at https://huggingface.co/datasets/nvidia/HelpSteer2 and openly release the trained Reward Model at https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Reward. Comment: 26 pages, 3 figures
Published: 2024

5. Verifying Randomized Consensus Protocols with Common Coins

Author: Gao, Song, Zhan, Bohua, Wu, Zhilin, and Zhang, Lijun
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Formal Languages and Automata Theory
Abstract: Randomized fault-tolerant consensus protocols with common coins are widely used in cloud computing and blockchain platforms. Due to their fundamental role, it is vital to guarantee their correctness. Threshold automata is a formal model designed for the verification of fault-tolerant consensus protocols. It has recently been extended to probabilistic threshold automata (PTAs) to verify randomized fault-tolerant consensus protocols. Nevertheless, PTAs can only model randomized consensus protocols with local coins. In this work, we extend PTAs to verify randomized fault-tolerant consensus protocols with common coins. Our main idea is to add a process to simulate the common coin (the so-called common-coin process). Although the addition of the common-coin process destroys the symmetry and poses technical challenges, we show how PTAs can be adapted to overcome these challenges. We apply our approach to verify the agreement, validity and almost-sure termination properties of 8 randomized consensus protocols with common coins. Comment: This paper was accepted and presented at DSN 2024
Published: 2024

6. Relative Dolbeault Geometric Langlands via the Regular Quotient

Author: Hameister, Thomas, Luo, Zhilin, and Morrissey, Benedict
Subjects: Mathematics - Algebraic Geometry
Abstract: Let $X = G/H$ be an affine homogeneous spherical variety with abelian regular centralizer and no type N roots. In this paper, we formulate a relative geometric Langlands conjecture in the Dolbeault setting for $M = T^*X$. More concretely, we conjecture a Fourier-Mukai duality between the Dolbeault period sheaf and a sheaf whose construction closely resembles the Dirac-Higgs bundle of a polarization of the dual symplectic representation of Ben-Zvi, Sakellaridis, and Venkatesh. These conjectures can be seen as a generalization of Hitchin's conjectural duality of branes for symmetric spaces. We verify these conjectures in several cases, including the Friedberg-Jacquet case $X = GL_{2n}/GL_n\times GL_n$, the Jacquet-Ichino case $X = PGL_2^3/PGL_2$, the Rankin-Selberg case $X = GL_n\times GL_{n+1}/GL_n$, and the Gross-Prasad case $X = SO_n\times SO_{n+1}/SO_n$. Our main tool is the theory of the regular quotient, which was described in the context of symmetric spaces in [HM24]. Comment: Comments are welcome!
Published: 2024

7. Nonabelian Fourier Kernels on $\mathrm{SL}_2$ and $\mathrm{GL}_2$

Author: Luo, Zhilin and Chau, Ngo Bao
Subjects: Mathematics - Number Theory, Mathematics - Representation Theory
Abstract: For $G=\mathrm{SL}_2$ or $\mathrm{GL}_2$, we present explicit formulas for the nonabelian Fourier kernels on $G$, as conjectured by A. Braverman and D. Kazhdan. Additionally, we furnish explicit formulas for the orbital Hankel transform on $G$, a topic investigated by the second author, and provide an explicit formula for the stable orbital integral of the basic function. These results are applicable to local fields with residual characteristics other than two. Comment: 106 pp. Any comments are very welcome
Published: 2024

8. Increased resistance to photooxidation in Dion-Jacobson lead halide perovskites -- implication for perovskite device stability

Author: Ren, Zhilin, Ovčar, Juraj, Leung, Tik Lun, He, Yanling, Li, Yin, Li, Dongyang, Qin, Xinshun, Mo, Hongbo, Yuan, Zhengtian, Bing, Jueming, Bucknall, Martin P., Grisanti, Luca, Ali, Muhammad Umair, Bai, Peng, Zhu, Tao, Syed, Ali Ashger, Lin, Jingyang, Wang, Jingbo, Abdul-Khaleed, Sun, Wenting, Li, Gangyue, Li, Gang, Ng, Alan Man Ching, Ho-Baillie, Anita W. Y., Lončarić, Ivor, Popović, Jasminka, and Djurišić, Aleksandra B.
Subjects: Condensed Matter - Materials Science, Physics - Applied Physics
Abstract: 2D metal halide perovskites have enabled significant stability improvements in perovskite devices, particularly in resistance to moisture. However, some 2D perovskites are even more susceptible to photooxidation than 3D perovskites. This is particularly true for the more commonly investigated Ruddlesden-Popper (RP) perovskites, which exhibit increased susceptibility to photoinduced degradation compared to Dion-Jacobson (DJ) perovskites. Comparisons between different RP and DJ perovskites reveal that this phenomenon cannot be explained by the commonly proposed differences in superoxide ion generation, interlayer distance, or lattice structural rigidity. Instead, the resistance to photooxidation of DJ perovskites can be attributed to the decreased likelihood of the double deprotonation events (compared to single deprotonation events in RP perovskites) required for the loss of organic cations and perovskite decomposition. Consequently, DJ perovskites are less susceptible to oxidative degradation (both photo- and electrochemically induced), which leads to improved operational stability of solar cells based on these materials. Comment: Main text: 19 pages, 6 figures; supplementary information: 62 pages, 47 figures
Published: 2024

9. Towards Building a Robust Knowledge Intensive Question Answering Model with Large Language Models

Author: Hong, Xingyun, Shao, Yan, Wang, Zhilin, Duan, Manni, and Xiongnan, Jin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The development of LLMs has greatly enhanced the intelligence and fluency of question answering, while the emergence of retrieval augmentation has enabled models to better utilize external information. However, noise and errors in retrieved information challenge the robustness of LLMs. In this work, to evaluate model performance under multiple interferences, we first construct a dataset based on machine reading comprehension datasets, simulating various scenarios including missing critical information, noise, and conflicts. To address the decline in model accuracy caused by noisy external information, we propose a data augmentation-based fine-tuning method to enhance LLMs' robustness against noise. Additionally, a contrastive learning approach is used to preserve the model's ability to discriminate external information. We conducted experiments on both existing LLMs and our approach; the results, evaluated by GPT-4, indicate that our proposed methods improve model robustness while strengthening the model's discrimination capability. Comment: This paper has been accepted by NLPCC-2024
Published: 2024

10. Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras

Author: Liao, Zimu, Chen, Siyan, Fu, Rong, Wang, Yi, Su, Zhongling, Luo, Hao, Ma, Li, Xu, Linning, Dai, Bo, Li, Hengjie, Pei, Zhilin, and Zhang, Xingcheng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high fidelity and real-time rendering. However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D-to-2D projection calculation. Additionally, there are inefficiencies in tile-based splatting, especially for the extreme curvature and wide field of view of fisheye lenses, which are crucial for its broader real-life applications. To tackle these challenges, we introduce Fisheye-GS. This innovative method recalculates the projection transformation and its gradients for fisheye cameras. Our approach can be seamlessly integrated as a module into other efficient 3D rendering methods, emphasizing its extensibility, lightweight nature, and modular design. Since we only modified the projection component, it can also be easily adapted for use with different camera models. Compared to methods that train after undistortion, our approach demonstrates a clear improvement in visual quality.
Published: 2024

11. Fourth-order compact finite difference schemes for solving biharmonic equations with Dirichlet boundary conditions

Author: Pan, Kejia, Li, Jin, Li, Zhilin, and Fu, Kang
Subjects: Mathematics - Numerical Analysis
Abstract: In this study, we propose a genuine fourth-order compact finite difference scheme for solving biharmonic equations with Dirichlet boundary conditions in both two and three dimensions. In the 2D case, we build upon the high-order compact (HOC) schemes for flux-type boundary conditions originally developed by Zhilin Li and Kejia Pan [SIAM J. Sci. Comput., 45 (2023), pp. A646-A674] to construct a high-order compact discretization for coupled boundary conditions. In the 3D case, we modify the carefully designed undetermined coefficient methods of Li and Pan to derive finite difference approximations of the coupled boundary conditions. The resulting FD discretization maintains global fourth-order convergence and compactness. Unlike in the very popular Stephenson method, the number of unknowns does not increase with the dimension. Moreover, the condition number of the coefficient matrix grows at a rate of $O(h^{-2})$ in both 2D and 3D. We also validate the performance of the proposed genuine HOC methods through nontrivial examples. Comment: 14 pages, 2 figures
Published: 2024

12. FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

Author: Feng, Guofeng, Chen, Siyan, Fu, Rong, Liao, Zimu, Wang, Yi, Liu, Tao, Pei, Zhilin, Li, Hengjie, Zhang, Xingcheng, and Dai, Bo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper includes a suite of optimization strategies, encompassing redundancy elimination, efficient pipelining, refined control and scheduling mechanisms, and memory access optimizations, all of which are meticulously integrated to amplify the performance of the rasterization process. An extensive evaluation of FlashGS' performance has been conducted across a diverse spectrum of synthetic and real-world large-scale scenes, encompassing a variety of image resolutions. The empirical findings demonstrate that FlashGS consistently achieves an average 4x acceleration over mobile consumer GPUs, coupled with reduced memory consumption. These results underscore the superior performance and resource optimization capabilities of FlashGS, positioning it as a formidable tool in the domain of 3D rendering.
Published: 2024

13. PackMamba: Efficient Processing of Variable-Length Sequences in Mamba training

Author: Xu, Haoran, Liu, Ziqian, Fu, Rong, Su, Zhongling, Wang, Zerui, Cai, Zheng, Pei, Zhilin, and Zhang, Xingcheng
Subjects: Computer Science - Machine Learning
Abstract: With the evolution of large language models, traditional Transformer models become computationally demanding for lengthy sequences because computation grows quadratically with sequence length. Mamba, emerging as a groundbreaking architecture in generative AI, demonstrates remarkable proficiency in handling long sequences with reduced computational and memory complexity. Nevertheless, the existing Mamba training framework is inefficient with variable-length sequence inputs: single-sequence training results in low GPU utilization, while padding variable-length sequences to a maximum length for batched processing incurs considerable memory and computational overhead. To address this problem, we analyze the performance of bottleneck operators in Mamba under diverse tensor shapes and propose PackMamba, a high-throughput Mamba that efficiently handles variable-length sequences. Diving deep into state-space models (SSMs), we modify the parallel operators to avoid passing information between individual sequences while maintaining high performance. Experimental results on an NVIDIA A100 GPU show throughput exceeding the baseline single-sequence processing scheme: a 3.06x speedup on the 1.4B model and 2.62x on the 2.8B model.
Published: 2024
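
The idea of packing sequences without letting them exchange information can be sketched with a scalar linear recurrence h_t = a_t * h_{t-1} + b_t * x_t, a heavily simplified stand-in for a selective SSM scan (real Mamba kernels are parallel, multi-channel GPU operators, and this is not PackMamba's implementation). Assuming each token carries a sequence id (a hypothetical packing convention), resetting the state at every boundary makes the packed scan match scanning each sequence on its own.

```python
from typing import List

def scan(a: List[float], b: List[float], x: List[float]) -> List[float]:
    """Sequential reference scan: h_t = a_t * h_{t-1} + b_t * x_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, b_t, x_t in zip(a, b, x):
        h = a_t * h + b_t * x_t
        out.append(h)
    return out

def packed_scan(a: List[float], b: List[float], x: List[float],
                seq_ids: List[int]) -> List[float]:
    """Scan over several sequences packed into one row.

    The hidden state is reset to zero whenever the sequence id changes,
    so no state is carried across a sequence boundary."""
    h, out, prev = 0.0, [], None
    for a_t, b_t, x_t, sid in zip(a, b, x, seq_ids):
        if sid != prev:
            h = 0.0  # a new sequence starts: drop the carried state
            prev = sid
        h = a_t * h + b_t * x_t
        out.append(h)
    return out

# Two hypothetical sequences of lengths 3 and 2, given as (a, b, x).
seq1 = ([0.9, 0.8, 0.7], [1.0, 1.0, 1.0], [1.0, 2.0, 3.0])
seq2 = ([0.5, 0.5], [2.0, 2.0], [4.0, 5.0])

a_all, b_all, x_all = (seq1[i] + seq2[i] for i in range(3))
packed = packed_scan(a_all, b_all, x_all, [0, 0, 0, 1, 1])
separate = scan(*seq1) + scan(*seq2)
naive = scan(a_all, b_all, x_all)  # no reset: state leaks from seq1 into seq2

print(packed == separate, naive[3] == packed[3])
```

Packing removes padding waste, and the boundary reset is what keeps the packed result identical to the unpacked one.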

14. Adaptive time-stepping for aggregation-shattering kinetics

Author: Matveev, Sergey A., Zhilin, Viktor, and Smirnov, Alexander P.
Subjects: Mathematics - Numerical Analysis, Condensed Matter - Statistical Mechanics, F.2.1, G.1.7
Abstract: We present an experimental study of adaptive time-stepping methods for efficient modeling of aggregation-fragmentation kinetics. Precise modeling of these phenomena usually requires solving large systems of nonlinear ordinary differential equations and intensive computation. We concentrate on the performance of three explicit Runge-Kutta time-integration methods and provide simulations for two types of problems: finding equilibrium solutions and simulating kinetics with periodic solutions. The first class of problems may be analyzed through the relaxation of the solution to the stationary state after a long time. In this case, adaptive time-stepping can help reach that state using large steps, reducing the cost of the calculations without loss of accuracy. In the second case, the problem becomes numerically unstable at certain points of the phase space and may require tiny steps, making the simulations very time-consuming. Adaptive criteria allow the step size to be increased at most points and speed up the simulations significantly. Comment: 9 pages, 3 figures, 3 tables
Published: 2024
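
The abstract does not name the three Runge-Kutta methods studied. As a generic illustration of error-controlled adaptive time-stepping, the sketch below uses the embedded Heun-Euler pair (orders 2 and 1) on a scalar ODE; the tolerance, safety factor, and growth/shrink limits are assumptions chosen for the example, not the paper's settings.

```python
import math
from typing import Callable, Tuple

def heun_euler_adaptive(f: Callable[[float, float], float], t0: float, y0: float,
                        t_end: float, h0: float = 0.1, tol: float = 1e-6
                        ) -> Tuple[float, int]:
    """Integrate y' = f(t, y) from t0 to t_end with an embedded Heun-Euler pair.

    Returns the final value and the number of accepted steps."""
    t, y, h, accepted = t0, y0, h0, 0
    while t < t_end:
        h = min(h, t_end - t)               # never step past t_end
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_low = y + h * k1                  # explicit Euler, order 1
        y_high = y + 0.5 * h * (k1 + k2)    # Heun, order 2
        err = abs(y_high - y_low)           # local error estimate
        if err <= tol:                      # accept and keep the higher-order value
            t, y = t + h, y_high
            accepted += 1
        # Step-size update for an error estimate of order h^2 (exponent 1/2),
        # with safety factor 0.9 and growth limited to [0.2x, 5x].
        factor = 0.9 * math.sqrt(tol / err) if err > 0 else 5.0
        h *= min(5.0, max(0.2, factor))
    return y, accepted

# Usage: y' = -y on [0, 1] with y(0) = 1; the exact answer is exp(-1).
y_fine, n_fine = heun_euler_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0, tol=1e-6)
y_coarse, n_coarse = heun_euler_adaptive(lambda t, y: -y, 0.0, 1.0, 1.0, tol=1e-3)
print(y_fine, math.exp(-1.0), n_fine, n_coarse)
```

Because the solution decays, the estimated error shrinks over time and the controller lengthens the steps, the same mechanism the abstract relies on when a solution relaxes toward a stationary state.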

15. Contrastive Adversarial Training for Unsupervised Domain Adaptation

Author: Chen, Jiahong, Zhang, Zhilin, Li, Lucy, Shahrasbi, Behzad, and Mishra, Arjun
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Domain adversarial training has proven effective at finding domain-invariant feature representations and has been successfully adopted for various domain adaptation tasks. However, recent advances in large models (e.g., vision transformers) and the emergence of complex adaptation scenarios (e.g., DomainNet) make adversarial training easily biased toward the source domain and hard to adapt to the target domain. The reason is twofold: large-model training relies on large amounts of labelled source-domain data, and labelled target-domain data for fine-tuning are lacking. Existing approaches have largely focused on either enhancing the discriminator or improving the training stability of the backbone networks. Because of the unbalanced competition between the feature extractor and the discriminator during adversarial training, existing solutions fail to work well on complex datasets. To address this issue, we propose a novel contrastive adversarial training (CAT) approach that leverages the labeled source domain samples to reinforce and regulate the feature generation for the target domain. In particular, the regulation forces the target feature distribution to be similar to the source feature distribution. CAT addresses three major challenges in adversarial learning: 1) it makes the feature distributions of the two domains as indistinguishable as possible for the discriminator, resulting in more robust domain-invariant feature generation; 2) it encourages target samples to move closer to the source in the feature space, reducing the need to generalize a classifier trained on the labeled source domain to the unlabeled target domain; 3) it avoids directly aligning unpaired source and target samples within a mini-batch. CAT can be easily plugged into existing models and exhibits significant performance improvements.
Published: 2024

16. Transcranial low-level laser stimulation in near infrared-II region for brain safety and protection

Author: Li, Zhilin, Zhao, Yongheng, Hu, Yiqing, Li, Yang, Zhang, Keyao, Gao, Zhibing, Tan, Lirou, Liu, Hanli, Li, Xiaoli, Cao, Aihua, Cui, Zaixu, and Zhao, Chenguang
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Background: The use of near-infrared lasers for transcranial photobiomodulation (tPBM) offers a non-invasive method for influencing brain activity and is beneficial for various neurological conditions. Objective: To investigate the safety and neuroprotective properties of tPBM using near-infrared (NIR)-II laser stimulation. Methods: We conducted thirteen experiments involving multidimensional and quantitative methods and measured serum neurobiomarkers, performed electroencephalogram (EEG) and magnetic resonance imaging (MRI) scans, assessed executive functions, and administered a subjective questionnaire. Results: Significant reductions (n=15) in neuron-specific enolase (NSE) levels were observed after treatment, indicating neuroprotective effects. No structural or functional brain abnormalities were observed, confirming the safety of tPBM. Additionally, cognitive and executive functions were not impaired, with participants' feedback indicating minimal discomfort. Conclusions: Our data indicate that NIR-II tPBM is safe with specific parameters, highlighting its potential for brain protection.
Published: 2024

17. Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation

Author: Zhu, Zhilin, Hong, Xiaopeng, Ma, Zhiheng, Zhuang, Weijun, Ma, Yaohui, Dai, Yong, and Wang, Yaowei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual Test-Time Adaptation (CTTA) involves adapting a pre-trained source model to continually changing unsupervised target domains. In this paper, we systematically analyze the challenges of this task: online environment, unsupervised nature, and the risks of error accumulation and catastrophic forgetting under continual domain shifts. To address these challenges, we reshape the online data buffering and organizing mechanism for CTTA. We propose an uncertainty-aware buffering approach to identify and aggregate significant samples with high certainty from the unsupervised, single-pass data stream. Based on this, we propose a graph-based class relation preservation constraint to overcome catastrophic forgetting. Furthermore, a pseudo-target replay objective is used to mitigate error accumulation. Extensive experiments demonstrate the superiority of our method in both segmentation and classification CTTA tasks. Code is available at https://github.com/z1358/OBAO. Comment: This is the preprint version of our paper and supplemental material to appear in ECCV 2024
Published: 2024

18. Data, Data Everywhere: A Guide for Pretraining Dataset Construction

Author: Parmar, Jupinder, Prabhumoye, Shrimai, Jennings, Joseph, Liu, Bo, Jhunjhunwala, Aastha, Wang, Zhilin, Patwary, Mostofa, Shoeybi, Mohammad, and Catanzaro, Bryan
Subjects: Computer Science - Computation and Language
Abstract: The impressive capabilities of recent language models can be largely attributed to the multi-trillion-token pretraining datasets they are trained on. However, model developers fail to disclose their construction methodology, which has led to a lack of open information on how to develop effective pretraining sets. To address this issue, we perform the first systematic study across the entire pipeline of pretraining set construction. First, we run ablations on existing techniques for pretraining set development to identify which methods translate to the largest gains in model accuracy on downstream evaluations. Then, we categorize the most widely used data source, web crawl snapshots, across the attributes of toxicity, quality, type of speech, and domain. Finally, we show how such attribute information can be used to further refine and improve the quality of a pretraining set. These findings constitute an actionable set of steps that practitioners can use to develop high-quality pretraining sets. Comment: Accepted as an oral presentation at EMNLP 2024
Published: 2024

19. Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Author: Liang, Quanmin, Huang, Zhilin, Zheng, Xiawu, Yang, Feidiao, Peng, Jun, Huang, Kai, and Tian, Yonghong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates positive and negative events for complementary information extraction, followed by mutual supplementation and refinement. In particular, we introduce Feature Fusion Modules (FFM) and Feature Exchange Modules (FEM). FFM is designed for the fusion of contextual information within neighboring event streams, leveraging the coupling relationship between positive and negative events to mitigate misleading noise in the respective branches. FEM efficiently promotes the fusion and exchange of information between positive and negative branches, enabling superior local information enhancement and global information complementation. Experimental results demonstrate that our approach achieves over 17% and 31% improvement on synthetic and real datasets, accompanied by a 2.3X acceleration. Furthermore, we evaluate our method on two downstream event-driven applications, i.e., object recognition and video reconstruction, achieving remarkable results that outperform existing methods. Our code and Supplementary Material are available at https://github.com/Lqm26/RMFNet.
Published: 2024

20. Nemotron-4 340B Technical Report

Author: Nvidia, Adler, Bo, Agarwal, Niket, Aithal, Ashwath, Anh, Dong H., Bhattacharya, Pallab, Brundyn, Annika, Casper, Jared, Catanzaro, Bryan, Clay, Sharon, Cohen, Jonathan, Das, Sirshak, Dattagupta, Ayush, Delalleau, Olivier, Derczynski, Leon, Dong, Yi, Egert, Daniel, Evans, Ellie, Ficek, Aleksander, Fridman, Denys, Ghosh, Shaona, Ginsburg, Boris, Gitman, Igor, Grzegorzek, Tomasz, Hero, Robert, Huang, Jining, Jawa, Vibhu, Jennings, Joseph, Jhunjhunwala, Aastha, Kamalu, John, Khan, Sadaf, Kuchaiev, Oleksii, LeGresley, Patrick, Li, Hui, Liu, Jiwei, Liu, Zihan, Long, Eileen, Mahabaleshwarkar, Ameya Sunil, Majumdar, Somshubra, Maki, James, Martinez, Miguel, de Melo, Maer Rodrigues, Moshkov, Ivan, Narayanan, Deepak, Narenthiran, Sean, Navarro, Jesus, Nguyen, Phong, Nitski, Osvald, Noroozi, Vahid, Nutheti, Guruprasad, Parisien, Christopher, Parmar, Jupinder, Patwary, Mostofa, Pawelec, Krzysztof, Ping, Wei, Prabhumoye, Shrimai, Roy, Rajarshi, Saar, Trisha, Sabavat, Vasanth Rao Naik, Satheesh, Sanjeev, Scowcroft, Jane Polak, Sewall, Jason, Shamis, Pavel, Shen, Gerald, Shoeybi, Mohammad, Sizer, Dave, Smelyanskiy, Misha, Soares, Felipe, Sreedhar, Makesh Narsimhan, Su, Dan, Subramanian, Sandeep, Sun, Shengyang, Toshniwal, Shubham, Wang, Hao, Wang, Zhilin, You, Jiaxuan, Zeng, Jiaqi, Zhang, Jimmy, Zhang, Jing, Zhang, Vivienne, Zhang, Yian, and Zhu, Chen
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
Published: 2024
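
The single-node claim above can be checked with back-of-the-envelope arithmetic. The sketch assumes roughly 340 billion parameters stored at one byte each in FP8 and a DGX H100 with eight 80 GB GPUs; it counts weights only and ignores the KV cache, activations, and runtime overhead, so it is a lower bound on real memory use.

```python
PARAMS = 340e9            # approximate parameter count, taken from the model name
BYTES_PER_PARAM_FP8 = 1   # FP8 stores each weight in 8 bits
BYTES_PER_PARAM_BF16 = 2  # 16-bit weights, for comparison
GPUS_PER_NODE = 8
GB_PER_GPU = 80           # assumed 80 GB H100 variant

weights_gb = PARAMS * BYTES_PER_PARAM_FP8 / 1e9
bf16_gb = PARAMS * BYTES_PER_PARAM_BF16 / 1e9
node_gb = GPUS_PER_NODE * GB_PER_GPU

print(f"FP8 weights:  {weights_gb:.0f} GB of {node_gb} GB")  # 340 GB of 640 GB
print(f"BF16 weights: {bf16_gb:.0f} GB of {node_gb} GB")     # 680 GB of 640 GB
```

At 16 bits the weights alone would exceed the node's memory, which is why the 8-bit format matters for fitting on one machine.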

21. Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling

Author: Jiang, Jianan, Tang, Hao, Jiang, Zhilin, Yu, Weiren, and Wu, Di
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) aims to minimize the distance between sketches and corresponding images in the embedding space. However, scalability is hindered by the growing complexity of solutions, mainly due to the abstract nature of fine-grained sketches. In this paper, we propose an effective approach to narrow the gap between the two domains. It mainly facilitates unified mutual information sharing both intra- and inter-samples, rather than treating them as a single feature alignment problem between modalities. Specifically, our approach includes: (i) Employing dual weight-sharing networks to optimize alignment within the sketch and image domain, which also effectively mitigates model learning saturation issues. (ii) Introducing an objective optimization function based on contrastive loss to enhance the model's ability to align features in both intra- and inter-samples. (iii) Presenting a self-supervised Multi-Scale Token Recycling (MSTR) Module featured by recycling discarded patch tokens in multi-scale features, further enhancing representation capability and retrieval performance. Our framework achieves excellent results on CNN- and ViT-based backbones. Extensive experiments demonstrate its superiority over existing methods. We also introduce Cloths-V1, the first professional fashion sketch-image dataset, which is used to validate our method and will be beneficial for other applications.
Published: 2024
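Item (ii) of this entry's abstract mentions a contrastive-loss objective. A generic InfoNCE loss can illustrate that idea. The following is a minimal pure-Python sketch, assuming paired anchor/positive embeddings (e.g. sketch and image features) in which every other item in the batch acts as a negative. The function names and the temperature default are illustrative assumptions, not the authors' objective.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i], while every
    positives[j] with j != i acts as an in-batch negative (hypothetical helper)."""
    if len(anchors) != len(positives) or not anchors:
        raise ValueError("need equally sized, non-empty batches")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching index i as the target class.
        total += log_denom - logits[i]
    return total / len(a)


# Usage: each sketch embedding is closest to its own image, so the loss is small.
sketch = [[1.0, 0.0], [0.0, 1.0]]
image = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce(sketch, image))
```

Applying the loss in both directions (sketch to image and image to sketch) is a common symmetric variant. This sketch keeps a single direction for brevity.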

22. Chip-scale generation of 60-mode continuous-variable cluster states

Author: Wang, Ze, Li, Kangkang, Wang, Yue, Zhou, Xin, Cheng, Yinke, Jing, Boxuan, Sun, Fengxiao, Li, Jincheng, Li, Zhilin, Gong, Qihuang, He, Qiongyi, Li, Bei-Bei, and Yang, Qi-Fan
Subjects: Physics - Optics, Quantum Physics
Abstract: Increasing the number of entangled entities is crucial for achieving exponential computational speedups and secure quantum networks. Despite recent progress in generating large-scale entanglement through continuous-variable (CV) cluster states, translating these technologies to photonic chips has been hindered by decoherence, limiting the number of entangled entities to 8. Here, we demonstrate 60-mode CV cluster states in a chip-based optical microresonator pumped by chromatic lasers. Resonantly enhanced four-wave mixing processes establish entanglement between equidistant spectral quantum modes (qumodes), forming a quantum analogue of optical frequency combs. Decoherence is minimized to achieve unprecedented two-mode raw squeezing (>3 dB) from a chip. Using bichromatic and trichromatic pump lasers, we realize one- and two-dimensional cluster states with up to 60 qumodes. Our work provides a compact and scalable platform for constructing large-scale entangled quantum resources, which are appealing for performing computation and communication tasks with quantum advantages.
Published: 2024

23. HelpSteer2: Open-source dataset for training top-performing reward models

Author: Wang, Zhilin, Dong, Yi, Delalleau, Olivier, Zeng, Jiaqi, Shen, Gerald, Egert, Daniel, Zhang, Jimmy J., Sreedhar, Makesh Narsimhan, and Kuchaiev, Oleksii
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer, need to be updated to remain effective for reward modeling. Methods that distil preference data from proprietary LLMs such as GPT-4 have restrictions on commercial usage imposed by model providers. To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). Using a powerful internal base model trained on HelpSteer2, we are able to achieve a SOTA score (92.0%) on Reward-Bench's primary dataset, outperforming currently listed open and proprietary models, as of June 12th, 2024. Notably, HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets (e.g., HH-RLHF), which makes it highly efficient for training reward models. Our extensive experiments demonstrate that reward models trained with HelpSteer2 are effective in aligning LLMs. In particular, we propose SteerLM 2.0, a model alignment approach that can effectively make use of the rich multi-attribute scores predicted by our reward models. HelpSteer2 is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and code is available at https://github.com/NVIDIA/NeMo-Aligner
Published: 2024

24. Low-Rank Similarity Mining for Multimodal Dataset Distillation

Author: Xu, Yue, Lin, Zhilin, Qiu, Yusong, Lu, Cewu, and Li, Yong-Lu
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Though dataset distillation has witnessed rapid development in recent years, the distillation of multimodal data, e.g., image-text pairs, poses unique and under-explored challenges. Unlike unimodal data, image-text contrastive learning (ITC) data lack inherent categorization and should instead place greater emphasis on modality correspondence. In this work, we propose Low-Rank Similarity Mining (LoRS) for multimodal dataset distillation, which concurrently distills a ground-truth similarity matrix along with image-text pairs and leverages low-rank factorization for efficiency and scalability. The proposed approach brings significant improvements over existing algorithms, marking a notable contribution to the field of vision-language dataset distillation. We advocate adopting LoRS as a foundational synthetic data setup for image-text dataset distillation. Our code is available at https://github.com/silicx/LoRS_Distill., Comment: Accepted at ICML 2024
Published: 2024

25. Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

Author: Peng, Zelin, Xu, Zhengqin, Zeng, Zhilin, Wang, Yaoming, Xie, Lingxi, Tian, Qi, and Shen, Wei
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers from three issues: 1) high computational cost, 2) misalignment between the two inherent modalities of CLIP, and 3) degraded generalization ability on unseen categories. To address these issues, we propose H-CLIP, a symmetrical parameter-efficient fine-tuning (PEFT) strategy conducted in hyperspherical space for both CLIP modalities. Specifically, the PEFT strategy is achieved by a series of efficient block-diagonal learnable transformation matrices and a dual cross-relation communication module among all learnable matrices. Since the PEFT strategy is applied symmetrically to the two CLIP modalities, the misalignment between them is mitigated. Furthermore, we apply an additional constraint to PEFT on the CLIP text encoder according to the hyperspherical energy principle (i.e., minimizing hyperspherical energy during fine-tuning preserves the intrinsic structure of the original parameter space) to prevent the destruction of the generalization ability offered by the CLIP text encoder. Extensive evaluations across various benchmarks show that H-CLIP achieves new SOTA open-vocabulary semantic segmentation results while only requiring updating approximately 4% of the total parameters of CLIP.
Published: 2024

26. AIGB: Generative Auto-bidding via Conditional Diffusion Modeling

Author: Guo, Jiayan, Huo, Yusen, Zhang, Zhilin, Wang, Tianyu, Yu, Chuan, Xu, Jian, Zhang, Yan, and Zheng, Bo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computational Engineering, Finance, and Science
Abstract: Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled as a Markov Decision Process (MDP), which assumes Markovian state transitions. This assumption restricts their ability to perform in long-horizon scenarios and makes the model unstable when dealing with highly random online advertising environments. To tackle this issue, this paper introduces AI-Generated Bidding (AIGB), a novel paradigm for auto-bidding through generative modeling. In this paradigm, we propose DiffBid, a conditional diffusion modeling approach for bid generation. DiffBid directly models the correlation between the return and the entire trajectory, effectively avoiding error propagation across time steps in long horizons. Additionally, DiffBid offers a versatile approach for generating trajectories that maximize given targets while adhering to specific constraints. Extensive experiments conducted on a real-world dataset and an online A/B test on the Alibaba advertising platform demonstrate the effectiveness of DiffBid, achieving a 2.81% increase in GMV and a 3.36% increase in ROI., Comment: Accepted by KDD 2024
Published: 2024

27. ParamReL: Learning Parameter Space Representation via Progressively Encoding Bayesian Flow Networks

Author: Wu, Zhangkai, Fan, Xuhui, Li, Jin, Zhao, Zhilin, Chen, Hui, and Cao, Longbing
Subjects: Computer Science - Machine Learning
Abstract: The recently proposed Bayesian Flow Networks (BFNs) show great potential in modeling parameter spaces, offering a unified strategy for handling continuous, discretized, and discrete data. However, BFNs cannot learn high-level semantic representations from the parameter space, since common encoders, which encode data into one static representation, cannot capture semantic changes in parameters. This motivates a new direction: learning semantic representations hidden in the parameter spaces to characterize mixed-type noisy data. Accordingly, we propose a representation learning framework named ParamReL, which operates in the parameter space to obtain parameter-wise latent semantics that exhibit progressive structures. Specifically, ParamReL proposes a self-encoder to learn latent semantics directly from parameters, rather than from observations. The encoder is then integrated into BFNs, enabling representation learning with various formats of observations. Mutual information terms further promote the disentanglement of latent semantics and capture meaningful semantics simultaneously. We illustrate conditional generation and reconstruction in ParamReL via expanding BFNs, and extensive quantitative experimental results demonstrate the superior effectiveness of ParamReL in learning parameter representations.
Published: 2024

28. Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports

Author: Guo, Guangyu, Yao, Jiawen, Xia, Yingda, Mok, Tony C. W., Zheng, Zhilin, Han, Junwei, Lu, Le, Zhang, Dingwen, Zhou, Jian, and Zhang, Ling
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: The scarcity of expert-level tumor annotations hinders the effectiveness of supervised-learning-based opportunistic cancer screening on medical imaging. Clinical reports, which are rich in descriptive textual details, can offer "free lunch" supervision and provide tumor location as a type of weak label for screening tasks, thus saving human labeling workload, if properly leveraged. However, predicting cancer using only such weak labels can be very challenging, since tumors usually occupy small anatomical regions compared to the whole 3D medical scan. Weakly semi-supervised learning (WSSL) combines a limited set of voxel-level tumor annotations with a substantial number of medical images that have only off-the-shelf clinical reports, which may strike a good balance between minimizing expert annotation workload and optimizing screening efficacy. In this paper, we propose a novel text-guided learning method to achieve highly accurate cancer detection results. By integrating diagnostic and tumor location text prompts into the text encoder of a vision-language model (VLM), weakly supervised learning can be effectively optimized in the latent space of the VLM, thereby enhancing training stability. Our approach can leverage the clinical knowledge of a large-scale pre-trained VLM to enhance generalization ability, and produce reliable pseudo tumor masks to improve cancer detection. Our extensive quantitative experimental results on a large-scale cancer dataset, including 1,651 unique patients, validate that our approach can reduce human annotation efforts by at least 70% while maintaining cancer detection accuracy comparable to competing fully supervised methods (AUC 0.961 versus 0.966).
Published: 2024

29. Topological Weyl Altermagnetism in CrSb

Author: Li, Cong, Hu, Mengli, Li, Zhilin, Wang, Yang, Chen, Wanyu, Thiagarajan, Balasubramanian, Leandersson, Mats, Polley, Craig, Kim, Timur, Liu, Hui, Fulga, Cosma, Vergniory, Maia G., Janson, Oleg, Tjernberg, Oscar, and Brink, Jeroen van den
Subjects: Condensed Matter - Materials Science, Condensed Matter - Other Condensed Matter, Condensed Matter - Strongly Correlated Electrons
Abstract: Altermagnets constitute a novel, third fundamental class of collinear magnetically ordered materials, alongside ferro- and antiferromagnets. They share with conventional antiferromagnets the feature of a vanishing net magnetization. At the same time, they show a spin-splitting of electronic bands, just as in ferromagnets, caused by the atomic exchange interaction. On the other hand, topology has recently revolutionized our understanding of condensed matter physics, introducing new phases of matter classified by intrinsic topological order. Here we connect the worlds of altermagnetism and topology, showing that the electronic structure of the altermagnet CrSb is topological and hosts a novel Weyl semimetallic state. Using high-resolution and spin-resolved angle-resolved photoemission spectroscopy, we observe a large momentum-dependent spin-splitting in CrSb, reaching up to 1 eV, that induces altermagnetic Weyl nodes with an associated magnetic quantum number. At the surface we observe their spin-polarized topological Fermi arcs. This establishes that in altermagnets the large energy scale intrinsic to the spin-splitting, orders of magnitude larger than the relativistic spin-orbit coupling, creates its own realm of robust electronic topology., Comment: Main text: 16 pages, 4 figures. Supplementary material: 23 pages, 15 figures. Comments are welcome
Published: 2024

30. Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

Author: Li, Yafu, Wang, Zhilin, Cui, Leyang, Bi, Wei, Shi, Shuming, and Zhang, Yue
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. However, limited work has been devoted to detecting (partially) AI-paraphrased texts, even though AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), which aims to identify paraphrased text spans within a text. Unlike text-level detection, PTD takes in the full text and assigns each sentence a score indicating its degree of paraphrasing. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analyses explain the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models can generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED., Comment: ACL 2024 Findings
Published: 2024

31. Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping

Author: Wu, Tianhao, Yang, Jing, Guo, Zhilin, Wan, Jingyi, Zhong, Fangcheng, and Oztireli, Cengiz
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of a fundamental limitation of Gaussians and point clouds: each Gaussian or point can only have a single directional radiance without spatial variance, so an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. To properly render the body texture for each view and pose without accurate geometry or UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show that our method achieves superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method for our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subject., Comment: Project Page: https://gaussian-head-shoulders.netlify.app/
Published: 2024

32. Bilateral Event Mining and Complementary for Event Stream Super-Resolution

Author: Huang, Zhilin, Liang, Quanmin, Yu, Yijie, Qin, Chujun, Zheng, Xiawu, Huang, Kai, Zhou, Zikun, and Yang, Wenming
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Event Stream Super-Resolution (ESR) aims to address the challenge of insufficient spatial resolution in event streams, which is of great significance for the application of event cameras in complex scenarios. Previous works on ESR often process positive and negative events in a mixed paradigm. This paradigm limits their ability to effectively model the unique characteristics of each event type and to let the two types refine each other by considering their correlations. In this paper, we propose a bilateral event mining and complementary network (BMCNet) to fully leverage the potential of each event type while simultaneously capturing shared information so that the two types complement each other. Specifically, we resort to a two-stream network to comprehensively mine each type of event individually. To facilitate the exchange of information between the two streams, we propose a bilateral information exchange (BIE) module. This module is embedded layer-wise between the two streams, enabling the effective propagation of hierarchical global information while alleviating the impact of invalid information brought by inherent characteristics of events. The experimental results demonstrate that our approach outperforms the previous state-of-the-art methods in ESR, achieving performance improvements of over 11% on both real and synthetic datasets. Moreover, our method significantly enhances the performance of event-based downstream tasks such as object recognition and video reconstruction. Our code is available at https://github.com/Lqm26/BMCNet-ESR., Comment: Accepted to CVPR 2024
Published: 2024

33. Weak coupling limit of a Brownian particle in the curl of the 2D GFF

Author: Yang, Huanyu and Yang, Zhilin
Subjects: Mathematics - Probability
Abstract: In this article, we study the weak coupling limit of the following equation in $\mathbb{R}^2$: $$dX_t^\varepsilon=\frac{\hat{\lambda}}{\sqrt{\log\frac1\varepsilon}}\omega^\varepsilon(X_t^\varepsilon)dt+\nu dB_t,\quad X_0^\varepsilon=0. $$ Here $\omega^\varepsilon=\nabla^{\perp}\rho_\varepsilon*\xi$ with $\xi$ representing the $2d$ Gaussian Free Field (GFF) and $\rho_\varepsilon$ denoting an appropriate approximation of the identity. $B_t$ denotes a two-dimensional standard Brownian motion, and $\hat{\lambda},\nu>0$ are two given constants. We use the approach from \cite{Cannizzaro.2023} to show that the second moment of $X_t^\varepsilon$ under the annealed law converges to $(c(\nu)^2+2\nu^2)t$ with a precisely determined constant $c(\nu)>0$, which implies a non-trivial limit of the drift terms as $\varepsilon$ vanishes. We also prove that in this weak coupling regime, the sequence of solutions converges in distribution to $\left(\sqrt{\frac{c(\nu)^2}{2}+\nu^2}\right)\widetilde{B}_t$ as $\varepsilon$ vanishes, where $\widetilde{B}_t$ is a two-dimensional standard Brownian motion.
Published: 2024
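The two limits stated in this entry are consistent with each other, as a short check shows. This is a worked note added for illustration. It uses only the standard fact that a two-dimensional standard Brownian motion satisfies $\mathbb{E}|\widetilde{B}_t|^2 = 2t$:

$$\mathbb{E}\left|\sqrt{\tfrac{c(\nu)^2}{2}+\nu^2}\,\widetilde{B}_t\right|^2=\left(\tfrac{c(\nu)^2}{2}+\nu^2\right)\cdot 2t=\left(c(\nu)^2+2\nu^2\right)t.$$

The term $\nu B_t$ alone contributes $2\nu^2 t$. The excess $c(\nu)^2 t$ is what signals the non-trivial drift limit mentioned in the abstract.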

34. Effect of realistic oscillator phase noise on the performance of cell-free networks

Author: Zhilin, Igor and Vinogradov, Evgenii
Subjects: Computer Science - Networking and Internet Architecture, Electrical Engineering and Systems Science - Signal Processing
Abstract: To support 6G requirements, the radio access infrastructure will become increasingly dense. Cell-free (CF) networks offer extreme flexibility by coherently serving users with multiple access points (APs). This paradigm requires precise and stable phase synchronization. In this article, we adapt the standardized 5G NR setup (subcarrier spacing, OFDM symbol duration and allocation) to investigate the effect of phase noise (PN) on the simulated performance of scalable CF networks. In contrast to prior literature relying on the simplified model of a free-running oscillator with the Wiener process, we deploy a realistic hardware-inspired phase noise model reproducing the local oscillator (LO) phase drift. Our results demonstrate that even affordable LOs offer sufficient stability to ensure negligible loss of uplink spectral efficiency (SE) on the time scale of the standardized 5G Transmission Time Interval of 1 ms. This study substantiates the feasibility of CF networks based on 5G standards.
Published: 2024
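For context, this entry contrasts its model with the simplified free-running oscillator model, which can be sketched as a Wiener process. The following is a minimal Python sketch under stated assumptions. Phase increments are taken as i.i.d. Gaussian with variance $2\pi\beta T_s$, where $\beta$ is the oscillator's 3-dB linewidth and $T_s$ is the sample period. The function name and parameters are hypothetical, and this is not the hardware-inspired model used in the paper.

```python
import math
import random
from typing import List, Optional


def wiener_phase_noise(
    num_samples: int,
    linewidth_hz: float,
    sample_period_s: float,
    seed: Optional[int] = None,
) -> List[float]:
    """Return oscillator phase samples (radians) modeled as a Wiener process.

    Assumption: increments are i.i.d. Gaussian with variance
    2*pi*linewidth_hz*sample_period_s (the common free-running model).
    """
    if num_samples <= 0:
        return []
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * math.pi * linewidth_hz * sample_period_s)
    phase = [0.0]  # start the random walk at zero phase
    for _ in range(num_samples - 1):
        phase.append(phase[-1] + rng.gauss(0.0, sigma))
    return phase


# Hypothetical example: 100 Hz linewidth sampled at 1 MHz over 1 ms.
trace = wiener_phase_noise(1001, linewidth_hz=100.0, sample_period_s=1e-6, seed=0)
print(f"final phase drift: {trace[-1]:.4f} rad")
```

Under this model the phase variance grows linearly in time as $2\pi\beta t$. With the hypothetical 100 Hz linewidth, one 1 ms interval accumulates $2\pi \cdot 100 \cdot 10^{-3} \approx 0.63\ \text{rad}^2$.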

35. NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

Author: Shen, Gerald, Wang, Zhilin, Delalleau, Olivier, Zeng, Jiaqi, Dong, Yi, Egert, Daniel, Sun, Shengyang, Zhang, Jimmy, Jain, Sahil, Taghibakhshi, Ali, Ausin, Markel Sanz, Aithal, Ashwath, and Kuchaiev, Oleksii
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment can be challenging, especially for the largest and most capable LLMs, which often contain tens or hundreds of billions of parameters. We create NeMo-Aligner, a toolkit for model alignment that can efficiently scale to a thousand GPUs for training the largest open-source LLMs such as Nemotron-4 340B and Llama 3.1 405B. NeMo-Aligner comes with highly optimized and scalable implementations for major paradigms of model alignment such as Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), SteerLM, and Self-Play Fine-Tuning (SPIN). Additionally, our toolkit supports running most of the alignment techniques in a Parameter Efficient Fine-Tuning (PEFT) setting. NeMo-Aligner is designed for extensibility, allowing support for other alignment techniques with minimal effort. It is open-sourced under the Apache 2.0 License, and we invite community contributions at https://github.com/NVIDIA/NeMo-Aligner, Comment: 16 pages, 4 figures, Accepted to COLM 2024
Published: 2024

36. Enhanced Visual Question Answering: A Comparative Analysis and Textual Feature Extraction Via Convolutions

Author: Zhang, Zhilin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Visual Question Answering (VQA) has emerged as a highly engaging field in recent years, attracting increasing research efforts aiming to enhance VQA accuracy through the deployment of advanced models such as Transformers. Despite this growing interest, there has been limited exploration into the comparative analysis and impact of textual modalities within VQA, particularly in terms of model complexity and its effect on performance. In this work, we conduct a comprehensive comparison between complex textual models that leverage long dependency mechanisms and simpler models focusing on local textual features within a well-established VQA framework. Our findings reveal that employing complex textual encoders is not invariably the optimal approach for the VQA-v2 dataset. Motivated by this insight, we introduce an improved model, ConvGRU, which incorporates convolutional layers to enhance the representation of question text. Tested on the VQA-v2 dataset, ConvGRU achieves better performance without substantially increasing parameter complexity.
Published: 2024

37. Analysing 2-$(v,k,2)$ designs admitting a flag-transitive almost simple automorphism group with socle $PSL(2,q)$ by means of conics and hyperovals of $PG(2,q)$

Author: Montinaro, Alessandro, Zhao, Yanwei, Zhang, Zhilin, and Zhou, Shenglin
Subjects: Mathematics - Combinatorics, 05B05, 05B25, 20B25, 51E15, 51E21
Abstract: The classification of the $2$-designs with $\lambda=2$ admitting a flag-transitive automorphism group with socle $PSL(2,q)$ is completed by settling the two open cases in \cite{ABDT}. The result is achieved by using conics and hyperovals of $PG(2,q)$.
Published: 2024

38. Motion-aware Latent Diffusion Models for Video Frame Interpolation

Author: Huang, Zhilin, Yu, Yijie, Yang, Ling, Qin, Chujun, Zheng, Bing, Zheng, Xiawu, Zhou, Zikun, Wang, Yaowei, and Yang, Wenming
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, motion estimation between neighboring frames plays a key role in avoiding motion ambiguity. However, existing VFI methods always struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames and the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, ultimately generating results that are both visually smooth and realistic. Extensive experiments conducted on benchmark datasets demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing approaches, especially in challenging scenarios involving dynamic textures with complex motion., Comment: 17 pages, 4 figures
Published: 2024

39. SLSM: An Efficient Strategy for Lazy Schema Migration on Shared-Nothing Databases

Author: Zeng, Zhilin, Li, Hui, Gao, Xiyue, Zhang, Hui, Zhang, Huiquan, and Cui, Jiangtao
Subjects: Computer Science - Databases
Abstract: By introducing intermediate states for metadata changes and ensuring that at most two versions of metadata exist in the cluster at the same time, shared-nothing databases are capable of making online, asynchronous schema changes. However, this method leads to delays in the deployment of new schemas since it requires waiting for massive data backfill. To shorten the service vacuum period before the new schema is available, this paper proposes a strategy named SLSM for zero-downtime schema migration on shared-nothing databases. Based on the lazy migration of stand-alone databases, SLSM keeps the old and new schemas with the same data distribution, reducing the node communication overhead of executing migration transactions for shared-nothing databases. Further, SLSM combines migration transactions with user transactions by extending the distributed execution plan to allow the data involved in migration transactions to directly serve user transactions, greatly reducing the waiting time of user transactions. Experiments demonstrate that our strategy can greatly reduce the latency of user transactions and improve the efficiency of data migration compared to existing schemes.
Published: 2024

40. Picotesla-sensitivity microcavity optomechanical magnetometry

Author: Hu, Zhi-Gang, Gao, Yi-Meng, Liu, Jian-Fei, Yang, Hao, Wang, Min, Lei, Yuechen, Zhou, Xin, Li, Jincheng, Cao, Xuening, Liang, Jinjing, Hu, Chao-Qun, Li, Zhilin, Lau, Yong-Chang, Cai, Jian-Wang, and Li, Bei-Bei
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Cavity optomechanical systems have enabled precision sensing of magnetic fields, by leveraging the optical resonance-enhanced readout and mechanical resonance-enhanced response. Previous studies have successfully achieved scalable and reproducible microcavity optomechanical magnetometry (MCOM) by incorporating Terfenol-D thin films into high-quality ($Q$) factor whispering gallery mode (WGM) microcavities. However, the sensitivity was limited to 585 pT/Hz$^{1/2}$, over 20 times inferior to those using Terfenol-D particles. In this work, we propose and demonstrate a high-sensitivity and scalable MCOM approach by sputtering a FeGaB thin film onto a high-$Q$ SiO$_2$ WGM microdisk. Theoretical studies are conducted to explore the magnetic actuation constant and noise-limited sensitivity by varying the parameters of the FeGaB film and SiO$_2$ microdisk. Multiple magnetometers with different radii are fabricated and characterized. By utilizing a microdisk with a radius of 355 $\mu$m and a thickness of 1 $\mu$m, along with a FeGaB film with a radius of 330 $\mu$m and a thickness of 1.3 $\mu$m, we have achieved a remarkable peak sensitivity of 1.68 pT/Hz$^{1/2}$ at 9.52 MHz. This represents a significant improvement of over two orders of magnitude compared with previous studies employing sputtered Terfenol-D film. Notably, the magnetometer operates without a bias magnetic field, thanks to the remarkable soft magnetic properties of the FeGaB film. Furthermore, as a proof-of-concept, we have demonstrated the real-time measurement of a pulsed magnetic field simulating the corona current in a high-voltage transmission line using our developed magnetometer. These high-sensitivity magnetometers hold great potential for various applications, such as magnetic induction tomography and corona current monitoring.
Published: 2024
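This entry claims an improvement of "over two orders of magnitude". That claim can be checked directly from the two sensitivities it quotes. The ratio below is worked arithmetic added for illustration, not a figure from the paper:

$$\frac{585\ \mathrm{pT/Hz^{1/2}}}{1.68\ \mathrm{pT/Hz^{1/2}}} \approx 348 > 10^2, \qquad \log_{10} 348 \approx 2.54.$$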

41. MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising

Author: Gong, Zhen, Niu, Lvyin, Zhao, Yang, Xu, Miao, Zheng, Zhenzhe, Zhang, Haoqi, Zhang, Zhilin, Wu, Fan, Bai, Rongquan, Yu, Chuan, Xu, Jian, and Zheng, Bo
Subjects: Computer Science - Computer Science and Game Theory, Computer Science - Artificial Intelligence
Abstract: Online bidding and auctions are crucial aspects of the online advertising industry. Conventionally, there is only one slot for ad display, and most current studies focus on this setting. Nowadays, multi-slot display advertising, where many ads are displayed in a list and shown to users as a whole, is becoming increasingly popular. However, in multi-slot display advertising, different slots offer different cost-effectiveness, so advertisers have an incentive to adjust bid prices to win the most economical ad positions. In this study, we introduce bid shading into multi-slot display advertising for bid price adjustment with a Multi-task End-to-end Bid Shading (MEBS) method. We prove the optimality of our method theoretically and examine its performance experimentally. Through extensive offline and online experiments, we demonstrate the effectiveness and efficiency of our method, obtaining a 7.01% lift in Gross Merchandise Volume, a 7.42% lift in Return on Investment, and a 3.26% lift in ad buy count.
Published: 2024

42. Deciding Separation Logic with Pointer Arithmetic and Inductive Definitions

Author: Su, Wanyun, Wu, Zhilin, and Sighireanu, Mihaela
Subjects: Computer Science - Logic in Computer Science
Abstract: Pointer arithmetic is widely used in low-level programs, e.g. memory allocators. The specification of such programs usually requires using pointer arithmetic inside inductive definitions to define the common data structures, e.g. heap lists in memory allocators. In this work, we investigate decision problems for SLAH, a separation logic fragment that allows pointer arithmetic inside inductive definitions, thus enabling specification of properties for programs manipulating heap lists. Pointer arithmetic inside inductive definitions is challenging for automated reasoning. We tackle this challenge and achieve decision procedures for both satisfiability and entailment of SLAH formulas. The crux of our decision procedure for satisfiability is to compute summaries of inductive definitions. We show that although the summary is naturally expressed as an existentially quantified non-linear arithmetic formula, it can actually be transformed into an equivalent linear arithmetic formula. The decision procedure for entailment, on the other hand, has to match and split the spatial atoms according to the arithmetic relation between address variables. We report on the implementation of these decision procedures and their good performance in solving problems arising from the verification of building-block programs used in memory allocators.
Published: 2024

43. Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

Author: Lin, Kun-Yu, Ding, Henghui, Zhou, Jiaming, Tang, Yu-Ming, Peng, Yi-Xing, Zhao, Zhilin, Loy, Chen Change, and Zheng, Wei-Shi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneering works have proposed adapting the powerful CLIP to video data, leading to efficient and effective video learners for open-vocabulary action recognition. Inspired by the fact that humans perform actions in diverse environments, our work explores an intriguing question: Can CLIP-based video learners effectively generalize to video domains they have not encountered during training? To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps. The evaluation demonstrates that previous methods exhibit limited action recognition performance in unseen video domains, revealing potential challenges of the cross-domain open-vocabulary action recognition task. In this paper, we focus on one critical challenge of the task, namely scene bias, and accordingly contribute a novel scene-aware video-text alignment method. Our key idea is to separate video representations from scene-encoded text representations, aiming to learn scene-agnostic video representations for recognizing actions across domains. Extensive experiments demonstrate the effectiveness of our method. The benchmark and code will be available at https://github.com/KunyuLin/XOV-Action/.
Published: 2024

44. Targeting overexpressed antigens in glioblastoma via CAR T cells with computationally designed high-affinity protein binders

Author: Xia, Zhen, Jin, Qihan, Long, Zhilin, He, Yexuan, Liu, Fuyi, Sun, Chengfang, Liao, Jinyang, Wang, Chun, Wang, Chentong, Zheng, Jian, Zhao, Weixi, Zhang, Tianxin, Rich, Jeremy N., Zhang, Yongdeng, Cao, Longxing, and Xie, Qi
Published: 2024

45. Weighting non-IID batches for out-of-distribution detection

Author: Zhao, Zhilin and Cao, Longbing
Published: 2024

46. An improved model based on YOLOX for detection of tea sprouts in natural environment

Author: Li, Xiutong, Liu, Ruixin, Li, Yuxin, Li, Zhilin, Yan, Peng, Yu, Mei, Dong, Xuan, Yan, Jianwei, and Xie, Benliang
Published: 2024

47. Effects of High-Intensity Acoustic Waves on the Hydrogen Value of Water

Author: Zhilin, A. A.
Published: 2024

48. Study on Co2+ adsorption properties of β-cyclodextrin/graphene based on comprehensive experiments and theoretical calculation

Author: Bao, Ping, Wang, Xiaowei, Men, Jinfeng, and Hu, Zhilin
Published: 2024

49. The Role of Occipitotemporal Network for Speed-Reading: An fMRI Study

Author: Sun, Dexin, Zhang, Zhilin, Oishi, Naoya, Dai, Qi, Thuy, Dinh Ha Duy, Abe, Nobuhito, Tachibana, Jun, Funahashi, Shintaro, Wu, Jinglong, Murai, Toshiya, and Fukuyama, Hidenao
Published: 2024

50. Overcoming Liquation Cracking in AA7075 Welds via Friction Stir Processing Pre-weld Treatment: A Microstructural Approach

Author: Pirjamadi, Alireza, Movahedi, Mojtaba, Ghasemi, Ali, Peng, Zhilin, and Pouranvari, Majid
Published: 2024
