Author: "Sheng, Lu" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Sheng, Lu"' showing total 3,708 results

1. T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

Author: Li, Lijun, Shi, Zhelun, Hu, Xuhao, Dong, Bowen, Qin, Yiran, Liu, Xihui, Sheng, Lu, and Shao, Jing
Subjects: Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: Text-to-image (T2I) models have rapidly advanced, enabling the generation of high-quality images from text prompts across various domains. However, these models present notable safety concerns, including the risk of generating harmful, biased, or private content. Current research on assessing T2I safety remains in its early stages. While some efforts have been made to evaluate models on specific safety dimensions, many critical risks remain unexplored. To address this gap, we introduce T2ISafety, a safety benchmark that evaluates T2I models across three key domains: toxicity, fairness, and privacy. We build a detailed hierarchy of 12 tasks and 44 categories based on these three domains, and meticulously collect 70K corresponding prompts. Based on this taxonomy and prompt set, we build a large-scale T2I dataset with 68K manually annotated images and train an evaluator capable of detecting critical risks that previous work has failed to identify, including risks that even ultra-large proprietary models like GPTs cannot correctly detect. We evaluate 12 prominent diffusion models on T2ISafety and reveal several concerns including persistent issues with racial fairness, a tendency to generate toxic content, and significant variation in privacy protection across the models, even with defense methods like concept erasing. Data and evaluator are released under https://github.com/adwardlee/t2i_safety.
Published: 2025

2. Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Author: Zhou, Enshen, Su, Qi, Chi, Cheng, Zhang, Zhizheng, Wang, Zhongyuan, Huang, Tiejun, Sheng, Lu, and Wang, He
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
Comment: Project page: https://zhoues.github.io/Code-as-Monitor/
Published: 2024

3. MV-Adapter: Multi-view Consistent Image Generation Made Easy

Author: Huang, Zehuan, Guo, Yuan-Chen, Wang, Haoran, Yi, Ran, Ma, Lizhuang, Cao, Yan-Pei, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing multi-view image generation methods often make invasive modifications to pre-trained text-to-image (T2I) models and require full fine-tuning, leading to (1) high computational costs, especially with large base models and high-resolution images, and (2) degradation in image quality due to optimization difficulties and scarce high-quality 3D data. In this paper, we propose the first adapter-based solution for multi-view image generation, and introduce MV-Adapter, a versatile plug-and-play adapter that enhances T2I models and their derivatives without altering the original network structure or feature space. By updating fewer parameters, MV-Adapter enables efficient training and preserves the prior knowledge embedded in pre-trained models, mitigating overfitting risks. To efficiently model the 3D geometric knowledge within the adapter, we introduce innovative designs that include duplicated self-attention layers and parallel attention architecture, enabling the adapter to inherit the powerful priors of the pre-trained models to model the novel 3D knowledge. Moreover, we present a unified condition encoder that seamlessly integrates camera parameters and geometric information, facilitating applications such as text- and image-based 3D generation and texturing. MV-Adapter achieves multi-view generation at 768 resolution on Stable Diffusion XL (SDXL), and demonstrates adaptability and versatility. It can also be extended to arbitrary view generation, enabling broader applications. We demonstrate that MV-Adapter sets a new quality standard for multi-view image generation, and opens up new possibilities due to its efficiency, adaptability and versatility.
Comment: Project page: https://huanngzh.github.io/MV-Adapter-Page/
Published: 2024

4. MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

Author: Huang, Zehuan, Guo, Yuan-Chen, An, Xingqiao, Yang, Yunhan, Li, Yangguang, Zou, Zi-Xin, Liang, Ding, Liu, Xihui, Cao, Yan-Pei, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper introduces MIDI, a novel paradigm for compositional 3D scene generation from a single image. Unlike existing methods that rely on reconstruction or retrieval techniques or recent approaches that employ multi-stage object-by-object generation, MIDI extends pre-trained image-to-3D object generation models to multi-instance diffusion models, enabling the simultaneous generation of multiple 3D instances with accurate spatial relationships and high generalizability. At its core, MIDI incorporates a novel multi-instance attention mechanism that effectively captures inter-object interactions and spatial coherence directly within the generation process, without the need for complex multi-step processes. The method utilizes partial object images and global scene context as inputs, directly modeling object completion during 3D generation. During training, we effectively supervise the interactions between 3D instances using a limited amount of scene-level data, while incorporating single-object data for regularization, thereby maintaining the pre-trained generalization ability. MIDI demonstrates state-of-the-art performance in image-to-scene generation, validated through evaluations on synthetic data, real-world scene data, and stylized scene images generated by text-to-image diffusion models.
Comment: Project page: https://huanngzh.github.io/MIDI-Page/
Published: 2024

5. A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs

Author: He, Lehan, Chen, Zeren, Shi, Zhelun, Yu, Tianyu, Shao, Jing, and Sheng, Lu
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Aligning the behaviors of Multimodal Large Language Models (MLLMs) with human preferences is crucial for developing robust and trustworthy AI systems. While recent attempts have employed human experts or powerful auxiliary AI systems to provide more accurate preference feedback, such as determining the preferable responses from MLLMs or directly rewriting hallucination-free responses, extensive resource overhead compromises the scalability of the feedback collection. In this work, we introduce Topic-level Preference Overwriting (TPO), a self-correctional approach that guides the model itself to mitigate its own hallucination at the topic level. Through a deconfounded strategy that replaces each topic within the response with the best or worst alternatives generated by the model itself, TPO creates more contrasting pairwise preference feedback, enhancing the feedback quality without human or proprietary model intervention. Notably, the experimental results demonstrate that the proposed TPO achieves state-of-the-art performance in trustworthiness, significantly reducing object hallucinations by 92% and overall hallucinations by 38%. Code, model and dataset are available now.
Published: 2024

6. WorldSimBench: Towards Video Generation Models as World Simulators

Author: Qin, Yiran, Shi, Zhelun, Yu, Jiwen, Wang, Xijun, Zhou, Enshen, Li, Lijun, Yin, Zhenfei, Liu, Xihui, Sheng, Lu, Shao, Jing, Bai, Lei, Ouyang, Wanli, and Zhang, Ruimao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in predictive models have demonstrated exceptional capabilities in predicting the future state of objects and scenes. However, the lack of categorization based on inherent characteristics continues to hinder the progress of predictive model development. Additionally, existing benchmarks are unable to effectively evaluate higher-capability, highly embodied predictive models from an embodied perspective. In this work, we classify the functionalities of predictive models into a hierarchy and take the first step in evaluating World Simulators by proposing a dual evaluation framework called WorldSimBench. WorldSimBench includes Explicit Perceptual Evaluation and Implicit Manipulative Evaluation, encompassing human preference assessments from the visual perspective and action-level evaluations in embodied tasks, covering three representative embodied scenarios: Open-Ended Embodied Environment, Autonomous Driving, and Robot Manipulation. In the Explicit Perceptual Evaluation, we introduce the HF-Embodied Dataset, a video assessment dataset based on fine-grained human feedback, which we use to train a Human Preference Evaluator that aligns with human perception and explicitly assesses the visual fidelity of World Simulators. In the Implicit Manipulative Evaluation, we assess the video-action consistency of World Simulators by evaluating whether the generated situation-aware video can be accurately translated into the correct control signals in dynamic environments. Our comprehensive evaluation offers key insights that can drive further innovation in video generation models, positioning World Simulators as a pivotal advancement toward embodied artificial intelligence.
Published: 2024

7. Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

Author: Wen, Hao, Huang, Zehuan, Wang, Yaohui, Chen, Xinyuan, Qiao, Yu, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Existing single image-to-3D creation methods typically involve a two-stage process, first generating multi-view images, and then using these images for 3D reconstruction. However, training these two stages separately leads to significant data bias in the inference phase, thus affecting the quality of reconstructed results. We introduce a unified 3D generation framework, named Ouroboros3D, which integrates diffusion-based multi-view image generation and 3D reconstruction into a recursive diffusion process. In our framework, these two modules are jointly trained through a self-conditioning mechanism, allowing them to adapt to each other's characteristics for robust inference. During the multi-view denoising process, the multi-view diffusion model uses the 3D-aware maps rendered by the reconstruction module at the previous timestep as additional conditions. The recursive diffusion framework with 3D-aware feedback unites the entire process and improves geometric consistency. Experiments show that our framework outperforms separation of these two stages and existing methods that combine them at the inference phase.
Comment: Project page: https://costwen.github.io/Ouroboros3D/
Published: 2024

8. From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Author: Huang, Zehuan, Fan, Hongxing, Wang, Lipeng, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements in controllable human image generation have led to zero-shot generation using structural signals (e.g., pose, depth) or facial appearance. Yet, generating human images conditioned on multiple parts of human appearance remains challenging. Addressing this, we introduce Parts2Whole, a novel framework designed for generating customized portraits from multiple reference images, including pose images and various aspects of human appearance. To achieve this, we first develop a semantic-aware appearance encoder to retain details of different human parts, which processes each image based on its textual label to a series of multi-scale feature maps rather than one image token, preserving the image dimension. Second, our framework supports multi-image conditioned generation through a shared self-attention mechanism that operates across reference and target features during the diffusion process. We enhance the vanilla attention mechanism by incorporating mask information from the reference human images, allowing for the precise selection of any part. Extensive experiments demonstrate the superiority of our approach over existing alternatives, offering advanced capabilities for multi-part controllable human image customization. See our project page at https://huanngzh.github.io/Parts2Whole/.
Published: 2024

9. Self-Supervised Monocular Depth Estimation in the Dark: Towards Data Distribution Compensation

Author: Yang, Haolin, Zhao, Chaoqiang, Sheng, Lu, and Tang, Yang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Nighttime self-supervised monocular depth estimation has received increasing attention in recent years. However, using night images for self-supervision is unreliable because the photometric consistency assumption is usually violated in the videos taken under complex lighting conditions. Even with domain adaptation or photometric loss repair, performance is still limited by the poor supervision of night images on trainable networks. In this paper, we propose a self-supervised nighttime monocular depth estimation method that does not use any night images during training. Our framework utilizes day images as a stable source for self-supervision and applies physical priors (e.g., wave optics, reflection model and read-shot noise model) to compensate for some key day-night differences. With day-to-night data distribution compensation, our framework can be trained in an efficient one-stage self-supervised manner. Though no nighttime images are considered during training, qualitative and quantitative results demonstrate that our method achieves SoTA depth estimating results on the challenging nuScenes-Night and RobotCar-Night compared with existing methods.
Comment: Accepted by IJCAI2024
Published: 2024

10. RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

Author: Chen, Zeren, Shi, Zhelun, Lu, Xiaoya, He, Lehan, Qian, Sucheng, Fang, Hao Shu, Yin, Zhenfei, Ouyang, Wanli, Shao, Jing, Qiao, Yu, Lu, Cewu, and Sheng, Lu
Subjects: Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, making it possible to generalize on novel robotic tasks in a composable manner. Despite the promising future, the community is not yet adequately prepared for composable generalization agents, particularly due to the lack of primitive-level real-world robotic datasets. In this paper, we propose a primitive-level robotic dataset, namely RH20T-P, which contains about 33000 video clips covering 44 diverse and complicated robotic tasks. Each clip is manually annotated according to a set of meticulously designed primitive skills, facilitating the future development of composable generalization agents. To validate the effectiveness of RH20T-P, we also construct a potential and scalable agent based on RH20T-P, called RA-P. Equipped with two planners specialized in task decomposition and motion planning, RA-P can adapt to novel physical skills through composable generalization. Our website and videos can be found at https://sites.google.com/view/rh20t-primitive/main. Dataset and code will be made available soon.
Comment: 24 pages, 12 figures, 6 tables
Published: 2024

11. Assessment of Multimodal Large Language Models in Alignment with Human Values

Author: Shi, Zhelun, Wang, Zhipin, Fan, Hongxing, Zhang, Zaibin, Li, Lijun, Zhang, Yongting, Yin, Zhenfei, Sheng, Lu, Qiao, Yu, and Shao, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (hhh). However, in terms of Multimodal Large Language Models (MLLMs), despite their commendable performance in perception and reasoning tasks, their alignment with human values remains largely unexplored, given the complexity of defining hhh dimensions in the visual world and the difficulty in collecting relevant data that accurately mirrors real-world situations. To address this gap, we introduce Ch3Ef, a Compreh3ensive Evaluation dataset and strategy for assessing alignment with human expectations. Ch3Ef dataset contains 1002 human-annotated data samples, covering 12 domains and 46 tasks based on the hhh principle. We also present a unified evaluation strategy supporting assessment across various scenarios and different perspectives. Based on the evaluation results, we summarize over 10 key findings that deepen the understanding of MLLM capabilities, limitations, and the dynamic relationships between evaluation levels, guiding future advancements in the field.
Comment: arXiv admin note: text overlap with arXiv:2311.02692
Published: 2024

12. MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Author: Zhou, Enshen, Qin, Yiran, Yin, Zhenfei, Huang, Yuzhou, Zhang, Ruimao, Sheng, Lu, Qiao, Yu, and Shao, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator with an innovative paradigm that enhances instruction-following ability in low-level control signal generation. Specifically, MineDreamer is developed on top of recent advances in Multimodal Large Language Models (MLLMs) and diffusion models, and we employ a Chain-of-Imagination (CoI) mechanism to envision the step-by-step process of executing instructions and translating imaginations into more precise visual prompts tailored to the current state; subsequently, the agent generates keyboard-and-mouse actions to efficiently achieve these imaginations, steadily following the instructions at each step. Extensive experiments demonstrate that MineDreamer follows single and multi-step instructions steadily, significantly outperforming the best generalist agent baseline and nearly doubling its performance. Moreover, qualitative analysis of the agent's imaginative ability reveals its generalization and comprehension of the open world.
Comment: Project page: https://sites.google.com/view/minedreamer/main
Published: 2024

13. Federated Learning for Data Trading Portfolio Allocation With Autonomous Economic Agents.

Author: Lei Zhao, Lin Cai, and Wu-Sheng Lu
Published: 2025

14. Data-Free Generalized Zero-Shot Learning

Author: Tang, Bowen, Yan, Long, Zhang, Jing, Yu, Qian, Sheng, Lu, and Xu, Dong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning models have the ability to extract rich knowledge from large-scale datasets. However, the sharing of data has become increasingly challenging due to concerns regarding data copyright and privacy. Consequently, this hampers the effective transfer of knowledge from existing data to novel downstream tasks and concepts. Zero-shot learning (ZSL) approaches aim to recognize new classes by transferring semantic knowledge learned from base classes. However, traditional generative ZSL methods often require access to real images from base classes and rely on manually annotated attributes, which presents challenges in terms of data restrictions and model scalability. To this end, this paper tackles a challenging and practical problem dubbed as data-free zero-shot learning (DFZSL), where only the CLIP-based base classes data pre-trained classifier is available for zero-shot classification. Specifically, we propose a generic framework for DFZSL, which consists of three main components. Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier. Secondly, we leverage the text features of CLIP as low-cost semantic information and propose a feature-language prompt tuning (FLPT) method to further align the virtual image features and textual features. Thirdly, we train a conditional generative model using the well-aligned virtual image features and corresponding semantic text features, enabling the generation of features for new classes and achieving better zero-shot generalization. Our framework has been evaluated on five commonly used benchmarks for generalized ZSL, as well as 11 benchmarks for the base-to-new ZSL. The results demonstrate the superiority and effectiveness of our approach. Our code is available at https://github.com/ylong4/DFZSL.
Comment: Accepted by AAAI24
Published: 2024

15. From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Author: Lu, Chaochao, Qian, Chen, Zheng, Guodong, Fan, Hongxing, Gao, Hongzhi, Zhang, Jie, Shao, Jing, Deng, Jingyi, Fu, Jinlan, Huang, Kexin, Li, Kunchang, Li, Lijun, Wang, Limin, Sheng, Lu, Chen, Meiqi, Zhang, Ming, Ren, Qibing, Chen, Sirui, Gui, Tao, Ouyang, Wanli, Wang, Yali, Teng, Yan, Wang, Yaru, Wang, Yi, He, Yinan, Wang, Yingchun, Wang, Yixu, Zhang, Yongting, Qiao, Yu, Shen, Yujiong, Mou, Yurong, Chen, Yuxi, Zhang, Zaibin, Shi, Zhelun, Yin, Zhenfei, and Wang, Zhipin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectation of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance understanding of the gap through the lens of a qualitative study on the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities, i.e., text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are several representative factors that define the reliability of MLLMs in supporting various downstream applications. To be specific, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall we evaluate 230 manually designed cases, where the qualitative results are then summarized into 12 scores (i.e., 4 modalities times 3 properties). In total, we uncover 14 empirical findings that are useful to understand the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
Published: 2024

16. Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

Author: Li, Xiawei, Xu, Qingyuan, Zhang, Jing, Zhang, Tianyi, Yu, Qian, Sheng, Lu, and Xu, Dong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed, aiming to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of long-tailed distribution without the need of the prior of category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state-of-the-art by ~4% to ~6% mIoU. Codes are released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.
Published: 2023

17. MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Author: Qin, Yiran, Zhou, Enshen, Liu, Qichang, Yin, Zhenfei, Sheng, Lu, Zhang, Ruimao, Qiao, Yu, and Shao, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: It is a long-lasting goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways. However, existing approaches usually struggle with compound difficulties caused by the logic-aware decomposition and context-aware execution of these tasks. To this end, we introduce MP5, an open-ended multimodal embodied system built upon the challenging Minecraft simulator, which can decompose feasible sub-objectives, design sophisticated situation-aware plans, and perform embodied action control, with frequent communication with a goal-conditioned active perception scheme. Specifically, MP5 is developed on top of recent advances in Multimodal Large Language Models (MLLMs), and the system is modulated into functional modules that can be scheduled and collaborated to ultimately solve pre-defined context- and process-dependent tasks. Extensive experiments prove that MP5 can achieve a 22% success rate on difficult process-dependent tasks and a 91% success rate on tasks that heavily depend on the context. Moreover, MP5 exhibits a remarkable ability to address many open-ended tasks that are entirely novel.
Comment: Accepted to CVPR2024
Published: 2023

18. EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

Author: Huang, Zehuan, Wen, Hao, Dong, Junting, Wang, Yaohui, Li, Yangguang, Chen, Xinyuan, Cao, Yan-Pei, Liang, Ding, Qiao, Yu, Dai, Bo, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image. Recent methods that introduce 3D global representation into diffusion models have shown the potential to generate consistent multiviews, but they have reduced generation speed and face challenges in maintaining generalizability and quality. To address this issue, we propose EpiDiff, a localized interactive multiview diffusion model. At the core of the proposed approach is to insert a lightweight epipolar attention block into the frozen diffusion model, leveraging epipolar constraints to enable cross-view interaction among feature maps of neighboring views. The newly initialized 3D modeling module preserves the original feature distribution of the diffusion model, exhibiting compatibility with a variety of base diffusion models. Experiments show that EpiDiff generates 16 multiview images in just 12 seconds, and it surpasses previous methods in quality evaluation metrics, including PSNR, SSIM and LPIPS. Additionally, EpiDiff can generate a more diverse distribution of views, improving the reconstruction quality from generated multiviews.
Comment: Project page: https://huanngzh.github.io/EpiDiff/
Published: 2023

19. Mixed reality-assisted versus landmark-guided spinal puncture in elderly patients: protocol for a stratified randomized controlled trial

Author: Gao, Lei, Zhang, Haichao, Xu, Yidi, Dong, Yanjun, Sheng, Lu, Fan, Yongqian, Qin, Chunhui, and Gu, Weidong
Published: 2024

20. Cordyceps militaris and Armillaria mellea formula alleviates depressive behaviors via microglia regulation in an unpredictable chronic mild stress animal model

Author: Yu-En Lin, Hui-Ping Lin, Kuan-Hung Lu, Yun-Ju Huang, Suraphan Panyod, Wei-Ting Liu, Yun-Sheng Lu, Mei-Hsing Chen, and Lee-Yan Sheen
Subjects: Cordyceps militaris, Armillaria mellea, Depression, Unpredictable chronic mild stress, Microglia, Medicine
Abstract: Background and aim: Cordyceps militaris (CM) and Armillaria mellea (AM) are medicinal mushrooms with potential applications in the treatment of mood disorders, including depression and anxiety. While research suggests that both CM and AM possess anti-inflammatory properties and hold potential for treating depression when administered separately, there is limited knowledge about their efficacy when combined in a formula, as well as the underlying mechanism involving the modulation of microglia. Experimental procedure: Rats received oral administrations of the low-dose formulation, medium-dose formulation, and high-dose formulation over 28 consecutive days as part of the unpredictable chronic mild stress (UCMS) protocols. The concentrations of serotonin, dopamine, and the corresponding metabolites in the rat prefrontal cortex and hippocampus were assessed. Blood samples were collected to examine corticosterone levels, and the brains were dissected for evaluating activated microglia morphologies and associated pro- and anti-inflammatory signaling pathways. Results and conclusion: The CM-AM formula effectively averted abnormal behaviors triggered by UCMS, such as anhedonia and hypoactivity, and decreased the turnover rate of monoamines in both the prefrontal cortex and hippocampus. The formula mitigated the increase in serum corticosterone levels induced by chronic stress. Furthermore, the formula alleviated stress-induced microglia activation in the hippocampus, achieving this by down-regulating hyperactivated pro-inflammatory proteins and up-regulating hypoactivated anti-inflammatory proteins in the hippocampus. The antidepressant-like effects potentially stem from the regulation of neurotransmitters and immunomodulation, likely by restoring the balance of M1 and M2 microglia fractions in the hippocampus. Consequently, the CM-AM formula could be explored as a prospective complementary and alternative therapy for depression.
Published: 2025
Full Text: View/download PDF

21. ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

Author: Shi, Zhelun, Wang, Zhipin, Fan, Hongxing, Yin, Zhenfei, Sheng, Lu, Qiao, Yu, and Shao, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting with visual content with myriad potential downstream tasks. However, even though a list of benchmarks has been proposed, the capabilities and limitations of MLLMs are still not comprehensively understood, due to a lack of a standardized and holistic evaluation framework. To this end, we present the first Comprehensive Evaluation Framework (ChEF) that can holistically profile each MLLM and fairly compare different MLLMs. First, we structure ChEF as four modular components, i.e., Scenario as scalable multimodal datasets, Instruction as flexible instruction retrieving formulae, Inferencer as reliable question answering strategies, and Metric as indicative task-specific score functions. Based on them, ChEF facilitates versatile evaluations in a standardized framework, and new evaluations can be built by designing new Recipes (systematic selection of these four components). Notably, current MLLM benchmarks can be readily summarized as recipes of ChEF. Second, we introduce 6 new recipes to quantify competent MLLMs' desired capabilities (or called desiderata, i.e., calibration, in-context learning, instruction following, language performance, hallucination, and robustness) as reliable agents that can perform real-world multimodal interactions. Third, we conduct a large-scale evaluation of 9 prominent MLLMs on 9 scenarios and 6 desiderata. Our evaluation summarized over 20 valuable observations concerning the generalizability of MLLMs across various scenarios and the composite capability of MLLMs required for multimodal interactions. We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models, so that ChEF can be a growing evaluation framework for the MLLM community., Comment: 39 pages, 26 figures
Published: 2023

22. Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

Author: Chen, Zeren, Wang, Ziqin, Wang, Zhen, Liu, Huayang, Yin, Zhenfei, Liu, Si, Sheng, Lu, Ouyang, Wanli, Qiao, Yu, and Shao, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Recent studies have demonstrated Large Language Models (LLMs) can extend their zero-shot generalization capabilities to multimodal learning through instruction tuning. As more modalities and downstream tasks are introduced, negative conflicts and interference may have a worse impact on performance. While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs). Specifically, we combine the well-known Mixture-of-Experts (MoE) and one of the representative PEFT techniques, i.e., LoRA, designing a novel LLM-based decoder, called LoRA-MoE, for multimodal learning. To the best of our knowledge, we are one of the pioneering efforts to introduce MoE into MLLMs to address this problem. The experimental results (about 20% improvement) have shown the effectiveness and versatility of our design in various 2D and 3D downstream tasks. Code and datasets are available at https://openlamm.github.io/tutorial/., Comment: 22 pages, 12 figures. Accepted in ICLR 2024
Published: 2023

23. Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

Author: Ai, Hao and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of their control, the efficiency improvement is limited for professional artistic creations such as comics and animation production whose main work is secondary painting. In the current workflow, fixing characters and image styles often requires lengthy text prompts, and sometimes even further training through TextualInversion, DreamBooth or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, an images-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is a blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model at https://github.com/aihao2000/stable-diffusion-reference-only, that achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.
Published: 2023

24. Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter

Author: Wang, Jinglong, Li, Xiawei, Zhang, Jing, Xu, Qingyuan, Zhou, Qin, Yu, Qian, Sheng, Lu, and Xu, Dong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Pre-trained text-image discriminative models, such as CLIP, have been explored for open-vocabulary semantic segmentation with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes. Recently, there has been a growing interest in expanding the application of generative models from generation tasks to semantic segmentation. These approaches utilize generative models either for generating annotated data or extracting features to facilitate semantic segmentation. This typically involves generating a considerable amount of synthetic data or requiring additional mask annotations. To this end, we uncover the potential of generative text-to-image diffusion models (e.g., Stable Diffusion) as highly efficient open-vocabulary semantic segmenters, and introduce a novel training-free approach named DiffSegmenter. The insight is that to generate realistic objects that are semantically faithful to the input text, both the complete object shapes and the corresponding semantics are implicitly learned by diffusion models. We discover that the object shapes are characterized by the self-attention maps while the semantics are indicated through the cross-attention maps produced by the denoising U-Net, forming the basis of our segmentation results. Additionally, we carefully design effective textual prompts and a category filtering mechanism to further enhance the segmentation results. Extensive experiments on three benchmark datasets show that the proposed DiffSegmenter achieves impressive results for open-vocabulary semantic segmentation.
Published: 2023

25. Dynamic of Organic Matter, Nutrient Cycling, and PH in Soil Aggregate Particle Sizes Under Long-Term Cultivation of Camellia Oleifera

Author: Zipei, Luo, Qi, Sun, Ndzana, Georges Martial, Lijun, Chen, Yuqi, Chen, Sheng, Lu, and Lichao, Wu
Published: 2024
Full Text: View/download PDF

26. Distortion-aware Transformer in 360° Salient Object Detection

Author: Zhao, Yinjie, Zhao, Lichen, Yu, Qian, Zhang, Jing, Sheng, Lu, and Xu, Dong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With the emergence of VR and AR, 360° data attracts increasing attention from the computer vision and multimedia communities. Typically, 360° data is projected into 2D ERP (equirectangular projection) images for feature extraction. However, existing methods cannot handle the distortions that result from the projection, hindering the development of 360-data-based tasks. Therefore, in this paper, we propose a Transformer-based model called DATFormer to address the distortion problem. We tackle this issue from two perspectives. Firstly, we introduce two distortion-adaptive modules. The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally. The second module is a Distortion-Adaptive Attention Block that reduces local distortions on multi-scale features. Secondly, to exploit the unique characteristics of 360° data, we present a learnable relation matrix and use it as part of the positional embedding to further improve performance. Extensive experiments are conducted on three public datasets, and the results show that our model outperforms existing 2D SOD (salient object detection) and 360 SOD methods., Comment: 10 pages, 5 figures
Published: 2023

27. LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

Author: Yin, Zhenfei, Wang, Jiong, Cao, Jianjian, Shi, Zhelun, Liu, Dingning, Li, Mukai, Sheng, Lu, Bai, Lei, Huang, Xiaoshui, Wang, Zhiyong, Shao, Jing, and Ouyang, Wanli
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large language models have emerged as a promising approach towards achieving general-purpose AI agents. The thriving open-source LLM community has greatly accelerated the development of agents that support human-machine dialogue interaction through natural language processing. However, human interaction with the world extends beyond only text as a modality, and other modalities such as vision are also crucial. Recent works on multi-modal large language models, such as GPT-4V and Bard, have demonstrated their effectiveness in handling visual modalities. However, the transparency of these works is limited and insufficient to support academic research. To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark. Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs, with a specific focus on facilitating AI agents capable of bridging the gap between ideas and execution, thereby enabling seamless human-AI interaction. Our main contribution is three-fold: 1) We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of our dataset and benchmark. 2) We outline the detailed methodology of constructing multi-modal instruction tuning datasets and benchmarks for MLLMs, enabling rapid scaling and extension of MLLM research to diverse domains, tasks, and modalities. 3) We provide a primary but potential MLLM training framework optimized for modality extension. We also provide baseline models, comprehensive experimental observations, and analysis to accelerate future research. Our baseline model is trained within 24 A100 GPU hours, and a framework that supports training with V100 and RTX3090 is available thanks to the open-source society., Comment: NeurIPS2023 camera ready; 37 pages, 33 figures. Code available at https://github.com/OpenLAMM/LAMM ; Project page: https://openlamm.github.io/
Published: 2023

28. Effect of isothermal annealing on yield anisotropy of AZ31 Mg alloy bars processed by ambient extrusion

Author: Zong-Yuan Cheng, Tao Fang, Yi-Song Wang, Huan-Huan Chen, Jin-Hua Peng, Liang-Yu Chen, Zhen Zhang, and Sheng Lu
Subjects: Mg alloy, Yield anisotropy, Basal texture, Texture weakening, Twinning, Mining engineering. Metallurgy, TN1-997
Abstract: Magnesium and its alloys show strong mechanical anisotropy due to their hexagonal close-packed structure. In this paper, ambient extrusion and subsequent annealing were applied to AZ31 magnesium alloy bars to investigate the effect of work hardening and texture weakening on the yield anisotropy of magnesium alloy. After ambient extrusion, yield strength in different loading directions is improved due to work hardening. ED-yield strength has the highest increment because work hardening has a higher effect on prismatic slip. As a result, yield anisotropy becomes stronger with increasing extrusion strain. During annealing, static recrystallization occurs first in {101̄1} twinning zones, producing recrystallized grains with random orientation. With the growth of recrystallized grains, the basal fiber texture is weakened gradually. Such a weak basal texture can promote the activation of basal slip when loading along ED, so ED-yield strength shows a high decrement. In this way, the yield anisotropy in AZ31 magnesium alloy bars is weakened through ambient extrusion and subsequent annealing.
Published: 2024
Full Text: View/download PDF

29. Gap-free telomere-to-telomere haplotype assembly of the tomato hind (Cephalopholis sonnerati)

Author: Sheng Lu, Yang Liu, Ming Li, Qijin Ge, Chongwei Wang, Yu Song, Bo Zhou, and Songlin Chen
Subjects: Science
Abstract: The tomato hind (Cephalopholis sonnerati) has emerged as an economically important grouper in recent years. With the increasing maturity of sequencing technologies and assembly methodologies, a higher quality reference genome has become both accessible and necessary. In this study, we present two telomere-to-telomere (T2T) gap-free haplotype assemblies of the tomato hind with lengths of 1039.53 Mb (YSFRI_Csonn_HA_1.0, N50 43.83 Mb) and 1039.91 Mb (YSFRI_Csonn_HB_1.0, N50 44.09 Mb). Reads from next-generation sequencing, ONT ultra-long sequencing, and PacBio HiFi sequencing exhibited mapping rates exceeding 99.8% when aligned to these two assemblies. Evaluation using Merqury indicated high accuracy for both assemblies, with average quality values of 51.80 and 51.83, respectively. Percentages of 97.9% and 97.8% of complete BUSCOs were achieved, and a total of 23,270 and 23,184 protein-coding genes were inferred in each assembly. Moreover, telomere identification, centromere prediction, and repetitive sequence annotation were also successfully performed. These two assemblies provide a robust foundation for the genetic analysis and development of molecular genetic breeding technologies in C. sonnerati.
Published: 2024
Full Text: View/download PDF

30. Siamese DETR

Author: Chen, Zeren, Huang, Gengshi, Li, Wei, Teng, Jianing, Wang, Kun, Shao, Jing, Loy, Chen Change, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent self-supervised methods are mainly designed for representation learning with the base model, e.g., ResNets or ViTs. They cannot be easily transferred to DETR, with task-specific Transformer modules. In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR. We consider learning view-invariant and detection-oriented representations simultaneously through two complementary tasks, i.e., localization and discrimination, in a novel multi-view learning framework. Two self-supervised pretext tasks are designed: (i) Multi-View Region Detection aims at learning to localize regions-of-interest between augmented views of the input, and (ii) Multi-View Semantic Discrimination attempts to improve object-level discrimination for each region. The proposed Siamese DETR achieves state-of-the-art transfer performance on COCO and PASCAL VOC detection using different DETR variants in all setups. Code is available at https://github.com/Zx55/SiameseDETR., Comment: 10 pages, 11 figures. Accepted in CVPR 2023
Published: 2023

31. VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

Author: Wang, Ziqin, Cheng, Bowen, Zhao, Lichen, Xu, Dong, Tang, Yang, and Sheng, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The task of 3D semantic scene graph (3DSSG) prediction in the point cloud is challenging since (1) the 3D point cloud only captures geometric structures with limited semantics compared to 2D images, and (2) long-tailed relation distribution inherently hinders the learning of unbiased prediction. Since 2D images provide rich semantics and scene graphs are in nature coped with languages, in this study, we propose Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme that can significantly empower 3DSSG prediction models with discrimination about long-tailed and ambiguous semantic relations. The key idea is to train a powerful multi-modal oracle model to assist the 3D model. This oracle learns reliable structural representations based on semantics from vision, language, and 3D geometry, and its benefits can be heterogeneously passed to the 3D model during the training stage. By effectively utilizing visual-linguistic semantics in training, our VL-SAT can significantly boost common 3DSSG prediction models, such as SGFN and SGGpoint, only with 3D inputs in the inference stage, especially when dealing with tail relation triplets. Comprehensive evaluations and ablation studies on the 3DSSG dataset have validated the effectiveness of the proposed scheme. Code is available at https://github.com/wz7in/CVPR2023-VLSAT., Comment: CVPR2023 Highlight
Published: 2023

32. Geochronology and geochemistry of the Neoproterozoic–Mesozoic intrusive rocks in the Xinlin area, northeastern China: new constraints on the tectonic evolution of the Erguna block

Author: Sheng Lu, Chenglu Li, Masroor Alam, Zhichao Song, Xiannan Zhu, Anzong Fu, and Wenpeng Yang
Subjects: LA-ICP-MS zircon U–Pb dating, intrusive rock, Xinlin area, muscovite 40Ar/39Ar dating, geochemistry, Science
Abstract: The occurrence of intrusive rocks within the Xinlin area, northeastern China, provides insights into the Neoproterozoic–Mesozoic geodynamic setting of the Erguna block. In this study, we present petrographic, geochemical, and geochronological data on intrusive rocks from the Xinlin area. Zircon U–Pb and muscovite 40Ar/39Ar geochronology reveal that magmatism occurred during the Neoproterozoic (ca. 864.98 Ma), Early Ordovician (ca. 470.0 Ma), Late Carboniferous (ca. 306.9 Ma), Early Permian (ca. 296.9 Ma), and Early Cretaceous (ca. 117.8 Ma) periods. The Neoproterozoic and Early Ordovician intermediate–mafic intrusive rocks have low Rb/Sr contents, high Mg#, and weakly negative Eu anomalies. These results suggest that the magma sources of these rocks varied: intermediate–acidic magmas were derived from the lower crust, and intermediate–mafic magmas originated from the mantle and were subsequently contaminated by crustal material. In contrast, the Late Carboniferous, Early Permian, Late Triassic–Early Jurassic, and Early Cretaceous intermediate–acidic intrusive rocks display high Rb/Sr contents, low Mg#, and strongly negative Eu anomalies, indicating derivation from the partial melting of the lower crust. Our findings, along with previous studies, suggest that Neoproterozoic intrusive rocks were formed during the breakup of the Rodinia supercontinent. The Paleozoic intrusive rocks are associated with the collision and amalgamation of the Erguna and Xing’an blocks, as well as the Songnen and Xing’an blocks. Early Mesozoic intrusive rocks were developed during the subduction of the Mongol-Okhotsk oceanic intracontinental system. Finally, the late Mesozoic intrusive rocks were formed in a non-orogenic extensional setting, potentially linked to the final closure of the Mongol-Okhotsk Ocean or the rollback of the Paleo-Pacific Plate.
Published: 2025
Full Text: View/download PDF

33. Evaluation of the association between bevacizumab concentration and clinical outcomes in patients with breast cancer brain metastasis

Author: Chih-Ning Cheng, Yun-Jung Tsai, Huai-Hsuan Chiu, Tom Wei-Wu Chen, Ching-Hung Lin, Yen-Sheng Lu, and Ching-Hua Kuo
Subjects: Bevacizumab, Therapeutic drug monitoring, Overall survival, Liquid chromatography–mass spectrometry, Science (General), Q1-390, Social sciences (General), H1-99
Abstract: Bevacizumab is widely used in various clinical indications, but investigations into its optimal dosage for treating CNS metastases remain limited. The BEEP regimen, comprising bevacizumab, etoposide, and cisplatin, has recently demonstrated promising clinical outcomes for patients with breast cancer brain metastasis (BCBM) or leptomeningeal metastasis (LM). This study aimed to evaluate the exposure-response relationship of bevacizumab in BCBM patients and to explore the improved CNS penetration of chemotherapy by bevacizumab with LM patients. Twenty-two BCBM patients and six LM patients receiving the BEEP regimen were enrolled. For BCBM patients, blood samples were drawn at trough level of cycles 1 and 6 to investigate the association between bevacizumab concentrations and clinical outcomes. For LM patients, plasma and cerebrospinal fluid (CSF) concentrations of bevacizumab and etoposide were measured to investigate the enhancement of etoposide penetration provided by bevacizumab. Concentration evaluation revealed that bevacizumab plasma concentrations substantially varied between individuals. Additionally, concentrations increased after 6 cycles, indicating bevacizumab accumulation during treatment. Although bevacizumab concentrations did not associate with therapeutic response and progression-free survival, patients with higher bevacizumab concentrations exhibited longer overall survival (adjusted HR 0.78; p = 0.039). Furthermore, a positive correlation was observed between time-weighted average concentration of plasma bevacizumab and CSF penetration of etoposide on day 2 (post-bevacizumab) relative to day 1 (pre-bevacizumab) (r = 0.83; p = 0.042). These findings offer valuable insights into the application of therapeutic drug monitoring of bevacizumab to improve survival outcomes in BCBM patients. Further studies are warranted to determine the optimal bevacizumab concentration.
Published: 2025
Full Text: View/download PDF

34. Observations of three-dimensional ionospheric plasma properties in a space hurricane

Author: Sheng Lu, Zan-Yang Xing, Qing-He Zhang, Yongliang Zhang, Kjellmar Oksavik, L. R. Lyons, Michael Lockwood, Yu-Zhang Ma, Xiang-Yu Wang, N. Balan, Hui-Gen Yang, Yong Wang, Zhong-Xin Deng, Tong Xu, and Shu-Ji Sun
Subjects: space hurricane, polar ionosphere, polar cap aurora, particle precipitation, high-latitude lobe reconnection, Astronomy, QB1-991, Geophysics. Cosmic physics, QC801-809
Abstract: The space hurricane is a newly discovered large-scale three-dimensional magnetic vortex structure that spans the polar ionosphere and magnetosphere. It has been suggested to open a fast energy transport channel for the solar wind to invade Earth’s magnetosphere under northward interplanetary magnetic field (IMF) conditions. It is, therefore, an important phenomenon to understand the solar wind–magnetosphere–ionosphere coupling process under northward IMF conditions. In this study, we report the three-dimensional ionospheric plasma properties of a space hurricane event in the Northern Hemisphere observed by multiple instruments. Based on the convection velocity observations from ground-based radars and polar satellites, we confirm that the major modulation to the polar cap convection called a space hurricane rotates clockwise at the altitude of the ionosphere. Ground-based incoherent scatter radar and polar satellite observations reveal four features associated with the space hurricane: 1) strong plasma flow shears and being embedded in a clockwise lobe convection cell; 2) a major addition to the total energy deposition in the ionosphere–thermosphere system by Joule heating; 3) downward ionospheric electron transport; and 4) multiple ion-temperature enhancements in the sunward velocity region, likely from the spiral arms of the space hurricane. These results present, first, the impact of space hurricane on the low-altitude ionosphere and provide additional insights on the magnetospheric impact on structuring in the polar ionosphere.
Published: 2024
Full Text: View/download PDF

35. Engineering a Mesoporous Silicon Nanoparticle Cage to Enhance Performance of a Phosphotriesterase Enzyme for Degradation of VX Nerve Agent

Author: Yi‐Sheng Lu, Eduardo Reynoso Moreno, Yubin Huang, Ruhan Fan, Ashley T. Tucker, Linnzi K. Wright, Ronald A. Evans, Brooke M. Ahern, Donald E. Owens, Stephen A. Chappell, Dale J. Christensen, John Dresios, and Michael J. Sailor
Subjects: acetylcholinesterase activity assay, dermal protection, enzyme immobilization, enzyme stability, phosphotriesterase variant L7ep‐3a, Science
Abstract: The organophosphate (OP)‐hydrolyzing enzyme phosphotriesterase (PTE, variant L7ep‐3a) immobilized within a partially oxidized mesoporous silicon nanoparticle cage is synthesized and the catalytic performance of the enzyme@nanoparticle construct for hydrolysis of a simulant, dimethyl p‐nitrophenyl phosphate (DMNP), and the live nerve agent VX is benchmarked against the free enzyme. In a neutral aqueous buffer, the optimized construct shows a ≈2‐fold increase in the rate of DMNP turnover relative to the free enzyme. Enzyme@nanoparticles with more hydrophobic surface chemistry in the interior of the pores show lower catalytic activity, suggesting the importance of hydration of the pore interior on performance. The enzyme@nanoparticle construct is readily separated from the neutralized agent; the nanoparticle is found to retain DMNP hydrolysis activity through seven decontamination/recovery cycles. The nanoparticle cage stabilizes the enzyme against thermal denaturing and enzymatic (trypsin) degradation conditions relative to free enzyme. When incorporated into a topical gel formulation, the PTE‐loaded nanoparticles show high activity toward the nerve agent VX in an ex vivo rabbit skin model. In vitro acetylcholinesterase (AChE) assays in human blood show that the enzyme@nanoparticle construct decontaminates VX, preserving the biological function of AChE when exposed to an otherwise incapacitating dose.
Published: 2024
Full Text: View/download PDF

36. Effects and safety of propofol intravenous anesthesia in transvaginal oocyte retrieval on outcomes of in vitro fertilization and embryo transplantation

Author: Xiao-Ming Liu, Fan Zhang, Xiao-Sheng Lu, Hai-Tao Xi, and Jun-Zhao Zhao
Subjects: oocyte retrieval, in vitro fertilization, pregnancy rate, embryo quality, propofol, Diseases of the endocrine glands. Clinical endocrinology, RC648-665
Abstract: Purpose: Propofol, a widely utilized anesthetic, is employed to alleviate pain and anxiety in outpatient oocyte retrieval procedures. However, its potential impact and safety profile in the context of in vitro fertilization and embryo transfer (IVF-ET) remain unclear. Methods: This retrospective study enrolled 1187 patients undergoing IVF-ET, who were divided into two groups depending on whether they received propofol (propofol group, n=140) or not (control group, n=1047) for anesthesia during oocyte retrieval. Results: The baseline characteristics were comparable between the groups. Compared with the control group, more oocytes were retrieved in the propofol group (p=0.012), while both the estradiol (E2) level on the trigger day and the pre-ovulatory follicle count were higher in the propofol group ((p20) to eliminate the influence of inconsistency in the estimation of the pre-ovulatory follicle count between the two groups. Analysis revealed that the use of propofol during oocyte retrieval was particularly advantageous in the subgroup with a pre-ovulatory follicle count of 11–20, yielding a higher oocyte retrieval rate (p
Published: 2024
Full Text: View/download PDF

37. Changes in liver and kidney function, red blood cell count and hemoglobin levels 1 day after ultrasound-guided percutaneous microwave ablation for uterine fibroids

Author: Xiao-Yu Huang, Qin-Sheng Lu, Shao-Ping Wu, Han-Ming Huang, and Yong-Fa Zhang
Subjects: Uterine fibroids, ultrasound-guided percutaneous microwave ablation, acute kidney injury, liver and kidney function, perioperative organ protection, Medical technology, R855-855.5
Abstract: Objective: To investigate the changes in liver and kidney function, red blood cell (RBC) count and hemoglobin (HGB) levels in patients undergoing ultrasound-guided percutaneous microwave ablation (UPMWA) for uterine fibroids on postoperative day 1. Methods: The changes in liver and kidney function, RBC count and HGB levels in 181 patients who underwent selective UPMWA in the Second Affiliated Hospital of Shantou University Medical College, China, between August 2017 and January 2023 were retrospectively analyzed. Results: All patients underwent UPMWA for uterine fibroids; 179 patients had multiple uterine fibroids and 2 patients had single uterine fibroids. The maximum fibroid diameter ranged from 18 to 140 mm, with an average of 68.3 mm. Ultrasound imaging was used to confirm that the blood flow signal within the mass had disappeared in all patients, indicating that the ablation was effective. Within 24 h, compared with before UPMWA, levels of total bilirubin, direct bilirubin, indirect bilirubin and aspartate aminotransferase had significantly increased (p
Published: 2024
Full Text: View/download PDF

38. Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

Author: Li, Yangguang, Huang, Bin, Chen, Zeren, Cui, Yufeng, Liang, Feng, Shen, Mingzhu, Liu, Fenggang, Xie, Enze, Sheng, Lu, Ouyang, Wanli, and Shao, Jing
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, perception tasks based on Bird's-Eye View (BEV) representation have drawn more and more attention, and BEV representation is promising as the foundation for next-generation Autonomous Vehicle (AV) perception. However, most existing BEV solutions either require considerable resources to execute on-vehicle inference or suffer from modest performance. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing faster BEV perception on on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive transformer-based transformation or depth representation. Our Fast-BEV consists of five parts. We propose (1) a lightweight deployment-friendly view transformation which quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder which leverages multi-scale information for better performance, and (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference. We further introduce (4) a strong data augmentation strategy for both image and BEV space to avoid over-fitting, and (5) a multi-frame feature fusion mechanism to leverage the temporal information. Through experiments, on 2080Ti platform, our R50 model can run 52.6 FPS with 47.3% NDS on the nuScenes validation set, exceeding the 41.3 FPS and 47.5% NDS of the BEVDepth-R50 model and 30.2 FPS and 45.7% NDS of the BEVDet4D-R50 model. Our largest model (R101@900x1600) establishes a competitive 53.5% NDS on the nuScenes validation set. We further develop a benchmark with considerable accuracy and efficiency on current popular on-vehicle chips. The code is released at: https://github.com/Sense-GVT/Fast-BEV., Comment: arXiv admin note: text overlap with arXiv:2301.07870
Published: 2023

39. Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline

Author: Zhao, Lichen, Cai, Daigang, Zhang, Jing, Sheng, Lu, Xu, Dong, Zheng, Rui, Zhao, Yinjie, Wang, Lipeng, and Fan, Xibo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, 3D vision-and-language tasks have attracted increasing research interest. Compared to other vision-and-language tasks, the 3D visual question answering (VQA) task is less exploited and is more susceptible to language priors and co-reference ambiguity. Meanwhile, a couple of recently proposed 3D VQA datasets do not support the 3D VQA task well due to their limited scale and annotation methods. In this work, we formally define and address a 3D grounded VQA task by collecting a new 3D VQA dataset, referred to as FE-3DGQA, with diverse and relatively free-form question-answer pairs, as well as dense and completely grounded bounding box annotations. To achieve more explainable answers, we labelled the objects appearing in the complex QA pairs with different semantic types, including answer-grounded objects (both those appearing and those not appearing in the questions), and contextual objects for answer-grounded objects. We also propose a new 3D VQA framework to effectively predict the completely visually grounded and explainable answer. Extensive experiments verify that our newly collected benchmark datasets can be effectively used to evaluate various 3D VQA methods from different aspects and our newly proposed framework also achieves state-of-the-art performance on the new benchmark dataset. Both the newly collected dataset and our codes will be publicly available at http://github.com/zlccccc/3DGQA., Comment: 13 pages, 10 figures
Published: 2022

40. Exploring the Efficacy of Dynamic Approaches in Database Encryption and Translational Security.

Author: Tse-Chuan Hsu and Han-Sheng Lu
Published: 2024
Full Text: View/download PDF

41. A Novel Delay-Aware Packing Algorithm for FPGA Architecture Using RFET.

Author: Sheng Lu, Liuting Shang, Sungyong Jung, Yichen Zhang 0005, and Chenyun Pan
Published: 2024
Full Text: View/download PDF

42. Using Machine Learning to Validate a Novel Taxonomy of Phenomenal Translation States.

Author: Michael Carl, Sheng Lu, and Ali Al-Ramadan
Published: 2024

43. Research on Security Enhancement Methods of Internet of Things Communication-Based on Whitelist and Encryption Key Exchange.

Author: Tse-Chuan Hsu and Han-Sheng Lu
Published: 2024
Full Text: View/download PDF

44. How are Prompts Different in Terms of Sensitivity?

Author: Sheng Lu, Hendrik Schuff, and Iryna Gurevych
Published: 2024
Full Text: View/download PDF

45. Are Emergent Abilities in Large Language Models just In-Context Learning?

Author: Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, and Iryna Gurevych
Published: 2024
Full Text: View/download PDF

46. Sub-10nm Standard Cell Library Design Methodology for On-Grid Pin Accesses.

Author: Rung-Bin Lin and Pei-Sheng Lu
Published: 2024
Full Text: View/download PDF

47. Emerging Reconfigurable Logic Device Based FPGA Design and Optimization.

Author: Sheng Lu, Liuting Shang, Sungyong Jung, and Chenyun Pan
Published: 2024
Full Text: View/download PDF

48. Designing a Secure and Scalable Service Model Using Blockchain and MQTT for IoT Devices.

Author: Tse-Chuan Hsu and Han-Sheng Lu
Published: 2024
Full Text: View/download PDF

49. The correlation between Diabetes and age-related degeneration and the static and dynamic 3D mechanical distribution of different plantar regions

Author: Xiong-gang Yang, Xing-xi Hu, Qi-yang Wang, Zhi Peng, Hao-tian Luo, and Sheng Lu
Subjects: aging, Diabetic foot, mechanical distribution, plantar pressure, plantar shear force, Diseases of the endocrine glands. Clinical endocrinology, RC648-665
Abstract: Purpose: This study aimed to compare the distribution of plantar pressure and anterior-posterior (AP) or medial-lateral (ML) shear forces in healthy younger (HY) people, healthy older (HO) people, and diabetic patients, both in static standing and during gait. Materials and methods: A total of 20 HY adults, 16 HO adults and 15 diabetic patients were included. The static mechanical distribution measurements included: static horizontal, AP slope plane, and left/right slope standing. Data collected during the gait cycle encompassed the plantar pressure-time integral (PTI), peak pressure (PP), AP/ML shear force-time integral (AP-STI/ML-STI), and AP/ML peak shear force (AP-PS/ML-PS). The plantar surface was segmented into regions including hallux (HL), 2nd~5th toes (T2-5), 1st metatarsal head (M1), 2nd~3rd metatarsal heads (M2-3), 4th~5th metatarsal heads (M4-5), lateral foot arch (LA), and heel regions. Results: The HO group exhibited increased static pressure in M2-3 and heel regions and AP shear force in the entire plantar and M1 regions, in comparison to the HY group. The diabetes group showed increased static pressure in entire plantar, M1, M2-3 and heel regions and AP shear force in the entire plantar, T2-5, M1, M2-3 and heel regions. During gait, the HO group exhibited increased PTI in the whole plantar, T2-5, M2-3, and M4-5 regions, while the diabetes group showed increased PTI in the whole plantar, M1 and M2-3 regions. The HO group showed increased PP in the whole plantar, M1 and heel regions, but decreased PP in the M2-3 region. The diabetes group showed increased PP in the whole plantar, T2-5, M2-3, M4-5 and heel regions. The HO group showed increased AP-STI in the T2-5, M1, and M2-3 regions, while the diabetes group showed increased AP-STI in the whole plantar, M2-3 and heel regions. Conclusions: Our findings indicate that both static and dynamic plantar pressures and shear forces are significantly greater in diabetic patients and HO individuals compared to HY adults. The most substantial increases occurred under the M2-3 and heel regions.
Published: 2024
Full Text: View/download PDF

50. A case report of severe pneumonia caused by Aeromonas dhakensis infection complicated with severe atrial septal defect

Author: Jun Sha, Jie Shao, Sheng Lu, Mengmeng Zhang, Cheng Gu, Yimai Deng, Jianfeng Zhang, and Yufeng Feng
Subjects: Aeromonas dhakensis, severe pneumonia, atrial septal defect, mNGS, ECMO, Medicine (General), R5-920
Abstract: Aeromonas dhakensis is an increasingly recognized human pathogen that was first isolated and reported in a sample of childhood diarrhea in Bangladesh. More and more cases of Aeromonas dhakensis infection have been reported in recent years. Here we report a case of severe pneumonia caused by Aeromonas dhakensis with severe atrial septal defect. The patient, a 56-year-old male, was admitted to the hospital with severe hypoxemia and severe septic shock. Metagenomic next-generation sequencing (mNGS) of the patient’s bronchoalveolar lavage fluid (BALF) and peripheral blood indicated Aeromonas dhakensis infection.
Published: 2024
Full Text: View/download PDF
