Author: "Wang, Haofan" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Wang, Haofan"' returned 356 results

1. Multi-scale Multi-instance Visual Sound Localization and Segmentation

Author: Mo, Shentong and Wang, Haofan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Visual sound localization is a typical and challenging problem that predicts the location of objects corresponding to the sound source in a video. Previous methods mainly used the audio-visual association between global audio and one-scale visual features to localize sounding objects in each image. Despite their promising performance, they omitted multi-scale visual features of the corresponding image, and they cannot learn discriminative regions compared to ground truths. To address this issue, we propose a novel multi-scale multi-instance visual sound localization framework, namely M2VSL, that can directly learn multi-scale semantic features associated with sound sources from the input image to localize sounding objects. Specifically, our M2VSL leverages learnable multi-scale visual features to align audio-visual representations at multi-level locations of the corresponding image. We also introduce a novel multi-scale multi-instance transformer to dynamically aggregate multi-scale cross-modal representations for visual sound localization. We conduct extensive experiments on VGGSound-Instruments, VGG-Sound Sources, and AVSBench benchmarks. The results demonstrate that the proposed M2VSL can achieve state-of-the-art performance on sounding object localization and segmentation.
Published: 2024

2. CSGO: Content-Style Composition in Text-to-Image Generation

Author: Xing, Peng, Wang, Haofan, Sun, Yanpeng, Wang, Qixun, Bai, Xu, Ai, Hao, Huang, Renyuan, and Li, Zechao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The diffusion model has shown exceptional capabilities in controlled image generation, which has further fueled interest in image style transfer. Existing works mainly focus on training-free methods (e.g., image inversion) due to the scarcity of specific data. In this study, we present a data construction pipeline for content-style-stylized image triplets that generates and automatically cleanses stylized data triplets. Based on this pipeline, we construct IMAGStyle, the first large-scale style transfer dataset, containing 210k image triplets and available for the community to explore and research. Equipped with IMAGStyle, we propose CSGO, a style transfer model based on end-to-end training, which explicitly decouples content and style features by employing independent feature injection. The unified CSGO implements image-driven style transfer, text-driven stylized synthesis, and text editing-driven stylized synthesis. Extensive experiments demonstrate the effectiveness of our approach in enhancing style control capabilities in image generation. Additional visualizations and the source code can be found on the project page: https://csgo-gen.github.io/.
Published: 2024

3. InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation

Author: Wang, Haofan, Xing, Peng, Huang, Renyuan, Ai, Hao, Wang, Qixun, and Bai, Xu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Style transfer is an inventive process designed to create an image that maintains the essence of the original while embracing the visual style of another. Although diffusion models have demonstrated impressive generative power in personalized subject-driven or style-driven applications, existing state-of-the-art methods still encounter difficulties in achieving a seamless balance between content preservation and style enhancement. For example, amplifying the style's influence can often undermine the structural integrity of the content. To address these challenges, we deconstruct the style transfer task into three core elements: 1) Style, focusing on the image's aesthetic characteristics; 2) Spatial Structure, concerning the geometric arrangement and composition of visual elements; and 3) Semantic Content, which captures the conceptual meaning of the image. Guided by these principles, we introduce InstantStyle-Plus, an approach that prioritizes the integrity of the original content while seamlessly integrating the target style. Specifically, our method accomplishes style injection through an efficient, lightweight process, utilizing the cutting-edge InstantStyle framework. To reinforce content preservation, we initiate the process with an inverted content latent noise and a versatile plug-and-play tile ControlNet for preserving the original image's intrinsic layout. We also incorporate a global semantic adapter to enhance the semantic content's fidelity. To safeguard against the dilution of style information, a style extractor is employed as a discriminator to provide supplementary style guidance. Code will be available at https://github.com/instantX-research/InstantStyle-Plus.
Comment: Technical Report
Published: 2024

4. Unified Video-Language Pre-training with Synchronized Audio

Author: Mo, Shentong, Wang, Haofan, Li, Huaxia, and Tang, Xu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized the temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two modalities. In this work, we propose an enhanced framework for Video-Language pre-training with Synchronized Audio, termed VLSA, that can learn tri-modal representations in a unified self-supervised transformer. Specifically, our VLSA jointly aggregates embeddings of local patches and global tokens for video, text, and audio. Furthermore, we utilize local-patch masked modeling to learn modality-aware features, and leverage global audio matching to capture audio-guided features for video and text. We conduct extensive experiments on retrieval across text, video, and audio. Our simple model, pre-trained on only 0.9M data, achieves improved results over state-of-the-art baselines. In addition, qualitative visualizations vividly showcase the superiority of our VLSA in learning discriminative visual-textual representations.
Published: 2024

5. Multimodal Sense-Informed Prediction of 3D Human Motions

Author: Lou, Zhenyu, Cui, Qiongjie, Wang, Haofan, Tang, Xu, and Zhou, Hong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Predicting future human pose is a fundamental application for machine intelligence, which drives robots to plan their behavior and paths ahead of time to seamlessly accomplish human-robot collaboration in real-world 3D scenarios. Despite encouraging results, existing approaches rarely consider the effects of the external scene on the motion sequence, leading to pronounced artifacts and physical implausibilities in the predictions. To address this limitation, this work introduces a novel multi-modal sense-informed motion prediction approach, which conditions high-fidelity generation on two modalities of information, the external 3D scene and internal human gaze, and is able to recognize their salience for future human activity. Furthermore, the gaze information is regarded as the human intention; by combining it with both motion and scene features, we construct a ternary intention-aware attention to supervise the generation to match where the human wants to reach. Meanwhile, we introduce semantic coherence-aware attention to explicitly distinguish the salient point clouds from the underlying ones, to ensure a reasonable interaction of the generated sequence with the 3D scene. On two real-world benchmarks, the proposed method achieves state-of-the-art performance in both 3D human pose and trajectory prediction.
Published: 2024

6. InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Author: Wang, Haofan, Spinelli, Matteo, Wang, Qixun, Bai, Xu, Qin, Zekui, and Chen, Anthony
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization. However, despite this notable progress, current models continue to grapple with several complex challenges in producing style-consistent image generation. Firstly, the concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure, among others. Secondly, inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details. Lastly, adapter-based approaches frequently require meticulous weight tuning for each reference image to achieve a balance between style intensity and text controllability. In this paper, we commence by examining several compelling yet frequently overlooked observations. We then proceed to introduce InstantStyle, a framework designed to address these issues through the implementation of two key strategies: 1) A straightforward mechanism that decouples style and content from reference images within the feature space, predicated on the assumption that features within the same space can be either added to or subtracted from one another. 2) The injection of reference image features exclusively into style-specific blocks, thereby preventing style leaks and eschewing the need for cumbersome weight tuning, which often characterizes more parameter-heavy designs. Our work demonstrates superior visual stylization outcomes, striking an optimal balance between the intensity of style and the controllability of textual elements. Our code will be available at https://github.com/InstantStyle/InstantStyle.
Comment: Technical Report
Published: 2024

7. InstantID: Zero-shot Identity-Preserving Generation in Seconds

Author: Wang, Qixun, Bai, Xu, Wang, Haofan, Qin, Zekui, Chen, Anthony, Li, Huaxia, Tang, Xu, and Hu, Yao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our code and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.
Comment: Technical Report, project page available at https://instantid.github.io/
Published: 2024

8. Expressive Forecasting of 3D Whole-body Human Motions

Author: Ding, Pengxiang, Cui, Qiongjie, Zhang, Min, Liu, Mengyuan, Wang, Haofan, and Wang, Donglin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human motion forecasting, with the goal of estimating future human behavior over a period of time, is a fundamental task in many real-world applications. However, existing works typically concentrate on predicting the major joints of the human body without considering the delicate movements of the human hands. In practical applications, hand gesture plays an important role in human communication with the real world, and expresses the primary intention of human beings. In this work, we are the first to formulate a whole-body human pose forecasting task, which jointly predicts the future body and hand activities. Correspondingly, we propose a novel Encoding-Alignment-Interaction (EAI) framework that aims to predict both coarse (body joints) and fine-grained (gestures) activities collaboratively, enabling expressive and cross-facilitated forecasting of 3D whole-body human motions. Specifically, our model involves two key constituents: cross-context alignment (XCA) and cross-context interaction (XCI). Considering the heterogeneous information within the whole-body, XCA aims to align the latent features of various human components, while XCI focuses on effectively capturing the context interaction among the human components. We conduct extensive experiments on a newly-introduced large-scale benchmark and achieve state-of-the-art performance. The code is public for research purposes at https://github.com/Dingpx/EAI.
Comment: Accepted by AAAI24
Published: 2023

9. Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

Author: Chen, Anthony, Yang, Huanrui, Gan, Yulu, Gudovskiy, Denis A, Dong, Zhen, Wang, Haofan, Okuno, Tomoyuki, Nakata, Yohei, Keutzer, Kurt, and Zhang, Shanghang
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Uncertainty estimation is crucial for machine learning models to detect out-of-distribution (OOD) inputs. However, conventional discriminative deep learning classifiers produce uncalibrated closed-set predictions for OOD data. More robust classifiers with uncertainty estimation typically require a potentially unavailable OOD dataset for outlier exposure training, or a considerable amount of additional memory and compute to build ensemble models. In this work, we improve on uncertainty estimation without extra OOD data or additional inference costs using an alternative Split-Ensemble method. Specifically, we propose a novel subtask-splitting ensemble training objective, where a common multiclass classification task is split into several complementary subtasks. Then, each subtask's training data can be considered as OOD to the other subtasks. Diverse submodels can therefore be trained on each subtask with OOD-aware objectives. The subtask-splitting objective enables us to share low-level features across submodels to avoid parameter and computational overheads. In particular, we build a tree-like Split-Ensemble architecture by performing iterative splitting and pruning from a shared backbone model, where each branch serves as a submodel corresponding to a subtask. This leads to improved accuracy and uncertainty estimation across submodels under a fixed ensemble computation budget. An empirical study with a ResNet-18 backbone shows that Split-Ensemble, without additional computation cost, improves accuracy over a single model by 0.8%, 1.8%, and 25.5% on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. OOD detection for the same backbone and in-distribution datasets surpasses a single-model baseline by, correspondingly, 2.2%, 8.1%, and 29.6% mean AUROC.
Comment: ICML2024. Project website is available at https://antonioo-c.github.io/projects/split-ensemble
Published: 2023

10. Synthesizing Physically Plausible Human Motions in 3D Scenes

Author: Pan, Liang, Wang, Jingbo, Huang, Buzhen, Zhang, Junyu, Wang, Haofan, Tang, Xu, and Wang, Yangang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: Synthesizing physically plausible human motions in 3D scenes is a challenging problem. Kinematics-based methods cannot avoid inherent artifacts (e.g., penetration and foot skating) due to the lack of physical constraints. Meanwhile, existing physics-based methods cannot generalize to multi-object scenarios since the policy trained with reinforcement learning has limited modeling capacity. In this work, we present a framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes. The key idea is to decompose human-scene interactions into two fundamental processes, Interacting and Navigating, which motivates us to construct two reusable controllers, i.e., InterCon and NavCon. Specifically, InterCon contains two complementary policies that enable characters to enter and leave the interacting state (e.g., sitting on a chair and getting up). To generate interaction with objects at different places, we further design NavCon, a trajectory-following policy, to keep characters' locomotion in the free space of 3D scenes. Benefiting from the divide-and-conquer strategy, we can train the policies in simple environments and generalize to complex multi-object scenes. Experimental results demonstrate that our framework can synthesize physically plausible long-term human motions in complex 3D scenes. Code will be publicly released at https://github.com/liangpan99/InterScene.
Published: 2023

11. 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop

Author: Wang, Qixun, Guo, Xiaofeng, and Wang, Haofan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes. Existing PSG methods use either a one-stage paradigm, which simultaneously generates scene graphs and predicts semantic segmentation masks, or a two-stage paradigm, which first adopts an off-the-shelf panoptic segmentor and then predicts pairwise relationships between the predicted objects. Although the one-stage approach has a simplified training paradigm, its segmentation results are usually unsatisfactory, while the two-stage approach lacks global context and leads to low performance on relation prediction. To bridge this gap, in this paper, we propose GRNet, a Global Relation Network in the two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings. To handle relation ambiguity and predicate classification bias caused by the long-tailed distribution, we formulate relation prediction in the second stage as a multi-class classification task with soft labels. We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard. We also show the effectiveness of our soft label strategy for long-tailed classes in ablation studies. Our code has been released at https://github.com/wangqixun/mfpsg.
Comment: Tech Report
Published: 2023

12. One-shot Implicit Animatable Avatars with Model-based Priors

Author: Huang, Yangyi, Yi, Hongwei, Liu, Weiyang, Wang, Haofan, Wu, Boxi, Wang, Wenxiao, Lin, Binbin, Zhang, Debing, and Cai, Deng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics
Abstract: Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can effortlessly estimate the body geometry and imagine full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT utilizes the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pretrained models. Both priors are used to jointly guide the optimization for creating plausible content in the invisible areas. Taking advantage of the CLIP models, ELICIT can use text descriptions to generate text-conditioned unseen regions. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms strong baseline methods of avatar creation when only a single image is available. The code is public for research purposes at https://huangyangyi.github.io/ELICIT/.
Comment: To appear at ICCV 2023. Project website: https://huangyangyi.github.io/ELICIT/
Published: 2022

13. Assessing the utility of MRI-based vertebral bone quality (VBQ) for predicting lumbar pedicle screw loosening

Author: Gao, Yu, Ye, Wu, Ge, Xuhui, Wang, Haofan, Xiong, Junjun, Zhu, Yufeng, Wang, Zhuanghui, Wang, Jiaxing, Tang, Pengyu, Liu, Wei, and Cai, Weihua
Published: 2024

14. LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval

Author: Bai, Jinbin, Liu, Chunhui, Ni, Feiyue, Wang, Haofan, Hu, Mengying, Guo, Xiaofeng, and Cheng, Lele
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Video-text retrieval is a class of cross-modal representation learning problems, where the goal is to select, from a pool of candidate videos, the video that corresponds to a given text query. The contrastive paradigm of vision-language pretraining has shown promising success with large-scale datasets and unified transformer architectures, and has demonstrated the power of a joint latent space. Despite this, the intrinsic divergence between the visual domain and the textual domain is still far from being eliminated, and projecting different modalities into a joint latent space might distort the information within each single modality. To overcome this issue, we present a novel mechanism for learning the translation relationship from a source modality space $\mathcal{S}$ to a target modality space $\mathcal{T}$ without the need for a joint latent space, which bridges the gap between the visual and textual domains. Furthermore, to keep cycle consistency between translations, we adopt a cycle loss involving both forward translations from $\mathcal{S}$ to the predicted target space $\mathcal{T'}$, and backward translations from $\mathcal{T'}$ back to $\mathcal{S}$. Extensive experiments conducted on the MSR-VTT, MSVD, and DiDeMo datasets demonstrate the superiority and effectiveness of our LaT approach compared with vanilla state-of-the-art methods.
Published: 2022

15. Characterization of winter PM2.5 source contributions and impacts of meteorological conditions and anthropogenic emission changes in the Sichuan Basin, 2002–2020

Author: Xian, Yaohan, Zhang, Yang, Liu, Zhihong, Wang, Haofan, and Xiong, Tianxin
Published: 2024

16. Pore pressure prediction based on rock physics theory and its application in seismic inversion

Author: Wang, Haofan, Ma, Jinfeng, and Li, Lin
Published: 2024

17. MEIAT-CMAQ: A modular emission inventory allocation tool for Community Multiscale Air Quality Model

Author: Wang, Haofan, Qiu, Jiaxin, Liu, Yiming, Fan, Qi, Lu, Xiao, Zhang, Yang, Wu, Kai, Shen, Ao, Xu, Yifei, Jin, Yinbao, Zhu, Yuqi, Sun, Jiayin, and Wang, Haolin
Published: 2024

18. Experimental study of the influence of synergistic effects on the co-firing characteristics of biomass and coal

Author: Pu, Yang, Wang, Haofan, Wang, Xianhua, Lim, Mooktzeng, Yao, Bin, Yang, Haiping, and Lou, Chun
Published: 2024

19. Idarubicin versus epirubicin in drug-eluting beads-transarterial chemoembolization for treating hepatocellular carcinoma: A real-world retrospective study

Author: Zhao, Chenghao, Yan, Huzheng, Xiang, Zhanwang, Wang, Haofan, Li, Mingan, and Huang, Mingsheng
Published: 2023

20. Silica gel supported Cu nanoparticles for selective reductive etherification of furfural into isopropyl furfuryl ether

Author: Wang, Rong, Zhang, Min, Wang, Qi, Zhang, Wei, Wang, Haofan, Zheng, Mengfei, Qu, Zhuodong, Zhou, Zhiyang, Li, Peng, and Yang, Jing-He
Published: 2024

21. TransAug: Translate as Augmentation for Sentence Embeddings

Author: Wang, Jue, Wang, Haofan, Wu, Xing, Gao, Chaochen, and Zhang, Debing
Subjects: Computer Science - Computation and Language
Abstract: While contrastive learning greatly advances the representation of sentence embeddings, it is still limited by the size of the existing sentence datasets. In this paper, we present TransAug (Translate as Augmentation), which provides the first exploration of utilizing translated sentence pairs as data augmentation for text, and introduce a two-stage paradigm to advance the state-of-the-art sentence embeddings. Instead of adopting an encoder trained in other language settings, we first distill a Chinese encoder from a SimCSE encoder (pretrained in English), so that their embeddings are close in semantic space, which can be regarded as implicit data augmentation. Then, we update only the English encoder via cross-lingual contrastive learning and freeze the distilled Chinese encoder. Our approach achieves a new state of the art on standard semantic textual similarity (STS), outperforming both SimCSE and Sentence-T5, and the best performance in the corresponding tracks on transfer tasks evaluated by SentEval.
Comment: The results in this paper were obtained under a bug: we trained our model under an evaluation setting (dropout and batch normalization are 0), but the dropout stated in our paper is 0.1. So there is a big mistake in our paper, and it is not appropriate for publication.
Published: 2021

22. EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling

Author: Wang, Jue, Wang, Haofan, Deng, Jincan, Wu, Weijia, and Zhang, Debing
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: While large-scale pre-training has achieved great success in bridging the gap between vision and language, it still faces several challenges. First, pre-training is expensive. Second, there is no efficient way to handle the data noise that degrades model performance. Third, previous methods only leverage limited image-text paired data while ignoring richer single-modal data, which may result in poor generalization to single-modal downstream tasks. In this work, we propose EfficientCLIP, a method that uses Ensemble Confident Learning to obtain a less noisy data subset. Extra rich non-paired single-modal text data is used to boost the generalization of the text branch. We achieve state-of-the-art performance on Chinese cross-modal retrieval tasks with only 1/10 of the training resources of CLIP and WenLan, while showing excellent generalization to single-modal tasks, including text retrieval and text classification.
Published: 2021

23. Source apportionment and formation of warm season ozone pollution in Chengdu based on CMAQ-ISAM

Author: Xian, Yaohan, Zhang, Yang, Liu, Zhihong, Wang, Haofan, Wang, Junjie, and Tang, Chao
Published: 2024

24. Application of Short T1 Inversion Recovery Sequence in Increased Signal Intensity Following Cervical Spondylotic Myelopathy

Author: Wang, Haofan, Ye, Wu, Xiong, Junjun, Gao, Yu, Ge, Xuhui, Wang, Jiaxing, Zhu, Yufeng, Tang, Pengyu, Zhou, Yitong, Wang, Xiaokun, Gu, Yao, Liu, Wei, Luo, Yongjun, and Cai, Weihua
Published: 2024

25. Deep simulated annealing for the discovery of novel dental anesthetics with local anesthesia and anti-inflammatory properties

Author: Hao, Yihang, Wang, Haofan, Liu, Xianggen, Gai, Wenrui, Hu, Shilong, Liu, Wencheng, Miao, Zhuang, Gan, Yu, Yu, Xianghua, Shi, Rongjia, Tan, Yongzhen, Kang, Ting, Hai, Ao, Zhao, Yi, Fu, Yihang, Tang, Yaling, Ye, Ling, Liu, Jin, Liang, Xinhua, and Ke, Bowen
Published: 2024

26. Analysis of comprehensive magnetic shielding and optimization design of high-conductive layers in magnetic shielding devices for atomic magnetometer

Author: Sun, Jinji, Ren, Jianyi, Xu, Xueping, Zhou, Weiyong, Qian, Jiang, Wang, Hanmou, and Wang, Haofan
Published: 2024

27. Safety, efficacy, and survival of drug-eluting beads-transarterial chemoembolization vs. conventional-transarterial chemoembolization in advanced HCC patients with main portal vein tumor thrombus

Author: Chen, Junwei, Lai, Lisha, Zhou, Churen, Luo, Junyang, Wang, Haofan, Li, Mingan, and Huang, Mingsheng
Published: 2023

28. Exosomal USP13 derived from microvascular endothelial cells regulates immune microenvironment and improves functional recovery after spinal cord injury by stabilizing IκBα

Author: Ge, Xuhui, Zhou, Zheng, Yang, Siting, Ye, Wu, Wang, Zhuanghui, Wang, Jiaxing, Xiao, Chenyu, Cui, Min, Zhou, Jiawen, Zhu, Yufeng, Wang, Rixiao, Gao, Yu, Wang, Haofan, Tang, Pengyu, Zhou, Xuhui, Wang, Ce, and Cai, Weihua
Published: 2023

29. When Differential Privacy Meets Interpretability: A Case Study

Author: Naidu, Rakshit, Priyanshu, Aman, Kumar, Aadith, Kotti, Sasikanth, Wang, Haofan, and Mireshghallah, Fatemehsadat
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Cryptography and Security
Abstract: Given the increase in the use of personal data for training Deep Neural Networks (DNNs) in tasks such as medical imaging and diagnosis, differentially private training of DNNs is surging in importance, and there is a large body of work focusing on providing a better privacy-utility trade-off. However, little attention is given to the interpretability of these models and how the application of DP affects the quality of interpretations. We propose an extensive study into the effects of DP training on DNNs, especially on medical imaging applications, on the APTOS dataset.
Comment: 4 pages, 7 figures; Extended abstract presented at RCV-CVPR'21
Published: 2021

30. Transjugular intrahepatic portosystemic shunt for portal hypertension with chronic portal vein occlusion

Author: Luo, Junyang, Li, Mingan, Wu, Jialin, Wang, Haofan, Pan, Tao, Wu, Chun, Chen, Junwei, Huang, Mingsheng, and Jiang, Zaibo
Published: 2024

31. Hypoxia induced cell dormancy of salivary adenoid cystic carcinoma through miR-922/DEC2 axis

Author: Dai, Li, Xian, Hongchun, Wang, Haofan, Li, Mao, Zhang, Mei, Liang, Xin-hua, and Tang, Ya-ling
Published: 2024
Full Text: View/download PDF

32. Automatic Speech Verification Spoofing Detection

Author: Mo, Shentong, Wang, Haofan, Ren, Pinxu, and Chi, Ta-Chung
Subjects: Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automatic speech verification (ASV) is the technology to determine the identity of a person based on their voice. While ASV is convenient for identity verification, it is the safeguard of valuable digital assets, so we should aim for the highest system security standard. Bearing this in mind, we follow the setup of the ASVspoof 2019 competition to develop potential countermeasures that are robust and efficient. Two metrics, the equal error rate (EER) and the tandem detection cost function (t-DCF), are used for system evaluation.
Published: 2020
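The EER metric named in this abstract is the error rate at the decision threshold where the false acceptance rate on spoofed trials equals the false rejection rate on bona fide trials. A minimal sketch, assuming higher scores mean "more likely bona fide" and approximating the EER by trying each observed score as a threshold (the official ASVspoof evaluation scripts may compute it differently, e.g. with interpolation); the function name and toy scores are hypothetical:

```python
from typing import List, Tuple


def equal_error_rate(bonafide: List[float], spoof: List[float]) -> Tuple[float, float]:
    """Approximate the EER by trying every observed score as a threshold.

    Assumes higher scores mean "more likely bona fide". Returns
    (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold that brings them closest together.
    """
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in sorted(set(bonafide + spoof)):
        far = sum(s >= t for s in spoof) / len(spoof)       # spoofed trials accepted
        frr = sum(b < t for b in bonafide) / len(bonafide)  # bona fide trials rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


bonafide_scores = [0.9, 0.8, 0.7, 0.4]
spoof_scores = [0.1, 0.2, 0.3, 0.75]
print(equal_error_rate(bonafide_scores, spoof_scores))  # prints (0.25, 0.7)
```

With these toy scores, one spoof (0.75) outranks one bona fide trial (0.4), so the best achievable balance is one error in four on each side.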

33. Assessment of tropospheric ozone simulations in a regional chemical transport model using GEOS-Chem outputs as chemical boundary conditions

Author: Zhu, Yuqi, Liu, Yiming, Li, Siting, Wang, Haolin, Lu, Xiao, Wang, Haichao, Shen, Chong, Chen, Xiaoyang, Chan, Pakwai, Shen, Ao, Wang, Haofan, Jin, Yinbao, Xu, Yifei, Fan, Shaojia, and Fan, Qi
Published: 2024
Full Text: View/download PDF

34. A simultaneous energy self-sufficient desalination and energy output process based on a novel membrane stack design

Author: Bao, Zhiqi, Zhang, Xu, Wang, Haofan, Yuan, Yuting, Li, Zhiwei, Zhu, Wending, Liu, Li, Jin, Guanping, and Liu, Yahua
Published: 2025
Full Text: View/download PDF

35. Rising frequency of ozone-favorable synoptic weather patterns contributes to 2015–2022 ozone increase in Guangzhou

Author: Liu, Nanxi, He, Guowen, Wang, Haolin, He, Cheng, Wang, Haofan, Liu, Chenxi, Wang, Yiming, Wang, Haichao, Li, Lei, Lu, Xiao, and Fan, Shaojia
Published: 2025
Full Text: View/download PDF

36. SS-CAM: Smoothed Score-CAM for Sharper Visual Feature Localization

Author: Wang, Haofan, Naidu, Rakshit, Michael, Joy, and Kundu, Soumya Snigdha
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Interpretation of the underlying mechanisms of Deep Convolutional Neural Networks has become an important aspect of research in the field of deep learning due to their applications in high-risk environments. To explain these black-box architectures, many methods have been applied so that their internal decisions can be analyzed and understood. In this paper, building on top of Score-CAM, we introduce an enhanced visual explanation in terms of visual sharpness, called SS-CAM, which produces centralized localization of object features within an image through a smoothing operation. We evaluate our method on the ILSVRC 2012 validation dataset, where it outperforms Score-CAM on both faithfulness and localization tasks. Comment: 7 pages, 4 figures and 4 tables
Published: 2020

37. Smoothed Geometry for Robust Attribution

Author: Wang, Zifan, Wang, Haofan, Ramkumar, Shakul, Fredrikson, Matt, Mardziel, Piotr, and Datta, Anupam
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs. This lack of robustness is especially problematic in high-stakes applications where adversarially-manipulated explanations could impair safety and trustworthiness. Building on a geometric understanding of these attacks presented in recent work, we identify Lipschitz continuity conditions on models' gradient that lead to robust gradient-based attributions, and observe that smoothness may also be related to the ability of an attack to transfer across multiple attribution methods. To mitigate these attacks in practice, we propose an inexpensive regularization method that promotes these conditions in DNNs, as well as a stochastic smoothing technique that does not require re-training. Our experiments on a range of image models demonstrate that both of these mitigations consistently improve attribution robustness, and confirm the role that smooth geometry plays in these attacks on real, large-scale models.
Published: 2020

38. XDeep: An Interpretation Tool for Deep Neural Networks

Author: Yang, Fan, Zhang, Zijian, Wang, Haofan, Li, Yuening, and Hu, Xia
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: XDeep is an open-source Python package developed to interpret deep models for both practitioners and researchers. Overall, XDeep takes a trained deep neural network (DNN) as input and generates relevant interpretations as output in a post-hoc manner. From the functionality perspective, XDeep integrates a wide range of state-of-the-art interpretation algorithms, covering different types of methodologies, and is capable of providing both local and global explanations for a DNN when interpreting model behaviours. With the well-documented API designed in XDeep, end-users can easily obtain interpretations for their deep models with a few lines of code and compare the results among different algorithms. XDeep is generally compatible with Python 3 and can be installed through the Python Package Index (PyPI). The source code is available at: https://github.com/datamllab/xdeep.
Published: 2019

39. Modeling regional nitrogen cycle in the atmosphere: Present situation and its response to the future emissions control strategy

Author: Shen, Ao, Liu, Yiming, Lu, Xiao, Xu, Yifei, Jin, Yinbao, Wang, Haofan, Zhang, Juan, Wang, Xuemei, Chang, Ming, and Fan, Qi
Published: 2023
Full Text: View/download PDF

40. Do Cervical Parameters Increase the Risk of Thoracic Spinal Stenosis in Patients with Cervical Spinal Stenosis?

Author: Wang, Zhuanghui, Wang, Rixiao, Wang, Haofan, Gao, Yu, Ye, Wu, Zhu, Yufeng, Wang, Jiaxing, Tang, Pengyu, and Cai, Weihua
Published: 2023
Full Text: View/download PDF

41. Enhancing photocatalytic CO2 reduction reaction on amorphous Ni@NiO aerogel via oxygen incorporated tuning

Author: Zhong, Zuqi, Wang, Haofan, Liang, Shujie, Zhong, Xiaohui, and Deng, Hong
Published: 2023
Full Text: View/download PDF

42. Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks

Author: Wang, Haofan, Wang, Zifan, Du, Mengnan, Yang, Fan, Zhang, Zijian, Ding, Sirui, Mardziel, Piotr, and Hu, Xia
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why the network makes specific decisions. In this paper, we develop a novel post-hoc visual explanation method called Score-CAM based on class activation mapping. Unlike previous class activation mapping based approaches, Score-CAM gets rid of the dependence on gradients by obtaining the weight of each activation map through its forward passing score on the target class; the final result is obtained by a linear combination of weights and activation maps. We demonstrate that Score-CAM achieves better visual performance and fairness for interpreting the decision-making process. Our approach outperforms previous methods on both recognition and localization tasks, and it also passes the sanity check. We also indicate its application as a debugging tool. Official code has been released. Comment: Accepted to CVPR 2020: Workshop on Fair, Data Efficient and Trusted Computer Vision
Published: 2019
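The gradient-free weighting described in the Score-CAM abstract can be illustrated with a minimal pure-Python sketch. It assumes the activation maps have already been upsampled to the input size and uses a hypothetical toy model that returns class scores: each map is min-max normalized, used to mask the input, and weighted by the target-class score of that masked input, and the saliency map is the ReLU of the weighted sum of maps. This is a simplified illustration, not the released official code, and it omits any further score normalization the paper may apply.

```python
from typing import Callable, List

Map = List[List[float]]


def min_max_normalize(m: Map) -> Map:
    """Rescale a 2-D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def score_cam(image: Map, activations: List[Map],
              model: Callable[[Map], List[float]], target: int) -> Map:
    """Weight each activation map by the target-class score of the
    input masked with that map, then sum and apply ReLU.

    Assumes every activation map already has the same height and width
    as `image` (i.e. upsampling was done beforehand).
    """
    h, w = len(image), len(image[0])
    saliency = [[0.0] * w for _ in range(h)]
    for act in activations:
        mask = min_max_normalize(act)
        # Element-wise product of the input and the normalized map.
        masked = [[image[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        # Forward pass only: no gradients are needed for the weight.
        weight = model(masked)[target]
        for i in range(h):
            for j in range(w):
                saliency[i][j] += weight * act[i][j]
    # ReLU keeps only regions that contribute positively to the target class.
    return [[max(0.0, v) for v in row] for row in saliency]


def toy_model(x: Map) -> List[float]:
    """Hypothetical classifier: class 0 reads the top-left pixel,
    class 1 reads the bottom-right pixel."""
    return [x[0][0], x[1][1]]


image = [[1.0, 1.0], [1.0, 1.0]]
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
print(score_cam(image, activations, toy_model, target=0))  # prints [[1.0, 0.0], [0.0, 0.0]]
```

Only the map that keeps the pixel the toy class actually uses receives a non-zero weight, so the saliency concentrates there.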

43. Contextual Local Explanation for Black Box Classifiers

Author: Zhang, Zijian, Yang, Fan, Wang, Haofan, and Hu, Xia
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: We introduce a new model-agnostic explanation technique, called CLE, which explains the prediction of any classifier. CLE gives a faithful and interpretable explanation of the prediction by approximating the model locally using an interpretable model. We demonstrate the flexibility of CLE by explaining different models for text, tabular, and image classification, and its fidelity through simulated user experiments.
Published: 2019

44. Modeling the electrooxidation of CO on the catalyst with heterogeneous sites

Author: Yuan, Jiayu, Yang, Guangxing, Wang, Haofan, Cao, Yonghai, Wang, Hongjuan, Peng, Feng, and Yu, Hao
Published: 2023
Full Text: View/download PDF

45. Impacts of land use and land cover changes on local meteorology and PM2.5 concentrations in Changchun, Northeast China

Author: Qiu, Jiaxin, Fang, Chunsheng, Tian, Naixu, Wang, Haofan, and Wang, Ju
Published: 2023
Full Text: View/download PDF

46. An automatic spectral baseline estimation method and its application in industrial alkali-pulverized coal flames

Author: Pu, Yang, Wang, Haofan, Lou, Chun, and Yao, Bin
Published: 2023
Full Text: View/download PDF

47. Hybrid coarse-fine classification for head pose estimation

Author: Wang, Haofan, Chen, Zhenghua, and Zhou, Yi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Head pose estimation, which computes the intrinsic Euler angles (yaw, pitch, roll) of a human head, is crucial for gaze estimation, face alignment, and 3D reconstruction. Traditional approaches rely heavily on the accuracy of facial landmarks, which limits their performance, especially when the face is poorly visible. In this paper, to perform the estimation without facial landmarks, we combine the coarse and fine regression outputs of a deep network. Using more quantization units for the angles, a fine classifier is trained with the help of auxiliary coarse units. Integrating regression is adopted to get the final prediction. The proposed approach is evaluated on three challenging benchmarks. It achieves state-of-the-art results on AFLW2000 and BIWI and performs favorably on AFLW. The code has been released on GitHub. Comment: 5 pages
Published: 2019
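The final step of turning binned angle classifications into a continuous angle can be sketched as follows, assuming "integrating regression" means taking the expected angle under the fine classifier's softmax over equal-width bins. The bin layout (66 bins of 3 degrees over [-99, 99)) and all names are hypothetical choices for illustration, and the auxiliary coarse units, which the abstract uses only to help train the fine classifier, are not modelled here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(logits: List[float], lo: float, hi: float) -> float:
    """Turn per-bin classifier logits into a continuous angle (degrees).

    The range [lo, hi) is split into len(logits) equal bins; the
    prediction is the probability-weighted mean of the bin centers.
    """
    n = len(logits)
    width = (hi - lo) / n
    centers = [lo + width * (i + 0.5) for i in range(n)]
    return sum(p * c for p, c in zip(softmax(logits), centers))


# Hypothetical fine classifier output: 66 bins of 3 degrees over [-99, 99).
yaw_logits = [0.0] * 66
yaw_logits[40] = 5.0  # the bin centered at 22.5 degrees dominates
print(expected_angle(yaw_logits, -99.0, 99.0))
```

Because every bin keeps some probability mass, the estimate is pulled from 22.5 toward the middle of the range; a sharper classifier moves it closer to the dominant bin center.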

48. Influence of the synergistic effects and potassium on the combustion behaviors of biomass under high heating rate

Author: Wang, Shusen, Zou, Chun, Lou, Chun, Yang, Haiping, Jiang, Tong, Wang, Cong, and Wang, Haofan
Published: 2023
Full Text: View/download PDF

49. State-of-health estimation for lithium-ion batteries based on GWO–VMD-transformer neural network

Author: Wang, Haofan, Sun, Jing, and Zhai, Qianchun
Published: 2024
Full Text: View/download PDF

50. Prediction of effective percutaneous transhepatic biliary drainage in patients with hepatocellular carcinoma: A multi-central retrospective study

Author: Wang, Haofan, Mao, Yitao, Zhang, Chunning, Hu, Xiaojun, Chen, Bin, Mu, Luwen, Wang, Shuyi, Lin, Yifen, Xiang, Zhanwang, and Huang, Mingsheng
Published: 2022
Full Text: View/download PDF
