4,221 results for "SHEN Fei"
Search Results
2. LEO satellite Internet resource allocation strategy based on terminal traffic prediction
- Author
SHEN Fei, LYU Chengcheng, ZHANG Jiaxuan, and RUAN Xiaoting
- Subjects
LEO satellite Internet, data flow prediction, resource allocation, data transmission, task offloading, Telecommunication, TK5101-6720
- Abstract
A resource allocation strategy for the low Earth orbit (LEO) satellite Internet based on terminal traffic prediction was proposed to address the problems of blind coverage spots in ground networks and the low resource utilization of satellite networks. As part of the strategy, an improved LSTM-ARIMA algorithm was proposed, using real datasets, to accurately predict the data traffic generated in a ground area over a certain future period. Two communication models, differentiated data transmission and task offloading, were constructed through Stackelberg games, taking data processing latency and energy consumption into account. By solving for the Nash equilibrium, the optimal ratio for users to transmit data or offload tasks through the LEO satellite Internet, as well as the optimal pricing for satellites to provide network services, were obtained. Extensive simulation results verify that the proposed strategy can increase revenue by approximately 40% in data transmission services and 50% in task offloading services.
- Published
- 2024
- Full Text
- View/download PDF
3. A Modeling Study on the Teaching System of Chinese Language in Colleges and Universities and the Cultivation Path of Bi-Creative Talents
- Author
Shen Fei
- Subjects
chinese language, dual-creative talent training, competency model, qfd model, reliability analysis, 97b20, Mathematics, QA1-939
- Abstract
The cultivation of Chinese language dual-creative talents in colleges and universities is an inevitable trend given the increasing importance of national soft power. This paper studies the Chinese language teaching system and the cultivation path of dual-creative talents in colleges and universities using a modeling approach. The iceberg competency model is used to determine the dimensions and measurement items of dual-creative talents, which are verified through reliability analysis. At the same time, a QFD model for quality assurance of dual-creative talent cultivation in universities was established, and the importance of demand factors was calculated by analyzing the results of each rating. Among the four factors, the degree of conformity, from highest to lowest, was: creative attitude, basic creative ability, creative expansion ability, and professional knowledge and skills, with rating values of 4.19, 4.12, 3.67, and 3.49, respectively. The empirical analysis of QFD showed that the highest demand-level weight, that of the development achievement of dual-creative talent cultivation, reached 0.4066, followed by the basic conditions of teaching. This study can provide some reference and help for the cultivation of Chinese language dual-creative talents in colleges and universities, and help improve the quality of dual-creative talent cultivation.
- Published
- 2024
- Full Text
- View/download PDF
4. Influence of Microbiota-modulating Agents on Gut Flora in Community Patients with Diabetic Nephropathy
- Author
SHEN Fei, JIANG Weiping, MEI Xiaobin, HAN Yiping, ZHAO Jiayi, FAN Jian, GU Juan, SHEN Yanhong, XU Hongmei, ZHANG Dan, MEN Ying, DING Haiguang, CHEN Caiping, HAN Junhua
- Subjects
diabetic nephropathies, intestinal flora, culturelle, intervention study, dysbacteriosis, chronic inflammation, intestinal mucosal barrier, community health centers, Medicine
- Abstract
Background Imbalanced gut flora caused by changes in gut microecological structure and diversity plays an important role in the interaction between diabetes and chronic kidney disease. Rational application of probiotics, prebiotics and other microbiota-modulating agents contributes to improving the gut microbial environment and chronic inflammation, as well as to delaying the deterioration of renal function in patients with diabetic nephropathy (DN). Objective To understand the effect of probiotics, a microbiota-modulating agent, administered based on gut flora status in patients with DN. Methods Participants were selected from Shanghai Yinhang Community Health Center by stratified random sampling in 2019; the 115 patients with DN were randomly divided into a control group (57 with usual treatment) and a treatment group (58 with microbiota-modulating agents). Laboratory test indices and intestinal bacterial culture results were compared between the two groups after eight weeks of treatment to assess the effect of microbiota-modulating agents on improving gut flora in DN. Results Among the 115 patients with DN, there were 28 males and 87 females, the mean age was (62.9±10.0) years, and the duration of diabetic nephropathy was (14.3±7.1) years. There were no significant differences in the proportion of males, mean age, body mass index, proportion of early DN, and duration of DN between DN patients with usual treatment and those with microbiota-modulating agents treatment (P>0.05). Compared with DN patients with usual treatment, DN patients with microbiota-modulating agents treatment had decreased levels of glucose, triglyceride, blood urea nitrogen, serum creatinine, albumin to creatinine ratio, Cystatin C, C-reactive protein, interleukin-1β, and tumor necrosis factor-α, and increased levels of high-density lipoprotein and estimated glomerular filtration rate after treatment (P
- Published
- 2023
- Full Text
- View/download PDF
5. BIM Engineering Management Oriented to Curve Equation Model
- Author
Shen Fei, Ma Qiang, and Salama Mohamed
- Subjects
curvilinear equation model, bim engineering management, internal right-angle step, rolling straight curve, differential equation solution method, 34a12, Mathematics, QA1-939
- Abstract
This article uses the curve equation model to describe the initial value problem of differential equations in BIM project management. A new set of rolling curve-solving models is established for step-aligning in BIM project management. Based on the premise that the differential equation can be solved numerically, we appropriately simplify or set the required relational functions in the equation. Finally, we use mathematical software to numerically solve the differential equation and obtain the discrete function of the rolling curve. The research shows that the step flatness and the stepped-groove width formed by the rolling curve of the proposed solution are more accurate than those of the original solution.
- Published
- 2023
- Full Text
- View/download PDF
6. Conservation Study of Hakka Architecture Nanyuan Shidi Wailong House
- Author
Chiu Chen-Yuan, Cai Fupeng, Dong Zuorong, Shen Fei, Lan Keran, Zeng Xinyue, and Deng Shiyun
- Subjects
Environmental sciences, GE1-350
- Abstract
In traditional Chinese culture, the Hakka spirit embodied in traditional Hakka buildings occupies an extremely important position. Inappropriate protection methods damage the typical buildings that the Hakka have inherited for centuries, which harms Hakka cultural inheritance. Traditional restoration methods struggle to restore the original buildings accurately and completely because of low efficiency and aging equipment, and so fail to achieve the expected restoration results. Based on point cloud data obtained by 3D scanning technology and CAD processing, comprehensive 3D building model information is established to form a building structure database, which supports the creation of a traditional Hakka building database.
- Published
- 2024
- Full Text
- View/download PDF
7. Automated Monitoring and Emergency Response System for Sensitive Areas Along High-Speed Railway Lines
- Author
Chiu Chen-Yuan, Lin Yi-Chia, Shen Fei, Liu Yujie, Chen Xilin, and Lin Yuyan
- Subjects
Environmental sciences, GE1-350
- Abstract
The safety of high-speed railroads is an important indicator of travelers’ trust, and it is also an important factor in determining whether the operation volume can reach the expected goal. Therefore, high-speed railroad companies in Europe, the United States, and Japan have invested heavily in establishing safety monitoring systems in order to avoid disasters and to strengthen the mechanisms of crisis response and emergency treatment. A variety of accidents may occur during high-speed railroad operation, and different disaster warning, emergency response, and rescue situations require different equipment, contingency measures, and well-trained personnel. Therefore, the safety standards for high-speed rail systems in advanced countries around the world are much higher than those for traditional train systems. The theories and technologies of disaster prevention and relief have matured both at home and abroad. This study combines the theories and technologies of disaster prevention with GIS, such as monitoring instruments, satellite positioning, remote sensing detection, radio transmission, the Internet, database management, and information management, to provide effective support for decision-making and analysis by commanders, to rapidly assist in crisis management, and to strengthen the mechanism of emergency response.
- Published
- 2024
- Full Text
- View/download PDF
8. Architectural design of an open cultural space for a cultural heritage art museum
- Author
Chen Yuanchiu, Zhang Shan, Dong Zuorong, Shen Fei, Chen Xilin, and Lin Yuyan
- Subjects
Social Sciences
- Abstract
Starting from the interaction between art museums and the public and the way they relate to each other, art museums can move closer to the life of the public and provide the public with spiritual enjoyment, while also helping to promote the dissemination and inheritance of local and even national culture. The audience, as the receiver of the art museum's exhibition information, is also a medium for the museum's cultural dissemination. This interactive relationship is exactly what we want to explore in the process of art museum design.
- Published
- 2024
- Full Text
- View/download PDF
9. RADICE: Causal Graph Based Root Cause Analysis for System Performance Diagnostic
- Author
Tonon, Andrea; Zhang, Meng; Caglayan, Bora; Shen, Fei; Gui, Tong; Wang, MingXue; and Zhou, Rong
- Subjects
Computer Science - Software Engineering
- Abstract
Root cause analysis is one of the most crucial operations in software reliability regarding system performance diagnostic. It aims to identify the root causes of system performance anomalies, allowing the resolution or the future prevention of issues that can cause millions of dollars in losses. Common existing approaches relying on data correlation or full domain expert knowledge are inaccurate or infeasible in most industrial cases, since correlation does not imply causation, and domain experts may not have full knowledge of complex and real-time systems. In this work, we define a novel causal domain knowledge model representing causal relations about the underlying system components to allow domain experts to contribute partial domain knowledge for root cause analysis. We then introduce RADICE, an algorithm that through the causal graph discovery, enhancement, refinement, and subtraction processes is able to output a root cause causal sub-graph showing the causal relations between the system components affected by the anomaly. We evaluated RADICE with simulated data and reported a real data use case, sharing the lessons we learned. The experiments show that RADICE provides better results than other baseline methods, including causal discovery algorithms and correlation based approaches for root cause analysis. Comment: Accepted at IEEE SANER 2025
- Published
- 2025
10. Artificial Intelligence for Central Dogma-Centric Multi-Omics: Challenges and Breakthroughs
- Author
Xin, Lei; Huang, Caiyun; Li, Hao; Huang, Shihong; Feng, Yuling; Kong, Zhenglun; Liu, Zicheng; Li, Siyuan; Yu, Chang; Shen, Fei; and Tang, Hao
- Subjects
Quantitative Biology - Genomics
- Abstract
With the rapid development of high-throughput sequencing platforms, an increasing number of omics technologies, such as genomics, metabolomics, and transcriptomics, are being applied to disease genetics research. However, biological data often exhibit high dimensionality and significant noise, making it challenging to effectively distinguish disease subtypes using a single-omics approach. To address these challenges and better capture the interactions among DNA, RNA, and proteins described by the central dogma, numerous studies have leveraged artificial intelligence to develop multi-omics models for disease research. These AI-driven models have improved the accuracy of disease prediction and facilitated the identification of genetic loci associated with diseases, thus advancing precision medicine. This paper reviews the mathematical definitions of multi-omics, strategies for integrating multi-omics data, applications of artificial intelligence and deep learning in multi-omics, the establishment of foundational models, and breakthroughs in multi-omics technologies, drawing insights from over 130 related articles. It aims to provide practical guidance for computational biologists to better understand and effectively utilize AI-based multi-omics machine learning algorithms in the context of central dogma.
- Published
- 2024
11. DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
- Author
Yuan, Zhenlong; Luo, Jinguo; Shen, Fei; Li, Zhaoxin; Liu, Cong; Mao, Tianlu; and Wang, Zhaoqi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Patch deformation-based methods have recently exhibited substantial effectiveness in multi-view stereo, due to the incorporation of deformable and expandable perception to reconstruct textureless areas. However, such approaches typically focus on exploring correlative reliable pixels to alleviate match ambiguity during patch deformation, but ignore the deformation instability caused by mistaken edge-skipping and visibility occlusion, leading to potential estimation deviation. To remedy the above issues, we propose DVP-MVS, which innovatively synergizes depth-edge aligned and cross-view prior for robust and visibility-aware patch deformation. Specifically, to avoid unexpected edge-skipping, we first utilize Depth Anything V2 followed by the Roberts operator to initialize coarse depth and edge maps respectively, both of which are further aligned through an erosion-dilation strategy to generate fine-grained homogeneous boundaries for guiding patch deformation. In addition, we reform view selection weights as visibility maps and restore visible areas by cross-view depth reprojection, then regard them as cross-view prior to facilitate visibility-aware patch deformation. Finally, we improve propagation and refinement with multi-view geometry consistency by introducing aggregated visible hemispherical normals based on view selection and local projection depth differences based on epipolar lines, respectively. Extensive evaluations on ETH3D and Tanks & Temples benchmarks demonstrate that our method can achieve state-of-the-art performance with excellent robustness and generalization.
- Published
- 2024
12. Epidemiology and drug sensitivity analysis of Salmonella in Huadu District of Guangzhou from 2016 to 2020
- Author
CHEN Siting, FENG Feng, DU Lijun, SHEN Fei, TANG Fengzhen, and FANG Ping
- Subjects
salmonella, epidemic characteristics, serotype, drug sensitivity, Food processing and manufacture, TP368-456, Nutrition. Foods and food supply, TX341-641
- Abstract
Objective To study the epidemiological characteristics and drug sensitivity of Salmonella in Huadu District of Guangzhou from January 2016 to December 2020, so as to provide scientific basis for prevention, control, diagnosis and treatment of diseases caused by this kind of pathogens. Methods Six thousand six hundred and sixty-five fecal samples of diarrhea patients in Huadu District of Guangzhou city from 2016 to 2020 were collected for isolation, identification, serotyping and drug sensitivity test. Results A total of 435 Salmonella strains were detected, with a total detection rate of 6.53%. The infection population was mainly infants (0-3 years old), and the sex ratio was male∶female=1.18∶1, which was not statistically significant. The epidemic peak was from May to November. General pediatrics, emergency medicine and gastroenterology were the three clinical departments with the highest positive detection rate. Salmonella typhimurium and Salmonella dublin were the dominant strains in the epidemic. Drug sensitivity test showed that the resistance rates of Salmonella to ampicillin, tetracycline, piperacillin and ampicillin/sulbactam were very high. The drug resistance rate of cephalosporins ranged from 17.01% to 22.63%. The resistance rate of quinolones (levofloxacin and ciprofloxacin) increased slightly. The result showed that the dominant strain of Salmonella in this area, Salmonella typhimurium, was less sensitive to β-lactam antibiotics in this study. Conclusion In the recent 5 years, Salmonella infection in Huadu District of Guangzhou city was on the rise, especially in infants. The drug resistance of Salmonella in this area was relatively strong, especially Salmonella typhimurium. It suggested that the monitoring of the epidemic should be strengthened in this area, and antibiotics be used reasonably in the treatment.
- Published
- 2021
- Full Text
- View/download PDF
13. FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
- Author
Guo, Hao-Han; Liu, Kun; Shen, Fei-Yu; Wu, Yi-Chen; Xie, Feng-Long; Xie, Kun; and Xu, Kai-Tuo
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS dataset with rich annotations and a wide coverage of content, speaking style, and timbre. Then, we propose a language-model-based foundation TTS system. The speech signal is compressed into discrete semantic tokens via a semantic-aware speech tokenizer, and can be generated by a language model from the prompt text and audio. Then, a two-stage waveform generator is proposed to decode them to the high-fidelity waveform. We present two applications of this system: voice cloning for dubbing and human-like speech generation for chatbots. The experimental results demonstrate the solid in-context learning capability of FireRedTTS, which can stably synthesize high-quality speech consistent with the prompt text and audio. For dubbing, FireRedTTS can clone target voices in a zero-shot way for the UGC scenario and adapt to studio-level expressive voice characters in the PUGC scenario via few-shot fine-tuning with 1-hour recording. Moreover, FireRedTTS achieves controllable human-like speech generation in a casual style with paralinguistic behaviors and emotions via instruction tuning, to better serve spoken chatbots.
- Published
- 2024
14. Lesion-aware network for diabetic retinopathy diagnosis
- Author
Xia, Xue; Zhan, Kun; Fang, Yuming; Jiang, Wenhui; and Shen, Fei
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Deep learning has boosted automatic diabetic retinopathy (DR) diagnosis, greatly helping ophthalmologists with early disease detection, which helps prevent disease deterioration that may eventually lead to blindness. It has been shown that convolutional neural network (CNN)-aided lesion identification or segmentation benefits automatic DR screening. The key to fine-grained lesion tasks mainly lies in: (1) extracting features that are both sensitive to tiny lesions and robust against DR-irrelevant interference, and (2) exploiting and re-using encoded information to restore lesion locations under extremely imbalanced data distribution. To this end, we propose a CNN-based DR diagnosis network with an attention mechanism, termed lesion-aware network, to better capture lesion information from imbalanced data. Specifically, we design the lesion-aware module (LAM) to capture noise-like lesion areas across deeper layers, and the feature-preserve module (FPM) to assist shallow-to-deep feature fusion. Afterward, the proposed lesion-aware network (LANet) is constructed by embedding the LAM and FPM into the CNN decoders for DR-related information utilization. The proposed LANet is then further extended to a DR screening network by adding a classification layer. Through experiments on three public fundus datasets with pixel-level annotations, our method outperforms the mainstream methods with an area under curve of 0.967 in DR screening, and increases the overall average precision by 7.6%, 2.1%, and 1.2% in lesion segmentation on three datasets. Besides, the ablation study validates the effectiveness of the proposed sub-modules. Comment: This is the submitted version without improvements by reviewers. The final version is published in International Journal of Imaging Systems and Technology (https://onlinelibrary.wiley.com/doi/10.1002/ima.22933)
- Published
- 2024
- Full Text
- View/download PDF
15. SHREC: a SRE Behaviour Knowledge Graph Model for Shell Command Recommendations
- Author
Tonon, Andrea; Caglayan, Bora; Wang, MingXue; Hu, Peng; Shen, Fei; and Zhang, Puchao
- Subjects
Computer Science - Software Engineering
- Abstract
In IT system operations, shell commands are common command line tools used by site reliability engineers (SREs) for daily tasks, such as system configuration, package deployment, and performance optimization. The efficiency of their execution has a crucial business impact, since shell commands very often execute critical operations, such as the resolution of system faults. However, many shell commands involve long parameters that make them hard to remember and type. Additionally, the experience and knowledge of SREs using these commands are almost never preserved. In this work, we propose SHREC, a SRE behaviour knowledge graph model for shell command recommendations. We model the SRE shell behaviour knowledge as a knowledge graph and propose a strategy to directly extract such knowledge from SRE historical shell operations. The knowledge graph is then used to provide shell command recommendations in real time to improve SRE operation efficiency. Our empirical study based on real shell commands executed in our company demonstrates that SHREC can improve SRE operation efficiency and allows SRE knowledge to be shared and reused. Comment: Accepted at IEEE SANER 2024
- Published
- 2024
- Full Text
- View/download PDF
16. Few-shot Defect Image Generation based on Consistency Modeling
- Author
Shi, Qingfeng; Wei, Jing; Shen, Fei; and Zhang, Zhengtao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Image generation can address the shortage of labeled data in defect detection. Most defect generation methods are only trained on a single product without considering the consistencies among multiple products, leading to poor quality and diversity of generated results. To address these issues, we propose DefectDiffu, a novel text-guided diffusion method to model both intra-product background consistency and inter-product defect consistency across multiple products and modulate the consistency perturbation directions to control product type and defect strength, achieving diversified defect image generation. Firstly, we leverage a text encoder to separately provide consistency prompts for background, defect, and fusion parts of the disentangled integrated architecture, thereby disentangling defects and normal backgrounds. Secondly, we propose the double-free strategy to generate defect images through two-stage perturbation of consistency direction, thereby controlling product type and defect strength by adjusting the perturbation scale. Besides, DefectDiffu can generate defect mask annotations utilizing cross-attention maps from the defect part. Finally, to improve the generation quality of small defects and masks, we propose the adaptive attention-enhance loss to increase the attention to defects. Experimental results demonstrate that DefectDiffu surpasses state-of-the-art methods in terms of generation quality and diversity, thus effectively improving downstream defect detection performance. Moreover, defect perturbation directions can be transferred among various products to achieve zero-shot defect generation, which is highly beneficial for addressing insufficient data issues. The code is available at https://github.com/FFDD-diffusion/DefectDiffu.
- Published
- 2024
17. MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
- Author
Yuan, Zhenlong; Liu, Cong; Shen, Fei; Li, Zhaoxin; Luo, Jinguo; Mao, Tianlu; and Wang, Zhaoqi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, patch deformation-based methods have demonstrated significant strength in multi-view stereo by adaptively expanding the reception field of patches to help reconstruct textureless areas. However, such methods mainly concentrate on searching for pixels without matching ambiguity (i.e., reliable pixels) when constructing deformed patches, while neglecting the deformation instability caused by unexpected edge-skipping, resulting in potential matching distortions. Addressing this, we propose MSP-MVS, a method introducing multi-granularity segmentation prior for edge-confined patch deformation. Specifically, to avoid unexpected edge-skipping, we first aggregate and further refine multi-granularity depth edges gained from Semantic-SAM as prior to guide patch deformation within depth-continuous (i.e., homogeneous) areas. Moreover, to address attention imbalance caused by edge-confined patch deformation, we implement adaptive equidistribution and disassemble-clustering of correlative reliable pixels (i.e., anchors), thereby promoting attention-consistent patch deformation. Finally, to prevent deformed patches from falling into local-minimum matching costs caused by the fixed sampling pattern, we introduce disparity-sampling synergistic 3D optimization to help identify global-minimum matching costs. Evaluations on ETH3D and Tanks & Temples benchmarks prove our method obtains state-of-the-art performance with remarkable generalization.
- Published
- 2024
18. IMAGDressing-v1: Customizable Virtual Dressing
- Author
Shen, Fei; Jiang, Xin; He, Xin; Ye, Hu; Wang, Cong; Du, Xiaoyu; Li, Zechao; and Tang, Jinhui
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Latest advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience. However, existing VTON technologies neglect the need for merchants to showcase garments comprehensively, including flexible control over garments, optional faces, poses, and scenes. To address this issue, we define a virtual dressing (VD) task focused on generating freely editable human images with fixed garments and optional conditions. Meanwhile, we design a comprehensive affinity metric index (CAMI) to evaluate the consistency between generated images and reference garments. Then, we propose IMAGDressing-v1, which incorporates a garment UNet that captures semantic features from CLIP and texture features from VAE. We present a hybrid attention module, including a frozen self-attention and a trainable cross-attention, to integrate garment features from the garment UNet into a frozen denoising UNet, ensuring users can control different scenes through text. IMAGDressing-v1 can be combined with other extension plugins, such as ControlNet and IP-Adapter, to enhance the diversity and controllability of generated images. Furthermore, to address the lack of data, we release the interactive garment pairing (IGPair) dataset, containing over 300,000 pairs of clothing and dressed images, and establish a standard pipeline for data assembly. Extensive experiments demonstrate that our IMAGDressing-v1 achieves state-of-the-art human image synthesis performance under various controlled conditions. The code and model will be available at https://github.com/muzishen/IMAGDressing.
- Published
- 2024
19. VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
- Author
Qu, Zhen; Tao, Xian; Prasad, Mukesh; Shen, Fei; Zhang, Zhengtao; Gong, Xinyi; and Ding, Guiguang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recently, large-scale vision-language models such as CLIP have demonstrated immense potential in the zero-shot anomaly segmentation (ZSAS) task, utilizing a unified model to directly detect anomalies on any unseen product with painstakingly crafted text prompts. However, existing methods often assume that the product category to be inspected is known, thus setting product-specific text prompts, which is difficult to achieve in data privacy scenarios. Moreover, even the same type of product exhibits significant differences due to specific components and variations in the production process, posing significant challenges to the design of text prompts. To this end, we propose a visual context prompting model (VCP-CLIP) for the ZSAS task based on CLIP. The insight behind VCP-CLIP is to employ visual context prompting to activate CLIP's anomalous semantic perception ability. Specifically, we first design a Pre-VCP module to embed global visual information into the text prompt, thus eliminating the necessity for product-specific prompts. Then, we propose a novel Post-VCP module that adjusts the text embeddings utilizing the fine-grained features of the images. In extensive experiments conducted on 10 real-world industrial anomaly segmentation datasets, VCP-CLIP achieved state-of-the-art performance in the ZSAS task. The code is available at https://github.com/xiaozhen228/VCP-CLIP.
- Published
- 2024
20. Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models
- Author
Shen, Fei; Ye, Hu; Liu, Sibo; Zhang, Jun; Wang, Cong; Han, Xiao; and Yang, Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which predominantly generate stories in an autoregressive and excessively caption-dependent manner, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose novel Rich-contextual Conditional Diffusion Models (RCDMs), a two-stage approach designed to enhance story generation's semantic consistency and temporal consistency. Specifically, in the first stage, the frame-prior transformer diffusion model is presented to predict the frame semantic embedding of the unknown clip by aligning the semantic correlations between the captions and frames of the known clip. The second stage establishes a robust model with rich contextual conditions, including reference images of the known clip, the predicted frame semantic embedding of the unknown clip, and text embeddings of all captions. By jointly injecting these rich contextual conditions at the image and feature levels, RCDMs can generate semantically and temporally consistent stories. Moreover, RCDMs can generate consistent stories with a single forward inference, unlike autoregressive models. Our qualitative and quantitative results demonstrate that our proposed RCDMs outperform in challenging scenarios. The code and model will be available at https://github.com/muzishen/RCDMs.
- Published
- 2024
21. V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation
- Author
-
Wang, Cong, Tian, Kuan, Zhang, Jun, Guan, Yonghang, Luo, Feng, Shen, Fei, Jiang, Zhiwei, Gu, Qing, Han, Xiao, and Yang, Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map) can vary in strength. Among these, weaker conditions often struggle to be effective due to interference from stronger conditions, posing a challenge in balancing these conditions. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as facial pose and reference image. However, direct training with weak signals often leads to difficulties in convergence. To address this, we propose V-Express, a simple method that balances different control signals through progressive training and a conditional dropout operation. Our method gradually enables effective control by weak conditions, thereby achieving generation capabilities that simultaneously take into account the facial pose, reference image, and audio. The experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. Furthermore, a potential solution is provided for the simultaneous and effective use of conditions of varying strengths.
- Published
- 2024
22. Ensembling Diffusion Models via Adaptive Feature Aggregation
- Author
-
Wang, Cong, Tian, Kuan, Guan, Yonghang, Zhang, Jun, Jiang, Zhiwei, Shen, Fei, Han, Xiao, Gu, Qing, and Yang, Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
The success of the text-guided diffusion model has inspired the development and release of numerous powerful diffusion models within the open-source community. These models are typically fine-tuned on various expert datasets, showcasing diverse denoising capabilities. Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied. Existing methods primarily adopt parameter merging strategies to produce a new static model. However, they overlook the fact that the divergent denoising capabilities of the models may dynamically change across different states, such as when experiencing different prompts, initial noises, denoising steps, and spatial locations. In this paper, we propose a novel ensembling method, Adaptive Feature Aggregation (AFA), which dynamically adjusts the contributions of multiple models at the feature level according to various states (i.e., prompts, initial noises, denoising steps, and spatial locations), thereby keeping the advantages of multiple diffusion models while suppressing their disadvantages. Specifically, we design a lightweight Spatial-Aware Block-Wise (SABW) feature aggregator that adaptively aggregates the block-wise intermediate features from multiple U-Net denoisers into a unified one. The core idea lies in dynamically producing an individual attention map for each model's features by comprehensively considering various states. It is worth noting that only SABW is trainable with about 50 million parameters, while other models are frozen. Both the quantitative and qualitative experiments demonstrate the effectiveness of our proposed Adaptive Feature Aggregation method. The code is available at https://github.com/tenvence/afa/.
- Published
- 2024
23. SPEAK: Speech-Driven Pose and Emotion-Adjustable Talking Head Generation
- Author
-
Cai, Changpeng, Guo, Guinan, Li, Jiao, Su, Junhao, Shen, Fei, He, Chenghao, Xiao, Jing, Chen, Yuanxu, Dai, Lei, and Zhu, Feiyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition, I.4.5, I.4.9
- Abstract
Most earlier research on talking face generation has focused on the synchronization of lip motion and speech content. However, head pose and facial emotions are equally important characteristics of natural faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and cannot be applied to arbitrary subjects. In this paper, we propose a novel one-shot Talking Head Generation framework (SPEAK) that distinguishes itself from general Talking Face Generation by enabling emotional and postural control. Specifically, we introduce an Inter-Reconstructed Feature Disentanglement (IRFD) module to decouple facial features into three latent spaces. Then we design a face editing module that modifies speech content and facial latent codes into a single latent space. Subsequently, we present a novel generator that employs modified latent codes derived from the editing module to regulate emotional expression, head poses, and speech content in synthesizing facial animations. Extensive trials demonstrate that our method ensures lip synchronization with the audio while enabling decoupled control of facial features; it can generate realistic talking heads with coordinated lip motions, authentic facial emotions, and smooth head movements. The demo video is available at https://anonymous.4open.science/r/SPEAK-8A22
- Published
- 2024
24. Measurement and spectral analysis of medical shock wave parameters based on flexible PVDF sensors
- Author
-
Xu, Liansheng, Shen, Fei, Fan, Fan, Wu, Qiong, Wang, Li, Li, Fengji, Fan, Yubo, and Niu, Haijun
- Published
- 2025
- Full Text
- View/download PDF
25. Biomass-templated strategy to synthesize Fe2P/Co2P heterojunction bifunctional electrocatalyst for high performance flexible zinc-air batteries
- Author
-
Huang, Kang, Hu, Jiapeng, Cao, Jicun, Wei, Xinmin, Liu, Shihao, Dai, Qiming, Shen, Fei, Zhang, Xinyang, Zhao, Xiaohui, Peng, Yang, Deng, Zhao, and Huang, Yizhong
- Published
- 2025
- Full Text
- View/download PDF
26. Detection of adulteration of non-transgenic soybean oil with transgenic soybean oil by integrating absorption, scattering with fluorescence spectroscopy
- Author
-
He, Xueming, Wang, Meng, You, Jie, Liu, Haowen, Shen, Fei, Wang, Liu, Li, Peng, and Fang, Yong
- Published
- 2025
- Full Text
- View/download PDF
27. Evaluation of instability in patients with chronic vestibular syndrome using dynamic stability indicators
- Author
-
Ma, Yingnan, Gao, Xing, Wang, Li, Lyu, Ziyang, Shen, Fei, and Niu, Haijun
- Published
- 2025
- Full Text
- View/download PDF
28. Disentangling Fluorescence of Endogenous Fluorescent Substances from Absorption and Scattering Effects for Quantitative Prediction for Oxidation Degree of Peanut Oils
- Author
-
Wang, Yue, Guo, Na, He, Xueming, Shen, Fei, and Liang, Yong
- Published
- 2024
- Full Text
- View/download PDF
29. Virtual DC machine-based distributed SoC balancing control strategy for parallel battery storage units in DC microgrids: Virtual DC machine-based distributed SoC balancing control strategy
- Author
-
Huang, Jun, Mao, Shiwei, and Shen, Fei
- Published
- 2024
- Full Text
- View/download PDF
30. Sliding Electrical Contact Model Considering Frictional and Joule Heating
- Author
-
Dai, Hang-Cen, Shen, Fei, Li, You-Hua, and Ke, Liao-Liang
- Published
- 2024
- Full Text
- View/download PDF
31. Orthogonal Latent Compression for Streaming Anomaly Detection in Industrial Vision
- Author
-
Gao, Han, Luo, Huiyuan, Shen, Fei, Zhang, Zhengtao, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Antonacopoulos, Apostolos, editor, Chaudhuri, Subhasis, editor, Chellappa, Rama, editor, Liu, Cheng-Lin, editor, Bhattacharya, Saumik, editor, and Pal, Umapada, editor
- Published
- 2025
- Full Text
- View/download PDF
32. VCP-CLIP: A Visual Context Prompting Model for Zero-Shot Anomaly Segmentation
- Author
-
Qu, Zhen, Tao, Xian, Prasad, Mukesh, Shen, Fei, Zhang, Zhengtao, Gong, Xinyi, Ding, Guiguang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
33. PosCap: Boosting Video Captioning with Part-of-Speech Guidance
- Author
-
Xiao, Jingfu, Chen, Zhiliang, Jiang, Wenhui, Fang, Yuming, Shen, Fei, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
34. LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network
- Author
-
Li, Hanqian, Zhang, Ruinan, Pan, Ye, Ren, Junchi, and Shen, Fei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Remote sensing target detection aims to identify and locate critical targets within remote sensing images, finding extensive applications in agriculture and urban planning. Feature pyramid networks (FPNs) are commonly used to extract multi-scale features. However, existing FPNs often overlook extracting low-level positional information and fine-grained context interaction. To address this, we propose a novel location refined feature pyramid network (LR-FPN) to enhance the extraction of shallow positional information and facilitate fine-grained context interaction. The LR-FPN consists of two primary modules: the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). Specifically, SPIEM first maximizes the retention of solid location information of the target by simultaneously extracting positional and saliency information from the low-level feature map. Subsequently, CIM injects this robust location information into different layers of the original FPN through spatial and channel interaction, explicitly enhancing the object area. Moreover, in spatial interaction, we introduce a simple local and non-local interaction strategy to learn and retain the saliency information of the object. Lastly, the LR-FPN can be readily integrated into common object detection frameworks to improve performance significantly. Extensive experiments on two large-scale remote sensing datasets (i.e., DOTAV1.0 and HRSC2016) demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches. Our code and models will be publicly available.
- Published
- 2024
35. Low-Complexity Estimation Algorithm and Decoupling Scheme for FRaC System
- Author
-
Sun, Mengjiang, Chen, Peng, Cao, Zhenxin, and Shen, Fei
- Subjects
Electrical Engineering and Systems Science - Signal Processing
- Abstract
With the leaping advances in autonomous vehicles and transportation infrastructure, dual function radar-communication (DFRC) systems have become attractive due to their size, cost, and resource efficiency. A frequency modulated continuous waveform (FMCW)-based radar-communication system (FRaC) utilizing both sparse multiple-input and multiple-output (MIMO) arrays and index modulation (IM) has been proposed to form a DFRC system specifically designed for vehicular applications. In this paper, the three-dimensional (3D) parameter estimation problem in the FRaC is considered. Since the 3D-parameters including range, direction of arrival (DOA) and velocity are coupled in the estimating matrix of the FRaC system, the existing estimation algorithms cannot estimate the 3D-parameters accurately. Hence, a novel decomposed decoupled atomic norm minimization (DANM) method is proposed by splitting the 3D-parameter estimating matrix into multiple 2D matrices with sparsity constraints. Then, the 3D-parameters are estimated efficiently and separately with the optimized decoupled estimating matrix. Moreover, the Cramér-Rao lower bound (CRLB) of the 3D-parameter estimation is derived, and the computational complexity of the proposed algorithm is analyzed. Simulation results show that the proposed decomposed DANM method exploits the advantage of the virtual aperture in the presence of coupling caused by IM and the sparse MIMO array, and outperforms the co-estimation algorithm with lower computational complexity.
- Published
- 2024
- Full Text
- View/download PDF
36. Fractionation of poplar through solid acid pretreatment assisted by mild alkali-1-butanol immersion to effectively produce xylose, glucose, and solid lignin
- Author
-
Chen, Yang, Qi, Wei, Shakeel, Usama, Liang, Cuiyi, Wang, Wen, Hu, Yunzi, Wang, Zhongming, Yuan, Zhenhong, Shen, Fei, and Wang, Qiong
- Published
- 2024
- Full Text
- View/download PDF
37. Bifunctional electrolyte addition for longer life and higher capacity of aqueous zinc-ion hybrid supercapacitors
- Author
-
Zhang, Fan, Li, Si-Qi, Xia, Li-Nan, Yang, Chao, Li, Lei, Wang, Kai-Ming, Xu, Chen-Liang, Feng, Yuan-Yuan, Zhao, Bin, Shen, Fei, Han, Xiao-Gang, and Zhu, Ling-Yun
- Published
- 2024
- Full Text
- View/download PDF
38. Establishment and Validation of a Risk Prediction Model for Non-Invasive Ventilation Failure After Birth in Premature Infants with Gestational Age < 32 Weeks
- Author
-
Shen, Fei, Yu, Meng-ya, Rong, Hui, Guo, Yan, Zou, Yun-su, Cheng, Rui, and Yang, Yang
- Published
- 2024
- Full Text
- View/download PDF
39. Thermo-elastoplastic sliding frictional contact and wear analysis of FGM-coated half-planes
- Author
-
Zhou, Jia-Lin, Shen, Fei, El-Borgi, Sami, and Ke, Liao-Liang
- Published
- 2024
- Full Text
- View/download PDF
40. Perceived Parental Attachment and Psychological Distress Among Child Sexual Abuse Survivors: The Mediating Role of Coping Strategies
- Author
-
Shen, Fei and Liu, Yanhong
- Published
- 2024
- Full Text
- View/download PDF
41. Appreciation of differences: promoting diversity and flourishing among college students
- Author
-
Zhang, Ying, Shen, Fei, Paredes, Jean Carlos, and Wang, Cong
- Published
- 2024
- Full Text
- View/download PDF
42. Kinetic origin of hysteresis and the strongly enhanced reversible barocaloric effect by regulating the atomic coordination environment
- Author
-
Yu, Zi-Bing, Zhou, Hou-Bo, Hu, Feng-Xia, Wang, Jian-Tao, Shen, Fei-Ran, He, Lun-Hua, Tian, Zheng-Ying, Gao, Yi-Hong, Wang, Bing-Jie, Lin, Yuan, Kan, Yue, Wang, Jing, Chen, Yun-Zhong, Sun, Ji-Rong, Zhao, Tong-Yun, and Shen, Bao-Gen
- Published
- 2024
- Full Text
- View/download PDF
43. Optimization of a human induced pluripotent stem cell-derived sensory neuron model for the in vitro evaluation of taxane-induced neurotoxicity
- Author
-
Cantor, Erica L, Shen, Fei, Jiang, Guanglong, Philips, Santosh, and Schneider, Bryan P
- Published
- 2024
- Full Text
- View/download PDF
44. LINC00460/miR-186-3p/MYC feedback loop facilitates colorectal cancer immune escape by enhancing CD47 and PD-L1 expressions
- Author
-
Luo, Qingqing, Shen, Fei, Zhao, Sheng, Dong, Lan, Wei, Jianchang, Hu, He, Huang, Qing, Wang, Qiang, Yang, Ping, Liang, Wenlong, Li, Wanglin, He, Feng, and Cao, Jie
- Published
- 2024
- Full Text
- View/download PDF
45. A combined amplicon approach to nematode polyparasitism occurring in captive wild animals in southern China
- Author
-
Li, Hongyi, Ren, Zhengjiu, Wang, Weijian, Shen, Fei, Huang, Jingyi, Wang, Chuyue, Lu, Jinzhi, Pan, Xi, Xiao, Lihua, Feng, Yaoyu, and Yuan, Dongjuan
- Published
- 2024
- Full Text
- View/download PDF
46. Anomalous forward scattering of gain-assisted dielectric shell-coated metallic core spherical particles
- Author
-
Shen Fei, An Ning, Tao Yifei, Zhou Hongping, Jiang Zhaoneng, and Guo Zhongyi
- Subjects
core-shell nanoparticles, mie theory, forward scattering, dipole and quadrupole, gain medium, Physics, QC1-999
- Abstract
We have investigated the scattering properties of an individual core-shell nanoparticle using the Mie theory, which can be tuned to support both electric and magnetic modes simultaneously. In general, the suppression of forward scattering can be realized by the second Kerker condition. Here, a novel mechanism has to be adopted to explain zero-forward scattering, which originates from the complex interactions between dipolar and quadrupolar modes. However, for lossy and lossless core-shell spherical nanoparticles, zero-forward scattering can never be achieved because the real parts of Mie expansion coefficients are always positive. By adding proper gain in dielectric shell, zero-forward scattering can be found at certain incident wavelengths, which means that all electric and magnetic responses in Mie scattering can be counteracted totally in the forward direction. In addition, if the absolute values of dipolar and quadrupolar terms are in the same order of magnitude, the local scattering minimum and maximum can be produced away from the forward and backward directions due to the interacting effect between the dipolar and quadrupolar terms. Furthermore, by adding suitable gain in shell, super-forward scattering can also be realized at certain incident wavelengths. We also demonstrated that anomalously weak scattering or superscattering could be obtained for the core-shell nanoparticles with suitable gain in shell. In particular, for such a choice of suitable gain in shell, we can obtain zero-forward scattering and anomalously weak scattering at the same wavelength as well as super-forward scattering at another wavelength. These features may provide new opportunities for cloaking, plasmonic lasers, optical antennas, and so on.
- Published
- 2016
- Full Text
- View/download PDF
47. High-efficiency refractive index sensor based on the metallic nanoslit arrays with gain-assisted materials
- Author
-
Luo Linbao, Ge Caiwang, Tao Yifei, Zhu Lie, Zheng Kun, Wang Wei, Sun Yongxuan, Shen Fei, and Guo Zhongyi
- Subjects
gain-assisted materials, nanoslit arrays, cavity mode, surface plasmon resonance (spr), Physics, QC1-999
- Abstract
We have designed and investigated a three-band refractive index (RI) sensor in the range of 550–900 nm based on the metal nanoslit array with gain-assisted materials. The underlying mechanisms of the three-band and enhanced characteristics of the metal nanoslit array with gain-assisted materials have also been investigated theoretically and numerically. Three resonant peaks in transmission spectra are deemed to be in different plasmonic resonant modes in the metal nanoslit array, which leads to different responses for the plasmonic sensor. By embedding the structure into the CYTOP with proper gain-assisted materials, the sensing performances can be greatly enhanced due to a dramatic amplification of the extraordinary optical transmission (EOT) resonance by the gain medium. When the gain values reach their corresponding thresholds for the three plasmonic modes, ultrahigh sensitivities in the three bands can be obtained; in particular, for the second resonant wavelength (λ2), FOM = 128.1 and FOM* = 39100 can be attained at the gain threshold of k = 0.011. Due to these unique features, the design scheme of the proposed gain-assisted nanoslit sensor could provide a powerful approach to optimize the performance of EOT-based sensors and offer an excellent platform for biological sensing.
- Published
- 2016
- Full Text
- View/download PDF
48. Optical Realization of Wave-Based Analog Computing with Metamaterials
- Author
-
Kaiyang Cheng, Yuancheng Fan, Weixuan Zhang, Yubin Gong, Shen Fei, and Hongqiang Li
- Subjects
analog optical computing, metamaterials, metasurfaces, quantum algorithm, edge detection, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Recently, the study of analog optical computing has attracted renewed interest due to its natural advantages of parallelism, high speed, and low energy consumption over its conventional digital counterpart, particularly in applications of big data and high-throughput image processing. The emergence of metamaterials and metasurfaces in recent decades has offered unprecedented opportunities to arbitrarily manipulate light waves at the subwavelength scale. Metamaterials and metasurfaces with freely controlled optical properties have accelerated the progress of wave-based analog computing and are emerging as a practical, easy-integration platform for optical analog computing. In this review, the recent progress of metamaterial-based spatial analog optical computing is briefly reviewed. We first survey the implementation of classical mathematical operations followed by two fundamental approaches (metasurface approach and Green's function approach). Then, we discuss recent developments based on different physical mechanisms and examine the classical optical simulation of quantum algorithms, which may lead to a new way for high-efficiency signal processing by exploiting quantum behaviors. The challenges and future opportunities in this booming research field are discussed.
- Published
- 2020
- Full Text
- View/download PDF
49. Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
- Author
-
Shen, Fei, Ye, Hu, Zhang, Jun, Wang, Cong, Han, Xiao, and Yang, Wei
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent work has showcased the significant potential of diffusion models in pose-guided person image synthesis. However, owing to the inconsistency in pose between the source and target images, synthesizing an image with a distinct pose, relying exclusively on the source image and target pose information, remains a formidable challenge. This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages. Specifically, in the first stage, we design a simple prior conditional diffusion model that predicts the global features of the target image by mining the global alignment relationship between pose coordinates and image appearance. Then, the second stage establishes a dense correspondence between the source and target images using the global features from the previous stage, and an inpainting conditional diffusion model is proposed to further align and enhance the contextual features, generating a coarse-grained person image. In the third stage, we propose a refining conditional diffusion model to utilize the coarsely generated image from the previous stage as a condition, achieving texture restoration and enhancing fine-detail consistency. The three-stage PCDMs work progressively to generate the final high-quality and high-fidelity synthesized image. Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios. The code and model will be available at https://github.com/tencent-ailab/PCDMs. Comment: Accepted to ICLR 2024. The final version is available at OpenReview: https://openreview.net/forum?id=rHzapPnCgT
- Published
- 2023
50. Investigating Shift Equivalence of Convolutional Neural Networks in Industrial Defect Segmentation
- Author
-
Qu, Zhen, Tao, Xian, Shen, Fei, Zhang, Zhengtao, and Li, Tao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In industrial defect segmentation tasks, while pixel accuracy and Intersection over Union (IoU) are commonly employed metrics to assess segmentation performance, the output consistency (also referred to as equivalence) of the model is often overlooked. Even a small shift in the input image can yield significant fluctuations in the segmentation results. Existing methodologies primarily focus on data augmentation or anti-aliasing to enhance the network's robustness against translational transformations, but their shift equivalence performs poorly on the test set or is susceptible to nonlinear activation functions. Additionally, the variations in boundaries resulting from the translation of input images are consistently disregarded, thus imposing further limitations on the shift equivalence. In response to this particular challenge, a novel pair of down/upsampling layers called component attention polyphase sampling (CAPS) is proposed as a replacement for the conventional sampling layers in CNNs. To mitigate the effect of image boundary variations on the equivalence, an adaptive windowing module is designed in CAPS to adaptively filter out the border pixels of the image. Furthermore, a component attention module is proposed to fuse all downsampled features to improve the segmentation performance. The experimental results on the micro surface defect (MSD) dataset and four real-world industrial defect datasets demonstrate that the proposed method exhibits higher equivalence and segmentation performance compared to other state-of-the-art methods. Our code will be available at https://github.com/xiaozhen228/CAPS. Comment: submitted to IEEE Transactions on Instrumentation & Measurement
- Published
- 2023