Author: "Chen, Yen-Chun" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Chen, Yen-Chun"' showing total 232 results

1. On Pre-training of Multimodal Language Models Customized for Chart Understanding

Author: Fan, Wan-Cyuan, Chen, Yen-Chun, Liu, Mengchen, Yuan, Lu, and Sigal, Leonid
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Recent studies customizing Multimodal Large Language Models (MLLMs) for domain-specific tasks have yielded promising results, especially in the field of scientific chart comprehension. These studies generally utilize visual instruction tuning with specialized datasets to enhance question and answer (QA) accuracy within the chart domain. However, they often neglect the fundamental discrepancy between natural image-caption pre-training data and digital chart image-QA data, particularly in the models' capacity to extract underlying numeric values from charts. This paper tackles this oversight by exploring the training processes necessary to improve MLLMs' comprehension of charts. We present three key findings: (1) Incorporating raw data values in alignment pre-training markedly improves comprehension of chart data. (2) Replacing images with their textual representation randomly during end-to-end fine-tuning transfers the language reasoning capability to chart interpretation skills. (3) Requiring the model to first extract the underlying chart data and then answer the question during fine-tuning can further improve the accuracy. Consequently, we introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension. CHOPINLLM effectively interprets various types of charts, including unannotated ones, while maintaining robust reasoning abilities. Furthermore, we establish a new benchmark to evaluate MLLMs' understanding of different chart types across various comprehension levels. Experimental results show that CHOPINLLM exhibits strong performance in understanding both annotated and unannotated charts across a wide range of types.
Published: 2024

2. ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

Author: Chen, Jr-Jen, Liao, Yu-Chien, Lin, Hsi-Che, Yu, Yu-Chu, Chen, Yen-Chun, and Wang, Yu-Chiang Frank
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e., human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across video segments, poses significant challenges to even the frontier multimodal large language models. To facilitate this evaluation, we develop an automated pipeline for generating temporal reasoning question-answer pairs, significantly reducing the need for labor-intensive manual annotations. Our benchmark includes 921 carefully vetted validation samples and 2,143 test samples, each manually curated for accuracy and relevance. Evaluation results show that while frontier large language models outperform academic models, they still lag behind human performance by a significant 14.3% accuracy gap. Additionally, our pipeline creates a training dataset of 9,695 machine generated samples without manual effort, which empirical studies suggest can enhance the across-time reasoning via fine-tuning.
Comment: Project page: https://rextime.github.io/
Published: 2024

3. Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Author: Liu, Max, Yu, Chan-Hung, Lee, Wei-Hsu, Hung, Cheng-Wei, Chen, Yen-Chun, and Sun, Shao-Hua
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Programming Languages
Abstract: Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search framework (LLM-GS). Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods. We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy - an LLM is instructed to initially generate Python codes and then convert them into DSL programs. To further optimize the LLM-generated programs, we develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space to improve the programs consistently. Experimental results in the Karel domain demonstrate our LLM-GS framework's superior effectiveness and efficiency. Extensive ablation studies further verify the critical role of our Pythonic-DSL strategy and Scheduled Hill Climbing algorithm. Moreover, we conduct experiments with two novel tasks, showing that LLM-GS enables users without programming skills and knowledge of the domain or DSL to describe the tasks in natural language to obtain performant programs.
Published: 2024

4. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Author: Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, and Zhou, Xiren
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. Our training dataset is a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75%, 78% on MMLU, and 8.7, 8.9 on MT-bench). To enhance multilingual, multimodal, and long-context capabilities, we introduce three models in the phi-3.5 series: phi-3.5-mini, phi-3.5-MoE, and phi-3.5-Vision. The phi-3.5-MoE, a 16 x 3.8B MoE model with 6.6 billion active parameters, achieves superior performance in language reasoning, math, and code tasks compared to other open-source models of similar scale, such as Llama 3.1 and the Mixtral series, and on par with Gemini-1.5-Flash and GPT-4o-mini. Meanwhile, phi-3.5-Vision, a 4.2 billion parameter model derived from phi-3.5-mini, excels in reasoning tasks and is adept at handling both single-image and text prompts, as well as multi-image and text prompts.
Comment: 24 pages
Published: 2024

5. iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views

Author: Wu, Chin-Hsuan, Chen, Yen-Chun, Solarte, Bolivar, Yuan, Lu, and Sun, Min
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present iFusion, a novel 3D object reconstruction framework that requires only two views with unknown camera poses. While single-view reconstruction yields visually appealing results, it can deviate significantly from the actual object, especially on unseen sides. Additional views improve reconstruction fidelity but necessitate known camera poses. However, assuming the availability of pose may be unrealistic, and existing pose estimators fail in sparse view scenarios. To address this, we harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views. (2) The diffusion model is fine-tuned using provided views and estimated poses, turned into a novel view synthesizer tailored for the target object. (3) Leveraging registered views and the fine-tuned diffusion model, we reconstruct the 3D object. Experiments demonstrate strong performance in both pose estimation and novel view synthesis. Moreover, iFusion seamlessly integrates with various reconstruction methods and enhances them.
Comment: Code: https://github.com/chinhsuanwu/ifusion, Project page: https://chinhsuanwu.github.io/ifusion
Published: 2023

6. LACMA: Language-Aligning Contrastive Learning with Meta-Actions for Embodied Instruction Following

Author: Yang, Cheng-Fu, Chen, Yen-Chun, Yang, Jianwei, Dai, Xiyang, Yuan, Lu, Wang, Yu-Chiang Frank, and Chang, Kai-Wei
Subjects: Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: End-to-end Transformers have demonstrated an impressive success rate for Embodied Instruction Following when the environment has been seen in training. However, they tend to struggle when deployed in an unseen environment. This lack of generalizability is due to the agent's insensitivity to subtle changes in natural language instructions. To mitigate this issue, we propose explicitly aligning the agent's hidden states with the instructions via contrastive learning. Nevertheless, the semantic gap between high-level language instructions and the agent's low-level action space remains an obstacle. Therefore, we further introduce a novel concept of meta-actions to bridge the gap. Meta-actions are ubiquitous action patterns that can be parsed from the original action sequence. These patterns represent higher-level semantics that are intuitively aligned closer to the instructions. When meta-actions are applied as additional training signals, the agent generalizes better to unseen environments. Compared to a strong multi-modal Transformer baseline, we achieve a significant 4.5% absolute gain in success rate in unseen environments of ALFRED Embodied Instruction Following. Additional analysis shows that the contrastive objective and meta-actions are complementary in achieving the best results, and the resulting agent better aligns its states with corresponding instructions, making it more suitable for real-world embodied agents. The code is available at: https://github.com/joeyy5588/LACMA.
Comment: EMNLP 2023
Published: 2023

7. Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser

Author: Lai, Yung-Hsuan, Chen, Yen-Chun, and Wang, Yu-Chiang Frank
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio-visual learning has been a major pillar of multi-modal machine learning, where the community mostly focused on its modality-aligned setting, i.e., the audio and visual modality are both assumed to signal the prediction target. With the Look, Listen, and Parse dataset (LLP), we investigate the under-explored unaligned setting, where the goal is to recognize audio and visual events in a video with only weak labels observed. Such weak video-level labels only tell what events happen, without indicating the modality in which they are perceived (audio, visual, or both). To enhance learning in this challenging setting, we incorporate large-scale contrastively pre-trained models as the modality teachers. A simple, effective, and generic method, termed Visual-Audio Label Elaboration (VALOR), is introduced to harvest modality labels for the training events. Empirical studies show that the harvested labels significantly improve an attentional baseline by 8.0 in average F-score (Type@AV). Surprisingly, we found that modality-independent teachers outperform their modality-fused counterparts since they are noise-proof from the other potentially unaligned modality. Moreover, our best model achieves the new state-of-the-art on all metrics of LLP by a substantial margin (+5.4 F-score for Type@AV). VALOR is further generalized to Audio-Visual Event Localization and achieves the new state-of-the-art as well. Code is available at: https://github.com/Franklin905/VALOR.
Comment: NeurIPS 2023
Published: 2023

8. Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

Author: Zhao, Shihao, Chen, Dongdong, Chen, Yen-Chun, Bao, Jianmin, Hao, Shaozhe, Yuan, Lu, and Wong, Kwan-Yee K.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth map, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models, eliminating the huge cost of training from scratch. Moreover, thanks to some dedicated adapter designs, Uni-ControlNet only necessitates a constant number (i.e., 2) of adapters, regardless of the number of local or global controls used. This not only reduces the fine-tuning costs and model size, making it more suitable for real-world deployment, but also facilitates composability of different conditions. Through both quantitative and qualitative comparisons, Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability. Code is available at https://github.com/ShihaoZhaoZSH/Uni-ControlNet.
Comment: Camera Ready
Published: 2023

9. Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Author: Fan, Wan-Cyuan, Chen, Yen-Chun, Chen, Dongdong, Cheng, Yu, Yuan, Lu, and Wang, Yu-Chiang Frank
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both image global structures and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model performing a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector quantized features, followed by a coarse-to-fine gating for producing image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can be also applied for conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditioned and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, scene-graph-to-image, to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO. Code is available at https://github.com/davidhalladay/Frido.
Comment: AAAI 2023
Published: 2022

10. GLIPv2: Unifying Localization and Vision-Language Understanding

Author: Zhang, Haotian, Zhang, Pengchuan, Hu, Xiaowei, Chen, Yen-Chun, Li, Liunian Harold, Dai, Xiyang, Wang, Lijuan, Yuan, Lu, Hwang, Jenq-Neng, and Gao, Jianfeng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). GLIPv2 elegantly unifies localization pre-training and Vision-Language Pre-training (VLP) with three pre-training tasks: phrase grounding as a VL reformulation of the detection task, region-word contrastive learning as a novel region-word level contrastive learning task, and the masked language modeling. This unification not only simplifies the previous multi-stage VLP procedure but also achieves mutual benefits between localization and understanding tasks. Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near SoTA performance on various localization and understanding tasks. The model also shows (1) strong zero-shot and few-shot adaption performance on open-vocabulary object detection tasks and (2) superior grounding capability on VL understanding tasks. Code will be released at https://github.com/microsoft/GLIP.
Comment: NeurIPS 2022; updated with reviewers' comments addressed; Code is released at https://github.com/microsoft/GLIP
Published: 2022

11. Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Author: Wang, Zhecan, Codella, Noel, Chen, Yen-Chun, Zhou, Luowei, Dai, Xiyang, Xiao, Bin, Yang, Jianwei, You, Haoxuan, Chang, Kai-Wei, Chang, Shih-fu, and Yuan, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets. While these datasets reach an order of 10 million samples, the labor cost is prohibitive to scale further. Conversely, unimodal encoders are pretrained with simpler annotations that are less cost-prohibitive, achieving scales of hundreds of millions to billions. As a result, unimodal encoders have achieved state-of-the-art (SOTA) on many downstream tasks. However, challenges remain when applying to VL tasks. The pretraining data is not optimal for cross-modal architectures and requires heavy computational resources. In addition, unimodal architectures lack cross-modal interactions that have demonstrated significant benefits for VL tasks. Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research. In this work, we propose a method to leverage unimodal vision and text encoders for VL tasks that augment existing VL approaches while conserving computational complexity. Specifically, we propose Multimodal Adaptive Distillation (MAD), which adaptively distills useful knowledge from pretrained encoders to cross-modal VL encoders. Second, to better capture nuanced impacts on VL task performance, we introduce an evaluation protocol that includes Visual Commonsense Reasoning (VCR), Visual Entailment (SNLI-VE), and Visual Question Answering (VQA), across a variety of data constraints and conditions of domain shift. Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data. Finally, MAD outperforms concurrent works utilizing pretrained vision encoder from CLIP. Code will be made available.
Comment: arXiv admin note: substantial text overlap with arXiv:2201.05729
Published: 2022

12. Photothermal responsivity of van der Waals material-based nanomechanical resonators

Author: Aguila, Myrron Albert C., Esmenda, Joshoua C., Wang, Jyh-Yang, Lee, Teik-Hui, Chen, Yen-Chun, Yang, Chi-Yuan, Lin, Kung-Hsuan, Chang-Liao, Kuei-Shu, Kafanov, Sergey, Pashkin, Yuri A., and Chen, Chii-Dong
Subjects: Physics - Applied Physics, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: Nanomechanical resonators made from van der Waals materials (vdW NMRs) provide a new tool for sensing absorbed laser power. The photothermal response of vdW NMRs, quantified from the resonant frequency shifts induced by optical absorption, is enhanced when incorporated in a Fabry-Perot (FP) interferometer. Along with the enhancement comes the dependence of the photothermal response on NMR displacement, which lacks investigation. Here, we address the knowledge gap by studying electromotively driven niobium diselenide drumheads fabricated on highly reflective substrates. We use a FP-mediated absorptive heating model to explain the measured variations of the photothermal response. The model predicts a higher magnitude and tuning range of photothermal responses on few-layer and monolayer NbSe2 drumheads, which outperform other clamped vdW drum-type NMRs at a laser wavelength of 532 nm. Further analysis of the model shows that both the magnitude and tuning range of NbSe2 drumheads scale with thickness, establishing a displacement-based framework for building bolometers using FP-mediated vdW NMRs.
Comment: 7 pages, 4 figures
Published: 2022

13. Light-driven soft crawling robots capable of multidirectional locomotion and cargo transport

Author: Chen, Yan-Jun, Chen, Yen-Chun, Huang, Chih-Lin, and Yang, Yao-Joe
Published: 2024

14. Illumination system contributing zooming function to lensless digital holographic microscope by using lightguide incorporated with volume holographic optical elements

Author: Yu, Yeh-Wei, Wang, Wen-Li, Chen, Yen-Chun, Lin, Shiuan-Huei, Wang, Jyun-Jie, Wang, Chih-Ming, Huang, Pin-Duan, Qiu, Bing-Hong, Yang, Tsung-Hsun, and Sun, Ching-Cherng
Published: 2024

15. CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

Author: Wang, Zhecan, Codella, Noel, Chen, Yen-Chun, Zhou, Luowei, Yang, Jianwei, Dai, Xiyang, Xiao, Bin, You, Haoxuan, Chang, Shih-Fu, and Yuan, Lu
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: Contrastive language-image pretraining (CLIP) links vision and language modalities into a unified embedding space, yielding tremendous potential for vision-language (VL) tasks. While early concurrent works have begun to study this potential on a subset of tasks, important questions remain: 1) What is the benefit of CLIP on unstudied VL tasks? 2) Does CLIP provide benefit in low-shot or domain-shifted scenarios? 3) Can CLIP improve existing approaches without impacting inference or pretraining complexity? In this work, we seek to answer these questions through two key contributions. First, we introduce an evaluation protocol that includes Visual Commonsense Reasoning (VCR), Visual Entailment (SNLI-VE), and Visual Question Answering (VQA), across a variety of data availability constraints and conditions of domain shift. Second, we propose an approach, named CLIP Targeted Distillation (CLIP-TD), to intelligently distill knowledge from CLIP into existing architectures using a dynamically weighted objective applied to adaptively selected tokens per instance. Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51.9%) and domain-shifted (up to 71.3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-the-art performance on VCR compared to other single models that are pretrained with image-text data only. On SNLI-VE, CLIP-TD produces significant gains in low-shot conditions (up to 6.6%) as well as fully supervised (up to 3%). On VQA, CLIP-TD provides improvement in low-shot (up to 9%), and in fully-supervised (up to 1.3%). Finally, CLIP-TD outperforms concurrent works utilizing CLIP for finetuning, as well as baseline naive distillation approaches. Code will be made available.
Comment: This paper is greatly modified and updated to be re-submitted to another conference. The new paper is under the name "Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks", https://doi.org/10.48550/arXiv.2204.10496
Published: 2022

16. Burning down the house: Pyroptosis in the tumor microenvironment of hepatocellular carcinoma

Author: Cheng, Chi, Hsu, Sheng-Kai, Chen, Yen-Chun, Liu, Wangta, Shu, En-De, Chien, Ching-Ming, Chiu, Chien-Chih, and Chang, Wen-Tsan
Published: 2024

17. Acute invasive fungal rhinosinusitis in post-COVID-19 patients in Vietnam

Author: Quang, Ly Xuan, Tam, Truong Thanh, Dang, Luong Huu, Chen, Yen-Chun, Hung, Shih-Han, Tai, Tran Thanh, Le Vu Hoang, Nguyen, and Thanh, Nguyen Van
Published: 2024

18. Optical performance of synthetic aperture metalens based on hybrid unit-cells

Author: Yu, Chen-Yi, Chen, Yen-Chun, Zeng, Qiu-Chun, Hsu, Wei-Lun, and Wang, Chih-Ming
Published: 2024

19. Polarization router in radiative near-field based on dielectric nano-elliptical cylinders

Author: Zeng, Qiu-Chun, Hsu, Wei-Lun, Wang, Chun-Yuan, Chen, Yen-Chun, Chen, Che-Chin, Lin, Yu-Hsin, Chen, Fong-Zhi, and Wang, Chih-Ming
Published: 2024

20. The miniature light-field camera with high spatial resolution

Author: Chen, Yen-Chun, Hsu, Wei-Lun, Xie, Meng-Qi, Yang, Hsiao-Hsuan, Cheng, Yuan-Chieh, and Wang, Chih-Ming
Published: 2023

21. Castability of a Ti-7.5Mo alloy for fabricating frameworks for removable partial dentures

Author: Chen, Yung-Chung, Chang, Min-Chieh, Hsiao, Wen-Yu, and Chen, Yen-Chun
Published: 2023

22. An intuitive pre-processing method based on human–robot interactions: zero-shot learning semantic segmentation based on synthetic semantic template

Author: Chen, Yen-Chun and Lai, Chin-Feng
Published: 2023

23. Disturbance Suppression and Contour Following Accuracy Improvement: An Adaptive PI-Type Sliding Mode Nonlinear Extended State Observer Approach

Author: Chen, Yen-Chun, Cai, Yan-Rou, Cheng, Ming-Yang, and Su, Ke-Han
Published: 2023

24. Sandwich nano-fin to reduce the aspect ratio requirement of metasurface

Author: Hsu, Wei-Lun, Yu, Chen-Yi, Lai, Hao-Ting, Chen, Yen-Chun, and Wang, Chih-Ming
Published: 2023

25. Understanding the role of entrepreneurial orientation in creating ambidextrous competitive advantage: a comparative-design, longitudinal study

Author: Chen, Yen-Chun, Arnold, Todd, Liu, Ping-Yu, and Huang, Chun-Yao
Published: 2023

26. Axicon metalens for broadband light harvesting

Author: Chang, Kai-Hao, Chen, Yen-Chun, Huang, Yo-Song, Hsu, Wei-Lun, Lu, Guo-Hao, Liu, Chao-Feng, Weng, Chun-Jen, Lin, Yu-Hsin, Chen, Che-Chin, Lee, Chien-Chieh, Chang, Yu-Chi, Wang, Po-Hsiang, and Wang, Chih-Ming
Subjects: axicon, color router, hyperspectral imaging, imaging sensor, light harvest, metalens, Physics, QC1-999
Abstract: In this study, an axicon metalens comprising a large central disc surrounded by nanoposts for energy harvesting in composite metal-oxide semiconductor sensors was designed, fabricated, and experimentally characterized. The main role of the central disc is focusing light; the nanoposts of various diameters deflect light to form a Bessel-like beam. The spatial distribution of the optical transmission was measured using micro-hyperspectral imaging. The axicon metalens concentrates the light to the sensitive area of the sensor and also harvests light from adjacent pixels. After adding an axicon metalens, the normalized peak transmission is up to 250% at λ = 700 nm as compared to a blank TiO2 film. The experimental results had fair agreement with the finite-difference-time-domain simulation. The ultra-broadband energy-harvesting performance of the sensor suggests that it could be applied in surveillance and Internet of Things applications.
Published: 2023

27. Does alliance orientation matter for new product success? An empirical study of Taiwanese electronics firms

Author: Chen, Yen-Chun and Arnold, Todd
Published: 2022

28. Simple Advanced Preparation Method for Improving the Thickness Stability of Powder Thickening Agents in Dysphagia Management

Author: Kao, Jui-Chu, Yu, Hsin-Ya, Hsu, Yuan-Hao, Hsu, Chia-Ning, Chen, Yen-Chun, Su, Yen-Ling, Yen, Li-ni, Liao, Kuo-Tung, Tsai, Shao-Chen, Lin, Sheng-Kai, and Hung, Shih-Han
Published: 2022

29. The realization of nipip HIT photodetectors with an optimized thickness of intrinsic a-Si:H

Author: Kumar, Mukul, Chen, Yi-Chin, Chen, Yen-Chun, Peng, Yu-Chang, Yang, Ying-Jhe, Li, Ying-Zhi, Jiang, Zih-Kang, Tsai, Chih-Hung, Lin, Jia-De, Hsu, Yu-Kuei, and Lin, Chu-Hsuan
Published: 2022

30. The Role of Ultrasonography in Diagnosing and Managing Sialolithiasis: A Case Report and Literature Review

Author: Chen, Yen-Chun, Yueh, Hann-Ziong, Lu, Shih-Chun, and Lin, Che-Hsuan
Abstract: Sialolith-induced obstructive sialadenitis is a commonly encountered clinical scenario, yet the variations in the size and location of the stone can complicate immediate clinical assessment. Utilizing dynamic ultrasound imaging along with specific structural markers can provide valuable, immediate objective evidence in diagnosing submandibular sialolithiasis. This initial ultrasound evaluation streamlines the decision-making process by facilitating the timely scheduling of confirmatory computed tomography scans and guiding subsequent surgical interventions. This case report illustrates how in-office ultrasonography expedited the diagnosis and subsequent surgical decision-making process for submandibular sialolithiasis within a span of just 1 week. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Advancements in Hyperspectral Imaging and Computer-Aided Diagnostic Methods for the Enhanced Detection and Diagnosis of Head and Neck Cancer.

Author: Wu, I-Chen, Chen, Yen-Chun, Karmakar, Riya, Mukundan, Arvind, Gabriel, Gahiga, Wang, Chih-Chiang, and Wang, Hsiang-Chen
Subjects: MACHINE learning, FISHER discriminant analysis, CONVOLUTIONAL neural networks, CANCER diagnosis, HEAD & neck cancer
Abstract: Background/Objectives: Head and neck cancer (HNC), predominantly squamous cell carcinoma (SCC), presents a significant global health burden. Conventional diagnostic approaches often face challenges in terms of achieving early detection and accurate diagnosis. This review examines recent advancements in hyperspectral imaging (HSI), integrated with computer-aided diagnostic (CAD) techniques, to enhance HNC detection and diagnosis. Methods: A systematic review of seven rigorously selected studies was performed. We focused on CAD algorithms, such as convolutional neural networks (CNNs), support vector machines (SVMs), and linear discriminant analysis (LDA). These are applicable to the hyperspectral imaging of HNC tissues. Results: The meta-analysis findings indicate that LDA surpasses other algorithms, achieving an accuracy of 92%, sensitivity of 91%, and specificity of 93%. CNNs exhibit moderate performance, with an accuracy of 82%, sensitivity of 77%, and specificity of 86%. SVMs demonstrate the lowest performance, with an accuracy of 76% and sensitivity of 48%, but maintain a high specificity level at 89%. Additionally, in vivo studies demonstrate superior performance when compared to ex vivo studies, reporting higher accuracy (81%), sensitivity (83%), and specificity (79%). Conclusion: Despite these promising findings, challenges persist, such as HSI's sensitivity to external conditions, the need for high-resolution and high-speed imaging, and the lack of comprehensive spectral databases. Future research should emphasize dimensionality reduction techniques, the integration of multiple machine learning models, and the development of extensive spectral libraries to enhance HSI's clinical utility in HNC diagnostics. This review underscores the transformative potential of HSI and CAD techniques in revolutionizing HNC diagnostics, facilitating more accurate and earlier detection, and improving patient outcomes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Understanding the interplay between competitor and alliance orientations in product innovativeness: An integrative framework

Author: Chen, Yen-Chun, Lin, Ya-Hui, Li, Po-Chien, and Chen, Chung-Jen
Published: 2022
Full Text: View/download PDF

33. Identification and management of contraindicated drug–drug interactions through pharmaceutical care programs: Experience in direct-acting antivirals therapy

Author: Kuo, Meng Hsuan, Tseng, Chih-Wei, Lee, Chi-Hui, Yang, Ya-Ching, Wu, Hsin-Ju, Lin, Hsiu-Ju, Chu, Ya-Lan, Chen, Yen-Chun, and Tseng, Kuo-Chih
Published: 2022
Full Text: View/download PDF

34. A Two-Flap Combination for Auricular Elevation in Microtia Reconstruction

Author: Quang, Ly Xuan, Linh, Tran Ngoc Tuong, Ha, Van Thi Hai, Quyen, Le Van Vinh, Ngoc, Tran Le Hong, Dung, Nguyen Tan, Nga, Nguyen Thi Thuy, Chen, Yen-Chun, Hung, Shih-Han, and Dang, Luong Huu
Published: 2022
Full Text: View/download PDF

35. A nationwide cohort study suggests clarithromycin-based therapy for Helicobacter pylori eradication is safe in patients with stable coronary heart disease and subsequent peptic ulcer disease

Author: Chen, Yen-Chun, Li, Yi-Da, Yu, Ben-Hui, and Chen, Yi-Chun
Published: 2022
Full Text: View/download PDF

36. Efficacy of Office-Based Salivary Ductal Steroid Irrigation for Managing Post-Irradiation Xerostomia in Head and Neck Cancer Patients: A Retrospective Study

Author: Chen, Yen-Chun, Viet-Nhi, Nguyen-Kieu, Dang, Luong Huu, Su, Chin-Hui, and Hung, Shih-Han
Published: 2024
Full Text: View/download PDF

37. BODIPY-based hydroxypyridyl derivative as a highly Ni2+-selective fluorescent chemosensor

Author: Huang, Po-Jui, Kumarasamy, Keerthika, Devendhiran, Tamiloli, Chen, Yen-Chun, Dong, Teng-Yuan, and Lin, Mei-Ching
Published: 2021
Full Text: View/download PDF

38. Mueller matrix-based calculation model for extracting polarization parameters in full-wave simulation of all-dielectric Pancharatnam–Berry (PB)-phase metasurfaces

Author: Chen, Yen-Chun, Yu, Chih-Jen, and Wang, Chih-Ming
Abstract: • A robust framework for determining polarization properties of a metasurface. • Equivalent model for PB phase metasurfaces using Stokes-Mueller calculus. • To analyze amplitude and phase modulation of a nonideal PB phase metasurface. • The proposed method can integrate with ray tracing software for system design.
Published: 2024
Full Text: View/download PDF

39. Broadband achromatic thermal metalens with a wide field of view based on wafer-level monolithic processes.

Author: Chen, Yen-Chun, Hsu, Wei-Lun, Zeng, Qiu-Chun, Yu, Chen-Yi, Chen, Pin-Do, Chen, Che-Chin, Lin, Yu-Hsin, Chen, Fong-Zhi, and Wang, Chih-Ming
Abstract: We present a monolithic metalens free of chromatic aberration over the 8–12 μm wavelength range for thermal imaging. The metalens consists of nano-donut-pillars for dispersion engineering. The proposed metalens design is based on a telecentric optical system, which effectively eliminates off-focus distortion and aberration, enhancing overall imaging quality. Offering a 90° field of view, the metalens ensures uniform focal spot sizes within a 45° field angle across the working wavelength. This enables the capture of high-quality thermal images with sharp images and minimal distortion. With a diameter of 5.75 mm, the metalens is suitable for integration into commercial thermal imaging cameras. The nano-donut-pillar structure of the metalens allows for relatively straightforward mass production, involving i-line stepper lithography and silicon deep etching processes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. A Study on the Shrinkage and Compressive Strength of GGBFS and Metakaolin Based Geopolymer under Different NaOH Concentrations

Author: Chen, Yen-Chun, Lee, Wei-Hao, Cheng, Ta-Wui, and Li, Yeou-Fong
Published: 2024
Full Text: View/download PDF

41. Baseline thrombopoietin level is associated with platelet count improvement in thrombocytopenic chronic hepatitis C patients after successful direct-acting antiviral agent therapy

Author: Chen, Yen-Chun, Ko, Ping-Hung, Lee, Chi-Che, Tseng, Chih-Wei, and Tseng, Kuo-Chih
Published: 2021
Full Text: View/download PDF

42. Trends in the incidence of peripheral vestibular disorders: a Nationwide population-based study

Author: Hung, Shih-Han, Xirasagar, Sudha, Dang, Luong Huu, Chen, Yen-Chun, Cheng, Yen-Fu, Lin, Herng-Ching, and Chen, Chin-Shyan
Published: 2023
Full Text: View/download PDF

43. Versatile Effects of GABA Oolong Tea on Improvements in Diastolic Blood Pressure, Alpha Brain Waves, and Quality of Life

Author: Lin, Chih-Cheng, Hsieh, Chih-Yu, Chen, Li-Fen, Chen, Yen-Chun, Ho, Tien-Hwa, Chang, Shao-Chin, and Chang, Jia-Feng
Published: 2023
Full Text: View/download PDF

44. Acute invasive fungal rhinosinusitis in post-COVID-19 patients in Vietnam

Author: Quang, Ly Xuan, Tam, Truong Thanh, Dang, Luong Huu, Chen, Yen-Chun, Hung, Shih-Han, Tai, Tran Thanh, Le Vu Hoang, Nguyen, and Thanh, Nguyen Van
Published: 2023
Full Text: View/download PDF

45. Analysis of the MPL/GDL Interface: Impact of MPL Intrusion into the GDL Substrate

Author: Berger, Anne, Chen, Yen-Chun, Gatzemeier, Jacqueline, Schmidt, Thomas J., Büchi, Felix N., and Gasteiger, Hubert A.
Published: 2023
Full Text: View/download PDF

46. Catalyst Aggregate Size Effect on the Mass Transport Properties of Non-Noble Metal Catalyst Layers for PEMFC Cathodes

Author: Ünsal, Seçil, Bozzetti, Michele, Chen, Yen-Chun, Girod, Robin, Berger, Anne, Diercks, Justus S., Gialamoidou, Sofia, Lyu, Jike, Medarde, Marisa, Gasteiger, Hubert A., Tileli, Vasiliki, Schmidt, Thomas J., and Herranz, Juan
Published: 2023
Full Text: View/download PDF

47. Impact of Low Muscle Mass on Hepatocellular Carcinoma Patients Undergoing Transcatheter Liver-Directed Therapies: Systematic Review & Meta-Analysis.

Author: Chen, Yen-Chun, Kuo, Meng-Hsuan, Hsu, Ching-Sheng, Kao, I-Ting, Wu, Chen-Yi, Tseng, Chih-Wei, and Shao, Shih-Chieh
Subjects: *ONLINE information services, *META-analysis, *MEDICAL information storage & retrieval systems, *CONFIDENCE intervals, *SYSTEMATIC reviews, *SARCOPENIA, *THERAPEUTIC embolization, *CANCER patients, *RESEARCH funding, *DISEASE prevalence, *MEDLINE, *ODDS ratio, *HEPATOCELLULAR carcinoma, *INTRA-arterial injections, *OVERALL survival
Abstract: Simple Summary: This research addresses the understudied impact of low skeletal muscle mass (LSMM) on intermediate-stage hepatocellular carcinoma (HCC) patients undergoing transcatheter liver-directed intra-arterial therapies. Aiming to determine LSMM's prevalence and its prognostic significance, the study reveals that 46% of these patients exhibit LSMM, which is consistently associated with decreased overall survival. These findings suggest the need for routine LSMM assessments in clinical settings, potentially influencing treatment strategies and clinical guidelines for HCC management, thus marking a significant contribution to the research community and patient care practices. Background and Aim: Transcatheter liver-directed intra-arterial therapies are mainstream treatment options for intermediate-stage hepatocellular carcinoma (HCC). However, the effect of low skeletal muscle mass (LSMM) on overall survival (OS) in these patients remains uncertain. We aimed to ascertain the prevalence and prognostic effect of LSMM in this population. Method: According to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, a comprehensive search was performed in the PubMed and Embase databases until Oct 2023. Random-effects meta-analysis was performed to determine the pooled prevalence of LSMM and calculate the hazard ratio (HR) for OS with a 95% confidence interval (CI) in patients with intermediate-stage HCC undergoing various transarterial therapies, comparing those with and without LSMM. Results: Twelve studies involving 2450 patients were included. The pooled prevalence of LSMM was 46% (95% CI, 38–55%), and the results were consistent across different treatments, regions, and age subgroups. The meta-analysis indicated that LSMM was significantly associated with decreased OS (HR, 1.78; 95% CI, 1.36–2.33; I², 75%). Subgroup analyses reassured the main findings across various therapies, including transarterial chemoembolization (TACE) (HR, 1.68; 95% CI, 1.23–2.30; I², 81%), transarterial embolization (TAE) (HR, 2.45; 95% CI, 1.42–4.22; I², 0%), and transarterial radioembolization (TARE) (HR, 1.94; 95% CI, 1.01–3.73; I², 0%). Conclusions: In intermediate-stage HCC, LSMM is common and associated with reduced OS. To achieve an optimal prognosis, clinicians should incorporate routine LSMM measurement into practice, while caring for patients with intermediate-stage HCC, irrespective of TACE, TAE, and TARE. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

Author: Fan, Wan-Cyuan, Chen, Yen-Chun, Chen, DongDong, Cheng, Yu, Yuan, Lu, and Wang, Yu-Chiang Frank
Published: 2023
Full Text: View/download PDF

49. Realization of Forest Internet of Things Using Wireless Network Communication Technology of Low-Power Wide-Area Network

Author: Zhao, Ming, Ye, Ren-Jie, Chen, Shuo-Tsung, Chen, Yen-Chun, and Chen, Zi-Yu
Published: 2023
Full Text: View/download PDF

50. Factors Associated with the Underestimation of Manual CPAP Titration Pressure

Author: Chen, Po-Yueh, Viet-Nhi, Nguyen-Kieu, Chen, Yen-Chun, Kao, Yi-Lin, Dang, Luong Huu, and Hung, Shih-Han
Published: 2023
Full Text: View/download PDF
