428,620 results on '"Yen A"'
Search Results
2. Effect of rice straw and garbage enzyme addition on soil properties and plant growth of rice
- Author
Toan Nguyen-Sy, Hanh Hong Do, Yen Anh Thi Tran, Hoa Thi Kieu, Uyen Huynh Thi Diem, and Ngoc-Son Tran
- Subjects
garbage enzyme, paddy soil, rice growth, rice straw, soil carbohydrates, Agriculture, Agriculture (General), S1-972, Plant culture, SB1-1110
- Abstract
The objective of the current study was to examine the impacts of rice straw and garbage enzyme generated from local vegetable and fruit waste on plant growth and carbohydrate or ammonium extraction from paddy soil after one month of growth in a pot experiment. Samples of topsoil were obtained from a depth of 0-15 cm, and the following treatments were applied: control (10 g soil), RS (adding 30 g soil + 0.6 g rice straw), GE (30 g soil + garbage enzyme), and combination (adding 30 g soil + rice straw and garbage enzyme) maintained at room temperature. The study findings indicated that there were no observable impacts of rice straw and garbage enzyme application on biomass. However, RS addition seems to reduce root length but enhance shoot length. Soil carbohydrates that were extracted ranged from 61 to 207 mg kg−1 soil, and treatments with rice straw addition exhibited significantly higher levels compared to those without it (p < 0.05). The ammonium content was low. It could be concluded that at the initial seedling stage, rice straw has more effects on soil properties and plant growth than garbage enzyme. To fully assess the effects of rice straw and garbage enzyme on soil properties and plant growth, it is recommended that further research be conducted over longer periods.
- Published
- 2023
3. Experimental and theoretical investigation of the mechanisms of drying during CO2 injection into saline reservoirs
- Author
Yen Adams Sokama-Neuyam, Muhammad Aslam Md Yusof, Shadrack Kofi Owusu, Victor Darkwah-Owusu, Joshua Nsiah Turkson, Adwoa Sampongmaa Otchere, and Jann Rune Ursin
- Subjects
Medicine, Science
- Abstract
A viable CO2 storage resource must have sufficient storage capacity, reliable containment efficiency and adequate well injectivity. Deep saline formations stand out in terms of storage capacity and containment efficiency. However, formation brine dry-out and salt precipitation in the near well region could impair CO2 injectivity in deep saline reservoirs, thus reducing their potential for CO2 storage. Core-flood experiments and analytical modelling were used to investigate various mechanisms of external and internal salt precipitation. Particularly, the impact of the extension of the dry-out region on CO2 injectivity was investigated. It was found that, for high permeability rocks, injection of CO2 at relatively low injection rates could result in salt cake deposition at the injection inlet, especially under high salinity conditions. It was also found that extension of the dry-out region does not have a significant impact on CO2 injectivity. Although the magnitude of CO2 injectivity impairment increased more than two-fold when initial brine salinity was doubled, real-time changes in CO2 injectivity during the drying process were found to be independent of initial brine salinity. We have shown that the bundle-of-tubes model could provide useful insight into the process of brine vaporization and salt deposition in the dry-out region during CO2 injection. This work provides vital understanding of the effect of salt precipitation on CO2 injectivity.
- Published
- 2023
4. Pyrolysis of municipal food waste: A sustainable potential approach for solid food waste management and organic crop fertilizer production
- Author
Patrick Boakye, Miriam Beneireh Nuagah, Sampson Oduro-Kwarteng, Eugene Appiah-Effah, Jolly Kanjua, Anthony Boakye Antwi, Lawrence Darkwah, Kwame Sarkodie, and Yen Adams Sokama-Neuyam
- Subjects
Biochar, pyrolysis, food waste, organic fertilizer, solid waste management, Environmental sciences, GE1-350
- Abstract
Food waste can be converted to a useful product such as biochar as a way of recycling waste to retain nutrients in the soil, which in turn contributes to carbon sequestration and offsets some greenhouse gas emissions in the struggle to achieve carbon neutrality. Mixed food waste-derived biochars (FWB1–300°C, FWB2–450°C and FWB3–600°C) were pyrolysed at 300°C, 450°C and 600°C, respectively, using an electric kiln. Tests for physicochemical parameters and germination tests were performed. The biochar produced at 300°C had high nitrogen, organic matter, bulk density and biochar yield, and gave longer root lengths. The results indicate that municipal food waste biochars produced at all three temperatures were suitable for use as fertilizer. However, biochar produced at a moderately lower temperature is favourable for agricultural purposes: FWB1–300°C and FWB2–450°C had moderate pH and ash levels and so are less toxic to the growth of plants.
- Published
- 2023
5. Zeta potential prediction of dominant sandstone minerals via surface complexation modelling
- Author
Samuel Erzuah, Wilberforce Nkrumah Aggrey, Joel Teye Tetteh, Vida Bodi, Caspar Daniel Adenutsi, Yen Adams Sokama-Neuyam, Kwame Sarkodie, William Ampomah, Godfred Ohemeng-Boahen, and Kwabena Biritwum Nyarko
- Subjects
Surface complexation modelling, Zeta potential, Surface potential, Ionic strength, Slip distance, Science
- Abstract
Numerous injection water compositions have been developed by various researchers to optimize oil recovery via wettability alteration. The ionic composition of the injected brine plays a profound role in the rock/brine and crude-oil/brine interfacial charge, and its role is therefore pivotal in oil recovery. Numerous experimental techniques have been used to assess the potential of injected brines to optimize oil recovery via wettability alteration. Notable among them are the spontaneous imbibition, contact angle and core flooding techniques. Zeta potential (ζ-potential) has been used by some researchers to evaluate the wettability alteration during crude oil/brine/rock interactions. However, ζ-potential measurement is relatively expensive and time consuming, hence the need for a relatively cheap and easy approach to obtaining ζ-potential. This was accomplished using Surface Complexation Modelling (SCM) via PHREEQ-C. The focus of this study was to predict ζ-potential using SCM. The qualities and quantities of the materials used during the existing experiment from literature were also used as input in the SCM. Dominant minerals in sandstone reservoir rock, notably quartz, calcite, dolomite, kaolinite, illite, montmorillonite, chlorites, ilmenite, muscovite, biotite and anorthoclase, were considered in this study. The SCM technique could capture the trend during the ζ-potential measurements. However, out of the 39 mineral-brine ζ-potentials measured, the SCM approach could not capture the trend in 5 of these samples, namely dolomite/0.5wt% NaCl (pH = 9.5), microcline/SW (pH = 7.8), muscovite/SW (pH = 7.8), chlorite/SW (pH = 7.8) and ilmenite/20% Dil-SW (pH = 7.6). This was attributed to the effect of atmospheric CO2 on the pH of the brine during the ζ-potential measurement.
The ζ-potentials of carbonates (calcite and dolomite) were predominantly positive, while those of tectosilicates (quartz, anorthoclase and microcline), phyllosilicates (montmorillonite, kaolinite, biotite, illite, muscovite and chlorites) and oxides (ilmenite) were predominantly negative. For instance, at the calcite-SW (pH = 7.9) interface, the predicted ζ-potential value was 20.5 mV while its measured value was observed to be 12.9 mV. The dolomite-SW (pH = 7.9) interface gave 16.7 mV and 19.7 mV for the predicted and the measured ζ-potential values respectively. Considering quartz, anorthoclase and microcline with 0.5wt% NaCl (pH = 7.9), the predicted ζ-potential values were observed to be -11.65 mV, -24.21 mV and -43.00 mV while the measured values were observed to be -16.9 mV, -33.8 mV and -36.5 mV respectively.
- Published
- 2023
6. Assessment of solar radiation resource from the NASA-POWER reanalysis products for tropical climates in Ghana towards clean energy application
- Author
Alfred Dawson Quansah, Felicia Dogbey, Prince Junior Asilevi, Patrick Boakye, Lawrence Darkwah, Sampson Oduro-Kwarteng, Yen Adams Sokama-Neuyam, and Patrick Mensah
- Subjects
Medicine, Science
- Abstract
In order to expand the output of solar power systems for efficient integration into the national grid, solar energy resource assessment at site is required. A major impediment, however, is the widespread scarcity of radiometric measurements, which can be augmented by satellite observation. This paper assessed the suitability of satellite-based solar radiation resource retrieved from the NASA-POWER archives at $0.5^\circ \times 0.5^\circ$ spatial resolution over Ghana–West Africa, to develop a long-term source reference. The assessment is based on the criteria of comparison with estimations from sunshine duration measurement for 22 synoptic stations. Overall, the satellite-based data compared well with ground-based estimations by r = 0.6–0.94 ± 0.1. Spatiotemporally, the agreement is strongest over the northern half Savannah-type climate during March–May, and weakest over the southern half Forest-type climate during June–August. The assessment provides an empirical framework to support solar energy utilization in the sub-region.
- Published
- 2022
7. Use of Paraphrasing Applications to Reduce Plagiarism among FKIP Universitas Asahan Students in Completing Their Undergraduate Theses
- Author
Khairun Nisa, Ely Syafitri, Sri Rahma Dewi Saragih, Yen Aryni, and Elfira Rahmadani
- Subjects
Social sciences (General), H1-99
- Abstract
Paraphrasing must be done in writing scientific papers to avoid plagiarism. So this is important for all students. However, in reality, Asahan University FKIP students still have not mastered the correct paraphrasing technique. These abilities can be developed by diligently reading or with the help of free or paid online applications. The method of this activity was carried out in three stages, namely: preparation, implementation, and evaluation, by distributing questionnaires for student responses to the activities that have been carried out. This community service activity was at the Asahan University FKIP, Jalan Jend. Ahmad Yani Kisaran. Participants in this activity were all students of the seventh semester of the English Education study program, totalling 43 students, and Mathematics Education, totalling 57 students. This activity was carried out by introducing paraphrasing tools to help students reduce plagiarism. However, these tools must be re-read by students so that the meaning of the articles quoted is the same. The results of the introduction and training on paraphrasing applications, namely smodin, spinner id, quilbolt, and paraphrasing tools, were fascinating and help students reduce plagiarism. Based on the results of the evaluation through the distribution of google form questionnaires, it was found that 82% of students stated that the activities of using paraphrasing applications that were taught were beneficial. In comparison, 18% of students stated that it was helpful.
- Published
- 2022
8. Indoor air quality improvement and purification by atmospheric pressure Non-Thermal Plasma (NTP)
- Author
Prince Junior Asilevi, Patrick Boakye, Sampson Oduro-Kwarteng, Bernard Fei-Baffoe, and Yen Adams Sokama-Neuyam
- Subjects
Medicine, Science
- Abstract
Non-thermal plasma (NTP) is a promising technology for the improvement of indoor air quality (IAQ) by removing volatile organic compounds (VOCs) through advanced oxidation process (AOP). In this paper, the authors developed a laboratory scale dielectric barrier discharge (DBD) reactor which generates atmospheric NTP to study the removal of low-concentration formaldehyde (HCHO), a typical indoor air VOC in the built environment associated with cancer and leukemia, under different processing conditions. Strong ionization NTP was generated between the DBD electrodes by a pulse power zero-voltage switching flyback transformer (ZVS-FBT), which caused ionization of air molecules leading to active species formation to convert HCHO into carbon dioxide (CO2) and water vapor (H2O). The impact of key electrical and physical processing parameters, i.e. discharge power (P), initial concentration (Cin), flow rate (F), and relative humidity (RH), which affect the formaldehyde removal efficiency (η), was studied to determine optimum conditions. Results show that the correlation coefficients (R2) of removal efficiency dependence on the processing parameters follow the order R2 (F) = 0.99 > R2 (RH) = 0.96 > R2 (Cin) = 0.94 > R2 (P) = 0.93. The removal efficiency reached 99% under the optimum conditions of P = 0.6 W, Cin = 0.1 ppm, F = 0.2 m3/h, and RH = 65% with no secondary pollution. The study provided a theoretical and experimental basis for the application of DBD plasma for air purification in the built environment.
- Published
- 2021
9. Evaluating the performance of machine-learning-based phase pickers when applied to ocean bottom seismic data: Blanco oceanic transform fault as a case study
- Author
Liu, Min and Tan, Yen Joe
- Subjects
Physics - Geophysics
- Abstract
Machine-learning-based phase pickers have been successfully leveraged to build high-resolution earthquake catalogs using seismic data on land. However, their performance when applied to ocean bottom seismic (OBS) data remains to be evaluated. In this study, we first adopt three machine-learning-based phase pickers - EQTransformer, Pickblue, and OBSTransformer - to build three earthquake catalogs for the 350-km-long Blanco oceanic transform fault (BTF) based on a year-long OBS deployment. We then systematically compare these catalogs with an existing catalog which utilized a traditional workflow. Results indicate that the Pickblue-based catalog documents more events and/or provides better-constrained locations than the other catalogs. The different performances of the three phase pickers suggest that detailed assessment of catalogs built using automatic workflows is necessary to prevent misinterpretations, especially when applied to regions without training samples. The Pickblue-based catalog reveals seismicity gaps in three extensional segments of BTF which likely represent aseismic slip zones affected by seawater infiltration. Furthermore, most earthquakes are shallower than the 600-degree isotherm predicted by a half-space conductive cooling model, except for the Blanco Ridge segment which has hosted 80% of the Mw > 6.0 earthquakes along BTF since 1976. These Blanco Ridge deep earthquake clusters can be explained by hydrothermal cooling or the serpentinization of mantle peridotite due to seawater infiltration along conduits created by the deeper ruptures of large earthquakes. Our analyses also demonstrate the importance of careful examination of automatically produced earthquake catalogs since mislocated events can lead to very different interpretations of fault slip modes from seismicity distribution. Comment: 38 pages and 16 figures
- Published
- 2024
10. DataTales: A Benchmark for Real-World Intelligent Data Narration
- Author
Yang, Yajing, Liu, Qian, and Kan, Min-Yen
- Subjects
Computer Science - Artificial Intelligence
- Abstract
We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies.
- Published
- 2024
11. Stability analysis of split equality and split feasibility problems
- Author
Huong, Vu Thi, Xu, Hong-Kun, and Yen, Nguyen Dong
- Subjects
Mathematics - Optimization and Control, 49J53, 49K40, 65K10, 90C25, 90C31
- Abstract
In this paper, for the first time in the literature, we study the stability of solutions of two classes of feasibility (i.e., split equality and split feasibility) problems by set-valued and variational analysis techniques. Our idea is to equivalently reformulate the feasibility problems as parametric generalized equations to which set-valued and variational analysis techniques apply. Sufficient conditions, as well as necessary conditions, for the Lipschitz-likeness of the involved solution maps are proved by exploiting special structures of the problems and by using an advanced result of B.S. Mordukhovich [J. Global Optim. 28, 347--362 (2004)]. These conditions stand on a solid interaction among all the input data by means of their dual counterparts, which are transposes of matrices and regular/limiting normal cones to sets. Several examples are presented to illustrate how the obtained results work in practice and also show that the existence of nonzero solution assumption made in the necessity conditions cannot be lifted.
- Published
- 2024
12. Low Energy Backgrounds and Excess Noise in a Two-Channel Low-Threshold Calorimeter
- Author
Anthony-Petersen, Robin, Chang, Clarence L., Chang, Yen-Yung, Chaplinsky, Luke, Fink, Caleb W., Garcia-Sciveres, Maurice, Guo, Wei, Hertel, Scott A., Li, Xinran, Lin, Junsong, Lisovenko, Marharyta, Mahapatra, Rupak, Matava, William, McKinsey, Daniel N., Osterman, David Z., Patel, Pratyush K., Penning, Bjoern, Platt, Mark, Pyle, Matt, Qi, Yinghe, Reed, Maggie, Rydstrom, Ivar, Romani, Roger K., Sadoulet, Bernard, Serfass, Bruno, Sorensen, Peter, Suerfu, Burkhant, Velan, Vetri, Wang, Gensheng, Wang, Yue, Watkins, Samuel L., and Williams, Michael R.
- Subjects
Physics - Instrumentation and Detectors
- Abstract
We describe observations of low energy excess (LEE) events (background events observed in all light dark matter direct detection calorimeters) and noise in a two-channel silicon athermal phonon detector with 375 meV baseline energy resolution. We measure two distinct LEE populations: "shared" multichannel events with a pulse shape consistent with athermal phonon events, and sub-eV events which couple nearly exclusively to a single channel with a significantly faster pulse shape. These "singles" are consistent with events occurring within the aluminum athermal phonon collection fins. Similarly, our measured detector noise is higher than the theoretical expectation. Measured noise can be split into an uncorrelated component, consistent with shot noise from small energy depositions within the athermal phonon sensor itself, and a correlated component, consistent with shot noise from energy depositions within the silicon crystal's phonon system. Comment: 6 pages, 5 figures
- Published
- 2024
13. FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
- Author
Lin, Chin-Yang, Wu, Chung-Ho, Yeh, Chang-Han, Yen, Shih-Han, Sun, Cheng, and Liu, Yu-Lun
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Neural Radiance Fields (NeRF) face significant challenges in few-shot scenarios, primarily due to overfitting and long training times for high-fidelity rendering. Existing methods, such as FreeNeRF and SparseNeRF, use frequency regularization or pre-trained priors but struggle with complex scheduling and bias. We introduce FrugalNeRF, a novel few-shot NeRF framework that leverages weight-sharing voxels across multiple scales to efficiently represent scene details. Our key contribution is a cross-scale geometric adaptation scheme that selects pseudo ground truth depth based on reprojection errors across scales. This guides training without relying on externally learned priors, enabling full utilization of the training data. It can also integrate pre-trained priors, enhancing quality without slowing convergence. Experiments on LLFF, DTU, and RealEstate-10K show that FrugalNeRF outperforms other few-shot NeRF methods while significantly reducing training time, making it a practical solution for efficient and accurate 3D scene reconstruction. Comment: Project page: https://linjohnss.github.io/frugalnerf/
- Published
- 2024
14. Momentum-Resolved Fingerprint of Mottness in Layer-Dimerized Nb$_3$Br$_8$
- Author
Date, Mihir, Petocchi, Francesco, Yen, Yun, Krieger, Jonas A., Pal, Banabir, Hasse, Vicky, McFarlane, Emily C., Körner, Chris, Yoon, Jiho, Watson, Matthew D., Strocov, Vladimir N., Xu, Yuanfeng, Kostanovski, Ilya, Ali, Mazhar N., Ju, Sailong, Plumb, Nicholas C., Sentef, Michael A., Woltersdorf, Georg, Schüler, Michael, Werner, Philipp, Felser, Claudia, Parkin, Stuart S. P., and Schröter, Niels B. M.
- Subjects
Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Materials Science, Condensed Matter - Other Condensed Matter
- Abstract
In a well-ordered crystalline solid, insulating behaviour can arise from two mechanisms: electrons can either scatter off a periodic potential, thus forming band gaps that can lead to a band insulator, or they localize due to strong interactions, resulting in a Mott insulator. For an even number of electrons per unit cell, either band- or Mott-insulators can theoretically occur. However, unambiguously identifying an unconventional Mott-insulator with an even number of electrons experimentally has remained a longstanding challenge due to the lack of a momentum-resolved fingerprint. This challenge has recently become pressing for the layer dimerized van der Waals compound Nb$_3$Br$_8$, which exhibits a puzzling magnetic field-free diode effect when used as a weak link in Josephson junctions, but has previously been considered to be a band-insulator. In this work, we present a unique momentum-resolved signature of a Mott-insulating phase in the spectral function of Nb$_3$Br$_8$: the top of the highest occupied band along the out-of-plane dimerization direction $k_z$ has a momentum space separation of $\Delta k_z=2\pi/d$, whereas the valence band maximum of a band insulator would be separated by less than $\Delta k_z=\pi/d$, where $d$ is the average spacing between the layers. As the strong electron correlations inherent in Mott insulators can lead to unconventional superconductivity, identifying Nb$_3$Br$_8$ as an unconventional Mott-insulator is crucial for understanding its apparent time-reversal symmetry breaking Josephson diode effect. Moreover, the momentum-resolved signature employed here could be used to detect quantum phase transitions between band- and Mott-insulating phases in van der Waals heterostructures, where interlayer interactions and correlations can be easily tuned to drive such transitions. Comment: 9 pages, 3 figures
- Published
- 2024
15. Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation
- Author
Lee, Sung-Wook and Kuo, Yen-Ling
- Subjects
Computer Science - Robotics
- Abstract
Recently, diffusion policy has shown impressive results in handling multi-modal tasks in robotic manipulation. However, it has fundamental limitations in out-of-distribution failures that persist due to compounding errors and its limited capability to extrapolate. One way to address these limitations is robot-gated DAgger, an interactive imitation learning with a robot query system to actively seek expert help during policy rollout. While robot-gated DAgger has high potential for learning at scale, existing methods like Ensemble-DAgger struggle with highly expressive policies: They often misinterpret policy disagreements as uncertainty at multi-modal decision points. To address this problem, we introduce Diff-DAgger, an efficient robot-gated DAgger algorithm that leverages the training objective of diffusion policy. We evaluate Diff-DAgger across different robot tasks including stacking, pushing, and plugging, and show that Diff-DAgger improves the task failure prediction by 37%, the task completion rate by 14%, and reduces the wall-clock time by up to 540%. We hope that this work opens up a path for efficiently incorporating expressive yet data-hungry policies into interactive robot learning settings. Project website: diffdagger.github.io
- Published
- 2024
16. Debiasing Large Vision-Language Models by Ablating Protected Attribute Representations
- Author
Ratzlaff, Neale, Olson, Matthew Lyle, Hinck, Musashi, Tseng, Shao-Yen, Lal, Vasudev, and Howard, Phillip
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large Vision Language Models (LVLMs) such as LLaVA have demonstrated impressive capabilities as general-purpose chatbots that can engage in conversations about a provided input image. However, their responses are influenced by societal biases present in their training datasets, leading to undesirable differences in how the model responds when presented with images depicting people of different demographics. In this work, we propose a novel debiasing framework for LVLMs by directly ablating biased attributes during text generation to avoid generating text related to protected attributes, or even representing them internally. Our method requires no training and a relatively small amount of representative biased outputs (~1000 samples). Our experiments show that not only can we minimize the propensity of LVLMs to generate text related to protected attributes, but we can even use synthetic data to inform the ablation while retaining captioning performance on real data such as COCO. Furthermore, we find the resulting generations from a debiased LVLM exhibit similar accuracy to a baseline biased model, showing that debiasing effects can be achieved without sacrificing model performance. Comment: NeurIPS workshop on SafeGenAI, 10 pages, 2 figures
- Published
- 2024
17. Movie Gen: A Cast of Media Foundation Models
- Author
Polyak, Adam, Zohar, Amit, Brown, Andrew, Tjandra, Andros, Sinha, Animesh, Lee, Ann, Vyas, Apoorv, Shi, Bowen, Ma, Chih-Yao, Chuang, Ching-Yao, Yan, David, Choudhary, Dhruv, Wang, Dingkang, Sethi, Geet, Pang, Guan, Ma, Haoyu, Misra, Ishan, Hou, Ji, Wang, Jialiang, Jagadeesh, Kiran, Li, Kunpeng, Zhang, Luxin, Singh, Mannat, Williamson, Mary, Le, Matt, Yu, Matthew, Singh, Mitesh Kumar, Zhang, Peizhao, Vajda, Peter, Duval, Quentin, Girdhar, Rohit, Sumbaly, Roshan, Rambhatla, Sai Saketh, Tsai, Sam, Azadi, Samaneh, Datta, Samyak, Chen, Sanyuan, Bell, Sean, Ramaswamy, Sharadh, Sheynin, Shelly, Bhattacharya, Siddharth, Motwani, Simran, Xu, Tao, Li, Tianhe, Hou, Tingbo, Hsu, Wei-Ning, Yin, Xi, Dai, Xiaoliang, Taigman, Yaniv, Luo, Yaqiao, Liu, Yen-Cheng, Wu, Yi-Chiao, Zhao, Yue, Kirstain, Yuval, He, Zecheng, He, Zijian, Pumarola, Albert, Thabet, Ali, Sanakoyeu, Artsiom, Mallya, Arun, Guo, Baishan, Araya, Boris, Kerr, Breena, Wood, Carleigh, Liu, Ce, Peng, Cen, Vengertsev, Dimitry, Schonfeld, Edgar, Blanchard, Elliot, Juefei-Xu, Felix, Nord, Fraylie, Liang, Jeff, Hoffman, John, Kohler, Jonas, Fire, Kaolin, Sivakumar, Karthik, Chen, Lawrence, Yu, Licheng, Gao, Luya, Georgopoulos, Markos, Moritz, Rashel, Sampson, Sara K., Li, Shikai, Parmeggiani, Simone, Fine, Steve, Fowler, Tara, Petrovic, Vladan, and Du, Yuming
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
We present Movie Gen, a cast of foundation models that generates high-quality, 1080p HD videos with different aspect ratios and synchronized audio. We also show additional capabilities such as precise instruction-based video editing and generation of personalized videos based on a user's image. Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation. Our largest video generation model is a 30B parameter transformer trained with a maximum context length of 73K video tokens, corresponding to a generated video of 16 seconds at 16 frames-per-second. We show multiple technical innovations and simplifications on the architecture, latent spaces, training objectives and recipes, data curation, evaluation protocols, parallelization techniques, and inference optimizations that allow us to reap the benefits of scaling pre-training data, model size, and training compute for training large scale media generation models. We hope this paper helps the research community to accelerate progress and innovation in media generation models. All videos from this paper are available at https://go.fb.me/MovieGenResearchVideos.
- Published
- 2024
18. SemSim: Revisiting Weak-to-Strong Consistency from a Semantic Similarity Perspective for Semi-supervised Medical Image Segmentation
- Author
Xie, Shiao, Wang, Hongyi, Niu, Ziwei, Sun, Hao, Ouyang, Shuyi, Chen, Yen-Wei, and Lin, Lanfen
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Semi-supervised learning (SSL) for medical image segmentation is a challenging yet highly practical task, which reduces reliance on large-scale labeled datasets by leveraging unlabeled samples. Among SSL techniques, the weak-to-strong consistency framework, popularized by FixMatch, has emerged as a state-of-the-art method in classification tasks. Notably, such a simple pipeline has also shown competitive performance in medical image segmentation. However, two key limitations still persist, impeding its efficient adaptation: (1) the neglect of contextual dependencies results in inconsistent predictions for similar semantic features, leading to incomplete object segmentation; (2) the lack of exploitation of semantic similarity between labeled and unlabeled data induces considerable class-distribution discrepancy. To address these limitations, we propose a novel semi-supervised framework based on FixMatch, named SemSim, powered by two appealing designs from a semantic similarity perspective: (1) rectifying pixel-wise prediction by reasoning about the intra-image pair-wise affinity map, thus integrating contextual dependencies explicitly into the final prediction; (2) bridging labeled and unlabeled data via a feature querying mechanism for compact class representation learning, which fully considers cross-image anatomical similarities. As reliable semantic similarity extraction depends on robust features, we further introduce an effective spatial-aware fusion module (SFM) to explore distinctive information from multiple scales. Extensive experiments show that SemSim yields consistent improvements over the state-of-the-art methods across three public segmentation benchmarks.
- Published
- 2024
19. Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
- Author
Chuang, Yun-Yen, Hsu, Hung-Min, Lin, Kevin, Gu, Chen-Sheng, Li, Ling Zhen, Chang, Ray-I, and Lee, Hung-yi
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
The diffusion model, a new generative modeling paradigm, has achieved significant success in generating images, audio, video, and text. It has been adapted for sequence-to-sequence text generation (Seq2Seq) through DiffuSeq, termed S2S Diffusion. Existing S2S-Diffusion models predominantly rely on fixed or hand-crafted rules to schedule noise during the diffusion and denoising processes. However, these models are limited by non-contextualized noise, which fails to fully consider the characteristics of Seq2Seq tasks. In this paper, we propose the Meta-DiffuB framework - a novel scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models. We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence. Our exploiter model, an S2S-Diffusion model, leverages the noise scheduled by our scheduler model for updating and generation. Meta-DiffuB achieves state-of-the-art performance compared to previous S2S-Diffusion models and fine-tuned pre-trained language models (PLMs) across four Seq2Seq benchmark datasets. We further investigate and visualize the impact of Meta-DiffuB's noise scheduling on the generation of sentences with varying difficulties. Additionally, our scheduler model can function as a "plug-and-play" model to enhance DiffuSeq without the need for fine-tuning during the inference stage.
- Published
- 2024
20. A low complexity contextual stacked ensemble-learning approach for pedestrian intent prediction
- Author
-
Chiang, Chia-Yen, Fathy, Yasmin, Slabaugh, Gregory, and Jaber, Mona
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Walking as a form of active travel is essential in promoting sustainable transport. It is thus crucial to accurately predict pedestrian crossing intention and avoid collisions, especially with the advent of autonomous and advanced driver-assisted vehicles. Current research leverages computer vision and machine learning advances to predict near-misses; however, this often requires high computation power to yield reliable results. In contrast, this work proposes a low-complexity ensemble-learning approach that employs contextual data for predicting the pedestrian's intent for crossing. The pedestrian is first detected, their image is then compressed using skeletonization, and contextual information is added into a stacked ensemble-learning approach. Our experiments on different datasets achieve pedestrian intent prediction performance similar to that of the state-of-the-art approaches with a 99.7% reduction in computational complexity. Our source code and trained models will be released upon paper acceptance.
- Published
- 2024
21. CCSBench: Evaluating Compositional Controllability in LLMs for Scientific Document Summarization
- Author
-
Ding, Yixi, Wu, Jiaying, Zhu, Tongyao, Qin, Yanxia, Liu, Qian, and Kan, Min-Yen
- Subjects
Computer Science - Computation and Language
- Abstract
To broaden the dissemination of scientific knowledge to diverse audiences, scientific document summarization must simultaneously control multiple attributes such as length and empirical focus. However, existing research typically focuses on controlling single attributes, leaving the compositional control of multiple attributes underexplored. To address this gap, we introduce CCSBench, a benchmark for compositional controllable summarization in the scientific domain. Our benchmark enables fine-grained control over both explicit attributes (e.g., length), which are objective and straightforward, and implicit attributes (e.g., empirical focus), which are more subjective and conceptual. We conduct extensive experiments on GPT-4, LLaMA2, and other popular LLMs under various settings. Our findings reveal significant limitations in large language models' ability to balance trade-offs between control attributes, especially implicit ones that require deeper understanding and abstract reasoning.
- Published
- 2024
22. A Cross-Lingual Statutory Article Retrieval Dataset for Taiwan Legal Studies
- Author
-
Wang, Yen-Hsiang, Su, Feng-Dian, Yeh, Tzu-Yu, and Fan, Yao-Chung
- Subjects
Computer Science - Computation and Language
- Abstract
This paper introduces a cross-lingual statutory article retrieval (SAR) dataset designed to enhance legal information retrieval in multilingual settings. Our dataset features spoken-language-style legal inquiries in English, paired with corresponding Chinese versions and relevant statutes, covering all Taiwanese civil, criminal, and administrative laws. This dataset aims to improve access to legal information for non-native speakers, particularly for foreign nationals in Taiwan. We propose several LLM-based methods as baselines for evaluating retrieval effectiveness, focusing on mitigating translation errors and improving cross-lingual retrieval performance. Our work provides a valuable resource for developing inclusive legal information retrieval systems.
- Published
- 2024
23. Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval
- Author
-
Nguyen, Hai-Long, Nguyen, Tan-Minh, Nguyen, Duc-Minh, Vuong, Thi-Hai-Yen, Nguyen, Ha-Thanh, and Phan, Xuan-Hieu
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Statutory law retrieval is a typical problem in legal language processing that has various practical applications in law engineering. Modern deep learning-based retrieval methods have achieved significant results for this problem. However, retrieval systems relying on semantic and lexical correlations often exhibit limitations, particularly when handling queries that involve real-life scenarios or use vocabulary that is not specific to the legal domain. In this work, we focus on overcoming these weaknesses by utilizing the logical reasoning capabilities of large language models (LLMs) to identify relevant legal terms and facts related to the situation mentioned in the query. The proposed retrieval system integrates additional information from the term-based expansion and query reformulation to improve the retrieval accuracy. The experiments on COLIEE 2022 and COLIEE 2023 datasets show that extra knowledge from LLMs helps to improve the retrieval result of both lexical and semantic ranking models. The final ensemble retrieval system outperformed the highest results among all participating teams in the COLIEE 2022 and 2023 competitions. Comment: Presented at NeLaMKRR@KR, 2024 (arXiv:2410.05339)
- Published
- 2024
24. DISP-LLM: Dimension-Independent Structural Pruning for Large Language Models
- Author
-
Gao, Shangqian, Lin, Chi-Heng, Hua, Ting, Zheng, Tang, Shen, Yilin, Jin, Hongxia, and Hsu, Yen-Chang
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the increased memory and computational costs associated with these models pose significant challenges for deployment on resource-limited devices. Structural pruning has emerged as a promising solution to reduce the costs of LLMs without requiring post-processing steps. Prior structural pruning methods either follow the dependence of structures at the cost of limiting flexibility, or introduce non-trivial additional parameters by incorporating different projection matrices. In this work, we propose a novel approach that relaxes the constraint imposed by regular structural pruning methods and eliminates the structural dependence along the embedding dimension. Our dimension-independent structural pruning method offers several benefits. Firstly, our method enables different blocks to utilize different subsets of the feature maps. Secondly, by removing structural dependence, we allow each block to have varying widths along its input and output dimensions, thereby significantly enhancing the flexibility of structural pruning. We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve an accuracy similar to semi-structural pruning. Comment: Accepted by NeurIPS 2024
- Published
- 2024
25. Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies
- Author
-
Chen, Zixuan, He, Xialin, Wang, Yen-Jen, Liao, Qiayuan, Ze, Yanjie, Li, Zhongyu, Sastry, S. Shankar, Wu, Jiajun, Sreenath, Koushil, Gupta, Saurabh, and Peng, Xue Bin
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence
- Abstract
Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, because these techniques are non-differentiable and usually require tedious tuning of a large set of hyperparameters, they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated with automatic differentiation frameworks. We demonstrate that LCP effectively replaces the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP in both simulation and real-world humanoid robots, producing smooth and robust locomotion controllers. All simulation and deployment code, along with complete checkpoints, is available on our project page: https://lipschitz-constrained-policy.github.io. Comment: 8 pages
- Published
- 2024
26. ALMA Observations of Proper Motions of the Dust Clumps in the Protoplanetary Disk MWC 758
- Author
-
Kuo, I-Hsuan Genevieve, Yen, Hsi-Wei, and Gu, Pin-Gao
- Subjects
Astrophysics - Earth and Planetary Astrophysics
- Abstract
To study the dust dynamics in the dust trapping vortices in the protoplanetary disk around MWC 758, we analyzed the 1.3 mm continuum images of the MWC 758 disk obtained with the Atacama Large Millimeter/submillimeter Array (ALMA) in 2017 and 2021. We detect proper motions of 22 mas and 24 mas in the two dust clumps at radii of 0.32″ and 0.54″ in the disk on the plane of the sky, respectively. On the assumption that the dust clumps are located in the disk midplane, the velocities of the observed proper motions along the azimuthal direction of the inner and outer dust clumps are sub- and super-Keplerian, respectively, and both have angular velocities corresponding to the Keplerian angular velocity at a radius of 0.46″ ± 0.04″. This deviation from the Keplerian motion is not expected in the conventional theory of vortices formed by the Rossby wave instability. The observed non-Keplerian proper motions of the dust clumps are unlikely to be due to the disk warp or eccentricity, or to be associated with any predicted planets. The two dust clumps are likely spatially coincident with the infrared spirals. In addition, we detect changes in the intensity profiles of the dust clumps over the four-year span. Therefore, we suggest that the observed proper motions are possibly due to changes in the density distributions in the dust clumps caused by their interaction with the spirals in the disk.
- Published
- 2024
27. PAVLM: Advancing Point Cloud based Affordance Understanding Via Vision-Language Model
- Author
-
Liu, Shang-Ching, Tran, Van Nhiem, Chen, Wenkai, Cheng, Wei-Lun, Huang, Yen-Lin, Liao, I-Bin, Li, Yung-Hui, and Zhang, Jianwei
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Affordance understanding, the task of identifying actionable regions on 3D objects, plays a vital role in allowing robotic systems to engage with and operate within the physical world. Although Visual Language Models (VLMs) have excelled in high-level reasoning and long-horizon planning for robotic manipulation, they still fall short in grasping the nuanced physical properties required for effective human-robot interaction. In this paper, we introduce PAVLM (Point cloud Affordance Vision-Language Model), an innovative framework that utilizes the extensive multimodal knowledge embedded in pre-trained language models to enhance 3D affordance understanding of point cloud. PAVLM integrates a geometric-guided propagation module with hidden embeddings from large language models (LLMs) to enrich visual semantics. On the language side, we prompt Llama-3.1 models to generate refined context-aware text, augmenting the instructional input with deeper semantic cues. Experimental results on the 3D-AffordanceNet benchmark demonstrate that PAVLM outperforms baseline methods for both full and partial point clouds, particularly excelling in its generalization to novel open-world affordance tasks of 3D objects. For more information, visit our project site: pavlm-source.github.io.
- Published
- 2024
28. Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits
- Author
-
Kang, Xuhui and Kuo, Yen-Ling
- Subjects
Computer Science - Robotics
- Abstract
Understanding the progress of a task allows humans to not only track what has been done but also to better plan for future goals. We demonstrate TaKSIE, a novel framework that incorporates task progress knowledge into visual subgoal generation for robotic manipulation tasks. We jointly train a recurrent network with a latent diffusion model to generate the next visual subgoal based on the robot's current observation and the input language command. At execution time, the robot leverages a visual progress representation to monitor the task progress and adaptively samples the next visual subgoal from the model to guide the manipulation policy. We train and validate our model in simulated and real-world robotic tasks, achieving state-of-the-art performance on the CALVIN manipulation benchmark. We find that the inclusion of task progress knowledge can improve the robustness of the trained policy for different initial robot poses or various movement speeds during demonstrations. The project website can be found at https://live-robotics-uva.github.io/TaKSIE/. Comment: 11 pages, 9 figures
- Published
- 2024
29. A Permutation Group Isomorphic to the $n$-Qubit Projective Clifford Group
- Author
-
Lee, Chin-Yen
- Subjects
Mathematics - Group Theory
- Abstract
In this paper, we construct a permutation group of degree $2(4^n-1)$, which is isomorphic to the $n$-qubit projective Clifford group. To establish this result, we study the centralizers of the $z$ gate and the phase gate within the $n$-qubit projective Clifford group by employing the normal form of the Clifford operators.
- Published
- 2024
30. Adaptive Reasoning and Acting in Medical Language Agents
- Author
-
Dutta, Abhishek and Hsiao, Yen-Che
- Subjects
Computer Science - Artificial Intelligence
- Abstract
This paper presents an innovative large language model (LLM) agent framework for enhancing diagnostic accuracy in simulated clinical environments using the AgentClinic benchmark. The proposed automatic correction enables doctor agents to iteratively refine their reasoning and actions following incorrect diagnoses, fostering improved decision-making over time. Experiments show that the adaptive LLM-based doctor agents achieve correct diagnoses through dynamic interactions with simulated patients. The evaluations highlight the capacity of autonomous agents to adapt and improve in complex medical scenarios. Future enhancements will focus on refining the algorithm and expanding its applicability across a wider range of tasks and different large language models.
- Published
- 2024
31. COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
- Author
-
Xie, Yuxi, Goyal, Anirudh, Wu, Xiaobao, Yin, Xunjian, Xu, Xiao, Kan, Min-Yen, Pan, Liangming, and Wang, William Yang
- Subjects
Computer Science - Computation and Language
- Abstract
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling. The sequential token generation in AR models can lead to high inference latency. To overcome these challenges, we propose Context-Wise Order-Agnostic Language Modeling (COrAL), which incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during the generation process. Leveraging the order-agnostic nature of COrAL, we introduce sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows. This allows the model to iteratively refine its outputs in parallel in the sliding block, effectively capturing diverse dependencies without the high inference cost of sequential generation. Empirical evaluations on reasoning tasks demonstrate that COrAL improves both performance and inference speed, achieving absolute accuracy gains of 4.6% on GSM8K and 4.0% on LogiQA, along with inference speedups of up to 3.9× over next-token baselines. Preliminary results on code generation indicate a drop in pass rates due to inconsistencies in order-agnostic outputs, highlighting the inherent quality-speed trade-off. Our code is publicly available at https://github.com/YuxiXie/COrAL. Comment: 12 pages, 7 figures, 3 tables (23 pages, 9 figures, 4 tables including references and appendices)
- Published
- 2024
32. Quantum-Trained Convolutional Neural Network for Deepfake Audio Detection
- Author
-
Lin, Chu-Hsuan Abraham, Liu, Chen-Yu, Chen, Samuel Yen-Chi, and Chen, Kuan-Cheng
- Subjects
Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing, Quantum Physics
- Abstract
The rise of deepfake technologies has posed significant challenges to privacy, security, and information integrity, particularly in audio and multimedia content. This paper introduces a Quantum-Trained Convolutional Neural Network (QT-CNN) framework designed to enhance the detection of deepfake audio, leveraging the computational power of quantum machine learning (QML). The QT-CNN employs a hybrid quantum-classical approach, integrating Quantum Neural Networks (QNNs) with classical neural architectures to optimize training efficiency while reducing the number of trainable parameters. Our method incorporates a novel quantum-to-classical parameter mapping that effectively utilizes quantum states to enhance the expressive power of the model, achieving up to 70% parameter reduction compared to classical models without compromising accuracy. Data pre-processing involved extracting essential audio features, label encoding, feature scaling, and constructing sequential datasets for robust model evaluation. Experimental results demonstrate that the QT-CNN achieves comparable performance to traditional CNNs, maintaining high accuracy during training and testing phases across varying configurations of QNN blocks. The QT framework's ability to reduce computational overhead while maintaining performance underscores its potential for real-world applications in deepfake detection and other resource-constrained scenarios. This work highlights the practical benefits of integrating quantum computing into artificial intelligence, offering a scalable and efficient approach to advancing deepfake detection technologies.
- Published
- 2024
33. Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering
- Author
-
Chen, I-Chun, Liu, Hsu-Shen, Sun, Wei-Fang, Chao, Chen-Hao, Hsu, Yen-Chang, and Lee, Chun-Yi
- Subjects
Computer Science - Machine Learning
- Abstract
Sparse Mixture-of-Experts (SMoE) models represent a significant breakthrough in large language model development. These models enable performance improvements without a proportional increase in inference costs. By selectively activating a small set of parameters during task execution, SMoEs enhance model capacity. However, their deployment remains challenging due to the substantial memory footprint required to accommodate the growing number of experts. This constraint renders them less feasible in environments with limited hardware resources. To address this challenge, we propose Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework that reduces SMoE model parameters without retraining. Unlike previous methods, HC-SMoE employs hierarchical clustering based on expert outputs. This approach ensures that the merging process remains unaffected by routing decisions. The output-based clustering strategy captures functional similarities between experts, offering an adaptable solution for models with numerous experts. We validate our approach through extensive experiments on eight zero-shot language tasks and demonstrate its effectiveness in large-scale SMoE models such as Qwen and Mixtral. Our comprehensive results demonstrate that HC-SMoE consistently achieves strong performance, which highlights its potential for real-world deployment. Comment: Code: https://github.com/wazenmai/HC-SMoE
- Published
- 2024
34. TANet: Triplet Attention Network for All-In-One Adverse Weather Image Restoration
- Author
-
Wang, Hsing-Hua, Tsai, Fu-Jen, Lin, Yen-Yu, and Lin, Chia-Wen
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Adverse weather image restoration aims to remove unwanted degraded artifacts, such as haze, rain, and snow, caused by adverse weather conditions. Existing methods achieve remarkable results for addressing single-weather conditions. However, they face challenges when encountering unpredictable weather conditions, which often happen in real-world scenarios. Although different weather conditions exhibit different degradation patterns, they share common characteristics that are highly related and complementary, such as occlusions caused by degradation patterns, color distortion, and contrast attenuation due to the scattering of atmospheric particles. Therefore, we focus on leveraging common knowledge across multiple weather conditions to restore images in a unified manner. In this paper, we propose a Triplet Attention Network (TANet) to efficiently and effectively address all-in-one adverse weather image restoration. TANet consists of Triplet Attention Blocks (TABs) that incorporate three types of attention mechanisms: Local Pixel-wise Attention (LPA) and Global Strip-wise Attention (GSA) to address occlusions caused by non-uniform degradation patterns, and Global Distribution Attention (GDA) to address color distortion and contrast attenuation caused by atmospheric phenomena. By leveraging common knowledge shared across different weather conditions, TANet successfully addresses multiple weather conditions in a unified manner. Experimental results show that TANet efficiently and effectively achieves state-of-the-art performance in all-in-one adverse weather image restoration. The source code is available at https://github.com/xhuachris/TANet-ACCV-2024. Comment: 17 pages (ACCV 2024)
- Published
- 2024
35. Quantum-Inspired Portfolio Optimization In The QUBO Framework
- Author
-
Lu, Ying-Chang, Chang, Yen-Jui, Yu, Lien-Po, and Fu, Chao-Ming
- Subjects
Quantitative Finance - Portfolio Management, Quantum Physics
- Abstract
A quantum-inspired optimization approach is proposed to study portfolio optimization aimed at maximizing the returns of an investment portfolio while minimizing its risk by diversifying investment across different asset classes. By integrating conventional approaches with quantum-inspired methods and simulation techniques for penalty coefficient estimation, this approach enables faster solutions to portfolio optimization. The proposed two-stage search strategy further enhances the method by starting with a broad search to quickly identify potential solutions and then refining these results to increase accuracy. The effectiveness of our approach is validated through experiments using a real-world dataset of quarterly financial data spanning ten years. Moreover, the effectiveness of various portfolio strategies under volatile market conditions is also investigated with emphasis on the robustness and predictive capacity of our methodology. This research contributes to the growing body of literature on quantum-inspired techniques in finance, demonstrating their potential as a powerful tool for asset allocation and portfolio management.
- Published
- 2024
36. Efficient transformer with reinforced position embedding for language models
- Author
-
Hsiao, Yen-Che and Dutta, Abhishek
- Subjects
Computer Science - Computation and Language
- Abstract
In this paper, we propose an efficient transformer architecture that uses reinforced positional embedding to obtain superior performance with half the number of encoder-decoder layers. We demonstrate that concatenating positional encoding with trainable token embeddings, normalizing columns in the token embedding matrix, and using the normalized token embedding matrix as the value of the attention layer improve the training and validation loss and the training time in an encoder-decoder Transformer model for a Portuguese-English translation task with 10 epochs or 12 hours of training across 10 trials. Our method, with roughly a threefold parameter reduction compared to the baseline model, yields a mean training loss of 1.21, a mean validation loss of 1.51, and an average training time of 1352.27 seconds per epoch, surpassing the baseline model with the same embedding dimension that employs addition of positional encoding and token embeddings, which achieves a mean training loss of 1.96, a validation loss of 2.18, and an average training time of 4297.79 seconds per epoch. Additionally, we evaluated our proposed architecture and the baseline across 14 diverse translation datasets from TensorFlow. The results indicate that our method consistently achieves lower or comparable training and validation losses, suggesting enhanced learning efficiency.
- Published
- 2024
37. Demo of Zero-Shot Guitar Amplifier Modelling: Enhancing Modeling with Hyper Neural Networks
- Author
-
Chen, Yu-Hua, Cheng, Yuan-Chiao, Yeh, Yen-Tung, Wu, Jui-Te, Ho, Yu-Hsiang, Jang, Jyh-Shing Roger, and Yang, Yi-Hsuan
- Subjects
Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Electric guitar tone modeling typically focuses on the non-linear transformation from clean to amplifier-rendered audio. Traditional methods rely on one-to-one mappings, incorporating device parameters into neural models to replicate specific amplifiers. However, these methods are limited by the need for specific training data. In this paper, we adapt a model based on the previous work, which leverages a tone embedding encoder and a feature-wise linear modulation (FiLM) condition method. In this work, we alter the conditioning method, using a hypernetwork-based gated convolutional network (GCN) to generate audio that blends clean input with the tone characteristics of reference audio. By extending the training data to cover a wider variety of amplifier tones, our model is able to capture a broader range of tones. Additionally, we developed a real-time plugin to demonstrate the system's practical application, allowing users to experience its performance interactively. Our results indicate that the proposed system achieves superior tone modeling versatility compared to traditional methods. Comment: demo of the ISMIR paper
- Published
- 2024
38. MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
- Author
-
Li, Guanzhen, Xie, Yuxi, and Kan, Min-Yen
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
- Abstract
Humans perform visual perception at multiple levels, including low-level object recognition and high-level semantic interpretation such as behavior understanding. Subtle differences in low-level details can lead to substantial changes in high-level perception. For example, substituting the shopping bag held by a person with a gun suggests violent behavior, implying criminal or violent activity. Despite significant advancements in various multimodal tasks, Large Visual-Language Models (LVLMs) remain unexplored in their capabilities to conduct such multi-level visual perceptions. To investigate the perception gap between LVLMs and humans, we introduce MVP-Bench, the first visual-language benchmark systematically evaluating both low- and high-level visual perception of LVLMs. We construct MVP-Bench across natural and synthetic images to investigate how manipulated content influences model perception. Using MVP-Bench, we diagnose the visual perception of 10 open-source and 2 closed-source LVLMs, showing that high-level perception tasks significantly challenge existing LVLMs. The state-of-the-art GPT-4o only achieves an accuracy of 56% on Yes/No questions, compared with 74% in low-level scenarios. Furthermore, the performance gap between natural and manipulated images indicates that current LVLMs do not generalize in understanding the visual semantics of synthetic images as humans do. Our data and code are publicly available at https://github.com/GuanzhenLi/MVP-Bench.
- Published
- 2024
39. Attainable Force Approximation and Full-Pose Tracking Control of an Over-Actuated Thrust-Vectoring Modular Team UAV
- Author
-
Chu, Yen-Cheng, Fang, Kai-Cheng, and Lian, Feng-Li
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
Traditional vertical take-off and landing (VTOL) aircraft cannot achieve optimal efficiency for various payload weights and have limited mobility due to their under-actuation. With the thrust-vectoring mechanism, the proposed modular team UAV is fully actuated at certain attitudes. However, the attainable force space (AFS) differs according to the team configuration, which makes the controller design difficult. We propose an approximation to the AFS and a full-pose tracking controller with an attitude planner and a force projection, which guarantees that the control force is feasible. The proposed approach can be applied to UAVs having multiple thrust-vectoring effectors with homogeneous agents. The simulation and experiment demonstrate a tilting motion during hovering for a 4-agent team.
- Published
- 2024
40. HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
- Author
-
Yen, Howard, Gao, Tianyu, Hou, Minmin, Ding, Ke, Fleischer, Daniel, Izsak, Peter, Wasserblat, Moshe, and Chen, Danqi
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
There have been many benchmarks for evaluating long-context language models (LCLMs), but developers often rely on synthetic tasks like needle-in-a-haystack (NIAH) or arbitrary subsets of tasks. It remains unclear whether they translate to the diverse downstream applications of LCLMs, and the inconsistency further complicates model comparison. We investigate the underlying reasons behind current practices and find that existing benchmarks often provide noisy signals due to low coverage of applications, insufficient lengths, unreliable metrics, and incompatibility with base models. In this work, we present HELMET (How to Evaluate Long-context Models Effectively and Thoroughly), a comprehensive benchmark encompassing seven diverse, application-centric categories. We also address many issues in previous benchmarks by adding controllable lengths up to 128k tokens, model-based evaluation for reliable metrics, and few-shot prompting for robustly evaluating base models. Consequently, we demonstrate that HELMET offers more reliable and consistent rankings of frontier LCLMs. Through a comprehensive study of 51 LCLMs, we find that (1) synthetic tasks like NIAH are not good predictors of downstream performance; (2) the diverse categories in HELMET exhibit distinct trends and low correlation with each other; and (3) while most LCLMs achieve perfect NIAH scores, open-source models significantly lag behind closed ones when the task requires full-context reasoning or following complex instructions, and the gap widens with increased lengths. Finally, we recommend using our RAG tasks for fast model development, as they are easy to run and more predictive of other downstream performance; ultimately, we advocate for a holistic evaluation across diverse tasks. Comment: Code and data are available here: https://github.com/princeton-nlp/HELMET
- Published
- 2024
41. How to Train Long-Context Language Models (Effectively)
- Author
-
Gao, Tianyu, Wettig, Alexander, Yen, Howard, and Chen, Danqi
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information. We first establish a reliable evaluation protocol to guide model development -- instead of perplexity or simple needle-in-a-haystack (NIAH) tests, we use a broad set of long-context tasks, and we evaluate models after SFT with instruction data as this better reveals long-context abilities. Supported by our robust evaluations, we run thorough experiments to decide the data mix for continued pre-training, the instruction tuning dataset, and many other design choices. We find that (1) code repositories and books are excellent sources of long data, but it is crucial to combine them with high-quality short data; (2) training with a sequence length beyond the evaluation length boosts long-context performance; (3) for SFT, using only short instruction datasets yields strong performance on long-context tasks. Our final model, ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K. ProLong outperforms Llama-3.1-8B-Instruct on the majority of long-context tasks despite having seen only 5% as many tokens during long-context training. Additionally, ProLong can effectively process up to 512K tokens, one of the longest context windows of publicly available LMs., Comment: Our code, data, and models are available at https://github.com/princeton-nlp/ProLong
- Published
- 2024
42. Efficient Long-Form Speech Recognition for General Speech In-Context Learning
- Author
-
Yen, Hao, Ling, Shaoshi, and Ye, Guoli
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
We propose a novel approach to end-to-end automatic speech recognition (ASR) to achieve efficient speech in-context learning (SICL) for (i) long-form speech decoding, (ii) test-time speaker adaptation, and (iii) test-time contextual biasing. Specifically, we introduce an attention-based encoder-decoder (AED) model with SICL capability (referred to as SICL-AED), where the decoder utilizes an utterance-level cross-attention to integrate information from the encoder's output efficiently, and a document-level self-attention to learn contextual information. Evaluated on the benchmark TEDLIUM3 dataset, SICL-AED achieves an 8.64% relative word error rate (WER) reduction compared to a baseline utterance-level AED model by leveraging previously decoded outputs as in-context examples. It also demonstrates comparable performance to conventional long-form AED systems with significantly reduced runtime and memory complexity. Additionally, we introduce an in-context fine-tuning (ICFT) technique that further enhances SICL effectiveness during inference. Experiments on speaker adaptation and contextual biasing highlight the general speech in-context learning capabilities of our system, achieving effective results with provided contexts. Without specific fine-tuning, SICL-AED matches the performance of supervised AED baselines for speaker adaptation and improves entity recall by 64% for contextual biasing task., Comment: 5 pages, Submitted to ICASSP 2025
- Published
- 2024
43. Improving Visual Object Tracking through Visual Prompting
- Author
-
Chen, Shih-Fang, Chen, Jun-Cheng, Jhuo, I-Hong, and Lin, Yen-Yu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Multimedia ,Electrical Engineering and Systems Science - Image and Video Processing ,68 ,I.4 ,I.2 ,I.5 ,I.4.1 ,I.4.8 ,I.4.9 ,I.4.10 - Abstract
Learning a discriminative model to distinguish a target from its surrounding distractors is essential to generic visual object tracking. Dynamic target representation adaptation against distractors is challenging due to the limited discriminative capabilities of prevailing trackers. We present a new visual Prompting mechanism for generic Visual Object Tracking (PiVOT) to address this issue. PiVOT proposes a prompt generation network with the pre-trained foundation model CLIP to automatically generate and refine visual prompts, enabling the transfer of foundation model knowledge for tracking. While CLIP offers broad category-level knowledge, the tracker, trained on instance-specific data, excels at recognizing unique object instances. Thus, PiVOT first compiles a visual prompt highlighting potential target locations. To transfer the knowledge of CLIP to the tracker, PiVOT leverages CLIP to refine the visual prompt based on the similarities between candidate objects and the reference templates across potential targets. Once the visual prompt is refined, it can better highlight potential target locations, thereby reducing irrelevant prompt information. With the proposed prompting mechanism, the tracker can generate improved instance-aware feature maps through the guidance of the visual prompt, thus effectively reducing distractors. The proposed method does not involve CLIP during training, thereby keeping the same training complexity and preserving the generalization capability of the pretrained foundation model. Extensive experiments across multiple benchmarks indicate that PiVOT, using the proposed prompting method, can suppress distracting objects and enhance the tracker., Comment: Accepted and to appear in IEEE Transactions on Multimedia
- Published
- 2024
44. A minimizing movement approach for crystalline eikonal-curvature flows of spirals
- Author
-
Ohtsuka, Takeshi and Tsai, Yen-Hsi Richard
- Subjects
Mathematics - Numerical Analysis ,35K65, 53E10, 65M06, 65K10, 53A04 - Abstract
We propose an algorithm for evolving spiral curves on a planar domain by normal velocities depending on the so-called crystalline curvatures. The algorithm uses a minimizing movement approach and relies on a special level set method for embedding the spirals. We present numerical simulations and comparisons demonstrating the efficacy of the proposed numerical algorithm., Comment: 45 pages, 17 figures
- Published
- 2024
45. No Evidence of a Dichotomy in the Elliptical Galaxy Population
- Author
-
Monteiro-Oliveira, Rogério, Lin, Yen-Ting, Chen, Wei-Huai, Chuang, Chen-Yu, Abdurro'uf, and Wu, Po-Feng
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
The advent of large integral field spectroscopic surveys has found that elliptical galaxies (EGs) can be classified into two classes: the fast rotators (whose kinematics are dominated by rotation) and the slow rotators (which exhibit slow or no rotation pattern). It is often suggested that while the slow rotators typically have boxy isophotal shapes, have a high $\alpha$-to-iron abundance ratio, and are quite massive, the fast rotators often exhibit the opposite properties (that is, having disky isophotes, lower $\alpha$-to-iron ratio, and of typical masses). Whether the EGs consist of two distinct populations (i.e., a dichotomy exists), remains an unsolved issue. To examine the existence of the dichotomy, we used a sample of 1,895 EGs from the SDSS-IV MaNGA survey, and measured robustly the stellar kinematics, isophotal shapes, and [Mg/Fe] ratio. We confirmed the previous finding that the bulk of the EGs are disky (65%) and fast rotators (67%), but found no evidence supporting a dichotomy, based on a principal component analysis. The different classes (boxy/disky and slow/fast rotators) of EGs occupy slightly different loci in the principal component space. This may explain the observed trends that led to the premature support of a dichotomy based on small samples of galaxies., Comment: 27 pages, 19 figures, 5 tables. Submitted to ApJ. Comments are welcome!
- Published
- 2024
46. An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement
- Author
-
Ku, Pin-Jui, Ho, Chun-Wei, Yen, Hao, Siniscalchi, Sabato Marco, and Lee, Chin-Hui
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
In this work, we propose a novel consistency-preserving loss function for recovering the phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Different from conventional techniques that directly estimate the phase using a deep model, our idea is to exploit ad-hoc constraints to directly generate a consistent pair of magnitude and phase. Specifically, the proposed loss forces a set of complex numbers to be a consistent short-time Fourier transform (STFT) representation, i.e., to be the spectrogram of a real signal. Our approach thus avoids the difficulty of estimating the original phase, which is highly unstructured and sensitive to time shift. The influence of our proposed loss is first assessed on a PR task, experimentally demonstrating that our approach is viable. Next, we show its effectiveness on an SE task, using both the VB-DMD and WSJ0-CHiME3 data sets. On VB-DMD, our approach is competitive with conventional solutions. On the challenging WSJ0-CHiME3 set, the proposed framework compares favourably over those techniques that explicitly estimate the phase., Comment: 5 pages, Submitted to ICASSP 2025
- Published
- 2024
47. End-User-Centric Collaborative MIMO: Performance Analysis and Proof of Concept
- Author
-
Wen, Chao-Kai, Chan, Yen-Cheng, Huang, Tzu-Hao, Zeng, Hao-Jun, Wang, Fu-Kang, Tsai, Lung-Sheng, and Liao, Pei-Kai
- Subjects
Computer Science - Information Theory ,Electrical Engineering and Systems Science - Signal Processing - Abstract
The trend toward using increasingly large arrays of antenna elements continues. However, fitting more antennas into the limited space available on user equipment (UE) within the currently popular Frequency Range 1 spectrum presents a significant challenge. This limitation constrains the capacity scaling gains for end users, even when networks can support a higher number of antennas. To address this issue, we explore a user-centric collaborative MIMO approach, termed UE-CoMIMO, which leverages several fixed or portable devices within a personal area to form a virtually expanded antenna array. This paper develops a comprehensive mathematical framework to analyze the performance of UE-CoMIMO. Our analytical results demonstrate that UE-CoMIMO can significantly enhance the system's effective channel response within the current communication system without requiring extensive modifications. Further performance improvements can be realized by optimizing the phase shifters on the expanded antenna arrays at the collaborative devices. These findings are corroborated by ray-tracing simulations. Beyond the simulations, we implemented these collaborative devices and successfully conducted over-the-air validation in a real 5G environment, showcasing the practical potential of UE-CoMIMO. Several practical perspectives are discussed, highlighting the feasibility and benefits of this approach in real-world scenarios., Comment: 13 pages, 11 figures, this work has been submitted to IEEE for possible publication
- Published
- 2024
48. M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images
- Author
-
Wang, Hongyi, Du, Xiuju, Liu, Jing, Ouyang, Shuyi, Chen, Yen-Wei, and Lin, Lanfen
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
The advancement of Spatial Transcriptomics (ST) has facilitated the spatially-aware profiling of gene expressions based on histopathology images. Although ST data offers valuable insights into the micro-environment of tumors, its acquisition cost remains expensive. Therefore, directly predicting the ST expressions from digital pathology images is desired. Current methods usually adopt existing regression backbones along with patch-sampling for this task, which ignores the inherent multi-scale information embedded in the pyramidal data structure of digital pathology images, and wastes the inter-spot visual information crucial for accurate gene expression prediction. To address these limitations, we propose M2OST, a many-to-one regression Transformer that can accommodate the hierarchical structure of the pathology images via a decoupled multi-scale feature extractor. Unlike traditional models that are trained with one-to-one image-label pairs, M2OST uses multiple images from different levels of the digital pathology image to jointly predict the gene expressions in their common corresponding spot. Built upon our many-to-one scheme, M2OST can be easily scaled to fit different numbers of inputs, and its network structure inherently incorporates nearby inter-spot features, enhancing regression performance. We have tested M2OST on three public ST datasets and the experimental results show that M2OST can achieve state-of-the-art performance with fewer parameters and floating-point operations (FLOPs). The code will be released upon acceptance.
- Published
- 2024
49. Hyperdisordered cell packing on a growing surface
- Author
-
Ross, Robert J. H., Masucci, Giovanni D., Lin, Chun Yen, Iglesias, Teresa L., Reiter, Sam, and Pigolotti, Simone
- Subjects
Condensed Matter - Soft Condensed Matter ,Condensed Matter - Statistical Mechanics ,Physics - Biological Physics ,Quantitative Biology - Tissues and Organs - Abstract
While the physics of disordered packing in non-growing systems is well understood, unexplored phenomena can emerge when packing takes place in growing domains. We study the arrangements of pigment cells (chromatophores) on squid skin as a biological example of a packed system on an expanding surface. We find that relative density fluctuations in cell numbers grow with spatial scale. We term this behavior "hyperdisordered", in contrast with hyperuniform behavior in which relative fluctuations tend to zero at large scale. We find that hyperdisordered scaling, akin to that of a critical system, is quantitatively reproduced by a model in which hard disks are randomly inserted in a homogeneously growing surface. In addition, we find that chromatophores increase in size during animal development, but maintain a stationary size distribution. The physical mechanisms described in our work may apply to a broad class of growing dense systems.
- Published
- 2024
50. Qualitative Properties of $k$-Center Problems
- Author
-
Long, Vo Si Trong, Nam, Nguyen Mau, Sharkansky, Jacob, and Yen, Nguyen Dong
- Subjects
Mathematics - Optimization and Control - Abstract
In this paper, we study generalized versions of the k-center problem, which involves finding k circles of the smallest possible equal radius that cover a finite set of points in the plane. By utilizing the Minkowski gauge function, we extend this problem to generalized balls induced by various convex sets in finite dimensions, rather than limiting it to circles in the plane. First, we establish several fundamental properties of the global optimal solutions to this problem. We then introduce the notion of local optimal solutions and provide a sufficient condition for their existence. We also provide several illustrative examples to clarify the proposed problems.
- Published
- 2024