Author: "Mousavi, A." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Mousavi, A."' showing total 67,322 results

1. A Comparative Analysis of Instruction Fine-Tuning LLMs for Financial Text Classification

Author: Fatemi, Sorouralsadat, Hu, Yuheng, and Mousavi, Maryam
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across diverse Natural Language Processing (NLP) tasks, including language understanding, reasoning, and generation. However, general-domain LLMs often struggle with financial tasks due to the technical and specialized nature of financial texts. This study investigates the efficacy of instruction fine-tuning smaller-scale LLMs, including Mistral-7B, Llama3-8B, and Phi3-mini, to enhance their performance in financial text classification tasks. We fine-tuned both instruction-tuned and base models across four financial classification tasks, achieving significant improvements in task-specific performance. Furthermore, we evaluated the zero-shot capabilities of these fine-tuned models on three unseen complex financial tasks, including argument classification, deal completeness classification, and causal classification. Our results indicate that while base model fine-tuning led to greater degradation, instruction-tuned models maintained more robust performance. To address this degradation, we employed model merging techniques, integrating single-task domain-specific fine-tuned models with the base model. Using this merging method resulted in significant enhancements in zero-shot performance, even exceeding the original model's accuracy on certain datasets. Our findings underscore the effectiveness of instruction fine-tuning and model merging for adapting LLMs to specialized financial text classification tasks.
Published: 2024

2. Safety Verification for Evasive Collision Avoidance in Autonomous Vehicles with Enhanced Resolutions

Author: Arab, Aliasghar, Khaleghi, Milad, Partovi, Alireza, Abbaspour, Alireza, Shinde, Chaitanya, Mousavi, Yashar, Azimi, Vahid, and Karimmoddini, Ali
Subjects: Computer Science - Robotics
Abstract: This paper presents a comprehensive hazard analysis, risk assessment, and loss evaluation for an Evasive Minimum Risk Maneuvering (EMRM) system designed for autonomous vehicles. The EMRM system is engineered to enhance collision avoidance and mitigate loss severity by drawing inspiration from professional drivers who perform aggressive maneuvers while maintaining stability for effective risk mitigation. Recent advancements in autonomous vehicle technology demonstrate a growing capability for high-performance maneuvers. This paper discusses a comprehensive safety verification process and establishes a clear safety goal to enhance testing validation. The study systematically identifies potential hazards and assesses their risks to overall safety and the protection of vulnerable road users. A novel loss evaluation approach is introduced, focusing on the impact of mitigation maneuvers on loss severity. Additionally, the proposed mitigation integrity level can be used to verify the minimum-risk maneuver feature. This paper applies a verification method to evasive maneuvering, contributing to the development of more reliable active safety features in autonomous driving systems.
Published: 2024

3. A Density Theorem for Higher Order Sums of Prime Numbers

Author: Lacey, Michael T., Mousavi, Hamed, Rahimi, Yaghoub, and Vempati, Manasa N.
Subjects: Mathematics - Number Theory
Abstract: Let $P$ be a subset of the primes of lower density strictly larger than $\frac12$. Then, every sufficiently large even integer is a sum of four primes from the set $P$. We establish similar results for $k$-summands, with $k\geq 4$, and for $k \geq 4$ distinct subsets of primes. This extends the work of H. Li, H. Pan, as well as X. Shao on sums of three primes, and A. Alsteri and X. Shao on sums of two primes. The primary new contributions come from elementary combinatorial lemmas.
Comment: 17 pages
Published: 2024

4. Enhancing Autonomous Driving Safety Analysis with Generative AI: A Comparative Study on Automated Hazard and Risk Assessment

Author: Abbaspour, Alireza, Arab, Aliasghar, and Mousavi, Yashar
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: The advent of autonomous driving technology has accentuated the need for comprehensive hazard analysis and risk assessment (HARA) to ensure the safety and reliability of vehicular systems. Traditional HARA processes, while meticulous, are inherently time-consuming and subject to human error, necessitating a transformative approach to fortify safety engineering. This paper presents an integrative application of generative artificial intelligence (AI) as a means to enhance HARA in autonomous driving safety analysis. Generative AI, renowned for its predictive modeling and data generation capabilities, is leveraged to automate the labor-intensive elements of HARA, thus expediting the process and augmenting the thoroughness of the safety analyses. Through empirical research, the study contrasts conventional HARA practices conducted by safety experts with those supplemented by generative AI tools. The benchmark comparisons focus on critical metrics such as analysis time, error rates, and scope of risk identification. By employing generative AI, the research demonstrates a significant upturn in efficiency, evidenced by reduced timeframes and expanded analytical coverage. The AI-augmented processes also deliver enhanced brainstorming support, stimulating creative problem-solving and identifying previously unrecognized risk factors.
Published: 2024

5. Choice confidence bridges credit assignment to levels of decision hierarchy

Author: Harris, Amir. M Mousavi, Esmaily, Jamal, Zabbah, Sajjad, Ebrahimpour, Reza, and Bahrami, Bahador
Subjects: Quantitative Biology - Neurons and Cognition
Abstract: Everyday decisions often involve many different levels. What connects the higher and lower levels of this decision hierarchy to one another determines how the cause(s) of failures are interpreted. It is hypothesized that decision confidence guides the assignment of blame to the correct level of the hierarchy, but this hypothesis has only been tested by manipulating the sensory evidence itself. We examined the consequences of modulating subjective confidence in hierarchical decision making via extra-sensory, social influence. Participants who made hierarchical, motion-plus-bandit decisions also received social information from a partner who advised the participant in the motion task. The strength of social advice, independently of sensory signals, modulated the likelihood of strategy change after negative feedback. Our findings therefore provide strong empirical evidence that subjective confidence per se acts as the bridge in the assignment of credit and blame to various levels of the decision hierarchy.
Published: 2024

6. Robust Feature Learning for Multi-Index Models in High Dimensions

Author: Mousavi-Hosseini, Alireza, Javanmard, Adel, and Erdogdu, Murat A.
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Recently, there have been numerous studies on feature learning with neural networks, specifically on learning single- and multi-index models where the target is a function of a low-dimensional projection of the input. Prior works have shown that in high dimensions, the majority of the compute and data resources are spent on recovering the low-dimensional projection; once this subspace is recovered, the remainder of the target can be learned independently of the ambient dimension. However, implications of feature learning in adversarial settings remain unexplored. In this work, we take the first steps towards understanding adversarially robust feature learning with neural networks. Specifically, we prove that the hidden directions of a multi-index model offer a Bayes optimal low-dimensional projection for robustness against $\ell_2$-bounded adversarial perturbations under the squared loss, assuming that the multi-index coordinates are statistically independent from the rest of the coordinates. Therefore, robust learning can be achieved by first performing standard feature learning, then robustly tuning a linear readout layer on top of the standard representations. In particular, we show that adversarially robust learning is just as easy as standard learning, in the sense that the additional number of samples needed to robustly learn multi-index models when compared to standard learning, does not depend on dimensionality.
Comment: 39 pages, 1 figure
Published: 2024

7. Dispersion properties of pulsar magnetospheric plasmas with relativistic Kappa distribution

Author: Mousavi, M. and Benáček, J.
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: The Kappa distribution could encompass the diverse characteristics of the magnetospheric plasma surrounding neutron stars in both hot and cold environments; however, the Maxwell-Jüttner distribution is so far widely used to characterise these plasmas. We aim to analyse the linear dispersion properties and to compare the growth rates yielded from the relativistic kinetic dispersion relation for the pulsar magnetospheric plasmas. We developed a numerical dispersion solver to investigate the plasmas with arbitrary velocity distributions, mainly focusing on relativistic kappa and Maxwell-Jüttner distribution functions. By considering different kappa distribution indices and using analytical and numerical approaches, the dispersion properties of the kappa and Maxwell-Jüttner distributions converge at high wave numbers and low temperatures, indicating that the choice of distribution functions has little effect at higher wave numbers $c.k/\omega_p \gg 1$ and high inverse temperatures $\rho=100$. However, each distribution function exhibits unique and complementary properties in semi-relativistic to relativistic inverse temperatures $\rho \leq 10-1$ and at lower wave numbers $c.k/\omega_p\leq 1$. This highlights the necessity of utilizing such a dispersion solver for these parameters, which allows the dispersion properties of neutron star magnetospheric plasmas to be properly understood.
Comment: 13 pages, 9 figures, submitted to AIP Publishing - Physics of Plasmas
Published: 2024

8. Scaled quantum theory. The bouncing ball problem

Author: Mousavi, S. V. and Miret-Artés, S.
Subjects: Quantum Physics, Physics - Physics Education
Abstract: Within the so-called scaled quantum theory, the standard bouncing ball problem is analyzed under the presence of a gravitational field and harmonic potential. In this framework, the quantum-classical transition of the density matrix is described by the linear scaled von Neumann equation for mixed states and after it has been particularized to the case of pure states. The main purpose of this work is to show how this theory works for conservative systems and how the quantum-classical transition is carried out in a continuous and smooth way, being equivalent to a nonlinear differential wave equation which contains a transition parameter ranging continuously from one to zero and covering all dynamical regimes in between the two extreme quantum and classical regimes. This parameter can be seen as a degree of quantumness where all intermediate dynamical regimes show quantum features that fade gradually when approaching the classical value.
Comment: Accepted for publication in Euro. Phys. J. Plus
Published: 2024

9. A note on meta and para-$\mathfrak{Nil}$-Hamiltonian groups

Author: Mousavi, Hamid
Subjects: Mathematics - Group Theory
Abstract: Let $\mathfrak{Nil}$ be the class of nilpotent groups. This article explores the finiteness of meta and para-$\mathfrak{Nil}$-Hamiltonian groups or their derived subgroups when these groups contain a non-nilpotent (or insoluble) subgroup of finite order or a nilpotent subgroup of finite index.
Published: 2024

10. A Quantum Unique Games Conjecture

Author: Mousavi, Hamoon and Spirig, Taro
Subjects: Quantum Physics, Computer Science - Computational Complexity
Abstract: After the NP-hardness of computational problems such as 3SAT and MaxCut was established, a natural next step was to explore whether these problems remain hard to approximate. While the quantum extensions of some of these problems are known to be hard (indeed undecidable), their inapproximability remains largely unresolved. In this work, we introduce definitions for the quantum extensions of Label-Cover and Unique-Label-Cover. We show that these problems play a similarly crucial role in studying the inapproximability of quantum constraint satisfaction problems as they do in the classical setting.
Comment: 43 pages, 12 figures
Published: 2024

11. What Are They Doing? Joint Audio-Speech Co-Reasoning

Author: Wang, Yingzhi, Mousavi, Pooneh, Ploujnikov, Artem, and Ravanelli, Mirco
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In audio and speech processing, tasks usually focus on either the audio or speech modality, even when both sounds and human speech are present in the same audio clip. Recent Auditory Large Language Models (ALLMs) have made it possible to process audio and speech simultaneously within a single model, leading to further considerations of joint audio-speech tasks. In this paper, we investigate how well ALLMs can perform joint audio-speech processing. Specifically, we introduce Joint Audio-Speech Co-Reasoning (JASCO), a novel task that unifies audio and speech processing, strictly requiring co-reasoning across both modalities. We release a scene-reasoning dataset called "What Are They Doing" and establish a joint audio-speech benchmark to evaluate the joint reasoning capability of popular ALLMs. Additionally, we provide deeper insights into the models' behaviors by analyzing their dependence on each modality.
Comment: Submitted to ICASSP 2025
Published: 2024

12. Pointwise convergence of bilinear polynomial averages over the primes

Author: Krause, Ben, Mousavi, Hamed, Tao, Terence, and Teräväinen, Joni
Subjects: Mathematics - Dynamical Systems, Mathematics - Classical Analysis and ODEs, Mathematics - Number Theory, 37A30, 37A44, 37A46, 11B30
Abstract: We show that on a $\sigma$-finite measure preserving system $X = (X,\nu, T)$, the non-conventional ergodic averages $$ \mathbb{E}_{n \in [N]} \Lambda(n) f(T^n x) g(T^{P(n)} x)$$ converge pointwise almost everywhere for $f \in L^{p_1}(X)$, $g \in L^{p_2}(X)$, and $1/p_1 + 1/p_2 \leq 1$, where $P$ is a polynomial with integer coefficients of degree at least $2$. This had previously been established with the von Mangoldt weight $\Lambda$ replaced by the constant weight $1$ by the first and third authors with Mirek, and by the Möbius weight $\mu$ by the fourth author. The proof is based on combining tools from both of these papers, together with several Gowers norm and polynomial averaging operator estimates on approximants to the von Mangoldt function of "Cramér" and "Heath-Brown" type.
Comment: 37 pages
Published: 2024

13. Open-Source Conversational AI with SpeechBrain 1.0

Author: Ravanelli, Mirco, Parcollet, Titouan, Moumen, Adel, de Langen, Sylvain, Subakan, Cem, Plantinga, Peter, Wang, Yingzhi, Mousavi, Pooneh, Della Libera, Luca, Ploujnikov, Artem, Paissan, Francesco, Borra, Davide, Zaiem, Salah, Zhao, Zeyu, Zhang, Shucong, Karakasidis, Georgios, Yeh, Sung-Lin, Champion, Pierre, Rouhe, Aku, Braun, Rudolf, Mai, Florian, Zuluaga-Gomez, Juan, Mousavi, Seyed Mahed, Nautsch, Andreas, Nguyen, Ha, Liu, Xuechen, Sagar, Sangeet, Duret, Jarod, Mdhaffar, Salima, Laperriere, Gaelle, Rouvier, Mickael, De Mori, Renato, and Esteve, Yannick
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Human-Computer Interaction, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.
Comment: Accepted to the Journal of Machine Learning Research (JMLR), Machine Learning Open Source Software
Published: 2024

14. An Exploratory Study of the Influence of CELTA/TESOL Certification on Non-Native English Teachers' Practical Teaching Knowledge

Author: Yousef Mousavi, Peyman Rajabi, and Hamid Reza Khalaji
Abstract: The demand for proficient English language teachers has increased significantly in non-native English-speaking countries, emphasizing the need for effective teacher training programs. This study investigated the influence of the CELTA/TESOL certification on non-native English teachers' practical teaching knowledge and efficacy perceptions in an English as a Foreign Language (EFL) setting. A qualitative approach was employed, and data were collected via semi-structured interviews. The participants comprised eight Iranian EFL teachers who completed the CELTA/TESOL certification program. Before the certification course, the teachers exhibited diverse teaching approaches, emphasizing teacher-centered methods. Their confidence levels varied, and they faced challenges in classroom management and addressing student needs. The lack of formal training in practical teaching techniques was evident among participants. Following the certification course, significant positive changes were observed in the teachers' self-reported practices and self-efficacy perceptions. Participants also showed positive dispositions towards student-centered approaches, integrated learner needs and interests, and utilized diverse instructional strategies. The course contributed to effective lesson planning and classroom management. Collaborative learning and ongoing professional development were fostered, enhancing the teachers' reflective practice. As a result, the non-native English teachers reported increased confidence in their teaching abilities. This study contributes to understanding the benefits of the CELTA/TESOL certification program for non-native English teachers and emphasizes the importance of practical teacher training in EFL contexts. It highlights the transformative potential of such certification courses in bridging the gap in practical teaching knowledge and supporting the professional growth of non-native English teachers.
Published: 2024

15. Exploring Iranian Pre-Service Teachers' Conceptual Understanding of Chemical Equilibrium

Author: Mahshid Golestaneh and Seyed Mohsen Mousavi
Abstract: This study aimed to develop a two-tier test to identify misconceptions of pre-service teachers about chemical equilibrium. The sample was made up of 135 pre-service chemistry teachers at Farhangian University in Iran (70 female and 65 male) who were in the eighth and final semester of the teacher training programme. After analysing the distribution pattern of the participants' answers in the first and second tiers, fifteen misconceptions were identified. A new misconception, which we called the common ion effect, was identified for the first time and was held by about 50% of participants. Gender was a significant factor in the rate of misconceptions, with male pre-service teachers having a lower rate of misconceptions than females. The results showed that when the first tier or the second tier was considered alone, female participants performed better, but when both tiers were combined, the performance of males was better. However, males had a weaker performance in three questions related to the approach to equilibrium in this situation. These findings will help educators plan their instruction by knowing pre-service teachers' preconceptions about chemical equilibrium.
Published: 2024

16. SustainDC: Benchmarking for Sustainable Data Center Control

Author: Naug, Avisek, Guillen, Antonio, Luna, Ricardo, Gundecha, Vineet, Rengarajan, Desik, Ghorbanpour, Sahand, Mousavi, Sajad, Babu, Ashwin Ramesh, Markovikj, Dejan, Kashyap, Lekhapriya D, and Sarkar, Soumyendu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control
Abstract: Machine learning has driven an exponential increase in computational demand, leading to massive data centers that consume significant amounts of energy and contribute to climate change. This makes sustainable data center control a priority. In this paper, we introduce SustainDC, a set of Python environments for benchmarking multi-agent reinforcement learning (MARL) algorithms for data centers (DC). SustainDC supports custom DC configurations and tasks such as workload scheduling, cooling optimization, and auxiliary battery management, with multiple agents managing these operations while accounting for the effects of each other. We evaluate various MARL algorithms on SustainDC, showing their performance across diverse DC designs, locations, weather conditions, grid carbon intensity, and workload requirements. Our results highlight significant opportunities for improvement of data center operations using MARL algorithms. Given the increasing use of DC due to AI, SustainDC provides a crucial platform for the development and benchmarking of advanced algorithms essential for achieving sustainable computing and addressing other heterogeneous real-world challenges.
Comment: Under review at Advances in Neural Information Processing Systems 2024 (NeurIPS 2024)
Published: 2024

17. Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

Author: Mousavi-Hosseini, Alireza, Wu, Denny, and Erdogdu, Murat A.
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We study the problem of learning multi-index models in high-dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional assumptions on the data, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity by utilizing the adaptivity of neural networks to latent low-dimensional structures. When the data exhibit such a structure, $d_{\mathrm{eff}}$ can be significantly smaller than the ambient dimension. We prove that the sample complexity grows almost linearly with $d_{\mathrm{eff}}$, bypassing the limitations of the information and generative exponents that appeared in recent analyses of gradient-based feature learning. On the other hand, the computational complexity may inevitably grow exponentially with $d_{\mathrm{eff}}$ in the worst-case scenario. Motivated by improving computational complexity, we take the first steps towards polynomial time convergence of the mean-field Langevin algorithm by investigating a setting where the weights are constrained to be on a compact manifold with positive Ricci curvature, such as the hypersphere. There, we study assumptions under which polynomial time convergence is achievable, whereas similar assumptions in the Euclidean setting lead to exponential time complexity.
Comment: 35 pages, 1 figure
Published: 2024

18. ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models

Author: Pradeep, Ronak, Lee, Daniel, Mousavi, Ali, Pound, Jeff, Sang, Yisi, Lin, Jimmy, Ilyas, Ihab, Potdar, Saloni, Arefiyan, Mostafa, and Li, Yunyao
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: The rapid advancement of Large Language Models (LLMs) and conversational assistants necessitates dynamic, scalable, and configurable conversational datasets for training and evaluation. These datasets must accommodate diverse user interaction modes, including text and voice, each presenting unique modeling challenges. Knowledge Graphs (KGs), with their structured and evolving nature, offer an ideal foundation for current and precise knowledge. Although human-curated KG-based conversational datasets exist, they struggle to keep pace with the rapidly changing user information needs. We present ConvKGYarn, a scalable method for generating up-to-date and configurable conversational KGQA datasets. Qualitative psychometric analyses confirm our method can generate high-quality datasets rivaling a popular conversational KGQA dataset while offering it at scale and covering a wide range of human-interaction configurations. We showcase its utility by testing LLMs on diverse conversations - exploring model behavior on conversational KGQA sets with different configurations grounded in the same KG fact set. Our results highlight the ability of ConvKGYarn to improve KGQA foundations and evaluate parametric knowledge of LLMs, thus offering a robust solution to the constantly evolving landscape of conversational assistants.
Published: 2024

19. Enhancing Computational Efficiency in Intensive Domains via Redundant Residue Number Systems

Author: Mousavi, Soudabeh, Rahmati, Dara, Gorgin, Saeid, and Lee, Jeong-A
Subjects: Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence
Abstract: In computation-intensive domains such as digital signal processing, encryption, and neural networks, the performance of arithmetic units, including adders and multipliers, is pivotal. Conventional numerical systems often fall short of meeting the efficiency requirements of these applications concerning area, time, and power consumption. Innovative approaches like residue number systems (RNS) and redundant number systems have been introduced to surmount this challenge, markedly elevating computational efficiency. This paper examines from multiple perspectives how the fusion of redundant number systems with RNS (termed R-RNS) can diminish latency and enhance circuit implementation, yielding substantial benefits in practical scenarios. We conduct a comparative analysis of four systems: RNS, redundant number system, Binary Number System (BNS), and Signed-Digit Redundant Residue Number System (SD-RNS), and appraise SD-RNS through an advanced Deep Neural Network (DNN) utilizing the CIFAR-10 dataset. Our findings are encouraging, demonstrating that SD-RNS attains computational speedups of 1.27 times and 2.25 times over RNS and BNS, respectively, and reduces energy consumption by 60% compared to BNS during sequential addition and multiplication tasks.
Comment: This paper has been accepted by the 21st International SoC Conference (ISOCC), 2024, 2 pages
Published: 2024

20. Educational Customization by Homogenous Grouping of e-Learners based on their Learning Styles

Author: Amiri, Mohammadreza, Montazer, GholamAli, and Mousavi, Ebrahim
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence
Abstract: The E-learning environment offers greater flexibility compared to face-to-face interactions, allowing for adapting educational content to meet learners' individual needs and abilities through personalization and customization of e-content and the educational process. Despite the advantages of this approach, customizing the learning environment can reduce the costs of tutoring systems for similar learners by utilizing the same content and process for co-like learning groups. Various indicators for grouping learners exist, but many of them are conceptual, uncertain, and subject to change over time. In this article, we propose using the Felder-Silverman model, which is based on learning styles, to group similar learners. Additionally, we model the behaviors and actions of e-learners in a network environment using Fuzzy Set Theory (FST). After identifying the learning styles of the learners, co-like learning groups are formed, and each group receives adaptive content based on their preferences, needs, talents, and abilities. By comparing the results of the experimental and control groups, we determine the effectiveness of the proposed grouping method. In terms of "educational success," the weighted average score of the experimental group is 17.65 out of 20, while the control group achieves a score of 12.6 out of 20. Furthermore, the "educational satisfaction" of the experimental group is 67%, whereas the control group's satisfaction level is 37%.
Published: 2024

21. Evaluating fracture energy predictions using phase-field and gradient-enhanced damage models for elastomers

Author: Mousavi, S. Mohammad, Ang, Ida, Mulderrig, Jason, and Bouklas, Nikolaos
Subjects: Condensed Matter - Materials Science, Condensed Matter - Soft Condensed Matter
Abstract: Recently, the phase-field method has been increasingly used for brittle fractures in soft materials like polymers, elastomers, and biological tissues. When considering finite deformations to account for the highly deformable nature of soft materials, the convergence of the phase-field method becomes challenging, especially in scenarios of unstable crack growth. To overcome these numerical difficulties, several approaches have been introduced, with artificial viscosity being among the most widely utilized. This study investigates the energy release rate due to crack propagation in hyperelastic nearly-incompressible materials and compares the phase-field method and a novel gradient-enhanced damage (GED) approach. First, we simulate unstable loading scenarios using the phase-field method, which leads to convergence problems. To address these issues, we introduce artificial viscosity to stabilize the problem and analyze its impact on the energy release rate utilizing a domain J-integral approach giving quantitative measurements during crack propagation. It is observed that the measured energy release rate during crack propagation does not comply with the imposed critical energy release rate, and shows non-monotonic behavior. In the second part of the paper, we introduce a novel stretch-based GED model as an alternative to the phase-field method for modeling crack evolution in elastomers. It is demonstrated that in this method, the energy release rate can be obtained as an output of the simulation rather than as an input, which could be useful in the exploration of rate-dependent responses, as one could directly impose chain-level criteria for damage initiation. We show that while this novel approach provides reasonable results for fracture simulations, it still suffers from some numerical issues that strain-based GED formulations are known to be susceptible to.
Published: 2024

22. High Performance MoS2 Phototransistors Photogated by PN Junction

Author: Khaleghi, Seyed Saleh Mousavi, Wei, Jianyong, Liu, Yumeng, Fan, Zhengfang, Li, Kai, Crozier, Kenneth B., and Dan, Yaping
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Applied Physics
Abstract: Photodetectors based on two-dimensional (2D) atomically thin semiconductors suffer from low light absorption, limiting their potential for practical applications. In this work, we demonstrate high-performance MoS2 phototransistors by integrating few-layer MoS2 on a PN junction formed in a silicon (Si) substrate. The photovoltage created in the PN junction under light illumination electrically gates the MoS2 channel, creating a strong photoresponse in MoS2. We present an analytical model for the photoresponse of our device and show that it is in good agreement with measured experimental photocurrent in MoS2 and photovoltage in the Si PN junction. This device structure separates light absorption and electrical response functions, which provides us an opportunity to design new types of photodetectors. For example, incorporating ferroelectric materials into the gate structure can produce a negative capacitance that boosts gate voltage, enabling low-power, high-sensitivity phototransistors; this, combined with separating light absorption and electrical functions, enables advanced high-performance photodetectors.
Comment: 21 pages, 4 figures
Published: 2024

23. ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

Author: Wang, Ziyu, Kanduri, Anil, Aqajari, Seyed Amir Hossein, Jafarlou, Salar, Mousavi, Sanaz R., Liljeberg, Pasi, Malik, Shaista, and Rahmani, Amir M.
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning
Abstract: While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, limiting them to ad hoc explainability for clinicians' decision making. In this work, we delve into explainability of ECG re-identification risks using transparent machine learning models. We use SHapley Additive exPlanations (SHAP) analysis to identify and explain the key features contributing to re-identification risks. We conduct an empirical analysis of identity re-identification risks using ECG data from five diverse real-world datasets, encompassing 223 participants. By employing transparent machine learning models, we reveal the diversity among different ECG features in contributing towards re-identification of individuals with an accuracy of 0.76 for gender, 0.67 for age group, and 0.82 for participant ID re-identification. Our approach provides valuable insights for clinical experts and guides the development of effective privacy-preserving mechanisms. Further, our findings emphasize the necessity for robust privacy measures in real-world health applications and offer detailed, actionable insights for enhancing data anonymization techniques.
Published: 2024

24. SHS: Scorpion Hunting Strategy Swarm Algorithm

Author: Singh, Abhilash, Mousavi, Seyed Muhammad Hossein, and Gaurav, Kumar
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence
Abstract: We introduced the Scorpion Hunting Strategy (SHS), a novel population-based, nature-inspired optimisation algorithm. This algorithm draws inspiration from the hunting strategy of scorpions, which identify, locate, and capture their prey using the alpha and beta vibration operators. These operators control the SHS algorithm's exploitation and exploration abilities. To formulate an optimisation method, we mathematically simulate these dynamic events and behaviors. We evaluate the effectiveness of the SHS algorithm by employing 20 benchmark functions (including 10 conventional and 10 CEC2020 functions), using both qualitative and quantitative analyses. Through a comparative analysis with 12 state-of-the-art meta-heuristic algorithms, we demonstrate that the proposed SHS algorithm yields exceptionally promising results. These findings are further supported by statistically significant results obtained through the Wilcoxon rank sum test. Additionally, the ranking of SHS, as determined by the average rank derived from the Friedman test, positions it at the forefront when compared to other algorithms. Going beyond theoretical validation, we showcase the practical utility of the SHS algorithm by applying it to six distinct real-world optimisation tasks. These applications illustrate the algorithm's potential in addressing complex optimisation challenges. In summary, this work not only introduces the innovative SHS algorithm but also substantiates its effectiveness and versatility through rigorous benchmarking and real-world problem-solving scenarios.
Published: 2024

25. Calibrated Diverse Ensemble Entropy Minimization for Robust Test-Time Adaptation in Prostate Cancer Detection

Author: Gilany, Mahdi, Harmanani, Mohamed, Wilson, Paul, To, Minh Nguyen Nhat, Jamzad, Amoon, Fooladgar, Fahimeh, Wodlinger, Brian, Abolmaesumi, Purang, and Mousavi, Parvin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: High resolution micro-ultrasound has demonstrated promise in real-time prostate cancer detection, with deep learning becoming a prominent tool for learning complex tissue properties reflected on ultrasound. However, a significant roadblock to real-world deployment remains, which prior works often overlook: model performance suffers when applied to data from different clinical centers due to variations in data distribution. This distribution shift significantly impacts the model's robustness, posing a major challenge to clinical deployment. Domain adaptation, and specifically its test-time adaptation (TTA) variant, offers a promising solution to address this challenge. In a setting designed to reflect real-world conditions, we compare existing methods to state-of-the-art TTA approaches adopted for cancer detection, demonstrating the lack of robustness to distribution shifts in the former. We then propose Diverse Ensemble Entropy Minimization (DEnEM), questioning the effectiveness of current TTA methods on ultrasound data. We show that these methods, although outperforming baselines, are suboptimal due to relying on neural networks' output probabilities, which could be uncalibrated, or relying on data augmentation, which is not straightforward to define on ultrasound data. Our results show a significant improvement of 5% to 7% in AUROC over the existing methods and 3% to 5% over TTA methods, demonstrating the advantage of DEnEM in addressing distribution shift. Keywords: Ultrasound Imaging, Prostate Cancer, Computer-aided Diagnosis, Distribution Shift Robustness, Test-time Adaptation.
Published: 2024
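
Note: The abstract above does not spell out the DEnEM objective, so the following is only a minimal sketch of the quantity that an ensemble-based entropy minimization step would typically reduce: the Shannon entropy of the averaged ensemble prediction. The function names and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_mean(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities predicted by each ensemble member."""
    n = len(member_probs)
    return [sum(column) / n for column in zip(*member_probs)]


# Hypothetical benign/cancer probabilities from three ensemble members
# for one test-time sample (made-up numbers).
members = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]

avg = ensemble_mean(members)
h_avg = entropy(avg)             # entropy of the averaged prediction
h_single = entropy(members[0])   # entropy of one confident member

# Members that disagree push the averaged entropy up, so a single
# over-confident member cannot make the ensemble look certain by itself.
print(avg, h_avg, h_single)
```

In a test-time adaptation loop, a loss of this kind would be minimized with respect to the adaptable parameters; that optimization step is omitted here.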

26. PSO Fuzzy XGBoost Classifier Boosted with Neural Gas Features on EEG Signals in Emotion Recognition

Author: Mousavi, Seyed Muhammad Hossein
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Emotion recognition is the technology-driven process of identifying and categorizing human emotions from various data sources, such as facial expressions, voice patterns, body motion, and physiological signals, such as EEG. These physiological indicators, though rich in data, present challenges due to their complexity and variability, necessitating sophisticated feature selection and extraction methods. NGN, an unsupervised learning algorithm, effectively adapts to input spaces without predefined grid structures, improving feature extraction from physiological data. Furthermore, the incorporation of fuzzy logic enables the handling of fuzzy data by introducing reasoning that mimics human decision-making. The combination of PSO with XGBoost aids in optimizing model performance through efficient hyperparameter tuning and decision process optimization. This study explores the integration of Neural-Gas Network (NGN), XGBoost, Particle Swarm Optimization (PSO), and fuzzy logic to enhance emotion recognition using physiological signals. Our research addresses three critical questions concerning the improvement of XGBoost with PSO and fuzzy logic, NGN's effectiveness in feature selection, and the performance comparison of the PSO-fuzzy XGBoost classifier with standard benchmarks. Acquired results indicate that our methodologies enhance the accuracy of emotion recognition systems and outperform other feature selection techniques using the majority of classifiers, offering significant implications for both theoretical advancement and practical application in emotion recognition technology., Comment: PSO, Fuzzy, XGBoost, Neural Gas Network (NGN), Feature Selection, EEG Signals, Emotion Recognition
Published: 2024
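
Note: As an illustration of the PSO-based hyperparameter tuning mentioned in the abstract above, here is a minimal particle swarm sketch. It is not the author's implementation: the objective is a made-up quadratic standing in for a validation loss (no XGBoost model is trained), and the parameter names, bounds, and coefficients are assumptions chosen for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    inertia: float = 0.7,
    cognitive: float = 1.5,
    social: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Basic global-best PSO; returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    inertia * vel[i][d]
                    + cognitive * r1 * (best_pos[i][d] - pos[i][d])
                    + social * r2 * (g_pos[d] - pos[i][d])
                )
                # Keep the particle inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_validation_loss(params: Sequence[float]) -> float:
    """Stand-in for a validation loss over (learning_rate, max_depth)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + (max_depth - 6.0) ** 2


best_params, best_loss = pso_minimize(
    toy_validation_loss, bounds=[(0.01, 0.5), (2.0, 12.0)]
)
print(best_params, best_loss)
```

In a real tuning run, `toy_validation_loss` would be replaced by a function that trains a classifier with the candidate hyperparameters and returns its validation error; integer-valued parameters such as tree depth would also need rounding.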

27. ${\chi}^{(2)}$-Induced Artifact Overwhelming the Third-Order Signal in 2D Raman-THz Spectroscopy of Non-Centrosymmetric Materials

Author: Mousavi, Seyyed Jabbar, Biggs, Megan F., Johnson, Jeremy A., Hamm, Peter, and Shalit, Andrey
Subjects: Physics - Optics
Abstract: Through comprehensive data analysis, we demonstrate that a ${\chi}^{(2)}$-induced artifact, arising from imperfect balancing in the conventional electro-optic sampling (EOS) detection scheme, contributes significantly to the measured signal in 2D Raman-THz spectroscopy of non-centrosymmetric materials. The artifact is a product of two 1D responses, overwhelming the desired 2D response. We confirm this by analyzing the 2D Raman-THz response of an x-cut beta barium borate (BBO) crystal. We furthermore show that this artifact can be effectively suppressed by implementing a special detection scheme. We successfully isolate the desired third-order 2D Raman-THz response, revealing a distinct cross-peak feature, whose frequency position suggests the coupling between two crystal phonons.
Published: 2024

28. The Magic XRoom: A Flexible VR Platform for Controlled Emotion Elicitation and Recognition

Author: Mousavi, S. M. Hossein, Besenzoni, Matteo, Andreoletti, Davide, Peternier, Achille, and Giordano, Silvia
Subjects: Computer Science - Human-Computer Interaction
Abstract: Affective computing has recently gained popularity, especially in the field of human-computer interaction systems, where effectively evoking and detecting emotions is of paramount importance to enhance users' experience. However, several issues are hindering progress in the field. In fact, the complexity of emotions makes it difficult to understand their triggers and control their elicitation. Additionally, effective emotion recognition requires analyzing multiple sensor data, such as facial expressions and physiological signals. These factors combined make it hard to collect high-quality datasets that can be used for research purposes (e.g., development of emotion recognition algorithms). Despite these challenges, Virtual Reality (VR) holds promise as a solution. By providing a controlled and immersive environment, VR enables the replication of real-world emotional experiences and facilitates the tracking of signals indicative of emotional states. However, controlling emotion elicitation remains a challenging task even within VR. This research paper introduces the Magic XRoom, a VR platform designed to enhance control over emotion elicitation by leveraging the theory of flow. This theory establishes a mapping between an individual's skill level, task difficulty, and perceived emotions. In the Magic XRoom, the user's skill level is continuously assessed, and task difficulty is adjusted accordingly to evoke specific emotions. Furthermore, user signals are collected using sensors, and virtual panels are utilized to determine the ground truth emotional states, making the Magic XRoom an ideal platform for collecting extensive datasets. The paper provides detailed implementation information, highlights the main properties of the Magic XRoom, and presents examples of virtual scenarios to illustrate its capabilities., Comment: Proceedings of the 25th International Conference on Mobile Human-Computer Interaction
Published: 2024

29. A Novel Framework for Automated Warehouse Layout Generation

Author: Shahroudnejad, Atefeh, Mousavi, Payam, Perepelytsia, Oleksii, Sahir, Staszak, David, Taylor, Matthew E., and Bawel, Brent
Subjects: Computer Science - Artificial Intelligence
Abstract: Optimizing warehouse layouts is crucial due to its significant impact on efficiency and productivity. We present an AI-driven framework for automated warehouse layout generation. This framework employs constrained beam search to derive optimal layouts within given spatial parameters, adhering to all functional requirements. The feasibility of the generated layouts is verified based on criteria such as item accessibility, required minimum clearances, and aisle connectivity. A scoring function is then used to evaluate the feasible layouts considering the number of storage locations, access points, and accessibility costs. We demonstrate our method's ability to produce feasible, optimal layouts for a variety of warehouse dimensions and shapes, diverse door placements, and interconnections. This approach, currently being prepared for deployment, will enable human designers to rapidly explore and confirm options, facilitating the selection of the most appropriate layout for their use-case.
Published: 2024
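
Note: The constrained beam search described in the abstract above can be illustrated with a toy sketch. Everything in it is an assumption made for the example: the warehouse is reduced to a single row of cells, the feasibility rule (no three racks in a row) and the score (number of racks) are invented stand-ins for the paper's clearance, connectivity, and accessibility criteria, and infeasible partial layouts are pruned during the search rather than checked afterwards.

```python
from typing import Callable, List, Tuple

Layout = Tuple[int, ...]  # one entry per cell: 1 = storage rack, 0 = aisle


def constrained_beam_search(
    n_cells: int,
    beam_width: int,
    is_feasible: Callable[[Layout], bool],
    score: Callable[[Layout], float],
) -> List[Layout]:
    """Grow layouts cell by cell, keeping only the best feasible candidates."""
    beam: List[Layout] = [()]
    for _ in range(n_cells):
        candidates = [layout + (cell,) for layout in beam for cell in (0, 1)]
        feasible = [c for c in candidates if is_feasible(c)]
        feasible.sort(key=score, reverse=True)  # highest score first
        beam = feasible[:beam_width]
    return beam


def no_three_racks_in_a_row(layout: Layout) -> bool:
    """Toy clearance rule: every run of racks is at most two cells long."""
    return all(layout[i:i + 3] != (1, 1, 1) for i in range(len(layout) - 2))


def storage_count(layout: Layout) -> float:
    """Toy scoring function: number of storage locations."""
    return float(sum(layout))


beam = constrained_beam_search(
    n_cells=5,
    beam_width=3,
    is_feasible=no_three_racks_in_a_row,
    score=storage_count,
)
best = beam[0]
print(best, storage_count(best))
```

A wider beam explores more alternatives at higher cost; with `beam_width=1` the search degenerates to a greedy construction.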

30. Laser-scanning of induction-melted Al alloys: are they representative of additively manufactured ones?

Author: Ge, Zhaoxuan, Calderon, Sebastian, and Taheri-Mousavi, S. Mohadeseh
Subjects: Physics - Applied Physics, Condensed Matter - Materials Science
Abstract: The bottleneck of alloy design for powder-based additive manufacturing (AM) resides in powder production, an expensive and time-consuming process hindering rapid closed-loop design iterations. This study analyzed an expedited experimental workflow, i.e., multipath laser scanning of induction-melted samples, to mimic rapid solidification of AM to serve as an alternative approach to downselect from the design space. Using an Al-Ni-Zr-Er model alloy, we compared the microstructural features between the laser-scanned sample and the laser powder bed fusion (LPBF) one. Our results showed that the microstructure morphology is the same for both samples. SEM (< x12000 magnification) shows differences in Zr and Er distribution justified by the repeated reheating for additional layers in the 3D-printed sample. Nonetheless, phase distribution was nearly the same at a high magnification scale under STEM (> x80000). Phase sizes were also compared: the laser-scanned sample resembles the 3D-printed one, with an average size difference of 15% in Al grain, 14% in Al-Ni-Er ternary precipitates, and 4% in L1$_2$ nanoprecipitates. Cooling rates for the two samples were estimated by the Rosenthal equation. The higher rate of the 3D-printed sample compared with that of the laser-scanned sample explains its slightly finer phases. The mechanical properties of the two samples were evaluated by microhardness tests. The hardness of the laser-scanned sample was found to be 21% less than that of the 3D-printed sample. The potential reasons were discussed. The study also showed that a similar difference in hardness was observed when the experiments were repeated on a printable benchmark Al alloy, showcasing only 1% absolute error. Thus, laser-scanned samples can serve as a predictor of the hardness of LPBF samples, and their highly similar microstructure at high magnification allows their application in rapid screening tests., Comment: 16 pages report, 8 figures, 3 tables
Published: 2024

31. Enhancing Language Learning through Technology: Introducing a New English-Azerbaijani (Arabic Script) Parallel Corpus

Author: Khiarak, Jalil Nourmohammadi, Ahmadi, Ammar, Saeed, Taher Akbari, Asgari-Chenaghlu, Meysam, Atabay, Toğrul, Karimi, Mohammad Reza Baghban, Ceferli, Ismail, Hasanvand, Farzad, Mousavi, Seyed Mahboub, and Noshad, Morteza
Subjects: Computer Science - Computation and Language
Abstract: This paper introduces a pioneering English-Azerbaijani (Arabic Script) parallel corpus, designed to bridge the technological gap in language learning and machine translation (MT) for under-resourced languages. Consisting of 548,000 parallel sentences and approximately 9 million words per language, this dataset is derived from diverse sources such as news articles and holy texts, aiming to enhance natural language processing (NLP) applications and language education technology. This corpus marks a significant step forward in the realm of linguistic resources, particularly for Turkic languages, which have lagged in the neural machine translation (NMT) revolution. By presenting the first comprehensive case study for the English-Azerbaijani (Arabic Script) language pair, this work underscores the transformative potential of NMT in low-resource contexts. The development and utilization of this corpus not only facilitate the advancement of machine translation systems tailored for specific linguistic needs but also promote inclusive language learning through technology. The findings demonstrate the corpus's effectiveness in training deep learning MT systems and underscore its role as an essential asset for researchers and educators aiming to foster bilingual education and multilingual communication. This research paves the way for future explorations into NMT applications for languages lacking substantial digital resources, thereby enhancing global language education frameworks. The Python package of our code is available at https://pypi.org/project/chevir-kartalol/, and we also have a website accessible at https://translate.kartalol.com/., Comment: This paper is accepted and published at NeTTT 2024 Conf
Published: 2024

32. Additively manufacturable high-strength aluminum alloys with thermally stable microstructures enabled by hybrid machine learning-based design

Author: Taheri-Mousavi, S. Mohadeseh, Xu, Michael, Hengsbach, Florian, Houser, Clay, Ge, Zhaoxuan, Glaser, Benjamin, Wei, Shaolou, Schaper, Mikro, LeBeau, James M., Olson, Greg B., and Hart, A. John
Subjects: Condensed Matter - Materials Science
Abstract: Additively manufactured (AM) structural components with complex geometries and tailored properties at voxel-size resolution will lead to a significant leap in performance in various critical engineering applications. However, at each voxel, we first need to be able to design the alloy efficiently and reliably. We demonstrate a hybrid approach combining calculation of phase diagram (CALPHAD)-based integrated computational materials engineering (ICME) with machine learning and inverse design techniques, and perform a full alloy design cycle of a novel Al alloy (Al-Er-Zr-Y-Yb-Ni) for AM from virtual predictions to experimental validation. We designed this alloy to exhibit high tensile strength at room temperature through nanoscale L1$_2$-phase precipitation which stabilizes the microstructure to maintain strength after high-temperature aging. We initially exploit a fine distribution of metastable eutectic ternary phases through rapid solidification, which serve as the source for the reactive elements enabling nanoscale precipitation of a high phase fraction of the thermally stable L1$_2$ strengthening phases. The strength of the 3D-printed samples manufactured via laser powder bed fusion (LPBF) from the designed composition is comparable to that of wrought Al 7075, and after high-temperature (400$^\circ$C) aging is 50% stronger than the best benchmark printable Al alloy. The stable strengthening strategy is applicable to a wide range of alloys and rapid solidification processes, and our hybrid ML/CALPHAD numerical framework can be used for the efficient and robust design of alloy microstructures and properties, expanding the capabilities of additive as well as traditional manufacturing.
Published: 2024

33. Mean-Field Langevin Dynamics for Signed Measures via a Bilevel Approach

Author: Wang, Guillaume, Mousavi-Hosseini, Alireza, and Chizat, Lénaïc
Subjects: Mathematics - Optimization and Control, Statistics - Machine Learning
Abstract: Mean-field Langevin dynamics (MFLD) is a class of interacting particle methods that tackle convex optimization over probability measures on a manifold, which are scalable, versatile, and enjoy computational guarantees. However, some important problems -- such as risk minimization for infinite width two-layer neural networks, or sparse deconvolution -- are originally defined over the set of signed, rather than probability, measures. In this paper, we investigate how to extend the MFLD framework to convex optimization problems over signed measures. Among two known reductions from signed to probability measures -- the lifting and the bilevel approaches -- we show that the bilevel reduction leads to stronger guarantees and faster rates (at the price of a higher per-iteration complexity). In particular, we investigate the convergence rate of MFLD applied to the bilevel reduction in the low-noise regime and obtain two results. First, this dynamics is amenable to an annealing schedule, adapted from Suzuki et al. (2023), that results in improved convergence rates to a fixed multiplicative accuracy. Second, we investigate the problem of learning a single neuron with the bilevel approach and obtain local exponential convergence rates that depend polynomially on the dimension and noise level (to compare with the exponential dependence that would result from prior analyses)., Comment: 57 pages, 1 figure
Published: 2024

34. DASB - Discrete Audio and Speech Benchmark

Author: Mousavi, Pooneh, Della Libera, Luca, Duret, Jarod, Ploujnikov, Artem, Subakan, Cem, and Ravanelli, Mirco
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models. Ideal audio tokens must effectively preserve phonetic and semantic content along with paralinguistic information, speaker identity, and other details. While several types of audio tokens have been recently proposed, identifying the optimal tokenizer for various tasks is challenging due to the inconsistent evaluation settings in existing studies. To address this gap, we release the Discrete Audio and Speech Benchmark (DASB), a comprehensive leaderboard for benchmarking discrete audio tokens across a wide range of discriminative tasks, including speech recognition, speaker identification and verification, emotion recognition, keyword spotting, and intent classification, as well as generative tasks such as speech enhancement, separation, and text-to-speech. Our results show that, on average, semantic tokens outperform compression tokens across most discriminative and generative tasks. However, the performance gap between semantic tokens and standard continuous representations remains substantial, highlighting the need for further research in this field., Comment: 9 pages, 5 tables
Published: 2024

35. How Should We Extract Discrete Audio Tokens from Self-Supervised Models?

Author: Mousavi, Pooneh, Duret, Jarod, Zaiem, Salah, Della Libera, Luca, Ploujnikov, Artem, Subakan, Cem, and Ravanelli, Mirco
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing. Ideal audio tokens must preserve content, paralinguistic elements, speaker identity, and many other audio details. Current audio tokenization methods fall into two categories: Semantic tokens, acquired through quantization of Self-Supervised Learning (SSL) models, and Neural compression-based tokens (codecs). Although previous studies have benchmarked codec models to identify optimal configurations, the ideal setup for quantizing pretrained SSL models remains unclear. This paper explores the optimal configuration of semantic tokens across discriminative and generative tasks. We propose a scalable solution to train a universal vocoder across multiple SSL layers. Furthermore, an attention mechanism is employed to identify task-specific influential layers, enhancing the adaptability and performance of semantic tokens in diverse audio applications., Comment: 4 pages, 2 figures, 2 tables, Accepted at Interspeech 2024
Published: 2024

36. Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue

Author: Alghisi, Simone, Rizzoli, Massimo, Roccabruna, Gabriel, Mousavi, Seyed Mahed, and Riccardi, Giuseppe
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: We study the limitations of Large Language Models (LLMs) for the task of response generation in human-machine dialogue. Several techniques have been proposed in the literature for different dialogue types (e.g., Open-Domain). However, the evaluations of these techniques have been limited in terms of base LLMs, dialogue types and evaluation metrics. In this work, we extensively analyze different LLM adaptation techniques when applied to different dialogue types. We have selected two base LLMs, Llama-2 and Mistral, and four dialogue types: Open-Domain, Knowledge-Grounded, Task-Oriented, and Question Answering. We evaluate the performance of in-context learning and fine-tuning techniques across datasets selected for each dialogue type. We assess the impact of incorporating external knowledge to ground the generation in both scenarios of Retrieval-Augmented Generation (RAG) and gold knowledge. We adopt consistent evaluation and explainability criteria for automatic metrics and human evaluation protocols. Our analysis shows that there is no universal best technique for adapting large language models, as the efficacy of each technique depends on both the base LLM and the specific type of dialogue. Last but not least, the assessment of the best adaptation technique should include human evaluation to avoid false expectations and outcomes derived from automatic metrics., Comment: Accepted at INLG 2024
Published: 2024

37. ProAct: Progressive Training for Hybrid Clipped Activation Function to Enhance Resilience of DNNs

Author: Mousavi, Seyedhamidreza, Ahmadilivani, Mohammad Hasan, Raik, Jaan, Jenihhin, Maksim, and Daneshtalab, Masoud
Subjects: Computer Science - Machine Learning
Abstract: Deep Neural Networks (DNNs) are extensively employed in safety-critical applications where ensuring hardware reliability is a primary concern. To enhance the reliability of DNNs against hardware faults, activation restriction techniques significantly mitigate the fault effects at the DNN structure level, irrespective of accelerator architectures. State-of-the-art methods offer either neuron-wise or layer-wise clipping activation functions. They attempt to determine optimal clipping thresholds using heuristic and learning-based approaches. Layer-wise clipped activation functions cannot preserve DNNs' resilience at high bit error rates. On the other hand, neuron-wise clipping activation functions introduce considerable memory overhead due to the addition of parameters, which increases their vulnerability to faults. Moreover, the heuristic-based optimization approach demands numerous fault injections during the search process, resulting in time-consuming threshold identification. Meanwhile, learning-based techniques that train thresholds for entire layers concurrently often yield sub-optimal results. In this work, first, we demonstrate that it is not essential to incorporate neuron-wise activation functions throughout all layers in DNNs. Then, we propose a hybrid clipped activation function that integrates neuron-wise and layer-wise methods by applying neuron-wise clipping only in the last layer of DNNs. Additionally, to attain optimal thresholds in the clipping activation function, we introduce ProAct, a progressive training methodology. This approach iteratively trains the thresholds on a layer-by-layer basis, aiming to obtain optimal threshold values in each layer separately.
Published: 2024
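
Note: The difference between layer-wise and neuron-wise clipping described in the abstract above can be shown with a small sketch. It is an illustration only: the function name and thresholds are hypothetical, and resetting out-of-range activations to zero is an assumption (clamping them to the threshold is another common choice); the abstract does not say which variant ProAct uses.

```python
from typing import List, Sequence, Union

Threshold = Union[float, Sequence[float]]


def clipped_relu(activations: Sequence[float], threshold: Threshold) -> List[float]:
    """ReLU whose outputs above a threshold are reset to zero.

    A scalar threshold is shared by the whole layer (layer-wise clipping);
    a sequence supplies one threshold per neuron (neuron-wise clipping).
    """
    if isinstance(threshold, (int, float)):
        thresholds = [float(threshold)] * len(activations)
    else:
        thresholds = [float(t) for t in threshold]
        if len(thresholds) != len(activations):
            raise ValueError("need exactly one threshold per neuron")

    out = []
    for x, t in zip(activations, thresholds):
        relu = max(0.0, x)
        # Assumption: a value above the threshold is treated as
        # fault-induced and suppressed rather than propagated.
        out.append(relu if relu <= t else 0.0)
    return out


# A bit flip has turned the second activation into a huge value.
faulty = [0.7, 3.0e8, -1.2, 2.5]

layer_wise = clipped_relu(faulty, 6.0)                    # one threshold, one stored parameter
neuron_wise = clipped_relu(faulty, [1.0, 1.0, 1.0, 2.0])  # one threshold per neuron

print(layer_wise)
print(neuron_wise)
```

The neuron-wise variant stores as many thresholds as there are neurons, which is the memory overhead the abstract refers to; the hybrid scheme it proposes limits that cost to the last layer.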

38. LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs

Author: Davoodi, Arash Gholami, Davoudi, Seyed Pouyan Mousavi, and Pezeshkpour, Pouya
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large language models (LLMs) demonstrate impressive capabilities in mathematical reasoning. However, despite these achievements, current evaluations are mostly limited to specific mathematical topics, and it remains unclear whether LLMs are genuinely engaging in reasoning. To address these gaps, we present the Mathematical Topics Tree (MaTT) benchmark, a challenging and structured benchmark that offers 1,958 questions across a wide array of mathematical subjects, each paired with a detailed hierarchical chain of topics. Upon assessing different LLMs using the MaTT benchmark, we find that the most advanced model, GPT-4, achieved a mere 54% accuracy in a multiple-choice scenario. Interestingly, even when employing Chain-of-Thought prompting, we observe mostly no notable improvement. Moreover, LLMs' accuracy dropped by up to 24.2 percentage points when the questions were presented without providing choices. Further detailed analysis of the LLMs' performance across a range of topics showed significant discrepancies even for closely related subtopics within the same general mathematical area. In an effort to pinpoint the reasons behind LLMs' performance, we conducted a manual evaluation of the completeness and correctness of the explanations generated by GPT-4 when choices were available. Surprisingly, we find that in only 53.3% of the instances where the model provided a correct answer, the accompanying explanations were deemed complete and accurate, i.e., the model engaged in genuine reasoning.
Published: 2024

39. Time Sensitive Knowledge Editing through Efficient Finetuning

Author: Ge, Xiou, Mousavi, Ali, Grave, Edouard, Joulin, Armand, Qian, Kun, Han, Benjamin, Arefiyan, Mostafa, and Li, Yunyao
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up-to-date remains a challenge once pretraining is complete. It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs. Existing locate-and-edit knowledge editing (KE) methods suffer from two limitations. First, LLMs edited by such methods generally have poor capability in answering complex queries that require multi-hop reasoning. Second, the long run-time of such locate-and-edit methods to perform knowledge edits makes them infeasible for large-scale KE in practice. In this paper, we explore Parameter-Efficient Fine-Tuning (PEFT) techniques as an alternative for KE. We curate a more comprehensive temporal KE dataset with both knowledge update and knowledge injection examples for KE performance benchmarking. We further probe the effect of fine-tuning on a range of layers in an LLM for the multi-hop QA task. We find that PEFT performs better than locate-and-edit techniques for time-sensitive knowledge edits., Comment: ACL 2024 main
Published: 2024

40. Comparison of the Effect of Omega-3 vs. MCT Supplementation on Iron-Related Indices in Patients Undergoing Dialysis

Author: Alami, Farkhondeh, Mousavi Shalmani, Seyedeh Hayedeh, Mahmoudi, Zahra, Nooriani, Narjes, Mousavi, Zahra, Amjadi, Arezoo, Masoumvand, Mohammad, Mohajerani, Malikeh, Abbasi Mobarakeh, Khadijeh, Harsini, Asma Rajabi, Shafaei, Hanieh, Omidi, Saeed, Khoshdooz, Sara, Doaei, Saeid, and Khosravi, Masoud
Published: 2024

41. Epidemiological Aspects and Pattern of Intoxication among Elderly in Khorasan-Razavi; Northeast of Iran

Author: Nemati, Ahmad, Dadpour, Bita, Etemad, Leila, Mousavi, Seyed Reza, Alizadeh Ghomsari, Anahita, Mousavi, Seyed Hadi, Ghasemi-Toosi, Alireza, Kimiafar, Khalil, Ataee, Zahra, Vahabzadeh, Maryam, Zarifkia, Shiva, Khoshbakht, Reza, Khoshrou, Alireza, Salmani Izadi, Hanie, and Moshiri, Mohammad
Published: 2024

42. Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts

Author: Mousavi, S. Mostafa, Stogaitis, Marc, Gadh, Tajinder, Allen, Richard M, Barski, Alexei, Bosch, Robert, Robertson, Patrick, Thiruverahan, Nivetha, Cho, Youngmin, and Raj, Aman
Subjects: Physics - Geophysics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Physics - Applied Physics
Abstract: This paper presents a novel approach to extract scientifically valuable information about Earth's physical phenomena from unconventional sources, such as multi-modal social media posts. Employing a state-of-the-art large language model (LLM), Gemini 1.5 Pro (Reid et al. 2024), we estimate earthquake ground shaking intensity from these unstructured posts. The model's output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that LLMs, trained on vast internet data, may have developed a unique understanding of physical phenomena. Specifically, Google's Gemini models demonstrate a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, accurately describing observational data even though it's not identical to established models. These findings raise intriguing questions about the extent to which Gemini's training has led to a broader understanding of the physical world and its phenomena. The ability of Generative AI models like Gemini to generate results consistent with established scientific knowledge highlights their potential to augment our understanding of complex physical phenomena like earthquakes. The flexible and effective approach proposed in this study holds immense potential for enriching our understanding of the impact of physical phenomena and improving resilience during natural disasters. This research is a significant step toward harnessing the power of social media and AI for natural disaster mitigation, opening new avenues for understanding the emerging capabilities of Generative AI and LLMs for scientific applications.
Published: 2024

43. A Separation in Heavy-Tailed Sampling: Gaussian vs. Stable Oracles for Proximal Samplers

Author: He, Ye, Mousavi-Hosseini, Alireza, Balasubramanian, Krishnakumar, and Erdogdu, Murat A.
Subjects: Mathematics - Statistics Theory, Statistics - Machine Learning
Abstract: We study the complexity of heavy-tailed sampling and present a separation result in terms of obtaining high-accuracy versus low-accuracy guarantees, i.e., samplers that require only $O(\log(1/\varepsilon))$ versus $\Omega(\text{poly}(1/\varepsilon))$ iterations to output a sample which is $\varepsilon$-close to the target in $\chi^2$-divergence. Our results are presented for proximal samplers that are based on Gaussian versus stable oracles. We show that proximal samplers based on the Gaussian oracle have a fundamental barrier in that they necessarily achieve only low-accuracy guarantees when sampling from a class of heavy-tailed targets. In contrast, proximal samplers based on the stable oracle exhibit high-accuracy guarantees, thereby overcoming the aforementioned limitation. We also prove lower bounds for samplers under the stable oracle and show that our upper bounds cannot be fundamentally improved.
Published: 2024

44. Analytical photoresponses of Schottky contact MoS2 phototransistors

Author: Wei, Jianyong, Liu, Yumeng, Wang, Yizhuo, Li, Kai, Lian, Zhentao, Xie, Maosong, Yang, Xinhan, Khaleghi, Seyed Saleh Mousavi, Dai, Fuxing, Hu, Weida, Gao, Xuejiao, Yang, Rui, and Dan, Yaping
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
Abstract: High-gain photodetectors based on two-dimensional (2D) semiconductors, in particular those in photoconductive mode, have been extensively investigated in the past decade. However, the classical photoconductive theory was derived on two misplaced assumptions. In this work, we established an explicit analytical device model for Schottky contact MoS2 phototransistors that fits well with experimental data. From the fitting results, we found that the Richardson constant of the MoS2 Schottky contact is temperature dependent, indicating that the Schottky contacts for the 2D material are best described by the mixed thermionic emission and diffusion model. Based on this device model, we further established an analytical photoresponse for the few-layer MoS2 phototransistors, from which we found the voltage distribution on the two Schottky contacts and the channel, and extracted the minority carrier recombination lifetimes. The lifetimes are comparable with the values found from transient photoluminescence measurements, which therefore validates our analytical photoresponses for Schottky contact 2D semiconducting phototransistors.
Comment: 15 pages, 6 figures
Published: 2024

45. Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and Pruning

Author: Ahmadilivani, Mohammad Hasan, Mousavi, Seyedhamidreza, Raik, Jaan, Daneshtalab, Masoud, and Jenihhin, Maksim
Subjects: Computer Science - Machine Learning
Abstract: Convolutional Neural Networks (CNNs) have become integral in safety-critical applications, thus raising concerns about their fault tolerance. Conventional hardware-dependent fault tolerance methods, such as Triple Modular Redundancy (TMR), are computationally expensive, imposing a remarkable overhead on CNNs. Whereas fault tolerance techniques can be applied either at the hardware level or at the model level, the latter provides more flexibility without sacrificing generality. This paper introduces a model-level hardening approach for CNNs by integrating error correction directly into the neural networks. The approach is hardware-agnostic and does not require any changes to the underlying accelerator device. Analyzing the vulnerability of parameters enables the duplication of selective filters/neurons so that their output channels are effectively corrected with an efficient and robust correction layer. The proposed method demonstrates fault resilience nearly equivalent to TMR-based correction but with significantly reduced overhead. Nevertheless, there exists an inherent overhead to the baseline CNNs. To tackle this issue, a cost-effective parameter vulnerability based pruning technique is proposed that outperforms the conventional pruning method, yielding smaller networks with a negligible accuracy loss. Remarkably, the hardened pruned CNNs perform up to 24% faster than the hardened un-pruned ones.
Comment: 7 pages, 7 figures, 2 tables, 32 references, the paper is accepted at IOLTS 2024
Published: 2024

46. Short term vs. long term: optimization of microswimmer navigation on different time horizons

Author: Mousavi, Navid, Qiu, Jingran, Zhao, Lihao, Mehlig, Bernhard, and Gustavsson, Kristian
Subjects: Physics - Fluid Dynamics
Abstract: We use reinforcement learning to find strategies that allow microswimmers in turbulence to avoid regions of large strain. This question is motivated by the hypothesis that swimming microorganisms tend to avoid such regions to minimise the risk of predation. We ask which local cues a microswimmer must measure to efficiently avoid such straining regions. We find that it can succeed without directional information, merely by measuring the magnitude of the local strain. However, the swimmer avoids straining regions more efficiently if it can measure the sign of local strain gradients. We compare our results with those of an earlier study [Mousavi et al. arXiv:2309.09641] where a short-time expansion was used to find optimal strategies. We find that the short-time strategies work well in some cases but not in others. We derive a new theory that explains when the time-horizon matters for our optimisation problem, and when it does not. We find the strategy with best performance when the time-horizon coincides with the correlation time of the turbulent fluctuations. We also explain how the update frequency (the frequency at which the swimmer updates its state) affects the found strategies. We find that higher update frequencies yield better performance, as long as the time between updates is smaller than the correlation time of the flow.
Comment: 18 pages, 5 figures
Published: 2024

47. Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model

Author: Rahmanian, Mojdeh, Fakhrahmad, Seyed Mostafa, and Mousavi, Seyedeh Zahra
Subjects: Computer Science - Computation and Language, I.7
Abstract: Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts presents challenges in efficiently identifying participants. Natural Language Processing (NLP) techniques have emerged as a solution with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the sentences in each record most relevant to the eligibility criteria needed for the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences including concepts matched with the criteria-relevant terms were extracted. A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model achieved overall micro and macro F-measures of 0.9061 and 0.8060, which were among the highest scores achieved by the experiments performed with this dataset. Conclusion: The application of a prompt-based large language model in this study to classify patients based on eligibility criteria achieved promising scores. In addition, we proposed a method of extractive summarization with the aid of the SNOMED CT ontology that can also be applied to other medical texts.
Published: 2024

48. A Configurable Pythonic Data Center Model for Sustainable Cooling and ML Integration

Author: Naug, Avisek, Guillen, Antonio, Gutierrez, Ricardo Luna, Gundecha, Vineet, Ghorbanpour, Sahand, Mousavi, Sajad, Babu, Ashwin Ramesh, and Sarkar, Soumyendu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control
Abstract: There have been growing discussions on estimating and subsequently reducing the operational carbon footprint of enterprise data centers. The design and intelligent control for data centers have an important impact on data center carbon footprint. In this paper, we showcase PyDCM, a Python library that enables extremely fast prototyping of data center design and applies reinforcement learning-enabled control with the purpose of evaluating key sustainability metrics including carbon footprint, energy consumption, and observing temperature hotspots. We demonstrate these capabilities of PyDCM and compare them to existing works in EnergyPlus for modeling data centers. PyDCM can also be used as a standalone Gymnasium environment for demonstrating sustainability-focused data center control.
Comment: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning https://www.climatechange.ai/papers/neurips2023/15. arXiv admin note: substantial text overlap with arXiv:2310.03906
Published: 2024

49. Bayesian Inference for Estimating Heat Sources through Temperature Assimilation

Author: Mousavi, Hanieh and Eldredge, Jeff D.
Subjects: Statistics - Applications, Mathematics - Statistics Theory
Abstract: This paper introduces a Bayesian inference framework for two-dimensional steady-state heat conduction, focusing on the estimation of unknown distributed heat sources in a thermally-conducting medium with uniform conductivity. The goal is to infer heater locations, strengths, and shapes using temperature assimilation in the Euclidean space, employing a Fourier series to represent each heater's shape. The Markov Chain Monte Carlo (MCMC) method, incorporating the random-walk Metropolis-Hastings algorithm and parallel tempering, is utilized for posterior distribution exploration in both unbounded and wall-bounded domains. Strong correlations between heat strength and heater area prompt caution against simultaneously estimating these two quantities. It is found that multiple solutions arise in cases where the number of temperature sensors is less than the number of unknown states. Moreover, smaller heaters introduce greater uncertainty in estimated strength. The diffusive nature of heat conduction smooths out any deformations in the temperature contours, especially in the presence of multiple heaters positioned near each other, impacting convergence. In wall-bounded domains with Neumann boundary conditions, the inference of heater parameters tends to be more accurate than in unbounded domains.
Published: 2024

50. Sustainability of Data Center Digital Twins with Reinforcement Learning

Author: Sarkar, Soumyendu, Naug, Avisek, Guillen, Antonio, Luna, Ricardo, Gundecha, Vineet, Babu, Ashwin Ramesh, and Mousavi, Sajad
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multiagent Systems, Electrical Engineering and Systems Science - Systems and Control
Abstract: The rapid growth of machine learning (ML) has led to an increased demand for computational power, resulting in larger data centers (DCs) and higher energy consumption. To address this issue and reduce carbon emissions, intelligent design and control of DC components such as IT servers, cabinets, HVAC cooling, flexible load shifting, and battery energy storage are essential. However, the complexity of designing and controlling them in tandem presents a significant challenge. While some individual components like CFD-based design and Reinforcement Learning (RL) based HVAC control have been researched, there's a gap in the holistic design and optimization covering all elements simultaneously. To tackle this, we've developed DCRL-Green, a multi-agent RL environment that empowers the ML community to design data centers and research, develop, and refine RL controllers for carbon footprint reduction in DCs. It is a flexible, modular, scalable, and configurable platform that can handle large High Performance Computing (HPC) clusters. Furthermore, in its default setup, DCRL-Green provides a benchmark for evaluating single as well as multi-agent RL algorithms. It easily allows users to subclass the default implementations and design their own control approaches, encouraging community development for sustainable data centers. Open Source Link: https://github.com/HewlettPackard/dc-rl
Comment: 2024 Proceedings of the AAAI Conference on Artificial Intelligence
Published: 2024
