Author: "ELYADERANI" / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

Your search for "ELYADERANI" returned 4 results.

1. Sequence-to-Sequence Multi-Modal Speech In-Painting

Author: Elyaderani, Mahsa Kadkhodaei and Shirani, Shahram
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Speech in-painting is the task of regenerating missing audio contents using reliable context information. Despite various recent studies in multi-modal perception of audio in-painting, there is still a need for an effective infusion of visual and auditory information in speech in-painting. In this paper, we introduce a novel sequence-to-sequence model that leverages the visual information to in-paint audio signals via an encoder-decoder architecture. The encoder plays the role of a lip-reader for facial recordings and the decoder takes both encoder outputs as well as the distorted audio spectrograms to restore the original speech. Our model outperforms an audio-only speech in-painting model and has comparable results with a recent multi-modal speech in-painter in terms of speech quality and intelligibility metrics for distortions of 300 ms to 1500 ms duration, which proves the effectiveness of the introduced multi-modality in speech in-painting.
Published: 2024

2. Robust Multi-Modal Speech In-Painting: A Sequence-to-Sequence Approach

Author: Elyaderani, Mahsa Kadkhodaei and Shirani, Shahram
Subjects: Computer Science - Multimedia, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The process of reconstructing missing parts of speech audio from context is called speech in-painting. Human perception of speech is inherently multi-modal, involving both audio and visual (AV) cues. In this paper, we introduce and study a sequence-to-sequence (seq2seq) speech in-painting model that incorporates AV features. Our approach extends AV speech in-painting techniques to scenarios where both audio and visual data may be jointly corrupted. To achieve this, we employ a multi-modal training paradigm that boosts the robustness of our model across various conditions involving acoustic and visual distortions. This makes our distortion-aware model a plausible solution for real-world challenging environments. We compare our method with existing transformer-based and recurrent neural network-based models, which attempt to reconstruct missing speech gaps ranging from a few milliseconds to over a second. Our experimental results demonstrate that our novel seq2seq architecture outperforms the state-of-the-art transformer solution by 38.8% in terms of enhancing speech quality and 7.14% in terms of improving speech intelligibility. We exploit a multi-task learning framework that simultaneously performs lip-reading (transcribing video components to text) while reconstructing missing parts of the associated speech.
Published: 2024

3. Improved Support Recovery Guarantees for the Group Lasso With Applications to Structural Health Monitoring

Author: Elyaderani, Mojtaba Kadkhodaie, Jain, Swayambhoo, Druce, Jeffrey, Gonella, Stefano, and Haupt, Jarvis
Subjects: Computer Science - Information Theory, Statistics - Machine Learning
Abstract: This paper considers the problem of estimating an unknown high-dimensional signal from noisy linear measurements, when the signal is assumed to possess a group-sparse structure in a known, fixed dictionary. We consider signals generated according to a natural probabilistic model, and establish new conditions under which the set of indices of the non-zero groups of the signal (called the group-level support) may be accurately estimated via the group Lasso. Our results strengthen existing coherence-based analyses that exhibit the well-known "square root" bottleneck, allowing for the number of recoverable nonzero groups to be nearly as large as the total number of groups. We also establish a sufficient recovery condition relating the number of nonzero groups and the signal to noise ratio (quantified in terms of the ratio of the squared Euclidean norms of nonzero groups and the variance of the random additive measurement noise), and validate this trend empirically. Finally, we examine the implications of our results in the context of a structural health monitoring application, where the group Lasso approach facilitates demixing of a propagating acoustic wavefield, acquired on the material surface by a scanning laser Doppler vibrometer, into antithetical components, one of which indicates the locations of internal material defects.
Published: 2017

4. Testing a Novel Self-Assembling Data Paradigm in the Context of IACT Data

Author: Weinstein, Amanda, Fortson, Lucy, Brantseg, Thomas, Rulten, Cameron, Lutz, Robyn, Haupt, Jarvis, Elyaderani, Mojtaba Kakhodaie, and Quinn, John
Subjects: Astrophysics - Instrumentation and Methods for Astrophysics, Astrophysics - High Energy Astrophysical Phenomena, High Energy Physics - Experiment, Physics - Data Analysis, Statistics and Probability
Abstract: The process of gathering and associating data from multiple sensors or sub-detectors due to a common physical event (the process of event-building) is used in many fields, including high-energy physics and γ-ray astronomy. Fault tolerance in event-building is a challenging problem that increases in difficulty with higher data throughput rates and increasing numbers of sub-detectors. We draw on biological self-assembly models in the development of a novel event-building paradigm that treats each packet of data from an individual sensor or sub-detector as if it were a molecule in solution. Just as molecules are capable of forming chemical bonds, "bonds" can be defined between data packets using metadata-based discriminants. A database -- which plays the role of a beaker of solution -- continually selects pairs of assemblies at random to test for bonds, which allows single packets and small assemblies to aggregate into larger assemblies. During this process higher-quality associations supersede spurious ones. The database thereby becomes fluid, dynamic, and self-correcting rather than static. We will describe tests of the self-assembly paradigm using our first fluid database prototype and data from the VERITAS γ-ray telescope.
Comment: In Proceedings of the 34th International Cosmic Ray Conference (ICRC2015), The Hague, The Netherlands
Published: 2015
