Author: "Goldstein, Louis" / Database: OpenAIRE - Searchworks@Jio Institute Digital Library Search Results

1. Deep Speech Synthesis from MRI-Based Articulatory Representations

Author: Wu, Peter, Li, Tingle, Lu, Yijing, Zhang, Yubin, Lian, Jiachen, Black, Alan W, Goldstein, Louis, Watanabe, Shinji, and Anumanchipalli, Gopala K.
Subjects: Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In this paper, we study articulatory synthesis, a speech synthesis method using human vocal tract information that offers a way to develop efficient, generalizable and interpretable synthesizers. While recent advances have enabled intelligible articulatory synthesis using electromagnetic articulography (EMA), these methods lack critical articulatory information like excitation and nasality, limiting generalization capabilities. To bridge this gap, we propose an alternative MRI-based feature set that covers a much more extensive articulatory space than EMA. We also introduce normalization and denoising procedures to enhance the generalizability of deep learning methods trained on MRI data. Moreover, we propose an MRI-to-speech model that improves both computational efficiency and speech fidelity. Finally, through a series of ablations, we show that the proposed MRI representation is more comprehensive than EMA and identify the most suitable MRI feature subset for articulatory synthesis.
Published: 2023
Full Text: View/download PDF

2. Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

Author: Lian, Jiachen, Black, Alan W, Lu, Yijing, Goldstein, Louis, Watanabe, Shinji, and Anumanchipalli, Gopala K.
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Articulatory representation learning is the fundamental research in modeling neural speech production system. Our previous work has established a deep paradigm to decompose the articulatory kinematics data into gestures, which explicitly model the phonological and linguistic structure encoded with human speech production mechanism, and corresponding gestural scores. We continue with this line of work by raising two concerns: (1) The articulators are entangled together in the original algorithm such that some of the articulators do not leverage effective moving patterns, which limits the interpretability of both gestures and gestural scores; (2) The EMA data is sparsely sampled from articulators, which limits the intelligibility of learned representations. In this work, we propose a novel articulatory representation decomposition algorithm that takes the advantage of guided factor analysis to derive the articulatory-specific factors and factor scores. A neural convolutive matrix factorization algorithm is then employed on the factor scores to derive the new gestures and gestural scores. We experiment with the rtMRI corpus that captures the fine-grained vocal tract contours. Both subjective and objective evaluation results suggest that the newly proposed system delivers the articulatory representations that are intelligible, generalizable, efficient and interpretable., Accepted to 2023 ICASSP. Camera Ready
Published: 2022

3. Complexity patterns underlying speech production activity

Author: Lancia, Leonardo, Li, Jinyu, Goldstein, Louis, LPP - Laboratoire de Phonétique et Phonologie - UMR 7018 (LPP), Université Sorbonne Nouvelle - Paris 3-Centre National de la Recherche Scientifique (CNRS), and University of Southern California (USC)
Subjects: [SCCO.LING]Cognitive science/Linguistics, ComputingMilieux_MISCELLANEOUS
Abstract: International audience
Published: 2020

4. Derivation of Fitts' law from the Task Dynamics model of speech production

Author: Sorensen, Tanner, Lammert, Adam, Goldstein, Louis, and Narayanan, Shrikanth
Subjects: Computer Science::Sound, Quantitative Biology - Neurons and Cognition, FOS: Biological sciences, Neurons and Cognition (q-bio.NC), Computer Science::Human-Computer Interaction
Abstract: Fitts' law is a linear equation relating movement time to an index of movement difficulty. The recent finding that Fitts' law applies to voluntary movement of the vocal tract raises the question of whether the theory of speech production implies Fitts' law. The present letter establishes a theoretical connection between Fitts' law and the Task Dynamics model of speech production. We derive a variant of Fitts' law where the intercept and slope are functions of the parameters of the Task Dynamics model and the index of difficulty is a product logarithm, or Lambert W function, rather than a logarithm., Comment: version before journal submission; 5 pages, 2 figures
Published: 2020
Full Text: View/download PDF

5. Noggin Nodding: Head Movement Correlates With Increased Effort in Accelerating Speech Production Tasks

Author: Tiede, Mark, Mooshammer, Christine, and Goldstein, Louis
Subjects: head movement, ddc:150, speech production, speech errors, EMA, articulatory entrainment, 150 Psychologie
Abstract: Movements of the head and speech articulators have been observed in tandem during an alternating word pair production task driven by an accelerating rate metronome. Word pairs contrasted either onset or coda dissimilarity with same word controls. Results show that as production effort increased, so did speaker head nodding, and that nodding increased abruptly following errors. More errors occurred under faster production rates, and in coda rather than onset alternations. The greatest entrainment between head and articulators was observed at the fastest rate under coda alternation. Neither jaw coupling nor imposed prosodic stress was observed to be a primary driver of head movement. In alternating pairs, nodding frequency tracked the slower alternation rate rather than the syllable rate, interpreted as recruitment of additional degrees of freedom to stabilize the alternation pattern under increasing production rate pressure.
Published: 2019
Full Text: View/download PDF

6. Temporal patterning in speech and birdsong

Author: Goldstein, Louis
Abstract: Speech and birdsong are complex motor behaviors in which patterning over time is itself informational. This is obvious in the case of speech, but in birdsong, too, the sequencing (and possibly timing) of syllables determines in part the well-formedness of the song. Despite gross differences in function, in the physical substrate (method of sound production), in brain structure, and in the scale of the animals, recent work has revealed a surprising degree of similarity in their solutions to the problem of controlling temporal patterning. There are differences, too, of course, and when we find them, it deepens our understanding about the (unique) structure of speech. Because Steve has had a lasting interest in birds and birdsong (Anderson 2006), this seemed to be an appropriate context to review these similarities. Two of them will be the focus of discussion here: decomposition of the behavior into a sequence of discrete motor units and the role of an internal clock system, partly independent of the units themselves.
Published: 2017
Full Text: View/download PDF

7. Global and local interaction of consonant type and tone in the Korean Accentual Phrase

Author: Yoonjeong Lee and Goldstein, Louis
Published: 2016
Full Text: View/download PDF

8. Resolving lexical ambiguity using sub-phonemic duration cues: did you hear 'place kin' or 'play skin'?

Author: Yoonjeong Lee, Kaiser, Elsi, and Goldstein, Louis
Published: 2014
Full Text: View/download PDF

9. Asymmetries in speech errors and their implications for underspecification

Author: Pouplier, Marianne and Goldstein, Louis
Subjects: ddc:410, Versprecher, Sprachproduktion, Sprachliches Merkmal, Psycholinguistik
Abstract: This paper follows a new perspective on speech errors within the framework of Articulatory Phonology, as proposed by Goldstein et al. (in prep.). On the basis of kinematic evidence, their work has demonstrated that speech errors are not restricted to categorical exchanges of position of segmental units, but rather gestures that compose segments can exhibit errors that vary from zero to maximal in magnitude. Here we report results from two perceptual experiments which use stimuli selected on the basis of their articulatory properties only, covering a range of errorful gestural activations. The outcome of the perceptual experiments suggests that different segments show different degrees of vulnerability to (subsegmental) speech errors: While listeners detected errors reliably for some segments, for other segments the reaction to errorful and non-errorful tokens was not distinct. The data suggest that at least for some error types an asymmetric error distribution arises due to perception, while production itself is not asymmetric. However, for error types involving segments whose gestural compositions stand in a subset relationship to each other (as described below), asymmetries may indeed originate in production due to the overall dominance of a gestural intrusion bias observed in the production data of Goldstein et al. (in prep.).
Published: 2013

10. Biomechanically preferred consonant-vowel combinations fail to appear in adult lexicons and spoken corpora

Author: Whalen, Douglas, Giulivi, Sara, Nam, Hosung, Levitt, Andrea, Hallé, Pierre, Goldstein, Louis, Haskins Laboratories, City University of New York [New York] (CUNY), Alma Mater Studiorum Università di Bologna [Bologna] (UNIBO), Wellesley College, LPP - Laboratoire de Phonétique et Phonologie - UMR 7018 (LPP), Université Sorbonne Nouvelle - Paris 3-Centre National de la Recherche Scientifique (CNRS), and University of Southern California (USC)
Subjects: consonant-vowel combinations, speech development, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, [SCCO.LING]Cognitive science/Linguistics, speech corpuses
Abstract: International audience; Certain consonant/vowel (CV) combinations are more frequent than would be expected from the individual C and V frequencies alone, both in babbling and, to a lesser extent, in adult language, based on dictionary counts: Labial consonants cooccur with central vowels more often than chance would dictate; coronals co-occur with front vowels, and velars with back vowels (Davis & MacNeilage, 1994). Plausible biomechanical explanations have been proposed, but it is also possible that infants are mirroring the frequency of the CVs that they hear. As noted, previous assessments of adult language were based on dictionaries; these "type" counts are incommensurate with the babbling measures, which are necessarily "token" counts. We analyzed the tokens in two spoken corpora for English, two for French and one for Mandarin. We found that the adult spoken CV preferences correlated with the type counts for Mandarin and French, not for English. Correlations between the adult spoken corpora and the babbling results had all three possible outcomes: significantly positive (French), uncorrelated (Mandarin), and significantly negative (English). There were no correlations of the dictionary data with the babbling results when we consider all nine combinations of consonants and vowels. The results indicate that spoken frequencies of CV combinations can differ from dictionary (type) counts and that the CV preferences apparent in babbling are biomechanically driven and can ignore the frequencies of CVs in the ambient spoken language.
Published: 2012

11. Coupled Oscillator Planning Model of Speech Timing and Syllable Structure

Author: Goldstein, Louis, Nam, Hosung, Saltzman, Elliot, Chitoran, Ioana, Haskins Laboratories, University of Southern California (USC), Boston University [Boston] (BU), Dartmouth College [Hanover], Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP (URP_3967)), Université de Paris (UP), Fant, C. Gunnar M., Fujisaki, Hiroya, Shen, Jiaxuan, Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP (EA_3967)), Université Paris Diderot - Paris 7 (UPD7), G. Fant, H. Fujisaki, and J. Shen
Subjects: [SCCO]Cognitive science, coupling graph, articulatory phonology, [SCCO.LING]Cognitive science/Linguistics, syllable structure
Abstract: Proceedings of the 8th Phonetic Conference of China and the International Symposium on Phonetic Frontiers in Phonetics and Speech Science; International audience; A fundamental problem in understanding speech production is how the temporal coherence of the speech units associated with a given lexical unit is maintained despite changes due to speaking rate, prosodic embedding, and transient perturbations. To address this, a dynamical model of temporal planning of speech has been developed [21, 13, 26, 27). In this model, each speech unit (constriction gesture) is associated with a planning oscillator, or clock, and the oscillators within the ensemble associated with a particular lexical item are coupled to one another in a pattern represented as a coupling graph. Given this model, it is possible to account for syllable structure in terms of intrinsic modes of coupling and the topology of the coupling graph. Onset consonant gestures are hypothesized to be coupled in-phase to the tautosyllabic vowel (regardless of how many there are in an onset), while coda consonant gestures are coupled in an anti-phase pattern. This topology can account simultaneously for regularities in relative timing and variability, and examples of this will be discussed. Despite successes obtained with the model, it is clear that there are examples in which the same syllable structure can exhibit different patterns of timing, depending, e.g., on the place of articulation of the consonants in the cluster [10], manner [7] and language [17]. In this paper, we will also illustrate how these different patterns of timing can be modeled using coupling graphs with differing topologies and/or with different quantitative specification of coupling strength associated with the graph's edges. .
Published: 2009

12. Syllable structure as coupled oscillator modes: Evidence from Georgian vs. Tashlhiyt Berber

Author: Goldstein, Louis, Chitoran, Ioana, Selkirk, Elisabeth, University of Southern California (USC), Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP (EA_3967)), Université Paris Diderot - Paris 7 (UPD7), Haskins Laboratories, Dartmouth College [Hanover], Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP (URP_3967)), Université de Paris (UP), University of Massachusetts [Amherst] (UMass Amherst), and University of Massachusetts System (UMASS)
Subjects: Tachelhit berber, [SCCO]Cognitive science, Georgian, syllabe structure, [SCCO.LING]Cognitive science/Linguistics, Tashlhiyt, syllable structure, ComputingMilieux_MISCELLANEOUS
Abstract: International audience
Published: 2007

13. Asymmetries in speech errors and their implications for underspecification

Author: Pouplier, Marianne and Goldstein, Louis
Subjects: Speech production, Distribution (number theory), Computer science, Speech recognition, Asymmetric distribution, Point (geometry), Single segment, Underspecification
Abstract: This paper follows a new perspective on speech errors within the framework of Articulatory Phonology, as proposed by Goldstein et al. (in prep.). On the basis of kinematic evidence, their work has demonstrated that speech errors are not restricted to categorical exchanges of position of segmental units, but rather gestures that compose segments can exhibit errors that vary from zero to maximal in magnitude. Here we report results from two perceptual experiments which use stimuli selected on the basis of their articulatory properties only, covering a range of errorful gestural activations. The outcome of the perceptual experiments suggests that different segments show different degrees of vulnerability to (subsegmental) speech errors: While listeners detected errors reliably for some segments, for other segments the reaction to errorful and non-errorful tokens was not distinct. The data suggest that at least for some error types an asymmetric error distribution arises due to perception, while production itself is not asymmetric. However, for error types involving segments whose gestural compositions stand in a subset relationship to each other (as described below), asymmetries may indeed originate in production due to the overall dominance of a gestural intrusion bias observed in the production data of Goldstein et al. (in prep.)., ZAS Papers in Linguistics, Bd. 28 (2002): Papers on phonetics and phonology: the articulation, acoustics and perception of consonants
Published: 2002
Full Text: View/download PDF

14. Gestural overlap and recoverability: articulatory evidence from Georgian

Author: Chitoran, Ioana, Goldstein, Louis, Byrd, Dani, Centre de Linguistique Inter-langues, de Lexicologie, de Linguistique Anglaise et de Corpus (CLILLAC-ARP (EA_3967)), Université Paris Diderot - Paris 7 (UPD7), and University of Southern California (USC)
Subjects: [SCCO]Cognitive science, [SCCO.LING]Cognitive science/Linguistics
Abstract: International audience; According to previous investigations of gestural patterning, consonant gestures exhibit less temporal overlap in a syllable/word onset than in a coda or across syllables. Additionally, front-to-back order of place of articulation in stop-stop sequences (labial- coronal, coronal-dorsal, labial-dorsal) exhibits more overlap than the opposite order. One possible account for these differences is that substantial overlap of obstruent gestures maythreaten their perceptual recoverability, particularly word/utterance-initially and in a back-to-front sequence. We report here on a magnetometer study of gestural overlap, investigating the role of perceptual recoverability. We focus on Georgian, which allows stop sequences in different positions in the word. C1C2 sequences were examined as a function of position in the word, and the order of place of articulation of C1 and C2. The predictions were borne out: more overlap was allowed in positions where recoverability of C1 is less easily compromised (word-internally and in front-to-back sequences). Similar recoverability requirements are proposed to account for consonant sequencing phenomena violating sonority. Georgian syllable onsets violate sonority, but are apparently sensitive to gestural recoverability requirements as reflected in overlap patterns. We propose thatsonority sequencing allows gestures to overlap while still allowing recoverability, but this function can apparently be filled in other ways.
Published: 2002

15. Opening Keynote Address Maryland State Comptroller

Author: Goldstein, Louis
Subjects: ComputingMilieux_GENERAL, ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS, InformationSystems_MODELSANDPRINCIPLES, ComputingMilieux_THECOMPUTINGPROFESSION, ComputingMilieux_COMPUTERSANDEDUCATION, otorhinolaryngologic diseases, population characteristics
Abstract: Keynote Speech on Maryland, agriculture, agricutural business in Maryland., American Association of Bovine Practitioners Proceedings of the Annual Conference, 1978
Published: 1978
Full Text: View/download PDF

16. WPP, No. 39: Three Studies in Speech Perception: Features, Relative Salience and Bias

Author: Goldstein, Louis
Published: 1977

17. Getting the rhythm right : A cross-linguistic study of segmental duration in babbling and first words

Author: Vihman, M.M., Nakai, S., DePaolis, R.A., Goldstein, Louis, Whalen, David, and Best, Catherine T.
Published: 2006

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

17 results on '"Goldstein, Louis"'

1. Deep Speech Synthesis from MRI-Based Articulatory Representations

2. Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

3. Complexity patterns underlying speech production activity

4. Derivation of Fitts' law from the Task Dynamics model of speech production

5. Noggin Nodding: Head Movement Correlates With Increased Effort in Accelerating Speech Production Tasks

6. Temporal patterning in speech and birdsong

7. Global and local interaction of consonant type and tone in the Korean Accentual Phrase

8. Resolving lexical ambiguity using sub-phonemic duration cues: did you hear 'place kin' or 'play skin'?

9. Asymmetries in speech errors and their implications for underspecification

10. Biomechanically preferred consonant-vowel combinations fail to appear in adult lexicons and spoken corpora

11. Coupled Oscillator Planning Model of Speech Timing and Syllable Structure

12. Syllable structure as coupled oscillator modes: Evidence from Georgian vs. Tashlhiyt Berber

13. Asymmetries in speech errors and their implications for underspecification

14. Gestural overlap and recoverability: articulatory evidence from Georgian

15. Opening Keynote Address Maryland State Comptroller

16. WPP, No. 39: Three Studies in Speech Perception: Features, Relative Salience and Bias

17. Getting the rhythm right : A cross-linguistic study of segmental duration in babbling and first words

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Database

Publisher

17 results on '"Goldstein, Louis"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources