67 results on '"Story BH"'
Search Results
2. The Effects of Remote Signal Transmission and Recording on Acoustical Measures of Simulated Essential Vocal Tremor: Considerations for Remote Treatment Research and Telepractice.
- Author
- Lester-Smith RA, Jebaily CG, and Story BH
- Subjects
- Humans, Male, Female, Acoustics, Tremor diagnosis, Tremor therapy, Voice Quality, Speech Acoustics, Voice, Voice Disorders diagnosis, Voice Disorders therapy
- Abstract
Purpose: Studies on medical and behavioral interventions for essential vocal tremor (EVT) have shown inconsistent effects on acoustical and perceptual outcome measures across studies and across participants. Remote acoustical and perceptual assessments might facilitate studies with larger samples of participants and repeated measures that could clarify treatment effects and identify optimal treatment candidates. Furthermore, remote acoustical and perceptual assessment might allow clinicians to monitor clients' treatment responses and optimize treatment approaches during telepractice. Thus, the purpose of this study was to evaluate the accuracy of remote signal transmission and recording for acoustical and perceptual assessment of EVT. Method: Simulations of EVT were produced using a computational model and were recorded using local and remote procedures to represent client- and clinician-end recordings, respectively. Acoustical analyses measured the extent and rate of fundamental frequency (fo) and intensity modulation to represent vocal tremor severity and the cepstral peak prominence (CPPS) to represent voice quality. The data were analyzed using repeated measures analysis of variance (ANOVA) with recording as the within-subjects factor and sex of the computational model as the between-subjects factor. Results: There was a significant main effect of recording on the rate of fo modulation and significant interactions of recording and sex for the extent of intensity modulation, rate of intensity modulation, and CPPS. Post hoc pairwise comparisons and analysis of effect size indicated that recording procedures had the largest effect on the extent of intensity modulation for male simulations, the rate of intensity modulation for male and female simulations, and the CPPS for male and female simulations. Despite having disabled all known software and computer audio enhancing options and having stable ethernet connections, there was inconsistent attenuation of signal amplitude in remote recordings that was most problematic for samples with a breathy voice quality but also affected samples with typical and pressed voice qualities. Conclusions: Acoustical measures that correlate to perception of vocal tremor and voice quality were altered by remote signal transmission and recording. In particular, signal transmission and recording in Zoom altered time-based estimates of intensity modulation and CPPS with male and female simulations of EVT and magnitude-based estimates of intensity modulation with male simulations of EVT. In contrast, signal transmission and recording in Zoom minimally altered time- and magnitude-based estimates of fo modulation with male and female simulations of EVT. Therefore, acoustical and perceptual assessments of EVT should be performed using audio recordings that are collected locally on the participant- or client-end, particularly when measuring modulation of intensity and CPP or estimating vocal tremor severity and voice quality.
Development of procedures for collecting local audio recordings in remote settings may expand data collection for treatment research and enhance telepractice. Competing Interests: Declaration of Competing Interest: None. (Copyright © 2021 The Voice Foundation. Published by Elsevier Inc. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
3. The relation of velopharyngeal coupling area and vocal tract scaling to identification of stop-nasal cognates.
- Author
- Story BH and Bunton K
- Subjects
- Adult, Female, Male, Humans, Child, Preschool, Acoustics, Language, Nose, Speech, Larynx
- Abstract
The purpose of this study was to determine whether the threshold of velopharyngeal (VP) coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English was different for speech produced by a model based on an adult male, an adult female, and a 4-year-old child. V1CV2 stimuli were generated with a speech production model that encodes phonetic segments as relative acoustic targets imposed on an underlying vocal tract and laryngeal structure that can be scaled according to sex and age. Each V1CV2 was synthesized with a set of VP coupling functions whose maximum area ranged from 0 to 0.1 cm². Results showed that scaling the vocal tract and vocal folds had essentially no effect on the VP coupling area at which listener identification shifted from stop to nasal. The range of coupling areas at which the crossover occurred was 0.037-0.049 cm² for the male model, 0.040-0.055 cm² for the female model, and 0.039-0.052 cm² for the 4-year-old child model, and the overall mean was 0.044 cm². Calculations of band-limited peak nasalance indicated that 85% peak nasalance during the consonant was well aligned with listener responses. (© 2023 Acoustical Society of America.)
- Published
- 2023
- Full Text
- View/download PDF
4. Acoustical Theory of Vowel Modification Strategies in Belting.
- Author
- Herbst CT, Story BH, and Meyer D
- Abstract
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1 ≈ 2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo ≈ 0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1 ≈ 2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled. (Copyright © 2023 The Voice Foundation. Published by Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
5. Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing.
- Author
- Herbst CT and Story BH
- Subjects
- Male, Female, Humans, Computer Simulation, Sound, Vibration, Singing, Voice
- Abstract
A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
- Published
- 2022
- Full Text
- View/download PDF
6. Anatomic development of the upper airway during the first five years of life: A three-dimensional imaging study.
- Author
- Chuang YJ, Hwang SJ, Buhr KA, Miller CA, Avey GD, Story BH, and Vorperian HK
- Subjects
- Adult, Anatomic Landmarks, Child, Child, Preschool, Cross-Sectional Studies, Female, Humans, Male, Pharynx diagnostic imaging, Imaging, Three-Dimensional methods, Larynx
- Abstract
Purpose: Normative data on the growth and development of the upper airway across the sexes are needed for the diagnosis and treatment of congenital and acquired respiratory anomalies and to gain insight on developmental changes in speech acoustics and disorders with craniofacial anomalies. Methods: The growth of the upper airway in children ages birth to 5 years, as compared to adults, was quantified using an imaging database with computed tomography studies from typically developing individuals. Methodological criteria for scan inclusion and airway measurements included: head position, histogram-based airway segmentation, anatomic landmark placement, and development of a semi-automatic centerline for data extraction. A comprehensive set of 2D and 3D supra- and sub-glottal measurements from the choanae to tracheal opening were obtained including: naso-oro-laryngo-pharynx subregion volume and length, each subregion's superior and inferior cross-sectional-area, and antero-posterior and transverse/width distances. Results: Growth of the upper airway during the first 5 years of life was more pronounced in the vertical and transverse/lateral dimensions than in the antero-posterior dimension. By age 5 years, females have larger pharyngeal measurements than males. Prepubertal sex-differences were identified in the subglottal region. Conclusions: Our findings demonstrate the importance of studying the growth of the upper airway in 3D. As the lumen length increases, its shape changes, becoming increasingly elliptical during the first 5 years of life. This study also emphasizes the importance of methodological considerations for both image acquisition and data extraction, as well as the use of consistent anatomic structures in defining pharyngeal regions. Competing Interests: The authors have declared that no competing interests exist.
- Published
- 2022
- Full Text
- View/download PDF
7. The relation of velopharyngeal coupling area to the identification of stop versus nasal consonants in North American English based on speech generated by acoustically driven vocal tract modulations.
- Author
- Story BH and Bunton K
- Subjects
- Female, Humans, Male, North America, Phonetics, Speech Production Measurement, Speech, Speech Perception
- Abstract
The purpose of this study was to determine the threshold of velopharyngeal coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English, based on V1CV2 stimuli generated with a speech production model that encodes phonetic segments as relative acoustic targets. Each V1CV2 was synthesized with a set of velopharyngeal coupling functions whose area ranged from 0 to 0.1 cm². Results show that consonants were identified by listeners as a stop when the coupling area was less than 0.035-0.057 cm², depending on place of articulation and final vowel. The smallest coupling area (0.035 cm²) at which the stop-to-nasal switch occurred was found for an alveolar consonant in the /ɑCi/ context, whereas the largest (0.057 cm²) was for a bilabial in /ɑCɑ/. For each stimulus, the balance of oral versus nasal acoustic energy was characterized by the peak nasalance during the consonant. Stimuli with peak nasalance below 40% were mostly identified by listeners as stops, whereas those above 40% were identified as nasals. This study was intended to be a precursor to further investigations using the same model but scaled to represent the developing speech production system of male and female talkers.
- Published
- 2021
- Full Text
- View/download PDF
8. Identification of voiced stop consonants produced by acoustically driven vocal tract modulations.
- Author
- Story BH and Bunton K
- Subjects
- Acoustics, Speech Acoustics, Phonetics, Voice
- Abstract
A recently developed speech production model, in which speech segments are specified by relative acoustic events called resonance deflection patterns, was used to generate speech signals that were presented to listeners in a perceptual test. The purpose was to determine the effect of variations of the magnitude and polarity of the third resonance deflection on identification of the consonant in a V1CV2 disyllable while the deflections of the first and second resonances were held constant. Results showed that listeners' identification changed from /d/ to /ɡ/ when the polarity of the third resonance deflection switched from positive to negative.
- Published
- 2021
- Full Text
- View/download PDF
9. Apraxia of speech and the study of speech production impairments: Can we avoid further confusion? Reply to Romani (2021).
- Author
- Mailend ML, Maas E, and Story BH
- Subjects
- Confusion complications, Female, Humans, Speech, Speech Disorders, Speech Production Measurement, Aphasia complications, Apraxias etiology
- Abstract
We agree with Cristina Romani (CR) about reducing confusion and agree that the issues raised in her commentary are central to the study of apraxia of speech (AOS). However, CR critiques our approach from the perspective of basic cognitive neuropsychology. This is confusing and misleading because, contrary to CR's claim, we did not attempt to inform models of typical speech production. Instead, we relied on such models to study the impairment in the clinical category of AOS (translational cognitive neuropsychology). Thus, the approach along with the underlying assumptions is different. This response aims to clarify these assumptions, broaden the discussion regarding the methodological approach, and address CR's concerns. We argue that our approach is well-suited to meet the goals of our recent studies and is commensurate with the current state of the science of AOS. Ultimately, a plurality of approaches is needed to understand a phenomenon as complex as AOS.
- Published
- 2021
- Full Text
- View/download PDF
10. Examining speech motor planning difficulties in apraxia of speech and aphasia via the sequential production of phonetically similar words.
- Author
- Mailend ML, Maas E, Beeson PM, Story BH, and Forster KI
- Subjects
- Adult, Aged, Female, Humans, Male, Middle Aged, Reaction Time, Speech Production Measurement methods, Aphasia physiopathology, Apraxias physiopathology, Phonetics, Speech, Speech Disorders physiopathology
- Abstract
This study investigated the underlying nature of apraxia of speech (AOS) by testing two competing hypotheses. The Reduced Buffer Capacity Hypothesis argues that people with AOS can plan speech only one syllable at a time (Rogers and Storkel, 1999. Planning speech one syllable at a time: The reduced buffer capacity hypothesis in apraxia of speech. Aphasiology, 13(9-11), 793-805. https://doi.org/10.1080/026870399401885). The Program Retrieval Deficit Hypothesis states that selecting a motor programme is difficult in the face of competition from other simultaneously activated programmes (Mailend and Maas, 2013. Speech motor programming in apraxia of speech: Evidence from a delayed picture-word interference task. American Journal of Speech-Language Pathology, 22(2), S380-S396. https://doi.org/10.1044/1058-0360(2013/12-0101)). Speakers with AOS and aphasia, aphasia without AOS, and unimpaired controls were asked to prepare and hold a two-word utterance until a go-signal prompted a spoken response. Phonetic similarity between target words was manipulated. Speakers with AOS had longer reaction times in conditions with two similar words compared to two identical words. The Control and the Aphasia group did not show this effect. These results suggest that speakers with AOS need additional processing time to retrieve target words when multiple motor programmes are simultaneously activated.
- Published
- 2021
- Full Text
- View/download PDF
11. Effects of sampling rate and type of anti-aliasing filter on linear-predictive estimates of formant frequencies in men, women, and children.
- Author
- Milenkovic PH, Wagner M, Kent RD, Story BH, and Vorperian HK
- Subjects
- Child, Female, Humans, Male, Speech Acoustics, Acoustics, Speech
- Abstract
The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of number of LPC coefficients, anti-alias filter and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.
- Published
- 2020
- Full Text
- View/download PDF
12. A model of speech production based on the acoustic relativity of the vocal tract.
- Author
- Story BH and Bunton K
- Subjects
- Acoustics, Humans, Jaw physiology, Larynx physiology, Lip physiology, Male, Tongue physiology, Models, Biological, Speech physiology, Speech Production Measurement
- Abstract
A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. Because the time course of the events may be considerably overlapped in time, coarticulatory effects are automatically generated. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations.
- Published
- 2019
- Full Text
- View/download PDF
13. Speech motor planning in the context of phonetically similar words: Evidence from apraxia of speech and aphasia.
- Author
- Mailend ML, Maas E, Beeson PM, Story BH, and Forster KI
- Subjects
- Adult, Aged, Apraxias, Female, Humans, Individuality, Male, Middle Aged, Psychomotor Performance, Reaction Time, Anticipation, Psychological, Aphasia psychology, Phonetics, Speech, Speech Disorders psychology
- Abstract
The purpose of this study was to test two competing hypotheses about the nature of the impairment in apraxia of speech (AOS). The Reduced Buffer Capacity Hypothesis argues that people with AOS can hold only one syllable at a time in the speech motor planning buffer. The Program Retrieval Deficit Hypothesis states that people with AOS have difficulty accessing the intended motor program in the context where several motor programs are activated simultaneously. The participants included eight speakers with AOS, most of whom also had aphasia, nine speakers with aphasia without AOS, and 25 age-matched control speakers. The experimental paradigm prompted single word production following three types of primes. In most trials, prime and target were the same (e.g., bill-bill). On some trials, the initial consonant differed in one phonetic feature (e.g., bill-dill; Similar) or in all phonetic features (fill-bill; Different). The dependent measures were accuracy and reaction time. The results revealed a switch cost - longer reaction times in trials where the prime and target differed compared to trials where they were the same words - in all groups; however, the switch cost was significantly larger in the AOS group compared to the other two groups. These findings are in line with the prediction of the Program Retrieval Deficit Hypothesis and suggest that speakers with AOS have difficulty with selecting one program over another when several programs compete for selection. (Copyright © 2019 Elsevier Ltd. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
14. An age-dependent vocal tract model for males and females based on anatomic measurements.
- Author
- Story BH, Vorperian HK, Bunton K, and Durtschi RB
- Subjects
- Adult, Age Factors, Child, Child, Preschool, Female, Humans, Infant, Infant, Newborn, Male, Sex Factors, Vocal Cords diagnostic imaging, Child Development physiology, Sex Characteristics, Speech physiology, Vocal Cords anatomy & histology, Vocal Cords physiology
- Abstract
The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 yrs, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived, and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.
- Published
- 2018
- Full Text
- View/download PDF
15. Vowel space density as an indicator of speech performance.
- Author
- Story BH and Bunton K
- Subjects
- Humans, Signal Processing, Computer-Assisted, Sound Spectrography, Acoustics, Phonetics, Speech Acoustics, Speech Production Measurement methods, Voice Quality
- Abstract
The purpose of this study was to develop a method for visualizing and assessing the characteristics of vowel production by measuring the local density of normalized F1 and F2 formant frequencies. The result is a three-dimensional plot called the vowel space density (VSD) and indicates the regions in the vowel space most heavily used by a talker during speech production. The area of a convex hull enclosing the vowel space at specific threshold density values was proposed as a means of quantifying the VSD.
- Published
- 2017
- Full Text
- View/download PDF
16. An acoustically-driven vocal tract model for stop consonant production.
- Author
- Story BH and Bunton K
- Abstract
The purpose of this study was to further develop a multi-tier model of the vocal tract area function in which the modulations of shape to produce speech are generated by the product of a vowel substrate and a consonant superposition function. The new approach consists of specifying input parameters for a target consonant as a set of directional changes in the resonance frequencies of the vowel substrate. Using calculations of acoustic sensitivity functions, these "resonance deflection patterns" are transformed into time-varying deformations of the vocal tract shape without any direct specification of location or extent of the consonant constriction along the vocal tract. The configuration of the constrictions and expansions that are generated by this process were shown to be physiologically-realistic and produce speech sounds that are easily identifiable as the target consonants. This model is a useful enhancement for area function-based synthesis and can serve as a tool for understanding how the vocal tract is shaped by a talker during speech production.
- Published
- 2017
- Full Text
- View/download PDF
17. Influence of Left-Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis.
- Author
- Samlan RA and Story BH
- Subjects
- Biomechanical Phenomena, Humans, Vibration, Vocal Cords physiopathology, Computer Simulation, Models, Biological, Vocal Cord Paralysis physiopathology, Voice Quality physiology
- Abstract
Purpose: The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. Method: A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. Results: Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. Conclusions: Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.
- Published
- 2017
- Full Text
- View/download PDF
18. A Modeling Study of the Effects of Vocal Tract Movement Duration and Magnitude on the F2 Trajectory in CV Words.
- Author
- Neely KD, Bunton K, and Story BH
- Subjects
- Adult, Biomechanical Phenomena, Child, Female, Humans, Male, Phonetics, Computer Simulation, Models, Biological, Movement physiology, Speech physiology, Vocal Cords physiology
- Abstract
Purpose: This study used a computational vocal tract model to investigate the relationship of diphthong duration and vocal tract movement magnitude to measures of the F2 trajectory in CV words. Method: Three words (bough, boy, and buy) were simulated on the basis of an adult female vocal tract model, in which the model parameters were estimated from audio recordings of a female talker. Model parameters were then modified to generate 35 simulations of each word corresponding to 7 different durations and 5 movement magnitude settings. In addition, these simulations were repeated with vocal tract lengths representative of an adult male and an approximately 6-year-old child. Results: On the basis of univariate analysis, measures of frequency predicted changes in magnitude, and temporal measures predicted changes in speaking rate consistent with the hypothesis. The combined effects of duration and magnitude showed that F2 was more sensitive to changes in magnitude at shorter word durations compared with longer word durations. This finding held across words and vocal tract length. Conclusions: Results suggest that there is an interaction between duration and magnitude that affects the slope of the F2 trajectory. The next step is to relate kinematics to F2 trajectory output using real speakers.
- Published
- 2016
- Full Text
- View/download PDF
19. The effects of physiological adjustments on the perceptual and acoustical characteristics of vibrato as a model of vocal tremor.
- Author
- Lester-Smith RA and Story BH
- Subjects
- Adult, Female, Humans, Judgment, Male, Singing, Voice, Voice Quality, Young Adult, Tremor
- Abstract
The purpose of this study was to investigate the effects of physiological adjustments on listeners' perception of the magnitude of modulation of voice and to determine the characteristics of the acoustical modulations that explained listeners' judgments. This research was carried out using singers producing vibrato as a model of vocal tremor. Twenty healthy adults participated in a perceptual study involving pair-comparisons of the magnitude of "shakiness" with singers' samples, which differed by fundamental frequency, vocal quality, and vowel. Results revealed that listeners perceived a higher magnitude of voice modulation when female samples had a pressed vocal quality. Acoustical analyses were performed with voice samples to determine the features that predicted listeners' judgments. Based on regression analyses, listeners' judgments were predicted to some extent by modulation information in frequency bands across the spectrum.
- Published
- 2016
- Full Text
- View/download PDF
20. Arizona Child Acoustic Database Repository.
- Author
- Bunton K and Story BH
- Subjects
- Acoustics, Arizona, Child, Child, Preschool, Communication, Female, Humans, Language, Male, Phonetics, Speech Perception, Databases, Factual, Speech Acoustics
- Abstract
Objective: The goal of the Arizona Child Acoustic Database project was to obtain a large set of acoustic recordings, primarily vowels, collected from a cohort of children over a critical period of growth and development. Method: Data were recorded longitudinally from 63 children between the ages of 2;0 and 7;0 at 3-month intervals. The protocol included individual American English vowels and diphthongs, nonsense multi-vowel transitions, word level multi-vowel sequences (e.g., Hawaii), single-syllable words targeting each American English vowel, short sentences, and conversation. Results: Acoustic files are available for download through the University of Arizona Library Repository for use in future research projects. Conclusion: Longitudinal recordings may be of interest because they allow tracking of acoustic characteristics produced by an individual child during a period of rapid growth and speech development. (© 2016 S. Karger AG, Basel.)
- Published
- 2016
- Full Text
- View/download PDF
21. The effects of physiological adjustments on the perceptual and acoustical characteristics of simulated laryngeal vocal tremor.
- Author
- Lester RA and Story BH
- Subjects
- Acoustic Stimulation, Adolescent, Adult, Biomechanical Phenomena, Computer Simulation, Female, Glottis physiopathology, Humans, Judgment, Laryngeal Muscles physiopathology, Male, Middle Aged, Observer Variation, Phonetics, Psychoacoustics, Speech Acoustics, Vocal Cords physiopathology, Young Adult, Speech Disorders physiopathology, Speech Perception physiology, Tremor physiopathology, Voice Quality physiology
- Abstract
The purpose of this study was to determine if adjustments to the voice source [i.e., fundamental frequency (F0), degree of vocal fold adduction] or vocal tract filter (i.e., vocal tract shape for vowels) reduce the perception of simulated laryngeal vocal tremor and to determine if listener perception could be explained by characteristics of the acoustical modulations. This research was carried out using a computational model of speech production that allowed for precise control and manipulation of the glottal and vocal tract configurations. Forty-two healthy adults participated in a perceptual study involving pair-comparisons of the magnitude of "shakiness" with simulated samples of laryngeal vocal tremor. Results revealed that listeners perceived a higher magnitude of voice modulation when simulated samples had a higher mean F0, greater degree of vocal fold adduction, and vocal tract shape for /i/ vs /ɑ/. However, the effect of F0 was significant only when glottal noise was not present in the acoustic signal. Acoustical analyses were performed with the simulated samples to determine the features that affected listeners' judgments. Based on regression analyses, listeners' judgments were predicted to some extent by modulation information present in both low and high frequency bands.
- Published
- 2015
22. Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization.
- Author
-
Titze IR, Baken RJ, Bozeman KW, Granqvist S, Henrich N, Herbst CT, Howard DM, Hunter EJ, Kaelin D, Kent RD, Kreiman J, Kob M, Löfqvist A, McCoy S, Miller DG, Noé H, Scherer RC, Smith JR, Story BH, Švec JG, Ternström S, and Wolfe J
- Subjects
- Animals, Consensus, Humans, Linguistics standards, Phonetics, Sound, Speech-Language Pathology standards, Vibration, Acoustics, Linguistics classification, Speech Acoustics, Speech-Language Pathology classification, Terminology as Topic, Vocalization, Animal classification, Voice Quality
- Published
- 2015
23. Discriminating simulated vocal tremor source using amplitude modulation spectra.
- Author
-
Carbonell KM, Lester RA, Story BH, and Lotto AJ
- Subjects
- Humans, Speech Production Measurement methods, Speech Acoustics, Vocal Cords physiopathology, Voice Disorders physiopathology, Voice Quality physiology
- Abstract
Objectives/hypothesis: Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations. Study Design: Statistical categorization methods (discriminant function analysis) were used to discriminate signals from simulated vocal tremor with different sources using only acoustic measures derived from the amplitude envelopes. Methods: Simulations of vocal tremor were created by modulating parameters of a vocal fold model corresponding to oscillations of respiratory driving pressure (respiratory tremor), degree of vocal fold adduction (adductory tremor), and fundamental frequency of vocal fold vibration (F0 tremor). The acoustic measures were based on spectral analyses of the amplitude envelope computed across the entire signal and within select frequency bands. Results: The signals could be categorized (with accuracy well above chance) in terms of the simulated tremor source using only measures of the amplitude envelope spectrum even when multiple sources of tremor were included. Conclusions: These results supply initial support for an amplitude-envelope-based approach to identify the source of vocal tremor and provide further evidence for the rich information about talker characteristics present in the temporal structure of the amplitude envelope. (Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.)
- Published
- 2015
24. Formant measurement in children's speech based on spectral filtering.
- Author
-
Story BH and Bunton K
- Abstract
Children's speech presents a challenging problem for formant frequency measurement. In part, this is because high fundamental frequencies, typical of children's speech production, generate widely spaced harmonic components that may undersample the spectral shape of the vocal tract transfer function. In addition, there is often a weakening of upper harmonic energy and a noise component due to glottal turbulence. The purpose of this study was to develop a formant measurement technique based on cepstral analysis that does not require modification of the cepstrum itself or transformation back to the spectral domain. Instead, a narrow-band spectrum is low-pass filtered with a cutoff point (i.e., cutoff "quefrency" in the terminology of cepstral analysis) to preserve only the spectral envelope. To test the method, speech representative of a 2-3 year-old child was simulated with an airway modulation model of speech production. The model, which includes physiologically-scaled vocal folds and vocal tract, generates sound output analogous to a microphone signal. The vocal tract resonance frequencies can be calculated independently of the output signal and thus provide test cases that allow for assessing the accuracy of the formant tracking algorithm. When applied to the simulated child-like speech, the spectral filtering approach was shown to provide a clear spectrographic representation of formant change over the time course of the signal and to facilitate tracking formant frequencies for further analysis.
- Published
- 2015
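The spectral-filtering idea described in this abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a synthetic log spectrum with a single broad resonance plus harmonic ripple, and it uses a plain moving average along the frequency axis as a crude stand-in for the low-pass filter. The function name `moving_average` and every numeric value (grid spacing, fundamental frequency, resonance location) are hypothetical.

```python
import math

def moving_average(values, width):
    """Centered moving average; `width` should be odd. Windows are clipped at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

DF = 10.0         # spacing of the frequency grid in Hz (assumed)
F0 = 450.0        # fundamental frequency in Hz, high as in a young child (assumed)
FORMANT = 1100.0  # centre of the single toy resonance in Hz (assumed)

freqs = [k * DF for k in range(500)]

# Toy log spectrum in dB: one broad resonance plus ripple that peaks at each harmonic.
envelope = [20.0 * math.exp(-((f - FORMANT) ** 2) / (2 * 300.0 ** 2)) for f in freqs]
raw = [e + 6.0 * math.cos(2 * math.pi * f / F0) for e, f in zip(envelope, freqs)]

# A window exactly one harmonic spacing wide (45 bins here) averages the ripple away,
# which is a simple form of low-pass filtering along the frequency axis.
smoothed = moving_average(raw, width=int(F0 / DF))

raw_peak = freqs[raw.index(max(raw))]
smooth_peak = freqs[smoothed.index(max(smoothed))]
print(raw_peak, smooth_peak)
```

In this toy case the largest value of the raw spectrum sits near a harmonic, well below the resonance, while the largest value of the smoothed spectrum falls at the resonance itself. A practical filter would have a sharper cutoff than a moving average.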
25. Gender and vocal production mode discrimination using the high frequencies for speech and singing.
- Author
-
Monson BB, Lotto AJ, and Story BH
- Abstract
Humans routinely produce acoustical energy at frequencies above 6 kHz during vocalization, but this frequency range is often not represented in communication devices and speech perception research. Recent advancements toward high-definition (HD) voice and extended bandwidth hearing aids have increased the interest in the high frequencies. The potential perceptual information provided by high-frequency energy (HFE) is not well characterized. We found that humans can accomplish tasks of gender discrimination and vocal production mode discrimination (speech vs. singing) when presented with acoustic stimuli containing only HFE at both amplified and normal levels. Performance in these tasks was robust in the presence of low-frequency masking noise. No substantial learning effect was observed. Listeners also were able to identify the sung and spoken text (excerpts from "The Star-Spangled Banner") with very few exposures. These results add to the increasing evidence that the high frequencies provide at least redundant information about the vocal signal, suggesting that their representation in communication devices (e.g., cell phones, hearing aids, and cochlear implants) and speech/voice synthesizers could improve these devices and benefit normal-hearing and hearing-impaired listeners.
- Published
- 2014
26. Acoustic and perceptual effects of left-right laryngeal asymmetries based on computational modeling.
- Author
-
Samlan RA, Story BH, Lotto AJ, and Bunton K
- Subjects
- Adolescent, Adult, Aged, Computer Simulation, Female, Humans, Male, Middle Aged, Signal-To-Noise Ratio, Speech Acoustics, Vibration, Vocal Cords physiopathology, Young Adult, Larynx physiopathology, Speech physiology, Speech Perception physiology, Vocal Cord Paralysis physiopathology
- Abstract
Purpose: Computational modeling was used to examine the consequences of 5 different laryngeal asymmetries on acoustic and perceptual measures of vocal function. Method: A kinematic vocal fold model was used to impose 5 laryngeal asymmetries: adduction, edge bulging, nodal point ratio, amplitude of vibration, and starting phase. Thirty /a/ and /ɪ/ vowels were generated for each asymmetry and analyzed acoustically using cepstral peak prominence (CPP), harmonics-to-noise ratio (HNR), and 3 measures of spectral slope (H1*-H2*, B0-B1, and B0-B2). Twenty listeners rated voice quality for a subset of the productions. Results: Increasingly asymmetric adduction, bulging, and nodal point ratio explained significant variance in perceptual rating (R² = .05, p < .001). The same factors resulted in generally decreasing CPP, HNR, and B0-B2 and in increasing B0-B1. Of the acoustic measures, only CPP explained significant variance in perceived quality (R² = .14, p < .001). Increasingly asymmetric amplitude of vibration or starting phase minimally altered vocal function or voice quality. Conclusion: Asymmetries of adduction, bulging, and nodal point ratio drove acoustic measures and perception in the current study, whereas asymmetric amplitude of vibration and starting phase demonstrated minimal influence on the acoustic signal or voice quality.
- Published
- 2014
27. Structure, Movement, Sound, and Perception.
- Author
-
Story BH
- Abstract
Models that take the form of artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The article begins with a brief history of two artificial speaking devices that exemplify the representation of speech production as a system of modulations. The development of a recent airway modulation model is then described that simulates the time-varying changes of the vocal tract and acoustic wave propagation. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener.
- Published
- 2014
28. The perceptual significance of high-frequency energy in the human voice.
- Author
-
Monson BB, Hunter EJ, Lotto AJ, and Story BH
- Abstract
While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the focus on low-frequency energy appears to have arisen because the low-frequency portion of the speech spectrum was seen as sufficient (from a perceptual standpoint), or because the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the former. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired.
- Published
- 2014
29. Detection of high-frequency energy level changes in speech and singing.
- Author
-
Monson BB, Lotto AJ, and Story BH
- Subjects
- Acoustic Stimulation, Adult, Audiometry, Speech, Female, Humans, Male, Psychoacoustics, Sex Factors, Sound Spectrography, Young Adult, Pitch Discrimination, Singing, Speech Acoustics, Speech Perception, Voice Quality
- Abstract
Previous work has shown that human listeners are sensitive to level differences in high-frequency energy (HFE) in isolated vowel sounds produced by male singers. Results indicated that sensitivity to HFE level changes increased with overall HFE level, suggesting that listeners would be more "tuned" to HFE in vocal production exhibiting higher levels of HFE. It follows that sensitivity to HFE level changes should be higher (1) for female vocal production than for male vocal production and (2) for singing than for speech. To test this hypothesis, difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs speech but not in female vs male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6-11.3 kHz) but 8-10 dB in the 16-kHz octave (11.3-22 kHz). These scores are lower (better) than those previously reported for isolated vowels and some musical instruments.
- Published
- 2014
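The band limits quoted in this abstract are close to the usual definition of a one-octave band, whose edges lie a factor of the square root of 2 below and above the centre frequency. The short sketch below works through that arithmetic; the function name `octave_band_edges` is hypothetical, and the abstract does not state which band-edge convention the study used.

```python
import math

def octave_band_edges(center_hz):
    """Lower and upper edges of a one-octave band around `center_hz` (base-2 convention)."""
    return center_hz / math.sqrt(2), center_hz * math.sqrt(2)

for center in (8000.0, 16000.0):
    low, high = octave_band_edges(center)
    print(f"{center / 1000:.0f}-kHz octave: {low / 1000:.1f} to {high / 1000:.1f} kHz")
```

The computed edges agree with the quoted 5.6-11.3 kHz and 11.3-22 kHz ranges to within rounding, except that the computed upper edge of the 16-kHz band is somewhat above 22 kHz. The abstract does not explain the difference.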
30. Formant frequency estimation of high-pitched vowels using weighted linear prediction.
- Author
-
Alku P, Pohjalainen J, Vainio M, Laukkanen AM, and Story BH
- Subjects
- Adult, Algorithms, Biomechanical Phenomena, Child, Preschool, Computer Simulation, Female, Glottis anatomy & histology, Humans, Male, Numerical Analysis, Computer-Assisted, Pattern Recognition, Automated, Pressure, Signal Processing, Computer-Assisted, Sound Spectrography, Speech Production Measurement, Time Factors, Vocal Cords physiology, Glottis physiology, Linear Models, Phonation, Phonetics, Pitch Perception, Speech Acoustics, Voice Quality
- Abstract
All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. In order to address this problem, several all-pole modeling methods robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP) in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting downgrades the contribution of the main excitation of the vocal tract in optimizing the filter coefficients. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract leading to less biased formant estimates. By using synthetic vowels created with a physical modeling approach, the results showed that WLP-AME yields improved formant frequencies for high-pitched sounds in comparison to the previously known methods (e.g., relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicate that the formants detected by WLP-AME changed in a more regular manner between repetitions of different pitch than those computed by conventional LP.
- Published
- 2013
31. Relation of perceived breathiness to laryngeal kinematics and acoustic measures based on computational modeling.
- Author
-
Samlan RA, Story BH, and Bunton K
- Subjects
- Biomechanical Phenomena, Computer Simulation, Humans, Reproducibility of Results, Severity of Illness Index, Signal-To-Noise Ratio, Speech Production Measurement, Vocal Cords anatomy & histology, Voice Disorders diagnosis, Models, Biological, Respiration, Speech Acoustics, Vocal Cords physiology, Voice physiology, Voice Disorders physiopathology
- Abstract
Purpose: In this study, the authors sought to determine (a) how specific vocal fold structural and vibratory features relate to breathy voice quality and (b) the relation of perceived breathiness to 4 acoustic correlates of breathiness. Method: A computational, kinematic model of the vocal fold medial surfaces was used to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: vocal process separation, surface bulging, vibratory nodal point, and epilaryngeal constriction. Twelve naïve listeners rated breathiness of 364 samples relative to a reference. The degree of breathiness was then compared to (a) the underlying kinematic profile and (b) 4 acoustic measures: cepstral peak prominence (CPP), harmonics-to-noise ratio, and two measures of spectral slope. Results: Vocal process separation alone accounted for 61.4% of the variance in perceptual rating. Adding nodal point ratio and bulging to the equation increased the explained variance to 88.7%. The acoustic measure CPP accounted for 86.7% of the variance in perceived breathiness, and explained variance increased to 92.6% with the addition of one spectral slope measure. Conclusion: Breathiness ratings were best explained kinematically by the degree of vocal process separation and acoustically by CPP.
- Published
- 2013
32. Physiologic and acoustic patterns of essential vocal tremor.
- Author
-
Lester RA, Barkmeier-Kraemer J, and Story BH
- Subjects
- Biomechanical Phenomena, Computer Simulation, Elasticity, Humans, Larynx physiopathology, Oscillometry, Pressure, Respiration, Signal Processing, Computer-Assisted, Sound Spectrography, Speech Production Measurement, Stroboscopy, Tremor diagnosis, Video Recording, Voice Disorders diagnosis, Acoustics, Speech Acoustics, Tremor physiopathology, Voice Disorders physiopathology, Voice Quality
- Abstract
Objectives/hypothesis: This article describes a case study of physiologic and acoustic patterns of essential vocal tremor (EVT). Simulations of vocal tremor were used to test hypotheses regarding measured acoustic patterns and expected physiologic sources. Study Design: This is a case study of EVT using an analysis by synthesis approach. Methods: Oscillations of vocal tract and laryngeal structures were identified using rigid videostroboscopic examination. Acoustical analyses of sustained phonation were completed using the methods previously described in the literature and custom-written MATLAB functions. Simulations of the client's vocal tremor were created using a computational model. Results: The client exhibited vocal fold length changes and oscillation within the laryngeal vestibule during sustained phonation at a comfortable pitch and loudness. Despite the involvement of vocal fold length changes, a low average extent of fundamental frequency (F0) modulation (i.e., 5.3%) and high average extent of intensity modulation (i.e., 23.0%) were measured. Simulations of vocal tremor involving modulation of F0 demonstrated that this source of tremor contributes to frequency-induced intensity modulation, although there was a greater extent of F0 modulation than intensity modulation. Conclusions: The greater extent of intensity than F0 modulation in one client with EVT exhibiting predominant vocal fold length changes contrasted with the lower extent of intensity than F0 modulation in simulated vocal tremor involving F0 modulation. These findings demonstrate that other potential sources of intensity modulation outside the larynx should be determined during the evaluation of clients with vocal tremor. (Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.)
- Published
- 2013
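The abstract reports the extent of modulation as a percentage but does not give the formula. As a rough illustration only, the sketch below assumes one common way of expressing modulation extent, (max - min) / (max + min) × 100, applied to a synthetic F0 contour. The function name `modulation_extent_percent`, the formula, and the contour values are assumptions, not the authors' MATLAB method.

```python
import math

def modulation_extent_percent(contour):
    """Assumed definition: 100 * (max - min) / (max + min) over the whole contour."""
    high, low = max(contour), min(contour)
    return 100.0 * (high - low) / (high + low)

# Synthetic F0 contour (hypothetical values): 200 Hz mean, 5 Hz tremor rate,
# modulation depth of 5.3% of the mean, sampled at 100 frames per second for 1 s.
FRAME_RATE = 100
f0_contour = [
    200.0 * (1.0 + 0.053 * math.sin(2 * math.pi * 5.0 * n / FRAME_RATE))
    for n in range(FRAME_RATE)
]

print(round(modulation_extent_percent(f0_contour), 2))
```

For a purely sinusoidal modulation this measure returns the modulation depth, which is why a depth of 0.053 was chosen to mirror the 5.3% figure in the abstract. Real contours are rarely sinusoidal, and published analyses often average the measure cycle by cycle.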
33. Phrase-level speech simulation with an airway modulation model of speech production.
- Author
-
Story BH
- Abstract
Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulations of words and phrases are demonstrated.
- Published
- 2013
34. Acoustic characteristics of simulated respiratory-induced vocal tremor.
- Author
-
Lester RA and Story BH
- Subjects
- Adaptation, Physiological physiology, Aged, Humans, Male, Middle Aged, Plethysmography, Pressure, Speech physiology, Speech Production Measurement, Tremor diagnosis, Voice Disorders diagnosis, Respiratory Mechanics physiology, Speech Acoustics, Tremor physiopathology, Voice Disorders physiopathology, Voice Quality physiology
- Abstract
Purpose: The purpose of this study was to investigate the relation of respiratory forced oscillation to the acoustic characteristics of vocal tremor. Method: Acoustical analyses were performed to determine the characteristics of the intensity and fundamental frequency (F0) for speech samples obtained by Farinella, Hixon, Hoit, Story, and Jones (2006) using a respiratory forced oscillation paradigm with 5 healthy adult males to simulate vocal tremor involving respiratory pressure modulation. The analyzed conditions were sustained productions of /a/ with amplitudes of applied pressure of 0, 1, 2, and 4 cmH2O and a rate of 5 Hz. Results: Forced oscillation of the respiratory system produced modulation of the intensity and F0 for all participants. Variability was observed between participants and conditions in the change in intensity and F0 per unit of pressure change, as well as in the mean intensity and F0. However, the extent of modulation of intensity and F0 generally increased as the applied pressure increased, as would be expected. Conclusion: These findings suggest that individuals develop idiosyncratic adaptations to pressure modulations, which are important to understanding aspects of variability in vocal tremor, and highlight the need to assess all components of the speech mechanism that may be directly or indirectly affected by tremor.
- Published
- 2013
35. The relation of nasality and nasalance to nasal port area based on a computational model.
- Author
-
Bunton K and Story BH
- Subjects
- Auditory Perception, Humans, Models, Theoretical, Nose physiopathology, Speech Disorders physiopathology, Speech Production Measurement methods, Velopharyngeal Insufficiency physiopathology, Voice Quality
- Abstract
Objective: The purpose of this study was to examine the relation of perceptual ratings of nasality by experienced listeners, measures of nasalance, and the size of the nasal port opening for three simulated English corner vowels, /i/, /u/, and /a/. Design: Samples were generated using a computational model that allowed for exact control of nasal port size and a direct measure of nasalance. Perceptual ratings were obtained using a paired-stimulus presentation. Participants: Five experienced listeners. Main Outcome Measures: Measures of nasalance and perceptual nasality ratings. Results: Differences in nasalance and perceptual ratings of nasality were noted among the three vowels, with values being greater for the high vowels /i/ and /u/ compared to the low vowel /a/. Listeners detected nasality for the high and low vowels simulated with nasal port areas of 0.01 and 0.15 cm², respectively. Correlations between ratings of nasality and nasalance were high for all three vowels. Conclusions: Results of the present study show a high correlation between ratings of nasality and measures of nasalance for nasal port areas ranging from 0 to 0.5 cm². The correlations were based on sustained vowel samples. The restricted speech sample limits generalization of the findings to clinical data; however, the results are a demonstration of the usefulness of modeling to understand the perceptual phenomena of nasality.
- Published
- 2012
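Nasalance is conventionally expressed as the nasal share of the combined nasal and oral acoustic output, in percent. The sketch below assumes that conventional ratio, computed from RMS amplitudes of separate nasal and oral signals. The abstract does not say exactly how the model's "direct measure" was computed, so the function names and the toy signals are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def nasalance_percent(nasal, oral):
    """Assumed conventional definition: 100 * nasal / (nasal + oral), using RMS amplitudes."""
    nasal_rms, oral_rms = rms(nasal), rms(oral)
    return 100.0 * nasal_rms / (nasal_rms + oral_rms)

# Toy signals (hypothetical): the oral output has three times the nasal amplitude.
tone = [math.sin(2 * math.pi * 100.0 * n / 8000.0) for n in range(8000)]
nasal_signal = [1.0 * x for x in tone]
oral_signal = [3.0 * x for x in tone]

print(round(nasalance_percent(nasal_signal, oral_signal), 1))
```

With these toy signals the nasal share is one quarter of the total, so the function returns 25%. Clinical instruments usually band-limit both channels before taking the ratio, which this sketch omits.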
36. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives.
- Author
-
Monson BB, Lotto AJ, and Story BH
- Subjects
- Adult, Aged, Analysis of Variance, Female, Humans, Male, Middle Aged, Sex Factors, Signal Processing, Computer-Assisted, Sound Spectrography, Speech Production Measurement, Voice Quality, Young Adult, Singing, Speech Acoustics, Voice
- Abstract
The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
- Published
- 2012
37. Horizontal directivity of low- and high-frequency energy in speech and singing.
- Author
-
Monson BB, Hunter EJ, and Story BH
- Subjects
- Adult, Aged, Female, Humans, Male, Middle Aged, Sex Factors, Sound Spectrography, Speech Acoustics, Speech Production Measurement, Music, Speech physiology, Voice physiology
- Abstract
Speech and singing directivity in the horizontal plane was examined using simultaneous multi-channel full-bandwidth recordings to investigate directivity of high-frequency energy, in particular. This method allowed not only for accurate analysis of running speech using the long-term average spectrum, but also for examination of directivity of separate transient phonemes. Several vocal production factors that could affect directivity were examined. Directivity differences were not found between modes of production (speech vs singing) and only slight differences were found between genders and production levels (soft vs normal vs loud), more pronounced in the higher frequencies. Large directivity differences were found between specific voiceless fricatives, with /s, ʃ/ more directional than /f, θ/ in the 4-, 8-, and 16-kHz octave bands.
- Published
- 2012
38. Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling.
- Author
-
Samlan RA and Story BH
- Subjects
- Biomechanical Phenomena, Humans, Models, Biological, Speech Acoustics, Vibration, Computer Simulation, Phonation, Sound Spectrography methods, Vocal Cords, Voice Quality
- Abstract
Purpose: To relate vocal fold structure and kinematics to 2 acoustic measures: cepstral peak prominence (CPP) and the amplitude of the first harmonic relative to the second (H1-H2). Method: The authors used a computational, kinematic model of the medial surfaces of the vocal folds to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: degree of vocal fold adduction, surface bulging, vibratory nodal point, and supraglottal constriction. CPP and H1-H2 were measured from simulated glottal area, glottal flow, and acoustic waveforms and were related to the underlying vocal fold kinematics. Results: CPP decreased with increased separation of the vocal processes, whereas the nodal point location had little effect. H1-H2 increased as a function of separation of the vocal processes in the range of 1.0 mm to 1.5 mm and decreased with separation > 1.5 mm. Conclusions: CPP is generally a function of vocal process separation. H1*-H2* (see paragraph 6 of article text for an explanation of the asterisks) will increase or decrease with vocal process separation on the basis of vocal fold shape, pivot point for the rotational mode, and supraglottal vocal tract shape, limiting its utility as an indicator of breathy voice. Future work will relate the perception of breathiness to vocal fold kinematics and acoustic measures.
- Published
- 2011
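H1-H2, as defined in this abstract, is the amplitude of the first harmonic relative to the second, usually expressed in dB. The sketch below illustrates that arithmetic on a synthetic two-harmonic signal, estimating each harmonic amplitude with a single-bin discrete Fourier sum. It assumes F0 is known and that the analysis window spans a whole number of periods; the function names and signal values are hypothetical, and no formant correction (the starred H1*-H2* variant) is applied.

```python
import math

def harmonic_amplitude(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq`, from a single-bin Fourier sum."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def h1_minus_h2_db(signal, fs, f0):
    """Level difference in dB between the first and second harmonics."""
    h1 = harmonic_amplitude(signal, fs, f0)
    h2 = harmonic_amplitude(signal, fs, 2.0 * f0)
    return 20.0 * math.log10(h1 / h2)

FS = 8000    # sampling rate in Hz (assumed)
F0 = 200.0   # fundamental frequency in Hz (assumed)
N = 800      # 0.1 s, exactly 20 periods of F0

# Synthetic signal: first harmonic with amplitude 1.0, second with amplitude 0.5.
signal = [
    1.0 * math.sin(2 * math.pi * F0 * i / FS) + 0.5 * math.sin(2 * math.pi * 2 * F0 * i / FS)
    for i in range(N)
]

print(round(h1_minus_h2_db(signal, FS, F0), 2))
```

Because the second harmonic has half the amplitude of the first, the result is 20·log10(2), about 6.02 dB. On real speech, F0 must be estimated first and a tapered window is normally applied before measuring harmonic levels.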
39. Relation of vocal tract shape, formant transitions, and stop consonant identification.
- Author
-
Story BH and Bunton K
- Subjects
- Acoustic Stimulation methods, Adolescent, Biomechanical Phenomena, Female, Humans, Magnetic Resonance Imaging, Male, Speech Intelligibility, Speech Perception, Speech Production Measurement, Vocal Cords anatomy & histology, Young Adult, Models, Biological, Phonation physiology, Phonetics, Vocal Cords physiology
- Abstract
Purpose: The present study was designed to investigate the relation of formant transitions to place-of-articulation for stop consonants. A speech production model was used to generate simulated utterances containing voiced stop consonants, and a perceptual experiment was performed to test their identification by listeners. Method: Based on a model of the vocal tract shape, a theoretical basis for reducing highly variable formant transitions to more invariant formant deflection patterns as a function of constriction location was proposed. A speech production model was used to simulate vowel-consonant-vowel (VCV) utterances for 3 underlying vowel-vowel contexts and for which the constriction location was incrementally moved from the lips toward the velar part of the vocal tract. These simulated VCVs were presented to listeners who were asked to identify the consonant. Results: Listener responses indicated that phonetic boundaries were well aligned with points along the vocal tract length where there was a shift in the deflection polarity of either the 2nd or 3rd formant. Conclusions: This study demonstrated that regions of the vocal tract exist that, when constricted, shift the formant frequencies in a predictable direction. Based on a perceptual experiment, the boundaries of these acoustically defined regions were shown to coincide with phonetic categories for stop consonants.
- Published
- 2010
40. Identification of synthetic vowels based on a time-varying model of the vocal tract area function.
- Author
-
Bunton K and Story BH
- Subjects
- Acoustic Stimulation, Adult, Biomechanical Phenomena, Computer Simulation, Female, Humans, Male, Sound Spectrography, Time Factors, Transducers, Vocal Cords anatomy & histology, Young Adult, Acoustics instrumentation, Models, Theoretical, Phonation, Signal Processing, Computer-Assisted, Speech Acoustics, Speech Intelligibility, Vocal Cords physiology
- Abstract
The purpose of this study was to conduct an identification experiment with synthetic vowels based on the same sets of speaker-dependent area functions as in Bunton and Story [(2009) J. Acoust. Soc. Am. 125, 19-22], but with additional time-varying characteristics that are more representative of natural speech. The results indicated that vowels synthesized using an area function model that allows for time variation of the vocal tract shape and includes natural vowel durations were more accurately identified for 7 of 11 English vowels than those based on static area functions.
- Published
- 2010
41. Vowel and consonant contributions to vocal tract shape.
- Author
-
Story BH
- Subjects
- Algorithms, Biomechanical Phenomena, Female, Humans, Jaw physiology, Pharynx physiology, Time Factors, Young Adult, Models, Biological, Mouth physiology, Phonation physiology, Phonetics
- Abstract
The purpose of this study was to develop a method by which a vowel-consonant-vowel (VCV) utterance based on x-ray microbeam articulatory data could be separated into a vowel-to-vowel transition and a consonant superposition function. The result is a model that represents a vowel sequence as a time-dependent perturbation of the neutral vocal tract shape governed by coefficients of canonical deformation patterns. Consonants were modeled as superposition functions that can force specific portions of the vocal tract shape to be constricted or expanded, over a specific time course. The three VCVs [pa], [ta], and [ka], produced by one female speaker, were analyzed and reconstructed with the developed model. They were shown to be reasonable approximations of the original VCVs, as assessed qualitatively by visual inspection and quantitatively by calculating rms error and correlation coefficients. This establishes a method for future modeling of other speech material.
- Published
- 2009
42. Vocal tract modes based on multiple area function sets from one speaker.
- Author
-
Story BH
- Subjects
- Algorithms, Humans, Larynx anatomy & histology, Male, Mouth anatomy & histology, Principal Component Analysis, Larynx physiology, Mouth physiology, Phonetics, Speech physiology
- Abstract
The purpose of this study was to derive vocal tract modes from a wider range of vowel area functions for a specific speaker than has been previously reported. Area functions from Story et al. [(1996). J. Acoust. Soc. Am. 100, 537-554] and Story [(2008). J. Acoust. Soc. Am. 123, 327-335] were combined in a composite set from which modes were derived with principal component analysis. Along with scaling coefficients, these modes were used to generate a [F1, F2] formant space. In comparison to formant spaces similarly generated based on the two area function sets alone, the combined version provides a wider range of both F1 and F2 values. This new set of modes may be useful for inverse mapping of formant frequencies to area functions or for modeling of vocal tract shape changes.
- Published
- 2009
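The mode derivation summarized in record 42 can be illustrated with a minimal sketch. It assumes principal component analysis is applied directly to a matrix of area functions (one row per vowel) via a singular value decomposition. The function name `derive_modes`, the 44-section tract discretization, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def derive_modes(area_functions, n_modes=2):
    """PCA of area functions (rows = vowels, columns = tract sections).

    Hypothetical helper: returns the mean shape, the first n_modes
    orthonormal modes, per-vowel scaling coefficients, and the fraction
    of variance each retained mode explains.
    """
    X = np.asarray(area_functions, dtype=float)
    mean_shape = X.mean(axis=0)
    centered = X - mean_shape
    # Rows of Vt are the principal directions (modes) of the centered data.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    modes = Vt[:n_modes]
    coeffs = centered @ modes.T
    explained = (s ** 2) / np.sum(s ** 2)
    return mean_shape, modes, coeffs, explained[:n_modes]

# Synthetic stand-in data (NOT measured area functions): ten "vowels"
# built from a neutral shape plus two smooth deformation patterns.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 44)
base = 2.0 + np.sin(np.pi * x)
pattern_a = np.cos(np.pi * x)
pattern_b = np.cos(2.0 * np.pi * x)
weights = rng.normal(size=(10, 2))
areas = base + 0.5 * weights[:, :1] * pattern_a + 0.3 * weights[:, 1:] * pattern_b

mean_shape, modes, coeffs, explained = derive_modes(areas)
# Each vowel is approximated as the mean shape plus scaled modes.
reconstructed = mean_shape + coeffs @ modes
```

Because the synthetic set is built from only two deformation patterns, two modes reconstruct it essentially exactly; measured area functions would generally leave a residual.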
43. Identification of synthetic vowels based on selected vocal tract area functions.
- Author
- Bunton K and Story BH
- Subjects
- Adult, Female, Humans, Male, Phonation physiology, Phonetics, Speech Perception, Speech, Alaryngeal, Vocal Cords physiology
- Abstract
The purpose of this study was to determine the degree to which synthetic vowel samples based on previously reported vocal tract area functions of eight speakers could be accurately identified by listeners. Vowels were synthesized with a wave-reflection type of vocal tract model coupled to a voice source. A particular vowel was generated by specifying an area function that had been derived from previous magnetic resonance imaging based measurements. The vowel samples were presented to ten listeners in a forced choice paradigm in which they were asked to identify the vowel. Results indicated that the vowels [i], [ae], and [u] were identified most accurately for all speakers. The identification errors for the other vowels were typically due to confusions with adjacent vowels.
- Published
- 2009
44. Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002.
- Author
- Story BH
- Subjects
- Adult, Humans, Larynx physiology, Male, Pharynx physiology, Speech Production Measurement, Time Factors, Voice Quality physiology, Larynx anatomy & histology, Magnetic Resonance Imaging, Pharynx anatomy & histology, Phonetics, Vocal Cords physiology
- Abstract
A new set of area functions for vowels has been obtained with magnetic resonance imaging from the same speaker as that previously reported in 1996 [Story et al., J. Acoust. Soc. Am. 100, 537-554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on magnetic resonance images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intraspeaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information.
- Published
- 2008
45. A comparison of vocal tract perturbation patterns based on statistical and acoustic considerations.
- Author
- Story BH
- Subjects
- Glottis physiology, Humans, Lip physiology, Sensitivity and Specificity, Models, Statistical, Oropharynx physiology, Phonation physiology, Phonetics, Pulmonary Ventilation physiology, Sound Spectrography, Speech Acoustics
- Abstract
The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies.
- Published
- 2007
46. Effects of binaural electronic hearing protectors on localization and response time to sounds in the horizontal plane.
- Author
- Carmichel EL, Harris FP, and Story BH
- Subjects
- Adult, Female, Hearing Tests methods, Humans, Male, Noise, Transportation, United States, Ear Protective Devices standards, Electronics, Reaction Time, Sound Localization
- Abstract
The effects of electronic hearing protector devices (HPDs) on localization and response time (RT) to stimuli were assessed at six locations in the horizontal plane. The stimuli included a firearm loading, a telephone ringing, and 0.5-kHz and 4-kHz tonebursts presented during continuous traffic noise. Eight normally hearing adult listeners were evaluated under two conditions: (a) ears unoccluded; (b) ears occluded with one of three amplitude-sensitive sound transmission HPDs. All HPDs were found to affect localization, and performance was dependent on stimuli and location. RT was shorter in the unoccluded condition than in any of the HPD conditions for the broadband stimuli. In the HPD conditions, RT to incorrect responses was significantly less than RT to correct responses for 120 degrees and 240 degrees, the two locations with the greatest number of errors. The RTs to incorrect responses were significantly greater than to correct responses for 60 degrees and 300 degrees, the two locations with the fewest errors. The HPDs assessed in this study did not preserve localization ability under most stimulus conditions.
- Published
- 2007
47. Effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson disease: acoustic and perceptual findings.
- Author
- Sapir S, Spielman JL, Ramig LO, Story BH, and Fox C
- Subjects
- Aged, Aged, 80 and over, Female, Humans, Male, Middle Aged, Phonetics, Reproducibility of Results, Speech, Speech Acoustics, Speech Therapy standards, Treatment Outcome, Dysarthria etiology, Dysarthria therapy, Parkinson Disease complications, Speech Perception, Speech Therapy methods
- Abstract
Purpose: To evaluate the effects of intensive voice treatment targeting vocal loudness (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson's disease (PD). Method: A group of individuals with PD receiving LSVT (n = 14) was compared to a group of individuals with PD not receiving LSVT (n = 15) and a group of age-matched healthy individuals (n = 14) on the variables vocal sound pressure level (VocSPL); various measures of the first (F1) and second (F2) formants of the vowels /i/, /u/, and /a/; vowel triangle area; and perceptual vowel ratings. The vowels were extracted from the words "key," "stew," and "Bobby" embedded in phrases. Perceptual vowel rating was performed by trained raters using a visual analog scale. Results: Only VocSPL, F2 of the vowel /u/ (F2u), and the ratio F2i/F2u significantly differed between patients and healthy individuals pretreatment. These variables, along with perceptual vowel ratings, significantly changed (improved) in the group receiving LSVT only. Conclusion: These results, along with previous findings, add further support to the generalized therapeutic impact of intensive voice treatment on orofacial functions (speech, swallowing, facial expression) and respiratory and laryngeal functions in individuals with PD.
- Published
- 2007
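Two of the acoustic measures named in record 47, vowel triangle area and the ratio F2i/F2u, can be sketched as follows. The abstract does not say how the area was computed; this sketch assumes the standard shoelace formula applied to the (F1, F2) points of /i/, /u/, and /a/. The function names and formant values are hypothetical illustrations, not data from the study.

```python
def vowel_triangle_area(i, u, a):
    """Area of the triangle spanned by three (F1, F2) points, in Hz^2.

    Uses the shoelace formula; the choice of formula is an assumption.
    """
    (x1, y1), (x2, y2), (x3, y3) = i, u, a
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

def f2_ratio(i, u):
    """Ratio of the second formant of /i/ to that of /u/ (F2i/F2u)."""
    return i[1] / u[1]

# Hypothetical (F1, F2) values in Hz, chosen only for illustration.
vowel_i = (300.0, 2300.0)
vowel_u = (350.0, 900.0)
vowel_a = (750.0, 1300.0)

area = vowel_triangle_area(vowel_i, vowel_u, vowel_a)  # 290000.0 Hz^2
ratio = f2_ratio(vowel_i, vowel_u)
```

A larger area or a larger F2i/F2u ratio indicates vowels that are more widely separated in the formant plane.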
48. Time dependence of vocal tract modes during production of vowels and vowel sequences.
- Author
- Story BH
- Subjects
- Databases, Factual, Humans, Language, Larynx physiology, Models, Biological, Phonation, Speech physiology, Voice physiology
- Abstract
Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract.
- Published
- 2007
49. Simulation and analysis of nasalized vowels based on magnetic resonance imaging data.
- Author
- Pruthi T, Espy-Wilson CY, and Story BH
- Subjects
- Adult, Computer Simulation, Glottis physiology, Humans, Larynx anatomy & histology, Larynx physiology, Male, Paranasal Sinuses anatomy & histology, Paranasal Sinuses physiology, Sound, Vocal Cords physiology, Hearing physiology, Language, Magnetic Resonance Imaging methods, Nose physiology, Speech
- Abstract
In this study, vocal tract area functions for one American English speaker, recorded using magnetic resonance imaging, were used to simulate and analyze the acoustics of vowel nasalization. Computer vocal tract models and susceptance plots were used to study the three most important sources of acoustic variability involved in the production of nasalized vowels: velar coupling area, asymmetry of nasal passages, and the sinus cavities. Analysis of the susceptance plots of the pharyngeal and oral cavities, -(B(p)+B(o)), and the nasal cavity, B(n), helped in understanding the movement of poles and zeros with varying coupling areas. Simulations using two nasal passages clearly showed the introduction of extra pole-zero pairs due to the asymmetry between the passages. Simulations with the inclusion of maxillary and sphenoidal sinuses showed that each sinus can potentially introduce one pole-zero pair in the spectrum. Further, the right maxillary sinus introduced a pole-zero pair at the lowest frequency. The effective frequencies of the poles and zeros that the sinuses contribute to the sum of the oral and nasal cavity outputs change with the configuration of the oral cavity, which may vary with the coupling area or with the vowel being articulated.
- Published
- 2007
50. Simulated effects of cricothyroid and thyroarytenoid muscle activation on adult-male vocal fold vibration.
- Author
- Lowell SY and Story BH
- Subjects
- Adult, Humans, Male, Models, Biological, Sound Spectrography, Speech Acoustics, Vibration, Laryngeal Muscles physiology, Phonation physiology, Vocal Cords physiology
- Abstract
Adjustments to cricothyroid and thyroarytenoid muscle activation are critical to the control of fundamental frequency and aerodynamic aspects of vocal fold vibration in humans. The aerodynamic and physical effects of these muscles are not well understood and are difficult to study in vivo. Knowledge of the contributions of these two muscles is essential to understanding both normal and disordered voice physiology. In this study, a three-mass model for voice simulation in adult males was used to produce systematic changes to cricothyroid and thyroarytenoid muscle activation levels. Predicted effects on fundamental frequency, aerodynamic quantities, and physical quantities of vocal fold vibration were assessed. Certain combinations of these muscle activations resulted in aerodynamic and physical characteristics of vibration that might increase the mechanical stress placed on the vocal fold tissue.
- Published
- 2006