8 results on '"vocal tract"'
Search Results
2. Characterisation of plosive, fricative and aspiration components in speech production
- Author
- Jackson, Philip J. B. and Shadle, C. H.
- Subjects
- 571.4, Acoustic modelling, Vocal tract, Analysis
- Abstract
This thesis is a study of the production of human speech sounds by acoustic modelling and signal analysis. It concentrates on sounds that are not produced by voicing (although that may be present), namely plosives, fricatives and aspiration, which all contain noise generated by flow turbulence. It combines the application of advanced speech analysis techniques with acoustic flow-duct modelling of the vocal tract, and draws on dynamic magnetic resonance image (dMRI) data of the pharyngeal and oral cavities, to relate the sounds to physical shapes. Having superimposed vocal-tract outlines on three sagittal dMRI slices of an adult male subject, a simple description of the vocal tract suitable for acoustic modelling was derived through a sequence of transformations. The vocal-tract acoustics program VOAC, which relaxes many of the assumptions of conventional plane-wave models, incorporates the effects of net flow into a one-dimensional model (viz., flow separation, increase of entropy, and changes to resonances), as well as wall vibration and cylindrical wavefronts. It was used for synthesis by computing transfer functions from sound sources specified within the tract to the far field. Being generated by a variety of aero-acoustic mechanisms, unvoiced sounds are somewhat varied in nature. Through analysis that was informed by acoustic modelling, resonance and anti-resonance frequencies of ensemble-averaged plosive spectra were examined for the same subject, and their trajectories observed during release. The anti-resonance frequencies were used to compute the place of occlusion. In vowels and voiced fricatives, voicing obscures the aspiration and frication components. So, a method was devised to separate the voiced and unvoiced parts of a speech signal, the pitch-scaled harmonic filter (PSHF), which was tested extensively on synthetic signals. 
Based on a harmonic model of voicing, it outputs harmonic and anharmonic signals appropriate for subsequent analysis as time series or as power spectra. By applying the PSHF to sustained voiced fricatives, we found that, not only does voicing modulate the production of frication noise, but that the timing of pulsation cannot be explained by acoustic propagation alone. In addition to classical investigation of voiceless speech sounds, VOAC and the PSHF demonstrated their practical value in helping further to characterise plosion, frication and aspiration noise. For the future, we discuss developing VOAC within an articulatory synthesiser, investigating the observed flow-acoustic mechanism in a dynamic physical model of voiced frication, and applying the PSHF more widely in the field of speech research.
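The core separation step behind the PSHF can be illustrated with a short sketch. This is not Jackson and Shadle's implementation (which compensates for windowing effects and refines the pitch estimate); it only shows the underlying idea, assuming a known integer pitch period: if the analysis window spans exactly N pitch periods, voicing harmonics land on DFT bins that are multiples of N, and the remaining bins are treated as the anharmonic (noise) part.

```python
import numpy as np

def pshf(frame, period, n_periods=4):
    """Minimal pitch-scaled harmonic filter sketch.

    `frame`  : signal samples (at least n_periods * period of them)
    `period` : pitch period in samples (assumed known and integer here)

    With a window of exactly n_periods pitch periods, harmonics of the
    fundamental fall on DFT bins that are multiples of n_periods; those
    bins give the harmonic estimate, the rest is anharmonic residual.
    """
    n = n_periods * period
    x = np.asarray(frame, dtype=float)[:n]   # rectangular window of N periods
    spec = np.fft.rfft(x)
    harmonic = np.zeros_like(spec)
    harmonic[::n_periods] = spec[::n_periods]  # keep only f0-multiple bins
    h = np.fft.irfft(harmonic, n)            # harmonic (voiced) part
    a = x - h                                # anharmonic (noise) part
    return h, a
```

For a perfectly periodic input the anharmonic output is numerically zero; applied to voiced fricatives, the residual would carry the frication noise for subsequent time-series or spectral analysis, as the abstract describes.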
- Published
- 2000
3. Dynamic measurements of speech articulators using magnetic resonance imaging
- Author
- Mohammad, Mohammad A. S.
- Subjects
- 621.3994, Vocal tract
- Abstract
Magnetic Resonance Imaging (MRI) is a common medical imaging technique that has been used for measuring the shape of the vocal tract for use in speech research. A significant drawback of using MRI in speech studies is the long scanning time. In this project, a new method was developed for increasing the temporal resolution of MR images needed for dynamic speech studies. This method works by post-processing the collected MR images, with total control of the reconstruction procedure. The subject does not need to be phonetically trained and is not involved in the measurement procedure, which provides a more natural speech environment. By reconstructing the collected images, multi-planar sequential frames are generated that represent the changing shape of the vocal tract during a short utterance. These frames provide much-needed information about the dynamics of articulatory motion and demonstrate the usage of MRI for dynamic speech studies. Qualitative observation of three-dimensional articulatory motion is possible using Virtual Reality and grey-scale representation of vocal tract cavity. Quantitative data were drawn from the frames generating measurements of articulatory motion for the tongue, velum and jaw. This makes Southampton Dynamic MRI (SDMRI) method capable of producing results comparable to the data collected using a variety of other measurement techniques. The quantitative multi-planar data produced by SDMRI method can be used to achieve a better model of the speech production system.
- Published
- 1999
4. The development of an enhanced electropalatography system for speech research
- Author
- Chiu, Wilson Sien Chun
- Subjects
- 621.3994, Vocal tract, Acoustics
- Abstract
To understand how speech is produced by individual human beings, it is fundamentally important to be able to determine exactly the three-dimensional shape of the vocal tract. The vocal tract is inaccessible, so its exact form is difficult to determine with live subjects. There is a wide variety of methods that provide information on the vocal tract shape. The technique of Electropalatography (EPG) is cheap, relatively simple, non-invasive and highly informative. Using EPG on its own, it is possible to deduce information about the shape, movement and position of tongue-palate contact during continuous speech. However, data provided by EPG is in the form of a two-dimensional representation in which all absolute positional information is lost. This thesis describes the development of an enhanced Electropalatography (eEPG) system, which retains most of the advantages of EPG while overcoming some of the disadvantages by representing the three-dimensional (3D) shape of the palate. The eEPG system uses digitised palate shape data to display the tongue-palate contact pattern in 3D. The 3D palate shape is displayed on a Silicon Graphics workstation as a surface made up of polygons represented by a quadrilateral mesh. EPG contact patterns are superimposed onto the 3D palate shape by displaying the relevant polygons in a different colour. By using this system, differences in shape between individual palates, apparent on visual inspection of the actual palates, are also apparent in the image on screen. The contact patterns can be related more easily to articulatory features such as the alveolar ridge since the ridge is visible on the 3D display. Further, methods have been devised for computing absolute distances along paths lying on the palate surface. Combining this with calibrated palate shape data allows measurements accurate to 1 mm to be made between contact locations on the palate shape. These have been validated with manual measurements. 
The sampling rate for EPG is 100 Hz and the data rate is equivalent to 62 bits per 10 ms. In the past few years, some coding (parameterization) methods have been introduced to try to reduce the amount of data while retaining the important aspects. Feature coding methods are proposed here and several parameters are investigated, expressed in terms of both conventional measures such as row number, and in absolute measures of distance and area (i.e. mm and mm²). Features studied include location of constriction and degree of constriction. Finally, in order to reduce the amount of data while retaining the spatial information, composite frames that represent a series of EPG frames are computed. Measures of goodness of the composite frames that do and do not use 3D data are described. Some examples are given in which fricative data has been processed by generating a composite frame for the entire fricative, and computing an area estimate for each row of the composite frame using the assumption of a flat tongue. This thesis demonstrates the current capability and inherent flexibility of the enhanced electropalatography system. In the future, the eEPG system can be extended to compute volume estimates again using a flat tongue model. By incorporating information on the tongue surface provided by other imaging methods such as ultrasound, more accurate area and volume estimates can be obtained.
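The quoted data rate and the composite-frame idea can be sketched briefly. The thesis's actual parameterizations and goodness measures are not reproduced here; the contact-fraction rule below is an illustrative assumption, and an EPG frame is simply taken as a flat vector of 62 binary electrode contacts sampled at 100 Hz.

```python
import numpy as np

def composite_frame(frames, threshold=0.5):
    """Collapse a sequence of binary EPG frames (shape [n, 62]) into one
    composite frame: an electrode is 'on' if it was contacted in at
    least `threshold` of the frames.  (Illustrative rule, not the
    thesis's goodness-optimised method.)"""
    frames = np.asarray(frames, dtype=float)
    return (frames.mean(axis=0) >= threshold).astype(int)

# Data rate quoted in the abstract: 62 electrodes sampled at 100 Hz
bits_per_second = 62 * 100            # 6200 bits/s
bits_per_10ms = bits_per_second // 100  # 62 bits per 10 ms frame
```

A composite frame for a sustained fricative would then summarise the whole contact sequence in a single 62-element pattern, from which per-row area estimates can be computed.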
- Published
- 1995
5. Cortical encoding and decoding models of speech production
- Author
- Chartier, Josh
- Subjects
- Neurosciences, brain, machine learning, production, sensorimotor cortex, speech, vocal tract
- Abstract
To speak is to dynamically orchestrate the movements of the articulators (jaw, tongue, lips, and larynx), which in turn generate speech sounds. It is an amazing mental and motor feat that is controlled by the brain and is fundamental for communication. Technology that could translate brain signals into speech would be transformative for people who are unable to communicate as a result of neurological impairments. This work first investigates how articulator movements that underlie natural speech production are represented in the brain. Building on this, it also presents a neural decoder that can synthesize audible speech from brain signals. Data supporting these results came from direct cortical recordings of the human sensorimotor cortex while participants spoke natural sentences. Neural activity at individual electrodes encoded a diversity of articulatory kinematic trajectories (AKTs), each revealing coordinated articulator movements towards specific vocal tract shapes. The neural decoder was designed to leverage the kinematic trajectories encoded in the sensorimotor cortex, which enhanced performance even with limited data. In closed vocabulary tests, listeners could readily identify and transcribe speech synthesized from cortical activity. These findings advance the clinical viability of using speech neuroprosthetic technology to restore spoken communication.
- Published
- 2019
6. Speech and Nonspeech Production in the Absence of the Vocal Tract
- Author
- Thompson, Megan
- Subjects
- Bioengineering, Neurosciences, Complex tones, magnetoencephalography, Speech, Touchscreen, Vocal Tract
- Abstract
Sensory feedback plays a crucial role in speech production in both healthy individuals and in individuals with production-limited speech. However, the vast majority of research on the sensory consequences of speech production has focused on auditory feedback while relatively little is known about the role of vocal tract somatosensory feedback. The body of this dissertation investigates speech and nonspeech production in the absence of vocal tract somatosensory feedback by training subjects to use a touchscreen-based speech production platform. Contact with the touchscreen results in instant playback of a vowel or complex tone dependent on the location selected. Because the axes of the touchscreen are associated with continuous F2 and F1 frequencies, every possible vowel within a wide formant range can be produced. Participants with no initial knowledge of the mapping of screen areas to playback sounds were asked to reproduce auditory vowel or complex tonal targets. Their responses were evaluated for accuracy and consistency, and in some cases participants underwent functional neuroimaging via MEG during training. Following training, participants were capable of using the touchscreen to produce speech and nonspeech sounds in the absence of the vocal tract. Their increased accuracy and consistency as they learned to produce speech and nonspeech sounds indicate the development of new audiomotor maps, as do significant changes in their task-based functional neuroimaging over the course of training. While participants demonstrated learning in both speech and nonspeech production, the neural and behavioral differences indicated different learning processes. We hypothesize that these differences can at least partially be attributed to the presence of an existing audiomotor network for producing speech sounds. 
This would account for more rapid learning rates in the speech variants of the task, the presence of generalization in touchscreen-based speech production, and the neural similarities to vocal speech production in the touchscreen-produced speech sounds that presented differently in touchscreen-produced nonspeech sounds.
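The touchscreen-to-vowel mapping described above can be sketched as a linear interpolation from screen coordinates to (F1, F2). The dissertation does not specify its axis ranges, screen dimensions, or scaling (linear vs. warped), so the values below are purely hypothetical placeholders.

```python
# Hypothetical axis ranges (Hz) -- assumptions for illustration only.
F1_RANGE = (250.0, 900.0)    # vertical axis
F2_RANGE = (700.0, 2500.0)   # horizontal axis

def touch_to_formants(x, y, width=1024, height=768):
    """Map a touchscreen coordinate to (F1, F2) by linear interpolation,
    so every point on the screen corresponds to a producible vowel."""
    f2 = F2_RANGE[0] + (x / width) * (F2_RANGE[1] - F2_RANGE[0])
    f1 = F1_RANGE[0] + (y / height) * (F1_RANGE[1] - F1_RANGE[0])
    return f1, f2
```

Because both axes are continuous, the platform covers the full vowel space rather than a discrete set of targets, which is what makes the audiomotor-learning comparison possible.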
- Published
- 2018
7. Investigations of the acoustics of the vocal tract and vocal folds in vivo, ex vivo and in vitro
- Author
- Hanna, Noel
- Subjects
- Excised larynx, Vocal Tract, Resonances, Straw phonation, Subglottal
- Abstract
Speech and singing are of enormous importance to human culture, yet the physics that underlies the production and control of the voice is incompletely understood, and its parameters are not well known, mainly due to the difficulty of accessing them in vivo. In the simplified but well-accepted source-filter model, non-linear vocal fold oscillation produces a sound source at a fundamental frequency and its multiples; the resonances of the vocal tract then filter the spectral envelope of the sound to produce voice formants. In this thesis, both source and tract properties are studied experimentally and an in vitro experiment investigates how the filter can affect the source. The control of fundamental frequency by either air supply or mechanical control parameters is investigated ex vivo using excised human larynges. All else equal, and excluding the four types of discontinuity or hysteresis observed, the fundamental frequency was found to be proportional to the square root of subglottal pressure, which has implications for singing and speech production, particularly in tonal languages. Additionally, airflow through the glottis causes a narrowing of the aryepiglottic tube and can initiate ventricular and/or aryepiglottic fold oscillation without muscular control. The acoustic impedance of the vocal tract was measured in vivo over a range of 9 octaves and 80 dB dynamic range with the glottis closed and during phonation. The frequencies, magnitudes and bandwidths were measured for the acoustic and for the mechanical resonances of the surrounding tissues. The bandwidths and the energy losses in the vocal tract that cause them were found to be five-fold higher than the visco-thermal losses of a dry, smooth rigid cylinder, and to increase during phonation. Using a simple vocal tract model and measurements during inhalation, the subglottal system resonances were also estimated. 
The possible effects of the filter on the source are demonstrated in an experiment on a water-filled latex vocal fold replica: changing the aero-acoustic load of the model tract by inserting a straw at the model lips changes the fundamental frequency. This result is discussed in the context of straw phonation used in speech therapy.
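The ex vivo pressure finding above amounts to a simple scaling law, f0 = k·√P_sub (all else equal). A minimal sketch, in which the proportionality constant k is purely illustrative since it depends on the individual larynx and its mechanical configuration:

```python
import math

def f0_from_pressure(p_sub_pa, k=11.0):
    """Predicted fundamental frequency (Hz) from subglottal pressure (Pa),
    using the square-root law found ex vivo: f0 = k * sqrt(P_sub).
    The constant k here is an illustrative placeholder, not a measured value."""
    return k * math.sqrt(p_sub_pa)
```

One consequence of the square-root form: quadrupling subglottal pressure doubles f0 (a one-octave rise), independent of k, which is the kind of relationship relevant to pitch control in tonal languages.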
- Published
- 2014
8. Vocal tract interactions in woodwind performance
- Author
- Chen, Jer-Ming
- Subjects
- Vocal Tract, Woodwind Performance, Acoustic Impedance, Clarinet, Saxophone
- Abstract
How important is the player’s vocal tract in clarinet and saxophone performance? Acousticians’ opinions have ranged from “negligible” [Backus (1985), JASA 78, 17] to “vocal tract resonance frequencies must match the frequency of the required notes” [Clinch et al. (1982), Acustica 50, 280]. Musicians’ opinions are similarly varied. To understand how the tract-reed-bore system interacts, acoustical measurements of performers’ vocal tracts during playing were made using measurement heads mounted in the mouthpieces of a clarinet and a tenor saxophone. Acoustic impedance spectra of the tenor and soprano saxophone bores were also measured for all standard fingerings, and some others. For fingerings high in the tenor saxophone’s second register, bore impedance peaks downstream decrease with increasing pitch. Above the first 2.7 octaves, peak values fall below 30 MPa·s·m⁻³ and this ends the standard range available to amateurs. To play the higher altissimo notes, experts produced strong vocal tract resonances upstream with impedances 10-40 MPa·s·m⁻³ tuned to sound the desired note. While expert saxophonists adjust their vocal tract thus for altissimo playing, inexperienced players do not, and consequently cannot produce these notes. The smoothly rising clarinet glissando solo opening Gershwin’s Rhapsody in Blue was also investigated. Partially uncovering an open finger-hole smoothly raises clarinet resonances in the lower register, allowing continuous increases in playing pitch. When pitch bending in the second (clarino) register, experienced players produced strong tract resonances with impedances up to 60 MPa·s·m⁻³, comparable in magnitude with those of the clarinet bore (40-50 MPa·s·m⁻³). Thus during the glissando, sounding pitch is controlled by smoothly varying a strong resonance in the player’s vocal tract. The phase of the reed impedance is shown to make downwards pitch bending easier than upwards. 
Similar vocal tract adjustments were observed on the clarinet and saxophone for other advanced techniques such as bugling and multiphonic selection. During normal playing, although experienced players produced vocal tract impedance peaks with only moderate magnitude, these peaks were adjusted systematically to frequencies about 150 Hz higher than the sounding pitch (determined by strong bore resonances). This strategy may avoid the effects of small unwanted tract-bore interactions on sounding pitch.
- Published
- 2009
Discovery Service for Jio Institute Digital Library