7,917 results for "vocal tract"
Search Results
2. Exploring awareness of hearing loss and ear health in Jordanian adults.
- Author
- Gammoh, Yazan, Alasir, Rama, and Qanawati, Laila
- Subjects
- *SENSORINEURAL hearing loss, *HEARING aid fitting, *VOCAL tract, *HEARING disorders, *VOCAL cords
- Abstract
Objective: To assess the awareness about hearing loss and ear health among adults in Jordan. Methods: A cross-sectional study was conducted where a questionnaire was filled from the month of November to the month of December of the year 2023, to assess the level of awareness about hearing loss and ear health. The participants included were Jordanian adults (age ≥ 18 years) residing in the North, Middle and South of Jordan. Results: Data from 333 participants (54.1% men) were analyzed. Participants between 18 and 28 years of age comprised 29.7% of the sample population. More than half of the participants (52.6%) held a university degree. Overall percentage of correct responses was 83%. Women, postgraduate degree holders, and participants diagnosed with hearing loss had an average of 11.96±1.47, 12.65±1.59 and 11.70±1.69 correct answers, respectively. The highest correct response received (97.6%) was for: hearing aids need to fit accurately to provide the maximum benefit. Furthermore, 97% of the sample correctly acknowledged that sudden hearing loss is an emergency and requires an immediate audiological assessment. The main misconception was that a deaf–mute cannot speak because of defects in the vocal tract, with only 39.3% of the sample providing a correct response. The other two misconceptions were: cotton buds are necessary for ear cleaning and are the safest means, and that ear drops are sufficient to treat earache, with 78.1% correct responses for each statement. Participants with higher level of education had higher odds of answering the questions correctly, with limited role observed for gender, prior diagnosis of hearing loss and a family history of hearing loss. Conclusions and relevance: Majority of the adults surveyed provided a correct answer to the hearing loss and ear health survey. While most of the sample population were aware that a sudden loss of hearing is considered an emergency, only one third knew that defects in vocal cords do not play a role in deafness/muteness. The study highlights the need of public education on causes of hearing loss and measures needed to prevent the onset of hearing loss, with emphasis on methods for caring of ear health. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Inter-speaker acoustic differences of sustained vowels at varied dysarthria severities for amyotrophic lateral sclerosis.
- Author
- Bhattacharjee, Tanuka, Vengalil, Seena, Belur, Yamini, Atchayaram, Nalini, and Ghosh, Prasanta Kumar
- Subjects
- AMYOTROPHIC lateral sclerosis, VOCAL tract, VOWELS, DYSARTHRIA, STANDARD deviations
- Abstract
We study inter-speaker acoustic differences during sustained vowel utterances at varied severities of Amyotrophic Lateral Sclerosis-induced dysarthria. Among source attributes, jitter and standard deviation of fundamental frequency exhibit enhanced inter-speaker differences among patients than healthy controls (HCs) at all severity levels. Though inter-speaker differences in vocal tract filter attributes at most severity levels are higher than those among HCs for close vowels /i/ and /u/, these are comparable with or lower than those among HCs for the relatively more open vowels /a/ and /o/. The differences typically increase with severity except for a few parameters for /a/ and /i/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Analyzing fricative confusions in healthy and pathological speech using modified S-transform.
- Author
- Roopa, S., Karjigi, Veena, and Chandrashekar, H. M.
- Subjects
- CONVOLUTIONAL neural networks, VOCAL tract, SPEECH, FOURIER transforms, AIR flow
- Abstract
Fricatives are a class of speech sounds that are produced when air passes through a partial constriction in the vocal tract resulting in a turbulent airflow with prominent energy in the high-frequency region. Place of constriction decides the resonances resulting in fricatives that differ in place of articulation. The present study considers three classes of fricatives namely dental, alveolar and post-alveolar. To distinguish the fricatives based on place of articulation, it is important to have a signal representation with good frequency resolution at high frequencies. The standard S-transform exhibits the varying resolution with an uncontrolled window width and exhibits good frequency resolution at low-frequencies and good time resolution at high-frequencies. Modified S-transform introduces two adjustable parameters to control the width of the Gaussian window and provides better frequency resolution at high frequencies than S-transform and suitable for classification of fricatives based on place of articulation. The classification of fricatives in normal and pathological speech is attempted by using S-transform and modified S-transform spectrograms. Experimental results show that the use of modified S-transform provides higher fricative classification accuracy of 93.4% and 50% compared to 91.7% and 44.54% by using S-transform for normal and pathological speech respectively. [ABSTRACT FROM AUTHOR]
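For orientation, the sketch below illustrates the kind of time-frequency analysis described here: an S-transform-style decomposition in which the Gaussian analysis window narrows with frequency under two adjustable parameters. It is a brute-force, unoptimized Python illustration; the width law sigma(f) = 1/(a + b*f) and the parameter names a and b are assumptions for illustration, not the exact modified S-transform formulation used in the paper.

```python
import numpy as np

def modified_s_transform(x, fs, a=1.0, b=1.0):
    """S-transform-style analysis with an adjustable Gaussian window.

    For each analysis frequency f, the signal is weighted by a Gaussian window
    of width sigma(f) = 1 / (a + b * f) centred at every time instant and then
    Fourier-analysed. Brute-force O(N^2 * F) loops, intended only for short frames.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n) / fs
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)[1:]           # skip the DC bin
    tfr = np.zeros((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        sigma = 1.0 / (a + b * f)                        # narrower window at high f
        for j, tau in enumerate(t):
            w = np.exp(-0.5 * ((t - tau) / sigma) ** 2)
            w /= sigma * np.sqrt(2.0 * np.pi)            # unit-area Gaussian
            tfr[i, j] = np.sum(x * w * np.exp(-2j * np.pi * f * t))
    return freqs, tfr
```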
- Published
- 2024
- Full Text
- View/download PDF
5. Correlating Degree of Thyroid Tilt Independent of fo Control as a Mechanism for Phonatory Density with EGG and Acoustic Measures across Loudness Conditions.
- Author
- Aaen, Mathias, Christoph, Noor, McGlashan, Julian, and Sadolin, Cathrine
- Subjects
- *TRACHEAL cartilage, *VOCAL cords, *VOCAL tract, *SINGING instruction, *ACOUSTIC measurements, *LARYNGOPLASTY
- Abstract
Introduction: Traditionally, fundamental frequency increase has been viewed as largely associated with vocal fold length as a consequence of tilting the thyroid cartilage forward and downward, a so-called thyroid tilt, caused by cricothyroid muscle contraction. Recent pilot studies in singers suggest vocal fold elongation independent from fo as related to a pedagogical parameter called "phonatory density," suggesting a further discrete mechanism of the thyroid cartilage tilt related to voice quality. This study endoscopically, EGG, acoustically, and auditory perceptually explores different vocal modes in relation to degree of phonatory density independent of changes in fo across loudness and voice quality conditions. Methods: Case-control with 20 professional singers performing sustained-vowel samples (C4 males, B4 females) for 8 different voice quality conditions with different degrees of auditory-perceptual "density" while undergoing endoscopic examination and concurrent EGG and acoustic measurement. Endoscopic vocal tract assessments were blindly rated according to a 33-item systematic assessment tool and a forced consensus paradigm. MANOVA, Spearman's rho, and factor density were calculated at p ≤ 0.05. Auditory-perceptual assessments of 64 samples of the 8 voicing conditions were performed by 33 professional singing teachers. Fleiss' kappa and percentage agreement were used to calculate assessor accuracy and inter-rater reliability. Results: Forward and downward thyroid tilt was related to the perceptual category of "reduced density (RD)" as the only statistically significant endoscopic assessment variable: "fuller density" conditions exhibited little to no forward visible articulation of the thyroid cartilage, whereas RD conditions exhibited visible to marked forward articulation of the thyroid cartilage across tested conditions, suggesting vocal fold elongation for RD conditions while maintaining an unchanged fo, with high ICC for the assessors (r = 0.70 and r = 0.94 for male/female datasets, respectively). Correlation analyses revealed negative correlations for SPL, shimmer, and CPP measures for RD conditions, while Qx did not vary with statistical significance. Panel assessors accurately assessed the 8 tested conditions with 87% accuracy and good inter-rater reliability agreement (k: 0.772, p < 0.001). Conclusion: Phonatory density, as an auditory-perceptual denotation of vocal weight, is controlled by the degree of thyroid cartilage tilt. The study documents systematic variations in vocal fold lengths across several conditions of loudness while fo is maintained. The findings suggest a further mechanism of the thyroid cartilage related to voice quality beyond the control of fo. Further studies are needed to document pitch production mechanisms compensating for the maintenance of fo given vocal fold elongation during RD conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Neumann series of Bessel functions for inverse coefficient problems.
- Author
- Çetinkaya, Fatma Ayça, Khmelnytskaya, Kira V., and Kravchenko, Vladislav V.
- Subjects
- *INVERSE problems, *VOCAL tract, *INVERSE functions, *BESSEL functions, *COMPLEX numbers
- Abstract
Consider the Sturm–Liouville equation $-y'' + q(x)y = \rho^2 y$ with a real-valued potential $q \in \mathcal{L}_1(0,L)$, $\rho \in \mathbb{C}$, $L > 0$. Let $u(\rho, x)$ be its solution satisfying certain initial conditions $u(\rho_k, 0) = a_k$, $u'(\rho_k, 0) = b_k$ for a number of $\rho_k$, $k = 1, 2, \dots, K$, where $\rho_k$, $a_k$, and $b_k$ are some complex numbers. Denote $\ell_k = u'(\rho_k, L) + H u(\rho_k, L)$, where $H \in \mathbb{R}$. We propose a method for solving the inverse problem of the approximate recovery of the potential $q(x)$ and the number $H$ from the data $\{\rho_k, a_k, b_k, \ell_k\}_{k=1}^{K}$. In general, the problem is ill-posed; however, it finds numerous practical applications. Such inverse problems as the recovery of the potential from a Weyl function or the inverse two-spectra Sturm–Liouville problem are its special cases. Moreover, the inverse problem of determining the shape of a human vocal tract also reduces to the considered inverse problem. The proposed method is based on special Neumann series of Bessel functions representations for solutions of Sturm–Liouville equations. With their aid the problem is reduced to the classical inverse Sturm–Liouville problem of recovering $q(x)$ from two spectra, which is solved again with the help of the same representations. The overall approach leads to an efficient numerical algorithm for solving the inverse problem. Its numerical efficiency is illustrated by several examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Comparative anatomy of the vocal apparatus in bats and implications for the diversity of laryngeal echolocation.
- Author
- Brualla, Nicolas L M, Wilson, Laura A B, Tu, Vuong Tan, Nojiri, Taro, Carter, Richard T, Ngamprasertwong, Thongchai, Wannaprasert, Thanakul, Doube, Michael, Fukui, Dai, and Koyabu, Daisuke
- Subjects
- *X-ray computed microtomography, *COMPARATIVE anatomy, *LARYNGEAL muscles, *VOCAL tract, *MUSCULAR hypertrophy, *LARYNX
- Abstract
Most of over 1400 extant bat species produce high-frequency pulses with their larynx for echolocation. However, the debate about the evolutionary origin of laryngeal echolocation in bats remains unresolved. The morphology of the larynx is known to reflect vocal adaptation and thus can potentially help in resolving this controversy. However, the morphological variations of the larynx are poorly known in bats, and a complete anatomical study remains to be conducted. Here, we compare the 3D laryngeal morphology of 23 extant bat species of 11 different families reconstructed by using iodine contrast-enhanced X-ray microtomography techniques. We find that, contrary to what was previously thought, laryngeal muscle hypertrophy is not a characteristic of all bats and presents differential development. The larynges of Pteropodidae are morphologically similar to those of non-bat mammals. Two morphotypes are described among laryngeal echolocating bats, illustrating morphological differences between Rhinolophoidea and Yangochiroptera, with the main variations being the cricothyroid muscle volume and the shape of the cricoid and thyroid cartilages. For the first time we detail functional specialization for constant frequency echolocation among Rhinolophoidea. Lastly, the nasal-emitting taxa representing a polyphyletic group do not share the same laryngeal form, which raises questions about the potential modular nature of the bat larynx. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Sound source locations and their roles in Japanese voiceless "glottal" fricative production.
- Author
- Yoshinaga, Tsukasa, Maekawa, Kikuo, and Iida, Akiyoshi
- Subjects
- *VOCAL tract, *VOCAL cords, *MAGNETIC resonance imaging, *JAPANESE people, *SOFT palate
- Abstract
Although [h] is described as a glottal fricative, it has never been demonstrated whether [h] has its source exclusively at the glottis. In this study, sound source locations and their influence on sound amplitudes were investigated by conducting mechanical experiments and airflow simulations. Vocal tract data of [h] were obtained in three phonemic contexts from two native Japanese subjects using three-dimensional static magnetic resonance imaging (MRI). Acrylic vocal tract replicas were constructed, and the sound was reproduced by supplying airflow to the vocal tracts with adducted or abducted vocal folds. The sound source locations were estimated by solving the Navier–Stokes equations. The results showed that the amplitudes of sounds produced by the vocal tracts with an open glottis were in a similar range (±3 dB) to those with a glottal gap of 3 mm in some contexts. The sound sources in these cases were observed in the pharyngeal cavity or near the soft palate. Similar degrees of oral constrictions were observed in the real-time MRI, indicating that the sound traditionally described as [h] is produced, at least in some contexts, with sound sources of turbulent flow generated by a supralaryngeal constriction of the following vowel. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Effects of a Straw Phonation on Acoustic and Self-Reported Measures of Adolescent Female Singers: A Pilot Study.
- Author
- Manternach, Jeremy N., Clark, Chad, and Sweet, Bridget
- Subjects
- VOCAL tract, TEENAGE girls, WOMEN singers, WARMUP, RESEARCH personnel, STRAW
- Abstract
Characteristics of adolescent female voice change include breathiness, inconsistent pitch, "cracks," abrupt register transitions, vocal range changes, and decreased stamina. Researchers have found that semi-occluded vocal tract exercises (e.g. straw phonation) can assist with such difficulties with other varied populations, facilitating glottal closure, decreasing breathiness, and encouraging easier voicing. Therefore, the purpose of this investigation was to measure the effects of straw phonation (experimental) compared to "ah" vowel (control) warm-ups on acoustic and self-reported measures of seventh-grade female-identifying singers. We calculated each participant's Acoustic Voice Quality Index (AVQI) prior to and after a 4–5-minute straw phonation (n = 6) or unoccluded "ah" vowel (n = 6) warm-up. Results indicated robust improvement in AVQI scores after both warm-ups, with a trend toward more acoustic improvement after straw phonation (5 improved, M = 0.48, compared to 4, M = 0.35). All participants self-reported that their respective voicing helped them to be more warmed up, but the effect was statistically much larger in the straw group (7.23 to 5.00, 10-point scale). Some participants self-reported that straw phonation was more effective than their typical warm-up. These results may indicate more robust benefits from straw phonation, which could facilitate increased motivation during a difficult transition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Speaking to a metronome reduces kinematic variability in typical speakers and people who stutter.
- Author
- Wiltshire, Charlotte E. E., Cler, Gabriel J., Chiew, Mark, Freudenberger, Jana, Chesters, Jennifer, Healy, Máiréad P., Hoole, Philip, and Watkins, Kate E.
- Subjects
- *VOCAL tract, *SPEECH, *STUTTERING, *METRONOME, *MAGNETIC resonance imaging
- Abstract
Background: Several studies indicate that people who stutter show greater variability in speech movements than people who do not stutter, even when the speech produced is perceptibly fluent. Speaking to the beat of a metronome reliably increases fluency in people who stutter, regardless of the severity of stuttering. Objectives: Here, we aimed to test whether metronome-timed speech reduces articulatory variability. Method: We analysed vocal tract MRI data from 24 people who stutter and 16 controls. Participants repeated sentences with and without a metronome. Midsagittal images of the vocal tract from lips to larynx were reconstructed at 33.3 frames per second. Any utterances containing dysfluencies or non-speech movements (e.g. swallowing) were excluded. For each participant, we measured the variability of movements (coefficient of variation) from the alveolar, palatal and velar regions of the vocal tract. Results: People who stutter had more variability than control speakers when speaking without a metronome, which was then reduced to the same level as controls when speaking with the metronome. The velar region contained more variability than the alveolar and palatal regions, which were similar. Conclusions: These results demonstrate that kinematic variability during perceptibly fluent speech is increased in people who stutter compared with controls when repeating naturalistic sentences without any alteration or disruption to the speech. This extends our previous findings of greater variability in the movements of people who stutter when producing perceptibly fluent nonwords compared with controls. These results also show that, in addition to increasing fluency in people who stutter, metronome-timed speech also reduces articulatory variability to the same level as that seen in control speakers. [ABSTRACT FROM AUTHOR]
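For reference, the variability measure named here (coefficient of variation) is simply the standard deviation divided by the mean. The sketch below assumes a per-repetition movement measure has already been extracted for one vocal-tract region; that MRI processing step is not reproduced.

```python
import numpy as np

def movement_cv(per_repetition_measure):
    """Coefficient of variation (sample std / mean) of an articulatory movement
    measure for one region (e.g. alveolar, palatal, or velar) across repetitions."""
    m = np.asarray(per_repetition_measure, dtype=float)
    return np.std(m, ddof=1) / np.mean(m)

# Example: hypothetical displacement values (arbitrary units) across 5 repetitions.
print(movement_cv([2.1, 2.4, 1.9, 2.2, 2.0]))
```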
- Published
- 2024
- Full Text
- View/download PDF
11. Neural adaptation to changes in self-voice during puberty.
- Author
- Pinheiro, Ana P., Aucouturier, Jean-Julien, and Kotz, Sonja A.
- Subjects
- *VOCAL tract, *NEUROPLASTICITY, *NEURAL development, *HUMAN voice, *SOCIAL networks
- Abstract
In adolescence, one's own voice changes significantly due to a surge of pubertal hormones, resulting in a distinctive voice signature. A person's unique voice signature signals one's own individuality and becomes increasingly relevant as social networks expand. While these puberty-related changes contribute to the development of a unique voice signature, they also initiate a sensitive period of voice monitoring. We propose that, together with hormonal changes, the protracted development of brain regions engaged in voice monitoring and a dynamically changing social environment might affect how the self-voice and others' voices are discriminated. A socioneuroendocrine framework is needed to comprehensively examine how we perceive and differentiate ourselves through our voice as well as how alterations in these capacities can lead to pathologies related to self–other distinction. The human voice is a potent social signal and a distinctive marker of individual identity. As individuals go through puberty, their voices undergo acoustic changes, setting them apart from others. In this article, we propose that hormonal fluctuations in conjunction with morphological vocal tract changes during puberty establish a sensitive developmental phase that affects the monitoring of the adolescent voice and, specifically, self–other distinction. Furthermore, the protracted maturation of brain regions responsible for voice processing, coupled with the dynamically evolving social environment of adolescents, likely disrupts a clear differentiation of the self-voice from others' voices. This socioneuroendocrine framework offers a holistic understanding of voice monitoring during adolescence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Physiological Studies of the Palatopharyngeal Muscle as a Speech Muscle in the Adjustment of Velar Position during Speech Production.
- Author
- Taro Komachi, Hideto Saigusa, Satoshi Yamaguchi, Osamu Kadosono, Hiroyuki Ito, and Kimihiro Okubo
- Subjects
- *FRICATIVES (Phonetics), *LARYNGEAL muscles, *VOCAL tract, *VOCAL cords, *CHROMIUM alloys, *PHARYNGEAL muscles
- Published
- 2024
- Full Text
- View/download PDF
13. Screening of voice and vocal tract changes in professional wind instrument players.
- Author
- El-Demerdash, Ahmed M., Hafez, Nirvana G., Tanyous, Hanaa N., Rezk, Kerollos M., and Shadi, Mariam S.
- Subjects
- *WIND instrument players, *VOCAL tract, *VOCAL cords, *WIND instruments, *FATIGUE (Physiology), *VOICE disorders
- Abstract
Purpose: Playing wind instruments is a strenuous task on the larynx, predisposing players to voice disorders. This study aims to evaluate potential vocal symptoms and vocal tract alterations in professional wind instrumentalists. Methods: In this cross-sectional study, 26 male military subjects were interviewed, completed the voice handicap index (VHI) -10 questionnaire, and subjected to auditory-perceptual assessment, neck examination, rigid laryngostroboscopy and flexible nasofiberoscopy both before and during instrument playing. Results: All participants had vocal fatigue symptoms, around one-quarter complained of voice change, one-quarter complained of shortness of breath while or after performing, and one-third complained of neck symptoms. The average score of VHI-10 was 16.2 ± 6.5, and approximately three-quarters of participants scored above the cut-off point. There were no significant correlations between age, years of instrument playing, average hours of daily practice, and VHI-10. Participants with neck symptoms had significantly higher VHI-10 scores. Those (around one-fifth) with an external neck swelling during Valsalva maneuver had a significantly higher VHI-10 score. Dysphonia, mainly mild and of strained, leaky quality, was detected in almost one-third of participants. While the instrument was being played, the vocal folds were somewhat adducted, and the vocal tract became more compressed as the task became more demanding. The most frequent observations in the vocal tract examination were hyperemia of the vocal folds or all over the laryngeal and pharyngeal mucosa, excessive secretions over the vocal folds, signs of hyperadduction, arytenoid edema, and phonatory waste. Conclusion: Wind instrumentalists frequently experience voice disorders, which necessitate further care and investigation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A practical guide to calculating vocal tract length and scale-invariant formant patterns.
- Author
- Anikin, Andrey, Barreda, Santiago, and Reby, David
- Subjects
- *LINEAR predictive coding, *VOCAL tract, *LENGTH measurement, *MISSING data (Statistics), *BODY size
- Abstract
Formants (vocal tract resonances) are increasingly analyzed not only by phoneticians in speech but also by behavioral scientists studying diverse phenomena such as acoustic size exaggeration and articulatory abilities of non-human animals. This often involves estimating vocal tract length acoustically and producing scale-invariant representations of formant patterns. We present a theoretical framework and practical tools for carrying out this work, including open-source software solutions included in R packages soundgen and phonTools. Automatic formant measurement with linear predictive coding is error-prone, but formant_app provides an integrated environment for formant annotation and correction with visual and auditory feedback. Once measured, formants can be normalized using a single recording (intrinsic methods) or multiple recordings from the same individual (extrinsic methods). Intrinsic speaker normalization can be as simple as taking formant ratios and calculating the geometric mean as a measure of overall scale. The regression method implemented in the function estimateVTL calculates the apparent vocal tract length assuming a single-tube model, while its residuals provide a scale-invariant vowel space based on how far each formant deviates from equal spacing (the schwa function). Extrinsic speaker normalization provides more accurate estimates of speaker- and vowel-specific scale factors by pooling information across recordings with simple averaging or mixed models, which we illustrate with example datasets and R code. The take-home messages are to record several calls or vowels per individual, measure at least three or four formants, check formant measurements manually, treat uncertain values as missing, and use the statistical tools best suited to each modeling context. [ABSTRACT FROM AUTHOR]
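To make the single-tube idea concrete, the sketch below estimates apparent vocal tract length by regressing measured formants on the odd multipliers of a quarter-wave resonator (closed at the glottis, open at the lips). It mirrors the idea behind soundgen's estimateVTL but is an independent Python illustration rather than that R function's code, and the speed-of-sound constant is an assumed value.

```python
import numpy as np

SPEED_OF_SOUND = 35000.0  # cm/s in warm, humid air (assumed constant)

def apparent_vtl(formants_hz):
    """Apparent vocal tract length from the single-tube (quarter-wave) model.

    A uniform tube closed at one end resonates at F_k = (2k - 1) * c / (4 * L),
    so regressing measured formants on (2k - 1) without an intercept gives a
    slope of c / (4 * L), hence L = c / (4 * slope).
    """
    f = np.asarray(formants_hz, dtype=float)
    odd = 2 * np.arange(1, len(f) + 1) - 1
    slope = np.sum(odd * f) / np.sum(odd ** 2)   # least squares through the origin
    return SPEED_OF_SOUND / (4.0 * slope)

# Example: a schwa-like pattern of 500, 1500, 2500 Hz yields about 17.5 cm.
print(apparent_vtl([500, 1500, 2500]))
```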
- Published
- 2024
- Full Text
- View/download PDF
15. The effect of sexual orientation on voice acoustic properties.
- Author
- Holmes, Luke, Rieger, Gerulf, and Paulmann, Silke
- Subjects
- SEXUAL orientation, HOMOSEXUALITY, GAY men, LESBIANS, VOCAL tract
- Abstract
Introduction: Previous research has investigated sexual orientation differences in the acoustic properties of individuals' voices, often theorizing that homosexuals of both sexes would have voice properties mirroring those of heterosexuals of the opposite sex. Findings were mixed, but many of these studies have methodological limitations including small sample sizes, use of recited passages instead of natural speech, or grouping bisexual and homosexual participants together for analyses. Methods: To address these shortcomings, the present study examined a wide range of acoustic properties in the natural voices of 142 men and 175 women of varying sexual orientations, with sexual orientation treated as a continuous variable throughout. Results: Homosexual men had less breathy voices (as indicated by a lower harmonics-to-noise ratio) and, contrary to our prediction, a lower voice pitch and narrower pitch range than heterosexual men. Homosexual women had lower F4 formant frequency (vocal tract resonance or so-called overtone) in overall vowel production, and rougher voices (measured via jitter and spectral tilt) than heterosexual women. For those sexual orientation differences that were statistically significant, bisexuals were in-between heterosexuals and homosexuals. No sexual orientation differences were found in formants F1-F3, cepstral peak prominence, shimmer, or speech rate in either sex. Discussion: Recommendations for future "natural voice" investigations are outlined. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Method for testing the stability of an autoregressive model of the vocal tract and adjusting its parameters.
- Author
- Savchenko, V. V. and Savchenko, L. V.
- Subjects
- *VOCAL tract, *AUTOREGRESSIVE models, *DATA compression, *IMPULSE response, *SPEECH synthesis, *AUTOMATIC speech recognition
- Abstract
Within the framework of the traditional scope of investigations in the field of acoustic measurements, we consider an autoregressive model of the vocal tract, which is a key link in the speech apparatus of human beings. We mention the existence of an urgent problem of guaranteeing the stability of the autoregressive model in systems with adaptation of their parameters to the observed speech signals of short duration. To overcome this difficulty, we pose the problem of testing the stability of the autoregressive model and adjustment of its parameters according to the results of testing. The required investigations are based on the original authors' technique of the formant analysis of vowel sounds of speech via the synthesis of a recursive shaping filter in the mode of free oscillations. For the solution of the posed problem, we propose a procedure aimed at testing the stability of the autoregressive model of vocal tract and adjustment of its parameters. The method is based on a two-stage algorithm of transformation of the autoregressive model of vocal tract. In the first stage of transformation, the stability of the autoregressive model is checked according to the impulse response of the shaping filter. In the second stage, if the stability of the autoregressive model is violated, its impulse response is modified as a result of the element-by-element multiplication by a variable exponential quantity asymptotically convergent to zero. We develop a regular algorithm for recalculating a modified impulse response into the adjusted vector of autoregressive parameters in the second stage of transformation. According to the results of experimental verification of the proposed method, we make a conclusion that the guaranteed stability of the autoregressive model of the vocal tract is attained with minimal distortions in the frequency domain. The obtained results can be useful for the development and improvement of the systems of automatic speech recognition, digital speech communications, artificial intelligence, and other information systems based on the use of data compression and speech encoding according to the autoregressive model of the vocal tract in the course of automatic processing of speech signals. [ABSTRACT FROM AUTHOR]
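To illustrate the stability issue in generic terms, the sketch below checks whether an all-pole model is stable and, if not, shrinks its pole radii by exponentially weighting the impulse response; for an AR model this is equivalent to scaling the k-th coefficient by lam**k. It is a standard bandwidth-expansion sketch of the two-stage idea described in the abstract, not the authors' exact procedure, and lam is an assumed parameter.

```python
import numpy as np

def stabilize_ar(a, lam=0.99):
    """Check an all-pole filter 1/A(z) for stability and shrink its poles if needed.

    `a` holds the AR polynomial coefficients [1, a_1, ..., a_p]. Multiplying the
    impulse response h[n] by lam**n corresponds to evaluating A(z/lam), i.e.
    replacing a_k with a_k * lam**k, which pulls every pole radius in by lam.
    """
    a = np.asarray(a, dtype=float)
    if np.all(np.abs(np.roots(a)) < 1.0):     # all poles inside the unit circle
        return a
    return a * lam ** np.arange(len(a))       # shrink pole radii by the factor lam

# Example: unstable AR(1) model y[n] = 1.05*y[n-1] + e[n], i.e. A(z) = 1 - 1.05 z^-1.
print(stabilize_ar([1.0, -1.05], lam=0.9))    # pole moves from 1.05 to 0.945
```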
- Published
- 2024
- Full Text
- View/download PDF
17. The content of reindeer male vocalisations: acoustic cues to age and size.
- Author
- Puch, Laura, Weladji, Robert B., Holand, Øystein, and Kumpula, Jouko
- Subjects
- *VOCAL tract, *BIOACOUSTICS, *SEXUAL selection, *BODY size, *BODY weight
- Abstract
Some acoustic parameters of animal vocalisations have been shown to reliably indicate male quality and play a role in mate and rival assessment. Reindeer possess a peculiar vocal tract anatomy involving a laryngeal air sac which probably acts as an additional filter, making it a candidate species for novel investigations in the field of bioacoustics. We investigated whether some acoustic parameters of male rutting vocalisations were good indicators of age and body weight (used as an index for body size). We did this by performing acoustic analyses using recordings collected from a semi-domesticated reindeer population in northern Finland. We found the age of subadult males (aged 2.5–4.5 years) to be negatively correlated with formant F3 and formant spacing, suggesting that their vocalisations convey information on the caller's age. Individual formant frequencies were not affected by male body weight, but formant spacing was lower in heavier males. Despite the presence of the laryngeal air sac, formant spacing seems to be an acoustic parameter influencing mate and rival assessment in reindeer as it gives an honest indication of male body size. We discuss the importance of reliable acoustic cues to quality indices in sexual selection contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Correcting the record: Phonetic potential of primate vocal tracts and the legacy of Philip Lieberman (1934−2022).
- Author
- Ekström, Axel G.
- Subjects
- *HOMINIDS, *VOCAL tract, *RHESUS monkeys, *ARTICULATION (Speech), *SPEECH
- Abstract
The phonetic potential of nonhuman primate vocal tracts has been the subject of considerable contention in recent literature. Here, the work of Philip Lieberman (1934−2022) is considered at length, and two research papers—both purported challenges to Lieberman's theoretical work—and a review of Lieberman's scientific legacy are critically examined. I argue that various aspects of Lieberman's research have been consistently misinterpreted in the literature. A paper by Fitch et al. overestimates the would‐be "speech‐ready" capacities of a rhesus macaque, and the data presented nonetheless supports Lieberman's principal position—that nonhuman primates cannot articulate the full extent of human speech sounds. The suggestion that no vocal anatomical evolution was necessary for the evolution of human speech (as spoken by all normally developing humans) is not supported by phonetic or anatomical data. The second challenge, by Boë et al., attributes vowel‐like qualities of baboon calls to articulatory capacities based on audio data; I argue that such "protovocalic" properties likely result from disparate articulatory maneuvers compared to human speakers. A review of Lieberman's scientific legacy by Boë et al. ascribes a view of speech evolution (which the authors term "laryngeal descent theory") to Lieberman, which contradicts his writings. The present article documents a pattern of incorrect interpretations of Lieberman's theoretical work in recent literature. Finally, the apparent trend of vowel‐like formant dispersions in great ape vocalization literature is discussed with regard to Lieberman's theoretical work. The review concludes that the "Lieberman account" of primate vocal tract phonetic capacities remains supported by research: the ready articulation of fully human speech reflects species‐unique anatomy. Highlights: The work by phonetician and cognitive scientist Philip Lieberman on the phonetic capacities of nonhuman primates has been challenged in recent years. None of these challenges seriously dispute the core tenet of Lieberman's claims that nonhuman primates cannot articulate the full extent of human speech sounds, resulting from species' limitations on articulatory anatomy. Misunderstandings of Lieberman's work are seemingly widespread in the literature and point to a critical dearth of knowledge of human speech production and articulation in the fields of primatology and bioacoustics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Speaker-independent speech inversion for recovery of velopharyngeal port constriction degree.
- Author
- Siriwardena, Yashish M., Boyce, Suzanne E., Tiede, Mark K., Oren, Liran, Fletcher, Brittany, Stern, Michael, and Espy-Wilson, Carol Y.
- Subjects
- *VOCAL tract, *SPEECH, *AMERICAN English language, *NASAL cavity, *CONSONANTS, *AUTOMATIC speech recognition
- Abstract
For most of his illustrious career, Ken Stevens focused on examining and documenting the rich detail about vocal tract changes available to listeners underlying the acoustic signal of speech. Current approaches to speech inversion take advantage of this rich detail to recover information about articulatory movement. Our previous speech inversion work focused on movements of the tongue and lips, for which "ground truth" is readily available. In this study, we describe acquisition and validation of ground-truth articulatory data about velopharyngeal port constriction, using both the well-established measure of nasometry plus a novel technique—high-speed nasopharyngoscopy. Nasometry measures the acoustic output of the nasal and oral cavities to derive the measure nasalance. High-speed nasopharyngoscopy captures images of the nasopharyngeal region and can resolve velar motion during speech. By comparing simultaneously collected data from both acquisition modalities, we show that nasalance is a sufficiently sensitive measure to use as ground truth for our speech inversion system. Further, a speech inversion system trained on nasalance can recover known patterns of velopharyngeal port constriction shown by American English speakers. Our findings match well with Stevens' own studies of the acoustics of nasal consonants. [ABSTRACT FROM AUTHOR]
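As a point of reference, nasalance is conventionally the nasal-to-total acoustic amplitude ratio expressed as a percentage. The sketch below computes an energy-based variant from two time-aligned nasometer channels; the channel preprocessing used in the study is not reproduced, and the function is illustrative only.

```python
import numpy as np

def nasalance_percent(nasal, oral):
    """Nasalance: nasal acoustic energy over total (nasal + oral) energy, in percent.

    `nasal` and `oral` are assumed to be time-aligned waveforms from the two
    nasometer microphones covering the same utterance.
    """
    e_nasal = float(np.sum(np.square(np.asarray(nasal, dtype=float))))
    e_oral = float(np.sum(np.square(np.asarray(oral, dtype=float))))
    return 100.0 * e_nasal / (e_nasal + e_oral)
```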
- Published
- 2024
- Full Text
- View/download PDF
20. Vocal tract dynamics shape the formant structure of conditioned vocalizations in a harbor seal.
- Author
- Goncharova, Maria, Jadoul, Yannick, Reichmuth, Colleen, Fitch, W. Tecumseh, and Ravignani, Andrea
- Subjects
- *HARBOR seal, *VOCAL tract, *SOFT palate, *SOUNDS, *TONGUE
- Abstract
Formants, or resonance frequencies of the upper vocal tract, are an essential part of acoustic communication. Articulatory gestures—such as jaw, tongue, lip, and soft palate movements—shape formant structure in human vocalizations, but little is known about how nonhuman mammals use those gestures to modify formant frequencies. Here, we report a case study with an adult male harbor seal trained to produce an arbitrary vocalization composed of multiple repetitions of the sound wa. We analyzed jaw movements frame‐by‐frame and matched them to the tracked formant modulation in the corresponding vocalizations. We found that the jaw opening angle was strongly correlated with the first (F1) and, to a lesser degree, with the second formant (F2). F2 variation was better explained by the jaw angle opening when the seal was lying on his back rather than on the belly, which might derive from soft tissue displacement due to gravity. These results show that harbor seals share some common articulatory traits with humans, where the F1 depends more on the jaw position than F2. We propose further in vivo investigations of seals to further test the role of the tongue on formant modulation in mammalian sound production. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Prospectively accelerated dynamic speech magnetic resonance imaging at 3 T using a self‐navigated spiral‐based manifold regularized scheme.
- Author
- Rusho, Rushdi Zahid, Ahmed, Abdul Haseeb, Kruger, Stanley, Alam, Wahidul, Meyer, David, Howard, David, Story, Brad, Jacob, Mathews, and Lingala, Sajan Goud
- Subjects
- MAGNETIC resonance imaging, SPEECH, VOCAL tract, FINITE differences, LAPLACIAN matrices
- Abstract
This work develops and evaluates a self‐navigated variable density spiral (VDS)‐based manifold regularization scheme to prospectively improve dynamic speech magnetic resonance imaging (MRI) at 3 T. Short readout duration spirals (1.3‐ms long) were used to minimize sensitivity to off‐resonance. A custom 16‐channel speech coil was used for improved parallel imaging of vocal tract structures. The manifold model leveraged similarities between frames sharing similar vocal tract postures without explicit motion binning. The self‐navigating capability of VDS was leveraged to learn the Laplacian structure of the manifold. Reconstruction was posed as a sensitivity‐encoding–based nonlocal soft‐weighted temporal regularization scheme. Our approach was compared with view‐sharing, low‐rank, temporal finite difference, extra dimension‐based sparsity reconstruction constraints. Undersampling experiments were conducted on five volunteers performing repetitive and arbitrary speaking tasks at different speaking rates. Quantitative evaluation in terms of mean square error over moving edges was performed in a retrospective undersampling experiment on one volunteer. For prospective undersampling, blinded image quality evaluation in the categories of alias artifacts, spatial blurring, and temporal blurring was performed by three experts in voice research. Region of interest analysis at articulator boundaries was performed in both experiments to assess articulatory motion. Improved performance with manifold reconstruction constraints was observed over existing constraints. With prospective undersampling, a spatial resolution of 2.4 × 2.4 mm2/pixel and a temporal resolution of 17.4 ms/frame for single‐slice imaging, and 52.2 ms/frame for concurrent three‐slice imaging, were achieved. We demonstrated implicit motion binning by analyzing the mechanics of the Laplacian matrix. Manifold regularization demonstrated superior image quality scores in reducing spatial and temporal blurring compared with all other reconstruction constraints. While it exhibited faint (nonsignificant) alias artifacts that were similar to temporal finite difference, it provided statistically significant improvements compared with the other constraints. In conclusion, the self‐navigated manifold regularized scheme enabled robust high spatiotemporal resolution dynamic speech MRI at 3 T. [ABSTRACT FROM AUTHOR]
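The manifold idea can be shown schematically: frames whose self-navigator signals look alike receive large edge weights, and the resulting graph Laplacian penalizes differences between similar frames rather than only temporally adjacent ones. The Gaussian kernel and its width below are assumptions; the paper's actual kernel estimation, thresholding, and sensitivity-encoded reconstruction are not reproduced.

```python
import numpy as np

def navigator_laplacian(navigators, sigma=1.0):
    """Graph Laplacian built from self-navigator signals (one column per frame).

    `navigators` has shape (n_navigator_samples, n_frames). Frames with similar
    vocal-tract postures receive large weights w_ij, so a regularizer of the form
    trace(X @ L @ X.T) smooths the reconstruction across similar frames.
    """
    nav = np.asarray(navigators, dtype=float)
    d2 = np.sum((nav[:, :, None] - nav[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))                           # similarity weights
    np.fill_diagonal(w, 0.0)
    return np.diag(w.sum(axis=1)) - w                              # combinatorial Laplacian
```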
- Published
- 2024
- Full Text
- View/download PDF
22. "The Only Constant is Change".
- Author
- Guettlein, Miranda
- Subjects
- AUTODIDACTICISM, LITERATURE reviews, VOCAL tract, ANCIENT philosophers, VOICE actors & actresses
- Abstract
The article discusses the concept that "the only constant is change," often applied to linguistic evolution. It highlights the importance of adapting language to be inclusive and reflective of diverse perspectives, especially in voice and speech practices. The issue features research on vocal techniques, voice disorders, and educational tools for voice optimization, emphasizing the need for interdisciplinary collaboration and prioritizing conversation over having all the right answers. The journal acknowledges the contributions of various individuals and aims to continue curating articles that propel the field forward, with a focus on young, diverse scholars and artists. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
23. Two-stage algorithm of spectral analysis for the automatic speech recognition systems
- Author
- Savchenko, V. V. and Savchenko, L. V.
- Published
- 2024
- Full Text
- View/download PDF
24. Vocal Tract Acoustic Measurements for Detection of Pathological Voice Disorders.
- Author
- Mishra, Jyoti and Sharma, R. K.
- Subjects
- *VOICE disorders, *VOCAL tract, *ACOUSTIC measurements, *COMMUNICATIVE disorders, *LARYNGEAL cancer
- Abstract
Human voice is an important signal that, if analyzed cautiously, will reveal various disorders and diseases of the human body. Vocal profiling is emerging as a new technique for assessment of voice disorders. This work has focused mainly on vocal tract acoustic measurements to detect voice disorders, especially dysphonia. Dysphonia is a communication disorder that, if not detected at an early stage, may lead to serious complications like laryngeal cancer thus affecting health and quality of life. The voice recordings of 52 subjects (26 dysphonic and 26 healthy) were taken from Saarbrucken Voice Database, 28 live samples (14 dysphonic and 14 healthy) were recorded using CSL 4500 tool and 169 subjects (111 dysphonic and 58 healthy) were taken from VOICED Database. These voice samples were used to extract five acoustic parameters: fundamental frequency, jitter, shimmer, pitch, and formants. The results obtained had an accuracy of around 85% which confirms the potentiality of these fundamental parameters in the successful detection of dysphonia as well as other voice disorders. [ABSTRACT FROM AUTHOR]
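The study does not publish its extraction code, but two of the five parameters have simple textbook definitions once glottal cycles have been detected. The sketch below computes local jitter and shimmer from cycle periods and peak amplitudes; the cycle-detection step itself is assumed to have been done elsewhere.

```python
import numpy as np

def jitter_local(periods_s):
    """Local jitter: mean absolute difference between consecutive glottal periods
    divided by the mean period (often reported as a percentage)."""
    t = np.asarray(periods_s, dtype=float)
    return np.mean(np.abs(np.diff(t))) / np.mean(t)

def shimmer_local(peak_amplitudes):
    """Local shimmer: mean absolute difference between consecutive cycle peak
    amplitudes divided by the mean amplitude."""
    a = np.asarray(peak_amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Example on a slightly irregular ~100 Hz voice (periods near 10 ms).
print(jitter_local([0.0100, 0.0102, 0.0099, 0.0101]))
print(shimmer_local([0.80, 0.78, 0.82, 0.79]))
```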
- Published
- 2024
- Full Text
- View/download PDF
25. Amplitude spectrum correction to improve speech signal classification quality.
- Author
- Gmyrek, Stanislaw, Hossa, Robert, and Makowski, Ryszard
- Subjects
- *SPECTRUM analysis, *SPEECH perception, *SIGNAL classification, *AMPLITUDE modulation, *VOCAL tract
- Abstract
The speech signal can be described by three key elements: the excitation signal, the impulse response of the vocal tract, and a system that represents the impact of speech production through human lips. The primary carrier of semantic content in speech is primarily influenced by the characteristics of the vocal tract. Nonetheless, when it comes to parameterization coefficients, the irregular periodicity of the glottal excitation is a significant factor that leads to notable variations in the values of the feature vectors, resulting in disruptions in the amplitude spectrum with the appearance of ripples. In this study, a method is suggested to mitigate this phenomenon. To achieve this goal, inverse filtering was used to estimate the excitation and transfer functions of the vocal tract. Subsequently, using the derived parameterisation coefficients, statistical models for individual Polish phonemes were established as mixtures of Gaussian distributions. The impact of these corrections on the classification accuracy of Polish vowels was then investigated. The proposed modification of the parameterisation method fulfils the expectations, the scatter of feature vector values was reduced. [ABSTRACT FROM AUTHOR]
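For readers unfamiliar with inverse filtering, the sketch below separates a speech frame into an all-pole vocal-tract estimate and an excitation residual using linear predictive coding: the LPC polynomial A(z) models the spectral envelope as 1/A(z), so filtering the frame with A(z) removes that envelope. The model order is an assumed typical value, and the routine is a generic illustration rather than the authors' estimator.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

def lpc_inverse_filter(frame, order=16):
    """LPC inverse filtering of one speech frame.

    Returns the LPC coefficients [1, a_1, ..., a_p] (vocal-tract envelope model)
    and the residual obtained by applying A(z) to the frame, which approximates
    the glottal excitation.
    """
    frame = np.asarray(frame, dtype=float)
    a = librosa.lpc(frame, order=order)       # all-pole envelope coefficients
    residual = lfilter(a, [1.0], frame)       # inverse filter A(z)
    return a, residual
```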
- Published
- 2024
- Full Text
- View/download PDF
26. Cortical tracking of visual rhythmic speech by 5‐ and 8‐month‐old infants: Individual differences in phase angle relate to language outcomes up to 2 years.
- Author
- Choisdealbha, Áine Ní, Attaheri, Adam, Rocha, Sinead, Mead, Natasha, Olawole‐Scott, Helen, Alfaro e Oliveira, Maria, Brough, Carmel, Brusini, Perrine, Gibbon, Samuel, Boutris, Panagiotis, Grey, Christina, Williams, Isabel, Flanagan, Sheila, and Goswami, Usha
- Subjects
- *SPEECH, *INDIVIDUAL differences, *VOCAL tract, *INFANTS, *PROSODIC analysis (Linguistics), *LANGUAGE acquisition
- Abstract
It is known that the rhythms of speech are visible on the face, accurately mirroring changes in the vocal tract. These low‐frequency visual temporal movements are tightly correlated with speech output, and both visual speech (e.g., mouth motion) and the acoustic speech amplitude envelope entrain neural oscillations. Low‐frequency visual temporal information ('visual prosody') is known from behavioural studies to be perceived by infants, but oscillatory studies are currently lacking. Here we measure cortical tracking of low‐frequency visual temporal information by 5‐ and 8‐month‐old infants using a rhythmic speech paradigm (repetition of the syllable 'ta' at 2 Hz). Eye‐tracking data were collected simultaneously with EEG, enabling computation of cortical tracking and phase angle during visual‐only speech presentation. Significantly higher power at the stimulus frequency indicated that cortical tracking occurred across both ages. Further, individual differences in preferred phase to visual speech related to subsequent measures of language acquisition. The difference in phase between visual‐only speech and the same speech presented as auditory‐visual at 6‐ and 9‐months was also examined. These neural data suggest that individual differences in early language acquisition may be related to the phase of entrainment to visual rhythmic input in infancy. Research Highlights: Infant preferred phase to visual rhythmic speech predicts language outcomes. Significant cortical tracking of visual speech is present at 5 and 8 months. Phase angle to visual speech at 8 months predicted greater receptive and productive vocabulary at 24 months. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Growth and development of epiglottis and preepiglottic space of larynx as it acquires vocal tract.
- Author
- Kiminori Sato, Shun-ichi Chitose, Kiminobu Sato, Fumihiko Sato, Takeharu Ono, and Hirohito Umeno
- Subjects
- *VOCAL tract, *EPIGLOTTIS, *LARYNX, *TRACHEAL cartilage, *CONNECTIVE tissues, *ADIPOSE tissues
- Abstract
Objectives: The growth and development of the epiglottis and preepiglottic space (PES) of the human larynx as it acquires the vocal tract were investigated. Methods: Three newborns, one infant, four children (2, 7, 8, and 12 years old), and two adult normal larynges were investigated and compared using the whole organ serial section technique. Results: The newborn PES occupied a small area just anterior to the epiglottis. It was composed of immature adipose tissue and areolar tissue. The epiglottis lay on a somewhat horizontal axis and is partially obscured behind the hyoid bone. The hyoid bone overlapped the thyroid cartilage, partially obscuring the superior thyroid notch. The newborn epiglottic cartilage was immature elastic cartilage, and the elastic fiber component was sparse. In the first 8 years of life, as the PES grew, the PES was located not only anterior to but also posterolateral and inferolateral to the epiglottic cartilage and thyroepiglottic ligament. Meanwhile, the epiglottic cartilage matured. Conclusions: In order to develop the vocal tract for speech production, it is reported that the human larynx descends as the child grows in the first 9 years of life. This study showed that the PES, occupying a small area just anterior to the epiglottis, grew and existed astride the epiglottis as the larynx descended and the vocal tract developed. Consequently, its distribution allows the epiglottis to more effectively play the role of retroflection during swallowing in order to prevent aspiration. The human speech faculty likely develops in conjunction with swallowing physiology. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Therapeutic Singing and Semi-Occluded Vocal Tract Exercises for Individuals with Parkinson's Disease: A Randomized Controlled Trial of a Single Session Intervention.
- Author
- Lee, Sun Joo, Dvorak, Abbey L, and Manternach, Jeremy N
- Subjects
- VOCAL tract, PARKINSON'S disease, RANDOMIZED controlled trials, SINGING, ARTICULATION (Speech), VISUAL analog scale
- Abstract
Individuals with Parkinson's disease (PD) experience speech and voice-related symptoms that diminish communication and quality of life. Semi-occluded vocal tract (SOVT) exercises are targeted interventions that, when combined with the positive psychosocial benefits of therapeutic group singing (TGS), may affect outcomes. The purpose of this study was to explore the effectiveness of SOVT exercises, specifically straw phonation combined with TGS, to improve voice quality and mood for individuals with PD. We used a true experimental pretest–posttest between-subjects design (i.e. randomized controlled trial) facilitated by a board-certified music therapist. All participants (N = 27) were randomly assigned to one of three groups (a) straw phonation combined with TGS (SP + TGS, n = 10), (b) TGS (n = 10), and (c) speaking-only control group (n = 7). Participants completed voice recordings for acoustic measures and the Visual Analogue Mood Scale for mood analysis before and after a 30-min intervention. The results demonstrated significant improvement in voice quality evidenced by decreasing Acoustic Voice Quality Index scores following a single session for both SP + TGS and TGS intervention groups when compared to the control. Happiness scores improved in the experimental groups when compared to control. Although not statistically significant, participants in the experimental groups (SP + TGS, TGS) demonstrated better mean mood scores on happiness, anxiety, and angry when compared to control, indicating a positive psychological response to the singing interventions. Overall, this study indicated the effectiveness of SP + TGS and TGS as promising therapeutic interventions for voice quality and mood in individuals with PD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Recurrence plot embeddings as short segment nonlinear features for multimodal speaker identification using air, bone and throat microphones.
- Author
- Nawas, K. Khadar, Shahina, A., Balachandar, Keshav, Maadeshwaran, P., Devanathan, N. G., Kumar, Navein, and Khan, A. Nayeemulla
- Subjects
- *VOCAL tract, *SPEECH perception, *THROAT, *BONE conduction, *MULTIMODAL user interfaces, *MICROPHONES, *AUTOMATIC speech recognition, *SOUND reverberation
- Abstract
Speech is produced by a nonlinear, dynamical Vocal Tract (VT) system, and is transmitted through multiple (air, bone and skin conduction) modes, as captured by the air, bone and throat microphones respectively. Speaker specific characteristics that capture this nonlinearity are rarely used as stand-alone features for speaker modeling, and at best have been used in tandem with well known linear spectral features to produce tangible results. This paper proposes Recurrent Plot (RP) embeddings as stand-alone, non-linear speaker-discriminating features. Two datasets, the continuous multimodal TIMIT speech corpus and the consonant-vowel unimodal syllable dataset, are used in this study for conducting closed-set speaker identification experiments. Experiments with unimodal speaker recognition systems show that RP embeddings capture the nonlinear dynamics of the VT system which are unique to every speaker, in all the modes of speech. The Air (A), Bone (B) and Throat (T) microphone systems, trained purely on RP embeddings perform with an accuracy of 95.81%, 98.18% and 99.74%, respectively. Experiments using the joint feature space of combined RP embeddings for bimodal (A–T, A–B, B–T) and trimodal (A–B–T) systems show that the best trimodal system (99.84% accuracy) performs on par with trimodal systems using spectrogram (99.45%) and MFCC (99.98%). The 98.84% performance of the B–T bimodal system shows the efficacy of a speaker recognition system based entirely on alternate (bone and throat) speech, in the absence of the standard (air) speech. The results underscore the significance of the RP embedding, as a nonlinear feature representation of the dynamical VT system that can act independently for speaker recognition. It is envisaged that speech recognition too will benefit from this nonlinear feature. [ABSTRACT FROM AUTHOR]
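For readers new to recurrence plots, the sketch below builds a binary recurrence matrix from a time-delay embedding of a short speech segment. The embedding dimension, lag, and threshold are illustrative choices, and the paper feeds such plots into a learned embedding rather than using the raw matrix as shown here.

```python
import numpy as np

def recurrence_plot(x, dim=3, delay=2, eps=None):
    """Binary recurrence plot of a 1-D signal segment.

    The segment is time-delay embedded (dimension `dim`, lag `delay`), and pairs
    of embedded points closer than `eps` are marked as recurrent.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    emb = np.stack([x[i * delay: i * delay + n] for i in range(dim)], axis=1)
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    if eps is None:
        eps = 0.1 * dist.max()                # simple data-driven threshold
    return (dist <= eps).astype(np.uint8)     # n x n recurrence matrix
```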
- Published
- 2024
- Full Text
- View/download PDF
30. Spectral warping based data augmentation for low resource children's speaker verification.
- Author
- Kathania, Hemant Kumar, Kadyan, Virender, Kadiri, Sudarsana Reddy, and Kurimo, Mikko
- Subjects
- DATA augmentation, AUTOMATIC speech recognition, VOCAL tract, DATABASES, SPEECH, ERROR rates
- Abstract
In this paper, we present our effort to develop an automatic speaker verification (ASV) system for low resources children's data. For the children's speakers, very limited amount of speech data is available in majority of the languages for training the ASV system. Developing an ASV system under low resource conditions is a very challenging problem. To develop the robust baseline system, we merged out of domain adults' data with children's data to train the ASV system and tested with children's speech. This kind of system leads to acoustic mismatches between training and testing data. To overcome this issue, we have proposed spectral warping based data augmentation. We modified adult speech data using spectral warping method (to simulate like children's speech) and added it to the training data to overcome data scarcity and mismatch between adults' and children's speech. The proposed data augmentation gives 20.46% and 52.52% relative improvement (in equal error rate) for Indian Punjabi and British English speech databases, respectively. We compared our proposed method with well known data augmentation methods: SpecAugment, speed perturbation (SP) and vocal tract length perturbation (VTLP), and found that the proposed method performed best. The proposed spectral warping method is publicly available at https://github.com/kathania/Speaker-Verification-spectral-warping. [ABSTRACT FROM AUTHOR]
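A minimal sketch of the kind of frequency-axis warping described here: each short-time magnitude spectrum is resampled along frequency so that, for a factor above one, formants move upward, roughly simulating a shorter (child-like) vocal tract. The warping factor and the simple linear warping law are illustrative assumptions and may differ from the method released in the linked repository.

```python
import numpy as np
import librosa

def spectral_warp(y, alpha=1.2, n_fft=512, hop=128):
    """Warp the short-time magnitude spectrum along the frequency axis.

    For alpha > 1 the spectral envelope (and hence the formants) is shifted
    upward; the original phase is reused and the signal is resynthesized with
    an inverse STFT.
    """
    y = np.asarray(y, dtype=float)
    stft = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)
    bins = np.arange(mag.shape[0])
    warped = np.empty_like(mag)
    for t in range(mag.shape[1]):
        # New bin f takes its magnitude from original bin f / alpha.
        warped[:, t] = np.interp(bins / alpha, bins, mag[:, t])
    return librosa.istft(warped * np.exp(1j * phase), hop_length=hop, length=len(y))
```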
- Published
- 2024
- Full Text
- View/download PDF
31. A method for the asynchronous analysis of a voice source based on a two-level autoregressive model of speech signal.
- Author
- Savchenko, Vladimir Vasilyevich and Savchenko, Lyudmila Vasilyevna
- Subjects
- *VOICE analysis, *TELECOMMUNICATION systems, *VOCAL tract, *AUTOREGRESSIVE models, *SPEECH
- Abstract
We consider the problem of analysis of the voice source of speech within the range of short-term observations. The problem of insufficient speed of the available methods for the analysis of voice source is described, regardless of the method of data preparation: either synchronous with the main tone of speech sounds or asynchronous. We propose a method for the analysis of voice sources based on the two-level autoregressive model of the speech signal. We describe a software realization of the developed method based on the Berg-Levinson high-speed procedure of numerical calculations. It is shown that this procedure is characterized by a relatively low level of computation costs and its application does not require synchronization of the sequence of observations with the main tone of speech signal. With the help of software implementation of the proposed method, we designed and performed full-scale experiment aimed at analyzing the vowel sounds in the speech of a reference speaker. The results of this experiment confirmed the elevated speed of the proposed method and enabled us to formulate the requirements to the duration of speech signal for the real-time voice analysis. Thus, the optimal duration of the speech signal should vary within the range 32–128 msec. The obtained results can be used for the development and investigation of digital speech communication systems, systems of voice control, biometrics, biomedicine and other speech systems in which specific voice features of speaker's speech are of primary importance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
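As a generic illustration of the autoregressive modelling that underlies this kind of voice source analysis (not the authors' two-level algorithm or their Berg-Levinson implementation), the sketch below fits an all-pole model to a single short frame and derives its spectral envelope; librosa's lpc routine uses Burg-type estimation. The file name, model order, and the 64 ms frame length (within the 32–128 ms range mentioned above) are assumptions.

import numpy as np
import scipy.signal
import librosa

y, sr = librosa.load("vowel.wav", sr=16000)                  # hypothetical recording
n = int(0.064 * sr)                                          # one 64 ms analysis frame
frame = y[:n] * np.hanning(n)
a = librosa.lpc(frame, order=16)                             # AR coefficients, a[0] == 1
w, h = scipy.signal.freqz([1.0], a, worN=512, fs=sr)         # all-pole spectral envelope
envelope_db = 20 * np.log10(np.abs(h) + 1e-12)
print("envelope peak near", round(float(w[np.argmax(envelope_db)]), 1), "Hz")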
32. How the Physical Environment Shaped Language
- Author
-
Dornbierer-Stuart, Joanna and Dornbierer-Stuart, Joanna
- Published
- 2024
- Full Text
- View/download PDF
33. The Voice Group
- Author
-
Ramsey, Gordon P., Ashby, Neil, Series Editor, Brantley, William, Series Editor, Deady, Matthew, Series Editor, Fowler, Michael, Series Editor, Hjorth-Jensen, Morten, Series Editor, Inglis, Michael, Series Editor, Luokkala, Barry, Series Editor, and Ramsey, Gordon P.
- Published
- 2024
- Full Text
- View/download PDF
34. An MRI-based articulatory analysis of the Kannada dental-retroflex contrast.
- Author
-
Kochetov, Alexei, Savariaux, Christophe, Lamalle, Laurent, Noûs, Camille, and Badin, Pierre
- Subjects
- *
VOCAL tract , *ALVEOLAR process , *LARYNX , *MAGNETIC resonance imaging - Abstract
This paper investigates the production of dental and retroflex stops, fricatives, nasals, and laterals in the Dravidian language Kannada. This is done using articulatory contours extracted from an extensive midsagittal MRI corpus of two female Kannada speakers' static vocal tract postures intended to capture key aspects of phonemic articulations. Articulatory modelling was used to determine a set of components responsible for the implementation of place and manner contrasts (/t̪ s̪ n̪ l̪/ vs. /ʈ ʂ ɳ ɭ/). These components included both lingual and non-lingual articulatory parameters. Constriction location and length were also determined based on articulatory contours. The results showed that the two speakers produced non-fricative retroflexes with a retracted tongue tip making a constriction behind the alveolar ridge and a characteristic convex tongue shape, yet without a retraction of the posterior portion of the tongue. Apart from the lingual parameters, place differences were also manifested by the vertical position of the larynx (lower for retroflexes). The realisation of the place contrast in sibilant fricatives was different, as /ʂ/ appeared to be produced by both speakers with a laminal alveolopalatal constriction. Manner differences were captured by various non-lingual parameters, yet were also manifested in constriction locations (more anterior for stops). These findings are discussed in the context of previous descriptive and articulatory accounts of dental-retroflex contrasts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. The articulatory properties of apical vowels in Hefei Mandarin.
- Author
-
Kong, Huifang, Wu, Shengyi, Li, Mingxing, and Shen, Xiangrong
- Subjects
- *
VOWELS , *VOCAL tract , *MANDARIN dialects , *CHINESE language , *RHYME , *LIBRARY associations - Abstract
Apical vowels are widely observed across Chinese dialects, such as the rime of [sɹ̩55] 'think' in Mandarin Chinese, which is a syllabic approximant homorganic to its preceding sibilant. The apical vowels in Hefei Mandarin differ from those in Mandarin Chinese and most other languages in three aspects: (i) there are three phonetic apical vowels [ɹ̩], [ɹ̩ʷ], and [ɻ̩] while others usually have one or two, (ii) the alveolar apical [ɹ̩] appears after both homorganic and non-homorganic consonants, e.g. [sɹ̩] vs. [pɹ̩], and (iii) there is a phonological contrast between an unrounded apical [ɹ̩] and a rounded apical [ɹ̩ʷ], e.g. [sɹ̩] vs. [sɹ̩ʷ]. The articulatory properties of the three apical vowels were examined in this study using ultrasound techniques, and the results revealed that: (i) the commonalities of tongue gestures for the apical vowels include a retracted tongue root, a lowered tongue dorsum or blade, or both, together with a coronal constriction implemented with the blade and/or the tip; (ii) lip gestures are involved in distinguishing the three apical segments; (iii) the three segments each have their own distinct articulatory gestures within a speaker that cannot be simply attributed to the influence from their preceding consonants, with [ɹ̩] and [ɹ̩ʷ] involving a grooving in the front part of the tongue and [ɻ̩] involving a retraction of the tongue body in the back region of the vocal tract; (iv) the articulatory gesture of [ɹ̩] after a homorganic consonant, e.g. in [sɹ̩], is similar to that after a non-homorganic consonant, e.g. in [pɹ̩], suggesting an independent articulatory target for this segment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Simultaneous High-Speed Video Laryngoscopy and Acoustic Aerodynamic Recordings during Vocal Onset of Variable Sound Pressure Level: A Preliminary Study.
- Author
-
Woo, Peak
- Subjects
- *
SOUND pressure , *VOCAL cords , *LARYNGOSCOPY , *VOCAL tract , *FAST Fourier transforms , *DIFFUSION tensor imaging , *VOICE analysis - Abstract
Voicing requires frequent starts and stops at various sound pressure levels (SPL) and frequencies. Prior investigations using rigid laryngoscopy with oral endoscopy have shown variations in the duration of the vibration delay between normal and abnormal subjects. However, these studies were not physiological because the larynx was viewed using rigid endoscopes. We adapted a method to perform high-speed naso-endoscopic video while simultaneously acquiring the sound pressure, fundamental frequency, airflow rate, and subglottic pressure. This study aimed to investigate voice onset patterns in normophonic males and females during the onset of variable SPL and correlate them with acoustic and aerodynamic data. Materials and Methods: Three healthy males and three healthy females were studied by simultaneous high-speed video laryngoscopy and recording during production of the gesture [pa:pa:] at soft, medium, and loud voices. The fiber optic endoscope was threaded through a pneumotachograph mask for the simultaneous recording and analysis of acoustic and aerodynamic data. Results: The average increase in the sound pressure level (SPL) for the group was 15 dB, from 70 to 85 dB. The fundamental frequency increased by an average of 10 Hz. The flow increased in two subjects, was reduced in two subjects, and remained the same in two subjects as the SPL increased. There was a steady increase in the subglottic pressure from soft to loud phonation. Compared to soft-to-medium phonation, a significant increase in glottal resistance was observed with medium-to-loud phonation. Videokymogram analysis showed the onset of vibration for all voiced tokens without the need for full glottal closure. In loud phonation, there is a more rapid onset with a larger amplitude and prolonged closure of the glottal cycle; however, more cycles are required to achieve the intended SPL. There was a prolonged closed phase during loud phonation. Fast Fourier transform (FFT) analysis of the kymography waveform signal showed greater second- and third-harmonic energy above the fundamental frequency with loud phonation. There was an increase in adjustments in the pharynx, with tilting of the base of the tongue, shortening of the vocal folds, and pharyngeal constriction. Conclusion: Voice onset occurs in all modalities without the need for full glottal closure. There was a greater increase in glottal resistance with loud phonation than with soft or medium phonation. Vibration analysis of the voice onset showed that more time was required during loud phonation before the oscillation stabilized to a steady state. With increasing SPL, there were significant variations in vocal tract adjustments. The most apparent change was the increase in tongue tension with posterior displacement of the epiglottis. There was an increase in pre-phonation time during loud phonation. Patterns of muscle tension dysphonia with laryngeal squeezing, shortening of the vocal folds, and epiglottis tilting with increasing loudness are features of loud phonation. These observations show that flexible high-speed video laryngoscopy can reveal findings that cannot be observed with rigid video laryngoscopy. An objective analysis of the digital kymography signal can be conducted in selected cases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
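The FFT-based harmonic analysis of the kymographic waveform mentioned in the abstract above can be illustrated with a short, self-contained sketch; the synthetic glottal-area signal, line rate, and amplitudes below are placeholders rather than data from the study.

import numpy as np

fs = 4000.0                                        # assumed kymographic line rate (samples/s)
t = np.arange(0, 0.5, 1 / fs)
area = 1 + 0.8 * np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)  # toy waveform

spec = np.abs(np.fft.rfft(area * np.hanning(len(area))))
freqs = np.fft.rfftfreq(len(area), 1 / fs)
f0 = freqs[np.argmax(spec[1:]) + 1]                # strongest component, skipping the DC bin

def harmonic_level(k):
    """Spectral magnitude at the bin closest to the k-th harmonic of f0."""
    return spec[np.argmin(np.abs(freqs - k * f0))]

print(f"f0 ~ {f0:.0f} Hz, H2/H1 = {harmonic_level(2) / harmonic_level(1):.2f}, "
      f"H3/H1 = {harmonic_level(3) / harmonic_level(1):.2f}")

Comparing such harmonic ratios between soft and loud tokens is one simple way to quantify the reported increase in second- and third-harmonic energy.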
37. Articulatory and acoustic differences between lyric and dramatic singing in Western classical music.
- Author
-
Echternach, Matthias, Burk, Fabian, Kirsch, Jonas, Traser, Louisa, Birkholz, Peter, Burdumy, Michael, and Richter, Bernhard
- Subjects
- *
LARYNX , *SINGING , *MAGNETIC resonance imaging , *SOUND pressure , *VOCAL tract - Abstract
Within the realm of voice classification, singers could be sub-categorized by the weight of their repertoire, the so-called "singer's Fach." However, the opposing terms "lyric" and "dramatic" singing are not yet well defined in terms of their acoustic and articulatory characteristics. Nine professional singers of different singers' Fach were asked to sing a diatonic scale on the vowel /a/, first in what the singers considered as lyric and second in what they considered as dramatic. Image recording was performed using real-time magnetic resonance imaging (MRI) at 25 frames/s, and the audio signal was recorded via an optical microphone system. Analysis was performed with regard to sound pressure level (SPL), vibrato amplitude and frequency, and resonance frequencies, as well as articulatory settings of the vocal tract. The analysis revealed three primary differences between dramatic and lyric singing: dramatic singing was associated with greater SPL, greater vibrato amplitude and frequency, and lower resonance frequencies. The higher SPL is an indication of voice source changes, and the lower resonance frequencies are probably caused by the lower larynx position. However, all these strategies showed considerable individual variability. The singers' Fach might contribute to perceptual differences even for the same singer with regard to the respective repertoire. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
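Vibrato amplitude and frequency, two of the measures analyzed in the study above, can be read from a fundamental-frequency contour; the sketch below does this for a synthetic contour and is an illustration only, not the study's analysis code. The track rate, vibrato rate, and vibrato extent are assumed values.

import numpy as np

fs_track = 100.0                                   # assumed f0 samples per second
t = np.arange(0, 3, 1 / fs_track)
f0 = 440 * 2 ** ((0.5 / 12) * np.sin(2 * np.pi * 5.5 * t))   # ~5.5 Hz vibrato, about +/-50 cents

cents = 1200 * np.log2(f0 / np.mean(f0))           # contour in cents around its mean
spec = np.abs(np.fft.rfft(cents * np.hanning(len(cents))))
freqs = np.fft.rfftfreq(len(cents), 1 / fs_track)
rate = freqs[np.argmax(spec[1:]) + 1]              # dominant modulation frequency
extent = (np.max(cents) - np.min(cents)) / 2
print(f"vibrato frequency ~ {rate:.1f} Hz, amplitude ~ +/-{extent:.0f} cents")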
38. Articulatory and acoustic dynamics of fronted back vowels in American English.
- Author
-
Havenhill, Jonathan
- Subjects
- *
AMERICAN English language , *VOWELS , *VOCAL tract , *ACOUSTIC reflex - Abstract
Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Improving End-to-End Models for Children's Speech Recognition.
- Author
-
Patel, Tanvina and Scharenborg, Odette
- Subjects
SPEECH perception ,AUTOMATIC speech recognition ,VOCAL tract ,SPEECH ,DATA augmentation - Abstract
Children's Speech Recognition (CSR) is a challenging task due to the high variability in children's speech patterns and the limited amount of available annotated children's speech data. We aim to improve CSR in the often-occurring scenario that no children's speech data is available for training the Automatic Speech Recognition (ASR) systems. Traditionally, Vocal Tract Length Normalization (VTLN) has been widely used in hybrid ASR systems to address acoustic mismatch and variability in children's speech when training models on adults' speech. Meanwhile, End-to-End (E2E) systems often use data augmentation methods to create child-like speech from adults' speech. For adult speech-trained ASRs, we investigate the effectiveness of augmentation methods (speed perturbation and spectral augmentation), along with VTLN, in an E2E framework for the CSR task, comparing these across Dutch, German, and Mandarin. We applied VTLN at different stages (training/test) of the ASR and conducted age and gender analyses. Our experiments showed highly similar patterns across the languages: speed perturbation and spectral augmentation yield significant performance improvements, while VTLN provided further improvements while maintaining recognition performance on adults' speech (depending on when it is applied). Additionally, VTLN showed performance improvement for both male and female speakers and was particularly effective for younger children. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
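The piecewise-linear frequency warping that underlies VTLN can be written in a few lines; the sketch below warps a set of frequencies by factor alpha below a fixed breakpoint and then maps linearly up to the Nyquist frequency so the band edges are preserved. The breakpoint and warp factor are generic textbook choices, not the settings used in the paper.

import numpy as np

def vtln_warp(freqs, alpha, f_max=8000.0, f_break=0.85 * 8000.0):
    """Piecewise-linear VTLN warp: alpha * f below f_break, then a straight segment up to f_max."""
    freqs = np.asarray(freqs, dtype=float)
    upper = alpha * f_break + (f_max - alpha * f_break) * (freqs - f_break) / (f_max - f_break)
    return np.where(freqs <= f_break, alpha * freqs, upper)

centers = np.linspace(0, 8000, 9)                  # e.g. filterbank centre frequencies
print(np.round(vtln_warp(centers, alpha=1.1)))     # alpha > 1 stretches the spectrum

In practice the warp is applied inside feature extraction (to the filterbank or the spectrum), and the per-speaker alpha is chosen by maximum likelihood or a fixed schedule.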
40. Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson's Disease: A Study on Speaker Diarization and Classification Techniques.
- Author
-
Di Cesare, Michele Giuseppe, Perpetuini, David, Cardone, Daniela, and Merla, Arcangelo
- Subjects
- *
PARKINSON'S disease , *AUTOMATIC speech recognition , *SPEECH , *VOICE disorders , *CELL phones , *VOCAL tract , *STREAMING audio - Abstract
Parkinson's disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to the underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques in the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. In particular, Mel Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) are feature extraction techniques commonly used in speech and audio signal processing that could exhibit great potential for vocal disorder identification. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. The recordings contained in the Mobile Device Voice Recordings at King's College London (MDVR-KCL) dataset were used. These recordings were collected from healthy individuals and PD patients while they read a passage and during a spontaneous conversation on the phone. The speech data from the spontaneous dialogue task were processed through speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. ML applied to MFCCs and GTCCs allowed us to classify PD patients with a test accuracy of 92.3%. This research further demonstrates the potential to employ mobile phones as a non-invasive, cost-effective tool for the early detection of PD, significantly improving patient prognosis and quality of life. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
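A bare-bones version of the feature-plus-classifier pipeline described above might look like the following; it uses MFCC statistics with a generic SVM and leaves out GTCC extraction, diarization, and the MDVR-KCL data handling. The file paths, labels, and classifier settings are placeholders.

import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def mfcc_features(path, sr=16000, n_mfcc=13):
    """Per-recording feature vector: mean and standard deviation of each MFCC over time."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

paths = ["hc_01.wav", "hc_02.wav", "pd_01.wav", "pd_02.wav"]   # hypothetical recordings
labels = np.array([0, 0, 1, 1])                                # 0 = healthy control, 1 = PD
X = np.stack([mfcc_features(p) for p in paths])
print("cross-validated accuracy:", cross_val_score(SVC(kernel="rbf"), X, labels, cv=2).mean())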
41. Speech perception difficulty modulates theta-band encoding of articulatory synergies.
- Author
-
Corsini, Alessandro, Tomassini, Alice, Pastore, Aldo, Delis, Ioannis, Fadiga, Luciano, and D'Ausilio, Alessandro
- Subjects
- *
SPEECH perception , *ARTICULATION (Speech) , *VOCAL tract , *PRINCIPAL components analysis , *ENCODING , *SPEECH - Abstract
The human brain tracks available speech acoustics and extrapolates missing information such as the speaker's articulatory patterns. However, the extent to which articulatory reconstruction supports speech perception remains unclear. This study explores the relationship between articulatory reconstruction and task difficulty. Participants listened to sentences and performed a speech-rhyming task. Real kinematic data of the speaker's vocal tract were recorded via electromagnetic articulography (EMA) and aligned to corresponding acoustic outputs. We extracted articulatory synergies from the EMA data with principal component analysis (PCA) and employed partial information decomposition (PID) to separate the electroencephalographic (EEG) encoding of acoustic and articulatory features into unique, redundant, and synergistic atoms of information. We median-split sentences into easy (ES) and hard (HS) based on participants' performance and found that greater task difficulty involved greater encoding of unique articulatory information in the theta band. We conclude that fine-grained articulatory reconstruction plays a complementary role in the encoding of speech acoustics, lending further support to the claim that motor processes support speech perception. NEW & NOTEWORTHY Top-down processes originating from the motor system contribute to speech perception through the reconstruction of the speaker's articulatory movement. This study investigates the role of such articulatory simulation under variable task difficulty. We show that more challenging listening tasks lead to increased encoding of articulatory kinematics in the theta band and suggest that, in such situations, fine-grained articulatory reconstruction complements acoustic encoding. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
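The articulatory-synergy step mentioned in the abstract above (PCA over articulograph trajectories) can be sketched independently of the EEG analysis; the random array below merely stands in for real EMA data, and the partial information decomposition is not shown.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
ema = rng.standard_normal((5000, 12))              # toy data: 5000 frames x 12 EMA channels
pca = PCA(n_components=5)
synergies = pca.fit_transform(ema)                 # time courses of five "synergies"
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))

With real data, the synergy time courses (rather than the raw sensor positions) would then be aligned to the acoustics and entered into the information-theoretic encoding analysis.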
42. An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images.
- Author
-
Belyk, Michel, Carignan, Christopher, and McGettigan, Carolyn
- Subjects
- *
VOCAL tract , *MAGNETIC resonance imaging , *LAUGHTER , *MACHINE learning , *HUMAN anatomy , *DEEP learning - Abstract
Real-time magnetic resonance imaging (rtMRI) is a technique that provides high-contrast videographic data of human anatomy in motion. Applied to the vocal tract, it is a powerful method for capturing the dynamics of speech and other vocal behaviours by imaging structures internal to the mouth and throat. These images provide a means of studying the physiological basis for speech, singing, expressions of emotion, and swallowing that are otherwise not accessible for external observation. However, taking quantitative measurements from these images is notoriously difficult. We introduce a signal processing pipeline that produces outlines of the vocal tract from the lips to the larynx as a quantification of the dynamic morphology of the vocal tract. Our approach performs simple tissue classification, but constrained to a researcher-specified region of interest. This combination facilitates feature extraction while retaining the domain-specific expertise of a human analyst. We demonstrate that this pipeline generalises well across datasets covering behaviours such as speech, vocal size exaggeration, laughter, and whistling, as well as producing reliable outcomes across analysts, particularly among users with domain-specific expertise. With this article, we make this pipeline available for immediate use by the research community, and further suggest that it may contribute to the continued development of fully automated methods based on deep learning algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
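The core idea of the toolbox above, tissue classification constrained to a researcher-specified region of interest, can be illustrated on a toy image; the array, threshold, and ROI below are stand-ins and this is not the toolbox's actual pipeline.

import numpy as np
from skimage import measure

frame = np.zeros((64, 64))
frame[20:44, 10:50] = 1.0                          # fake bright "tissue" block in one rtMRI frame
roi = np.zeros_like(frame, dtype=bool)
roi[10:54, 5:55] = True                            # researcher-specified region of interest

tissue = (frame > 0.5) & roi                       # simple intensity threshold, ROI-constrained
contours = measure.find_contours(tissue.astype(float), 0.5)
print(f"{len(contours)} boundary contour(s); longest has {max(map(len, contours))} points")

In the published pipeline, the resulting air-tissue boundary is traced from the lips to the larynx frame by frame, which is what yields the dynamic vocal tract outlines.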
43. Sketches of chimpanzee (Pan troglodytes) hoo's: vowels by any other name?
- Author
-
Ekström, Axel G. and Edlund, Jens
- Subjects
VOCAL tract ,VOWELS ,HOMINIDS ,CHIMPANZEES ,HUMAN beings ,HARD palate ,PHONETICS - Abstract
In human speech, the close back rounded vowel /u/ (the vowel in "boot") is articulated with the tongue arched toward the dorsal boundary of the hard palate, with the pharyngeal cavity open. Acoustic and perceptual properties of chimpanzee (Pan troglodytes) hoo's are similar to those of the human vowel /u/. However, the vocal tract morphology of chimpanzees likely limits their phonetic capabilities, so that it is unlikely, or even impossible, that their articulation is comparable to that of a human. To determine how qualities of the vowel /u/ may be achieved given the chimpanzee vocal tract, we calculated transfer functions of the vocal tract area for tube models of vocal tract configurations in which vocal tract length, length and area of a laryngeal air sac simulacrum, length of lip protrusion, and area of lip opening were systematically varied. The method described is principally acoustic; we make no claim as to the actual shape of the chimpanzee vocal tract during call production. Nonetheless, we demonstrate that it may be possible to achieve the acoustic and perceptual qualities of back vowels without a reconfigured human vocal tract. The results, while tentative, suggest that the production of hoo's by chimpanzees, while achieving comparable vowel-like qualities to the human /u/, may involve articulatory gestures that are beyond the range of the human articulators. The purpose of this study was to (1) stimulate further simulation research on great ape articulation, and (2) show that apparently vowel-like phenomena in nature are not necessarily indicative of evolutionary continuity per se. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
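The tube-model calculation described above can be approximated with a standard chain-matrix computation for concatenated lossless cylindrical sections; peaks of the resulting transfer function approximate vocal tract resonances. The section areas and lengths below are arbitrary illustrations, not the chimpanzee configurations explored in the study, and losses, wall vibration, and the air-sac branch are ignored.

import numpy as np
from scipy.signal import find_peaks

C = 343.0                                          # speed of sound (m/s)
RHO_C = 1.2 * C                                    # characteristic impedance factor (rho * c)

def tube_transfer(areas_cm2, lengths_cm, freqs):
    """Volume-velocity transfer function of concatenated lossless tubes, open at the lips."""
    h = np.zeros(len(freqs), dtype=complex)
    for i, f in enumerate(freqs):
        k = 2 * np.pi * f / C
        chain = np.eye(2, dtype=complex)
        for a_cm2, l_cm in zip(areas_cm2, lengths_cm):
            a, l = a_cm2 * 1e-4, l_cm * 1e-2       # convert to m^2 and m
            m = np.array([[np.cos(k * l), 1j * RHO_C / a * np.sin(k * l)],
                          [1j * a / RHO_C * np.sin(k * l), np.cos(k * l)]])
            chain = chain @ m                      # sections ordered from glottis to lips
        h[i] = 1.0 / chain[1, 1]                   # U_lips / U_glottis with zero lip load impedance
    return h

freqs = np.linspace(50, 5000, 500)
h = tube_transfer([2.0, 0.8, 3.0, 1.0], [4.0, 4.0, 4.0, 4.0], freqs)
peaks, _ = find_peaks(np.abs(h))
print("approximate resonances (Hz):", np.round(freqs[peaks]))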
44. Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models.
- Author
-
Shadle, Christine H., Fulop, Sean A., Chen, Wei-Rong, and Whalen, D. H.
- Subjects
- *
VOCAL tract , *ACOUSTIC resonance , *SPECTROGRAMS , *MECHANICAL models , *MEASUREMENT errors , *STOCHASTIC resonance , *RESONANCE - Abstract
The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713–727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
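As a generic illustration of the analysis technique being validated above, a reassigned spectrogram can be computed with an off-the-shelf routine; this is not the authors' measurement protocol, and the input file is hypothetical.

import numpy as np
import librosa

y, sr = librosa.load("tube_white_noise.wav", sr=None)        # hypothetical recording of a tube model
freqs, times, mags = librosa.reassigned_spectrogram(y, sr=sr, n_fft=1024)

# keep only the strongest reassigned time-frequency points below 5 kHz
mask = (mags > np.percentile(mags, 99)) & (freqs < 5000)
print("median reassigned frequency of strong points:",
      round(float(np.median(freqs[mask])), 1), "Hz")

With a resonant system excited by noise, the reassigned points tend to cluster near the resonance frequencies, which is what makes comparison against a physical ground truth possible.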
45. Vowels and Consonants in Animals
- Author
-
Lameira, Adriano R.
- Published
- 2022
- Full Text
- View/download PDF
46. A Theory That Never Was: Wrong Way to the 'Dawn of Speech'
- Author
-
Axel G. Ekström
- Subjects
speech production ,evolution of speech ,vocal tract ,primatology ,miscitation ,Language and Literature ,Philology. Linguistics ,P1-1091 - Abstract
Recent literature argues that a purportedly long-standing theory—so-called “laryngeal descent theory”—in speech evolution has been refuted (Boë et al., 2019, https://doi.org/10.1126/sciadv.aaw3916). However, an investigation into the relevant source material reveals that the theory described has never been a prominent line of thinking in speech-centric sciences. The confusion arises from a fundamental misunderstanding: the argument that the descent of the larynx and the accompanying changes in the hominin vocal tract expanded the range of possible speech sounds for human ancestors (a theory that enjoys wide interdisciplinary support) is mistakenly interpreted as a belief that all speech was impossible without such changes—a notion that was never widely endorsed in relevant literature. This work aims not to stir controversy but to highlight important historical context in the study of speech evolution.
- Published
- 2024
- Full Text
- View/download PDF
47. Predicting primate tongue morphology based on geometrical skull matching. A first step towards an application on fossil hominins.
- Author
-
Alvarez, Pablo, El Mouss, Marouane, Calka, Maxime, Belme, Anca, Berillon, Gilles, Brige, Pauline, Payan, Yohan, Perrier, Pascal, and Vialet, Amélie
- Subjects
- *
SKULL base , *TONGUE , *VOCAL tract , *FOSSIL hominids , *PRIMATES , *MORPHOLOGY - Abstract
As part of a long-term research project aiming at generating a biomechanical model of a fossil human tongue from a carefully designed 3D Finite Element mesh of a living human tongue, we present a computer-based method that optimally registers 3D CT images of the head and neck of the living human into similar images of another primate. We quantitatively evaluate the method on a baboon. The method generates a geometric deformation field which is used to build up a 3D Finite Element mesh of the baboon tongue. In order to assess the method's ability to generate a realistic tongue from bony structure information alone, as would be the case for fossil humans, its performance is evaluated and compared under two conditions in which different anatomical information is available: (1) combined information from soft-tissue and bony structures; (2) information from bony structures alone. An Uncertainty Quantification method is used to evaluate the sensitivity of the transformation to two crucial parameters, namely the resolution of the transformation grid and the weight of a smoothness constraint applied to the transformation, and to determine the best possible meshes. In both conditions the baboon tongue morphology is realistically predicted, evidencing that bony structures alone provide enough relevant information to generate soft tissues. Author summary: The issue of the phylogenetic emergence of speech in humans is the focus of lively and strong debates. It questions both the cognitive and physical capacities of fossil hominins to articulate speech. The ultimate goal of our research project "Origins of Speech" is the quantitative investigation of the physical aspects of the debate. We rely for that on the design of biomechanical models of fossil hominins' vocal tracts and on the assessment of their capacity to articulate distinctive sounds, as is required for the emergence of spoken language. Since fossil remains do not preserve soft tissue, the technical challenge is to be able to predict it, and in particular the tongue, from bony structures alone. In this paper we present our method for reaching this goal, which uses medical images of the head and neck to register a reference biomechanical tongue model of a living human into a tongue model of any other primate. We evaluate it quantitatively on the prediction of a baboon tongue, for which we have accurate X-ray scans of the skull and the vocal tract, by comparing the tongue model predicted from bony structures alone with the model predicted from bony and soft-tissue structures and with the tongue segmented on the baboon X-ray data. The evaluation includes a mathematical assessment, based on uncertainty quantification methods, of the sensitivity of the predictions to variations of crucial parameters used in the optimal geometric registration. The results are very encouraging for future application to fossil hominins. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. Deep Learning for Neuromuscular Control of Vocal Source for Voice Production.
- Author
-
Palaparthi, Anil, Alluri, Rishi K., and Titze, Ingo R.
- Subjects
DEEP learning ,VOCAL cords ,VOCAL tract ,LUNGS ,NEUROMUSCULAR system ,LARYNGEAL muscles ,SOUND pressure ,SPEECH - Abstract
A computational neuromuscular control system that generates lung pressure and three intrinsic laryngeal muscle activations (cricothyroid, thyroarytenoid, and lateral cricoarytenoid) to control the vocal source was developed. In the current study, LeTalker, a biophysical computational model of the vocal system was used as the physical plant. In the LeTalker, a three-mass vocal fold model was used to simulate self-sustained vocal fold oscillation. A constant /ə/ vowel was used for the vocal tract shape. The trachea was modeled after MRI measurements. The neuromuscular control system generates control parameters to achieve four acoustic targets (fundamental frequency, sound pressure level, normalized spectral centroid, and signal-to-noise ratio) and four somatosensory targets (vocal fold length, and longitudinal fiber stress in the three vocal fold layers). The deep-learning-based control system comprises one acoustic feedforward controller and two feedback (acoustic and somatosensory) controllers. Fifty thousand steady speech signals were generated using the LeTalker for training the control system. The results demonstrated that the control system was able to generate the lung pressure and the three muscle activations such that the four acoustic and four somatosensory targets were reached with high accuracy. After training, the motor command corrections from the feedback controllers were minimal compared to the feedforward controller except for thyroarytenoid muscle activation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
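A schematic sketch of the feedforward part of such a controller is given below, assuming a small multilayer network that maps the eight acoustic and somatosensory targets to the four control parameters (lung pressure plus three muscle activations); the random tensors stand in for training pairs that a vocal model such as LeTalker would generate, and none of the dimensions or hyperparameters are taken from the paper.

import torch
import torch.nn as nn

controller = nn.Sequential(                        # targets (8) -> motor commands (4)
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4), nn.Sigmoid(),                # normalized lung pressure, CT, TA, LCA activations
)

targets = torch.rand(256, 8)                       # placeholder target vectors
commands = torch.rand(256, 4)                      # placeholder "ground-truth" motor commands
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(controller(targets), commands)
    loss.backward()
    opt.step()
print("final training loss:", float(loss))

The feedback controllers described in the abstract would add corrections on top of these feedforward commands based on the acoustic and somatosensory error signals.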
49. A measure of differences in speech signals by the voice timbre.
- Author
-
Savchenko, V. V.
- Subjects
- *
AUTOMATIC speech recognition , *SPEECH processing systems , *SPEECH , *TELECOMMUNICATION systems , *VOCAL tract , *ADAPTIVE filters - Abstract
This research relates to the field of speech technologies, where a key issue is the optimization of speech signal processing under conditions of a priori uncertainty about its fine structure. The problem of automatic (objective) analysis of the speaker's voice timbre from a speech signal of finite duration is considered, and a universal information-theoretic approach is proposed to solve it. Based on the Kullback-Leibler divergence, an expression was obtained for the asymptotically optimal decision statistic for differentiating speech signals by voice timbre. The author highlights a serious obstacle to the practical implementation of such a statistic, namely the need to synchronize the sequence of observations with the pitch of the speech signals. To overcome this obstacle, an objective measure of timbre-based differences between speech signals is proposed in terms of the acoustic theory of speech production and its "acoustic tube" model of the speaker's vocal tract. The possibilities of practical implementation of the new measure based on an adaptive recursive filter are considered. A full-scale experiment was set up and carried out. The experimental results confirmed two main properties of the proposed measure: high sensitivity to differences between speech signals in voice timbre, and invariance with respect to the fundamental (pitch) frequency. The obtained results can be used in the design and study of digital speech processing systems tuned to the speaker's voice, for example digital voice communication systems, biometric and biomedical systems, etc. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
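In the same spirit as the measure described above, though not the exact statistic derived in the paper, the sketch below fits all-pole (autoregressive) envelopes to two speech segments and compares them with an Itakura-Saito-type spectral divergence; the smooth AR envelopes reduce the influence of pitch harmonics. File names, model order, and frame length are placeholders.

import numpy as np
import scipy.signal
import librosa

def ar_envelope(path, order=16, sr=16000, n=512):
    """All-pole power spectral envelope of the first 250 ms of a recording."""
    y, _ = librosa.load(path, sr=sr)
    frame = y[: sr // 4] * np.hanning(sr // 4)
    a = librosa.lpc(frame, order=order)
    _, h = scipy.signal.freqz([1.0], a, worN=n, fs=sr)
    return np.abs(h) ** 2 + 1e-12

def is_divergence(p, q):
    """Itakura-Saito divergence between two power spectra."""
    r = p / q
    return float(np.mean(r - np.log(r) - 1.0))

p, q = ar_envelope("speaker_a.wav"), ar_envelope("speaker_b.wav")   # hypothetical files
print("spectral divergence:", round(is_divergence(p, q), 4))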
50. Implementation of two-factor user authentication in computer systems.
- Author
-
Tomić, Mihailo D. and Radojević, Olivera M.
- Subjects
- *
COMPUTER access control , *MULTI-factor authentication , *COMPUTER security , *COMPUTER systems , *DATABASES , *VOCAL tract - Abstract
Introduction/purpose: The paper explores the implementation of two-factor authentication (2FA) in computer systems, addressing the increasing need for enhanced security. It highlights the vulnerabilities of password-based authentication and emphasizes the advantages of 2FA in mitigating digital threats. The development of the VoiceAuth application, integrating 2FA through a combination of password and voice authentication, serves as a practical illustration. Methods: The research adopts a three-tier architecture for the VoiceAuth application, encompassing a database, a server-side REST API, and a client-side single-page application. Speaker verification is employed for voice authentication, analyzing elements such as pitch, rhythm, and vocal tract shapes. The paper also discusses possibilities for future upgrades, suggesting enhancements such as real-time voice verification and additional 2FA methods. Results: The application's implementation involves a detailed breakdown of the REST API architecture, Single Page Applications (SPAs), and the Speaker Verification service. Conclusion: The research underscores the crucial role of two-factor authentication (2FA) in bolstering the security of computer systems. The VoiceAuth application serves as a practical demonstration, showcasing the successful integration of 2FA through a combination of password and voice authentication. The modular architecture of the application allows for potential upgrades. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
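A highly simplified sketch of the two-factor flow described above: factor one checks a salted password hash, factor two compares a freshly extracted voice embedding with the enrolled one by cosine similarity. The embedding vectors, similarity threshold, and password are placeholders, not VoiceAuth's actual implementation.

import hashlib
import os
import numpy as np

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# enrolment (in the three-tier design these values would live in the server-side database)
salt = os.urandom(16)
stored_hash = hash_password("correct horse battery staple", salt)
enrolled_voice = np.random.rand(192)               # placeholder speaker embedding

# login attempt: both factors must pass (production code would use hmac.compare_digest)
password_ok = hash_password("correct horse battery staple", salt) == stored_hash
probe_voice = enrolled_voice + 0.01 * np.random.rand(192)   # placeholder probe embedding
voice_ok = cosine(enrolled_voice, probe_voice) > 0.8        # assumed decision threshold

print("access granted" if (password_ok and voice_ok) else "access denied")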