54 results for "Laine, Unto K."
Search Results
2. Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies
- Author
-
Unto K. Laine, Dept. of Signal Processing and Acoustics, Aalto University
- Subjects
Computer science, Speech recognition, Feature extraction, speech analysis, Pattern recognition, Filter bank, Perception, time-frequency methods, pitch-synchronous analysis, Artificial intelligence - Abstract
A speech signal consists of events in time and frequency, and therefore its analysis with high-resolution time-frequency tools is often of importance. An analytic filter bank provides a simple, fast, and flexible method to construct time-frequency representations of signals. Its parameters can easily be adapted to different situations, from a uniform to any auditory frequency scale, or even to a focused resolution. Since the Hilbert magnitude values of the channels are obtained at every sample, it provides a practical tool for high-resolution time-frequency analysis. The present study describes the basic theory of analytic filters and tests their main properties. Applications of the analytic filter bank to different speech analysis tasks, including pitch period estimation and pitch-synchronous analysis of formant frequencies and bandwidths, are demonstrated. In addition, a new feature vector called the group delay vector is introduced. It is shown that this representation provides comparable or even better results than those obtained by spectral magnitude feature vectors in the analysis and classification of vowels. The implications of this observation are also discussed from the speech perception point of view.
- Published
- 2017
- Full Text
- View/download PDF
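A minimal sketch of the analytic filter bank idea in the abstract above: each channel is an analytic (complex) band-pass filter, so a Hilbert magnitude is available at every sample. The sketch assumes ordinary FIR band-pass prototypes made analytic with scipy.signal.hilbert and a uniform channel grid; the auditory/focused frequency scales and the group delay vector of the published method are not reproduced.

```python
import numpy as np
from scipy.signal import firwin, hilbert, fftconvolve

def analytic_filter_bank(x, fs, centers_hz, bw_hz=200.0, numtaps=511):
    """Per-sample Hilbert magnitudes of each band-pass channel (a sketch)."""
    envelopes = []
    for fc in centers_hz:
        lo = max(fc - bw_hz / 2, 1.0)
        hi = min(fc + bw_hz / 2, fs / 2 - 1.0)
        h = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)  # real band-pass FIR
        h_analytic = hilbert(h)                                # analytic impulse response
        y = fftconvolve(x, h_analytic, mode="same")            # complex channel output
        envelopes.append(np.abs(y))                            # Hilbert magnitude, every sample
    return np.vstack(envelopes)                                # (channels, samples)

# Example: 100 uniformly spaced channels on a 440 Hz test tone
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
env = analytic_filter_bank(x, fs, centers_hz=np.linspace(100, 7000, 100))
```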
3. Classification of audio events using permutation transformation
- Author
-
Seppo Fagerlund and Unto K. Laine
- Subjects
Structure (mathematical logic), Audio mining, Permutation (music), Acoustics and Ultrasonics, Speech recognition, Feature extraction, Pattern recognition, Domain (software engineering), Transformation (function), Frequency domain, Pattern recognition (psychology), Artificial intelligence, Mathematics - Abstract
Automatic detection and classification of short and nonstationary events in noisy signals is widely considered to be a difficult task for traditional frequency domain and even time–frequency domain approaches. A novel method for audio signal classification is introduced. It is based on statistical properties of the temporal fine structure of audio events. Artificially generated random signals and unvoiced stop consonants of speech are used to evaluate the method. The results show improved recognition accuracy in comparison to traditional approaches.
- Published
- 2014
- Full Text
- View/download PDF
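The classifier above builds features from statistics of the temporal fine structure via a permutation transformation. A common concrete member of this family is the ordinal-pattern (permutation) histogram, sketched below as an assumption about the flavor of the transform; the exact transformation and classifier of the paper are not reproduced.

```python
import numpy as np
from itertools import permutations

def permutation_histogram(x, order=3, delay=1):
    """Normalized histogram of ordinal patterns of length `order` in signal x."""
    patterns = {p: i for i, p in enumerate(permutations(range(order)))}
    counts = np.zeros(len(patterns))
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        counts[patterns[tuple(np.argsort(window))]] += 1  # rank order of the window
    return counts / counts.sum()

# Two signals with different temporal fine structure give different histograms
rng = np.random.default_rng(0)
noise_feature = permutation_histogram(rng.standard_normal(2000))
chirp_feature = permutation_histogram(np.sin(0.02 * np.arange(2000) ** 1.3))
```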
4. Feedback and imitation by a caregiver guides a virtual infant to learn native phonemes and the skill of speech inversion
- Author
-
Okko Räsänen, Heikki Rasilo, and Unto K. Laine
- Subjects
Linguistics and Language, Speech acquisition, Caregiver feedback, First language, Speech recognition, Place of articulation, Language and Linguistics, Babbling, imitation, Speech inversion, Communication, Articulatory modeling, Phonetic learning, Language acquisition, Computer Science Applications, language acquisition, Modeling and Simulation, Computer Vision and Pattern Recognition, Mel-frequency cepstrum, Psychology, Software - Abstract
Despite large-scale research, development of robust machines for imitation and inversion of human speech into articulatory movements has remained an unsolved problem. We propose a set of principles that can partially explain real infants' speech acquisition processes and the emergence of imitation skills and demonstrate a simulation where a learning virtual infant (LeVI) learns to invert and imitate a virtual caregiver's speech. Based on recent findings in infants' language acquisition, LeVI learns the phonemes of its native language in a babbling phase, using only the caregiver's feedback as guidance, and learns to map the caregiver's acoustically differing speech onto its own articulation in a phase where LeVI is imitated by the caregiver with similar, but not exact, utterances. After the learning stage, LeVI is able to recognize vowels from the virtual caregiver's VCVC utterances perfectly and all 25 Finnish phonemes with an average accuracy of 88.42%. The place of articulation of consonants is recognized with an accuracy of 96.81%. LeVI is also able to imitate the caregiver's speech since the recognition occurs directly in the domain of articulatory programs for phonemes. The learned imitation ability (speech inversion) is strongly language dependent since it is based on the phonemic programs learned from the caregiver. The findings suggest that caregivers' feedback can act as an important signal in guiding infants' articulatory learning, and that the speech inversion problem can be effectively approached from the perspective of early speech acquisition.
- Published
- 2013
- Full Text
- View/download PDF
5. A method for noise-robust context-aware pattern discovery and recognition from categorical sequences
- Author
-
Unto K. Laine and Okko Räsänen
- Subjects
Computer science, Speech recognition, Context (language use), Pattern recognition, Task (project management), Set (abstract data type), Artificial Intelligence, Signal Processing, Pattern recognition (psychology), Feature (machine learning), Computer Vision and Pattern Recognition, Artificial intelligence, Noise (video), Spatial analysis, Categorical variable, Software - Abstract
An efficient method for weakly supervised pattern discovery and recognition from discrete categorical sequences is introduced. The method utilizes two parallel sources of data: categorical sequences carrying some temporal or spatial information and a set of labeled, but not exactly aligned, contextual events related to the sequences. From these inputs the method builds associative models able to describe systematically co-occurring structures in the input streams. The learned models, based on transitional probabilities of events observed at several different time lags, inherently segment and classify novel sequences into contextual categories. Learning and recognition processes are purely incremental and computationally cheap, making the approach suitable for on-line learning tasks. The capabilities of the algorithm are demonstrated in a keyword learning task from continuous infant-directed speech and a continuous speech recognition task operating at varying noise levels.
- Published
- 2012
- Full Text
- View/download PDF
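A minimal sketch of the associative-model idea in the abstract above: transition counts of categorical symbols are accumulated at several lags, separately for each contextual label, and a novel sequence is scored against each context. The lags, Laplace smoothing and log-evidence scoring below are assumptions of this sketch, not the published formulation.

```python
import numpy as np

class LagTransitionModel:
    """Per-context transition statistics P(a[t-lag] -> a[t]) at several lags."""

    def __init__(self, n_symbols, n_contexts, lags=(1, 2, 3, 5, 8)):
        self.lags = lags
        # counts[context, lag_index] is an (n_symbols x n_symbols) matrix (Laplace-initialized)
        self.counts = np.ones((n_contexts, len(lags), n_symbols, n_symbols))

    def train(self, seq, context):
        """Incrementally accumulate transitions of one labeled sequence."""
        for k, lag in enumerate(self.lags):
            for t in range(lag, len(seq)):
                self.counts[context, k, seq[t - lag], seq[t]] += 1

    def score(self, seq):
        """Log-evidence for each context at every position of a novel sequence."""
        probs = self.counts / self.counts.sum(axis=3, keepdims=True)
        scores = np.zeros((len(seq), probs.shape[0]))
        for k, lag in enumerate(self.lags):
            for t in range(lag, len(seq)):
                scores[t] += np.log(probs[:, k, seq[t - lag], seq[t]])
        return scores  # argmax over contexts segments/classifies the sequence

# Toy usage with 8 VQ symbols and 2 contexts (e.g. "keyword" vs. "background")
model = LagTransitionModel(n_symbols=8, n_contexts=2)
model.train([0, 1, 2, 3, 2, 1, 0, 1, 2], context=0)
model.train([7, 6, 7, 5, 6, 7, 6, 5, 7], context=1)
evidence = model.score([0, 1, 2, 3, 2, 1])
```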
6. New parametric representations of bird sounds for automatic classification
- Author
-
Unto K. Laine and Seppo Fagerlund
- Subjects
Set (abstract data type), Identification (information), Computer science, Speech recognition, Pattern recognition, Artificial intelligence, Bird vocalization, Representation (mathematics), Focus (optics), Parametric statistics - Abstract
Identification of bird species based on their vocalization is studied in this paper. The main focus is on introducing a new parametric representation of bird sounds for automatic identification of their species. The method is based on the statistics of local temporal patterns in bird vocalization. Two different sets of bird species are used in the classification tests. The first set contains six species that often produce inharmonic sounds. For the second set, four species that produce very different types of sounds were added. Recognition results using a k-NN classifier show improved accuracy over the results obtained with MFCC features.
- Published
- 2014
- Full Text
- View/download PDF
7. A comparison of warped and conventional linear predictive coding
- Author
-
Aki Härmä and Unto K. Laine
- Subjects
Signal processing ,Audio signal ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,Speech coding ,Linear prediction ,Linear predictive coding ,Wideband audio ,Computer Vision and Pattern Recognition ,Electrical and Electronic Engineering ,Wideband ,Software ,Coding (social sciences) - Abstract
Frequency-warped signal processing techniques are attractive to many wideband speech and audio applications since they have a clear connection to the frequency resolution of human hearing. A warped version of linear predictive coding (LPC) is studied. The performance of conventional and warped LPC algorithms is compared in a simulated coding system using listening tests and conventional technical measures. The results indicate that the use of warped techniques is especially beneficial in wideband coding and may result in savings of one bit per sample compared to the conventional algorithm while retaining the same subjective quality.
- Published
- 2001
- Full Text
- View/download PDF
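Warped LPC replaces the unit delays of the predictor with first-order allpass sections so that prediction effectively operates on a warped (approximately Bark) frequency scale. A minimal sketch of one standard way to compute it, via a warped autocorrelation obtained by repeatedly allpass-filtering the frame; the coding system, quantization and listening tests of the paper are not included, and the warping coefficient below is only an approximate Bark value for 16 kHz material.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def warped_lpc(frame, order=12, lam=0.576):
    """Warped LPC coefficients of one frame (lam = 0 reduces to ordinary LPC)."""
    # Warped "delays": y_k = D(z)^k x with the allpass D(z) = (-lam + z^-1)/(1 - lam z^-1)
    y = frame.astype(float)
    delayed = [y]
    for _ in range(order):
        y = lfilter([-lam, 1.0], [1.0, -lam], y)
        delayed.append(y)
    # Warped autocorrelation r[k] = <x, D^k x>, then the usual normal equations
    r = np.array([np.dot(delayed[0], delayed[k]) for k in range(order + 1)])
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return a  # predictor coefficients on the warped frequency scale

# Example on a synthetic voiced-like frame
fs = 16000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 2400 * t)
coeffs = warped_lpc(frame * np.hamming(len(frame)))
```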
8. Automatic self-supervised learning of associations between speech and text
- Author
-
Okko Räsänen, Unto K. Laine, and Juha Knuuttila
- Subjects
Self supervised learning, Computer science, Speech recognition, Artificial intelligence, Natural language processing - Published
- 2013
- Full Text
- View/download PDF
9. Time-frequency integration characteristics of hearing are optimized for perception of speech-like acoustic patterns
- Author
-
Unto K. Laine and Okko Räsänen
- Subjects
Sound Spectrography, Acoustics and Ultrasonics, Computer science, Acoustics, Speech recognition, Signal, Speech Acoustics, Loudness, Pitch Discrimination, Critical band, Arts and Humanities (miscellaneous), Perception, Learning rule, Auditory system, Humans, Attention, Psychoacoustics, Time–frequency analysis, Time Perception, Speech Perception, Cues, Perceptual Masking - Abstract
Several psychoacoustic phenomena such as loudness perception, absolute thresholds of hearing, and perceptual grouping in time are affected by temporal integration of the signal in the auditory system. Similarly, the frequency resolution of the hearing system, often expressed in terms of critical bands, implies signal integration across neighboring frequencies. Although progress has been made in understanding the neurophysiological mechanisms behind these processes, the underlying reasons for the observed integration characteristics have remained poorly understood. The current work proposes that the temporal and spectral integration are a result of a system optimized for pattern detection from ecologically relevant acoustic inputs. This argument is supported by a simulation where the average time-frequency structure of speech that is derived from a large set of speech signals shows a good match to the time-frequency characteristics of the human auditory system. The results also suggest that the observed integration characteristics are learnable from acoustic inputs of the auditory environment using a Hebbian-like learning rule.
- Published
- 2013
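The study above argues that the observed integration characteristics could be learned from speech-like input with a Hebbian-like rule. A toy sketch of that claim using Oja's rule to extract one dominant spectro-temporal integration pattern from spectrogram patches; the speech corpus, auditory front-end and comparison with psychoacoustic data are not reproduced, and Oja's rule is this sketch's choice of Hebbian-like learning, not necessarily the paper's.

```python
import numpy as np

def oja_learn_kernel(spectrogram, patch_shape=(20, 8), lr=1e-3, n_iter=20000, seed=0):
    """Learn one spectro-temporal component with Oja's Hebbian rule.

    spectrogram : (freq_bins, frames), e.g. a log-magnitude STFT of speech
    patch_shape : (freq, time) extent of the integration window
    """
    rng = np.random.default_rng(seed)
    f, t = patch_shape
    w = rng.standard_normal(f * t)
    w /= np.linalg.norm(w)
    n_f, n_t = spectrogram.shape
    for _ in range(n_iter):
        i = rng.integers(0, n_f - f)
        j = rng.integers(0, n_t - t)
        x = spectrogram[i:i + f, j:j + t].ravel()
        x = x - x.mean()
        y = w @ x
        w += lr * y * (x - y * w)         # Oja's rule: Hebbian term with built-in normalization
    return w.reshape(patch_shape)         # learned time-frequency integration pattern

# Toy usage on random input (replace with a real speech spectrogram)
rng = np.random.default_rng(1)
kernel = oja_learn_kernel(rng.standard_normal((128, 500)))
```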
10. Attention based temporal filtering of sensory signals for data redundancy reduction
- Author
-
Okko Räsänen, Sofoklis Kakouros, and Unto K. Laine
- Subjects
Computer science, Feature extraction, Context (language use), Pattern recognition, Identification (information), Data redundancy, Pattern recognition (psychology), Artificial intelligence, Data mining, Data compression - Abstract
Since modern computational devices are required to store and process increasing amounts of data generated from various sources, efficient algorithms for identification of significant information in the data are becoming essential. Sensory recordings are one example where automatic and continuous storing and processing of large amounts of data is needed. Therefore, algorithms that can alleviate the computational load of the devices and reduce their storage requirements by removing uninformative data are important. In this work we propose a method for data reduction based on theories of human attention. The method detects temporally salient events based on the context in which they occur and retains only those sections of the input signal. The algorithm is tested as a pre-processing stage in a weakly supervised keyword learning experiment where it is shown to significantly improve the quality of the codebooks used in the pattern discovery process.
- Published
- 2013
- Full Text
- View/download PDF
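The method above keeps only temporally salient sections of the input, judged against the context in which they occur. A crude sketch of that idea, scoring each frame by its distance from the running mean of the preceding frames and keeping the most novel ones; the actual attention model and the keyword-learning pipeline of the paper are not reproduced.

```python
import numpy as np

def salient_frames(features, context_len=50, keep_quantile=0.7):
    """Boolean mask of temporally salient frames.

    features : (frames, dims) matrix, e.g. MFCC frames.
    A frame counts as salient if it deviates strongly from the running mean
    of the preceding `context_len` frames (a stand-in for a context model).
    """
    n = len(features)
    novelty = np.zeros(n)
    for t in range(1, n):
        ctx = features[max(0, t - context_len):t]
        novelty[t] = np.linalg.norm(features[t] - ctx.mean(axis=0))
    threshold = np.quantile(novelty, keep_quantile)
    return novelty >= threshold   # retain only these frames downstream

# Toy usage: mostly stationary frames with a short "event" in the middle
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 13)) * 0.1
feats[500:520] += 3.0
mask = salient_frames(feats)
```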
11. Splitting the unit delay [FIR/all pass filters design]
- Author
-
Matti Karjalainen, Vesa Välimäki, Unto K. Laine, and Timo Laakso
- Subjects
Signal processing ,Digital delay line ,Finite impulse response ,Computer science ,Applied Mathematics ,Delay ,Signal Processing ,Electronic engineering ,Array processing ,Electrical and Electronic Engineering ,Digital filter ,All-pass filter ,Group delay and phase delay - Abstract
A fractional delay filter is a device for bandlimited interpolation between samples. It finds applications in numerous fields of signal processing, including communications, array processing, speech processing, and music technology. We present a comprehensive review of FIR and allpass filter design techniques for bandlimited approximation of a fractional digital delay. Emphasis is on simple and efficient methods that are well suited for fast coefficient update or continuous control of the delay value. Various new approaches are proposed and several examples are provided to illustrate the performance of the methods. We also discuss the implementation complexity of the algorithms. We focus on four applications where fractional delay filters are needed: synchronization of digital modems, incommensurate sampling rate conversion, high-resolution pitch prediction, and sound synthesis of musical instruments.
- Published
- 1996
- Full Text
- View/download PDF
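One of the simplest FIR designs reviewed in the tutorial above is Lagrange interpolation of the fractional delay; a compact sketch follows. The filter order and the way the integer and fractional parts of the delay are split are choices of this sketch.

```python
import numpy as np
from scipy.signal import lfilter

def lagrange_fd_fir(delay, order=3):
    """Lagrange-interpolation FIR approximating a fractional delay (in samples).

    h[n] = prod_{k != n} (delay - k) / (n - k),  n, k = 0..order
    Accuracy is best when `delay` lies near the middle of [0, order].
    """
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

# Delay a signal by 5.3 samples: 4 integer samples plus a third-order
# Lagrange FIR designed for the remaining 1.3 samples.
x = np.sin(2 * np.pi * 0.01 * np.arange(200))
h = lagrange_fd_fir(1.3, order=3)
y = lfilter(h, 1.0, np.concatenate([np.zeros(4), x]))
```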
12. Comparison of Classifiers in Audio and Acceleration Based Context Classification in Mobile Phones
- Author
-
Laine, Unto K., Leppänen, Jussi, Räsänen, Okko, and Saarinen, Jukka
- Abstract
Publication in the conference proceedings of EUSIPCO, Barcelona, Spain, 2011
- Published
- 2011
- Full Text
- View/download PDF
13. Stop consonant recognition by temporal fine structure of burst
- Author
-
Seppo Fagerlund and Unto K. Laine
- Subjects
Computer science ,Speech recognition ,Stop consonant ,Structure (category theory) - Published
- 2011
- Full Text
- View/download PDF
14. Aspect of the physiological sources of vocal vibrato: A study of fundamental-period-synchronous changes in electroglottographic signals obtained from one singer and two excised human larynges
- Author
-
Erkki Vilkman, Anne-Maria Laukkanen, and Unto K. Laine
- Subjects
Register (music), Vocal folds, Acoustics, Falsetto, Cricothyroid articulation, General Medicine, Phonation, Singing, Loudness, Mathematics, Vibrato - Abstract
The following experiment was carried out in order to see whether it is possible to get information about the physiological mechanisms of fundamental frequency variation in the vibrato of singing voice by investigating the fundamental-period-synchronous changes in electroglottographic signals. Electroglottograms were taken from one trained female amateur singer while singing /a:/ at comfortable pitch and loudness level in the chest register mode a) with habitual vibrato, b) with exaggerated vibrato and c) while being abruptly pushed on the abdominal wall. For a comparison, EGG-signals were taken from samples produced with two excised human larynges; in those samples fundamental frequency was changed either by varying subglottic pressure or introducing longitudinal stretch on the vocal folds by manual rotation of the cricothyroid articulation. The results suggest that only if phonation shifts from modal to falsetto register changes in amplitude, SQ and QOQ of the EGG-signal differ for laryngeally produced a...
- Published
- 1992
- Full Text
- View/download PDF
15. Self-learning vector quantization for pattern discovery from speech
- Author
-
Unto K. Laine, Toomas Altosaar, and Okko Räsänen
- Subjects
Linde–Buzo–Gray algorithm, Learning vector quantization, Computer science, Speech recognition, Vector quantization, k-means clustering, Pattern recognition, Canopy clustering algorithm, Artificial intelligence, Time series, Cluster analysis - Abstract
A novel and computationally straightforward clustering algorithm was developed for vector quantization (VQ) of speech signals for a task of unsupervised pattern discovery (PD) from speech. The algorithm works in purely incremental mode, is computationally extremely feasible, and achieves comparable classification quality with the well-known k-means algorithm in the PD task. In addition to presenting the algorithm, general findings regarding the relationship between the amounts of training material, convergence of the clustering algorithm, and the ultimate quality of VQ codebooks are discussed. Index Terms: speech recognition, pattern discovery, time series analysis, vector quantization, data clustering
- Published
- 2009
- Full Text
- View/download PDF
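The abstract above describes a purely incremental clustering scheme for building VQ codebooks. A minimal sketch of one common incremental flavor (create a new code vector when nothing is within a distance threshold, otherwise update the nearest one as a running mean); this is an assumption about the algorithm family, not the published method.

```python
import numpy as np

class IncrementalVQ:
    """Grow a VQ codebook online: add a code vector when no existing one is
    close enough, otherwise move the nearest code vector toward the sample."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.codebook = []   # list of code vectors
        self.counts = []     # number of samples assigned to each code vector

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if not self.codebook:
            self.codebook.append(x.copy())
            self.counts.append(1)
            return 0
        d = [np.linalg.norm(x - c) for c in self.codebook]
        j = int(np.argmin(d))
        if d[j] > self.threshold:
            self.codebook.append(x.copy())
            self.counts.append(1)
            return len(self.codebook) - 1
        self.counts[j] += 1
        self.codebook[j] += (x - self.codebook[j]) / self.counts[j]  # running mean
        return j

# Quantize a stream of 2-D feature vectors into symbol indices
rng = np.random.default_rng(0)
vq = IncrementalVQ(threshold=0.8)
symbols = [vq.update(v) for v in rng.standard_normal((500, 2))]
```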
16. A noise robust method for pattern discovery in quantized time series: the concept matrix approach
- Author
-
Okko Räsänen, Toomas Altosaar, and Unto K. Laine
- Subjects
Set (abstract data type), Task (computing), Matrix (mathematics), Discrete time and continuous time, Series (mathematics), Computer science, Pattern recognition, Artificial intelligence, Noise (video) - Abstract
An efficient method for pattern discovery from discrete time series is introduced in this paper. The method utilizes two parallel streams of data: a discrete unit time series and a set of labeled events. From these inputs it builds associative models between systematically co-occurring structures existing in both streams. The models are based on transitional probabilities of events at several different time scales. Learning and recognition processes are incremental, making the approach suitable for online learning tasks. The capabilities of the algorithm are demonstrated in a continuous speech recognition task operating at varying noise levels.
- Published
- 2009
- Full Text
- View/download PDF
17. Unsupervised segmentation of continuous speech using vector autoregressive time-frequency modeling errors
- Author
-
Petri Korhonen and Unto K. Laine
- Subjects
Vocabulary, Speech production, Computer science, Speech recognition, Speech technology, Speech synthesis, Pattern recognition, Speech segmentation, Autoregressive model, Phone, Segmentation, Artificial intelligence, Utterance - Abstract
A vector autoregressive (VAR) model is used in the auditory time-frequency domain to predict spectral changes. Forward and backward prediction errors increase at the phone boundaries. These error signals are then used to study and detect the boundaries of the largest changes, allowing the most reliable automatic segmentation. Using a fully unsupervised method yields segments consisting of a variable number of phones. The quality of performance of this method was tested with a set of 150 Finnish sentences pronounced by one female and two male speakers. The performance for English was tested using the TIMIT core test set. The boundaries between stops and vowels, in particular, are detected with high probability and precision.
1. Introduction. Many subfields of speech technology need robust methods for automatic phonetic speech segmentation. Preferably these methods would be fully speaker and language independent. They should perform segmentation without any prior information about the speaker or the utterance in question. These methods should not apply any type of prior learning, and they should be able to process unknown utterances in a fully unsupervised manner. This paper describes a preliminary test of a novel method for automatic speech segmentation, which fulfills the hard demands mentioned to a certain degree. Segmentation methods described in the literature can be classified into explicit and implicit methods. They also vary in terms of segmentation units (e.g. phonemes, syllables, words). In explicit methods, the underlying phoneme sequence is known prior to the segmentation. These methods are used in speech synthesis, for example. Implicit methods split the utterance into smaller units without using any information about the underlying phoneme sequence. These methods are based on analyzing the acoustic properties of the signal and detecting either spectrally stable parts or rapid variations of the signal. An example of a method based on locating spectrally stable parts is in [1], where the correlation between parameters computed from nearby frames has been used as a measure of stability. In [2], segment boundaries are implicitly detected by comparing the means of frames around potential boundaries using a "jump function." In [3], the variation of the short-term energy function is used as a measure to produce syllable-like units using minimum-phase group delay functions. In the case of continuous speech, the signal cannot be strictly divided into stable and varying parts which would correspond one-to-one with phones and segment boundaries. No phone in continuous speech produces steady spectra; instead, within a phone there are always slow spectral movements which are, to some degree, possible to predict. The method proposed in this paper does not detect these slow spectral variations, but rather is based on detecting unpredicted changes in the auditory time-frequency picture of speech at phone boundaries. These unpredicted changes happen most often when moving from one phoneme class to another. A change in the speech production mechanism changes the acoustic signal in an unpredictable manner. Knowing that not all transitions produce a large or rapid spectral change, a question of this study is which kinds of phone boundaries allow the most reliable and robust detection by the method. When facing speaker-independent, unlimited-vocabulary (e.g. inflectional languages) continuous speech recognition, the words have to be split into smaller units such as morphemes; hence, not every phone boundary needs to be detected. Segments similar to syllables or morphemes, consisting of one to many phones, apply as well, as long as the total number of different segments is not too high for modeling purposes. The novel method presented in this paper produces segments consisting of phone clusters of different lengths. The core idea is to model the spectral variation by using a vector autoregressive (VAR) model. The model performs forward and backward predictions in the auditory time-frequency domain with associated prediction errors. The segment boundary candidates are found based on these error signals.
- Published
- 2005
- Full Text
- View/download PDF
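The segmentation method above fits a vector autoregressive model to auditory spectral frames and treats peaks of the prediction error as boundary candidates. A compact sketch using an ordinary least-squares VAR fit and a naive peak picker; the auditory front-end, the backward predictor and the evaluation protocol are left out.

```python
import numpy as np

def var_prediction_error(frames, order=2):
    """Forward prediction error of a VAR(order) model fitted to all frames.

    frames : (T, bands) sequence of spectral frames.
    """
    T, d = frames.shape
    X = np.hstack([frames[order - k - 1:T - k - 1] for k in range(order)])  # lagged frames
    Y = frames[order:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)        # least-squares VAR coefficients
    residual = Y - X @ A
    err = np.zeros(T)
    err[order:] = np.linalg.norm(residual, axis=1)   # per-frame prediction error
    return err

def boundary_candidates(err, min_gap=5):
    """Local maxima of the error curve, at least `min_gap` frames apart."""
    peaks = [t for t in range(1, len(err) - 1)
             if err[t] >= err[t - 1] and err[t] > err[t + 1]]
    selected = []
    for t in sorted(peaks, key=lambda t: -err[t]):
        if all(abs(t - s) >= min_gap for s in selected):
            selected.append(t)
    return sorted(selected)

# Toy usage: random "spectral" frames with an abrupt change at frame 60
rng = np.random.default_rng(0)
frames = np.vstack([rng.standard_normal((60, 20)), 3 + rng.standard_normal((40, 20))])
bounds = boundary_candidates(var_prediction_error(frames))
```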
18. Measurements on the effects of glottal opening and flow on the glottal impedance
- Author
-
M. Karjalainen and Unto K. Laine
- Subjects
Inductance ,Resonator ,Materials science ,Turbulence ,Acoustics ,Flow (psychology) ,Resonance ,Tube (container) ,Electrical impedance ,Body orifice - Abstract
A new method to measure the acoustical impedance of an artificial glottal orifice is presented. The plate with the orifice is mounted at one end of a tube resonator, with the other end being open. The impedance of the orifice seen from the tract can be solved from the resonance frequencies and bandwidths. The frequency characteristic of the orifice is easily obtained under different DC-flow conditions. The results show that under turbulent flow the effective glottal inductance is clearly only a fraction of its flowless value. The measured glottal resistance is close to the theoretical value given by Flanagan's two-mass model.
- Published
- 2005
- Full Text
- View/download PDF
19. Aids for the handicapped based on 'Synte 2' speech synthesizer
- Author
-
J. Wood, Unto K. Laine, Matti Karjalainen, R. Toivonen, K. Haymond, and R. Folmar
- Subjects
Vocabulary, Presentation, Computer science, Speech recognition, Speech technology, Speech synthesis, Loudspeaker, Speech processing - Abstract
SYNTE 2 is a low-cost, high-quality, text-to-speech synthesizer designed for Finnish but applicable also to other languages if "phoneme writing" is used. Since its first presentation in 1977 it has been adapted to many communication aids for the handicapped. The first application was a portable speaking machine with unlimited vocabulary for the speech impaired. This paper describes the present applications of SYNTE 2, including the speaking machine, a talking data terminal for blind computer programmers, a system for automatic production of spoken information for the blind, etc.
- Published
- 2005
- Full Text
- View/download PDF
20. An all-zero model for higher pole correction
- Author
-
Unto K. Laine
- Subjects
Error analysis, Factorization of polynomials, Speech recognition, Attenuation, Bandwidth (signal processing), Effective length, Topology, Digital filter, Vocal tract, Mathematics - Abstract
The higher pole correction (HPC) function in relation to the all-pole modelling of the vocal tract transmission is analyzed. A set of theoretical HPC curves for vocal tracts of different effective lengths are calculated. An all-zero model for HPC is proposed. This model is derived by a polynomial factorization method. The zeroes have broad bandwidths (1.5-2 kHz) and are located periodically with a spacing depending on the effective length. Thus the design of HPC filters for variable effective lengths is trivially achieved. A detailed error analysis for these models is given. The use of the all-zero HPC filters leads to a new type of pole-zero model for vocal tract transmission which can be used either in analog domain or digital domain. Conventionally, for a fixed number of poles and sampling frequency, the length of the vocal tract is assumed to be fixed (i.e. no HPC). However, with the proposed pole-zero model, HPC for variable effective lengths is automatically ensured.
- Published
- 2005
- Full Text
- View/download PDF
21. PARCAS, A new terminal analog model for speech synthesis
- Author
-
Unto K. Laine
- Subjects
Compact space, Quality (physics), Formant, Vowel, Speech recognition, Speech synthesis, Speech processing, Transfer function, Vocal tract, Mathematics - Abstract
A new method to construct formant-type models for text-to-speech synthesis is described. The method consists of two phases: first, the idealized acoustic transfer function of the uniform vocal tract is factorized into two partial transfer functions, each including only every other formant of the original one. Second, the partial transfer functions are approximated with proper rational, meromorphic functions. The method leads to a PARallel-CAScade model called PARCAS. In a typical text-to-speech application the model needs only 6 resonators and 16 control parameters. The special features of the PARCAS model lie in its structural compactness and simplicity of control. With this specific structure the formant amplitudes in vowel sounds can be put close to the right levels by controlling the formant frequencies only. The same compact filter system can be used in the synthesis of all sounds including fricatives, nasals, transients and bursts. Also the mixed-type excitation for voiced fricatives can easily be obtained. In informal listening, the synthesized speech was found to be of high quality.
- Published
- 2005
- Full Text
- View/download PDF
22. Modelling of LIP radiation impedance in Z-domain
- Author
-
Unto K. Laine
- Subjects
Radiation impedance ,Mean squared error ,Sampling (signal processing) ,Speech recognition ,Mathematical analysis ,Trigonometric functions ,Sine ,Acoustic impedance ,Electrical impedance ,Omega ,Mathematics - Abstract
Three z-domain models for lip radiation impedance are introduced. Two of them are based on the observation that the normalized acoustic impedance can be modelled as z(ω) = C[1 − cos(ωT)] + jB·sin(ωT). The values of the parameters C and B for different sampling frequencies and radiation areas are found by minimizing the mean square error (MSE) between the modelled and the acoustic impedance. Owing to the sine and cosine functions used, the modelled impedance can be transformed exactly and straightforwardly into the z-domain. The third model described is a pole-zero model, the parameters of which are also optimized by the MSE criterion. The limits for acceptable errors of the modelled impedances are studied and the models are compared to the well-known model of Flanagan.
- Published
- 2005
- Full Text
- View/download PDF
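The cosine/sine form quoted in the abstract above maps onto the z-domain by evaluating it on the unit circle. A short derivation, assuming z = e^{jωT}; this is one standard way of writing the result, not necessarily the exact form used in the paper.

```latex
% On the unit circle z = e^{j\omega T}:
%   \cos(\omega T) = \tfrac{1}{2}(z + z^{-1}), \qquad j\sin(\omega T) = \tfrac{1}{2}(z - z^{-1}).
% Substituting into z(\omega) = C[1 - \cos(\omega T)] + jB\sin(\omega T):
Z(z) = C\left[1 - \tfrac{1}{2}(z + z^{-1})\right] + \tfrac{B}{2}(z - z^{-1})
     = C - \frac{C - B}{2}\,z - \frac{C + B}{2}\,z^{-1}.
```

A causal realization follows by multiplying through by z^{-1}, at the cost of one extra sample of delay.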
23. Linear transforms and filterbanks based on vector ARMA models
- Author
-
Unto K. Laine
- Subjects
Linear map ,Mathematical optimization ,Iterative method ,Wavelet transform ,Pole–zero plot ,Basis function ,Impulse (physics) ,Filter bank ,Residual ,Algorithm ,Mathematics - Abstract
Linear transformations, like wavelet transforms, and filterbanks of IIR-type and of arbitrary time-frequency plane tilings can be efficiently realized by vector ARMA models. The quality of the realization depends on how well the basis functions or impulse responses of the filterbank can be approximated by the actual VARMA based pole-zero model. The vector AR part gives an MSE-optimal block-recursive model for the target basis functions. The vector MA part is formed of the vector AR residual and further optimized by an iterative algorithm.
- Published
- 2003
- Full Text
- View/download PDF
24. A new glottal LPC method for voice coding and inverse filtering
- Author
-
Unto K. Laine and Paavo Alku
- Subjects
Speech production ,Voice activity detection ,Codec2 ,Computer science ,Speech recognition ,Speech coding ,Speech processing ,Linear predictive coding ,Vector sum excited linear prediction ,Harmonic Vector Excitation Coding ,Vocal tract - Abstract
A linear-predictive-coding (LPC) based method for computing glottal pulses is presented. The method is based on modeling the speech production mechanism with three digital filters. The glottal contribution to the speech spectrum is first estimated with two consecutive LPC analyses. After the glottal contribution has been eliminated, the vocal tract and radiation effects are modeled. Glottal waves close to the natural shape can be achieved with this straightforward method. Applications to speech coding are briefly described.
- Published
- 2003
- Full Text
- View/download PDF
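The abstract above outlines a pipeline in which the glottal contribution is first estimated with consecutive LPC analyses and removed by inverse filtering, after which the vocal tract and radiation are modeled. The sketch below follows that general iterative-inverse-filtering idea; the filter orders, the leaky-integrator radiation cancellation and the lack of pre-emphasis or iteration are assumptions of the sketch, not the published algorithm.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc(x, order):
    """Autocorrelation-method LPC; returns the inverse filter [1, -a1, ..., -a_order]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def rough_glottal_estimate(frame, fs, tract_order=None, glottal_order=2):
    """Very rough glottal-flow estimate by two-stage LPC inverse filtering."""
    tract_order = tract_order or int(fs / 1000) + 2
    g = lpc(frame, glottal_order)                 # low order: mostly glottal spectral tilt
    no_glottis = lfilter(g, [1.0], frame)         # remove the tilt by inverse filtering
    v = lpc(no_glottis, tract_order)              # higher order: vocal tract resonances
    source = lfilter(v, [1.0], frame)             # inverse-filter the original frame
    return lfilter([1.0], [1.0, -0.99], source)   # cancel lip radiation by leaky integration

# Toy usage on a synthetic vowel-like frame
fs = 8000
t = np.arange(int(0.03 * fs)) / fs
excitation = np.sign(np.sin(2 * np.pi * 110 * t))
frame = lfilter([1.0], [1.0, -1.2, 0.8], excitation)
flow = rough_glottal_estimate(frame * np.hamming(len(frame)), fs)
```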
25. Famlet, to be or not to be a wavelet?
- Author
-
Unto K. Laine
- Subjects
Discrete wavelet transform ,symbols.namesake ,Wavelet ,Fourier transform ,Computer science ,Second-generation wavelet transform ,Mathematical analysis ,symbols ,Harmonic wavelet transform ,Fast wavelet transform ,Fractional Fourier transform ,Constant Q transform - Abstract
A class of orthogonal time domain functions called famlets is introduced. Famlets are produced from their frequency domain representatives by using the inverse Fourier transform. The basic theory behind famlets and the famlet transform associated with them is formulated, and their usage in nonuniform resolution spectrum analysis is described. The similarities and dissimilarities of famlets and wavelets are discussed.
- Published
- 2003
- Full Text
- View/download PDF
26. MSE filter design and spectrum parameterization by orthogonal FAM transform
- Author
-
Unto K. Laine
- Subjects
Filter design ,Finite impulse response ,Mean squared error ,Control theory ,Frequency domain ,Orthonormal basis ,Digital filter ,Algorithm ,Infinite impulse response ,Linear phase ,Mathematics - Abstract
An orthonormal set of the frequency amplitude modulated (FAM) class of functions is used to produce a frequency domain FAM transform suitable for infinite impulse response (IIR) filter design with a least squared error criterion. The orthonormal basis used is chosen to be effectively implemented by a set of identical second order allpass filters equipped with a common magnitude weighting. The allpass parameters (θ, r) are first adapted to the problem in order to get rapid convergence in the mean squared error (MSE). The results show that the new method produces IIR filters with the same or better (in the MSE sense) frequency domain response than finite impulse response (FIR) filters with 5 to 7 times higher order. The method allows optimization of the group delay properties. There are three different ways to approximate linear phase characteristics in the IIR filters produced.
- Published
- 2002
- Full Text
- View/download PDF
27. A study on auditory resolution using Bark-FAMlet clicks
- Author
-
M. Huotilainen and Unto K. Laine
- Subjects
Masking (art) ,Amplitude modulation ,Computer science ,Speech recognition ,Temporal resolution ,Resolution (electron density) ,Perceptual coding ,Phase (waves) ,Set (psychology) ,Frequency modulation - Abstract
Many areas in audio engineering, e.g., perceptual coding, strongly rely on knowledge and models of the human auditory system. Nonuniform frequency resolution and masking phenomenon are well studied aspects. Less attention has been paid to the temporal auditory resolution. In this pilot study the temporal resolution is tested by using short (0.55-4.15 ms) clicks called Bark-FAMlets. They form an orthogonal set of signals all having identical power spectra with uniform masking properties. Individual FAMlets differ only in phase by steps called an auditory unit delay (AUD). We found that 75% discrimination is achieved when the difference between two FAMlets is 22, 11 or 10 AUDs depending on the three test conditions used.
- Published
- 2002
- Full Text
- View/download PDF
28. Warped linear prediction (WLP) in speech and audio processing
- Author
-
Toomas Altosaar, Unto K. Laine, and Matti Karjalainen
- Subjects
Signal processing, Audio signal, Computer science, Speech recognition, Speech coding, Linear prediction, Filter bank, Speech processing, Orthonormal basis, Image warping, Audio signal processing - Abstract
A linear prediction process is applied to frequency warped signals. The warping is realized by using orthonormal FAM (frequency modulated complex exponentials) functions. The general formulation of WLP is given and effective realizations with allpass filters are studied. The application of auditory WLP to speech coding and speech recognition has given good results.
- Published
- 2002
- Full Text
- View/download PDF
29. An orthogonal set of frequency and amplitude modulated (FAM) functions for variable resolution signal analysis
- Author
-
Toomas Altosaar and Unto K. Laine
- Subjects
symbols.namesake ,Fourier transform ,Orthogonality ,Frequency domain ,Speech recognition ,symbols ,Trigonometric functions ,Orthogonal functions ,Filter bank ,Algorithm ,Frequency modulation ,Digital filter ,Mathematics - Abstract
A general formula for defining a wide class of orthogonal functions is given. The class is based on circular sine and cosine functions which are simultaneously frequency and amplitude modulated in such a way that they remain orthogonal. This is achieved with any choice of FM or AM function. The class, which is called FAM functions, offers a practical and flexible tool for signal processing. They have been used to produce nonuniform resolution auditory spectrograms. The achieved time-frequency resolution is of very high quality. The preliminary results show that they are approaching the theoretical limit given for the Δf–Δt product. The orthogonality of the FAM functions is proved, how a complex orthogonal auditory transform (OAT) can be realized by FAMs is described, and a method for constructing a complex orthogonal one-Bark filter bank for signal analysis and psychoacoustic experimentation is given.
- Published
- 2002
- Full Text
- View/download PDF
30. Warped filters and their audio applications
- Author
-
Jyri Huopaniemi, Unto K. Laine, Matti Karjalainen, and Aki Härmä
- Subjects
Audio signal ,Computer science ,Speech recognition ,Audio analyzer ,Speech coding ,Electronic engineering ,Audio signal flow ,Audio signal processing ,computer.software_genre ,computer ,Audio filter ,Sub-band coding ,Digital audio - Abstract
An inherent property of many DSP algorithms is that they tend to exhibit uniform frequency resolution from zero to the Nyquist frequency. This is a direct consequence of using unit delays as building blocks; a frequency independent delay implies uniform frequency resolution. In audio applications, however, this is often an undesirable feature since the response properties are typically specified and measured on a logarithmic scale, following the behavior of the human auditory system. We present an overview of warped filters and DSP techniques which can be designed to better match the audio and auditory criteria. Audio applications, including modeling of auditory and musical phenomena, equalization techniques, auralization, and audio coding, are presented.
- Published
- 2002
- Full Text
- View/download PDF
31. Realizable warped IIR filters and their properties
- Author
-
Matti Karjalainen, Aki Härmä, and Unto K. Laine
- Subjects
Finite impulse response, Robustness (computer science), Control theory, Prototype filter, Network synthesis filters, Topology, Digital filter, Infinite impulse response, Digital signal processing, All-pass filter, Mathematics - Abstract
Digital filters where unit delays are replaced with frequency dependent delays, such as first order allpass sections, are often called warped filters since they implement filter specifications on a warped non-uniform frequency scale. Warped IIR (WIIR) filters cannot be realized directly due to delay free loops. Specific solutions have been known that make WIIR filters realizable but no general approach has been available so far. In this paper we will explore the generation of such filters, including new filter structures. The robustness and computational efficiency of WIIR filters are studied and most potential applications are discussed.
- Published
- 2002
- Full Text
- View/download PDF
32. An experimental audio codec based on warped linear prediction of complex valued signals
- Author
-
Matti Karjalainen, Unto K. Laine, and Aki Härmä
- Subjects
Signal processing ,Computer science ,Microphone ,Speech recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Linear prediction ,Acoustic wave ,Linear predictive coding ,Speech processing ,Wideband audio ,Adaptive Multi-Rate audio codec ,Audio codec ,Codec ,Data compression - Abstract
Bark-scale warped linear prediction (WLP) is a promising core for a monophonic perceptual audio codec. In the current paper the WLP scheme is extended for processing complex valued signals (CWLP). Three different methods of converting a stereo signal to one complex valued signal are introduced. The philosophy behind the coding scheme is to integrate some aspects of modern wideband audio coding (e.g. perceptuality and stereo signal processing) into one computational element in order to find a more holistic and economic way of processing.
- Published
- 2002
- Full Text
- View/download PDF
33. Generalized linear prediction based on analytic signals
- Author
-
Unto K. Laine
- Subjects
Root mean square ,Finite impulse response ,Control theory ,Pole–zero plot ,Spectral flatness ,Linear prediction ,Filter (signal processing) ,Speech processing ,Algorithm ,All-pass filter ,Mathematics - Abstract
The conventional theory of linear prediction (LP) is renewed and extended to form a more flexible algorithm called generalized linear prediction (GLP). There are three new levels of generalization available. On the first level (I) the predictor FIR is replaced with a generalized FIR constructed out of allpass sections having complex coefficients. On the second level (II) the allpass filters have distributed coefficients, i.e., they are unequal, and on the third and the most general level (III) the filter sections may have different characteristics. The theory of GLP is presented and the algorithm is tested with speech signals. The results show that GLP works as desired: nonuniform frequency resolution can be achieved and the resolution is controlled by the choice of the allpass parameters. On level I, the angle of the pole-zero-pair of the allpass sections defines the highest resolution area while the radius of the pole controls the degree of the resolution improvement. The GLP prediction error decreases rapidly with the order of the predictor. Its normalized RMS value falls off exponentially and its spectral flatness improves efficiently. On the average the results are clearly better than those of conventional LP. Levels II and III are only briefly discussed.
- Published
- 2002
- Full Text
- View/download PDF
34. On Block-Recursive Logarithmic Filterbanks
- Author
-
Laine, Unto K.
- Abstract
Publication in the conference proceedings of EUSIPCO, Tampere, Finland, 2000
- Published
- 2000
- Full Text
- View/download PDF
35. Modal synthesis and modeling of vowels
- Author
-
Unto K. Laine
- Published
- 1999
- Full Text
- View/download PDF
36. Block-recursive, multirate filterbanks with arbitrary time-frequency plane tiling
- Author
-
Unto K. Laine
- Subjects
Approximation theory ,Mathematical optimization ,Channel (digital image) ,Approximation error ,Plane (geometry) ,Pole–zero plot ,Algorithm ,Transfer function ,Block (data storage) ,Mathematics ,Time–frequency analysis - Abstract
A new method to realize arbitrary time-frequency plane tilings together with critical sampling in block-recursive filterbanks is presented. The method leads to pole-zero approximation of the target channel transfer functions. Perfect reconstruction within the limits of the approximation error can be achieved.
- Published
- 1999
- Full Text
- View/download PDF
37. On the utilization of overshoot effects in low-delay audio coding
- Author
-
Matti Karjalainen, Unto K. Laine, and Aki Härmä
- Subjects
Spectral envelope ,Audio codec ,Computer science ,Speech recognition ,Speech coding ,Codec ,Active listening ,Linear prediction ,Psychoacoustics ,Sub-band coding ,Coding (social sciences) - Abstract
In low-delay audio coding (coding delay
- Published
- 1999
- Full Text
- View/download PDF
38. Analysis of Pitch-Synchronous Modulation Effects by using Analytic Filters
- Author
-
Laine, Unto K.
- Abstract
Publication in the conference proceedings of EUSIPCO, Rhodes, Greece, 1998
- Published
- 1998
- Full Text
- View/download PDF
39. Backward Adaptive Warped Lattice for Wideband Stereo Coding
- Author
-
Harma, Aki, Laine, Unto K., and Karjalainen, Matti
- Abstract
Publication in the conference proceedings of EUSIPCO, Rhodes, Greece, 1998
- Published
- 1998
- Full Text
- View/download PDF
40. Warped Linear Predictive Audio Coding in Video Conferencing Application
- Author
-
Palomaki, Kalle, Harma, Aki, and Laine, Unto K.
- Abstract
Publication in the conference proceedings of EUSIPCO, Rhodes, Greece, 1998
- Published
- 1998
- Full Text
- View/download PDF
41. Critically sampled PR filterbanks of nonuniform resolution based on block recursive FAMlet transform
- Author
-
Unto K. Laine
- Published
- 1997
- Full Text
- View/download PDF
42. Speech analysis using complex orthogonal auditory transform (COAT)
- Author
-
Unto K. Laine
- Published
- 1992
- Full Text
- View/download PDF
43. Time-frequency And Multiple-resolution Representations In Auditory Modeling
- Author
-
Toomas Altosaar, Unto K. Laine, and Matti Karjalainen
- Subjects
Computer science ,Frequency domain ,Speech recognition ,Emphasis (telecommunications) ,Resolution (electron density) ,Baseband ,Spectrogram ,Filter bank ,Frequency modulation ,Time–frequency analysis - Published
- 1991
- Full Text
- View/download PDF
44. A model for real-time sound synthesis of guitar on a floating-point signal processor
- Author
-
Matti Karjalainen and Unto K. Laine
- Subjects
Digital signal processor, Floating point, Finite impulse response, Computer science, Speech recognition, Acoustics, String (computer science), Speech synthesis, Resonator, Distortion (music), Electronic music, Distortion, Guitar, Vocal tract, Interpolation - Abstract
Algorithms that can be used to synthesize guitar sounds on a floating-point signal processor are presented. A finite impulse response (FIR) Lagrange interpolator is introduced to implement the efficient and precise fractional delay approximation that is needed to achieve arbitrary and varying-length strings. This kind of interpolation is especially good in avoiding distortion and undesirable extra effects when the string length is changing continuously during the synthesis of a sound. The interpolator can also be used in other cases, e.g. in transmission-line modeling of acoustic tube resonators in wind instruments and for vocal tract models in speech synthesis. In addition to the interpolation principle, the implementation of the guitar string model on the TMS320C30 floating-point signal processor is described.
- Published
- 1991
- Full Text
- View/download PDF
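The abstract above combines a delay-line string loop with an FIR Lagrange interpolator that supplies the fractional part of the loop delay. A toy single-delay-line (Karplus-Strong-style) sketch of that combination; the loop-loss filter, excitation and the two-directional waveguide details of the paper are simplified away.

```python
import numpy as np

def lagrange_fir(delay, order=3):
    """Lagrange-interpolation FIR approximating a fractional delay."""
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

def pluck(f0, fs=44100, dur=1.0, loss=0.996, frac_order=3):
    """Plucked-string tone from a delay-line loop tuned with a fractional delay."""
    loop_len = fs / f0                       # desired loop delay in samples
    read_delay = loop_len - 0.5              # the averaging loop filter adds ~0.5 samples
    n_int = int(np.floor(read_delay)) - 1    # keep the fractional part inside [1, 2),
    h = lagrange_fir(read_delay - n_int, frac_order)  # a good range for 3rd-order Lagrange
    buflen = n_int + frac_order + 2
    buf = np.random.uniform(-1, 1, buflen)   # noise burst = pluck excitation
    out = np.zeros(int(dur * fs))
    write, prev = 0, 0.0
    for i in range(len(out)):
        idx = (write - n_int - np.arange(frac_order + 1)) % buflen
        delayed = np.dot(h, buf[idx])        # interpolated read `read_delay` samples back
        new = loss * 0.5 * (delayed + prev)  # averaging loss filter
        prev = delayed
        out[i] = new
        buf[write] = new
        write = (write + 1) % buflen
    return out

tone = pluck(196.0)   # roughly the pitch of the open G string
```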
45. Aspects In Modeling And Real-time Synthesis Of The Acoustic Guitar
- Author
-
Unto K. Laine, Vesa Välimäki, and Matti Karjalainen
- Subjects
Engineering, Signal processing, String (computer science), Filter (signal processing), Transfer function, Line (geometry), Electronic engineering, Guitar, Algorithm, Digital filter, All-pass filter - Abstract
This paper will address the problem of modeling the acoustic guitar for real-time synthesis on signal processors. We will present a scheme for modeling the string for high-quality sound synthesis when the length of the string is changing dynamically. We will also focus on the problem of modeling the body of the guitar for real-time synthesis. Filter-based approaches were experimented with: LPC estimation, IIR-filter synthesis and FIR-filter approximation. Perceptual evaluation was used and taken into account. Real-time synthesis was implemented on the TMS320C30 floating-point signal processor. The presentation includes audio examples. Introduction: Computational modeling of musical instruments is an alternative to commonly used and more straightforward sound synthesis techniques like FM synthesis and waveform sampling. The traditional approach to efficient modeling of a vibrating string has been to use proper digital filters or transmission lines, see e.g. Karplus and Strong [1] and its extensions by Jaffe and Smith [2]. These represent "semiphysical" modeling where only some of the most fundamental features of the string, especially the transmission-line property, are retained to achieve efficient computation. More complete finite element models and other kinds of physical modeling may lead to very realistic sounds but tend to be computationally too expensive for real-time purposes. Modeling of the guitar body for real-time sound synthesis seems too difficult unless a digital filter approach to approximate the transfer function is used. The derivation of the detailed transfer function from mechanical and acoustical parameters seems impossible. The remaining choice is to estimate the transfer function filter from measurements of a real guitar or to design a filter that approximates the general properties of the real guitar body. In addition to strings and body, the interactions between them (at least between the strings) should be included. String modeling: The natural way of modeling a guitar string is to describe it as a two-directional transmission or delay line (see Fig. 1a) where the vibrational waves travel in both directions, reflecting at both ends. If all losses and other nonidealities are reduced to the reflection filters at the end points, the computation of the ideal string is efficient using two delay lines. The next problem is how to approximate the fractional part of the delay to achieve any (non-integer) length of the delay line. Allpass filters [2] are considered a good solution if the string length is fixed. If the length is dynamically varying, however, it is very difficult to avoid transients and glitches when the integer part of the delay line must change its length.
- Published
- 1991
- Full Text
- View/download PDF
46. A comparison of EGG and a new automatic inverse filtering method in phonation change from breathy to normal
- Author
-
Paavo Alku, Erkki Vilkman, and Unto K. Laine
- Published
- 1990
- Full Text
- View/download PDF
47. Speech Synthesizer in the Finnish Language
- Author
-
T. Rahko, Matti Karjalainen, and Unto K. Laine
- Subjects
Finnish language, Linguistics and Language, Computers, Computer science, Speech recognition, Speech synthesis, Self-Help Devices, LPN and LVN, Language and Linguistics, Communication Aids for Disabled, Speech and Hearing, Humans, Artificial intelligence, Finland, Natural language processing, Language - Published
- 1980
- Full Text
- View/download PDF
48. Higher pole correction in vocal tract models and terminal analogs
- Author
-
Unto K. Laine
- Subjects
Linguistics and Language, Polynomial, Speech production, Computer science, Communication, Speech recognition, Function (mathematics), Transfer function, Language and Linguistics, Computer Science Applications, Terminal (electronics), Transmission line, Modeling and Simulation, Computer Vision and Pattern Recognition, Algorithm, Software, Vocal tract, Variable (mathematics) - Abstract
The Higher Pole Correction (HPC) function in analog and digital all-pole modelling of speech production is analyzed by comparing all-pole models with a Transmission Line (TL) model. The validity of the TL model, which was chosen as a computational reference system in the study, is tested by comparing its transfer functions to acoustical measurements made on a physical vocal tract model. The variation of effective length of the vocal tract turned out to be an important parameter in modelling the HPC. Even if the frequency responses of the HPC in analog and digital cases differ, the relative changes in the correction, influenced by the variations in the effective length of the vocal tract, are exactly the same in both cases. Therefore digital realizations should have a variable HPC also. A polynomial analysis of the vocal tract transfer function was done to obtain new practical models for the HPC. The work results in all-zero models, which can be used in analog as well as digital all-pole realizations to form a new type of pole-zero model for speech production. This new pole-zero model is related to the PARCAS terminal analog model [Laine, 1982].
- Published
- 1988
- Full Text
- View/download PDF
49. Speech audiometry by a speech synthesizer
- Author
-
Unto K. Laine, Matti Karjalainen, T. Rahko, and S. Lavonen
- Subjects
Adult ,Male ,media_common.quotation_subject ,Speech recognition ,Speech synthesis ,Deafness ,computer.software_genre ,Data processing system ,Presentation ,Audiometry ,Preliminary report ,Humans ,Medicine ,Correction of Hearing Impairment ,Hearing Disorders ,media_common ,business.industry ,Communication ,Auditory Threshold ,General Medicine ,Test (assessment) ,Otosclerosis ,Otorhinolaryngology ,Speech Perception ,Head and neck surgery ,Speech audiometry ,Female ,business ,computer - Abstract
A preliminary report on speech test results with a portable, text-to-speech synthesizer is presented. The differentiation scores achieved at a speed of 80 words/min vary. So far the best mean differentiation scores in normal material are 75%. Increasing the presentation level improves the differentiation score, as do decreasing the word speed and training. The future and present uses of this system are discussed. These include: devices for the handicapped, e.g. to produce speech for the mute, man-machine communication through speech in industry control, data processing systems and uses in audiological diagnostics. The study is continued.
- Published
- 1979
- Full Text
- View/download PDF
50. Contents, Vol. 32, 1980
- Author
-
Piotr Świdziński, H.-G. Streubel, E. Schleier, P.J. Bradley, L. Hoover, Sanjeeva Murthy, M.D. Buffalo, J G Heidelbach, John K. Torgerson, H. Schwickardi, Alvirda Farmer, James C. Montague, Unto K. Laine, B. Rydzewski, P.M. Stell, T. Rahko, Antoni Pruszewicz, Penny L. Carter, M Flach, Matti Karjalainen, Daniel E. Martin, and Andrzej Obrębowski
- Subjects
Speech and Hearing ,Linguistics and Language ,LPN and LVN ,Language and Linguistics - Published
- 1980
- Full Text
- View/download PDF