54 results for "Laine, Unto K."
Search Results
2. Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies
- Author
-
Unto K. Laine, Dept. of Signal Processing and Acoustics, Aalto University
- Subjects
Computer science, Speech recognition, Feature extraction, speech analysis, Pattern recognition, Filter bank, Perception, time-frequency methods, pitch-synchronous analysis, Artificial intelligence - Abstract
A speech signal consists of events in time and frequency, and therefore its analysis with high-resolution time-frequency tools is often of importance. An analytic filter bank provides a simple, fast, and flexible method to construct time-frequency representations of signals. Its parameters can easily be adapted to different situations, from a uniform to any auditory frequency scale, or even to a focused resolution. Since the Hilbert magnitude values of the channels are obtained at every sample, it provides a practical tool for high-resolution time-frequency analysis. The present study describes the basic theory of analytic filters and tests their main properties. Applications of the analytic filter bank to different speech analysis tasks, including pitch period estimation and pitch-synchronous analysis of formant frequencies and bandwidths, are demonstrated. In addition, a new feature vector called the group delay vector is introduced. It is shown that this representation provides comparable or even better results than those obtained by spectral magnitude feature vectors in the analysis and classification of vowels. The implications of this observation are also discussed from the speech perception point of view.
- Published
- 2017
- Full Text
- View/download PDF
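A minimal sketch of the analytic filter bank idea in the abstract above: each channel is an analytic (complex) band-pass filter, so a Hilbert magnitude is available at every sample. The sketch assumes ordinary FIR band-pass prototypes made analytic with scipy.signal.hilbert and a uniform channel grid; the auditory/focused frequency scales and the group delay vector of the published method are not reproduced.

```python
import numpy as np
from scipy.signal import firwin, hilbert, fftconvolve

def analytic_filter_bank(x, fs, centers_hz, bw_hz=200.0, numtaps=511):
    """Per-sample Hilbert magnitudes of each band-pass channel (a sketch)."""
    envelopes = []
    for fc in centers_hz:
        lo = max(fc - bw_hz / 2, 1.0)
        hi = min(fc + bw_hz / 2, fs / 2 - 1.0)
        h = firwin(numtaps, [lo, hi], pass_zero=False, fs=fs)  # real band-pass FIR
        h_analytic = hilbert(h)                                # analytic impulse response
        y = fftconvolve(x, h_analytic, mode="same")            # complex channel output
        envelopes.append(np.abs(y))                            # Hilbert magnitude, every sample
    return np.vstack(envelopes)                                # (channels, samples)

# Example: 100 uniformly spaced channels on a 440 Hz test tone
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
env = analytic_filter_bank(x, fs, centers_hz=np.linspace(100, 7000, 100))
```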
3. Classification of audio events using permutation transformation
- Author
-
Seppo Fagerlund and Unto K. Laine
- Subjects
Structure (mathematical logic), Audio mining, Permutation (music), Acoustics and Ultrasonics, Speech recognition, Feature extraction, Pattern recognition, Domain (software engineering), Transformation (function), Frequency domain, Pattern recognition (psychology), Artificial intelligence, Mathematics - Abstract
Automatic detection and classification of short and nonstationary events in noisy signals is widely considered to be a difficult task for traditional frequency domain and even time–frequency domain approaches. A novel method for audio signal classification is introduced. It is based on statistical properties of the temporal fine structure of audio events. Artificially generated random signals and unvoiced stop consonants of speech are used to evaluate the method. The results show improved recognition accuracy in comparison to traditional approaches.
- Published
- 2014
- Full Text
- View/download PDF
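The classifier above builds features from statistics of the temporal fine structure via a permutation transformation. A common concrete member of this family is the ordinal-pattern (permutation) histogram, sketched below as an assumption about the flavor of the transform; the exact transformation and classifier of the paper are not reproduced.

```python
import numpy as np
from itertools import permutations

def permutation_histogram(x, order=3, delay=1):
    """Normalized histogram of ordinal patterns of length `order` in signal x."""
    patterns = {p: i for i, p in enumerate(permutations(range(order)))}
    counts = np.zeros(len(patterns))
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        counts[patterns[tuple(np.argsort(window))]] += 1  # rank order of the window
    return counts / counts.sum()

# Two signals with different temporal fine structure give different histograms
rng = np.random.default_rng(0)
noise_feature = permutation_histogram(rng.standard_normal(2000))
chirp_feature = permutation_histogram(np.sin(0.02 * np.arange(2000) ** 1.3))
```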
4. Feedback and imitation by a caregiver guides a virtual infant to learn native phonemes and the skill of speech inversion
- Author
-
Okko Räsänen, Heikki Rasilo, and Unto K. Laine
- Subjects
Linguistics and Language, Speech acquisition, Caregiver feedback, First language, Speech recognition, Place of articulation, Language and Linguistics, Babbling, imitation, Speech inversion, Communication, Articulatory modeling, Phonetic learning, Language acquisition, Computer Science Applications, language acquisition, Modeling and Simulation, Computer Vision and Pattern Recognition, Mel-frequency cepstrum, Psychology, Software - Abstract
Despite large-scale research, development of robust machines for imitation and inversion of human speech into articulatory movements has remained an unsolved problem. We propose a set of principles that can partially explain real infants' speech acquisition processes and the emergence of imitation skills and demonstrate a simulation where a learning virtual infant (LeVI) learns to invert and imitate a virtual caregiver's speech. Based on recent findings in infants' language acquisition, LeVI learns the phonemes of its native language in a babbling phase, using only the caregiver's feedback as guidance, and learns to map the caregiver's acoustically differing speech onto its own articulation in a phase where LeVI is imitated by the caregiver with similar, but not exact, utterances. After the learning stage, LeVI is able to recognize vowels from the virtual caregiver's VCVC utterances perfectly and all 25 Finnish phonemes with an average accuracy of 88.42%. The place of articulation of consonants is recognized with an accuracy of 96.81%. LeVI is also able to imitate the caregiver's speech since the recognition occurs directly in the domain of articulatory programs for phonemes. The learned imitation ability (speech inversion) is strongly language dependent since it is based on the phonemic programs learned from the caregiver. The findings suggest that caregivers' feedback can act as an important signal in guiding infants' articulatory learning, and that the speech inversion problem can be effectively approached from the perspective of early speech acquisition.
- Published
- 2013
- Full Text
- View/download PDF
5. A method for noise-robust context-aware pattern discovery and recognition from categorical sequences
- Author
-
Unto K. Laine and Okko Räsänen
- Subjects
Computer science, Speech recognition, Context (language use), Pattern recognition, Task (project management), Set (abstract data type), Artificial Intelligence, Signal Processing, Pattern recognition (psychology), Feature (machine learning), Computer Vision and Pattern Recognition, Artificial intelligence, Noise (video), Spatial analysis, Categorical variable, Software - Abstract
An efficient method for weakly supervised pattern discovery and recognition from discrete categorical sequences is introduced. The method utilizes two parallel sources of data: categorical sequences carrying some temporal or spatial information and a set of labeled, but not exactly aligned, contextual events related to the sequences. From these inputs the method builds associative models able to describe systematically co-occurring structures in the input streams. The learned models, based on transitional probabilities of events observed at several different time lags, inherently segment and classify novel sequences into contextual categories. Learning and recognition processes are purely incremental and computationally cheap, making the approach suitable for on-line learning tasks. The capabilities of the algorithm are demonstrated in a keyword learning task from continuous infant-directed speech and a continuous speech recognition task operating at varying noise levels.
- Published
- 2012
- Full Text
- View/download PDF
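A minimal sketch of the associative-model idea in the abstract above: transition counts of categorical symbols are accumulated at several lags, separately for each contextual label, and a novel sequence is scored against each context. The lags, Laplace smoothing and log-evidence scoring below are assumptions of this sketch, not the published formulation.

```python
import numpy as np

class LagTransitionModel:
    """Per-context transition statistics P(a[t-lag] -> a[t]) at several lags."""

    def __init__(self, n_symbols, n_contexts, lags=(1, 2, 3, 5, 8)):
        self.lags = lags
        # counts[context, lag_index] is an (n_symbols x n_symbols) matrix (Laplace-initialized)
        self.counts = np.ones((n_contexts, len(lags), n_symbols, n_symbols))

    def train(self, seq, context):
        """Incrementally accumulate transitions of one labeled sequence."""
        for k, lag in enumerate(self.lags):
            for t in range(lag, len(seq)):
                self.counts[context, k, seq[t - lag], seq[t]] += 1

    def score(self, seq):
        """Log-evidence for each context at every position of a novel sequence."""
        probs = self.counts / self.counts.sum(axis=3, keepdims=True)
        scores = np.zeros((len(seq), probs.shape[0]))
        for k, lag in enumerate(self.lags):
            for t in range(lag, len(seq)):
                scores[t] += np.log(probs[:, k, seq[t - lag], seq[t]])
        return scores  # argmax over contexts segments/classifies the sequence

# Toy usage with 8 VQ symbols and 2 contexts (e.g. "keyword" vs. "background")
model = LagTransitionModel(n_symbols=8, n_contexts=2)
model.train([0, 1, 2, 3, 2, 1, 0, 1, 2], context=0)
model.train([7, 6, 7, 5, 6, 7, 6, 5, 7], context=1)
evidence = model.score([0, 1, 2, 3, 2, 1])
```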
6. New parametric representations of bird sounds for automatic classification
- Author
-
Unto K. Laine and Seppo Fagerlund
- Subjects
Set (abstract data type), Identification (information), Computer science, Speech recognition, Pattern recognition, Artificial intelligence, Bird vocalization, Representation (mathematics), Focus (optics), Parametric statistics - Abstract
Identification of bird species based on their vocalization is studied in this paper. The main focus is on introducing a new parametric representation of bird sounds for automatic identification of their species. The method is based on the statistics of local temporal patterns in bird vocalization. Two different sets of bird species are used in the classification tests. The first set contains six species that often produce inharmonic sounds. For the second set, four species that produce very different types of sounds were added. Recognition results using a k-NN classifier show improved accuracy over the results obtained with MFCC features.
- Published
- 2014
- Full Text
- View/download PDF
7. A comparison of warped and conventional linear predictive coding
- Author
-
Aki Härmä and Unto K. Laine
- Subjects
Signal processing ,Audio signal ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,Speech coding ,Linear prediction ,Linear predictive coding ,Wideband audio ,Computer Vision and Pattern Recognition ,Electrical and Electronic Engineering ,Wideband ,Software ,Coding (social sciences) - Abstract
Frequency-warped signal processing techniques are attractive to many wideband speech and audio applications since they have a clear connection to the frequency resolution of human hearing. A warped version of linear predictive coding (LPC) is studied. The performance of conventional and warped LPC algorithms is compared in a simulated coding system using listening tests and conventional technical measures. The results indicate that the use of warped techniques is especially beneficial in wideband coding and may result in savings of one bit per sample compared to the conventional algorithm while retaining the same subjective quality.
- Published
- 2001
- Full Text
- View/download PDF
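Warped LPC replaces the unit delays of the predictor with first-order allpass sections so that prediction effectively operates on a warped (approximately Bark) frequency scale. A minimal sketch of one standard way to compute it, via a warped autocorrelation obtained by repeatedly allpass-filtering the frame; the coding system, quantization and listening tests of the paper are not included, and the warping coefficient below is only an approximate Bark value for 16 kHz material.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def warped_lpc(frame, order=12, lam=0.576):
    """Warped LPC coefficients of one frame (lam = 0 reduces to ordinary LPC)."""
    # Warped "delays": y_k = D(z)^k x with the allpass D(z) = (-lam + z^-1)/(1 - lam z^-1)
    y = frame.astype(float)
    delayed = [y]
    for _ in range(order):
        y = lfilter([-lam, 1.0], [1.0, -lam], y)
        delayed.append(y)
    # Warped autocorrelation r[k] = <x, D^k x>, then the usual normal equations
    r = np.array([np.dot(delayed[0], delayed[k]) for k in range(order + 1)])
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return a  # predictor coefficients on the warped frequency scale

# Example on a synthetic voiced-like frame
fs = 16000
t = np.arange(400) / fs
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 2400 * t)
coeffs = warped_lpc(frame * np.hamming(len(frame)))
```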
8. Automatic self-supervised learning of associations between speech and text
- Author
-
Okko Räsänen, Unto K. Laine, and Juha Knuuttila
- Subjects
Self supervised learning, Computer science, Speech recognition, Artificial intelligence, Natural language processing - Published
- 2013
- Full Text
- View/download PDF
9. Time-frequency integration characteristics of hearing are optimized for perception of speech-like acoustic patterns
- Author
-
Unto K. Laine and Okko Räsänen
- Subjects
Sound Spectrography, Acoustics and Ultrasonics, Computer science, Acoustics, Speech recognition, Signal, Speech Acoustics, Loudness, Pitch Discrimination, Critical band, Arts and Humanities (miscellaneous), Perception, Learning rule, Auditory system, Humans, Attention, Psychoacoustics, Time–frequency analysis, Time Perception, Speech Perception, Cues, Perceptual Masking - Abstract
Several psychoacoustic phenomena such as loudness perception, absolute thresholds of hearing, and perceptual grouping in time are affected by temporal integration of the signal in the auditory system. Similarly, the frequency resolution of the hearing system, often expressed in terms of critical bands, implies signal integration across neighboring frequencies. Although progress has been made in understanding the neurophysiological mechanisms behind these processes, the underlying reasons for the observed integration characteristics have remained poorly understood. The current work proposes that the temporal and spectral integration are a result of a system optimized for pattern detection from ecologically relevant acoustic inputs. This argument is supported by a simulation where the average time-frequency structure of speech that is derived from a large set of speech signals shows a good match to the time-frequency characteristics of the human auditory system. The results also suggest that the observed integration characteristics are learnable from acoustic inputs of the auditory environment using a Hebbian-like learning rule.
- Published
- 2013
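The study above argues that the observed integration characteristics could be learned from speech-like input with a Hebbian-like rule. A toy sketch of that claim using Oja's rule to extract one dominant spectro-temporal integration pattern from spectrogram patches; the speech corpus, auditory front-end and comparison with psychoacoustic data are not reproduced, and Oja's rule is this sketch's choice of Hebbian-like learning, not necessarily the paper's.

```python
import numpy as np

def oja_learn_kernel(spectrogram, patch_shape=(20, 8), lr=1e-3, n_iter=20000, seed=0):
    """Learn one spectro-temporal component with Oja's Hebbian rule.

    spectrogram : (freq_bins, frames), e.g. a log-magnitude STFT of speech
    patch_shape : (freq, time) extent of the integration window
    """
    rng = np.random.default_rng(seed)
    f, t = patch_shape
    w = rng.standard_normal(f * t)
    w /= np.linalg.norm(w)
    n_f, n_t = spectrogram.shape
    for _ in range(n_iter):
        i = rng.integers(0, n_f - f)
        j = rng.integers(0, n_t - t)
        x = spectrogram[i:i + f, j:j + t].ravel()
        x = x - x.mean()
        y = w @ x
        w += lr * y * (x - y * w)         # Oja's rule: Hebbian term with built-in normalization
    return w.reshape(patch_shape)         # learned time-frequency integration pattern

# Toy usage on random input (replace with a real speech spectrogram)
rng = np.random.default_rng(1)
kernel = oja_learn_kernel(rng.standard_normal((128, 500)))
```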
10. Attention based temporal filtering of sensory signals for data redundancy reduction
- Author
-
Okko Räsänen, Sofoklis Kakouros, and Unto K. Laine
- Subjects
Computer science, Feature extraction, Context (language use), Pattern recognition, Identification (information), Data redundancy, Pattern recognition (psychology), Artificial intelligence, Data mining, Data compression - Abstract
Since modern computational devices are required to store and process increasing amounts of data generated from various sources, efficient algorithms for identification of significant information in the data are becoming essential. Sensory recordings are one example where automatic and continuous storing and processing of large amounts of data is needed. Therefore, algorithms that can alleviate the computational load of the devices and reduce their storage requirements by removing uninformative data are important. In this work we propose a method for data reduction based on theories of human attention. The method detects temporally salient events based on the context in which they occur and retains only those sections of the input signal. The algorithm is tested as a pre-processing stage in a weakly supervised keyword learning experiment where it is shown to significantly improve the quality of the codebooks used in the pattern discovery process.
- Published
- 2013
- Full Text
- View/download PDF
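The method above keeps only temporally salient sections of the input, judged against the context in which they occur. A crude sketch of that idea, scoring each frame by its distance from the running mean of the preceding frames and keeping the most novel ones; the actual attention model and the keyword-learning pipeline of the paper are not reproduced.

```python
import numpy as np

def salient_frames(features, context_len=50, keep_quantile=0.7):
    """Boolean mask of temporally salient frames.

    features : (frames, dims) matrix, e.g. MFCC frames.
    A frame counts as salient if it deviates strongly from the running mean
    of the preceding `context_len` frames (a stand-in for a context model).
    """
    n = len(features)
    novelty = np.zeros(n)
    for t in range(1, n):
        ctx = features[max(0, t - context_len):t]
        novelty[t] = np.linalg.norm(features[t] - ctx.mean(axis=0))
    threshold = np.quantile(novelty, keep_quantile)
    return novelty >= threshold   # retain only these frames downstream

# Toy usage: mostly stationary frames with a short "event" in the middle
rng = np.random.default_rng(0)
feats = rng.standard_normal((1000, 13)) * 0.1
feats[500:520] += 3.0
mask = salient_frames(feats)
```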
11. Splitting the unit delay [FIR/all pass filters design]
- Author
-
Matti Karjalainen, Vesa Välimäki, Unto K. Laine, and Timo Laakso
- Subjects
Signal processing ,Digital delay line ,Finite impulse response ,Computer science ,Applied Mathematics ,Delay ,Signal Processing ,Electronic engineering ,Array processing ,Electrical and Electronic Engineering ,Digital filter ,All-pass filter ,Group delay and phase delay - Abstract
A fractional delay filter is a device for bandlimited interpolation between samples. It finds applications in numerous fields of signal processing, including communications, array processing, speech processing, and music technology. We present a comprehensive review of FIR and allpass filter design techniques for bandlimited approximation of a fractional digital delay. Emphasis is on simple and efficient methods that are well suited for fast coefficient update or continuous control of the delay value. Various new approaches are proposed and several examples are provided to illustrate the performance of the methods. We also discuss the implementation complexity of the algorithms. We focus on four applications where fractional delay filters are needed: synchronization of digital modems, incommensurate sampling rate conversion, high-resolution pitch prediction, and sound synthesis of musical instruments.
- Published
- 1996
- Full Text
- View/download PDF
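One of the simplest FIR designs reviewed in the tutorial above is Lagrange interpolation of the fractional delay; a compact sketch follows. The filter order and the way the integer and fractional parts of the delay are split are choices of this sketch.

```python
import numpy as np
from scipy.signal import lfilter

def lagrange_fd_fir(delay, order=3):
    """Lagrange-interpolation FIR approximating a fractional delay (in samples).

    h[n] = prod_{k != n} (delay - k) / (n - k),  n, k = 0..order
    Accuracy is best when `delay` lies near the middle of [0, order].
    """
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

# Delay a signal by 5.3 samples: 4 integer samples plus a third-order
# Lagrange FIR designed for the remaining 1.3 samples.
x = np.sin(2 * np.pi * 0.01 * np.arange(200))
h = lagrange_fd_fir(1.3, order=3)
y = lfilter(h, 1.0, np.concatenate([np.zeros(4), x]))
```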
12. Comparison of Classifiers in Audio and Acceleration Based Context Classification in Mobile Phones
- Author
-
Laine, Unto K., Leppänen, Jussi, Räsänen, Okko, and Saarinen, Jukka
- Abstract
Publication in the conference proceedings of EUSIPCO, Barcelona, Spain, 2011
- Published
- 2011
- Full Text
- View/download PDF
13. Stop consonant recognition by temporal fine structure of burst
- Author
-
Seppo Fagerlund and Unto K. Laine
- Subjects
Computer science ,Speech recognition ,Stop consonant ,Structure (category theory) - Published
- 2011
- Full Text
- View/download PDF
14. Aspect of the physiological sources of vocal vibrato: A study of fundamental-period-synchronous changes in electroglottographic signals obtained from one singer and two excised human larynges
- Author
-
Erkki Vilkman, Anne-Maria Laukkanen, and Unto K. Laine
- Subjects
Register (music), Vocal folds, Acoustics, Falsetto, Cricothyroid articulation, General Medicine, Phonation, Singing, Loudness, Mathematics, Vibrato - Abstract
The following experiment was carried out in order to see whether it is possible to get information about the physiological mechanisms of fundamental frequency variation in the vibrato of singing voice by investigating the fundamental-period-synchronous changes in electroglottographic signals. Electroglottograms were taken from one trained female amateur singer while singing /a:/ at comfortable pitch and loudness level in the chest register mode a) with habitual vibrato, b) with exaggerated vibrato and c) while being abruptly pushed on the abdominal wall. For a comparison, EGG-signals were taken from samples produced with two excised human larynges; in those samples fundamental frequency was changed either by varying subglottic pressure or introducing longitudinal stretch on the vocal folds by manual rotation of the cricothyroid articulation. The results suggest that only if phonation shifts from modal to falsetto register changes in amplitude, SQ and QOQ of the EGG-signal differ for laryngeally produced a...
- Published
- 1992
- Full Text
- View/download PDF
15. Self-learning vector quantization for pattern discovery from speech
- Author
-
Unto K. Laine, Toomas Altosaar, and Okko Räsänen
- Subjects
Linde–Buzo–Gray algorithm, Learning vector quantization, Computer science, Speech recognition, Vector quantization, k-means clustering, Pattern recognition, Canopy clustering algorithm, Artificial intelligence, Time series, Cluster analysis - Abstract
A novel and computationally straightforward clustering algorithm was developed for vector quantization (VQ) of speech signals for a task of unsupervised pattern discovery (PD) from speech. The algorithm works in purely incremental mode, is computationally extremely feasible, and achieves comparable classification quality with the well-known k-means algorithm in the PD task. In addition to presenting the algorithm, general findings regarding the relationship between the amounts of training material, convergence of the clustering algorithm, and the ultimate quality of VQ codebooks are discussed. Index Terms: speech recognition, pattern discovery, time series analysis, vector quantization, data clustering
- Published
- 2009
- Full Text
- View/download PDF
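The abstract above describes a purely incremental clustering scheme for building VQ codebooks. A minimal sketch of one common incremental flavor (create a new code vector when nothing is within a distance threshold, otherwise update the nearest one as a running mean); this is an assumption about the algorithm family, not the published method.

```python
import numpy as np

class IncrementalVQ:
    """Grow a VQ codebook online: add a code vector when no existing one is
    close enough, otherwise move the nearest code vector toward the sample."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.codebook = []   # list of code vectors
        self.counts = []     # number of samples assigned to each code vector

    def update(self, x):
        x = np.asarray(x, dtype=float)
        if not self.codebook:
            self.codebook.append(x.copy())
            self.counts.append(1)
            return 0
        d = [np.linalg.norm(x - c) for c in self.codebook]
        j = int(np.argmin(d))
        if d[j] > self.threshold:
            self.codebook.append(x.copy())
            self.counts.append(1)
            return len(self.codebook) - 1
        self.counts[j] += 1
        self.codebook[j] += (x - self.codebook[j]) / self.counts[j]  # running mean
        return j

# Quantize a stream of 2-D feature vectors into symbol indices
rng = np.random.default_rng(0)
vq = IncrementalVQ(threshold=0.8)
symbols = [vq.update(v) for v in rng.standard_normal((500, 2))]
```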
16. A noise robust method for pattern discovery in quantized time series: the concept matrix approach
- Author
-
Okko Räsänen, Toomas Altosaar, and Unto K. Laine
- Subjects
Set (abstract data type), Task (computing), Matrix (mathematics), Discrete time and continuous time, Series (mathematics), Computer science, Pattern recognition, Artificial intelligence, Noise (video) - Abstract
An efficient method for pattern discovery from discrete time series is introduced in this paper. The method utilizes two parallel streams of data: a discrete unit time series and a set of labeled events. From these inputs it builds associative models between systematically co-occurring structures existing in both streams. The models are based on transitional probabilities of events at several different time scales. Learning and recognition processes are incremental, making the approach suitable for online learning tasks. The capabilities of the algorithm are demonstrated in a continuous speech recognition task operating at varying noise levels.
- Published
- 2009
- Full Text
- View/download PDF
17. Unsupervised segmentation of continuous speech using vector autoregressive time-frequency modeling errors
- Author
-
Petri Korhonen and Unto K. Laine
- Subjects
Vocabulary, Speech production, Computer science, Speech recognition, Speech technology, Speech synthesis, Pattern recognition, Speech segmentation, Autoregressive model, Phone, Segmentation, Artificial intelligence, Utterance - Abstract
A vector autoregressive (VAR) model is used in the auditory time-frequency domain to predict spectral changes. Forward and backward prediction errors increase at the phone boundaries. These error signals are then used to study and detect the boundaries of the largest changes, allowing the most reliable automatic segmentation. Using a fully unsupervised method yields segments consisting of a variable number of phones. The quality of performance of this method was tested with a set of 150 Finnish sentences pronounced by one female and two male speakers. The performance for English was tested using the TIMIT core test set. The boundaries between stops and vowels, in particular, are detected with high probability and precision.
1. Introduction. Many subfields of speech technology need robust methods for automatic phonetic speech segmentation. Preferably these methods would be fully speaker and language independent. They should perform segmentation without any prior information about the speaker or the utterance in question. These methods should not apply any type of prior learning, and they should be able to process unknown utterances in a fully unsupervised manner. This paper describes a preliminary test of a novel method for automatic speech segmentation, which fulfills the hard demands mentioned to a certain degree. Segmentation methods described in the literature can be classified into explicit and implicit methods. They also vary in terms of segmentation units (e.g. phonemes, syllables, words). In explicit methods, the underlying phoneme sequence is known prior to the segmentation. These methods are used in speech synthesis, for example. Implicit methods split the utterance into smaller units without using any information about the underlying phoneme sequence. These methods are based on analyzing the acoustic properties of the signal and detecting either spectrally stable parts or rapid variations of the signal. An example of a method based on locating spectrally stable parts is in [1], where the correlation between parameters computed from nearby frames has been used as a measure of stability. In [2], segment boundaries are implicitly detected by comparing the means of frames around potential boundaries using a "jump function." In [3], the variation of the short-term energy function is used as a measure to produce syllable-like units using minimum-phase group delay functions. In the case of continuous speech, the signal cannot be strictly divided into stable and varying parts which would correspond one-to-one with phones and segment boundaries. No phone in continuous speech produces steady spectra; instead, within a phone there are always slow spectral movements which are, to some degree, possible to predict. The method proposed in this paper does not detect these slow spectral variations, but rather is based on detecting unpredicted changes in the auditory time-frequency picture of speech at phone boundaries. These unpredicted changes happen most often when moving from one phoneme class to another. A change in the speech production mechanism changes the acoustic signal in an unpredictable manner. Knowing that not all transitions produce a large or rapid spectral change, a question of this study is which kinds of phone boundaries allow the most reliable and robust detection by the method. When facing speaker-independent, unlimited-vocabulary (e.g. inflectional languages) continuous speech recognition, the words have to be split into smaller units such as morphemes; hence, not every phone boundary needs to be detected. Segments similar to syllables or morphemes, consisting of one to many phones, apply as well, as long as the total number of different segments is not too high for modeling purposes. The novel method presented in this paper produces segments consisting of phone clusters of different lengths. The core idea is to model the spectral variation by using a vector autoregressive (VAR) model. The model performs forward and backward predictions in the auditory time-frequency domain with associated prediction errors. The segment boundary candidates are found based on these error signals.
- Published
- 2005
- Full Text
- View/download PDF
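The segmentation method above fits a vector autoregressive model to auditory spectral frames and treats peaks of the prediction error as boundary candidates. A compact sketch using an ordinary least-squares VAR fit and a naive peak picker; the auditory front-end, the backward predictor and the evaluation protocol are left out.

```python
import numpy as np

def var_prediction_error(frames, order=2):
    """Forward prediction error of a VAR(order) model fitted to all frames.

    frames : (T, bands) sequence of spectral frames.
    """
    T, d = frames.shape
    X = np.hstack([frames[order - k - 1:T - k - 1] for k in range(order)])  # lagged frames
    Y = frames[order:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)        # least-squares VAR coefficients
    residual = Y - X @ A
    err = np.zeros(T)
    err[order:] = np.linalg.norm(residual, axis=1)   # per-frame prediction error
    return err

def boundary_candidates(err, min_gap=5):
    """Local maxima of the error curve, at least `min_gap` frames apart."""
    peaks = [t for t in range(1, len(err) - 1)
             if err[t] >= err[t - 1] and err[t] > err[t + 1]]
    selected = []
    for t in sorted(peaks, key=lambda t: -err[t]):
        if all(abs(t - s) >= min_gap for s in selected):
            selected.append(t)
    return sorted(selected)

# Toy usage: random "spectral" frames with an abrupt change at frame 60
rng = np.random.default_rng(0)
frames = np.vstack([rng.standard_normal((60, 20)), 3 + rng.standard_normal((40, 20))])
bounds = boundary_candidates(var_prediction_error(frames))
```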
18. Measurements on the effects of glottal opening and flow on the glottal impedance
- Author
-
M. Karjalainen and Unto K. Laine
- Subjects
Inductance ,Resonator ,Materials science ,Turbulence ,Acoustics ,Flow (psychology) ,Resonance ,Tube (container) ,Electrical impedance ,Body orifice - Abstract
A new method to measure the acoustical impedance of an artificial glottal orifice is presented. The plate with the orifice is mounted at one end of a tube resonator, with the other end being open. The impedance of the orifice seen from the tract can be solved from the resonance frequencies and bandwidths. The frequency characteristic of the orifice is easily obtained under different DC-flow conditions. The results show that under turbulent flow the effective glottal inductance is clearly only a fraction of its flowless value. The measured glottal resistance is close to the theoretical value given by Flanagan's two-mass model.
- Published
- 2005
- Full Text
- View/download PDF
19. Aids for the handicapped based on 'Synte 2' speech synthesizer
- Author
-
J. Wood, Unto K. Laine, Matti Karjalainen, R. Toivonen, K. Haymond, and R. Folmar
- Subjects
Vocabulary, Presentation, Computer science, Speech recognition, Speech technology, Speech synthesis, Loudspeaker, Speech processing - Abstract
SYNTE 2 is a low-cost, high-quality, text-to-speech synthesizer designed for Finnish but applicable also to other languages if "phoneme writing" is used. Since its first presentation in 1977 it has been adapted to many communication aids for the handicapped. The first application was a portable speaking machine with unlimited vocabulary for the speech impaired. This paper describes the present applications of SYNTE 2, including the speaking machine, a talking data terminal for blind computer programmers, a system for automatic production of spoken information for the blind, etc.
- Published
- 2005
- Full Text
- View/download PDF
20. An all-zero model for higher pole correction
- Author
-
Unto K. Laine
- Subjects
Error analysis, Factorization of polynomials, Speech recognition, Attenuation, Bandwidth (signal processing), Effective length, Topology, Digital filter, Vocal tract, Mathematics - Abstract
The higher pole correction (HPC) function in relation to the all-pole modelling of the vocal tract transmission is analyzed. A set of theoretical HPC curves for vocal tracts of different effective lengths are calculated. An all-zero model for HPC is proposed. This model is derived by a polynomial factorization method. The zeroes have broad bandwidths (1.5-2 kHz) and are located periodically with a spacing depending on the effective length. Thus the design of HPC filters for variable effective lengths is trivially achieved. A detailed error analysis for these models is given. The use of the all-zero HPC filters leads to a new type of pole-zero model for vocal tract transmission which can be used either in analog domain or digital domain. Conventionally, for a fixed number of poles and sampling frequency, the length of the vocal tract is assumed to be fixed (i.e. no HPC). However, with the proposed pole-zero model, HPC for variable effective lengths is automatically ensured.
- Published
- 2005
- Full Text
- View/download PDF
21. PARCAS, A new terminal analog model for speech synthesis
- Author
-
Unto K. Laine
- Subjects
Compact space, Quality (physics), Formant, Vowel, Speech recognition, Speech synthesis, Speech processing, Transfer function, Vocal tract, Mathematics - Abstract
A new method to construct formant-type models for text-to-speech synthesis is described. The method consists of two phases: first, the idealized acoustic transfer function of the uniform vocal tract is factorized into two partial transfer functions, each including only every other formant of the original one. Second, the partial transfer functions are approximated with proper rational, meromorphic functions. The method leads to a PARallel-CAScade model called PARCAS. In a typical text-to-speech application the model needs only 6 resonators and 16 control parameters. The special features of the PARCAS model lie in its structural compactness and simplicity of control. With this specific structure the formant amplitudes in vowel sounds can be put close to the right levels by controlling the formant frequencies only. The same compact filter system can be used in the synthesis of all sounds including fricatives, nasals, transients and bursts. Also the mixed-type excitation for voiced fricatives can easily be obtained. In informal listening, the synthesized speech was found to be of high quality.
- Published
- 2005
- Full Text
- View/download PDF
22. Modelling of LIP radiation impedance in Z-domain
- Author
-
Unto K. Laine
- Subjects
Radiation impedance ,Mean squared error ,Sampling (signal processing) ,Speech recognition ,Mathematical analysis ,Trigonometric functions ,Sine ,Acoustic impedance ,Electrical impedance ,Omega ,Mathematics - Abstract
Three z-domain models for lip radiation impedance are introduced. Two of them are based on the observation that the normalized acoustic impedance can be modelled as z(ω) = C[1 − cos(ωT)] + jB·sin(ωT). The values of the parameters C and B for different sampling frequencies and radiation areas are found by minimizing the mean square error (MSE) between the modelled and the acoustic impedance. Owing to the sine and cosine functions used, the modelled impedance can be transformed exactly and straightforwardly into the z-domain. The third model described is a pole-zero model, the parameters of which are also optimized by the MSE criterion. The limits for acceptable errors of the modelled impedances are studied and the models are compared to the well-known model of Flanagan.
- Published
- 2005
- Full Text
- View/download PDF
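The cosine/sine form quoted in the abstract above maps onto the z-domain by evaluating it on the unit circle. A short derivation, assuming z = e^{jωT}; this is one standard way of writing the result, not necessarily the exact form used in the paper.

```latex
% On the unit circle z = e^{j\omega T}:
%   \cos(\omega T) = \tfrac{1}{2}(z + z^{-1}), \qquad j\sin(\omega T) = \tfrac{1}{2}(z - z^{-1}).
% Substituting into z(\omega) = C[1 - \cos(\omega T)] + jB\sin(\omega T):
Z(z) = C\left[1 - \tfrac{1}{2}(z + z^{-1})\right] + \tfrac{B}{2}(z - z^{-1})
     = C - \frac{C - B}{2}\,z - \frac{C + B}{2}\,z^{-1}.
```

A causal realization follows by multiplying through by z^{-1}, at the cost of one extra sample of delay.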
23. Linear transforms and filterbanks based on vector ARMA models
- Author
-
Unto K. Laine
- Subjects
Linear map ,Mathematical optimization ,Iterative method ,Wavelet transform ,Pole–zero plot ,Basis function ,Impulse (physics) ,Filter bank ,Residual ,Algorithm ,Mathematics - Abstract
Linear transformations, like wavelet transforms, and filterbanks of IIR-type and of arbitrary time-frequency plane tilings can be efficiently realized by vector ARMA models. The quality of the realization depends on how well the basis functions or impulse responses of the filterbank can be approximated by the actual VARMA based pole-zero model. The vector AR part gives an MSE-optimal block-recursive model for the target basis functions. The vector MA part is formed of the vector AR residual and further optimized by an iterative algorithm.
- Published
- 2003
- Full Text
- View/download PDF
24. A new glottal LPC method for voice coding and inverse filtering
- Author
-
Unto K. Laine and Paavo Alku
- Subjects
Speech production ,Voice activity detection ,Codec2 ,Computer science ,Speech recognition ,Speech coding ,Speech processing ,Linear predictive coding ,Vector sum excited linear prediction ,Harmonic Vector Excitation Coding ,Vocal tract - Abstract
A linear-predictive-coding (LPC) based method for computing glottal pulses is presented. The method is based on modeling the speech production mechanism with three digital filters. The glottal contribution to the speech spectrum is first estimated with two consecutive LPC analyses. After the glottal contribution has been eliminated, the vocal tract and radiation effects are modeled. Glottal waves close to the natural shape can be achieved with this straightforward method. Applications to speech coding are briefly described.
- Published
- 2003
- Full Text
- View/download PDF
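The abstract above outlines a pipeline in which the glottal contribution is first estimated with consecutive LPC analyses and removed by inverse filtering, after which the vocal tract and radiation are modeled. The sketch below follows that general iterative-inverse-filtering idea; the filter orders, the leaky-integrator radiation cancellation and the lack of pre-emphasis or iteration are assumptions of the sketch, not the published algorithm.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc(x, order):
    """Autocorrelation-method LPC; returns the inverse filter [1, -a1, ..., -a_order]."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def rough_glottal_estimate(frame, fs, tract_order=None, glottal_order=2):
    """Very rough glottal-flow estimate by two-stage LPC inverse filtering."""
    tract_order = tract_order or int(fs / 1000) + 2
    g = lpc(frame, glottal_order)                 # low order: mostly glottal spectral tilt
    no_glottis = lfilter(g, [1.0], frame)         # remove the tilt by inverse filtering
    v = lpc(no_glottis, tract_order)              # higher order: vocal tract resonances
    source = lfilter(v, [1.0], frame)             # inverse-filter the original frame
    return lfilter([1.0], [1.0, -0.99], source)   # cancel lip radiation by leaky integration

# Toy usage on a synthetic vowel-like frame
fs = 8000
t = np.arange(int(0.03 * fs)) / fs
excitation = np.sign(np.sin(2 * np.pi * 110 * t))
frame = lfilter([1.0], [1.0, -1.2, 0.8], excitation)
flow = rough_glottal_estimate(frame * np.hamming(len(frame)), fs)
```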
25. Famlet, to be or not to be a wavelet?
- Author
-
Unto K. Laine
- Subjects
Discrete wavelet transform ,symbols.namesake ,Wavelet ,Fourier transform ,Computer science ,Second-generation wavelet transform ,Mathematical analysis ,symbols ,Harmonic wavelet transform ,Fast wavelet transform ,Fractional Fourier transform ,Constant Q transform - Abstract
A class of orthogonal time domain functions called famlets is introduced. Famlets are produced from their frequency domain representatives by using the inverse Fourier transform. The basic theory behind famlets and the famlet transform associated with them is formulated, and their usage in nonuniform resolution spectrum analysis is described. The similarities and dissimilarities of famlets and wavelets are discussed.
- Published
- 2003
- Full Text
- View/download PDF
26. MSE filter design and spectrum parameterization by orthogonal FAM transform
- Author
-
Unto K. Laine
- Subjects
Filter design ,Finite impulse response ,Mean squared error ,Control theory ,Frequency domain ,Orthonormal basis ,Digital filter ,Algorithm ,Infinite impulse response ,Linear phase ,Mathematics - Abstract
An orthonormal set of the frequency amplitude modulated (FAM) class of functions is used to produce a frequency domain FAM transform suitable for infinite impulse response (IIR) filter design with a least squared error criterion. The orthonormal basis used is chosen to be effectively implemented by a set of identical second order allpass filters equipped with a common magnitude weighting. The allpass parameters (θ, r) are first adapted to the problem in order to get rapid convergence in the mean squared error (MSE). The results show that the new method produces IIR filters with the same or better (in the MSE sense) frequency domain response than finite impulse response (FIR) filters with 5 to 7 times higher order. The method allows optimization of the group delay properties. There are three different ways to approximate linear phase characteristics in the IIR filters produced.
- Published
- 2002
- Full Text
- View/download PDF
27. A study on auditory resolution using Bark-FAMlet clicks
- Author
-
M. Huotilainen and Unto K. Laine
- Subjects
Masking (art) ,Amplitude modulation ,Computer science ,Speech recognition ,Temporal resolution ,Resolution (electron density) ,Perceptual coding ,Phase (waves) ,Set (psychology) ,Frequency modulation - Abstract
Many areas in audio engineering, e.g., perceptual coding, strongly rely on knowledge and models of the human auditory system. Nonuniform frequency resolution and masking phenomenon are well studied aspects. Less attention has been paid to the temporal auditory resolution. In this pilot study the temporal resolution is tested by using short (0.55-4.15 ms) clicks called Bark-FAMlets. They form an orthogonal set of signals all having identical power spectra with uniform masking properties. Individual FAMlets differ only in phase by steps called an auditory unit delay (AUD). We found that 75% discrimination is achieved when the difference between two FAMlets is 22, 11 or 10 AUDs depending on the three test conditions used.
- Published
- 2002
- Full Text
- View/download PDF
28. Warped linear prediction (WLP) in speech and audio processing
- Author
-
Toomas Altosaar, Unto K. Laine, and Matti Karjalainen
- Subjects
Signal processing, Audio signal, Computer science, Speech recognition, Speech coding, Linear prediction, Filter bank, Speech processing, Orthonormal basis, Image warping, Audio signal processing - Abstract
A linear prediction process is applied to frequency warped signals. The warping is realized by using orthonormal FAM (frequency modulated complex exponentials) functions. The general formulation of WLP is given and effective realizations with allpass filters are studied. The application of auditory WLP to speech coding and speech recognition has given good results.
- Published
- 2002
- Full Text
- View/download PDF
29. An orthogonal set of frequency and amplitude modulated (FAM) functions for variable resolution signal analysis
- Author
-
Toomas Altosaar and Unto K. Laine
- Subjects
symbols.namesake ,Fourier transform ,Orthogonality ,Frequency domain ,Speech recognition ,symbols ,Trigonometric functions ,Orthogonal functions ,Filter bank ,Algorithm ,Frequency modulation ,Digital filter ,Mathematics - Abstract
A general formula for defining a wide class of orthogonal functions is given. The class is based on circular sine and cosine functions which are simultaneously frequency and amplitude modulated in such a way that they remain orthogonal. This is achieved with any choice of FM or AM function. The class, which is called FAM functions, offers a practical and flexible tool for signal processing. They have been used to produce nonuniform resolution auditory spectrograms. The achieved time-frequency resolution is of very high quality. The preliminary results show that they are approaching the theoretical limit given for the Δf–Δt product. The orthogonality of the FAM functions is proved, how a complex orthogonal auditory transform (OAT) can be realized by FAMs is described, and a method for constructing a complex orthogonal one-Bark filter bank for signal analysis and psychoacoustic experimentation is given.
- Published
- 2002
- Full Text
- View/download PDF
30. Warped filters and their audio applications
- Author
-
Jyri Huopaniemi, Unto K. Laine, Matti Karjalainen, and Aki Härmä
- Subjects
Audio signal ,Computer science ,Speech recognition ,Audio analyzer ,Speech coding ,Electronic engineering ,Audio signal flow ,Audio signal processing ,computer.software_genre ,computer ,Audio filter ,Sub-band coding ,Digital audio - Abstract
An inherent property of many DSP algorithms is that they tend to exhibit uniform frequency resolution from zero to the Nyquist frequency. This is a direct consequence of using unit delays as building blocks; a frequency independent delay implies uniform frequency resolution. In audio applications, however, this is often an undesirable feature since the response properties are typically specified and measured on a logarithmic scale, following the behavior of the human auditory system. We present an overview of warped filters and DSP techniques which can be designed to better match the audio and auditory criteria. Audio applications, including modeling of auditory and musical phenomena, equalization techniques, auralization, and audio coding, are presented.
- Published
- 2002
- Full Text
- View/download PDF
31. Realizable warped IIR filters and their properties
- Author
-
Matti Karjalainen, Aki Härmä, and Unto K. Laine
- Subjects
Finite impulse response, Robustness (computer science), Control theory, Prototype filter, Network synthesis filters, Topology, Digital filter, Infinite impulse response, Digital signal processing, All-pass filter, Mathematics - Abstract
Digital filters where unit delays are replaced with frequency dependent delays, such as first order allpass sections, are often called warped filters since they implement filter specifications on a warped non-uniform frequency scale. Warped IIR (WIIR) filters cannot be realized directly due to delay free loops. Specific solutions have been known that make WIIR filters realizable but no general approach has been available so far. In this paper we will explore the generation of such filters, including new filter structures. The robustness and computational efficiency of WIIR filters are studied and most potential applications are discussed.
- Published
- 2002
- Full Text
- View/download PDF
32. An experimental audio codec based on warped linear prediction of complex valued signals
- Author
-
Matti Karjalainen, Unto K. Laine, and Aki Härmä
- Subjects
Signal processing ,Computer science ,Microphone ,Speech recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Linear prediction ,Acoustic wave ,Linear predictive coding ,Speech processing ,Wideband audio ,Adaptive Multi-Rate audio codec ,Audio codec ,Codec ,Data compression - Abstract
Bark-scale warped linear prediction (WLP) is a promising core for a monophonic perceptual audio codec. In the current paper the WLP scheme is extended for processing complex valued signals (CWLP). Three different methods of converting a stereo signal to one complex valued signal are introduced. The philosophy behind the coding scheme is to integrate some aspects of modern wideband audio coding (e.g. perceptuality and stereo signal processing) into one computational element in order to find a more holistic and economic way of processing.
- Published
- 2002
- Full Text
- View/download PDF
33. Generalized linear prediction based on analytic signals
- Author
-
Unto K. Laine
- Subjects
Root mean square ,Finite impulse response ,Control theory ,Pole–zero plot ,Spectral flatness ,Linear prediction ,Filter (signal processing) ,Speech processing ,Algorithm ,All-pass filter ,Mathematics - Abstract
The conventional theory of linear prediction (LP) is renewed and extended to form a more flexible algorithm called generalized linear prediction (GLP). There are three new levels of generalization available. On the first level (I) the predictor FIR is replaced with a generalized FIR constructed out of allpass sections having complex coefficients. On the second level (II) the allpass filters have distributed coefficients, i.e., they are unequal, and on the third and the most general level (III) the filter sections may have different characteristics. The theory of GLP is presented and the algorithm is tested with speech signals. The results show that GLP works as desired: nonuniform frequency resolution can be achieved and the resolution is controlled by the choice of the allpass parameters. On level I, the angle of the pole-zero-pair of the allpass sections defines the highest resolution area while the radius of the pole controls the degree of the resolution improvement. The GLP prediction error decreases rapidly with the order of the predictor. Its normalized RMS value falls off exponentially and its spectral flatness improves efficiently. On the average the results are clearly better than those of conventional LP. Levels II and III are only briefly discussed.
- Published
- 2002
- Full Text
- View/download PDF
34. On Block-Recursive Logarithmic Filterbanks
- Author
-
Laine, Unto K.
- Abstract
Publication in the conference proceedings of EUSIPCO, Tampere, Finland, 2000
- Published
- 2000
- Full Text
- View/download PDF
35. Modal synthesis and modeling of vowels
- Author
-
Unto K. Laine
- Published
- 1999
- Full Text
- View/download PDF
36. Block-recursive, multirate filterbanks with arbitrary time-frequency plane tiling
- Author
-
Unto K. Laine
- Subjects
Approximation theory ,Mathematical optimization ,Channel (digital image) ,Approximation error ,Plane (geometry) ,Pole–zero plot ,Algorithm ,Transfer function ,Block (data storage) ,Mathematics ,Time–frequency analysis - Abstract
A new method to realize arbitrary time-frequency plane tilings together with critical sampling in block-recursive filterbanks is presented. The method leads to pole-zero approximation of the target channel transfer functions. Perfect reconstruction within the limits of the approximation error can be achieved.
- Published
- 1999
- Full Text
- View/download PDF
37. On the utilization of overshoot effects in low-delay audio coding
- Author
-
Matti Karjalainen, Unto K. Laine, and Aki Härmä
- Subjects
Spectral envelope ,Audio codec ,Computer science ,Speech recognition ,Speech coding ,Codec ,Active listening ,Linear prediction ,Psychoacoustics ,Sub-band coding ,Coding (social sciences) - Abstract
In low-delay audio coding (coding delay
- Published
- 1999
- Full Text
- View/download PDF
38. Analysis of Pitch-Synchronous Modulation Effects by using Analytic Filters
- Author
-
Laine, Unto K.
- Abstract
Publication in the conference proceedings of EUSIPCO, Rhodes, Greece, 1998
- Published
- 1998
- Full Text
- View/download PDF
39. Backward Adaptive Warped Lattice for Wideband Stereo Coding
- Author
-
Harma, Aki, Laine, Unto K., and Karjalainen, Matti
- Abstract
Publication in the conference proceedings of EUSIPCO, Rhodes, Greece, 1998
- Published
- 1998
- Full Text
- View/download PDF
40. Warped Linear Predictive Audio Coding in Video Conferencing Application
- Author
-
Palomaki, Kalle, Harma, Aki, and Laine, Unto K.
- Abstract
Publication in the conference proceedings of EUSIPCO, Rhodes, Greece, 1998
- Published
- 1998
- Full Text
- View/download PDF
41. Critically sampled PR filterbanks of nonuniform resolution based on block recursive FAMlet transform
- Author
-
Unto K. Laine
- Published
- 1997
- Full Text
- View/download PDF
42. Speech analysis using complex orthogonal auditory transform (COAT)
- Author
-
Unto K. Laine
- Published
- 1992
- Full Text
- View/download PDF
43. Time-frequency And Multiple-resolution Representations In Auditory Modeling
- Author
-
Toomas Altosaar, Unto K. Laine, and Matti Karjalainen
- Subjects
Computer science ,Frequency domain ,Speech recognition ,Emphasis (telecommunications) ,Resolution (electron density) ,Baseband ,Spectrogram ,Filter bank ,Frequency modulation ,Time–frequency analysis - Published
- 1991
- Full Text
- View/download PDF
44. A model for real-time sound synthesis of guitar on a floating-point signal processor
- Author
-
Matti Karjalainen and Unto K. Laine
- Subjects
Digital signal processor, Floating point, Finite impulse response, Computer science, Speech recognition, Acoustics, String (computer science), Speech synthesis, Resonator, Distortion (music), Electronic music, Distortion, Guitar, Vocal tract, Interpolation - Abstract
Algorithms that can be used to synthesize guitar sounds on a floating-point signal processor are presented. A finite impulse response (FIR) Lagrange interpolator is introduced to implement the efficient and precise fractional delay approximation that is needed to achieve arbitrary and varying-length strings. This kind of interpolation is especially good in avoiding distortion and undesirable extra effects when the string length is changing continuously during the synthesis of a sound. The interpolator can also be used in other cases, e.g. in transmission-line modeling of acoustic tube resonators in wind instruments and for vocal tract models in speech synthesis. In addition to the interpolation principle, the implementation of the guitar string model on the TMS320C30 floating-point signal processor is described.
- Published
- 1991
- Full Text
- View/download PDF
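The abstract above combines a delay-line string loop with an FIR Lagrange interpolator that supplies the fractional part of the loop delay. A toy single-delay-line (Karplus-Strong-style) sketch of that combination; the loop-loss filter, excitation and the two-directional waveguide details of the paper are simplified away.

```python
import numpy as np

def lagrange_fir(delay, order=3):
    """Lagrange-interpolation FIR approximating a fractional delay."""
    n = np.arange(order + 1)
    h = np.ones(order + 1)
    for k in range(order + 1):
        mask = n != k
        h[mask] *= (delay - k) / (n[mask] - k)
    return h

def pluck(f0, fs=44100, dur=1.0, loss=0.996, frac_order=3):
    """Plucked-string tone from a delay-line loop tuned with a fractional delay."""
    loop_len = fs / f0                       # desired loop delay in samples
    read_delay = loop_len - 0.5              # the averaging loop filter adds ~0.5 samples
    n_int = int(np.floor(read_delay)) - 1    # keep the fractional part inside [1, 2),
    h = lagrange_fir(read_delay - n_int, frac_order)  # a good range for 3rd-order Lagrange
    buflen = n_int + frac_order + 2
    buf = np.random.uniform(-1, 1, buflen)   # noise burst = pluck excitation
    out = np.zeros(int(dur * fs))
    write, prev = 0, 0.0
    for i in range(len(out)):
        idx = (write - n_int - np.arange(frac_order + 1)) % buflen
        delayed = np.dot(h, buf[idx])        # interpolated read `read_delay` samples back
        new = loss * 0.5 * (delayed + prev)  # averaging loss filter
        prev = delayed
        out[i] = new
        buf[write] = new
        write = (write + 1) % buflen
    return out

tone = pluck(196.0)   # roughly the pitch of the open G string
```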
45. Aspects In Modeling And Real-time Synthesis Of The Acoustic Guitar
- Author
-
Unto K. Laine, Vesa Välimäki, and Matti Karjalainen
- Subjects
Engineering, Signal processing, String (computer science), Filter (signal processing), Transfer function, Line (geometry), Electronic engineering, Guitar, Algorithm, Digital filter, All-pass filter - Abstract
This paper will address the problem of modeling the acoustic guitar for real-time synthesis on signal processors. We will present a scheme for modeling the string for high-quality sound synthesis when the length of the string is changing dynamically. We will also focus on the problem of modeling the body of the guitar for real-time synthesis. Filter-based approaches were experimented with: LPC estimation, IIR-filter synthesis and FIR-filter approximation. Perceptual evaluation was used and taken into account. Real-time synthesis was implemented on the TMS320C30 floating-point signal processor. The presentation includes audio examples. Introduction: Computational modeling of musical instruments is an alternative to commonly used and more straightforward sound synthesis techniques like FM synthesis and waveform sampling. The traditional approach to efficient modeling of a vibrating string has been to use proper digital filters or transmission lines, see e.g. Karplus and Strong [1] and its extensions by Jaffe and Smith [2]. These represent "semiphysical" modeling where only some of the most fundamental features of the string, especially the transmission-line property, are retained to achieve efficient computation. More complete finite element models and other kinds of physical modeling may lead to very realistic sounds but tend to be computationally too expensive for real-time purposes. Modeling of the guitar body for real-time sound synthesis seems too difficult unless a digital filter approach to approximate the transfer function is used. The derivation of the detailed transfer function from mechanical and acoustical parameters seems impossible. The remaining choice is to estimate the transfer function filter from measurements of a real guitar or to design a filter that approximates the general properties of the real guitar body. In addition to strings and body, the interactions between them (at least between the strings) should be included. String modeling: The natural way of modeling a guitar string is to describe it as a two-directional transmission or delay line (see Fig. 1a) where the vibrational waves travel in both directions, reflecting at both ends. If all losses and other nonidealities are reduced to the reflection filters at the end points, the computation of the ideal string is efficient using two delay lines. The next problem is how to approximate the fractional part of the delay to achieve any (non-integer) length of the delay line. Allpass filters [2] are considered a good solution if the string length is fixed. If the length is dynamically varying, however, it is very difficult to avoid transients and glitches when the integer part of the delay line must change its length.
- Published
- 1991
- Full Text
- View/download PDF
46. A comparison of EGG and a new automatic inverse filtering method in phonation change from breathy to normal
- Author
-
Paavo Alku, Erkki Vilkman, and Unto K. Laine
- Published
- 1990
- Full Text
- View/download PDF
47. Speech Synthesizer in the Finnish Language
- Author
-
T. Rahko, Matti Karjalainen, and Unto K. Laine
- Subjects
Finnish language, Linguistics and Language, Computers, Computer science, Speech recognition, Speech synthesis, Self-Help Devices, LPN and LVN, Language and Linguistics, Communication Aids for Disabled, Speech and Hearing, Humans, Artificial intelligence, Finland, Natural language processing, Language - Published
- 1980
- Full Text
- View/download PDF
48. Higher pole correction in vocal tract models and terminal analogs
- Author
-
Unto K. Laine
- Subjects
Linguistics and Language, Polynomial, Speech production, Computer science, Communication, Speech recognition, Function (mathematics), Transfer function, Language and Linguistics, Computer Science Applications, Terminal (electronics), Transmission line, Modeling and Simulation, Computer Vision and Pattern Recognition, Algorithm, Software, Vocal tract, Variable (mathematics) - Abstract
The Higher Pole Correction (HPC) function in analog and digital all-pole modelling of speech production is analyzed by comparing all-pole models with a Transmission Line (TL) model. The validity of the TL model, which was chosen as a computational reference system in the study, is tested by comparing its transfer functions to acoustical measurements made on a physical vocal tract model. The variation of effective length of the vocal tract turned out to be an important parameter in modelling the HPC. Even if the frequency responses of the HPC in analog and digital cases differ, the relative changes in the correction, influenced by the variations in the effective length of the vocal tract, are exactly the same in both cases. Therefore digital realizations should have a variable HPC also. A polynomial analysis of the vocal tract transfer function was done to obtain new practical models for the HPC. The work results in all-zero models, which can be used in analog as well as digital all-pole realizations to form a new type of pole-zero model for speech production. This new pole-zero model is related to the PARCAS terminal analog model [Laine, 1982].
- Published
- 1988
- Full Text
- View/download PDF
49. Speech audiometry by a speech synthesizer
- Author
-
Unto K. Laine, Matti Karjalainen, T. Rahko, and S. Lavonen
- Subjects
Adult ,Male ,media_common.quotation_subject ,Speech recognition ,Speech synthesis ,Deafness ,computer.software_genre ,Data processing system ,Presentation ,Audiometry ,Preliminary report ,Humans ,Medicine ,Correction of Hearing Impairment ,Hearing Disorders ,media_common ,business.industry ,Communication ,Auditory Threshold ,General Medicine ,Test (assessment) ,Otosclerosis ,Otorhinolaryngology ,Speech Perception ,Head and neck surgery ,Speech audiometry ,Female ,business ,computer - Abstract
A preliminary report on speech test results with a portable, text-to-speech synthesizer is presented. The differentiation scores achieved at a speed of 80 words/min vary. So far the best mean differentiation scores in normal material are 75%. Increasing the presentation level improves the differentiation score, as do decreasing the word speed and training. The future and present uses of this system are discussed. These include: devices for the handicapped, e.g. to produce speech for the mute, man-machine communication through speech in industry control, data processing systems and uses in audiological diagnostics. The study is continued.
- Published
- 1979
- Full Text
- View/download PDF
50. Contents, Vol. 32, 1980
- Author
-
Piotr Świdziński, H.-G. Streubel, E. Schleier, P.J. Bradley, L. Hoover, Sanjeeva Murthy, M.D. Buffalo, J G Heidelbach, John K. Torgerson, H. Schwickardi, Alvirda Farmer, James C. Montague, Unto K. Laine, B. Rydzewski, P.M. Stell, T. Rahko, Antoni Pruszewicz, Penny L. Carter, M Flach, Matti Karjalainen, Daniel E. Martin, and Andrzej Obrębowski
- Subjects
Speech and Hearing ,Linguistics and Language ,LPN and LVN ,Language and Linguistics - Published
- 1980
- Full Text
- View/download PDF