48 results for "Satoshi Tamura"
Search Results
2. Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition
- Author
-
Satoshi Tamura, Satoru Hayamizu, Yuuto Gotoh, Masaki Nose, and Shinnosuke Isobe
- Subjects
Phrase ,Computer Networks and Communications ,Computer science ,Speech recognition ,Feature extraction ,Information technology ,view classification ,Decision fusion ,audio-visual speech recognition ,business.industry ,Deep learning ,automatic speech recognition ,deep learning ,Audio-visual speech recognition ,multi-angle lipreading ,visual speech recognition ,Face (geometry) ,Artificial intelligence ,Focus (optics) ,business - Abstract
Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely researched owing to the development of deep learning. Most VSR research focuses only on frontal face images. However, assuming real scenes, it is obvious that a VSR system should correctly recognize spoken contents not only from frontal faces but also from diagonal or profile faces. In this paper, we propose a novel VSR method that is applicable to faces taken at any angle. Firstly, view classification is carried out to estimate face angles. Based on the results, feature extraction is then conducted using the best combination of pre-trained feature extraction models. Next, lipreading is carried out using the features. We also developed audio-visual speech recognition (AVSR) using the VSR in addition to conventional ASR. Audio results were obtained from ASR, and the audio and visual results were then combined in a decision fusion manner. We evaluated our methods using OuluVS2, a multi-angle audio-visual database. We then confirmed that our approach achieved the best performance among conventional VSR schemes in a phrase classification task. In addition, we found that our AVSR results are better than ASR and VSR results.
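The decision-fusion step described in this abstract can be illustrated with a short, generic sketch that combines per-class log-scores from an ASR model and a VSR model; the weight value and the toy scores below are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def decision_fusion(audio_log_probs, visual_log_probs, audio_weight=0.7):
    """Late fusion: weighted sum of per-class log-scores from two recognizers.

    The 0.7 audio weight is an arbitrary example value.
    """
    fused = audio_weight * audio_log_probs + (1.0 - audio_weight) * visual_log_probs
    return int(np.argmax(fused))

# Toy example with three phrase classes.
audio_scores = np.log(np.array([0.5, 0.3, 0.2]))   # illustrative ASR posteriors
visual_scores = np.log(np.array([0.2, 0.7, 0.1]))  # illustrative VSR posteriors
print(decision_fusion(audio_scores, visual_scores))
```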
- Published
- 2021
3. The development of integrated road condition monitoring system for developing countries using smartphone sensors and dashcam in vehicles
- Author
-
Fernao A. L. Nobre Mouzinho, Hidekazu Fukai, Satoshi Tamura, Vosco Pereira, and Frederico Soares Cabral
- Subjects
Transport engineering ,Computer science ,ComputerSystemsOrganization_MISCELLANEOUS ,Developing country ,Monitoring system ,Road condition - Abstract
In developing countries like Timor-Leste, regular road condition monitoring is a significant subject not only for maintaining road quality but also for the national plan of road network construction. Sophisticated equipment for road surface inspection is so expensive that it is difficult to introduce in developing countries, and monitoring is usually achieved by manual operation. On the other hand, the utilization of ICT devices such as smartphones has gained much attention in recent years because the penetration rate of smartphones is increasing remarkably, even in developing countries. Smartphones are equipped with various high-precision sensors, i.e., accelerometers, gyroscopes, GPS, and so on, in a small body at low cost. In this project, we are developing an integrated road condition monitoring system that consists of smartphones, dashcams, and a server. There are similar trials in advanced countries but not so many in developing countries. The system is intended for use in developing countries: it is very low cost and does not require trained specialists in the field. The items that are automatically inspected by this system were carefully selected with the local ministry of public works and at present include paved and unpaved classification, road roughness, road width, and detection and size estimation of potholes, bumps, etc. All the inspected items are visualized in Google Maps, OpenStreetMap, or QGIS with GPS information. The survey results are collected on a server and updated to more accurate values through repeated surveys. For the analysis, we use several state-of-the-art machine learning and deep learning techniques. In this paper, we summarize related works and introduce this project's target and framework, which especially focuses on developing countries, and the achievements of each of our tasks.
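As a rough illustration of one of the inspected items above, the sketch below turns a smartphone's vertical-acceleration stream into a simple per-window roughness indicator (windowed RMS). This is a generic approach under an assumed sampling rate and window length, not the project's actual algorithm.

```python
import numpy as np

def windowed_rms(vertical_accel, fs=100, window_s=1.0):
    """Return a per-window RMS of vertical acceleration.

    Higher RMS values generally indicate rougher road segments.
    fs (sampling rate) and window_s are illustrative assumptions.
    """
    accel = np.asarray(vertical_accel, dtype=float)
    accel = accel - accel.mean()            # remove gravity / DC offset
    win = int(fs * window_s)
    n_windows = len(accel) // win
    frames = accel[: n_windows * win].reshape(n_windows, win)
    return np.sqrt((frames ** 2).mean(axis=1))

# Example: 10 s of synthetic data at 100 Hz.
rng = np.random.default_rng(0)
signal = 9.8 + 0.3 * rng.standard_normal(1000)
print(windowed_rms(signal))
```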
- Published
- 2021
- Full Text
- View/download PDF
4. Multi-angle lipreading using angle classification and angle-specific feature integration
- Author
-
Satoru Hayamizu, Shinnosuke Isobe, Masaki Nose, Satoshi Tamura, and Yuuto Gotoh
- Subjects
Phrase ,business.industry ,Computer science ,Deep learning ,Face (geometry) ,Feature extraction ,Feature (machine learning) ,Pattern recognition ,Artificial intelligence ,business ,Speech processing ,Facial recognition system ,Convolutional neural network - Abstract
Recently, visual speech recognition (VSR), namely lipreading, has been widely researched due to the development of Deep Learning (DL). Most lipreading research focuses only on frontal face images. However, assuming real scenes, it is obvious that a lipreading system should correctly recognize spoken contents not only from frontal faces but also from side faces. In this paper, we propose a novel lipreading method that is applicable to faces taken at any angle, using Convolutional Neural Networks (CNNs), one of the key deep-learning techniques. Our method consists of three parts: the view classification part, the feature extraction part, and the integration part. We first apply angle classification to input faces. Based on the results, we then determine the best combination of pre-trained angle-specific feature extraction schemes. Finally, we integrate these features, followed by DL-based lipreading. We evaluated our method using the open OuluVS2 dataset, which includes multi-angle audio-visual data. We then confirmed that our approach achieved the best performance among conventional and other DL-based lipreading schemes in the phrase classification task.
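The angle-conditioned pipeline described above (classify the viewing angle, then route frames to a matching pre-trained extractor) can be sketched as follows; the angle set, stand-in models, and selection rule are placeholders, not the paper's configuration.

```python
# Minimal sketch of an angle-conditioned feature-extraction pipeline.
# The classifier and per-angle extractors stand in for pre-trained models.
ANGLES = [0, 30, 45, 60, 90]  # degrees; an assumed discretization

def extract_features(lip_frames, angle_classifier, extractors):
    """Pick the feature extractor matching the predicted face angle."""
    predicted_angle = angle_classifier(lip_frames)      # e.g., returns 30
    extractor = extractors[predicted_angle]             # angle-specific model
    return [extractor(frame) for frame in lip_frames]   # per-frame features

# Toy demo with stand-in callables.
classify = lambda frames: 30
extractor_bank = {a: (lambda frame, a=a: (a, len(frame))) for a in ANGLES}
print(extract_features(["frame1", "frame2"], classify, extractor_bank))
```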
- Published
- 2021
- Full Text
- View/download PDF
5. Combination of temporal and spatial denoising methods for cine MRI
- Author
-
Keigo Kawaji, Satoshi Tamura, Tsubasa Maeda, and Satoru Hayamizu
- Subjects
Signal processing ,Noise measurement ,Artificial neural network ,business.industry ,Computer science ,Noise reduction ,Physics::Medical Physics ,Signal ,Cine mri ,Unsupervised learning ,Computer vision ,Noise (video) ,Artificial intelligence ,business - Abstract
In this paper, we propose a denoising method for cine MRI acquired by MoPS. MoPS-based cine MRI has a high frame rate but contains reconstruction noise. DISPEL, a conventional method, performs denoising in the temporal domain. A neural network is further introduced to remove spatial noise. Unlike most such methods, which require pairs of noisy and clean images, we choose an unsupervised scheme, N2N. We combine these two methods to perform temporal and spatial denoising for cine MRI. Experimental results show that the proposed method is able to remove noise from cine MRIs acquired by MoPS without removing tissue signal.
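The spatial part relies on Noise2Noise (N2N)-style unsupervised training, which fits a denoiser on pairs of independently noisy images instead of noisy/clean pairs. Below is a minimal, generic PyTorch sketch of that objective; the tiny network, random data, and hyperparameters are illustrative only and do not reproduce the authors' model.

```python
import torch
import torch.nn as nn

# Tiny convolutional denoiser (illustrative architecture only).
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(8, 1, 64, 64)                     # stand-in for cine frames
for _ in range(5):                                   # a few toy iterations
    noisy_a = clean + 0.1 * torch.randn_like(clean)  # first noisy copy
    noisy_b = clean + 0.1 * torch.randn_like(clean)  # independent noisy copy
    opt.zero_grad()
    loss = loss_fn(net(noisy_a), noisy_b)            # N2N: predict one noisy copy from the other
    loss.backward()
    opt.step()
```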
- Published
- 2021
- Full Text
- View/download PDF
6. Speech Recognition using Deep Canonical Correlation Analysis in Noisy Environments
- Author
-
Satoru Hayamizu, Satoshi Tamura, and Shinnosuke Isobe
- Subjects
Computer science ,business.industry ,Speech recognition ,Deep learning ,Artificial intelligence ,Canonical correlation ,business - Published
- 2021
- Full Text
- View/download PDF
7. Using Deep-Learning Approach to Detect Anomalous Vibrations of Press Working Machine
- Author
-
Kazuya Inagaki, Satoru Hayamizu, and Satoshi Tamura
- Subjects
Data collection ,business.industry ,Computer science ,Deep learning ,education ,Real-time computing ,ComputerApplications_COMPUTERSINOTHERSYSTEMS ,Mixture model ,Autoencoder ,Vibration ,Factory (object-oriented programming) ,Anomaly detection ,Artificial intelligence ,Abnormality ,business - Abstract
In recent years, there has been a demand for advanced maintenance in factories. Data collection from factory equipment is being carried out, and the collected sensor data are widely used for statistical analysis in quality control and for failure prediction by machine learning. For example, if it is possible to detect an abnormality using vibration data obtained from a piece of equipment, an increase in the operation rate of the plant can be expected. In this research, we aim at early detection of equipment failure by finding signs of abnormality in vibration data, using a deep-learning technique, particularly an autoencoder.
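The core idea, training an autoencoder on normal vibration windows and flagging windows with large reconstruction error, can be sketched generically as follows (PyTorch; the window size, architecture, and threshold rule are assumptions, not the authors' settings).

```python
import torch
import torch.nn as nn

WIN = 128  # samples per vibration window (assumed)

model = nn.Sequential(                 # tiny fully connected autoencoder
    nn.Linear(WIN, 32), nn.ReLU(),
    nn.Linear(32, WIN),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

normal = torch.randn(256, WIN)         # stand-in for "normal" vibration windows
for _ in range(20):
    opt.zero_grad()
    loss = ((model(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()

def anomaly_score(window):
    """Reconstruction error; larger values suggest abnormal vibration."""
    with torch.no_grad():
        return float(((model(window) - window) ** 2).mean())

threshold = anomaly_score(normal) * 3  # naive threshold on training error
```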
- Published
- 2020
- Full Text
- View/download PDF
8. Semantic Segmentation of Paved Road and Pothole Image Using U-Net Architecture
- Author
-
Satoshi Tamura, Hidekazu Fukai, Vosco Pereira, and Satoru Hayamizu
- Subjects
Computer science ,Intersection (set theory) ,business.industry ,Deep learning ,Image segmentation ,Object (computer science) ,Task (project management) ,ComputerSystemsOrganization_MISCELLANEOUS ,Technical report ,Pothole ,Segmentation ,Computer vision ,Artificial intelligence ,business - Abstract
Research on road monitoring systems has been actively conducted using both machine learning and deep learning techniques. One of our nearest goals within our road condition monitoring framework is to segment all road-related objects and provide a technical report on road condition. Our final objective is to develop a community participant-based system for road condition monitoring. As one of our tasks, in this research we start with the segmentation of roads and potholes. For this task we propose a semantic segmentation method for road and pothole images using U-Net, one of the well-known deep learning architectures. Road images under various conditions were used for training and validating the model. The experimental results showed that the U-Net model can achieve 97% accuracy and a mean Intersection over Union (mIoU) of 0.86.
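The mIoU figure above follows the standard per-class Intersection-over-Union definition; a short generic sketch (the three-class label set here is an assumption for illustration, not the study's exact evaluation code):

```python
import numpy as np

def mean_iou(y_true, y_pred, n_classes=3):
    """Mean IoU over classes (e.g., background / road / pothole; class set assumed)."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy 4x4 example with classes 0-2.
gt = np.array([[0, 0, 1, 1], [0, 2, 2, 1], [0, 2, 2, 1], [0, 0, 1, 1]])
pr = np.array([[0, 0, 1, 1], [0, 2, 1, 1], [0, 2, 2, 1], [0, 0, 0, 1]])
print(mean_iou(gt, pr))
```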
- Published
- 2019
- Full Text
- View/download PDF
9. Feature Extraction Methods Proposed for Speech Recognition Are Effective on Road Condition Monitoring Using Smartphone Inertial Sensors
- Author
-
Satoshi Tamura, Hidekazu Fukai, and Frederico Soares Cabral
- Subjects
Computer science ,Speech recognition ,Feature extraction ,lcsh:Chemical technology ,Biochemistry ,Article ,Field (computer science) ,Analytical Chemistry ,Machine Learning ,Inertial measurement unit ,Cepstrum ,Electrical and Electronic Engineering ,signal processing ,Instrumentation ,smartphone inertial sensors ,Signal processing ,feature extraction ,deep neural network ,Signal Processing, Computer-Assisted ,road condition monitoring ,paved and unpaved classification ,Atomic and Molecular Physics, and Optics ,Motor Vehicles ,ComputingMethodologies_PATTERNRECOGNITION ,Smartphone ,Mel-frequency cepstrum - Abstract
The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Assuming that, in practice, recordings will be made with various types of vehicle suspension systems and at various speeds, we use the multiple sensors found in smartphones and state-of-the-art machine learning techniques for signal processing. Although it usually receives little attention, the result of the classification depends on the feature extraction step. Therefore, we have to carefully choose not only the classification method but also the feature extraction method and its parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction coefficients (PLP) as the feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP were developed in the human speech recognition field, we found that modified MFCC and PLP can be used to improve on the commonly used statistical method.
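To make the MFCC-on-acceleration idea concrete, the snippet below applies a standard MFCC extractor to a 1-D acceleration signal. The sampling rate, frame sizes, and use of librosa are illustrative assumptions; the paper's modified MFCC/PLP variants are not reproduced here.

```python
import numpy as np
import librosa

fs = 100                                   # accelerometer sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
accel = rng.standard_normal(fs * 10)       # 10 s of synthetic vertical acceleration

# Short frames are used because the sampling rate is far below audio rates.
mfcc = librosa.feature.mfcc(
    y=accel.astype(np.float32), sr=fs,
    n_mfcc=12, n_mels=12, n_fft=64, hop_length=32, fmax=fs / 2,
)
print(mfcc.shape)   # (12, number_of_frames)
```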
- Published
- 2019
10. Process data based estimation of tool wear on punching machines using TCN-Autoencoder from raw time-series information
- Author
-
Mathias Liewald, Satoru Hayamizu, Satoshi Tamura, Celalettin Karadogan, and Shota Asahi
- Subjects
Series (mathematics) ,Computer science ,Process (computing) ,Data mining ,Tool wear ,computer.software_genre ,Punching ,Autoencoder ,computer - Abstract
Tracking the wear states of tools on punching machines is necessary to reduce scrap rates. In this paper, we propose a method to estimate the wear state of punches using a Temporal Convolutional Network Autoencoder (TCN-Autoencoder), a deep learning technique for learning time-series information with a convolutional architecture. The approach involves inputting raw time-series information, such as sensor, vibration, and audio data, into the TCN-Autoencoder and calculating the reconstruction error between the output and the input data. The reconstruction error is used as an “anomaly score” and indicates the distance from the normal state. By training the TCN-Autoencoder only on data annotated as the “normal” state, the reconstruction error becomes larger for abnormal-state inputs, which correspond to the wear state of the punch. Performance is evaluated on experimental measurement data that spans various wear states of the punch. The results showed that our model can estimate anomalies faster than the conventional machine-learning-based anomaly estimation method while maintaining high estimation accuracy. This is because the TCN-Autoencoder can learn from both the frequency and time domains.
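A minimal sketch of the reconstruction-error "anomaly score" on multichannel time-series windows, using a dilated 1-D convolutional autoencoder as a rough stand-in for a TCN-Autoencoder (PyTorch; architecture and dimensions are illustrative, not the paper's model):

```python
import torch
import torch.nn as nn

# Dilated 1-D conv encoder/decoder as a rough TCN-style stand-in.
class ConvAE(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 16, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(16, 16, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = ConvAE()
x = torch.randn(1, 3, 200)                        # one stroke: 3 sensor channels, 200 samples
recon = model(x)
anomaly_score = ((recon - x) ** 2).mean(dim=(1, 2))  # per-stroke anomaly score
print(float(anomaly_score))
```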
- Published
- 2021
- Full Text
- View/download PDF
11. Investigation of DNN-Based Audio-Visual Speech Recognition
- Author
-
Satoshi Tamura, Kazuya Takeda, Hiroshi Ninomiya, Norihide Kitaoka, Satoru Hayamizu, Shin Osuga, and Yurie Iribe
- Subjects
Voice activity detection ,Computer science ,Time delay neural network ,business.industry ,Speech recognition ,Acoustic model ,Audio-visual speech recognition ,computer.software_genre ,Speech processing ,Artificial Intelligence ,Hardware and Architecture ,Computer Vision and Pattern Recognition ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,computer ,Software ,Natural language processing - Published
- 2016
- Full Text
- View/download PDF
12. Proposal of failure prediction method of factory equipment by vibration data with Recurrent Autoencoder
- Author
-
Satoru Hayamizu, Ayaka Matsui, Shota Asahi, Takayoshi Naitou, Akira Furukawa, Ryosuke Isashi, and Satoshi Tamura
- Subjects
Vibration ,Computer science ,Factory (object-oriented programming) ,Control engineering ,General Medicine ,Autoencoder - Published
- 2020
- Full Text
- View/download PDF
13. Toward a High Performance Piano Practice Support System for Beginners
- Author
-
Satoshi Tamura, Shota Asahi, Yuko Sugiyama, and Satoru Hayamizu
- Subjects
Scheme (programming language) ,Reflection (computer programming) ,Computer science ,business.industry ,Deep learning ,Piano ,Human–computer interaction ,Support system ,Artificial intelligence ,Independent practice ,business ,computer ,computer.programming_language
In piano learning, it is difficult, especially for beginners, to judge by themselves whether their musical performances are appropriate in terms of rhythm and melody. Therefore, we have been developing a piano practice support system which enables piano beginners to conduct independent practice without their instructors. In this paper, we propose such a system with the aid of a deep learning technique: Long Short-Term Memory (LSTM). Our system accepts raw piano sounds and extracts performance information, from which it evaluates the performance. We evaluated the scheme using actual beginners' performances and found that the proposed system performed better than previous conventional methods. This paper also presents an application employing our methods. Through subjective evaluation experiments with the proposed application, it turned out that almost all beginners found points for reflection and maintained their motivation for independent practice.
- Published
- 2018
- Full Text
- View/download PDF
14. Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features
- Author
-
Satoshi Tamura, Kento Horio, Tomoki Toda, Satoru Hayamizu, and Hajime Endo
- Subjects
Computer science ,business.industry ,Deep learning ,Speech recognition ,Audio visual ,Artificial intelligence ,business ,Canonical correlation ,Bottleneck - Published
- 2018
- Full Text
- View/download PDF
15. Classification of Paved and Unpaved Road Image Using Convolutional Neural Network for Road Condition Inspection System
- Author
-
Satoru Hayamizu, Hidekazu Fukai, Satoshi Tamura, and Vosco Pereira
- Subjects
Contextual image classification ,business.industry ,Computer science ,Deep learning ,Feature extraction ,Image processing ,Computer vision ,Artificial intelligence ,Unpaved road ,business ,Types of road ,Road condition ,Convolutional neural network - Abstract
Image processing techniques have been actively used in research on road condition inspection and have achieved high detection accuracies. Many studies focus on the detection of road cracks and potholes. However, in some of the least developed countries, considerable stretches of road are still unpaved, and this has escaped the attention of researchers. Inspired by the penetration and success of deep learning in computer vision and other fields, and by the availability of various types of smartphone devices, we propose a low-cost method for paved and unpaved road image classification using a convolutional neural network (CNN). Our model is trained with 13,186 images and validated with 3,186 images collected using smartphone devices under various road conditions, such as wet, muddy, dry, dusty, and shady, and with different types of road surface, such as ground, rocks, and sand. The experiment using 500 new testing images showed that our model can achieve high Precision (98.0%), Recall (98.4%), and F1-score (98.2%) simultaneously.
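The precision, recall, and F1-score figures above follow the standard binary-classification definitions; a short scikit-learn sketch (the labels and predictions here are toy values, not the experiment's outputs):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# 1 = unpaved, 0 = paved (label convention assumed for illustration).
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
```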
- Published
- 2018
- Full Text
- View/download PDF
16. An Automatic Survey System for Paved and Unpaved Road Classification and Road Anomaly Detection using Smartphone Sensor
- Author
-
Fernao A. L. Nobre Mouzinho, Hidekazu Fukai, Frederico Soares Cabral, Mateus Pinto, and Satoshi Tamura
- Subjects
business.industry ,Computer science ,Machine learning ,computer.software_genre ,Two stages ,Vertical acceleration ,Support vector machine ,ComputingMethodologies_PATTERNRECOGNITION ,ComputerSystemsOrganization_MISCELLANEOUS ,Road surface ,Pothole ,Anomaly detection ,Artificial intelligence ,Unpaved road ,business ,Hidden Markov model ,computer
For developing countries like Timor-Leste, regular road surface monitoring is a major challenge not only for maintaining road quality but also for the national plan of road network construction. In Timor-Leste nearly 50% of roads are still unpaved. For this reason, an automated system is required to survey paved and unpaved roads. In this study, we present a new approach that uses smartphone sensors to classify paved and unpaved roads and to detect road anomalies. Although the most salient factor for differentiating paved and unpaved roads is the amplitude of the vertical acceleration, each vehicle has a different type of suspension system. Therefore, we used high-dimensional features and state-of-the-art machine learning techniques to make the system robust to differences in vehicle and smartphone type. This study is divided into two stages: paved and unpaved road classification, and road anomaly detection such as potholes and bumps. For paved and unpaved road classification, we used SVM, HMM, and ResNet models and compared their performance. Among these, ResNet was the best choice in this study because it outperformed SVM and HMM on all performance evaluation criteria. Furthermore, KNN with DTW is applied for anomaly detection on paved roads. KNN-DTW is also compared with other machine learning techniques, such as SVM and classical KNN, using the same criteria. As a result of the comparison, KNN-DTW and SVM performed better than classical KNN.
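The KNN-DTW detector mentioned above uses Dynamic Time Warping as its distance measure; a compact generic implementation (not the study's code) looks like this:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance between 1-D sequences."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Two acceleration snippets of different lengths (toy data).
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0]))
```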
- Published
- 2018
- Full Text
- View/download PDF
17. A Deep Learning-Based Approach for Road Pothole Detection in Timor Leste
- Author
-
Hidekazu Fukai, Satoshi Tamura, Satoru Hayamizu, and Vosco Pereira
- Subjects
Timor leste ,Contextual image classification ,business.industry ,Computer science ,Deep learning ,Feature extraction ,Pattern recognition ,Convolutional neural network ,Support vector machine ,Pothole ,Artificial intelligence ,business
This research proposes a low-cost solution for detecting road pothole images using a convolutional neural network (CNN). Our model is trained entirely on images collected from several different places, with variation such as wet, dry, and shady conditions. The experiment using 500 testing images showed that our model can achieve Accuracy (99.80%), Precision (100%), Recall (99.60%), and F-Measure (99.60%) simultaneously.
- Published
- 2018
- Full Text
- View/download PDF
18. Toward effective noise reduction for sub-Nyquist high-frame-rate MRI techniques with deep learning
- Author
-
Satoshi Tamura, Yudai Suzuki, Keigo Kawaji, Amit R. Patel, and Satoru Hayamizu
- Subjects
Artifact (error) ,Image quality ,business.industry ,Computer science ,Noise reduction ,Iterative reconstruction ,Frame rate ,Noise ,Dynamic contrast-enhanced MRI ,Nyquist–Shannon sampling theorem ,Computer vision ,Artificial intelligence ,business
Cine Cardiac Magnetic Resonance (Cine-CMR) is one example of dynamic MRI approaches to image organs that exhibit periodic motion. Conventional routine clinical Cine-CMR is typically obtained at 20–35 frames per second (fps) with temporal window sizes of 40–50 milliseconds. We have recently shown the feasibility of significantly increasing this overall frame rate by acquiring MRI k-space using a radial sampling pattern that is highly optimized with respect to both spatial and temporal coverage. In brief, our proposed approach acquires a significantly undersampled radial MRI k-space while encoding spatially and temporally periodic noise characteristics through the undersampled radial acquisition; however, remnant radial streaking noise remains under physiologic imaging conditions. In this research, we propose to further remove this streaking noise by employing a Spatio-Temporal Denoising Auto-Encoder (ST-DAE) based on deep learning. We evaluate the performance of our method in addressing such remnant artifacts using the ST-DAE; PSNR is used to evaluate image quality, and computational time is also discussed.
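PSNR, the image-quality measure used above, is defined as 10·log10(MAX²/MSE) between a reference and a test image; a small generic helper (not the authors' evaluation code):

```python
import numpy as np

def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

ref = np.random.rand(64, 64)
noisy = np.clip(ref + 0.05 * np.random.randn(64, 64), 0, 1)
print(psnr(ref, noisy))
```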
- Published
- 2017
- Full Text
- View/download PDF
19. Swallowing function evaluation using deep-learning-based acoustic signal processing
- Author
-
Chisa Kodama, Kunihito Kato, Satoshi Tamura, and Satoru Hayamizu
- Subjects
Sound (medical instrument) ,Epiglottis ,Signal processing ,medicine.medical_specialty ,Audio signal ,Computer science ,business.industry ,Deep learning ,digestive, oral, and skin physiology ,Early detection ,Audiology ,medicine.anatomical_structure ,stomatognathic system ,Swallowing ,otorhinolaryngologic diseases ,medicine ,Artificial intelligence ,Esophagus ,business - Abstract
In recent years, the number of people with swallowing disorders has been increasing. Therefore, it is important to evaluate swallowing function for the early detection and prevention of swallowing disorders. In this study, we used a capsule that generates sound and estimated the timing at which food is sent to the esophagus by sound signal processing and deep learning. By comparing this estimate with the movement of the epiglottis tracked in images, we performed a noninvasive and quantitative evaluation of swallowing function.
- Published
- 2017
- Full Text
- View/download PDF
20. Classification of Green coffee bean images based on defect types using convolutional neural network (CNN)
- Author
-
Satoshi Tamura, Junya Furukawa, Hidekazu Fukai, and Carlito Pinto
- Subjects
Artificial neural network ,Computer science ,business.industry ,Feature extraction ,Sorting ,Pattern recognition ,Image processing ,Convolutional neural network ,Electronic mail ,sort ,Computer vision ,Artificial intelligence ,Green coffee ,business - Abstract
In Timor-Leste, coffee is one of the most important products for the acquisition of foreign currency. However, production is hardly rationalized; therefore, it is desirable to efficiently enhance the value of the coffee at the place of production. The final objective of our study is to develop an automatic coffee bean sorting system for the producers of coffee beans in Timor-Leste. As the first step, we developed an image processing system which classifies images of green coffee beans by type of defect. We employed deep convolutional neural networks, a state-of-the-art machine learning technique, for the image processing. As a result, we succeeded in sorting defective beans with accuracies ranging from 72.4% to 98.7%, depending on the type of defect.
- Published
- 2017
- Full Text
- View/download PDF
21. A fast, non-iterative approach for Accelerated High-Temporal Resolution cine-CMR using Dynamically Interleaved Streak removal in the Power-spectral Encoded domain with Low-pass-filtering (DISPEL) and Modulo-Prime Spokes (MoPS)
- Author
-
Takeyoshi Ota, Keigo Kawaji, Akiko Tanaka, Mita Patel, Satoshi Tamura, Yi Wang, Amit R. Patel, Marco Marino, Timothy J. Carroll, Hui Wang, and Charles G. Cantrell
- Subjects
Computer science ,Streak ,Magnetic Resonance Imaging, Cine ,Reproducibility of Results ,Heart ,General Medicine ,Streaking Artifact ,Imaging phantom ,Article ,Temporal resolution ,Image Interpretation, Computer-Assisted ,Humans ,Cardiac magnetic resonance ,Artifacts ,Algorithm ,Rotation (mathematics) ,Algorithms
PURPOSE To introduce a pair of accelerated non-Cartesian acquisition principles that when combined, exploit the periodicity of k-space acquisition, and thereby enable acquisition of high-temporal cine Cardiac Magnetic Resonance (CMR). METHODS The mathematical formulation of a noniterative, undersampled non-Cartesian cine acquisition and reconstruction is presented. First, a low-pass filtering step that exploits streaking artifact redundancy is provided (i.e., Dynamically Interleaved Streak removal in the Power-spectrum Encoded domain with Low-pass filtering [DISPEL]). Next, an effective radial acquisition for the DISPEL approach that exploits the property of prime numbers is described (i.e., Modulo-Prime Spoke [MoPS]). Both DISPEL and MoPS are examined using numerical simulation of a digital heart phantom to show that high-temporal cine-CMR is feasible without removing physiologic motion vs aperiodic interleaving using Golden Angles. The combined high-temporal cine approach is next examined in 11 healthy subjects for a time-volume curve assessment of left ventricular systolic and diastolic performance vs conventional Cartesian cine-CMR reference. RESULTS The DISPEL method was first shown using simulation under different streak cycles to allow separation of undersampled radial streaking artifacts from physiologic motion with a sufficiently frequent streak-cycle interval. Radial interleaving with MoPS is next shown to allow interleaves with pseudo-Golden-Angle variants, and be more compatible with DISPEL against irrational and nonperiodic rotation angles, including the Golden-Angle-derived rotations. In the in vivo data, the proposed method showed no statistical difference in the systolic performance, while diastolic parameters sensitive to the cine's temporal resolution were statistically significant (P < 0.05 vs Cartesian cine). CONCLUSIONS We demonstrate a high-temporal resolution cine-CMR using DISPEL and MoPS, whose streaking artifact was separated from physiologic motion.
- Published
- 2017
22. Multistream sparse representation features for noise robust audio-visual speech recognition
- Author
-
Satoshi Tamura, Peng Shen, and Satoru Hayamizu
- Subjects
Noise ,Acoustics and Ultrasonics ,Computer science ,business.industry ,Noise reduction ,Speech recognition ,Audio-visual speech recognition ,Pattern recognition ,Sparse approximation ,Artificial intelligence ,business - Published
- 2014
- Full Text
- View/download PDF
23. Development of audio-visual speech corpus toward speaker-independent Japanese LVCSR
- Author
-
Kazuto Ukai, Satoru Hayamizu, and Satoshi Tamura
- Subjects
Vocabulary ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Speech corpus ,Audio-visual speech recognition ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Audio visual ,Mel-frequency cepstrum ,Artificial intelligence ,Noise (video) ,Hidden Markov model ,business ,Focus (optics) ,computer ,Natural language processing ,media_common - Abstract
In the speech recognition literature, building corpora for Large Vocabulary Continuous Speech Recognition (LVCSR) is quite important. In addition, in order to overcome the performance degradation caused by noise, using visual information such as lip images is effective. In this paper, therefore, we focus on collecting speech and lip-image data for audio-visual LVCSR. Audio-visual speech data were obtained from 12 speakers, each of whom uttered the ATR503 phonetically balanced sentences. These data were recorded in acoustically and visually clean environments. Using the data, we conducted recognition experiments. Mel Frequency Cepstral Coefficients (MFCCs) and eigenlip features were obtained, and multi-stream Hidden Markov Models (HMMs) were built. We compared the performance in the clean condition with that in noisy environments. It was found that visual information is able to compensate for the performance degradation. In addition, it turned out that we should improve visual speech recognition for high-performance audio-visual LVCSR.
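Eigenlip features are typically obtained by applying PCA to vectorized lip images and keeping the leading components; a minimal generic sketch (the image size, number of components, and random stand-in data are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 500 grayscale lip images of 32x32 pixels, flattened.
rng = np.random.default_rng(0)
lip_images = rng.random((500, 32 * 32))

pca = PCA(n_components=30)                       # number of eigenlips (assumed)
pca.fit(lip_images)

eigenlip_features = pca.transform(lip_images)    # per-frame visual features
print(eigenlip_features.shape)                    # (500, 30)
```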
- Published
- 2016
- Full Text
- View/download PDF
24. Visual Analysis of Health Checkup Data Using Multidimensional Scaling
- Author
-
Satoshi Tamura, Yamamoto Keiko, Satoru Hayamizu, and Yasutomi Kinosada
- Subjects
Human-Computer Interaction ,Artificial Intelligence ,Computer science ,Computer Vision and Pattern Recognition ,Multidimensional scaling ,Data mining ,computer.software_genre ,computer - Abstract
The objective of this study is to present an analytical method that supports health consultants, enabling them to select subjects for health guidance using health checkup data and to derive a suitable guidance policy for each subject. This paper examines an analysis method that maps health checkup data using Multi-Dimensional Scaling (MDS). MDS mapping of an examinee's multivariate health checkup data onto a two-dimensional plane facilitates comprehension of the subject's health condition as visual information. This study focuses on the efficacy of visualization from the viewpoint of supporting health consultants. The MDS display facilitates visual confirmation that groups outside the scope of health guidance and groups at high risk are shown in contrasting positions. In addition, a medium-risk group was plotted in an in-between position. A plot of a more detailed classification over all inspection items suggests increased risk by concurrence. Results of this study indicate that the MDS coordinates are effective both for determining a subject's health condition intuitively and for use as one index of risk for metabolic syndrome. These results are therefore considered useful for formulating health guidance plans, such as setting priority issues.
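The MDS mapping onto a two-dimensional plane can be reproduced generically with scikit-learn; the feature set, standardization step, and random stand-in data are illustrative, not the study's exact procedure:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler

# Stand-in health checkup data: 200 examinees x 8 inspection items.
rng = np.random.default_rng(0)
checkup = rng.normal(size=(200, 8))

scaled = StandardScaler().fit_transform(checkup)   # put items on a common scale
coords = MDS(n_components=2, random_state=0).fit_transform(scaled)
print(coords.shape)   # (200, 2): one point per examinee on the 2-D plane
```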
- Published
- 2012
- Full Text
- View/download PDF
25. Audio-visual speech recognition using deep bottleneck features and high-performance lipreading
- Author
-
Yurie Iribe, Satoshi Tamura, Norihide Kitaoka, Hiroshi Ninomiya, Kazuya Takeda, Shin Osuga, and Satoru Hayamizu
- Subjects
Voice activity detection ,Computer science ,business.industry ,Deep learning ,Speech recognition ,Feature extraction ,Pattern recognition ,Audio-visual speech recognition ,Bottleneck ,Visualization ,Discrete cosine transform ,Artificial intelligence ,Hidden Markov model ,business - Abstract
This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection (VAD) in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features by deep learning technology. Using the proposed features, we achieved 73.66% lipreading accuracy in a speaker-independent open condition and about 90% AVSR accuracy on average in noisy environments. In addition, we extracted speech segments from visual features, resulting in 77.80% lipreading accuracy. It was found that VAD is useful in both the audio and visual modalities for better lipreading and AVSR.
- Published
- 2015
- Full Text
- View/download PDF
26. Integration of deep bottleneck features for audio-visual speech recognition
- Author
-
Hiroshi Ninomiya, Kazuya Takeda, Satoshi Tamura, Norihide Kitaoka, and Yurie Iribe
- Subjects
Thesaurus (information retrieval) ,Computer science ,business.industry ,Audio-visual speech recognition ,Artificial intelligence ,business ,computer.software_genre ,computer ,Bottleneck ,Natural language processing - Published
- 2015
- Full Text
- View/download PDF
27. Multi-modal service operation estimation using DNN-based acoustic bag-of-features
- Author
-
Satoshi Tamura, Satoru Hayamizu, Takuya Uno, Takeshi Kurata, and Masanori Takehara
- Subjects
Scheme (programming language) ,Service (systems architecture) ,Service quality ,Computer science ,business.industry ,Machine learning ,computer.software_genre ,Modal ,Key (cryptography) ,Artificial intelligence ,business ,computer ,Bag of features ,computer.programming_language - Abstract
In service engineering it is important to estimate when and what a worker did, because such estimates provide crucial evidence for improving service quality and working environments. For Service Operation Estimation (SOE), acoustic information is one of the key modalities; in particular, environmental or background sounds include effective cues. This paper focuses on two aspects: (1) extracting powerful and robust acoustic features by using stacked-denoising-autoencoder and bag-of-features techniques, and (2) investigating a multi-modal SOE scheme by combining the audio features with other sensor data as well as non-sensor information. We conducted evaluation experiments using multi-modal data recorded in a restaurant. We improved SOE performance in comparison with conventional acoustic features, and the effectiveness of our multi-modal SOE scheme was also clarified.
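The bag-of-features step can be illustrated by clustering frame-level acoustic features into a codebook and histogramming codeword assignments per segment; a generic sketch (the codebook size, feature dimension, and random stand-in data are assumptions, and the stacked denoising autoencoder is not reproduced):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frame_features = rng.normal(size=(5000, 24))   # stand-in frame-level acoustic features

codebook = KMeans(n_clusters=64, n_init=10, random_state=0).fit(frame_features)

def bag_of_features(segment_frames):
    """Histogram of codeword assignments for one audio segment."""
    words = codebook.predict(segment_frames)
    hist = np.bincount(words, minlength=64).astype(float)
    return hist / hist.sum()

print(bag_of_features(frame_features[:300]).shape)   # (64,)
```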
- Published
- 2015
- Full Text
- View/download PDF
28. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images
- Author
-
Satoshi Tamura, Sadaoki Furui, and Koji Iwano
- Subjects
Computer science ,business.industry ,Speech recognition ,Frame (networking) ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,Pattern recognition ,Speaker recognition ,Triphone ,Set (abstract data type) ,Feature (computer vision) ,Signal Processing ,Pattern recognition (psychology) ,Feature (machine learning) ,Artificial intelligence ,Electrical and Electronic Engineering ,Hidden Markov model ,business ,Information Systems - Abstract
This paper proposes a multi-modal speech recognition method using optical-flow analysis for lip images. Optical flow is defined as the distribution of apparent velocities in the movement of brightness patterns in an image. Since the optical flow is computed without extracting the speaker's lip contours and location, robust visual features can be obtained for lip movements. Our method calculates two kinds of visual feature sets in each frame. The first feature set consists of variances of vertical and horizontal components of optical-flow vectors. These are useful for estimating silence/pause periods in noisy conditions since they represent movement of the speaker's mouth. The second feature set consists of maximum and minimum values of integral of the optical flow. These are expected to be more effective than the first set since this feature set has not only silence/pause information but also open/close status of the speaker's mouth. Each of the feature sets is combined with an acoustic feature set in the framework of HMM-based recognition. Triphone HMMs are trained using the combined parameter sets extracted from clean speech data. Noise-corrupted speech recognition experiments have been carried out using audio-visual data from 11 male speakers uttering connected digits. The following improvements of digit accuracy over the audio-only recognition scheme have been achieved when the visual information was used only for silence HMM: 4% at SNR = 5 dB and 13% at SNR = 10 dB using the integral information of optical flow as the visual feature set.
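The first visual feature set described above (variances of the horizontal and vertical flow components) can be computed with OpenCV's dense optical flow as sketched below; the Farneback parameters and toy frames are generic assumptions, and the second feature set (extrema of the flow integral) would be derived from the same flow field as defined in the paper.

```python
import numpy as np
import cv2

# Two consecutive grayscale lip-region frames (toy data standing in for video).
prev = np.random.randint(0, 255, (128, 128), dtype=np.uint8)
cur = np.roll(prev, 2, axis=1)               # simulate horizontal lip motion

# Dense (Farneback) optical flow; parameters are generic OpenCV defaults.
flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)

# First feature set: variances of the horizontal and vertical flow
# components over the lip region.
var_x = float(flow[..., 0].var())
var_y = float(flow[..., 1].var())
print(var_x, var_y)
```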
- Published
- 2004
- Full Text
- View/download PDF
29. Camera system used for elevator’s door sensor
- Author
-
Satoshi Tamura, Masahiro Bunya, Sayumi Kimura, Kota Nakanishi, Ryo Otsubo, and Yukari Murata
- Subjects
Elevator ,Computer science ,Automotive engineering - Published
- 2017
- Full Text
- View/download PDF
30. Analysis of customer communication by employee in restaurant and lead time estimation
- Author
-
Takeshi Kurata, Masanori Takehara, Satoshi Tamura, Satoru Hayamizu, and Hiroya Nojiri
- Subjects
Service (business) ,Estimation ,Service quality ,Knowledge management ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,Sound detection ,Order (business) ,Quality (business) ,business ,Lead time ,Smoothing ,media_common - Abstract
Human behavior sensing and its analysis play a great role in improving service quality and employee education. This paper presents novel frameworks for the detection of customer communication and for lead time estimation (LTE) using multi-sensor data, sound data, and accounting data in a restaurant. These are useful for managing work environments and identifying problems for employees. The lead time from order to delivery indicates the quality of the service for customers. We found that sound data of an employee's speech is useful for these techniques, through speech-ratio smoothing and POS sound detection.
- Published
- 2014
- Full Text
- View/download PDF
31. Data collection for mobile audio-visual speech recognition in various environments
- Author
-
Satoshi Tamura, Satoru Hayamizu, and Takumi Seko
- Subjects
Audio mining ,Voice activity detection ,Computer science ,Speech recognition ,Acoustic model ,Speech analytics ,Speech synthesis ,Audio-visual speech recognition ,computer.software_genre ,Speaker recognition ,Speech processing ,computer - Abstract
This paper introduces our recent activities on audio-visual speech recognition for mobile devices and on data collection in various environments. Audio-visual automatic speech recognition is effective in noisy or real conditions for enhancing the robustness of the speech recognizer and improving recognition accuracy. We have developed an audio-visual speech recognition interface for mobile devices. In order to evaluate the recognizer and investigate issues related to audio-visual processing on mobile computers, we collected speech data and lip images from 16 subjects in eight conditions with various audio noises and visual difficulties. Audio-only speech recognition and visual-only lipreading were then conducted. Through these experiments, we identified issues and future work not only for the construction of an audio-visual database but also for robust audio-visual speech recognition.
- Published
- 2014
- Full Text
- View/download PDF
32. Audio-visual voice conversion using noise-robust features
- Author
-
Kohei Sawada, Satoru Hayamizu, Satoshi Tamura, and Masanori Takehara
- Subjects
Scheme (programming language) ,Computer science ,Feature (computer vision) ,business.industry ,Speech recognition ,Audio visual ,Feature selection ,Pattern recognition ,Artificial intelligence ,Noise (video) ,business ,computer ,computer.programming_language - Abstract
Voice Conversion (VC) is a technique to convert speech data of a source speaker into that of a target speaker. VC has been investigated, and statistical VC is used for various purposes. Conventional VC uses acoustic features; however, audio-only VC suffers from degradation in noisy or real environments. This paper proposes an Audio-Visual VC (AVVC) method using not only audio features but also visual information, i.e., lip images. The eigenlip feature is employed as the visual feature in our scheme. We also propose a feature selection approach for audio-visual features. Experiments were conducted to evaluate our AVVC scheme in comparison with audio-only VC, using noisy data. The results show that AVVC can improve the performance even in noisy environments by properly selecting audio and visual parameters. It is also found that visual VC is successful. Furthermore, it is observed that visual dynamic features are more effective than visual static information.
- Published
- 2014
- Full Text
- View/download PDF
33. Path planning for mobile robot using a genetic algorithm
- Author
-
Satoshi Tamura, Toshiharu Hatanaka, Makoto Takuno, and Katsuji Uosaki
- Subjects
Computer science ,business.industry ,Materials Science (miscellaneous) ,Genetic algorithm ,Mobile robot ,Artificial intelligence ,Motion planning ,business - Published
- 1999
- Full Text
- View/download PDF
34. New Powder Autoindexing Software CONOGRAPH
- Author
-
Satoshi Tamura and Ryoko Oishi-Tomiyasu
- Subjects
Engineering drawing ,Software ,Computer science ,business.industry ,business - Published
- 2015
- Full Text
- View/download PDF
35. Improvement of Lip Reading Performance in Real Environments Using Speaker and Environmental Adaptation
- Author
-
Satoshi Tamura, Seko Takumi, Naoya Ukai, Takuya Kawasaki, and Satoru Hayamizu
- Subjects
Image pattern recognition ,Computer science ,business.industry ,Speech recognition ,media_common.quotation_subject ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Environmental adaptation ,Speaker recognition ,computer.software_genre ,Speaker diarisation ,Reading (process) ,Pattern recognition (psychology) ,Artificial intelligence ,business ,Adaptation (computer science) ,computer ,Natural language processing ,Speaker adaptation ,media_common - Abstract
Lip reading technologies play a great role not only in image pattern recognition, e.g., computer vision, but also in audio-visual pattern recognition, e.g., bimodal speech recognition. However, the recognition accuracy is still significantly low compared to that of speech recognition. Another problem is the performance degradation that occurs in real environments. To improve the performance, in this paper we employ two adaptation schemes: speaker adaptation and environmental adaptation. Speaker adaptation is applied to the recognition models so as to prevent the degradation caused by differences between speakers. Environmental adaptation is also conducted to deal with environmental differences. We tested these adaptation schemes using CENSREC-2-AV, a real-world audio-visual corpus we have built, which contains real-world data (speech signals and lip images) recorded in a driving car while subjects uttered Japanese connected digits. Experimental results show that the lip reading performance was largely improved by the speaker adaptation and further recovered by the environmental adaptation.
- Published
- 2013
- Full Text
- View/download PDF
36. Measurement and analysis of speech data toward improving service in restaurant
- Author
-
Satoshi Tamura, Tomohiro Fukuhara, Takeshi Kurata, Takashi Okuma, Masanori Takehara, Ryuhei Tenmoku, and Satoru Hayamizu
- Subjects
Audio mining ,Voice activity detection ,Data collection ,Computer science ,Speech recognition ,Speech technology ,Acoustic model ,Speech analytics ,Speech processing ,Voice analysis - Abstract
In this paper, we introduce human behavior sensing and data collection techniques in a real environment. In order to improve workers' skills and service quality, it is desirable to analyze and visualize workers' behavior scientifically. We acquired multi-sensor data and speech data of wait staff in a restaurant over several months. Our goal is to combine these data, analyze them objectively, and finally improve staff behavior and service. The speech data are useful for detecting speech events and spoken content. Collecting speech data in a real environment is a difficult task because of the recording environment and the load on the subjects. This paper focuses on the collection of speech and discusses issues in its measurement and analysis. We also apply Voice Activity Detection (VAD) to the speech data and calculate the speech ratio. The speech ratio shows a certain tendency related to the staff's operations. Changes in the speech ratio express the efficiency of Quality Control (QC) activity.
- Published
- 2013
- Full Text
- View/download PDF
37. CENSREC-2-AV: An evaluation framework for bimodal speech recognition in real environments
- Author
-
Satoshi Tamura, Norihide Kitaoka, Naoya Ukai, Satoru Hayamizu, Takuya Kawasaki, Kazuya Takeda, and Chiyomi Miyajima
- Subjects
Audio mining ,Voice activity detection ,business.industry ,Computer science ,Speech recognition ,Acoustic model ,Speech corpus ,computer.software_genre ,Speaker recognition ,Speech processing ,VoxForge ,ComputingMethodologies_PATTERNRECOGNITION ,Speech analytics ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
In this paper, we introduce a bimodal speech recognition corpus recorded in real environments. In recent years, speech recognition technology has been used in noisy conditions. Therefore, it has become necessary to achieve higher recognition accuracy in real environments. As one solution, bimodal speech recognition using audio and non-audio information has been studied. However, there are few databases which can be used to evaluate bimodal speech recognition in real environments. In this paper, we introduce CENSREC-2-AV, a new bimodal speech recognition corpus we have been building. CENSREC-2-AV is one of the databases of the CENSREC project; we previously provided a similar corpus, CENSREC-1-AV, as a database for bimodal speech recognition with additive noise. Both corpora contain speech data and lip images. Researchers can use CENSREC-2-AV to evaluate, in real environments, a bimodal speech recognition method built using CENSREC-1-AV, which consists of clean data.
- Published
- 2012
- Full Text
- View/download PDF
38. The role of speech technology in service-operation estimation
- Author
-
Satoshi Tamura, Satoru Hayamizu, Ryuhei Tenmoku, Masanori Takehara, and Takeshi Kurata
- Subjects
Service (systems architecture) ,Voice activity detection ,business.industry ,Computer science ,Speech recognition ,Speech technology ,Machine learning ,computer.software_genre ,Speech processing ,Sensor fusion ,Data visualization ,Keyword spotting ,Speech analytics ,Artificial intelligence ,business ,computer - Abstract
This paper introduces our recent effort to develop a Service-Operation Estimation (SOE) system using speech and multi-sensor data as well as other acquired data. In SOE, it is essential to analyze employees' data in order to increase productivity in many service industries. Speech processing techniques, such as voice activity detection and keyword spotting, help the analysis and enhance the precision of the results: the beginning and end times of speech regions are used to detect work events, and recognized keywords are used to conduct work estimation. In our system all the results are visualized in a 3D model, which helps employers and employees improve their operations.
- Published
- 2011
- Full Text
- View/download PDF
39. Performance Measurement of Dynamic Characteristic in Motor of Electric Motorcycle
- Author
-
Norikane Kanai, Masao Isshiki, and Satoshi Tamura
- Subjects
Computer science ,Performance measurement ,Automotive engineering - Published
- 2010
- Full Text
- View/download PDF
40. CENSREC-4: development of evaluation framework for distant-talking speech recognition under reverberant environments
- Author
-
Tetsuji Ogawa, Shigeki Matsuda, Kazumasa Yamamoto, Shingo Kuroiwa, Satoru Tsuge, Tetsuya Takiguchi, Yuki Denda, Takeshi Yamada, Takanobu Nishiura, Satoshi Nakamura, Masakiyo Fujimoto, Satoshi Tamura, Norihide Kitaoka, Kazuya Takeda, Masato Nakayama, and Chiyomi Miyajima
- Subjects
Development (topology) ,Computer science ,Speech recognition ,Process (computing) ,Impulse (physics) ,Impulse response ,Speech interface ,Convolution - Abstract
In this paper, we newly introduce a collection of databases and evaluation tools called CENSREC-4, which is an evaluation framework for distant-talking speech under hands-free conditions. Distant-talking speech recognition is crucial for a hands-free speech interface. Therefore, we measured room impulse responses to investigate reverberant speech recognition in various environments. The data contained in CENSREC-4 are connected digit utterances, as in CENSREC-1. Two subsets are included in the data: basic data sets and extra data sets. The basic data sets provide the evaluation environment for speech data convolved with room impulse responses. The extra data sets consist of simulated and recorded data. An evaluation framework is provided only for the basic data sets as evaluation tools. The results of evaluation experiments proved that CENSREC-4 is an effective database for evaluating new dereverberation methods, because the traditional dereverberation process had difficulty sufficiently improving recognition performance. Index Terms: Various environments, Impulse response, Convolution, Real recorded data, Evaluation framework
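Generating reverberant evaluation data from a clean utterance and a measured room impulse response is a convolution; a generic sketch with synthetic signals (not the CENSREC-4 data or tools):

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 16000
rng = np.random.default_rng(0)

clean = rng.standard_normal(fs * 2)          # 2 s stand-in for a clean utterance
rir = np.exp(-np.linspace(0, 8, fs // 2)) * rng.standard_normal(fs // 2)  # toy exponential-decay RIR

reverberant = fftconvolve(clean, rir)[: len(clean)]   # simulated distant-talking speech
print(reverberant.shape)
```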
- Published
- 2008
- Full Text
- View/download PDF
41. Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance
- Author
-
Satoru Tsuge, Satoshi Tamura, Seiichi Nakagawa, Norihide Kitaoka, Chiyomi Miyajima, Kazumasa Yamamoto, Tetsuya Takiguchi, Masakiyo Fujimoto, T. Kusamizu, Kazuya Takeda, Takeshi Yamada, Shingo Kuroiwa, Satoshi Nakamura, Takanobu Nishiura, Masato Nakayama, and Yuki Denda
- Subjects
Speech enhancement ,Voice activity detection ,Computer science ,Speech recognition ,Speech coding ,Speech technology ,Acoustic model ,PSQM ,Linear predictive coding ,Speech processing - Abstract
Voice activity detection (VAD) plays an important role in speech processing, including speech recognition, speech enhancement, and speech coding in noisy environments. We developed an evaluation framework for VAD in such environments, called corpus and environment for noisy speech recognition 1 concatenated (CENSREC-1-C). This framework consists of noisy continuous digit utterances and evaluation tools for VAD results. By adopting two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance, we provide the evaluation results of a power-based VAD method as a baseline. When using VAD in a speech recognizer, the detected speech segments are extended to avoid the loss of speech frames, and the pause segments are then absorbed by a pause model. We investigate the balance between explicit segmentation by VAD and implicit segmentation by a pause model using an experimental simulation of segment extension, and show that a small extension improves speech recognition.
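A power-based VAD baseline of the kind evaluated here can be sketched as a simple frame-energy threshold; the frame length, threshold rule, and synthetic data below are illustrative assumptions, not the CENSREC-1-C baseline itself:

```python
import numpy as np

def power_vad(signal, fs=16000, frame_ms=25, threshold_db=-30):
    """Label each frame speech (True) or non-speech (False) by log-power."""
    frame_len = int(fs * frame_ms / 1000)
    n = len(signal) // frame_len
    frames = np.asarray(signal[: n * frame_len], float).reshape(n, frame_len)
    power_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return power_db > (power_db.max() + threshold_db)   # threshold relative to the loudest frame

rng = np.random.default_rng(0)
sig = np.concatenate([0.01 * rng.standard_normal(8000),   # "silence"
                      rng.standard_normal(8000)])          # "speech"
print(power_vad(sig).astype(int))
```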
- Published
- 2007
- Full Text
- View/download PDF
42. Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images
- Author
-
Koji Iwano, Tomoaki Yoshinaga, Sadaoki Furui, and Satoshi Tamura
- Subjects
Audio mining ,Acoustics and Ultrasonics ,Computer science ,Speech recognition ,Speech coding ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Audio-visual speech recognition ,White noise ,Handset ,law.invention ,ComputingMethodologies_PATTERNRECOGNITION ,Robustness (computer science) ,law ,Computer Science::Sound ,Computer Science::Computer Vision and Pattern Recognition ,lcsh:Acoustics. Sound ,lcsh:Electronic computers. Computer science ,Electrical and Electronic Engineering ,Hidden Markov model
This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
- Published
- 2007
43. Note-Taking Support for Nurses Using Digital Pen Character Recognition System
- Author
-
Satoshi Tamura, Yutaka Nishimoto, Satoru Hayamizu, and Yujiro Hayashi
- Subjects
Software portability ,Human–computer interaction ,Computer science ,Speech recognition ,Interface (computing) ,Pattern recognition (psychology) ,Information system ,User interface ,Virtual reality ,Note-taking ,Test (assessment) - Abstract
This study presents a novel system that supports nurses in note-taking by providing a digital pen and a character-recognition system, with emphasis on the user interface. The system applies the characteristics of a digital pen to improve the efficiency of tasks related to nursing records. The system aims at improving the efficiency of nursing activities and reducing the time spent on nursing-record tasks. In our system, notes are first written on a check sheet using a digital pen while voice is recorded on a voice recorder; the pen and voice data are then transferred to a PC. The pen data are then recognized automatically as characters, which can be viewed and manipulated with the application. We conducted an evaluation experiment to improve the efficiency, operation, and interface of the system. The evaluation and test operations used 10 test subjects. Based on the test operation and the evaluation experiment, it turned out that improvements for urgent situations, enhanced portability, and further use of character recognition are required.
- Published
- 2006
- Full Text
- View/download PDF
44. A Robust Multimodal Speech Recognition Method using Optical Flow Analysis
- Author
-
Satoshi Tamura, Koji Iwano, and Sadaoki Furui
- Subjects
business.industry ,Computer science ,Speech recognition ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Optical flow ,Word error rate ,Pattern recognition ,Audio-visual speech recognition ,White noise ,Triphone ,Approximation error ,Robustness (computer science) ,Artificial intelligence ,business ,Hidden Markov model - Abstract
We propose a new multimodal speech recognition method using optical flow analysis and evaluate its robustness to acoustic and visual noises. Optical flow is defined as the distribution of apparent velocities in the movement of brightness patterns in an image. Since the optical flow is computed without extracting speaker's lip contours and location, robust visual features can be obtained for lip movements. Our method calculates a visual feature set in each frame consisting of maximum and minimum values of the integral of the optical flow. This feature set has not only silence information but also open/close status of the speaker's mouth. The visual feature set is combined with an acoustic feature set in the framework of HMM-based recognition. Triphone HMMs are trained using the combined parameter set extracted from clean speech data. Two multimodal speech recognition experiments have been carried out. First, acoustic white noise was added to speech waveforms, and a speech recognition experiment was conducted using audio-visual data from 11 male speakers uttering connected Japanese digits. The following improvements of relative reduction of digit error rate over the audio-only recognition scheme were achieved when the visual information was incorporated into the silence HMM: 32% at SNR=10dB and 47% at SNR=15dB. Second, real-world data distorted both acoustically and visually was recorded in a driving car from six male speakers and recognized. We achieved approximately 17% and 11% relative error reduction compared with audio-only results on batch and incremental MLLR-based adaptation, respectively.
- Published
- 2005
- Full Text
- View/download PDF
45. A method to obtain all geometric ambiguities in powder indexing
- Author
-
Takashi Kamiyama, Masao Yonemura, Shuki Torii, Yoshihisa Ishikawa, Satoshi Tamura, and R. Oishi-Tomiyasu
- Subjects
Inorganic Chemistry ,Crystallography ,Structural Biology ,Computer science ,Search engine indexing ,General Materials Science ,Physical and Theoretical Chemistry ,Condensed Matter Physics ,Biochemistry ,Algorithm - Abstract
It is known that different unit cells can have the same computed lines in some cases. The phenomenon was first called geometrical ambiguity and studied in [1]. In all the high-symmetric cases provided in [1], such unit cells correspond to derivative lattices of each other. Although this is not true in general for 3-dimensional lattices, it has been assumed in methods to search for geometrical ambiguities (e.g., [2]). Thus, a method to obtain all the geometrical ambiguities in a very short time for given unit-cell parameters is provided. Because such a method has not been used for powder indexing, it will have impact in the following sense: firstly, it is useful for checking powder indexing solutions promptly. Some powder auto-indexing methods cannot obtain all the geometrical ambiguities. Even for software such as Conograph, which can obtain all the ambiguities, it is not straightforward to search for them among many indexing solutions using figures of merit, which are sometimes not reliable [3]. Secondly, the new method indicates that powder indexing has only finitely many solutions, at least if peak search succeeds in obtaining all (but a few) diffraction peaks with q-values smaller than some calculated value. (Note that infinitely many solutions may exist for lattices of dimension more than 4.) The result seems to provide a foundation for automatic powder crystal structure analysis, because it is possible to obtain all the ambiguities by computation. The introduced method was implemented in the newest version of Conograph, which will be distributed on the web (http://research.kek.jp/people/rtomi/ConographGUI/web_page.html) by IUCr2014. ACKNOWLEDGEMENT: this research was partly supported by a JSPS KAKENHI grant (No.22740077) and by Ibaraki Prefecture (J-PARC-23D06).
- Published
- 2014
- Full Text
- View/download PDF
46. Evaluation of GA Approach to AHP by Simulation
- Author
-
Satoshi Tamura, Noriyuki Matsuda, and Nobuko Kato
- Subjects
Mathematical optimization ,Computer science ,Analytic hierarchy process - Published
- 1999
- Full Text
- View/download PDF
47. MINIMIZING POWER USAGE IN WATER DISTRIBUTION SYSTEM USING MIXED INTEGER LINEAR PROGRAMMING
- Author
-
Akira Koizumi, Satoshi Tamura, Toyono Inakazu, Atsushi Masuko, Toshiki Horie, Takashi Yamamoto, and Yasuhiro Arai
- Subjects
Distribution system ,Mathematical optimization ,Computer science ,Branch and price ,Integer programming ,Power usage - Published
- 2012
- Full Text
- View/download PDF
48. GEMSIS - A novel application of speech recognition to emergency and disaster medicine
- Author
-
Shinji Ogura, Satoshi Tamura, Kunihiko Takamatsu, and Satoru Hayamizu
- Subjects
Computer science ,Speech recognition ,Disaster medicine