40 results on '"sequence-to-sequence learning"'
Search Results
2. A deep learning approach for line-level Amharic Braille image recognition.
- Author
-
Asfaw, Nega Agmas, Belay, Birhanu Hailu, and Alemu, Kassawmar Mandefro
- Abstract
Braille, the most popular tactile-based writing system, uses patterns of raised dots arranged in cells to inscribe characters for visually impaired persons. Amharic is Ethiopia’s official working language, spoken by more than 100 million people. To bridge the written communication gap between persons with and without eyesight, multiple Optical braille recognition systems for various language scripts have been developed utilizing both statistical and deep learning approaches. However, the need for half-character identification and character segmentation has complicated these systems, particularly in the Amharic script, where each character is represented by two braille cells. To address these challenges, this study proposed a deep learning model that combines a CNN and a BiLSTM network with CTC. The model was trained with 1,800 line images with 32 × 256 and 48 × 256 dimensions, validated with 200 line images, and evaluated using the Character Error Rate (CER). The best-trained model had a CER of 7.81% on test data with a 48 × 256 image dimension. These findings demonstrate that the proposed sequence-to-sequence learning method is a viable Optical Braille Recognition (OBR) solution that does not necessitate extensive image pre- and post-processing. In addition, we have made the first Amharic braille line-image dataset available for free to researchers via the link: https://github.com/Ne-UoG-git/Am-Br-line-image.github.io. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. A deep learning approach for line-level Amharic Braille image recognition
- Author
-
Nega Agmas Asfaw, Birhanu Hailu Belay, and Kassawmar Mandefro Alemu
- Subjects
End-to-end training ,Optical Braille Character Recognition ,Amharic braille character recognition ,Sequence-to-sequence learning ,Bidirectional LSTM ,Medicine ,Science - Abstract
Abstract Braille, the most popular tactile-based writing system, uses patterns of raised dots arranged in cells to inscribe characters for visually impaired persons. Amharic is Ethiopia’s official working language, spoken by more than 100 million people. To bridge the written communication gap between persons with and without eyesight, multiple Optical braille recognition systems for various language scripts have been developed utilizing both statistical and deep learning approaches. However, the need for half-character identification and character segmentation has complicated these systems, particularly in the Amharic script, where each character is represented by two braille cells. To address these challenges, this study proposed a deep learning model that combines a CNN and a BiLSTM network with CTC. The model was trained with 1,800 line images with 32 × 256 and 48 × 256 dimensions, validated with 200 line images, and evaluated using the Character Error Rate (CER). The best-trained model had a CER of 7.81% on test data with a 48 × 256 image dimension. These findings demonstrate that the proposed sequence-to-sequence learning method is a viable Optical Braille Recognition (OBR) solution that does not necessitate extensive image pre- and post-processing. In addition, we have made the first Amharic braille line-image dataset available for free to researchers via the link: https://github.com/Ne-UoG-git/Am-Br-line-image.github.io.
- Published
- 2024
- Full Text
- View/download PDF
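The CNN–BiLSTM–CTC architecture described in results 2 and 3 is a standard segmentation-free line recognizer: convolutional features are read column by column by a bidirectional LSTM and aligned to the character sequence with CTC. A minimal PyTorch sketch of that kind of model follows; the layer sizes, the 48 × 256 input shape, and the placeholder alphabet size are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class BrailleLineRecognizer(nn.Module):
    """Segmentation-free line recognizer: CNN features -> BiLSTM -> per-column
    character probabilities, trained with CTC (hypothetical sizes)."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                                          # input (B, 1, 48, 256)
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # -> 24 x 128
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 12 x 64
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),  # -> 6 x 64
        )
        self.rnn = nn.LSTM(128 * 6, hidden, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes + 1)   # +1 for the CTC blank symbol

    def forward(self, x):                                  # x: (B, 1, 48, 256)
        f = self.cnn(x)                                    # (B, 128, 6, 64)
        f = f.permute(0, 3, 1, 2).flatten(2)               # (B, 64, 768): one step per image column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)                # (B, T=64, num_classes+1)

model = BrailleLineRecognizer(num_classes=300)             # alphabet size is a placeholder
images = torch.randn(4, 1, 48, 256)                        # dummy line images
log_probs = model(images).permute(1, 0, 2)                 # CTCLoss expects (T, B, C)
targets = torch.randint(1, 301, (4, 20))                   # dummy label sequences
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((4,), 64, dtype=torch.long),
                           target_lengths=torch.full((4,), 20, dtype=torch.long))
loss.backward()
```

CTC removes the need for explicit character segmentation because the loss marginalizes over all monotonic alignments between the per-column predictions and the target label sequence.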
4. Note-level singing melody transcription with transformers.
- Author
-
Park, Jonggwon, Choi, Kyoyun, Oh, Seola, Kim, Leekyung, and Park, Jonghun
- Subjects
TRANSFORMER models, TONE color (Music theory), MUSICAL pitch, MELODY, DATA scrubbing, SINGING, SINGING instruction - Abstract
Recognizing a singing melody from an audio signal in terms of the music notes' pitch, onset, and offset, referred to as note-level singing melody transcription, has been studied as a critical task in the field of automatic music transcription. The task is challenging due to the different timbre and vibrato of each vocal and the ambiguity of onset and offset of the human voice compared with other instrumental sounds. This paper proposes a note-level singing melody transcription model using sequence-to-sequence Transformers. The singing melody annotation is expressed as a monophonic melody sequence and used as a decoder sequence. Overlapping decoding is introduced to solve the problem of the context between segments being broken. Applying pitch augmentation and adding a noisy dataset with data cleansing turn out to be effective in preventing overfitting and generalizing the model performance. Ablation studies demonstrate the effects of the proposed techniques in note-level singing melody transcription, both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription performance for all the metrics considered. For fundamental frequency metrics, the voice detection performance of the proposed model is comparable to that of a vocal melody extraction model. Finally, subjective human evaluation demonstrates that the results of the proposed models are perceived as more accurate than the results of a previous study. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Neuro-evolutionary for time series forecasting and its application in hourly energy consumption prediction.
- Author
-
Son, Nguyen Ngoc and Van Cuong, Nguyen
- Subjects
TIME series analysis, ENERGY consumption, PARTICLE swarm optimization, DIFFERENTIAL evolution, LYNX, AUTOREGRESSION (Statistics) - Abstract
This paper proposed an ensemble methodology comprising neural networks, a modified differential evolution algorithm, and a nonlinear autoregressive network with exogenous inputs (NARX), called the neuro-evolutionary NARX (NE-NARX) model, for time series forecasting. In NE-NARX, the structure is designed by connecting the neural model and the NARX model, and the connection weights are optimized by a modified differential evolution algorithm. The effectiveness of the proposed NE-NARX model is tested on two well-known benchmark datasets, namely the Canadian lynx and the Wolf sunspot series. The proposed model is compared to other models, including the classical backpropagation algorithm, particle swarm optimization, differential evolution (DE) and DE variants. Additionally, an ARIMA model is employed as the benchmark for evaluating the capacity of the proposed model. The NE-NARX model is then used for hourly energy consumption prediction and compared with other machine learning models, including gated recurrent units, convolutional neural networks (CNN), long short-term memory (LSTM), a hybrid CNN-LSTM and sequence-to-sequence learning. Results convincingly show the superiority of the proposed NE-NARX model over other models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
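Result 5 replaces gradient-based training with a modified differential evolution (DE) algorithm that searches directly over the predictor's connection weights. The numpy sketch below shows the general idea with a plain DE/rand/1/bin loop on a toy autoregressive predictor; the network size, the DE settings, and the toy series are assumptions and do not reproduce the authors' modified DE variant or full NARX structure.

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 300)) + 0.1 * rng.standard_normal(300)

LAGS, HIDDEN = 4, 6                      # autoregressive order and hidden units (assumed)
X = np.array([series[i:i + LAGS] for i in range(len(series) - LAGS)])
y = series[LAGS:]

def n_params():
    return LAGS * HIDDEN + HIDDEN + HIDDEN + 1          # W1, b1, W2, b2

def predict(theta, X):
    """Tiny one-hidden-layer autoregressive predictor parameterized by a flat vector."""
    i = 0
    W1 = theta[i:i + LAGS * HIDDEN].reshape(LAGS, HIDDEN); i += LAGS * HIDDEN
    b1 = theta[i:i + HIDDEN]; i += HIDDEN
    W2 = theta[i:i + HIDDEN]; i += HIDDEN
    b2 = theta[i]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(theta):
    return np.mean((predict(theta, X) - y) ** 2)

# Plain DE/rand/1/bin loop optimizing the flattened weight vector.
POP, F, CR, GENS = 30, 0.6, 0.9, 200
pop = rng.uniform(-1, 1, (POP, n_params()))
fit = np.array([mse(p) for p in pop])
for _ in range(GENS):
    for i in range(POP):
        a, b, c = pop[rng.choice([j for j in range(POP) if j != i], 3, replace=False)]
        mutant = a + F * (b - c)                         # mutation
        cross = rng.random(n_params()) < CR              # binomial crossover mask
        trial = np.where(cross, mutant, pop[i])
        f_trial = mse(trial)
        if f_trial < fit[i]:                             # greedy selection
            pop[i], fit[i] = trial, f_trial

print("best training MSE:", fit.min())
```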
6. Attention-Based Neural Machine Translation Approach for Low-Resourced Indic Languages—A Case of Sanskrit to Hindi Translation
- Author
-
Bakarola, Vishvajit, Nasriwala, Jitendra, Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Somani, Arun K., editor, Mundra, Ankit, editor, Doss, Robin, editor, and Bhattacharya, Subhajit, editor
- Published
- 2022
- Full Text
- View/download PDF
7. Multi‐step‐ahead flood forecasting using an improved BiLSTM‐S2S model.
- Author
-
Cao, Qing, Zhang, Hanchen, Zhu, Feilin, Hao, Zhenchun, and Yuan, Feifei
- Subjects
FLOOD forecasting ,TIME series analysis ,FLOOD risk ,RUNOFF ,FLOODS - Abstract
Rainfall–runoff modeling is a complex hydrological issue that still has room for improvement. This study developed a coupled bidirectional long short-term memory (LSTM) with sequence-to-sequence (Seq2Seq) learning (BiLSTM-Seq2Seq) model to simulate multi-step-ahead runoff for flood events. The LSTM with Seq2Seq learning (LSTM-Seq2Seq) and a multilayer perceptron (MLP) were set as benchmarks. The results show that: (1) root mean absolute error is reduced by approximately 19% up to 27%, and the Nash–Sutcliffe coefficient of efficiency is improved by 14% up to 34% for 6-h-ahead runoff prediction for BiLSTM-Seq2Seq compared with LSTM-Seq2Seq and MLP; (2) the BiLSTM-Seq2Seq model has good performance not only for one-peak flood events but also for multi-peak flood events; and (3) BiLSTM-Seq2Seq can mitigate the time-delay problem, and the time lag is shortened by 39% up to 69% in comparison to LSTM-Seq2Seq and MLP. These results suggest that the time-delay problem can be mitigated by BiLSTM-Seq2Seq, which has excellent potential in time series predictions in the hydrological field. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
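Results 7 and 12 pair a bidirectional LSTM encoder with sequence-to-sequence decoding so that several future runoff values are produced in one pass. Below is a minimal PyTorch sketch of such a BiLSTM encoder with an autoregressive decoder for multi-step-ahead prediction; the input features, hidden sizes, and 6-step horizon are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BiLSTMSeq2Seq(nn.Module):
    """Bidirectional LSTM encoder + unidirectional LSTM decoder that rolls out
    `horizon` future values (hypothetical sizes)."""
    def __init__(self, n_features: int, hidden: int = 64, horizon: int = 6):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTMCell(1, 2 * hidden)        # feeds back its own prediction
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, x):                                # x: (B, T_in, n_features)
        enc, _ = self.encoder(x)
        h = enc[:, -1, :]                                # last encoder state (fwd + bwd concat)
        c = torch.zeros_like(h)
        y_prev = x[:, -1, :1]                            # start token: last observed value
        preds = []
        for _ in range(self.horizon):                    # autoregressive roll-out
            h, c = self.decoder(y_prev, (h, c))
            y_prev = self.out(h)
            preds.append(y_prev)
        return torch.cat(preds, dim=1)                   # (B, horizon)

model = BiLSTMSeq2Seq(n_features=3)                      # e.g. rainfall, upstream flow, runoff
past = torch.randn(8, 24, 3)                             # 24 past hourly steps (dummy data)
future = model(past)                                     # 6-step-ahead forecast
print(future.shape)                                      # torch.Size([8, 6])
```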
8. Part-of-Speech Tagging Using Long Short Term Memory (LSTM): Amazigh Text Written in Tifinaghe Characters
- Author
-
Maarouf, Otman, El Ayachi, Rachid, van der Aalst, Wil, Series Editor, Mylopoulos, John, Series Editor, Rosemann, Michael, Series Editor, Shaw, Michael J., Series Editor, Szyperski, Clemens, Series Editor, Fakir, Mohamed, editor, Baslam, Mohamed, editor, and El Ayachi, Rachid, editor
- Published
- 2021
- Full Text
- View/download PDF
9. BERT for Sequence-to-Sequence Multi-label Text Classification
- Author
-
Yarullin, Ramil, Serdyukov, Pavel, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, van der Aalst, Wil M. P., editor, Batagelj, Vladimir, editor, Ignatov, Dmitry I., editor, Khachay, Michael, editor, Koltsova, Olessia, editor, Kutuzov, Andrey, editor, Kuznetsov, Sergei O., editor, Lomazova, Irina A., editor, Loukachevitch, Natalia, editor, Napoli, Amedeo, editor, Panchenko, Alexander, editor, Pardalos, Panos M., editor, Pelillo, Marcello, editor, Savchenko, Andrey V., editor, and Tutubalina, Elena, editor
- Published
- 2021
- Full Text
- View/download PDF
10. Sign Language Recognition
- Author
-
Guo, Dan, Tang, Shengeng, Hong, Richang, Wang, Meng, McDaniel, Troy, editor, and Liu, Xueliang, editor
- Published
- 2021
- Full Text
- View/download PDF
11. Multistep ahead atmospheric optical turbulence forecasting for free-space optical communication using empirical mode decomposition and LSTM-based sequence-to-sequence learning
- Author
-
Yalin Li, Hongqun Zhang, Lang Li, Lu Shi, Yan Huang, and Shiyao Fu
- Subjects
atmospheric optical turbulence forecasting ,free-space optical communication ,empirical mode decomposition ,LSTM ,sequence-to-sequence learning ,Physics ,QC1-999 - Abstract
Although free-space optical communication (FSOC) is a promising means of high data rate satellite-to-ground communication, beam distortion caused by atmospheric optical turbulence remains a major challenge for its engineering applications. Accurate prediction of atmospheric optical turbulence to optimize communication plans and equipment parameters, such as adaptive optics (AO), is an effective means to address this problem. In this research, a hybrid multi-step prediction model for atmospheric optical turbulence, EMD-Seq2Seq-LSTM, is proposed by combining empirical mode decomposition (EMD), sequence-to-sequence (Seq2Seq) learning, and long short-term memory (LSTM) networks. First, empirical mode decomposition is used to decompose the non-linear and non-stationary atmospheric optical turbulence dataset into a set of stationary components for which internal feature information can be easily extracted, which significantly reduces the training difficulty and improves the forecast accuracy of the model. Second, sequence-to-sequence learning is combined with LSTM networks to build a prediction model that can eliminate time delay and make full use of long-term information; this model is then used to predict each component separately. Finally, the prediction results of each component are combined to obtain the final atmospheric turbulence forecasting results. To validate the performance of the proposed method, three comparative models, including WRF, LSTM, and sequence-to-sequence-LSTM, are demonstrated in this study. The forecasting results reveal that the proposed model outperforms all other models both qualitatively and quantitatively and thus can be a powerful method for atmospheric optical turbulence forecasting.
- Published
- 2023
- Full Text
- View/download PDF
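Result 11 follows a decompose–predict–recombine pattern: the turbulence series is split into intrinsic mode functions (IMFs) with EMD, each IMF is forecast separately by a Seq2Seq-LSTM, and the component forecasts are summed. The sketch below shows only that pipeline structure; it assumes the PyEMD package for the decomposition and substitutes a trivial placeholder for the per-IMF Seq2Seq-LSTM forecaster.

```python
import numpy as np
from PyEMD import EMD                       # pip install EMD-signal (assumed dependency)

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
signal = np.sin(2 * np.pi * t) + 0.5 * np.sin(9 * np.pi * t) + 0.1 * rng.standard_normal(1000)

def forecast_component(history: np.ndarray, horizon: int) -> np.ndarray:
    """Placeholder for the per-IMF Seq2Seq-LSTM forecaster used in the paper.
    Here: naive persistence, purely to keep the sketch self-contained."""
    return np.repeat(history[-1], horizon)

HORIZON = 12
imfs = EMD().emd(signal)                    # rows: IMFs (the last row is the residue trend)
component_forecasts = [forecast_component(imf, HORIZON) for imf in imfs]
forecast = np.sum(component_forecasts, axis=0)   # recombine components into the final forecast
print(forecast.shape)                       # (12,)
```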
12. Multi‐step‐ahead flood forecasting using an improved BiLSTM‐S2S model
- Author
-
Qing Cao, Hanchen Zhang, Feilin Zhu, Zhenchun Hao, and Feifei Yuan
- Subjects
bidirectional long short‐term memory ,flood modeling ,flood risk ,multi‐step‐ahead runoff forecast ,sequence‐to‐sequence learning ,River protective works. Regulation. Flood control ,TC530-537 ,Disasters and engineering ,TA495 - Abstract
Abstract Rainfall–runoff modeling is a complex hydrological issue that still has room for improvement. This study developed a coupled bidirectional long short-term memory (LSTM) with sequence-to-sequence (Seq2Seq) learning (BiLSTM-Seq2Seq) model to simulate multi-step-ahead runoff for flood events. The LSTM with Seq2Seq learning (LSTM-Seq2Seq) and a multilayer perceptron (MLP) were set as benchmarks. The results show that: (1) root mean absolute error is reduced by approximately 19% up to 27%, and the Nash–Sutcliffe coefficient of efficiency is improved by 14% up to 34% for 6-h-ahead runoff prediction for BiLSTM-Seq2Seq compared with LSTM-Seq2Seq and MLP; (2) the BiLSTM-Seq2Seq model has good performance not only for one-peak flood events but also for multi-peak flood events; and (3) BiLSTM-Seq2Seq can mitigate the time-delay problem, and the time lag is shortened by 39% up to 69% in comparison to LSTM-Seq2Seq and MLP. These results suggest that the time-delay problem can be mitigated by BiLSTM-Seq2Seq, which has excellent potential in time series predictions in the hydrological field.
- Published
- 2022
- Full Text
- View/download PDF
13. Learning Higher Representations from Bioacoustics: A Sequence-to-Sequence Deep Learning Approach for Bird Sound Classification
- Author
-
Qiao, Yu, Qian, Kun, Zhao, Ziping, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Yang, Haiqin, editor, Pasupa, Kitsuchart, editor, Leung, Andrew Chi-Sing, editor, Kwok, James T., editor, Chan, Jonathan H., editor, and King, Irwin, editor
- Published
- 2020
- Full Text
- View/download PDF
14. Sequence-to-Sequence Emotional Voice Conversion With Strength Control
- Author
-
Heejin Choi and Minsoo Hahn
- Subjects
Voice conversion ,emotional voice conversion ,emotion strength ,sequence-to-sequence learning ,controllable emotional voice conversion ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This paper proposes an improved emotional voice conversion (EVC) method with emotional strength and duration controllability. EVC methods without duration mapping generate emotional speech with a duration identical to that of the neutral input speech. In reality, even the same sentences would have different speeds and rhythms depending on the emotions. To solve this, the proposed method adopts a sequence-to-sequence network with an attention module that enables the network to learn which parts of the neutral input sequence should be attended to for each part of the emotional output sequence. Besides, to capture the multi-attribute aspects of emotional variations, an emotion encoder is designed to transform acoustic features into emotion embedding vectors. By aggregating the emotion embedding vectors for each emotion, a representative vector for the target emotion is obtained and weighted to reflect emotion strength. By introducing a speaker encoder, the proposed method can preserve speaker identity even after the emotion conversion. Objective and subjective evaluation results confirm that the proposed method is superior to previous works. In particular, successful results are achieved in emotion strength control.
- Published
- 2021
- Full Text
- View/download PDF
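In result 14, an emotion encoder maps utterances to emotion embedding vectors, the vectors of the target emotion are aggregated into a representative vector, and that vector is scaled to control emotion strength before conditioning the seq2seq decoder. The short numpy sketch below illustrates only this aggregation-and-scaling step on dummy embeddings; the encoder, decoder, and exact conditioning mechanism are assumed away.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy emotion embeddings produced by a (not shown) emotion encoder:
# one 128-d vector per training utterance of the target emotion.
utterance_embeddings = rng.standard_normal((50, 128))

def representative_vector(embeddings: np.ndarray) -> np.ndarray:
    """Aggregate per-utterance embeddings into one representative emotion vector."""
    return embeddings.mean(axis=0)

def strength_conditioned(rep: np.ndarray, strength: float) -> np.ndarray:
    """Scale the representative vector to reflect the desired emotion strength
    (0.0 = neutral, 1.0 = full strength); the scaled vector would condition the decoder."""
    return strength * rep

rep = representative_vector(utterance_embeddings)
mild = strength_conditioned(rep, 0.3)        # weakly expressed target emotion
strong = strength_conditioned(rep, 1.0)      # fully expressed target emotion
print(mild.shape, np.linalg.norm(mild) < np.linalg.norm(strong))
```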
15. PIEED: Position information enhanced encoder-decoder framework for scene text recognition.
- Author
-
Ma, Xitao, He, Kai, Zhang, Dazhuang, and Li, Dashuang
- Subjects
TEXT recognition ,LONG-term memory ,SHORT-term memory ,DEEP learning - Abstract
Scene text recognition (STR) technology has developed rapidly with the rise of deep learning. Recently, the encoder-decoder framework based on the attention mechanism has been widely used in STR for better recognition. However, the commonly used Long Short Term Memory (LSTM) network in the framework tends to ignore certain position or visual information. To address this problem, we propose a Position Information Enhanced Encoder-Decoder (PIEED) framework for scene text recognition, in which an additional position information enhancement (PIE) module is proposed to compensate for the shortcomings of the LSTM network. Our module tends to retain more position information in the feature sequence, as well as the context information extracted by the LSTM network, which helps improve the recognition accuracy of text without context. Besides that, our fusion decoder can make full use of the output of the proposed module and the LSTM network, so as to independently learn and preserve useful features, which helps improve recognition accuracy without increasing the number of parameters. Our overall framework can be trained end-to-end using only images and ground truth. Extensive experiments on several benchmark datasets demonstrate that our proposed framework surpasses state-of-the-art ones on both regular and irregular text recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
16. Lemmatization for Ancient Languages: Rules or Neural Networks?
- Author
-
Dereza, Oksana, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Ustalov, Dmitry, editor, Filchenkov, Andrey, editor, Pivovarova, Lidia, editor, and Žižka, Jan, editor
- Published
- 2018
- Full Text
- View/download PDF
17. A Hierarchical Conditional Attention-Based Neural Networks for Paraphrase Generation
- Author
-
Nguyen-Ngoc, Khuong, Le, Anh-Cuong, Nguyen, Viet-Ha, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Kaenampornpan, Manasawee, editor, Malaka, Rainer, editor, Nguyen, Duc Dung, editor, and Schwind, Nicolas, editor
- Published
- 2018
- Full Text
- View/download PDF
18. Generating Natural Answers on Knowledge Bases and Text by Sequence-to-Sequence Learning
- Author
-
Ye, Zhihao, Cai, Ruichu, Liao, Zhaohui, Hao, Zhifeng, Li, Jinfen, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Kůrková, Věra, editor, Manolopoulos, Yannis, editor, Hammer, Barbara, editor, Iliadis, Lazaros, editor, and Maglogiannis, Ilias, editor
- Published
- 2018
- Full Text
- View/download PDF
19. Retrospective Encoders for Video Summarization
- Author
-
Zhang, Ke, Grauman, Kristen, Sha, Fei, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Ferrari, Vittorio, editor, Hebert, Martial, editor, Sminchisescu, Cristian, editor, and Weiss, Yair, editor
- Published
- 2018
- Full Text
- View/download PDF
20. Markup: A Web-Based Annotation Tool Powered by Active Learning
- Author
-
Samuel Dobbie, Huw Strafford, W. Owen Pickrell, Beata Fonferko-Shadrach, Carys Jones, Ashley Akbari, Simon Thompson, and Arron Lacey
- Subjects
natural language processing ,active learning ,unstructured text ,annotation ,sequence-to-sequence learning ,Medicine ,Public aspects of medicine ,RA1-1270 ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
Across various domains, such as health and social care, law, news, and social media, there are increasing quantities of unstructured texts being produced. These potential data sources often contain rich information that could be used for domain-specific and research purposes. However, the unstructured nature of free-text data poses a significant challenge for its utilisation due to the necessity of substantial manual intervention from domain-experts to label embedded information. Annotation tools can assist with this process by providing functionality that enables the accurate capture and transformation of unstructured texts into structured annotations, which can be used individually, or as part of larger Natural Language Processing (NLP) pipelines. We present Markup (https://www.getmarkup.com/), an open-source, web-based annotation tool that is undergoing continued development for use across all domains. Markup incorporates NLP and Active Learning (AL) technologies to enable rapid and accurate annotation using custom user configurations, predictive annotation suggestions, and automated mapping suggestions to both domain-specific ontologies, such as the Unified Medical Language System (UMLS), and custom, user-defined ontologies. We demonstrate a real-world use case of how Markup has been used in a healthcare setting to annotate structured information from unstructured clinic letters, where captured annotations were used to build and test NLP applications.
- Published
- 2021
- Full Text
- View/download PDF
21. Machine translation using deep learning for universal networking language based on their structure.
- Author
-
Ali, Md. Nawab Yousuf, Rahman, Md. Lizur, Chaki, Jyotismita, Dey, Nilanjan, and Santosh, K. C.
- Abstract
This paper presents a deep learning-based machine translation (MT) system that translates a sentence of a subject-object-verb (SOV) structured language into a subject-verb-object (SVO) structured language. The system uses recurrent neural networks (RNNs) with encodings: the encoder-embedded RNN generates a set of numbers from the input sentence, and a second RNN generates the output from these sets of numbers. Three popular datasets of SOV structured languages, i.e., the EMILLE corpus, the Prothom-Alo corpus and the Punjabi Monolingual Text Corpus ILCI-II, are used in two different case studies for validation. In case study 1, for the EMILLE corpus and the Prothom-Alo corpus, we achieved Bilingual Evaluation Understudy (BLEU), NIST and TER scores of 0.742, 4.11 and 0.18, respectively. The second case study, on the Punjabi Monolingual Text Corpus ILCI-II dataset, achieved a BLEU score of 0.75. Our results are comparable with state-of-the-art results. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
22. A Sequence-to-Sequence Model With Attention and Monotonicity Loss for Tool Wear Monitoring and Prediction.
- Author
-
Wang, Gang and Zhang, Feng
- Subjects
DEEP learning, CUTTING tools, MACHINE learning, NUMERICAL control of machine tools, FEATURE extraction - Abstract
Recently, deep learning has been successfully applied in tool wear monitoring systems. However, since the tool wear accumulates in the cutting process, the state of the cutting tool shows a degradation trend, which has not been fully exploited by the current deep learning models. In this article, an end-to-end deep learning model, named sequence-to-sequence model with attention and monotonicity loss (SMAML) is proposed to simultaneously monitor and predict the tool wear. In the proposed SMAML, the encoder is utilized to extract features from signals collected by different sensors using the group convolution, while the decoder is designed to produce sequential outputs, including the results of tool wear monitoring and multistep-ahead prediction. Besides, a monotonicity loss function is proposed to capture the degeneration characters in these sequential outputs. The experiments are conducted on real-world datasets, which were collected from a high-speed CNC machine. The experimental results demonstrate the effectiveness of the proposed model which outperforms other conventional machine learning and deep learning models. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
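The monotonicity loss in result 22 encodes the physical prior that tool wear only accumulates, so the predicted wear sequence should be non-decreasing. One plausible PyTorch formulation of such a penalty is sketched below; the exact loss used in the paper may differ, and the weighting factor is an assumption.

```python
import torch

def monotonicity_loss(pred_seq: torch.Tensor) -> torch.Tensor:
    """Penalize decreases between consecutive predicted wear values.

    pred_seq: (batch, steps) sequence of predicted tool wear.
    Returns the mean of the negative first differences, clamped at zero,
    so the penalty is 0 for a non-decreasing sequence.
    """
    diffs = pred_seq[:, 1:] - pred_seq[:, :-1]
    return torch.relu(-diffs).mean()

pred = torch.tensor([[0.10, 0.12, 0.11, 0.15],     # small dip at step 3 is penalized
                     [0.20, 0.22, 0.25, 0.30]])    # monotone row contributes zero
mse = torch.nn.functional.mse_loss(pred, torch.full_like(pred, 0.2))
total = mse + 0.1 * monotonicity_loss(pred)        # combined objective (weight is assumed)
print(total.item())
```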
23. SLOAN: Scale-Adaptive Orientation Attention Network for Scene Text Recognition.
- Author
-
Dai, Pengwen, Zhang, Hua, and Cao, Xiaochun
- Subjects
TEXT recognition, ARTIFICIAL neural networks - Abstract
Scene text recognition, the final step of the scene text reading system, has made impressive progress based on deep neural networks. However, existing recognition methods are devoted to dealing with geometrically regular or irregular scene text and remain limited for semantically arbitrary-orientation scene text. Meanwhile, previous scene text recognizers usually learn single-scale feature representations for various-scale characters, which cannot model effective contexts for different characters. In this paper, we propose a novel scale-adaptive orientation attention network for arbitrary-orientation scene text recognition, which consists of a dynamic log-polar transformer and a sequence recognition network. Specifically, the dynamic log-polar transformer learns the log-polar origin to adaptively convert the arbitrary rotations and scales of scene texts into shifts in the log-polar space, which helps generate rotation-aware and scale-aware visual representations. Next, the sequence recognition network is an encoder-decoder model, which incorporates a novel character-level receptive field attention module to encode more valid contexts for various-scale characters. The whole architecture can be trained in an end-to-end manner, requiring only the word image and its corresponding ground-truth text. Extensive experiments on several public datasets have demonstrated the effectiveness and superiority of our proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
24. Linear‐Time Korean Morphological Analysis Using an Action‐based Local Monotonic Attention Mechanism
- Author
-
Hyunsun Hwang and Changki Lee
- Subjects
deep learning ,korean morphological analysis ,local attention mechanism ,natural language processing ,sequence‐to‐sequence learning ,Telecommunication ,TK5101-6720 ,Electronics ,TK7800-8360 - Abstract
For Korean language processing, morphological analysis is a critical component that requires extensive work. This morphological analysis can be conducted in an end-to-end manner without requiring a complicated feature design using a sequence-to-sequence model. However, the sequence-to-sequence model has a time complexity of O(n²) for an input length n when using the attention mechanism technique for high performance. In this study, we propose a linear-time Korean morphological analysis model using a local monotonic attention mechanism relying on monotonic alignment, which is a characteristic of Korean morphological analysis. The proposed model shows an extreme improvement in a single-threaded environment and a high morphometric F1-measure even for a hard attention model with the elimination of the attention mechanism formula.
- Published
- 2019
- Full Text
- View/download PDF
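Results 24 and 27 avoid the O(n²) cost of full soft attention by attending only to a small window around an alignment position that advances monotonically through the input, which is what makes the analysis linear-time. A simplified numpy sketch of one such windowed attention step is given below; the window size, dot-product scoring, and position-update rule are illustrative assumptions, not the paper's action-based mechanism.

```python
import numpy as np

def local_monotonic_attention(enc: np.ndarray, query: np.ndarray,
                              position: int, window: int = 3):
    """Attend only to encoder states in [position, position + window).

    enc:      (n, d) encoder hidden states.
    query:    (d,)  current decoder state.
    position: current (monotonically advancing) alignment position.
    Returns the context vector and the next alignment position.
    """
    end = min(position + window, len(enc))
    local = enc[position:end]                          # (w, d) local slice only
    scores = local @ query                             # dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ local                          # (d,) context vector
    next_position = position + int(weights.argmax())   # never moves backwards
    return context, next_position

rng = np.random.default_rng(0)
enc = rng.standard_normal((20, 8))                     # 20 input steps, 8-d states
pos = 0
for step in range(5):                                  # a few decoding steps
    ctx, pos = local_monotonic_attention(enc, rng.standard_normal(8), pos)
print("final alignment position:", pos)
```

Because each decoding step looks at a constant-size window instead of the whole input, the total attention cost grows linearly with the sequence length.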
25. End-to-End Dialogue with Sentiment Analysis Features
- Author
-
Rinaldi, Alex, Oseguera, Omar, Tuazon, Joann, Cruz, Albert C., Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, and Stephanidis, Constantine, editor
- Published
- 2017
- Full Text
- View/download PDF
26. Context Aware Energy Disaggregation Using Adaptive Bidirectional LSTM Models.
- Author
-
Kaselimi, Maria, Doulamis, Nikolaos, Voulodimos, Athanasios, Protopapadakis, Eftychios, and Doulamis, Anastasios
- Abstract
Energy disaggregation, or Non-Intrusive Load Monitoring (NILM), describes various processes aiming to identify the individual contribution of appliances, given the aggregate power signal. In this paper, a non-causal adaptive context-aware bidirectional deep learning model for energy disaggregation is introduced. The proposed model, CoBiLSTM, harnesses the representational power of deep recurrent Long Short-Term Memory (LSTM) neural networks, while fitting two basic properties of the NILM problem that state-of-the-art methods do not appropriately account for: non-causality and adaptivity to contextual factors (e.g., seasonality). A Bayesian-optimized framework is introduced to select the best configuration of the proposed regression model, driven by a self-training adaptive mechanism. Furthermore, the proposed model is structured in a modular way to address multi-dimensionality issues that arise when the number of appliances increases. Experimental results indicate the proposed method’s superiority compared to the current state of the art. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
27. Linear‐Time Korean Morphological Analysis Using an Action‐based Local Monotonic Attention Mechanism.
- Author
-
Hwang, Hyunsun and Lee, Changki
- Subjects
KOREAN language ,NATURAL language processing ,MACHINE learning - Abstract
For Korean language processing, morphological analysis is a critical component that requires extensive work. This morphological analysis can be conducted in an end-to-end manner without requiring a complicated feature design using a sequence-to-sequence model. However, the sequence-to-sequence model has a time complexity of O(n²) for an input length n when using the attention mechanism technique for high performance. In this study, we propose a linear-time Korean morphological analysis model using a local monotonic attention mechanism relying on monotonic alignment, which is a characteristic of Korean morphological analysis. The proposed model shows an extreme improvement in a single-threaded environment and a high morphometric F1-measure even for a hard attention model with the elimination of the attention mechanism formula. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
28. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.
- Author
-
Shi, Baoguang, Yang, Mingkun, Wang, Xinggang, Lyu, Pengyuan, Yao, Cong, and Bai, Xiang
- Subjects
TEXT recognition, ARTIFICIAL neural networks, RECURRENT neural networks - Abstract
A challenging aspect of scene text recognition is to handle text with distortions or irregular layout. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their groundtruth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, for its ability to enhance the detector. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
29. Assessment of deep recurrent neural network-based strategies for short-term building energy predictions.
- Author
-
Fan, Cheng, Wang, Jiayuan, Gang, Wenjie, and Li, Shenghan
- Subjects
CONSTRUCTION contracts, STRUCTURAL engineering, CONSTRUCTION industry, CONSTRUCTION laws, CONSTRUCTION materials - Abstract
Highlights: • Various strategies have been proposed for short-term building energy predictions. • Three inference approaches have been exploited for multi-step ahead predictions. • Advanced techniques have been utilized for the development of recurrent models. • Model performance is evaluated based on prediction accuracy and computation loads. • The results can provide valuable insights for developing deep recurrent models.
Abstract: Accurate and reliable building energy predictions can bring significant benefits for energy conservation. With the development in smart buildings, massive amounts of building operational data are being collected and available for analysis. It is desirable to develop big data-driven methods to fully realize the potential of building operational data in energy predictions. This paper investigates the usefulness of advanced recurrent neural network-based strategies for building energy predictions. Each strategy presents unique characteristics at two levels. At the high level, three inference approaches are used for generating short-term predictions, including the recursive approach, the direct approach and the multi-input and multi-output (MIMO) approach. At the low level, state-of-the-art techniques are utilized for recurrent model development, such as the use of one-dimensional convolutional operations, bidirectional operations, and different types of recurrent units. The performance of different strategies has been assessed from different perspectives based on real building operational data. The research results help to bridge the knowledge gap between building professionals and advanced big data analytics. The insights obtained can be used as guidelines and references for developing advanced deep recurrent models for short-term building energy predictions. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
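Result 29's three inference approaches for multi-step-ahead prediction can be summarized as: recursive (one one-step model whose predictions are fed back as inputs), direct (a separate model per horizon step), and MIMO (one model emits the whole horizon at once). The toy sketch below contrasts the three with linear regressors on a synthetic series; it only illustrates the strategies, not the paper's deep recurrent models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 30, 500)) + 0.1 * rng.standard_normal(500)
LAGS, H = 12, 6                                    # input window and forecast horizon

X = np.array([series[i:i + LAGS] for i in range(len(series) - LAGS - H)])
Y = np.array([series[i + LAGS:i + LAGS + H] for i in range(len(series) - LAGS - H)])

# Recursive: one one-step model, predictions fed back as inputs.
one_step = LinearRegression().fit(X, Y[:, 0])
window = list(series[-LAGS:])
recursive = []
for _ in range(H):
    nxt = one_step.predict([window[-LAGS:]])[0]
    recursive.append(nxt)
    window.append(nxt)

# Direct: a separate model for each horizon step h.
direct = [LinearRegression().fit(X, Y[:, h]).predict([series[-LAGS:]])[0] for h in range(H)]

# MIMO: one multi-output model predicts all H steps at once.
mimo = LinearRegression().fit(X, Y).predict([series[-LAGS:]])[0]

print(np.round(recursive, 3), np.round(direct, 3), np.round(mimo, 3))
```

The recursive strategy is cheapest but lets errors accumulate, the direct strategy trades extra models for independence between steps, and MIMO predicts the whole horizon jointly, which is the trade-off the paper evaluates with recurrent models.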
30. Dual attention-based multi-step ahead prediction enhancement for monitoring systems in industrial processes.
- Author
-
An, Nahyeon, Hong, Seokyoung, Kim, Yurim, Cho, Hyungtae, Lim, Jongkoo, Moon, Il, and Kim, Junghwan
- Subjects
MANUFACTURING processes ,INDUSTRIALISM ,PREDICTIVE control systems ,PREDICTION models - Abstract
In industrial processes, the ability to predict future steps is essential as it offers long-term insights, benefiting strategic decision-making. However, traditional sequence-to-sequence models designed to predict dynamic behaviors suffer from accumulating errors during recurrent predictions, which use previous outputs as inputs for the next time step. In this article, we propose a dual attention-based encoder–decoder framework, specifically designed to enhance multi-step ahead predictions in industrial processes. The dual attention model strategically minimizes the error accumulation of the output sequence by leveraging a temporal attention mechanism, which focuses on relevant time-steps in the input sequence, and a supervised attention mechanism that assigns different weights to output sequence errors during training. The supervised attention method, in particular, provides a significant improvement by focusing on minimizing the error of earlier steps during backpropagation using predefined attention weights, resulting in enhanced overall multi-step prediction performance. Experiments on real-world industrial datasets demonstrate that our approach outperforms baseline models, specifically simple sequence-to-sequence and single attention-based sequence-to-sequence models. In fact, our dual attention framework consistently surpasses single attention models, currently regarded as state-of-the-art, at all prediction stages. The suggested approach has potential applications in the field of process monitoring and model predictive control.
Highlights: • Dual attention was introduced to enhance multi-step ahead prediction. • Hyperparameters were investigated to optimize supervised attention. • Supervised attention could effectively reduce cumulative errors in the output sequence. • Dual attention-based predictions outperformed those from other established methods. • The longer the output sequence, the more effective supervised attention was. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
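The supervised attention in result 30 assigns predefined, unequal weights to the errors of different output steps during training, emphasizing the early steps whose errors would otherwise propagate through the recurrent roll-out. A minimal PyTorch version of that kind of weighted sequence loss is sketched below; the geometric weight profile is an assumption, not the paper's schedule.

```python
import torch

def weighted_sequence_loss(pred: torch.Tensor, target: torch.Tensor,
                           decay: float = 0.8) -> torch.Tensor:
    """Per-step weighted MSE over an output sequence of shape (batch, horizon).

    Earlier steps get larger predefined weights (decay**h), so training focuses
    on reducing the errors that would otherwise accumulate in later steps.
    """
    horizon = pred.size(1)
    weights = decay ** torch.arange(horizon, dtype=pred.dtype)   # e.g. 1.0, 0.8, 0.64, ...
    weights = weights / weights.sum()                            # normalize to sum to 1
    step_err = ((pred - target) ** 2).mean(dim=0)                # (horizon,) per-step MSE
    return (weights * step_err).sum()

pred = torch.randn(16, 6, requires_grad=True)    # dummy 6-step-ahead predictions
target = torch.randn(16, 6)
loss = weighted_sequence_loss(pred, target)
loss.backward()
print(loss.item())
```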
31. Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions
- Author
-
Tatsuro Yamada, Shingo Murata, Hiroaki Arie, and Tetsuya Ogata
- Subjects
symbol grounding ,neural network ,human–robot interaction ,logic words ,language understanding ,sequence-to-sequence learning ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with the words (nouns, adjectives, and verbs) that directly refer to real-world matters. In addition to these words, the current study deals with logic words, such as “not,” “and,” and “or” simultaneously. These words are not directly referring to the real world, but are logical operators that contribute to the construction of meaning in sentences. In human–robot communication, these words may be used often. The current study builds a recurrent neural network model with long short-term memory units and trains it to learn to translate sentences including logic words into robot actions. We investigate what kind of compositional representations, which mediate sentences and robot actions, emerge as the network's internal states via the learning process. Analysis after learning shows that referential words are merged with visual information and the robot's own current state, and the logical words are represented by the model in accordance with their functions as logical operators. Words such as “true,” “false,” and “not” work as non-linear transformations to encode orthogonal phrases into the same area in a memory cell state space. The word “and,” which required a robot to lift up both its hands, worked as if it was a universal quantifier. The word “or,” which required action generation that looked apparently random, was represented as an unstable space of the network's dynamical system.
- Published
- 2017
- Full Text
- View/download PDF
32. Approximate Computing for Long Short Term Memory (LSTM) Neural Networks.
- Author
-
Sen, Sanchari and Raghunathan, Anand
- Subjects
MACHINE learning, DEEP learning, MICROPROCESSORS, COMPUTER storage capacity, REAL-time computing, APPLICATION software, DATA removal (Computer science) - Abstract
Long Short Term Memory (LSTM) networks are a class of recurrent neural networks that are widely used for machine learning tasks involving sequences, including machine translation, text generation, and speech recognition. Large-scale LSTMs, which are deployed in many real-world applications, are highly compute intensive. To address this challenge, we propose AxLSTM, an application of approximate computing to improve the execution efficiency of LSTMs. An LSTM is composed of cells, each of which contains a cell state along with multiple gating units that control the addition and removal of information from the state. The LSTM execution proceeds in timesteps, with a new symbol of the input sequence processed at each timestep. AxLSTM consists of two techniques: Dynamic Timestep Skipping (DTS) and Dynamic State Reduction (DSR). DTS identifies, at runtime, input symbols that are likely to have little or no impact on the cell state and skips evaluating the corresponding timesteps. In contrast, DSR reduces the size of the cell state in accordance with the complexity of the input sequence, leading to a reduced number of computations per timestep. We describe how AxLSTM can be applied to the most common application of LSTMs, viz., sequence-to-sequence learning. We implement AxLSTM within the TensorFlow deep learning framework and evaluate it on 3 state-of-the-art sequence-to-sequence models. On a 2.7 GHz Intel Xeon server with 128 GB memory and 32 processor cores, AxLSTM achieves 1.08×–1.31× speedups with minimal loss in quality, and 1.12×–1.37× speedups when moderate reductions in quality are acceptable. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
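Result 32's Dynamic Timestep Skipping decides at runtime whether an input symbol is likely to affect the cell state enough to be worth processing and skips the LSTM update otherwise. The sketch below illustrates the control flow of such a skip test, using the input's deviation from the last processed input as a stand-in criterion; the actual AxLSTM criterion and thresholds are not reproduced here.

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=16, hidden_size=32)

def run_with_skipping(inputs: torch.Tensor, threshold: float = 0.5):
    """Process a sequence (T, B, 16), skipping timesteps whose input barely
    differs from the previous processed input (a stand-in skip criterion)."""
    h = torch.zeros(inputs.size(1), 32)
    c = torch.zeros(inputs.size(1), 32)
    last_processed = torch.zeros(inputs.size(1), 16)
    processed = 0
    for x_t in inputs:                                   # one timestep at a time
        novelty = (x_t - last_processed).abs().mean()    # cheap per-step proxy score
        if novelty < threshold:
            continue                                     # skip: state carried over unchanged
        h, c = cell(x_t, (h, c))
        last_processed = x_t
        processed += 1
    return h, processed

seq = torch.randn(50, 4, 16)
seq[10:20] = seq[9]                                      # a run of near-duplicate symbols
_, n = run_with_skipping(seq)
print(f"processed {n} of 50 timesteps")                  # the duplicate run is skipped
```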
33. Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks.
- Author
-
Plappert, Matthias, Mandery, Christian, and Asfour, Tamim
- Subjects
ARTIFICIAL neural networks, NATURAL languages, SEMANTICS, MOTION, ALGORITHMS - Abstract
Abstract Linking human whole-body motion and natural language is of great interest for the generation of semantic representations of observed human behaviors as well as for the generation of robot behaviors based on natural language input. While there has been a large body of research in this area, most approaches that exist today require a symbolic representation of motions (e.g. in the form of motion primitives), which have to be defined a-priori or require complex segmentation algorithms. In contrast, recent advances in the field of neural networks and especially deep learning have demonstrated that sub-symbolic representations that can be learned end-to-end usually outperform more traditional approaches, for applications such as machine translation. In this paper we propose a generative model that learns a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks (RNNs) and sequence-to-sequence learning. Our approach does not require any segmentation or manual feature engineering and learns a distributed representation, which is shared for all motions and descriptions. We evaluate our approach on 2,846 human whole-body motions and 6,187 natural language descriptions thereof from the KIT Motion-Language Dataset. Our results clearly demonstrate the effectiveness of the proposed model: We show that our model generates a wide variety of realistic motions only from descriptions thereof in form of a single sentence. Conversely, our model is also capable of generating correct and detailed natural language descriptions from human motions.
Highlights: • We present a novel method to learn a bidirectional mapping between human motion and natural language. • Our model is capable of accurately describing a wide range of human motion in complete sentences. • We further show that our model can generate versatile and rich motions from natural language descriptions. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
34. Product review summarization through question retrieval and diversification.
- Author
-
Liu, Mengwen, Fang, Yi, Choulos, Alexander, Park, Dae, and Hu, Xiaohua
- Subjects
CONSUMERS, CONSUMER behavior, SUBMODULAR functions, NEURAL circuitry, PORTFOLIO diversification - Abstract
Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address and they may further read the review only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question selection and question diversification. For question selection phase, we first employ probabilistic retrieval models to locate candidate questions that are relevant to a given review. A Recurrent Neural Network Encoder-Decoder is utilized to measure the 'answerability' of questions to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which results in an efficient greedy algorithm of submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
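The re-ranking stage in result 34 maximizes a monotone submodular set function that rewards relevance and diversity, which is why a greedy algorithm with an approximation guarantee suffices. The toy Python sketch below shows such a greedy selection with a coverage-style objective; the candidate questions, scores, and topic tags are invented placeholders, not the paper's retrieval or answerability models.

```python
from typing import List, Set

# Toy candidates: (question, relevance-to-review score, topic tag).
CANDIDATES = [
    ("Does the battery last a full day?",   0.9, "battery"),
    ("How long does the battery last?",     0.8, "battery"),
    ("Is the screen readable in sunlight?", 0.7, "screen"),
    ("Does it feel heavy in the hand?",     0.6, "weight"),
    ("Is the display bright enough?",       0.5, "screen"),
]

def marginal_gain(selected: Set[int], i: int) -> float:
    """Relevance plus a diversity bonus for covering a topic not yet selected.
    Coverage-style objectives like this are monotone and submodular."""
    relevance = CANDIDATES[i][1]
    new_topic = CANDIDATES[i][2] not in {CANDIDATES[j][2] for j in selected}
    return relevance + (1.0 if new_topic else 0.0)

def greedy_select(k: int) -> List[str]:
    selected: Set[int] = set()
    while len(selected) < k:
        best = max((i for i in range(len(CANDIDATES)) if i not in selected),
                   key=lambda i: marginal_gain(selected, i))
        selected.add(best)
    return [CANDIDATES[i][0] for i in selected]

print(greedy_select(3))   # picks high-relevance questions spread across distinct topics
```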
35. Recurrent neural network-based semantic variational autoencoder for Sequence-to-sequence learning.
- Author
-
Jang, Myeongjun, Seo, Seungwan, and Kang, Pilsung
- Subjects
RECURRENT neural networks, SEMANTICS, NATURAL language processing, MACHINE translating, STANDARD deviations - Abstract
Abstract Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq models have trouble preserving global latent information from a long sequence of words. Variational autoencoder (VAE) alleviates this problem by learning a continuous semantic space of the input sentence. However, it does not solve the problem completely. In this paper, we propose a new recurrent neural network (RNN)-based Seq2seq model, RNN semantic variational autoencoder (RNN–SVAE), to better capture the global latent information of a sequence of words. To suitably reflect the meanings of words in a sentence regardless of their position within the sentence, we utilized two approaches: (1) constructing a document information vector based on the attention information between the final state of the encoder and every prior hidden state, and (2) extracting the semantic vector based on the self-attention mechanism. Then, the mean and standard deviation of the continuous semantic space are learned by using this vector to take advantage of the variational method. By using the document information vector and the self-attention mechanism to find the semantic space of the sentence, it becomes possible to better capture the global latent feature of the sentence. Experimental results of three natural language tasks (i.e., language modeling, missing word imputation, paraphrase identification) confirm that the proposed RNN–SVAE yields higher performance than two benchmark models. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
36. Generating Mobility Trajectories with Retained Data Utility
- Author
-
Cao, Chu, Li, Mo, Cao, Chu, and Li, Mo
- Abstract
This paper presents TrajGen, an approach to generate artificial datasets of mobility trajectories based on an original trajectory dataset while retaining the utility of the original data in supporting various mobility applications. The generated mobility data is disentangled from the original data and can be shared without compromising data privacy. TrajGen leverages Generative Adversarial Nets combined with a Seq2Seq model to generate the spatial-temporal trajectory data. TrajGen is implemented and evaluated with real-world taxi trajectory data in Singapore. The extensive experimental results demonstrate that TrajGen is able to generate artificial trajectory data that retain key statistical characteristics of the original data. Two case studies, i.e., road map updating and Origin-Destination demand estimation, are performed with the generated artificial data, and the results show that the artificial trajectories generated by TrajGen retain the utility of original data in supporting the two applications. © 2021 ACM.
- Published
- 2021
37. Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions
- Author
-
Tetsuya Ogata, Hiroaki Arie, Shingo Murata, and Tatsuro Yamada
- Subjects
neural network, Computer science, Principle of compositionality, sequence-to-sequence learning, Biomedical Engineering, Meaning (non-linguistic), Neurosciences. Biological psychiatry. Neuropsychiatry (RC321-571), Artificial Intelligence, Noun, human–robot interaction, symbol grounding, Structure (mathematical logic), language understanding, Expression (computer science), logic words, Variety (linguistics), Feature learning, Natural language processing, Neuroscience - Abstract
An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with the words (nouns, adjectives, and verbs) that directly refer to real-world matters. In addition to these words, the current study deals with logic words, such as “not,” “and,” and “or” simultaneously. These words are not directly referring to the real world, but are logical operators that contribute to the construction of meaning in sentences. In human–robot communication, these words may be used often. The current study builds a recurrent neural network model with long short-term memory units and trains it to learn to translate sentences including logic words into robot actions. We investigate what kind of compositional representations, which mediate sentences and robot actions, emerge as the network's internal states via the learning process. Analysis after learning shows that referential words are merged with visual information and the robot's own current state, and the logical words are represented by the model in accordance with their functions as logical operators. Words such as “true,” “false,” and “not” work as non-linear transformations to encode orthogonal phrases into the same area in a memory cell state space. The word “and,” which required a robot to lift up both its hands, worked as if it was a universal quantifier. The word “or,” which required action generation that looked apparently random, was represented as an unstable space of the network's dynamical system.
- Published
- 2017
- Full Text
- View/download PDF
38. Markup: A Web-Based Annotation Tool Powered by Active Learning.
- Author
-
Dobbie S, Strafford H, Pickrell WO, Fonferko-Shadrach B, Jones C, Akbari A, Thompson S, and Lacey A
- Abstract
Across various domains, such as health and social care, law, news, and social media, there are increasing quantities of unstructured texts being produced. These potential data sources often contain rich information that could be used for domain-specific and research purposes. However, the unstructured nature of free-text data poses a significant challenge for its utilisation due to the necessity of substantial manual intervention from domain-experts to label embedded information. Annotation tools can assist with this process by providing functionality that enables the accurate capture and transformation of unstructured texts into structured annotations, which can be used individually, or as part of larger Natural Language Processing (NLP) pipelines. We present Markup (https://www.getmarkup.com/), an open-source, web-based annotation tool that is undergoing continued development for use across all domains. Markup incorporates NLP and Active Learning (AL) technologies to enable rapid and accurate annotation using custom user configurations, predictive annotation suggestions, and automated mapping suggestions to both domain-specific ontologies, such as the Unified Medical Language System (UMLS), and custom, user-defined ontologies. We demonstrate a real-world use case of how Markup has been used in a healthcare setting to annotate structured information from unstructured clinic letters, where captured annotations were used to build and test NLP applications. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2021 Dobbie, Strafford, Pickrell, Fonferko-Shadrach, Jones, Akbari, Thompson and Lacey.)
- Published
- 2021
- Full Text
- View/download PDF
39. Learning deep representation for trajectory clustering.
- Author
-
Yao, Di, Zhang, Chao, Zhu, Zhihua, Hu, Qin, Wang, Zheng, Huang, Jianhui, and Bi, Jingping
- Subjects
TRAJECTORY measurements, TRAJECTORY optimization, CLUSTER analysis (Statistics), MACHINE learning, PATTERN perception - Abstract
Abstract: Trajectory clustering, which aims at discovering groups of similar trajectories, has long been considered as a cornerstone task for revealing movement patterns as well as facilitating higher level applications such as location prediction and activity recognition. Although a plethora of trajectory clustering techniques have been proposed, they often rely on spatio-temporal similarity measures that are not space and time invariant. As a result, they cannot detect trajectory clusters where the within-cluster similarity occurs in different regions and time periods. In this paper, we revisit the trajectory clustering problem by learning quality low-dimensional representations of the trajectories. We first use a sliding window to extract a set of moving behaviour features that capture space- and time-invariant characteristics of the trajectories. With the feature extraction module, we transform each trajectory into a feature sequence to describe object movements and further employ a sequence-to-sequence auto-encoder to learn fixed-length deep representations. The learnt representations robustly encode the movement characteristics of the objects and thus lead to space- and time-invariant clusters. We evaluate the proposed method on both synthetic and real data and observe significant performance improvements over existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
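Result 39's pipeline is: slide a window over each trajectory to extract space- and time-invariant moving-behaviour features, encode the feature sequence into a fixed-length vector with a sequence-to-sequence auto-encoder, and cluster the vectors. The sketch below mirrors that flow on toy trajectories, but the LSTM encoder stands in untrained for the trained auto-encoder (the reconstruction training loop is omitted), and the features and sizes are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def moving_behaviour_features(traj: np.ndarray, window: int = 5) -> np.ndarray:
    """Turn an (n, 2) trajectory of x/y points into a sequence of window-level
    moving-behaviour features (mean speed, mean heading change): space/time invariant."""
    steps = np.diff(traj, axis=0)
    speed = np.linalg.norm(steps, axis=1)
    heading = np.arctan2(steps[:, 1], steps[:, 0])
    turn = np.abs(np.diff(heading, prepend=heading[0]))
    feats = np.stack([speed, turn], axis=1)
    n_win = len(feats) // window
    return feats[:n_win * window].reshape(n_win, window, 2).mean(axis=1)   # (n_win, 2)

# Two toy movement patterns: straight and fast vs. wandering and slow.
trajs = [np.cumsum(rng.normal([1.0, 0.0], 0.1, (60, 2)), axis=0) for _ in range(10)] + \
        [np.cumsum(rng.normal([0.0, 0.0], 0.4, (60, 2)), axis=0) for _ in range(10)]

# Stand-in for the encoder half of a trained sequence-to-sequence auto-encoder.
encoder = nn.LSTM(input_size=2, hidden_size=16, batch_first=True)

with torch.no_grad():
    reps = []
    for t in trajs:
        seq = torch.tensor(moving_behaviour_features(t), dtype=torch.float32).unsqueeze(0)
        _, (h, _) = encoder(seq)             # final hidden state = fixed-length representation
        reps.append(h.squeeze().numpy())

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(reps))
print(labels)                                # cluster assignments for the 20 toy trajectories
```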
40. Representation Learning of Logic Words by an RNN: From Word Sequences to Robot Actions.
- Author
-
Yamada T, Murata S, Arie H, and Ogata T
- Abstract
An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with the words (nouns, adjectives, and verbs) that directly refer to real-world matters. In addition to these words, the current study deals with logic words, such as "not," "and," and "or" simultaneously. These words are not directly referring to the real world, but are logical operators that contribute to the construction of meaning in sentences. In human-robot communication, these words may be used often. The current study builds a recurrent neural network model with long short-term memory units and trains it to learn to translate sentences including logic words into robot actions. We investigate what kind of compositional representations, which mediate sentences and robot actions, emerge as the network's internal states via the learning process. Analysis after learning shows that referential words are merged with visual information and the robot's own current state, and the logical words are represented by the model in accordance with their functions as logical operators. Words such as "true," "false," and "not" work as non-linear transformations to encode orthogonal phrases into the same area in a memory cell state space. The word "and," which required a robot to lift up both its hands, worked as if it was a universal quantifier. The word "or," which required action generation that looked apparently random, was represented as an unstable space of the network's dynamical system.
- Published
- 2017
- Full Text
- View/download PDF