Descriptor: "Zero-shot" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Zero-shot"' showing total 171 results

151. Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques

Author: Tu Anh Dinh, Danni Liu, Jan Niehues, Dept. of Advanced Computing Sciences, and RS: FSE DACS
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, I.2.7, few-shot, speech translation, Computer Science - Sound, machine translation, zero-shot, multi-task, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recently, end-to-end speech translation (ST) has gained significant attention as it avoids error propagation. However, the approach suffers from data scarcity. It heavily depends on direct ST data and is less efficient in making use of speech transcription and text translation data, which is often more easily available. In the related field of multilingual text translation, several techniques have been proposed for zero-shot translation. A main idea is to increase the similarity of semantically similar sentences in different languages. We investigate whether these ideas can be applied to speech translation, by building ST models trained on speech transcription and text translation data. We investigate the effects of data augmentation and an auxiliary loss function. The techniques were successfully applied to few-shot ST using limited ST data, with improvements of up to +12.9 BLEU points compared to direct end-to-end ST and +3.1 BLEU points compared to ST models fine-tuned from an ASR model. Comment: 6 pages, 5 figures, accepted to IEEE ICASSP 2022. arXiv admin note: text overlap with arXiv:2107.06010
Published: 2022
Full Text: View/download PDF

152. Zero-Shot Pipeline Detection for Sub-Bottom Profiler Data Based on Imaging Principles

Author: Jianhu Zhao, Jie Feng, Shaobo Li, and Gen Zheng
Subjects: pipeline detection, Pixel, Computer science, business.industry, Deep learning, Pipeline (computing), Science, Sample (graphics), Object detection, Standard deviation, sub-bottom profiler, zero-shot, YOLOv5s, Vertical direction, General Earth and Planetary Sciences, Computer vision, Artificial intelligence, Underwater, business
Abstract: With the increasing number of underwater pipeline investigation activities, the research on automatic pipeline detection is of great significance. At this stage, object detection algorithms based on Deep Learning (DL) are widely used due to their abilities to deal with various complex scenarios. However, DL algorithms require massive representative samples, which are difficult to obtain for pipeline detection with sub-bottom profiler (SBP) data. In this paper, a zero-shot pipeline detection method is proposed. First, an efficient sample synthesis method based on SBP imaging principles is proposed to generate samples. Then, the generated samples are used to train the YOLOv5s network and a pipeline detection strategy is developed to meet the real-time requirements. Finally, the trained model is tested with the measured data. In the experiment, the trained model achieved a mAP@0.5 of 0.962, and the mean deviation of the predicted pipeline position is 0.23 pixels with a standard deviation of 1.94 pixels in the horizontal direction and 0.34 pixels with a standard deviation of 2.69 pixels in the vertical direction. In addition, the object detection speed also met the real-time requirements. The above results show that the proposed method has the potential to completely replace the manual interpretation and has very high application value.
Published: 2021

153. Analyse morpho-syntaxique massivement multilingue à l'aide de ressources typologiques, d'annotations universelles et de plongements de mots multilingues

Author: Scholivet, Manon, Traitement Automatique du Langage Ecrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), Aix Marseille Université (AMU), Alexis Nasr, and Carlos Ramisch
Subjects: étiquetage morpho-syntaxique, zero-shot, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], multilingue, parsing, tagging, typological features, multilingual, [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], analyse syntaxique, traits typologiques
Abstract: Data annotation is a major problem in all machine learning tasks. In the field of NLP, this problem is multiplied by the number of existing languages. Many languages do not have any annotations, and are therefore excluded from NLP systems. One possible solution to integrate these languages into the systems is to leverage the languages that have many annotations, to learn information about these resource-rich languages, and to transfer this knowledge to the low-resource languages. To this end, it is possible to rely on initiatives such as Universal Dependencies, which propose a universal annotation scheme across languages. Multilingual word embeddings and typological features from resources such as the WALS are solutions that allow knowledge sharing between languages. These avenues are studied in this thesis, through the prediction of syntactic parses, morphology and parts of speech on 41 languages in total. We show that the impact of the WALS can be positive in a multilingual setting, but that its usefulness is not systematic in a zero-shot learning setting. Other language representations can be learned from the data, and perform better than the WALS, but have the downside of not working in a zero-shot setting. We also highlight the importance of the presence of a closely related language when training the models, as well as the problems associated with using a character-level model for isolated languages.
Published: 2021

154. Multilingual Zero-Shot and Few-Shot Causality Detection

Author: Reimann, Sebastian Michael
Abstract: Relations that hold between causes and their effects are fundamental for a wide range of different sectors. Automatically finding sentences that express such relations may for example be of great interest for the economy or political institutions. However, for many languages other than English, a lack of training resources for this task needs to be dealt with. In recent years, large, pretrained transformer-based model architectures have proven to be very effective for tasks involving cross-lingual transfer such as cross-lingual language inference, as well as multilingual named entity recognition, POS-tagging and dependency parsing, which may hint at similar potentials for causality detection. In this thesis, we define causality detection as a binary labelling problem and use cross-lingual transfer to alleviate data scarcity for German and Swedish by using three different classifiers that make either use of multilingual sentence embeddings obtained from a pretrained encoder or pretrained multilingual language models. The source languages in most of our experiments will be English, for Swedish we however also use a small German training set and a combination of English and German training data. We try out zero-shot transfer as well as making use of limited amounts of target language data either as a development set or as additional training data in a few-shot setting. In the latter scenario, we explore the impact of varying sizes of training data. Moreover, the problem of data scarcity in our situation also makes it necessary to work with data from different annotation projects. We also explore how much this would impact our result. For German as a target language, our results in a zero-shot scenario expectedly fall short in comparison with monolingual experiments, but F1-macro scores between 60 and 65 in cases where annotation did not differ drastically still signal that it was possible to transfer at least some knowledge. 
When introducing only small amounts of targ
Published: 2021

155. Inductive Zero-Shot Image Annotation via Embedding Graph

Author: Fei Yuan, Yuejun Li, Fangxin Wang, Jie Liu, Shuwu Zhang, and Guixuan Zhang
Subjects: Node2Vec, General Computer Science, business.industry, Computer science, General Engineering, Pattern recognition, image annotation, Annotation, zero-shot, Automatic image annotation, Contextualized word embeddings, graph convolutional network, Embedding, Graph (abstract data type), General Materials Science, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, business, lcsh:TK1-9971
Abstract: Conventional image annotation systems can only handle images whose labels are within the existing library, and cannot recognize novel labels. In order to learn new concepts, one has to gather a large number of labeled images and train the model from scratch. More importantly, collecting those labeled images can come at a high price. For these reasons, we put forward a zero-shot image annotation model to reduce the demand for images with novel labels. In this paper, we focus on the two big challenges of zero-shot image annotation: polysemous words and a strong bias in the generalized zero-shot setting. For the first problem, instead of training on large corpus datasets as previous methods do, we propose to adopt Node2Vec to obtain contextualized word embeddings, which can easily produce word vectors for polysemous words. For the second problem, we alleviate the strong bias in two ways: on one hand, we utilize a model based on a graph convolutional network (GCN) to involve target images in the training process; on the other hand, we put forward a novel semantic coherent (SC) loss to capture the semantic relations of the source and target labels. Extensive experiments on the NUSWIDE, COCO, IAPR TC-12, and Corel5k datasets show the superiority of the proposed model, and the annotation performance improves by 4%-6% compared with state-of-the-art methods.
Published: 2019

156. Cross-modal prototype learning for zero-shot handwritten character recognition.

Author: Ao, Xiang, Zhang, Xu-Yao, and Liu, Cheng-Lin
Subjects: *PATTERN recognition systems, *HANDWRITING recognition (Computer science), *ARTIFICIAL neural networks, *PROTOTYPES, *CHINESE characters
Abstract: • We extend the cross-modal prototype learning CMPL framework to three modalities. • CMPL achieves state-of-the-art results on online and offline zero-shot handwritten character recognition. • CMPL shows promising cross-domain generalization ability in zero-shot handwritten character recognition. Traditional methods of handwritten character recognition rely on extensive labeled data. However, humans can generalize to unseen handwritten characters by watching a few printed examples in textbooks. To simulate this ability, we propose a cross-modal prototype learning method (CMPL) to realize zero-shot recognition. For each character class, a prototype is generated by mapping the printed character into a deep neural network feature space. For unseen character class, its prototype can be directly produced from a printed character sample, therefore, not requiring any handwritten samples to realize class-incremental learning. Specifically, CMPL considers different modalities simultaneously - online handwritten trajectories, offline handwritten images, and auxiliary printed character images. The joint learning of the above modalities is achieved through sharing printed prototypes between online and offline data. In zero-shot inference, we feed CMPL the printed samples to obtain corresponding class prototypes, and then the unseen handwritten character can be recognized by the nearest prototype. Our experimental results demonstrate that CMPL outperforms the state-of-the-art methods in both online and offline zero-shot handwritten Chinese character recognition. Moreover, we also show the cross-domain generalization of CMPL from two perspectives: cross-language and modern-to-ancient handwritten character recognition, focusing on the transferability between different languages and different styles (i.e., modern and historical handwritings). [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

157. Zero-shot stance detection: Paradigms and challenges.

Author: Allaway E and McKeown K
Abstract: A major challenge in stance detection is the large (potentially infinite) and diverse set of stance topics. Collecting data for such a set is unrealistic due to both the expense of annotation and the continuous creation of new real-world topics (e.g., a new politician runs for office). Furthermore, stancetaking occurs in a wide range of languages and genres (e.g., Twitter, news articles). While zero-shot stance detection in English, where evaluation is on topics not seen during training, has received increasing attention, we argue that this attention should be expanded to multilingual and multi-genre settings. We discuss two paradigms for English zero-shot stance detection evaluation, as well as recent work in this area. We then discuss recent work on multilingual and multi-genre stance detection, which has focused primarily on non-zero-shot settings. We argue that this work should be expanded to multilingual and multi-genre zero-shot stance detection and propose best practices to systematize and stimulate future work in this direction. While domain adaptation techniques are well-suited for work in these settings, we argue that increased care should be taken to improve model explainability and to conduct robust evaluations, considering not only empirical generalization ability but also the understanding of complex language and inferences. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2023 Allaway and McKeown.)
Published: 2023
Full Text: View/download PDF

158. Multilingual Dependency Parsing of Uralic Languages : Parsing with zero-shot transfer and cross-lingual models using geographically proximate, genealogically related, and syntactically similar transfer languages

Author: Erenmalm, Elsa
Abstract: One way to improve dependency parsing scores for low-resource languages is to make use of existing resources from other closely related or otherwise similar languages. In this paper, we look at eleven Uralic target languages (Estonian, Finnish, Hungarian, Karelian, Livvi, Komi Zyrian, Komi Permyak, Moksha, Erzya, North Sámi, and Skolt Sámi) with treebanks of varying sizes and select transfer languages based on geographical, genealogical, and syntactic distances. We focus primarily on the performance of parser models trained on various combinations of geographically proximate and genealogically related transfer languages, in target-trained, zero-shot, and cross-lingual configurations. We find that models trained on combinations of geographically proximate and genealogically related transfer languages reach the highest LAS in most zero-shot models, while our highest-performing cross-lingual models were trained on genealogically related languages. We also find that cross-lingual models outperform zero-shot transfer models. We then select syntactically similar transfer languages for three target languages, and find a slight improvement in the case of Hungarian. We discuss the results and conclude with suggestions for possible future work.
Published: 2020

159. Cross-lingual Word Embeddings Beyond Zero-shot Machine Translation

Author: Shifei, Chen
Abstract: Zero-shot translation is a transfer learning setup that refers to the ability of neural machine translation to generalize translation information into unseen language pairs. It provides an appealing solution to the lack of available materials for low-resource languages by transferring knowledge from high-resource languages. So far, zero-shot translation mainly focuses on unseen language pairs whose individual component is still known to the system. There are fewer reports on transfer learning in machine translation being carried out on completely unknown test languages. This thesis pushes the boundary of zero-shot translation and explores the possibility of transferring learning from training languages to unknown test languages in a multilingual Neural Machine Translation (NMT) system. Based on the fact that zero-shot translation systems primarily learn language invariant features, we use cross-lingual word embeddings as the only knowledge source since they are good at capturing the semantic similarity of words from different languages in the same vector space. By conducting experiments on an encoder-decoder multilingual NMT model with an attention module, we have examined the relationship of language similarity and the transferability of unseen languages. We hypothesize that our multilingual NMT model with cross-lingual word embeddings should transfer reasonably even to completely unknown languages. However, we observe little transferability from the training languages to unseen test languages due to the transformed output vector space. Such minor transferability only happens between highly-related languages with a large number of shared vocabularies.
Published: 2020

160. That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Author: Żelasko, Piotr, Moro-Velázquez, Laureano, Hasegawa-Johnson, Mark, Scharenborg, O.E., and Dehak, Najim
Abstract: Only a handful of the world’s languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones are getting improved in a multilingual setting. To that end, we select a phonetically diverse set of languages, and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR is trained to recognize the International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, considers Javanese as a tone language. Notably, as little as 10 hours of the target language training data tremendously reduces ASR error rates. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages, an encouraging result for the low-resource speech community. Multimedia Computing
Published: 2020
Full Text: View/download PDF

161. Similar language translation

Author: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Ruiz Costa-Jussà, Marta, and Vergés Boncompte, Pere
Abstract: Machine translation is the task of automatically translating one language into another. This project aims to evaluate the performance of state-of-the-art Deep Learning systems on similar language translation. We will evaluate the translation between Catalan, Spanish, and Portuguese, which are Romance languages, and see how the Transformer architecture performs in this task. We will additionally make use of different techniques to improve the translation between these languages. First of all, we will make use of multilingual models that allow for transfer learning as well as zero-shot translations. Secondly, we will apply the backtranslation technique to make use of the monolingual data and improve the system's translations. Lastly, we will improve domain-specific translation using fine-tuning.
Published: 2020

162. Object Recognition with Zero-shot Learning

Author: Tezcan, Burak, Taşdemir, Şakir, and Selçuk Üniversitesi, Teknoloji Fakültesi, Bilgisayar Mühendisliği Bölümü
Subjects: Object recognition, Classification, Zero-shot
Abstract: Zero-shot learning aims to classify examples of unseen classes. It has gained popularity in applications where examples for each category are limited. The main issue is transferring information from seen classes to unseen classes by mapping image space to semantic space; this mapping is therefore at the core of the learning process. In this work, Google’s Word2vec was used for the semantic space. A total of 20 classes, 15 for training and 5 as zero-shot classes, were chosen from the Visual Genome dataset. We achieved 0.71 accuracy for top-5 classes.
Published: 2021
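
Illustrative note on record 162: the abstract describes classifying unseen classes by mapping images into a semantic (word-vector) space. A minimal sketch of the final step, ranking unseen class names by cosine similarity to an already-projected image, is given below. All vectors, class names, and the function `top_k_classes` are hypothetical stand-ins; the paper's actual Word2vec vectors and its learned image-to-semantic mapping are not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "word vectors" for classes never seen in training.
unseen_class_vectors = {
    "zebra": [0.9, 0.1, 0.8],
    "dolphin": [0.1, 0.9, 0.2],
    "eagle": [0.2, 0.3, 0.9],
}

def top_k_classes(projected_image, class_vectors, k=1):
    """Rank class names by similarity to an image already mapped to semantic space."""
    ranked = sorted(
        class_vectors,
        key=lambda name: cosine(projected_image, class_vectors[name]),
        reverse=True,
    )
    return ranked[:k]

# Assumed output of some image-to-semantic mapping for one test image.
projected = [0.8, 0.2, 0.7]
print(top_k_classes(projected, unseen_class_vectors, k=2))
```

A top-k accuracy figure such as the one reported in the abstract would count a prediction as correct when the true class appears among the k highest-ranked names.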

163. Discovering phonetic inventories with crosslingual automatic speech recognition.

Author: Żelasko, Piotr, Feng, Siyuan, Moro Velázquez, Laureano, Abavisani, Ali, Bhati, Saurabhchand, Scharenborg, Odette, Hasegawa-Johnson, Mark, and Dehak, Najim
Subjects: *AUTOMATIC speech recognition, *MULTILINGUAL communication, *LANGUAGE & languages, *PHONOTACTICS, *TONE (Phonetics)
Abstract: The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that the pooling of resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way without any knowledge about the language. In this paper, we (1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; (2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and (3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages and several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well-recognized cross-linguistically. Through a detailed analysis of results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery. • We perform phonetic inventory discovery and present a way of approaching it with ASR. • We train mono-, multi-, and crosslingual hybrid and E2E ASR on 13 diverse languages. • Not knowing the target language phonotactics a priori is a limitation. • We found universal phone tokens: IPA symbols that are well-recognized cross-linguistically. • Unique sounds, similar sounds, and tone languages remain a challenge. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

164. Micro-Knowledge Embedding for Zero-shot Classification.

Author: Li, Houjun, Wang, Fang, Liu, Jingxian, Huang, Jianhua, Zhang, Ting, and Yang, Shuhong
Subjects: *IMAGE representation, *MACHINE learning, *CLASSIFICATION
Abstract: Zero-shot learning is one of the most challenging machine learning tasks, in which learning stable and transferable knowledge from seen classes plays a pivotal role. To improve the currently unsatisfactory performance of zero-shot object recognition, this paper proposes a novel image representation method, namely, micro-knowledge. In our method, the segmentation of micro-regions and the consequent learning of micro-knowledge are unified by the introduction of a self-attention mechanism. A zero-shot classification framework is carefully designed based on micro-knowledge of images. Under this framework, multiple micro-region descriptions are first obtained by embedding micro-knowledge and then merged to carry out the final classification of unseen objects. Finally, a capsule-unified framework is employed as a graphical programming tool to accomplish the aforementioned tasks. Experiments on public datasets show that the proposed framework can generally achieve competitive results for the classification of unseen objects. Specifically, these results verify that the micro-knowledge learned from one dataset can be directly applied to others without complicated adjustments and demonstrate that using visual features instead of semantic features can result in a decrease in classification error. This research will bring new ideas into the field of zero-shot learning and will serve as an appealing option when addressing the problem of domain shift. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

165. A Zero-Shot Sketch-based Inter-Modal Object Retrieval Scheme for Remote Sensing Images

Author: Mihai Datcu, Biplab Banerjee, Avik Bhattacharya, and Ushasi Chaudhuri
Subjects: FOS: Computer and information sciences, Structure (mathematical logic), remote sensing (RS), Computer science, Computer Vision and Pattern Recognition (cs.CV), Data domain, Computer Science - Computer Vision and Pattern Recognition, Geotechnical Engineering and Engineering Geology, Object (computer science), earth on canvas (EoC), Sketch, Data set, zero-shot, Data retrieval, Cross-modal retrieval, information retrieval, Electrical and Electronic Engineering, Unavailability, Representation (mathematics), database, sketches, Remote sensing
Abstract: Domain-agnostic data retrieval has lately become essential amidst the availability of large-scale data from different types of sensors. However, the unavailability of a sufficient amount of samples of certain classes during training curtails the utility of existing retrieval models in remote sensing (RS) applications. Here, we propose a novel framework for zero-shot intermodal data retrieval of RS data. Thereupon, we design an encoder-decoder structure that ensures enhanced overlapping among the two data domains utilizing cross-triplet and cross-projection loss functions. Furthermore, we propose a sketch-based representation of the RS database Earth on Canvas with diverse classes. We perform a thorough benchmarking of this data set and demonstrate that the proposed framework outperforms state-of-the-art methods for zero-shot sketch-based retrieval framework for RS data.
Published: 2020

166. Similar language translation

Author: Vergés Boncompte, Pere, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, and Ruiz Costa-Jussà, Marta
Subjects: Transformer, Backtranslation, Castellà, Fine Tuning, WMT20, Català, Portugués, Deep Learning, Informàtica [Àrees temàtiques de la UPC], Natural language processing (Computer science), Traducció entre Llenguatges de la mateixa Familia Lingüsitìca, Machine learning, Aprenentatge automàtic, Similar Language Translation, Tractament del llenguatge natural (Informàtica), Zero-Shot
Abstract: Machine translation is the task of automatically translating one language into another. This project aims to evaluate the performance of state-of-the-art Deep Learning systems on similar language translation. We will evaluate the translation between Catalan, Spanish, and Portuguese, which are Romance languages, and see how the Transformer architecture performs in this task. We will additionally make use of different techniques to improve the translation between these languages. First of all, we will make use of multilingual models that allow for transfer learning as well as zero-shot translations. Secondly, we will apply the backtranslation technique to make use of the monolingual data and improve the system's translations. Lastly, we will improve domain-specific translation using fine-tuning.
Published: 2020

167. Autoencoders as Kolmogorov complexity based distance function in zero-shot learning : wherein pictures of seahorses improve bird classification

Author: Staudt, Dorian
Subjects: Artificial Intelligence, Machine Learning, Kolmogorov Komplexität, Kolmogorov Complexity, Künstliche Intelligenz, Autoencoder, Zero-Shot, Maschinelles Lernen
Abstract: Many classification tasks suffer from a lack of labelled data. This led to the development of zero-shot learning models, which are trained on classes with sufficient data and then recognise unknown classes from descriptions. Often these descriptions take the form of attribute vectors, but those are again rarely available and expensive to produce. Some approaches therefore use descriptions in natural language instead. In this thesis a new method of comparing data from different domains, Autoencoder Distance (AD), is introduced and tested on a zero-shot application with image data and natural language descriptions. The distance function is based on the Normalised Compression Distance by Cilibrasi and Vitányi, a method that uses lossless compression algorithms to estimate shared patterns by measuring the size of combined inputs after compression, normalised by the compressed size of the inputs on their own. For the method introduced in this thesis an autoencoder is used instead of lossless compression. It is first trained to associate related inputs, i.e., images and the descriptions of their class. The distance between inputs is then approximated by calculating the mean squared error between the input description and its reconstruction. Normalisation for each description is done with the mean and standard deviation of this error over a shared set of images. For classification, descriptions are ranked by their AD to a given image. The image is then placed in the class associated with the top-ranked description. Evaluation is done on a variation of the Caltech-UCSD bird dataset with descriptions provided by Reed et al. Further, image sets depicting various animals and commonplace items are used for normalisation. Classifying by ranking 50 descriptions not encountered in training, a T1 accuracy of 23.25% and a T5 accuracy of 57.14% could be achieved using pictures of sea horses for normalisation. This is lower than what was previously achieved on the same data, but the new method opens many novel avenues for future work. As a secondary objective it is also shown that the output of the autoencoder can be used for explainability.
Published: 2020
Full Text: View/download PDF
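
Note on result 167: the Normalised Compression Distance that the abstract builds on can be illustrated with a short sketch. This is not the thesis's Autoencoder Distance; it only shows the underlying compression-based idea, assuming zlib as the lossless compressor. The function names `compressed_size` and `ncd` are hypothetical and chosen for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance between two byte strings.

    The size of the compressed concatenation is compared with the
    compressed sizes of the inputs on their own:
        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    Values near 0 suggest shared patterns; values near 1 suggest little overlap.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

With a real compressor the value is only an approximation of the idealised distance, so it can slightly exceed 1 and is not exactly 0 for identical inputs.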

168. That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Author: Najim Dehak, Mark Hasegawa-Johnson, Piotr Zelasko, Odette Scharenborg, and Laureano Moro-Velázquez
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Speech recognition, computer.software_genre, Computer Science - Sound, Speech community, Zero-shot, Audio and Speech Processing (eess.AS), Transfer (computing), International Phonetic Alphabet, Multilingual, FOS: Electrical engineering, electronic engineering, information engineering, Set (psychology), Phone recognition, Crosslingual, Computer Science - Computation and Language, business.industry, Speech processing, Focus (linguistics), Transfer learning, Artificial intelligence, Transfer of learning, business, Computation and Language (cs.CL), computer, Natural language processing, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones are getting improved in a multilingual setting. To that end, we select a phonetically diverse set of languages, and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR is trained to recognize the International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, considers Javanese as a tone language. Notably, as little as 10 hours of the target language training data tremendously reduces ASR error rates. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages - an encouraging result for the low-resource speech community., Comment: Submitted to Interspeech 2020. For some reason, the ArXiv Latex engine rendered it in more than 4 pages
Published: 2020

169. Zero-Shot Pipeline Detection for Sub-Bottom Profiler Data Based on Imaging Principles.

Author: Zheng, Gen, Zhao, Jianhu, Li, Shaobo, and Feng, Jie
Subjects: *OBJECT recognition (Computer vision), *UNDERWATER pipelines, *DEEP learning, *PIPELINES, *STANDARD deviations, *PIXELS, *SHOT peening
Abstract: With the increasing number of underwater pipeline investigation activities, the research on automatic pipeline detection is of great significance. At this stage, object detection algorithms based on Deep Learning (DL) are widely used due to their abilities to deal with various complex scenarios. However, DL algorithms require massive representative samples, which are difficult to obtain for pipeline detection with sub-bottom profiler (SBP) data. In this paper, a zero-shot pipeline detection method is proposed. First, an efficient sample synthesis method based on SBP imaging principles is proposed to generate samples. Then, the generated samples are used to train the YOLOv5s network and a pipeline detection strategy is developed to meet the real-time requirements. Finally, the trained model is tested with the measured data. In the experiment, the trained model achieved a mAP@0.5 of 0.962, and the mean deviation of the predicted pipeline position is 0.23 pixels with a standard deviation of 1.94 pixels in the horizontal direction and 0.34 pixels with a standard deviation of 2.69 pixels in the vertical direction. In addition, the object detection speed also met the real-time requirements. The above results show that the proposed method has the potential to completely replace the manual interpretation and has very high application value. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

170. Connectionist temporal classification loss for vector quantized variational autoencoder in zero-shot voice conversion.

Author: Kang, Xiao, Huang, Hao, Hu, Ying, and Huang, Zhihua
Subjects: *LINGUISTIC change, *CLASSIFICATION
Abstract: • CTC loss is used to guide the VQ-VAE to learn pure content representations. • Experiments show generated speech with better naturalness and similarity. • Thorough analysis provides useful insight into representation disentangling. Vector quantized variational autoencoder (VQ-VAE) has recently become an increasingly popular method in non-parallel zero-shot voice conversion (VC). The reason is that VQ-VAE is capable of disentangling the content and the speaker representations from the speech by using a content encoder and a speaker encoder, which suits the VC task of making the speech of a source speaker sound like the speech of the target speaker without changing the linguistic content. However, the converted speech is not satisfying because it is difficult to disentangle the pure content representations from the acoustic features due to the lack of linguistic supervision for the content encoder. To address this issue, under the framework of VQ-VAE, connectionist temporal classification (CTC) loss is proposed to guide the content encoder to learn pure content representations by using an auxiliary network. Because the CTC loss is not affected by the sequence length of the output of the content encoder, adding the linguistic supervision to the content encoder is much easier. This non-parallel many-to-many voice conversion model is named CTC-VQ-VAE. VC experiments on the CMU ARCTIC and VCTK corpora are carried out to evaluate the proposed method. Both the objective and the subjective results show that the proposed approach significantly improves the speech quality and speaker similarity of the converted speech, compared with the traditional VQ-VAE method. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

171. Transforming task representations to perform novel tasks.

Author: Lampinen AK and McClelland JL
Subjects: Humans, Language, Learning, Visual Perception, Adaptation, Physiological, Artificial Intelligence, Cognition, Models, Neurological
Abstract: An important aspect of intelligence is the ability to adapt to a novel task without any direct experience (zero shot), based on its relationship to previous tasks. Humans can exhibit this cognitive flexibility. By contrast, models that achieve superhuman performance in specific tasks often fail to adapt to even slight task alterations. To address this, we propose a general computational framework for adapting to novel tasks based on their relationship to prior tasks. We begin by learning vector representations of tasks. To adapt to new tasks, we propose metamappings, higher-order tasks that transform basic task representations. We demonstrate the effectiveness of this framework across a wide variety of tasks and computational paradigms, ranging from regression to image classification and reinforcement learning. We compare to both human adaptability and language-based approaches to zero-shot learning. Across these domains, metamapping is successful, often achieving 80 to 90% performance, without any data, on a novel task, even when the new task directly contradicts prior experience. We further show that metamapping can not only generalize to new tasks via learned relationships, but can also generalize using novel relationships unseen during training. Finally, using metamapping as a starting point can dramatically accelerate later learning on a new task and reduce learning time and cumulative error substantially. Our results provide insight into a possible computational basis of intelligent adaptability and offer a possible framework for modeling cognitive flexibility and building more flexible artificial intelligence systems., Competing Interests: The authors declare no competing interest.
Published: 2020
Full Text: View/download PDF
