42 results for "Campbell, Nick"
Search Results
2. The Metalogue Debate Trainee Corpus: Data Collection and Annotation
- Author
-
Petukhova, Volha, Malchanau, Andrei, Oualil, Youssef, Klakow, Dietrich, Luz, Saturnino, Haider, Fasih, Campbell, Nick, Koryzis, Dimitris, Spiliotopoulos, Dimitris, Albert, Pierre, Linz, Nicklas, Alexandersson, Jan, Calzolari, Nicoletta (Conference chair), Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Hasida, Koiti, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Hélène, Moreno, Asuncion, Odijk, Jan, Piperidis, Stelios, and Tokunaga, Takenobu
- Abstract
This paper describes the Metalogue Debate Trainee Corpus (DTC). DTC has been collected and annotated in order to facilitate the design of instructional and interactive models for the Virtual Debate Coach application, an intelligent tutoring system used by young parliamentarians to train their debate skills. The training is concerned with the use of appropriate multimodal rhetorical devices in order to improve (1) the organization of arguments, (2) arguments' content selection, and (3) argument delivery techniques. DTC contains tracking data from motion and speech capturing devices and semantic annotations (dialogue acts as defined in ISO 24617-2 and discourse relations as defined in ISO 24617-8). The corpus comes with a manual describing the data collection process and annotation activities, including an overview of basic concepts and their definitions, annotation schemes and guidelines on how to apply them, tools, and other resources. DTC will be released in the ELRA catalogue in the second half of 2018.
- Published
- 2018
3. Redefining Concatenative Speech Synthesis for Use in Spontaneous Conversational Dialogues: A Study with the GBO Corpus
- Author
-
Campbell, Nick
- Abstract
In: Challenges in Analysis and Processing of Spontaneous Speech
- Published
- 2018
- Full Text
- View/download PDF
4. Incorporating Global Visual Features into Attention-Based Neural Machine Translation
- Author
-
Calixto, Iacer, Liu, Qun, and Campbell, Nick
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, I.2.7, Computation and Language (cs.CL)
- Abstract
We introduce multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder. We utilise global image features extracted using a pre-trained convolutional neural network and incorporate them (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. In our experiments, we evaluate how these different strategies to incorporate global image features compare and which ones perform best. We also study the impact that adding synthetic multi-modal, multilingual data brings and find that the additional data have a positive impact on multi-modal models. We report new state-of-the-art results and our best models also significantly improve on a comparable phrase-based Statistical MT (PBSMT) model trained on the Multi30k data set according to all metrics evaluated. To the best of our knowledge, it is the first time a purely neural model significantly improves over a PBSMT model on all metrics evaluated on this data set.
- Published
- 2017
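As an illustration of strategy (iii) in the abstract above, the sketch below shows one plausible way to initialise a decoder hidden state from a global image feature. It assumes PyTorch; the class name, dimensions, and architecture are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ImageInitDecoder(nn.Module):
    """Toy GRU decoder whose initial hidden state is predicted from a
    global image feature (e.g. a pre-trained CNN's pooled output).
    Illustrative dimensions; not the authors' implementation."""

    def __init__(self, vocab=1000, emb=64, hid=128, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.img_to_h0 = nn.Linear(img_dim, hid)   # project image -> h0
        self.gru = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tgt_tokens, img_feats):
        # img_feats: (batch, img_dim), one global feature vector per image
        h0 = torch.tanh(self.img_to_h0(img_feats)).unsqueeze(0)
        states, _ = self.gru(self.embed(tgt_tokens), h0)
        return self.out(states)                    # per-step vocabulary logits

# usage: ImageInitDecoder()(torch.randint(0, 1000, (2, 7)), torch.randn(2, 2048))
```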
5. Things that Make Robots Go HMMM: Heterogeneous Multilevel Multimodal Mixing to Realise Fluent, Multiparty, Human-Robot Interaction
- Author
-
Davison, Daniel, Gorer, Binnur, Kolkmeier, Jan, Linssen, Johannes Maria, Schadenberg, Bob R., van de Vijver, Bob, Campbell, Nick, Dertien, Edwin, Reidsma, Dennis, and Truong, Khiet P.
- Abstract
Fluent, multi-party, human-robot interaction calls for the mixing of deliberate conversational behaviour and reactive, semi-autonomous behaviour. In this project, we worked on a novel, state-of-the-art setup for realising such interactions. We approach this challenge from two sides. On the one hand, a dialogue manager requests deliberative behaviour and sets parameters on ongoing (semi-)autonomous behaviour. On the other hand, robot control software needs to translate and mix these deliberative and bottom-up behaviours into consistent and coherent motion. The two need to collaborate to create behaviour that is fluent, naturally varied, and well-integrated. The resulting challenge is that this behaviour needs to conform both to high-level requirements and to the content and timing set by the dialogue manager. We tackled this challenge by designing a framework which can mix these two types of behaviour, using AsapRealizer, a Behaviour Markup Language realiser. We call this Heterogeneous Multilevel Multimodal Mixing (HMMM). Our framework is showcased in a scenario which revolves around a robot receptionist which is able to interact with multiple users.
- Published
- 2017
6. CARAMILLA - Speech Mediated Language Learning Modules for Refugee and High School Learners of English and Irish
- Author
-
Gilmartin, Emer, Kim, Jaebok, Diallo, Alpha, Zhao, Yong, Chiarain, Neasa Ni, Su, Ketong, Huang, Yuyun, Cowan, Benjamin, Campbell, Nick, Engwall, O., Lopes, J., and Leite, I.
- Subjects
speech, language learning
- Published
- 2017
7. Multilingual Multi-modal Embeddings for Natural Language Processing
- Author
-
Calixto, Iacer, Liu, Qun, and Campbell, Nick
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, I.2.7, Computation and Language (cs.CL)
- Abstract
We propose a novel discriminative model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality. To that end, we introduce a modification of a pairwise contrastive estimation optimisation function as our training objective. We evaluate our embeddings on an image-sentence ranking (ISR), a semantic textual similarity (STS), and a neural machine translation (NMT) task. We find that the additional multilingual signals lead to improvements on both the ISR and STS tasks, and the discriminative cost can also be used in re-ranking $n$-best lists produced by NMT models, yielding strong improvements.
- Published
- 2017
- Full Text
- View/download PDF
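The pairwise contrastive objective named in this abstract can be sketched as a max-margin ranking loss over a batch of image and sentence embeddings. This is a generic reconstruction, assuming PyTorch; the cosine scoring and margin value are assumptions, not the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_ranking_loss(img_emb, sent_emb, margin=0.2):
    # Normalise so the dot product is a cosine similarity.
    img_emb = F.normalize(img_emb, dim=1)
    sent_emb = F.normalize(sent_emb, dim=1)
    scores = img_emb @ sent_emb.t()          # (B, B); matches on the diagonal
    pos = scores.diag().unsqueeze(1)         # similarity of each true pair
    # Hinge: every mismatched pair should trail its true pair by `margin`.
    cost_s = (margin + scores - pos).clamp(min=0)      # image vs. wrong sentences
    cost_i = (margin + scores - pos.t()).clamp(min=0)  # sentence vs. wrong images
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return cost_s.masked_fill(mask, 0).mean() + cost_i.masked_fill(mask, 0).mean()

# usage: loss = contrastive_ranking_loss(torch.randn(8, 128), torch.randn(8, 128))
```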
8. International workshop on multimodal analyses enabling artificial agents in human-machine interaction (workshop summary)
- Author
-
Böck, Ronald, Bonin, Francesca, Campbell, Nick, Poppe, R.W., Sub Multimedia, and Multimedia
- Subjects
0209 industrial biotechnology, Scope (project management), Computer science, Technical systems, 02 engineering and technology, Human-centered computing, Conjunction (grammar), Multimodality, 020901 industrial engineering & automation, Human–computer interaction, Human machine interaction, Taverne, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing
- Abstract
This paper gives a brief overview of the third workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction (MA3HMI). The paper focuses on the main aspects intended to be discussed in the workshop, reflecting the main scope of the papers presented during the meeting. The MA3HMI 2016 workshop is held in conjunction with the 18th ACM International Conference on Multimodal Interaction (ICMI 2016) taking place in Tokyo, Japan, in November 2016. This year, we have solicited papers concerning the different phases of the development of multimodal systems. Tools and systems that address real-time conversations with artificial agents and technical systems are also within the scope.
- Published
- 2016
9. Cross-language Voice Conversion Evaluation Using Bilingual Databases
- Author
-
Mashimo, Mikiko, Toda, Tomoki, Kawanami, Hiromichi, Shikano, Kiyohiro, and Campbell, Nick
- Abstract
This paper describes experiments that test an extension of techniques for converting the voice of one speaker to sound like that of another speaker, to include cross-language utterances, such as would be required for spoken language translation or language training applications. In particular, it addresses the issue of evaluation of system performance, and compares objective tests using a perceptually-motivated acoustic measure, with perceptual tests of voice quality and speaker resemblance. The proposed method uses Japanese and English speech databases from 2 female and 2 male bilingual speakers for training in a system based on a Gaussian mixture model (GMM) and a high quality vocoder. Results indicate that training with cross-language models also produces close acoustic matches between source and target speakers' voices. Perceptual tests revealed little significant difference in the performance of mapping functions trained on single-language and cross-language data pairs.
- Published
- 2002
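The GMM mapping function mentioned in this abstract follows a well-known recipe: fit a joint GMM over time-aligned source and target spectral frames, then convert new source frames with the conditional mean E[y|x]. A minimal sketch, assuming scikit-learn and SciPy, with random placeholder data standing in for aligned training frames:

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

D = 24                                            # e.g. mel-cepstral order
rng = np.random.default_rng(0)
src = rng.standard_normal((5000, D))              # placeholder aligned source frames
tgt = src + 0.5 * rng.standard_normal((5000, D))  # placeholder target frames

joint = GaussianMixture(n_components=8, covariance_type='full', random_state=0)
joint.fit(np.hstack([src, tgt]))                  # model p(x, y) jointly

def convert(x):
    """Map one source frame x (shape (D,)) towards the target speaker."""
    mu_x, mu_y = joint.means_[:, :D], joint.means_[:, D:]
    S_xx, S_yx = joint.covariances_[:, :D, :D], joint.covariances_[:, D:, :D]
    # responsibilities p(m | x) under each component's source marginal
    lik = np.array([w * multivariate_normal.pdf(x, m, c)
                    for w, m, c in zip(joint.weights_, mu_x, S_xx)])
    post = lik / lik.sum()
    # weighted mixture of conditional means E[y | x, m]
    return sum(p * (mu_y[m] + S_yx[m] @ np.linalg.solve(S_xx[m], x - mu_x[m]))
               for m, p in enumerate(post))

converted = convert(src[0])                       # one converted frame
```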
10. MILLA Multimodal Interactive Language Learning Agent
- Author
-
Paulo Cabral, Joao, Campbell, Nick, Ganesh, Shree, Gilmartin, Emer, Haider, Fasih, Kenny, Eammon, Kheirkhah, Mina, Murphy, Andrew, Ni Chiarain, Neasa, Pellegrini, Thomas, Rey Orozko, Odei, Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE), Universidad del País Vasco - Euskal Herriko Unibertsitatea - EHU (SPAIN), Georg-August-Universität Göttingen (GERMANY), The Institute for Advanced Studies in Basic Sciences - IASBS (IRAN), Trinity College Dublin - TCD (IRELAND), and Institut de Recherche en Informatique de Toulouse - IRIT (Toulouse, France)
- Subjects
Milla ,Traitement des images ,CEFR ,Traitement du signal et de l'image ,Vision par ordinateur et reconnaissance de formes ,Language learning agent ,Intelligence artificielle ,Synthèse d'image et réalité virtuelle - Abstract
Learning a new language involves the acquisition and integration of a range of skills. A human tutor aids learners by (i) providing tasks suitable to the learner’s needs, (ii) monitoring progress and adapting task content and delivery style, and (iii) providing a source of speaking practice and motivation.
- Published
- 2014
11. Proceedings of the 7th International Conference on Speech Prosody
- Author
-
Campbell, Nick, Gibbon, Dafydd, Hirst, Daniel, Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Laboratoire Parole et Langage (LPL), and Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, [SCCO.LING]Cognitive science/Linguistics
- Abstract
no abstract
- Published
- 2014
12. Long-term convergence of speech rhythm in L1 and L2 English
- Author
-
Quené, H, Orr, Rosemary, Campbell, Nick, Gibbon, Dafydd, Hirst, Daniel, LS Psycholinguistiek, extern UU GWS, and UiL OTS Psycholinguistics
- Subjects
accommodation, speech rhythm, L2, phonetic convergence
- Abstract
When talkers from various language backgrounds use L2 English as a lingua franca, their accents of English are expected to converge, and talkers’ rhythmical patterns are predicted to converge too. Prosodic convergence was studied among talkers who lived in a community where L2 English is used predominantly. Speech rhythm was operationalized here as the peak frequency in the spectrum of the intensity envelope, normalized to the speaking rate (in syll/s). Results indicate that talkers produced intensity contours with maximum periodicity at frequencies of about 0.32 times their syllable rates, i.e., peaks in intensity tend to occur every 1/0.32 syllables. These results were collected repeatedly, from 5 recordings conducted over 3 years with the same talkers. We found that variance between talkers in their rhythm decreases over time, thus confirming the predicted convergence in speech rhythm in L2 English. These findings show that speech rhythm in L2 English tends to converge, and that this prosodic convergence continues to proceed over several years, as well as over communicative settings.
- Published
- 2014
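The rhythm measure described in this abstract (peak frequency of the intensity-envelope spectrum, normalised to the syllable rate) can be approximated as below. This is a rough reconstruction, assuming NumPy/SciPy; the envelope extraction, search band, and sampling choices are assumptions rather than the authors' exact procedure.

```python
import numpy as np
from scipy.signal import hilbert, resample_poly

def rhythm_peak(wave, sr, n_syllables, duration_s):
    """Peak frequency of the intensity-envelope spectrum, divided by the
    syllable rate; a value near 0.32 would mean an intensity peak roughly
    every 1/0.32 syllables, as reported in the abstract."""
    env = np.abs(hilbert(wave))                  # amplitude envelope
    env_sr = 100                                 # envelope rate (assumption)
    env = resample_poly(env, env_sr, sr)         # the envelope varies slowly
    spec = np.abs(np.fft.rfft(env - env.mean()))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / env_sr)
    band = (freqs > 0.5) & (freqs < 10.0)        # plausible rhythm range (Hz)
    peak_hz = freqs[band][np.argmax(spec[band])]
    return peak_hz / (n_syllables / duration_s)  # normalise by syllable rate
```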
14. MILLA-Multimodal Interactive Language Learning Agent
- Author
-
Cabral, Joao P, Campbell, Nick, Ganesh, Shree, Gilmartin, Emer, Haider, Fasih, Kenny, Eamonn, Kheirkhah, Mina, Murphy, Andrew, Chiaráin, Neasa Ni, Pellegrini, Thomas, Orozko, Odei Rey, Erro, Daniel, and Hernáez , Inma
- Subjects
ComputingMilieux_COMPUTERSANDEDUCATION - Abstract
The goal of this project was to create a multimodal dialogue system which provides some of the advantages of a human tutor, not normally encountered in self-study material and systems. A human tutor aids learners by: providing a framework of tasks suitable to the learner's needs; continuously monitoring learner progress and adapting task content and delivery style; and providing a source of speaking practice and motivation. MILLA is a prototype language tuition system comprising tuition management, learner state monitoring, and an adaptable curriculum, all mediated through speech. The system enrols and monitors learners via a spoken dialogue interface, provides pronunciation practice and automatic error correction in two modalities, grammar exercises, and two custom speech-to-speech chat bots for spoken interaction practice. The focus on speech in the tutor's output and in the learning modules addresses the current deficit in spoken interaction practice in Computer Aided Language Learning (CALL) applications, with different text-to-speech (TTS) voices used to provide a variety of speech models across the different modules. The system monitors learner engagement using Kinect sensors, checks pronunciation, and responds to dialogue using automatic speech recognition (ASR). A learner record is used in conjunction with the curriculum to provide activities relevant to the learner's current abilities and first language, and to monitor and record progress.
- Published
- 2014
17. Disposition Recognition from Spontaneous Speech: Towards a Combination with Co-Speech Gestures
- Author
-
Boeck, Ronald, Bergmann, Kirsten, Jaecks, Petra, Böck, Ronald, Bonin, Francesca, Campbell, Nick, and Poppe, Ronald
- Published
- 2014
19. Co-speech gesture generation for embodied agents and its effects on user evaluation
- Author
-
Bergmann, Kirsten, Campbell, Nick, and Rojc, Matej
- Published
- 2013
20. Toward a Model for Incremental Grounding in Spoken Dialogue Systems
- Author
-
Visser, Thomas, Traum, David, DeVault, David, op den Akker, Rieks, Böck, Ronald, Bonin, Francesca, Campbell, Nick, Edlund, Jens, de Kok, Iwan, and Poppe, Ronald
- Subjects
HMI-HF: Human Factors, Spoken dialogue systems, Incremental language processing, Grounding
- Abstract
Recent advances in incremental language processing for dialogue systems promise to enable more natural conversation between humans and computers. By analyzing the user's utterance while it is still in progress, systems can provide more human-like overlapping and backchannel responses to convey their level of understanding and respond more quickly. In this paper, we look at examples of several overlapping response types in human-human dialogues, and present an initial computational model of the incremental grounding process in these responses. Additionally, we describe an implementation of this model in a virtual human dialogue system that can provide backchannels, head nods, frowns, completions and low latency responses.
- Published
- 2012
21. Joint Proceedings of the Intelligent Virtual Agents 2012 Workshops: Santa Cruz, CA, September 15, 2012
- Author
-
Böck, Ronald, Bonin, Francesca, Campbell, Nick, Edlund, Jens, de Kok, Iwan, Poppe, Ronald, and Traum, David
- Subjects
EWI-22554, METIS-293205, HMI-IA: Intelligent Agents, HMI-MI: MULTIMODAL INTERACTIONS, IR-83515
- Published
- 2012
22. Online backchannel synthesis evaluation with the switching Wizard of Oz
- Author
-
Poppe, Ronald, ter Maat, Mark, Heylen, Dirk, Böck, Ronald, Bonin, Francesca, Campbell, Nick, Edlund, Jens, de Kok, Iwan, and Traum, David
- Subjects
EWI-22553, HMI-CI: Computational Intelligence, METIS-293204, HMI-MI: MULTIMODAL INTERACTIONS, IR-83427
- Abstract
In this paper, we evaluate a backchannel synthesis algorithm in an online conversation between a human speaker and a virtual listener. We adopt the Switching Wizard of Oz (SWOZ) approach to assess behavior synthesis algorithms online. A human speaker watches a virtual listener that is either controlled by a human listener or by an algorithm. The source switches at random intervals. Speakers indicate when they feel they are no longer talking to a human listener. Analysis of these responses reveals patterns of inappropriate behavior in terms of quantity and timing of backchannels.
- Published
- 2012
23. Speech & Multimodal Resources: the Herme Database of Spontaneous Multimodal Human-Robot Dialogues
- Author
-
Hang, Jing Guang, Gilmartin, Emer, De Looze, Celine, Vaughan, Brian, and Campbell, Nick
- Subjects
Speech communication, Computer Sciences, Multimodal robot platform, Social interaction, Speech corpus
- Abstract
This paper presents methodologies and tools for language resource (LR) construction. It describes a database of interactive speech collected over a three-month period at the Science Gallery in Dublin, where visitors could take part in a conversation with a robot. The system collected samples of informal, chatty dialogue – normally difficult to capture under laboratory conditions for human-human dialogue, and particularly so for human-machine interaction. The conversations were based on a script followed by the robot consisting largely of social chat with some task-based elements. The interactions were audio-visually recorded using several cameras together with microphones. As part of the conversation the participants were asked to sign a consent form giving permission to use their data for human-machine interaction research. The multimodal corpus will be made available to interested researchers and the technology developed during the three-month exhibition is being extended for use in education and assisted-living applications.
- Published
- 2012
24. Designing and Implementing a Platform for Collecting Multi-Modal Data of Human-Robot Interaction
- Author
-
Vaughan, Brian, Han, Jing Guang, Kilmartin, Emer, and Campbell, Nick
- Subjects
human-robot interaction, multi-modal data collection, audio interface, robot platform, WOZ, face detection, Computer Sciences
- Abstract
This paper details a method of collecting video and audio recordings of people interacting with a simple robot interlocutor. The interaction is recorded via a number of cameras and microphones mounted on and around the robot. The system utilised a number of technologies to engage with interlocutors, including OpenCV, Python, and Max MSP. Interactions were collected over a three-month period at The Science Gallery in Trinity College Dublin. Visitors to the gallery freely engaged with the robot, their interactions being spontaneous and non-scripted. The robot dialogue was a set pattern of utterances designed to engage interlocutors in a simple conversation. A large number of audio and video recordings were collected in this way.
- Published
- 2012
25. Text, Speech and Language Technology vol. 14/15 - Rhythm, Melody and Harmony in Speech. Studies in Honour of Wiktor Jassem
- Author
-
Gibbon, Dafydd, Hirst, Daniel, Campbell, Nick, Fakultät für Linguistik und Literaturwissenschaft, Universität Bielefeld, Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), and PRJT-000145
- Subjects
harmony, [SHS.INFO]Humanities and Social Sciences/Library and information sciences, melody, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, rhythm
- Abstract
This collection of studies on phonetics and phonology is cordially dedicated to Professor Wiktor Jassem by his colleagues and friends on the occasion of his 90th birthday, 11th June 2012, in appreciation of his influential and pioneering contributions to the field.
- Published
- 2012
26. Context Cues For Classification Of Competitive And Collaborative Overlaps
- Author
-
Oertel, Catharine, Wlodarczak, Marcin, Tarasov, Alexey, Campbell, Nick, and Wagner, Petra
- Subjects
competitive overlaps, overlapping speech, Linguistics, dialogue prosody, cooperative overlaps
- Abstract
Being able to respond appropriately to users’ overlaps should be seen as one of the core competencies of incremental dialogue systems. At the same time identifying whether an interlocutor wants to support or grab the turn is a task which comes naturally to humans, but has not yet been implemented in such systems. Motivated by this we first investigate whether prosodic characteristics of speech in the vicinity of overlaps are significantly different from prosodic characteristics in the vicinity of non-overlapping speech. We then test the suitability of different context sizes, both preceding and following but excluding features of the overlap, for the automatic classification of collaborative and competitive overlaps. We also test whether the fusion of preceding and succeeding contexts improves the classification. Preliminary results indicate that the optimal context for classification of overlap lies at 0.2 seconds preceding the overlap and up to 0.3 seconds following it. We demonstrate that we are able to classify collaborative and competitive overlap with a median accuracy of 63%.
- Published
- 2012
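The classification set-up this abstract describes (prosodic features from a 0.2 s window preceding the overlap fused with a 0.3 s window following it) might look like the sketch below, assuming scikit-learn. The feature set, classifier choice, and random placeholder data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder prosodic features: 0.2 s preceding and 0.3 s following each
# overlap (excluding the overlap itself), fused by concatenation.
rng = np.random.default_rng(0)
n = 200
pre_feats = rng.standard_normal((n, 6))    # e.g. F0/intensity stats before
post_feats = rng.standard_normal((n, 6))   # same stats after the overlap
labels = rng.integers(0, 2, n)             # 0 = collaborative, 1 = competitive

fused = np.hstack([pre_feats, post_feats])  # context fusion
clf = RandomForestClassifier(random_state=0).fit(fused[:150], labels[:150])
print('held-out accuracy:', clf.score(fused[150:], labels[150:]))
```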
27. Measuring dynamics of mimicry by means of prosodic cues in conversational speech
- Author
-
De Looze, Céline, Oertel, Catharine, Rauzy, Stéphane, Campbell, Nick, Speech Communication Laboratory, Laboratoire Parole et Langage (LPL), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), and RAUZY, Stéphane
- Subjects
prosody, modeling, social interaction, [SHS.LANGUE]Humanities and Social Sciences/Linguistics, mimicry
- Abstract
This study presents a method for measuring the dynamics of mimicry in conversational speech by means of prosodic cues. It shows that the more speakers are involved in a conversation, the more they tend to mimic each other's speech prosody. It supports the view that mimicry in speech is part of social interaction and that it may be implemented in spoken dialogue systems in order to improve their efficiency.
- Published
- 2011
28. Towards the Automatic Detection of Involvement in Conversation
- Author
-
Oertel, Catharine, de Looze, Celine, Scherer, Stefan, Windmann, Andreas, Wagner, Petra, Campbell, Nick, Esposito, Anna, Vinciarelli, Alessandro, Vicsi, Klára, Pelachaud, Catherine, and Nijholt, Anton
- Subjects
Communication, Movement (music), business.industry, media_common.quotation_subject, Face (sociological concept), Social engagement, Human interaction, Phenomenon, Conversation, Psychology, Social information, business, Cognitive psychology, media_common
- Abstract
Although an increasing amount of research has been carried out into human-machine interaction in the last century, even today we are not able to fully understand the dynamic changes in human interaction. Only when we achieve this will we be able to go beyond a one-to-one mapping between text and speech and add social information to speech technologies. Social information is expressed to a high degree through prosodic cues and movement of the body and the face. The aim of this paper is to use those cues to make one aspect of social information more tangible; namely, participants' degree of involvement in a conversation. Our results on the level and span of the voice as well as intensity, together with our preliminary results on the movement of the body and face, suggest that these are reliable cues for the detection of distinct levels of participants' involvement in conversation. This will allow for the development of a statistical model able to classify these stages of involvement. Our data indicate that involvement may be a scalar phenomenon.
- Published
- 2011
- Full Text
- View/download PDF
29. Preface - Development of Multimodal Interfaces
- Author
-
Esposito, Anna, Campbell, Nick, Vogel, Carl, Hussain, Amir, Nijholt, Anton, and Human Media Interaction
- Published
- 2010
30. Development of Multimodal Interfaces: Active Listening and Synchrony
- Author
-
Esposito, Anna, Campbell, Nick, Vogel, Carl, Hussain, Amir, and Nijholt, Antinus
- Subjects
Synchrony, active listening, METIS-270732, Multi-modal interaction, IR-70724, HMI-MI: MULTIMODAL INTERACTIONS, EWI-17439, cross-modality
- Abstract
This volume brings together, through a peer-revision process, the advanced research results obtained by the European COST Action 2102: Cross-Modal Analysis of Verbal and Nonverbal Communication, primarily discussed for the first time at the Second COST 2102 International Training School on "Development of Multimodal Interfaces: Active Listening and Synchrony" held in Dublin, Ireland, March 23–27, 2009. The school was sponsored by COST (European Cooperation in the Field of Scientific and Technical Research, www.cost.esf.org) in the domain of Information and Communication Technologies (ICT) for disseminating the advances of the research activities developed within the COST Action 2102: "Cross-Modal Analysis of Verbal and Nonverbal Communication" (cost2102.cs.stir.ac.uk). COST Action 2102, in its third year of life, brought together about 60 European and 6 overseas scientific laboratories whose aim is to develop interactive dialogue systems and intelligent virtual avatars graphically embodied in a 2D and/or 3D interactive virtual world, capable of interacting intelligently with the environment, other avatars, and particularly with human users. The main focus of the school was the development of multimodal interfaces. Traditional approaches to multimodal interface design tend to assume a "ping-pong" or "push-to-talk" approach to speech interaction wherein either the system or the human interlocutor is active at any one time. This is contrary to many recent findings in conversation and discourse analysis, where the definition of a "turn" or even an "utterance" is found to be very complex. People don't "take turns" to talk in a typical conversational interaction, but they each contribute actively to the joint emergence of a "common understanding." The sub-theme of the school was "Synchrony and Active Listening," selected with the idea of identifying contributions that actively give support to the ongoing research into the dynamics of human spoken interaction, to the production of multimodal conversation data, and to the subsequent analysis and modelling of interaction dynamics, with the dual goal of appropriately designing multimodal interfaces, as well as providing new approaches and developmental paradigms.
- Published
- 2010
31. Preface (to: Development of Multimodal Interfaces: Active Listening and Synchrony)
- Author
-
Esposito, Anna, Campbell, Nick, Vogel, Carl, Hussain, Amir, Esposito, A., Campbell, N., Vogel, C., Hussain, A., and Nijholt, Antinus
- Subjects
Multimodal interfaces, Synchrony, active listening, EWI-17440, METIS-276704, HMI-MI: MULTIMODAL INTERACTIONS, IR-70725, cross-modality
- Abstract
This volume brings together, through a peer-revision process, the advanced research results obtained by the European COST Action 2102: Cross-Modal Analysis of Verbal and Nonverbal Communication, primarily discussed for the first time at the Second COST 2102 International Training School on "Development of Multimodal Interfaces: Active Listening and Synchrony" held in Dublin, Ireland, March 23–27, 2009. The school was sponsored by COST (European Cooperation in the Field of Scientific and Technical Research, www.cost.esf.org) in the domain of Information and Communication Technologies (ICT) for disseminating the advances of the research activities developed within the COST Action 2102: "Cross-Modal Analysis of Verbal and Nonverbal Communication" (cost2102.cs.stir.ac.uk).
- Published
- 2010
32. Technology for Processing Non-verbal Information in Speech
- Author
-
Campbell, Nick
- Abstract
Proceedings of the NODALIDA 2009 workshop Multimodal Communication — from Human Behaviour to Computational Models. Editors: Costanza Navarretta, Patrizia Paggio, Jens Allwood, Elisabeth Alsén and Yasuhiro Katagiri. NEALT Proceedings Series, Vol. 6 (2009), 1-2. © 2009 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/9208 .
- Published
- 2009
33. Predicting the intonation of discourse segments from examples in dialogue speech
- Author
-
Black, Alan W and Campbell, Nick
- Abstract
In the area of speech synthesis it is already possible to generate understandable speech with citation-form prosody for simple written texts. However, at ATR we are researching speech synthesis techniques for use in speech translation environments. Dialogues in such conversations involve much richer forms of prosodic variation than are required for the reading of texts. In order for our translations to sound natural it is necessary for our synthesis system to offer a wide range of prosodic variability, which can be described at an appropriate level of abstraction.
- Published
- 1995
34. Duration, Pitch and Diphones in the CSTR TTS System
- Author
-
Campbell, Nick, Isard, Stephen, Monaghan, Alex, and Verhoeven, J.
- Abstract
This paper describes the prosodic processing and wave-form generation components of the text-to-speech system being developed at Edinburgh University's Centre for Speech Technology Research. Intonation is specified as a sequence of minimal descriptors whose locations are given in terms of syntactically-determined prosodic domains. A pitch contour is computed by converting the descriptors into a sequence of abstract targets whose absolute values depend on a specific speaker model. Duration is determined first at the level of the syllable by a neural network, then accommodated at the segment level according to the distributions observed in a phonetically balanced database. The output waveform is generated by LPC resynthesis of diphone units. Three methods of diphone segmentation are discussed.
- Published
- 1990
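The accommodation step described in this abstract (a syllable-level duration distributed over segments according to observed distributions) is often realised with a shared z-score over segment log-duration statistics. A minimal sketch of that idea, assuming NumPy/SciPy; it is based on the published z-score approach rather than the CSTR code itself.

```python
import numpy as np
from scipy.optimize import brentq

def accommodate(syllable_dur, log_means, log_sds):
    """Distribute a syllable duration over its segments by finding one
    shared z-score k such that segments, stretched uniformly within their
    observed log-duration distributions, sum to the syllable total."""
    f = lambda k: np.exp(log_means + k * log_sds).sum() - syllable_dur
    k = brentq(f, -5.0, 5.0)                 # solve for the shared z-score
    return np.exp(log_means + k * log_sds)   # per-segment durations (s)

# usage: spread 250 ms over three segments given log-duration statistics
durs = accommodate(0.250, np.log([0.05, 0.10, 0.06]), np.array([0.3, 0.4, 0.3]))
```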
35. Acoustic-phonetic realisation of Polish syllable prominence: a corpus study
- Author
-
Malisz, Zofia, Wagner, Petra, Gibbon, Dafydd, Hirst, Daniel, and Campbell, Nick
36. Multifaceted engagement in social interaction with a machine: the JOKER project
- Author
-
Lucile Bechade, Yannick Estève, Laurence Devillers, Yücel Yemez, Engin Erzin, Kevin El Haddad, Guillaume Dubuisson Duplessis, Carole Lailler, Bekir Berker Turker, Emer Gilmartin, Stéphane Dupont, Sophie Rosset, Nick Campbell, Paul Deléglise, Metin Sezgin, Erzin, Engin (ORCID 0000-0002-2715-2368 & YÖK ID 34503), Yemez, Yücel (ORCID & YÖK ID 107907), Türker, Bekir B., Sezgin, Tevfik Metin (ORCID 0000-0002-1524-1646 & YÖK ID 18632), Devillers, Laurence, Rosset, Sophie, Duplessis, Guillaume Dubuisson, Bechade, Lucile, El Haddad, Kevin, Dupont, Stephane, Deleglise, Paul, Esteve, Yannick, Lailler, Carole, Gilmartin, Emer, Campbell, Nick, College of Engineering, Graduate School of Sciences and Engineering, and Department of Computer Engineering
- Subjects
060201 languages & linguistics, Scheme (programming language), Data collection, Computer science, business.industry, 06 humanities and the arts, Paralanguage, Human-Robot interaction, Dataset, Engagement, Speech recognition, Affective computing, Social relation, Human–robot interaction, Nonverbal communication, Human–computer interaction, 0602 languages and literature, Artificial intelligence, Computer science, artificial intelligence, Engineering, electrical and electronic, business, computer, Joker, computer.programming_language
- Abstract
This paper addresses the problem of evaluating the engagement of the human participant by combining verbal and nonverbal behaviour along with contextual information. This study is carried out across four different corpora. Four different systems, designed to explore essential and complementary aspects of the JOKER system in terms of paralinguistic/linguistic inputs, were used for the data collection. An annotation scheme dedicated to the labeling of verbal and non-verbal behaviour has been designed. Our experiments indicate that engagement in HRI should be treated as multifaceted. Funding: Scientific and Technological Research Council of Turkey (TÜBİTAK); European Union; ERA-Net CHISTERA; Agence Nationale pour la Recherche (ANR, France); Fonds National de la Recherche Scientifique (FNRS, Belgium); Irish Research Council (IRC, Ireland).
- Published
- 2018
37. Utilization of Multimodal Interaction Signals for Automatic Summarisation of Academic Presentations
- Author
-
Curtis, Keith, Jones, Gareth J.F., and Campbell, Nick
- Subjects
Interactive computer systems, Image processing, Digital video, Information retrieval, Multimedia systems, Video Summarisation, Feature Classification, Evaluation, Eye Tracking
- Abstract
Multimedia archives are expanding rapidly. For these, there exists a shortage of retrieval and summarisation techniques for accessing and browsing content where the main information exists in the audio stream. This thesis describes an investigation into the development of novel feature extraction and summarisation techniques for audio-visual recordings of academic presentations. We report on the development of a multimodal dataset of academic presentations. This dataset is labelled by human annotators with presentation ratings, audience engagement levels, speaker emphasis, and audience comprehension. We investigate the automatic classification of speaker ratings and audience engagement by extracting audio-visual features from video of the presenter and audience and training classifiers to predict speaker ratings and engagement levels. Following this, we investigate automatic identification of areas of emphasised speech. By analysing all human-annotated areas of emphasised speech, minimum speech pitch and gesticulation are identified as indicating emphasised speech when occurring together. Investigations are conducted into the speaker's potential to be comprehended by the audience. Following crowdsourced annotation of comprehension levels during academic presentations, a set of audio-visual features considered most likely to affect comprehension levels is extracted. Classifiers are trained on these features, and comprehension levels could be predicted over a 7-class scale to an accuracy of 49%, and over a binary distribution to an accuracy of 85%. Presentation summaries are built by segmenting speech transcripts into phrases and using keywords extracted from the transcripts in conjunction with extracted paralinguistic features. Highest-ranking segments are then extracted to build presentation summaries. Summaries are evaluated by performing eye-tracking experiments as participants watch presentation videos. Participants were found to be consistently more engaged for presentation summaries than for full presentations. Summaries were also found to contain a higher concentration of new information than full presentations.
- Published
- 2018
- Full Text
- View/download PDF
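The extractive step described in the thesis abstract above (ranking transcript segments by keywords combined with paralinguistic emphasis) can be sketched as follows. The scoring functions and the 50/50 weighting are placeholders, not the thesis's actual method.

```python
from collections import Counter

def summarise(segments, emphasis, k=3, alpha=0.5):
    """Rank transcript phrases by corpus keyword weight blended with a
    per-segment paralinguistic emphasis score in [0, 1]; keep the top k."""
    counts = Counter(w for s in segments for w in s.lower().split())
    def kw(s):
        toks = s.lower().split()
        return sum(counts[w] for w in toks) / max(len(toks), 1)
    top = max(kw(s) for s in segments) or 1.0   # normalise keyword scores
    scored = [(alpha * kw(s) / top + (1 - alpha) * e, s)
              for s, e in zip(segments, emphasis)]
    return [s for _, s in sorted(scored, reverse=True)[:k]]

# usage: summarise(['deep nets win', 'thanks all', 'results beat baseline'],
#                  [0.9, 0.1, 0.7], k=2)
```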
38. Proceedings of the Interdisciplinary Workshop on The Phonetics of Laughter : Saarland University, Saarbrücken, Germany, 4-5 August 2007
- Author
-
Trouvain, Jürgen and Campbell, Nick
- Subjects
ddc:400, ddc:620
- Published
- 2008
- Full Text
- View/download PDF
39. Imitating conversational laughter with an articulatory speech synthesizer
- Author
-
Lasarcyk, Eva, Trouvain, Jürgen, Trouvain, Jürgen, and Campbell, Nick
- Subjects
ddc:400, ddc:620
- Abstract
In this study we present initial efforts to model laughter with an articulatory speech synthesizer. We aimed at imitating a real laugh taken from a spontaneous speech database and created several synthetic versions of it using articulatory synthesis and diphone synthesis. In modeling laughter with articulatory synthesis, we also approximated features like breathing noises that do not normally occur in speech. Evaluation with respect to the perceived degree of naturalness indicated that the laugh stimuli would pass as "laughs" in an appropriate conversational context. In isolation, though, significant differences could be measured with regard to the degree of variation (durational patterning, fundamental frequency, intensity) within each laugh.
- Published
- 2007
- Full Text
- View/download PDF
40. Perception of smiled French speech by native vs. non-native listeners : a pilot study
- Author
-
Émond, Caroline, Trouvain, Jürgen, Ménard, Lucie, Trouvain, Jürgen, and Campbell, Nick
- Subjects
ddc:400, ddc:620
- Abstract
Smiling is a visible expression, and it has been shown that it is audible too. The aim of this paper is to investigate the perception of the prosody of smile in two different languages. In order to elicit smiled speech, 6 speakers of Québec French were required to read sentences displayed with or without caricatures. The sentences produced were used as stimuli for a perception test administered to 10 listeners of Québec French and 10 listeners of German who had no knowledge of French. Results suggest that some prosodic cues are universal and others are culture-specific.
- Published
- 2007
- Full Text
- View/download PDF
41. Temporal structures for Fast and Slow Speech Rate
- Author
-
Zellner, Brigitte and Campbell, Nick
- Subjects
Speech
- Abstract
The rhythmic component in speech synthesis often remains rather rudimentary, despite recent major efforts in prosodic modeling. The European COST Action 258 has identified this problem as one of the next challenges for speech synthesis. This paper is a contribution to a new, promising approach that was tested on a French temporal model.
- Published
- 1998
42. A Study of Human Perception of Intonation in Domestic Cat Meows
- Author
-
van de Weijer, Joost, Schötz, Susanne, Heldner, Mattias, Campbell, Nick, Gibbon, Dafydd, and Hirst, Daniel
- Subjects
General Language Studies and Linguistics, medicine.medical_specialty, Speech recognition, Perception, media_common.quotation_subject, Intonation (linguistics), medicine, Hazard perception, Context (language use), Vocal interaction, Audiology, Psychology, media_common
- Abstract
This study examined human listeners' ability to classify domestic cat vocalisations (meows) recorded in two different contexts: during feeding time (food-related meows) and while waiting to visit a veterinarian (vet-related meows). A pitch analysis showed a tendency for food-related meows to have rising F0 contours, while vet-related meows tended to have more falling F0 contours. 30 listeners judged twelve meows (six from each context) in a perception test. Classification accuracy was significantly above chance, and listeners who had reported previous experience with cats performed significantly better than inexperienced listeners. Moreover, the two food-related meows with the highest classification accuracy showed clear rising F0 contours, while clear falling F0 contours characterised the two vet-related meows that received the highest classification accuracy. Listeners also reported that some meows were very easy to classify, while others were more difficult. Taken together, these results suggest that cats may use different intonation patterns in their vocal interaction with humans, and that humans are able to identify the vocalisations based on intonation.
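The rising/falling contour tendency reported in this abstract can be illustrated with a simple F0-slope check. A toy sketch, assuming NumPy; the linear fit and zero threshold are illustrative assumptions, not the study's analysis.

```python
import numpy as np

def meow_contour_label(times, f0):
    """Fit a line to an F0 track (NaN = unvoiced) and read the slope sign:
    rising patterns with the food-related meows in the abstract, falling
    with the vet-related ones. Threshold at zero is an assumption."""
    times, f0 = np.asarray(times, float), np.asarray(f0, float)
    voiced = ~np.isnan(f0)
    slope = np.polyfit(times[voiced], f0[voiced], 1)[0]   # Hz per second
    return 'rising (food-like)' if slope > 0 else 'falling (vet-like)'

# usage: meow_contour_label([0.0, 0.1, 0.2, 0.3], [400, 420, np.nan, 480])
```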