31 results for "Livescu, Karen"
Search Results
2. Articulatory feature-based pronunciation modeling
- Author
- Livescu, Karen, Jyothi, Preethi, and Fosler-Lussier, Eric
- Published
- 2016
- Full Text
- View/download PDF
3. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- Author
Srivastava, Aarohi, Rastogi, Abhinav, Rao, Abhishek, Shoeb, Abu Awal Md, Abid, Abubakar, Fisch, Adam, Brown, Adam R., Santoro, Adam, Gupta, Aditya, Garriga-Alonso, Adrià, Kluska, Agnieszka, Lewkowycz, Aitor, Agarwal, Akshat, Power, Alethea, Ray, Alex, Warstadt, Alex, Kocurek, Alexander W., Safaya, Ali, Tazarv, Ali, Xiang, Alice, Parrish, Alicia, Nie, Allen, Hussain, Aman, Askell, Amanda, Dsouza, Amanda, Slone, Ambrose, Rahane, Ameet, Iyer, Anantharaman S., Andreassen, Anders, Madotto, Andrea, Santilli, Andrea, Stuhlmüller, Andreas, Dai, Andrew, La, Andrew, Lampinen, Andrew, Zou, Andy, Jiang, Angela, Chen, Angelica, Vuong, Anh, Gupta, Animesh, Gottardi, Anna, Norelli, Antonio, Venkatesh, Anu, Gholamidavoodi, Arash, Tabassum, Arfa, Menezes, Arul, Kirubarajan, Arun, Mullokandov, Asher, Sabharwal, Ashish, Herrick, Austin, Efrat, Avia, Erdem, Aykut, Karakaş, Ayla, Roberts, B. Ryan, Loe, Bao Sheng, Zoph, Barret, Bojanowski, Bartłomiej, Özyurt, Batuhan, Hedayatnia, Behnam, Neyshabur, Behnam, Inden, Benjamin, Stein, Benno, Ekmekci, Berk, Lin, Bill Yuchen, Howald, Blake, Orinion, Bryan, Diao, Cameron, Dour, Cameron, Stinson, Catherine, Argueta, Cedrick, Ramírez, César Ferri, Singh, Chandan, Rathkopf, Charles, Meng, Chenlin, Baral, Chitta, Wu, Chiyu, Callison-Burch, Chris, Waites, Chris, Voigt, Christian, Manning, Christopher D., Potts, Christopher, Ramirez, Cindy, Rivera, Clara E., Siro, Clemencia, Raffel, Colin, Ashcraft, Courtney, Garbacea, Cristina, Sileo, Damien, Garrette, Dan, Hendrycks, Dan, Kilman, Dan, Roth, Dan, Freeman, Daniel, Khashabi, Daniel, Levy, Daniel, González, Daniel Moseguí, Perszyk, Danielle, Hernandez, Danny, Chen, Danqi, Ippolito, Daphne, Gilboa, Dar, Dohan, David, Drakard, David, Jurgens, David, Datta, Debajyoti, Ganguli, Deep, Emelin, Denis, Kleyko, Denis, Yuret, Deniz, Chen, Derek, Tam, Derek, Hupkes, Dieuwke, Misra, Diganta, Buzan, Dilyar, Mollo, Dimitri Coelho, Yang, Diyi, Lee, Dong-Ho, Schrader, Dylan, Shutova, Ekaterina, Cubuk, Ekin Dogus, Segal, Elad, Hagerman, Eleanor, Barnes, Elizabeth, Donoway, Elizabeth, Pavlick, Ellie, Rodola, Emanuele, Lam, Emma, Chu, Eric, Tang, Eric, Erdem, Erkut, Chang, Ernie, Chi, Ethan A., Dyer, Ethan, Jerzak, Ethan, Kim, Ethan, Manyasi, Eunice Engefu, Zheltonozhskii, Evgenii, Xia, Fanyue, Siar, Fatemeh, Martínez-Plumed, Fernando, Happé, Francesca, Chollet, Francois, Rong, Frieda, Mishra, Gaurav, Winata, Genta Indra, de Melo, Gerard, Kruszewski, Germán, Parascandolo, Giambattista, Mariani, Giorgio, Wang, Gloria, Jaimovitch-López, Gonzalo, Betz, Gregor, Gur-Ari, Guy, Galijasevic, Hana, Kim, Hannah, Rashkin, Hannah, Hajishirzi, Hannaneh, Mehta, Harsh, Bogar, Hayden, Shevlin, Henry, Schütze, Hinrich, Yakura, Hiromu, Zhang, Hongming, Wong, Hugh Mee, Ng, Ian, Noble, Isaac, Jumelet, Jaap, Geissinger, Jack, Kernion, Jackson, Hilton, Jacob, Lee, Jaehoon, Fisac, Jaime Fernández, Simon, James B., Koppel, James, Zheng, James, Zou, James, Kocoń, Jan, Thompson, Jana, Wingfield, Janelle, Kaplan, Jared, Radom, Jarema, Sohl-Dickstein, Jascha, Phang, Jason, Wei, Jason, Yosinski, Jason, Novikova, Jekaterina, Bosscher, Jelle, Marsh, Jennifer, Kim, Jeremy, Taal, Jeroen, Engel, Jesse, Alabi, Jesujoba, Xu, Jiacheng, Song, Jiaming, Tang, Jillian, Waweru, Joan, Burden, John, Miller, John, Balis, John U., Batchelder, Jonathan, Berant, Jonathan, Frohberg, Jörg, Rozen, Jos, Hernandez-Orallo, Jose, Boudeman, Joseph, Guerr, Joseph, Jones, Joseph, Tenenbaum, Joshua B., Rule, Joshua S., Chua, Joyce, Kanclerz, Kamil, Livescu, Karen, Krauth, Karl, Gopalakrishnan, Karthik, 
Ignatyeva, Katerina, Markert, Katja, Dhole, Kaustubh D., Gimpel, Kevin, Omondi, Kevin, Mathewson, Kory, Chiafullo, Kristen, Shkaruta, Ksenia, Shridhar, Kumar, McDonell, Kyle, Richardson, Kyle, Reynolds, Laria, Gao, Leo, Zhang, Li, Dugan, Liam, Qin, Lianhui, Contreras-Ochando, Lidia, Morency, Louis-Philippe, Moschella, Luca, Lam, Lucas, Noble, Lucy, Schmidt, Ludwig, He, Luheng, Colón, Luis Oliveros, Metz, Luke, Şenel, Lütfi Kerem, Bosma, Maarten, Sap, Maarten, ter Hoeve, Maartje, Farooqi, Maheen, Faruqui, Manaal, Mazeika, Mantas, Baturan, Marco, Marelli, Marco, Maru, Marco, Quintana, Maria Jose Ramírez, Tolkiehn, Marie, Giulianelli, Mario, Lewis, Martha, Potthast, Martin, Leavitt, Matthew L., Hagen, Matthias, Schubert, Mátyás, Baitemirova, Medina Orduna, Arnaud, Melody, McElrath, Melvin, Yee, Michael A., Cohen, Michael, Gu, Michael, Ivanitskiy, Michael, Starritt, Michael, Strube, Michael, Swędrowski, Michał, Bevilacqua, Michele, Yasunaga, Michihiro, Kale, Mihir, Cain, Mike, Xu, Mimee, Suzgun, Mirac, Walker, Mitch, Tiwari, Mo, Bansal, Mohit, Aminnaseri, Moin, Geva, Mor, Gheini, Mozhdeh, T, Mukund Varma, Peng, Nanyun, Chi, Nathan A., Lee, Nayeon, Krakover, Neta Gur-Ari, Cameron, Nicholas, Roberts, Nicholas, Doiron, Nick, Martinez, Nicole, Nangia, Nikita, Deckers, Niklas, Muennighoff, Niklas, Keskar, Nitish Shirish, Iyer, Niveditha S., Constant, Noah, Fiedel, Noah, Wen, Nuan, Zhang, Oliver, Agha, Omar, Elbaghdadi, Omar, Levy, Omer, Evans, Owain, Casares, Pablo Antonio Moreno, Doshi, Parth, Fung, Pascale, Liang, Paul Pu, Vicol, Paul, Alipoormolabashi, Pegah, Liao, Peiyuan, Liang, Percy, Chang, Peter, Eckersley, Peter, Htut, Phu Mon, Hwang, Pinyu, Miłkowski, Piotr, Patil, Piyush, Pezeshkpour, Pouya, Oli, Priti, Mei, Qiaozhu, Lyu, Qing, Chen, Qinlang, Banjade, Rabin, Rudolph, Rachel Etta, Gabriel, Raefer, Habacker, Rahel, Risco, Ramon, Millière, Raphaël, Garg, Rhythm, Barnes, Richard, Saurous, Rif A., Arakawa, Riku, Raymaekers, Robbe, Frank, Robert, Sikand, Rohan, Novak, Roman, Sitelew, Roman, LeBras, Ronan, Liu, Rosanne, Jacobs, Rowan, Zhang, Rui, Salakhutdinov, Ruslan, Chi, Ryan, Lee, Ryan, Stovall, Ryan, Teehan, Ryan, Yang, Rylan, Singh, Sahib, Mohammad, Saif M., Anand, Sajant, Dillavou, Sam, Shleifer, Sam, Wiseman, Sam, Gruetter, Samuel, Bowman, Samuel R., Schoenholz, Samuel S., Han, Sanghyun, Kwatra, Sanjeev, Rous, Sarah A., Ghazarian, Sarik, Ghosh, Sayan, Casey, Sean, Bischoff, Sebastian, Gehrmann, Sebastian, Schuster, Sebastian, Sadeghi, Sepideh, Hamdan, Shadi, Zhou, Sharon, Srivastava, Shashank, Shi, Sherry, Singh, Shikhar, Asaadi, Shima, Gu, Shixiang Shane, Pachchigar, Shubh, Toshniwal, Shubham, Upadhyay, Shyam, Shyamolima, Debnath, Shakeri, Siamak, Thormeyer, Simon, Melzi, Simone, Reddy, Siva, Makini, Sneha Priscilla, Lee, Soo-Hwan, Torene, Spencer, Hatwar, Sriharsha, Dehaene, Stanislas, Divic, Stefan, Ermon, Stefano, Biderman, Stella, Lin, Stephanie, Prasad, Stephen, Piantadosi, Steven T., Shieber, Stuart M., Misherghi, Summer, Kiritchenko, Svetlana, Mishra, Swaroop, Linzen, Tal, Schuster, Tal, Li, Tao, Yu, Tao, Ali, Tariq, Hashimoto, Tatsu, Wu, Te-Lin, Desbordes, Théo, Rothschild, Theodore, Phan, Thomas, Wang, Tianle, Nkinyili, Tiberius, Schick, Timo, Kornev, Timofei, Tunduny, Titus, Gerstenberg, Tobias, Chang, Trenton, Neeraj, Trishala, Khot, Tushar, Shultz, Tyler, Shaham, Uri, Misra, Vedant, Demberg, Vera, Nyamai, Victoria, Raunak, Vikas, Ramasesh, Vinay, Prabhu, Vinay Uday, Padmakumar, Vishakh, Srikumar, Vivek, Fedus, William, Saunders, William, Zhang, William, Vossen, Wout, Ren, 
Xiang, Tong, Xiaoyu, Zhao, Xinran, Wu, Xinyi, Shen, Xudong, Yaghoobzadeh, Yadollah, Lakretz, Yair, Song, Yangqiu, Bahri, Yasaman, Choi, Yejin, Yang, Yichi, Hao, Yiding, Chen, Yifu, Belinkov, Yonatan, Hou, Yu, Hou, Yufang, Bai, Yuntao, Seid, Zachary, Zhao, Zhuoye, Wang, Zijian, Wang, Zijie J., Wang, Zirui, Wu, Ziyi, and Department of Philosophy
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, cs.LG, cs.CL, Machine Learning (stat.ML), cs.AI, stat.ML, Machine Learning (cs.LG), Computer Science - Computers and Society, Artificial Intelligence (cs.AI), Statistics - Machine Learning, Computers and Society (cs.CY), Computation and Language (cs.CL), cs.CY
- Abstract
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting., 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench
- Published
- 2022
4. Open-Domain Sign Language Translation Learned from Online Video
- Author
- Shi, Bowen, Brentari, Diane, Shakhnarovich, Greg, and Livescu, Karen
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computation and Language (cs.CL)
- Abstract
Existing work on sign language translation - that is, translation from sign language videos into sentences in a written language - has focused mainly on (1) data collected in a controlled environment or (2) data in a specific domain, which limits the applicability to real-world settings. In this paper, we introduce OpenASL, a large-scale American Sign Language (ASL) - English dataset collected from online video sites (e.g., YouTube). OpenASL contains 288 hours of ASL videos in multiple domains from over 200 signers and is the largest publicly available ASL translation dataset to date. To tackle the challenges of sign language translation in realistic settings and without glosses, we propose a set of techniques including sign search as a pretext task for pre-training and fusion of mouthing and handshape features. The proposed techniques produce consistent and large improvements in translation quality, over baseline models based on prior work. Our data and code are publicly available at https://github.com/chevalierNoir/OpenASL, EMNLP 2022
- Published
- 2022
5. Multistream articulatory feature-based models for visual speech recognition
- Author
- Saenko, Kate, Livescu, Karen, Glass, James, and Darrell, Trevor
- Subjects
Bayesian statistical decision theory -- Analysis, Voice recognition -- Analysis
- Published
- 2009
6. Pronunciation modeling using a finite-state transducer representation
- Author
- Hazen, Timothy J., Hetherington, I. Lee, Shu, Han, and Livescu, Karen
- Published
- 2005
- Full Text
- View/download PDF
7. Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches
- Author
- Settle, Shane and Livescu, Karen
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing), Computation and Language (cs.CL)
- Abstract
Acoustic word embeddings --- fixed-dimensional vector representations of variable-length spoken word segments --- have begun to be considered for tasks such as speech recognition and query-by-example search. Such embeddings can be learned discriminatively so that they are similar for speech segments corresponding to the same word, while being dissimilar for segments corresponding to different words. Recent work has found that acoustic word embeddings can outperform dynamic time warping on query-by-example search and related word discrimination tasks. However, the space of embedding models and training approaches is still relatively unexplored. In this paper we present new discriminative embedding models based on recurrent neural networks (RNNs). We consider training losses that have been successful in prior work, in particular a cross entropy loss for word classification and a contrastive loss that explicitly aims to separate same-word and different-word pairs in a "Siamese network" training setting. We find that both classifier-based and Siamese RNN embeddings improve over previously reported results on a word discrimination task, with Siamese RNNs outperforming classification models. In addition, we present analyses of the learned embeddings and the effects of variables such as dimensionality and network structure., To appear at SLT 2016
- Published
- 2016
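A note on item 7 above: the "Siamese network" training it describes compares pairs of acoustic word embeddings so that same-word segments score as more similar than different-word segments. Below is a minimal numpy sketch of one common margin-based version of such a contrastive objective; the cosine distance, the margin value, and the random stand-in embeddings are illustrative assumptions, not details drawn from the paper.

    import numpy as np

    def cos_sim(a, b):
        # Cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def contrastive_margin_loss(anchor, same_word, diff_word, margin=0.5):
        # Hinge loss: the same-word pair must beat the different-word pair
        # by at least `margin` in cosine similarity, otherwise incur a cost.
        return max(0.0, margin + cos_sim(anchor, diff_word) - cos_sim(anchor, same_word))

    # Toy usage with random vectors standing in for RNN embeddings of three segments.
    rng = np.random.default_rng(0)
    a, p, n = rng.normal(size=(3, 64))
    print(contrastive_margin_loss(a, p, n))

In a real system the embeddings would come from the recurrent encoder, and this per-triplet cost would be minimized over many sampled same-word and different-word pairs.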
8. Deep Variational Canonical Correlation Analysis
- Author
- Wang, Weiran, Yan, Xinchen, Lee, Honglak, and Livescu, Karen
- Subjects
FOS: Computer and information sciences, Computer Science - Learning, Machine Learning (cs.LG)
- Abstract
We present deep variational canonical correlation analysis (VCCA), a deep multi-view learning model that extends the latent variable model interpretation of linear CCA to nonlinear observation models parameterized by deep neural networks. We derive variational lower bounds of the data likelihood by parameterizing the posterior probability of the latent variables from the view that is available at test time. We also propose a variant of VCCA called VCCA-private that can, in addition to the "common variables" underlying both views, extract the "private variables" within each view, and disentangles the shared and private information for multi-view data without hard supervision. Experimental results on real-world datasets show that our methods are competitive across domains.
- Published
- 2016
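For orientation on item 8: with a single shared latent variable z inferred only from the view x that is available at test time, and with the two views assumed conditionally independent given z, the variational lower bound mentioned in the abstract takes the standard VAE-style form below (a sketch of the generic formulation, not the paper's exact notation):

    \mathcal{L}(x, y) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) + \log p_\theta(y \mid z) \right] \;-\; D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right) \;\le\; \log p_\theta(x, y)

The VCCA-private variant described in the abstract adds view-specific latent variables alongside the shared z, which in this kind of formulation brings corresponding additional approximate posteriors and KL terms.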
9. Large-Scale Approximate Kernel Canonical Correlation Analysis
- Author
- Wang, Weiran and Livescu, Karen
- Subjects
FOS: Computer and information sciences, Computer Science - Learning, Machine Learning (cs.LG)
- Abstract
Kernel canonical correlation analysis (KCCA) is a nonlinear multi-view representation learning technique with broad applicability in statistics and machine learning. Although there is a closed-form solution for the KCCA objective, it involves solving an $N\times N$ eigenvalue system where $N$ is the training set size, making its computational requirements in both memory and time prohibitive for large-scale problems. Various approximation techniques have been developed for KCCA. A commonly used approach is to first transform the original inputs to an $M$-dimensional random feature space so that inner products in the feature space approximate kernel evaluations, and then apply linear CCA to the transformed inputs. In many applications, however, the dimensionality $M$ of the random feature space may need to be very large in order to obtain a sufficiently good approximation; it then becomes challenging to perform the linear CCA step on the resulting very high-dimensional data matrices. We show how to use a stochastic optimization algorithm, recently proposed for linear CCA and its neural-network extension, to further alleviate the computation requirements of approximate KCCA. This approach allows us to run approximate KCCA on a speech dataset with $1.4$ million training samples and a random feature space of dimensionality $M=100000$ on a typical workstation., Published as a conference paper at International Conference on Learning Representations (ICLR) 2016
- Published
- 2015
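The two-step approximation summarized in item 9 (transform inputs to a random feature space whose inner products approximate kernel evaluations, then apply linear CCA) can be sketched compactly. The snippet below is a generic illustration using Gaussian-kernel random Fourier features and a closed-form regularized linear CCA; it is not the stochastic optimization algorithm the paper proposes, and the feature dimension, bandwidth, regularizer, and toy data are all placeholder assumptions.

    import numpy as np

    def random_fourier_features(X, M=512, sigma=1.0, seed=0):
        # Map X (N x d) into an M-dimensional space where dot products
        # approximate a Gaussian (RBF) kernel with bandwidth sigma.
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], M))
        b = rng.uniform(0.0, 2.0 * np.pi, size=M)
        return np.sqrt(2.0 / M) * np.cos(X @ W + b)

    def linear_cca(X, Y, k=10, reg=1e-3):
        # Closed-form linear CCA: whiten each (regularized) view covariance,
        # then take the SVD of the whitened cross-covariance.
        X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)
        N = X.shape[0]
        Sxx = X.T @ X / N + reg * np.eye(X.shape[1])
        Syy = Y.T @ Y / N + reg * np.eye(Y.shape[1])
        Sxy = X.T @ Y / N

        def inv_sqrt(S):
            # Inverse square root of a symmetric positive-definite matrix.
            vals, vecs = np.linalg.eigh(S)
            return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

        Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
        U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)
        return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]

    # Toy usage: two small random "views" standing in for paired acoustic data.
    rng = np.random.default_rng(1)
    X, Y = rng.normal(size=(200, 20)), rng.normal(size=(200, 15))
    A, B, corrs = linear_cca(random_fourier_features(X), random_fourier_features(Y))
    print(corrs)  # approximate canonical correlations in the kernelized space

At the scale described in the abstract (M on the order of 10^5), this closed-form CCA step is exactly what becomes impractical, which is the gap the paper's stochastic approach is meant to fill.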
10. From Paraphrase Database to Compositional Paraphrase Model and Back
- Author
- Wieting, John, Bansal, Mohit, Gimpel, Kevin, Livescu, Karen, and Roth, Dan
- Subjects
FOS: Computer and information sciences, Computer Science - Computation and Language, Computation and Language (cs.CL)
- Abstract
The Paraphrase Database (PPDB; Ganitkevitch et al., 2013) is an extensive semantic resource, consisting of a list of phrase pairs with (heuristic) confidence estimates. However, it is still unclear how it can best be used, due to the heuristic nature of the confidences and its necessarily incomplete coverage. We propose models to leverage the phrase pairs from the PPDB to build parametric paraphrase models that score paraphrase pairs more accurately than the PPDB's internal scores while simultaneously improving its coverage. They allow for learning phrase embeddings as well as improved word embeddings. Moreover, we introduce two new, manually annotated datasets to evaluate short-phrase paraphrasing models. Using our paraphrase model trained using PPDB, we achieve state-of-the-art results on standard word and bigram similarity tasks and beat strong baselines on our new short phrase paraphrase tasks., 2015 TACL paper updated with an appendix describing new 300 dimensional embeddings. Submitted 1/2015. Accepted 2/2015. Published 6/2015
- Published
- 2015
11. Differential Representation of Articulatory Gestures and Phonemes in Precentral and Inferior Frontal Gyri.
- Author
- Mugler, Emily M., Tate, Matthew C., Livescu, Karen, Templer, Jessica W., Goldrick, Matthew A., and Slutzky, Marc W.
- Subjects
PHONEME (Linguistics), GESTURE, BRAIN-computer interfaces, VOCAL tract, PRODUCTION control
- Abstract
Speech is a critical form of human communication and is central to our daily lives. Yet, despite decades of study, an understanding of the fundamental neural control of speech production remains incomplete. Current theories model speech production as a hierarchy from sentences and phrases down to words, syllables, speech sounds (phonemes), and the actions of vocal tract articulators used to produce speech sounds (articulatory gestures). Here, we investigate the cortical representation of articulatory gestures and phonemes in ventral precentral and inferior frontal gyri in men and women. Our results indicate that ventral precentral cortex represents gestures to a greater extent than phonemes, while inferior frontal cortex represents both gestures and phonemes. These findings suggest that speech production shares a common cortical representation with that of other types of movement, such as arm and hand movements. This has important implications both for our understanding of speech production and for the design of brain-machine interfaces to restore communication to people who cannot speak. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
12. End-to-End Neural Segmental Models for Speech Recognition.
- Author
- Tang, Hao, Lu, Liang, Kong, Lingpeng, Gimpel, Kevin, Livescu, Karen, Dyer, Chris, Smith, Noah A., and Renals, Steve
- Abstract
Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multistage versus end-to-end training and multitask training that combines segmental and frame-level losses. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
13. Signer-independent fingerspelling recognition with deep neural network adaptation.
- Author
- Kim, Taehwan, Wang, Weiran, Tang, Hao, and Livescu, Karen
- Published
- 2016
- Full Text
- View/download PDF
14. Deep convolutional acoustic word embeddings using word-pair side information.
- Author
- Kamper, Herman, Wang, Weiran, and Livescu, Karen
- Published
- 2016
- Full Text
- View/download PDF
15. Speech Production in Speech Technologies: Introduction to the CSL Special Issue
- Author
- Livescu, Karen, Rudzicz, Frank, Fosler-Lussier, Eric, Hasegawa-Johnson, Mark, and Bilmes, Jeff
- Published
- 2016
- Full Text
- View/download PDF
16. Unsupervised learning of acoustic features via deep canonical correlation analysis.
- Author
- Wang, Weiran, Arora, Raman, Livescu, Karen, and Bilmes, Jeff A.
- Published
- 2015
- Full Text
- View/download PDF
17. Stochastic optimization for deep CCA via nonlinear orthogonal iterations.
- Author
- Wang, Weiran, Arora, Raman, Livescu, Karen, and Srebro, Nathan
- Published
- 2015
- Full Text
- View/download PDF
18. Discriminative segmental cascades for feature-rich phone recognition.
- Author
- Tang, Hao, Wang, Weiran, Gimpel, Kevin, and Livescu, Karen
- Published
- 2015
- Full Text
- View/download PDF
19. Multi-view learning with supervision for transformed bottleneck features.
- Author
- Arora, Raman and Livescu, Karen
- Published
- 2014
- Full Text
- View/download PDF
20. Reconstruction of articulatory measurements with smoothed low-rank matrix completion.
- Author
- Wang, Weiran, Arora, Raman, and Livescu, Karen
- Published
- 2014
- Full Text
- View/download PDF
21. Discriminative articulatory models for spoken term detection in low-resource conversational settings.
- Author
- Prabhavalkar, Rohit, Livescu, Karen, Fosler-Lussier, Eric, and Keshet, Joseph
- Published
- 2013
- Full Text
- View/download PDF
22. Multi-view CCA-based acoustic features for phonetic recognition across speakers and domains.
- Author
- Arora, Raman and Livescu, Karen
- Published
- 2013
- Full Text
- View/download PDF
23. Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings.
- Author
- Levin, Keith, Henry, Katharine, Jansen, Aren, and Livescu, Karen
- Abstract
Measures of acoustic similarity between words or other units are critical for segmental exemplar-based acoustic models, spoken term discovery, and query-by-example search. Dynamic time warping (DTW) alignment cost has been the most commonly used measure, but it has well-known inadequacies. Some recently proposed alternatives require large amounts of training data. In the interest of finding more efficient, accurate, and low-resource alternatives, we consider the problem of embedding speech segments of arbitrary length into fixed-dimensional spaces in which simple distances (such as cosine or Euclidean) serve as a proxy for linguistically meaningful (phonetic, lexical, etc.) dissimilarities. Such embeddings would enable efficient audio indexing and permit application of standard distance learning techniques to segmental acoustic modeling. In this paper, we explore several supervised and unsupervised approaches to this problem and evaluate them on an acoustic word discrimination task. We identify several embedding algorithms that match or improve upon the DTW baseline in low-resource settings. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
24. Stochastic optimization for PCA and PLS.
- Author
- Arora, Raman, Cotter, Andrew, Livescu, Karen, and Srebro, Nathan
- Published
- 2012
- Full Text
- View/download PDF
25. American sign language fingerspelling recognition with phonological feature-based tandem models.
- Author
- Kim, Taehwan, Livescu, Karen, and Shakhnarovich, Gregory
- Abstract
We study the recognition of fingerspelling sequences in American Sign Language from video using tandem-style models, in which the outputs of multilayer perceptron (MLP) classifiers are used as observations in a hidden Markov model (HMM)-based recognizer. We compare a baseline HMM-based recognizer, a tandem recognizer using MLP letter classifiers, and a tandem recognizer using MLP classifiers of phonological features. We present experiments on a database of fingerspelling videos. We find that the tandem approaches outperform an HMM-based baseline, and that phonological feature-based tandem models outperform letter-based tandem models. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
26. A factored conditional random field model for articulatory feature forced transcription.
- Author
- Prabhavalkar, Rohit, Fosler-Lussier, Eric, and Livescu, Karen
- Abstract
We investigate joint models of articulatory features and apply these models to the problem of automatically generating articulatory transcriptions of spoken utterances given their word transcriptions. The task is motivated by the need for larger amounts of labeled articulatory data for both speech recognition and linguistics research, which is costly and difficult to obtain through manual transcription or physical measurement. Unlike phonetic transcription, in our task it is important to account for the fact that the articulatory features can desynchronize. We consider factored models of the articulatory state space with an explicit model of articulator asynchrony. We compare two types of graphical models: a dynamic Bayesian network (DBN), based on previously proposed models; and a conditional random field (CRF), which we develop here. We demonstrate how task-specific constraints can be leveraged to allow for efficient exact inference in the CRF. On the transcription task, the CRF outperforms the DBN, with relative improvements of 2.2% to 10.0%. [ABSTRACT FROM PUBLISHER]
- Published
- 2011
- Full Text
- View/download PDF
27. Multi-view clustering via canonical correlation analysis.
- Author
- Chaudhuri, Kamalika, Kakade, Sham M., Livescu, Karen, and Sridharan, Karthik
- Published
- 2009
- Full Text
- View/download PDF
28. Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches.
- Author
- Livescu, Karen, Fosler-Lussier, Eric, and Metze, Florian
- Abstract
Modern automatic speech recognition systems handle large vocabularies of words, making it infeasible to collect enough repetitions of each word to train individual word models. Instead, large-vocabulary recognizers represent each word in terms of subword units. Typically the subword unit is the phone, a basic speech sound such as a single consonant or vowel. Each word is then represented as a sequence, or several alternative sequences, of phones specified in a pronunciation dictionary. Other choices of subword units have been studied as well. The choice of subword units, and the way in which the recognizer represents words in terms of combinations of those units, is the problem of subword modeling. Different subword models may be preferable in different settings, such as high-variability conversational speech, high-noise conditions, low-resource settings, or multilingual speech recognition. This article reviews past, present, and emerging approaches to subword modeling. To make clean comparisons between many approaches, the review uses the unifying language of graphical models. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
29. Speech production knowledge in automatic speech recognition.
- Author
- King, Simon, Frankel, Joe, Livescu, Karen, McDermott, Erik, Richmond, Korin, and Wester, Mirjam
- Subjects
SPEECH perception, AUTOMATIC speech recognition, PATTERN recognition systems, PERCEPTRONS, SPEECH processing systems
- Abstract
Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
30. Word level language identification in online multilingual communication
- Author
- Nguyen, Dong, Dogruöz, A. Seza, Yarowsky, David, Baldwin, Timothy, Korhonen, Anna, Livescu, Karen, and Bethard, Steven
- Subjects
automatic language identification, social media, Turkish, Lt3, code-switching, multilingual, Dutch, Languages and Literatures
- Abstract
Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.
- Published
- 2013
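Item 30's two-stage recipe (label each word independently with per-language dictionaries or language models, then use the surrounding context to resolve the remaining cases) is straightforward to prototype. The sketch below is purely illustrative: the tiny Dutch and Turkish word lists, the window size, and the neighbor-vote rule are assumptions for demonstration, not the models or data used in the paper.

    # Toy word-level language ID: dictionary lookup per word, then a context pass.
    DUTCH = {"ik", "heb", "een", "mooi", "huis", "en", "de"}
    TURKISH = {"ben", "bir", "ev", "ve", "güzel", "çok"}

    def tag_word(word):
        # Stage 1: label each word on its own from the dictionaries.
        w = word.lower()
        if w in DUTCH and w not in TURKISH:
            return "nl"
        if w in TURKISH and w not in DUTCH:
            return "tr"
        return "unk"  # out-of-vocabulary or ambiguous

    def tag_sentence(words):
        # Stage 2: give unknown words the majority label of nearby tagged words.
        tags = [tag_word(w) for w in words]
        for i, t in enumerate(tags):
            if t == "unk":
                nearby = [x for x in tags[max(0, i - 2):i + 3] if x != "unk"]
                tags[i] = max(set(nearby), key=nearby.count) if nearby else "unk"
        return list(zip(words, tags))

    print(tag_sentence("ik heb een güzel ev ve selamlar".split()))

Real systems replace the hand-made dictionaries with character or word language models and learn the context weighting, but the two stages mirror the pipeline the abstract describes.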
31. LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS HOPKINS SUMMER WORKSHOP.
- Author
- Hasegawa-Johnson M, Baker J, Borys S, Chen K, Coogan E, Greenberg S, Juneja A, Kirchhoff K, Livescu K, Mohan S, Muller J, Sonmez K, and Wang T
- Abstract
Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
- Published
- 2005
- Full Text
- View/download PDF