Author: "Church, Kenneth" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Church, Kenneth"' showing total 626 results

1. No Culture Left Behind: ArtELingo-28, a Benchmark of WikiArt with Captions in 28 Languages

Author: Mohamed, Youssef, Li, Runjia, Ahmad, Ibrahim Said, Haydarov, Kilichbek, Torr, Philip, Church, Kenneth Ward, and Elhoseiny, Mohamed
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: Research in vision and language has made considerable progress thanks to benchmarks such as COCO. COCO captions focused on unambiguous facts in English; ArtEmis introduced subjective emotions and ArtELingo introduced some multilinguality (Chinese and Arabic). However, we believe there should be more multilinguality. Hence, we present ArtELingo-28, a vision-language benchmark that spans 28 languages and encompasses approximately 200,000 annotations (140 annotations per image). Traditionally, vision research focused on unambiguous class labels, whereas ArtELingo-28 emphasizes diversity of opinions over languages and cultures. The challenge is to build machine learning systems that assign emotional captions to images. Baseline results will be presented for three novel conditions: Zero-Shot, Few-Shot and One-vs-All Zero-Shot. We find that cross-lingual transfer is more successful for culturally-related languages. Data and code are provided at www.artelingo.org., Comment: 9 pages, Accepted at EMNLP 24, for more details see www.artelingo.org
Published: 2024

2. On Translating Technical Terminology: A Translation Workflow for Machine-Translated Acronyms

Author: Yue, Richard, Ortega, John E., and Church, Kenneth Ward
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: The typical workflow for a professional translator to translate a document from its source language (SL) to a target language (TL) is not always focused on what many language models in natural language processing (NLP) do - predict the next word in a series of words. While high-resource languages like English and French are reported to achieve near human parity using common metrics for measurement such as BLEU and COMET, we find that an important step is being missed: the translation of technical terms, specifically acronyms. Some publicly available state-of-the-art machine translation systems, like Google Translate, can be erroneous when dealing with acronyms - as much as 50% in our findings. This article addresses acronym disambiguation for MT systems by proposing an additional step to the SL-TL (FR-EN) translation workflow where we first offer a new acronym corpus for public consumption and then experiment with a search-based thresholding algorithm that achieves nearly 10% increase when compared to Google Translate and OpusMT., Comment: AMTA 2024 - The Association for Machine Translation in the Americas organizes biennial conferences devoted to researchers, commercial users, governmental and NGO users
Published: 2024

3. Academic Article Recommendation Using Multiple Perspectives

Author: Church, Kenneth, Alonso, Omar, Vickers, Peter, Sun, Jiameng, Ebrahimi, Abteen, and Chandrasekar, Raman
Subjects: Computer Science - Information Retrieval
Abstract: We argue that Content-based filtering (CBF) and Graph-based methods (GB) complement one another in Academic Search recommendations. The scientific literature can be viewed as a conversation between authors and the audience. CBF uses abstracts to infer authors' positions, and GB uses citations to infer responses from the audience. In this paper, we describe nine differences between CBF and GB, as well as synergistic opportunities for hybrid combinations. Two embeddings will be used to illustrate these opportunities: (1) Specter, a CBF method based on BERT-like deepnet encodings of abstracts, and (2) ProNE, a GB method based on spectral clustering of more than 200M papers and 2B citations from Semantic Scholar.
Published: 2024

4. Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT

Author: Ahmad, Ibrahim Said, Dudy, Shiran, Ramachandranpillai, Resmi, and Church, Kenneth
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa's culture and emotions. We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions. We conducted experiments using emotion analysis and applied two similarity metrics to measure the alignment between human and ChatGPT responses. We also collected human participants' ratings and feedback on ChatGPT responses. Our results show that ChatGPT has some level of similarity to human responses, but also exhibits some gaps and biases in its knowledge and awareness of the Hausa culture and emotions. We discuss the implications and limitations of our methodology and analysis and suggest ways to improve the performance and evaluation of LLMs for low-resource languages.
Published: 2024

5. Since the Scientific Literature Is Multilingual, Our Models Should Be Too

Author: Ebrahimi, Abteen and Church, Kenneth
Subjects: Computer Science - Computation and Language
Abstract: English has long been assumed the lingua franca of scientific research, and this notion is reflected in the natural language processing (NLP) research involving scientific document representation. In this position piece, we quantitatively show that the literature is largely multilingual and argue that current models and benchmarks should reflect this linguistic diversity. We provide evidence that text-based models fail to create meaningful representations for non-English papers and highlight the negative user-facing impacts of using English-only models indiscriminately across a multilingual domain. We end with suggestions for the NLP community on how to improve performance on non-English documents.
Published: 2024

6. ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture

Author: Mohamed, Youssef, Abdelfattah, Mohamed, Alhuwaider, Shyma, Li, Feifan, Zhang, Xiangliang, Church, Kenneth Ward, and Elhoseiny, Mohamed
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society, Computer Science - Machine Learning
Abstract: This paper introduces ArtELingo, a new benchmark and dataset, designed to encourage work on diversity across languages and cultures. Following ArtEmis, a collection of 80k artworks from WikiArt with 0.45M emotion labels and English-only captions, ArtELingo adds another 0.79M annotations in Arabic and Chinese, plus 4.8K in Spanish to evaluate "cultural-transfer" performance. More than 51K artworks have 5 annotations or more in 3 languages. This diversity makes it possible to study similarities and differences across languages and cultures. Further, we investigate captioning tasks, and find diversity improves the performance of baseline models. ArtELingo is publicly available at https://www.artelingo.org/ with standard splits and baseline models. We hope our work will help ease future research on multilinguality and culturally-aware AI., Comment: 9 pages, Accepted at EMNLP 22, for more details see https://www.artelingo.org/
Published: 2022

7. Data-Driven Adaptive Simultaneous Machine Translation

Author: Xun, Guangxu, Ma, Mingbo, Bian, Yuchen, Cai, Xingyu, Huang, Jiaji, Zheng, Renjie, Chen, Junkun, Yuan, Jiahong, Church, Kenneth, and Huang, Liang
Subjects: Computer Science - Computation and Language
Abstract: In simultaneous translation (SimulMT), the most widely used strategy is the wait-k policy thanks to its simplicity and effectiveness in balancing translation quality and latency. However, wait-k suffers from two major limitations: (a) it is a fixed policy that cannot adaptively adjust latency given context, and (b) its training is much slower than full-sentence translation. To alleviate these issues, we propose a novel and efficient training scheme for adaptive SimulMT by augmenting the training corpus with adaptive prefix-to-prefix pairs, while the training complexity remains the same as that of training full-sentence translation models. Experiments on two language pairs show that our method outperforms all strong baselines in terms of translation quality and latency.
Published: 2022
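
Note on the wait-k policy named in this abstract: as a minimal sketch of the fixed schedule that the abstract contrasts with adaptive policies, the code below reads k source tokens, then alternates one write with one read until the source is exhausted, and finally writes the remaining target tokens. The function name `wait_k_schedule` and the assumption that the target length is known in advance are illustrative choices, not taken from the record. It does not implement the adaptive training scheme the paper proposes.

```python
def wait_k_schedule(k, source_len, target_len):
    """Return the READ/WRITE action sequence of a fixed wait-k policy.

    Hypothetical helper for illustration only: a real system decides when
    to stop writing from the model output, not from a known target length.
    """
    actions = []
    read = 0
    written = 0
    while written < target_len:
        # Before writing target token number `written` (0-based), a wait-k
        # policy wants k + written source tokens, capped at the source length.
        needed = min(k + written, source_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
        written += 1
    return actions


if __name__ == "__main__":
    print(wait_k_schedule(k=2, source_len=4, target_len=4))
```

A larger k delays the first WRITE (higher latency) but gives the model more source context for every target token, which is the fixed quality/latency trade-off the abstract refers to.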

8. Efficiently Disentangle Causal Representations

Author: Li, Yuanpeng, Hestness, Joel, Elhoseiny, Mohamed, Zhao, Liang, and Church, Kenneth
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: This paper proposes an efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions. We approximate the difference with models' generalization abilities so that it fits in the standard machine learning framework and can be efficiently computed. In contrast to the state-of-the-art approach, which relies on the learner's adaptation speed to new distribution, the proposed approach only requires evaluating the model's generalization ability. We provide a theoretical explanation for the advantage of the proposed method, and our experiments show that the proposed technique is 1.9 to 11.0 times more sample efficient and 9.4 to 32.4 times quicker than the previous method on various tasks. The source code is available at https://github.com/yuanpeng16/EDCR., Comment: 17 pages, 7 figures
Published: 2022

9. Exploiting a Zoo of Checkpoints for Unseen Tasks

Author: Huang, Jiaji, Qiu, Qiang, and Church, Kenneth
Subjects: Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: There are so many models in the literature that it is difficult for practitioners to decide which combinations are likely to be effective for a new task. This paper attempts to address this question by capturing relationships among checkpoints published on the web. We model the space of tasks as a Gaussian process. The covariance can be estimated from checkpoints and unlabeled probing data. With the Gaussian process, we can identify representative checkpoints by a maximum mutual information criterion. This objective is submodular. A greedy method identifies representatives that are likely to "cover" the task space. These representatives generalize to new tasks with superior performance. Empirical evidence is provided for applications from both computational linguistics and computer vision., Comment: Accepted in NeurIPS 2021
Published: 2021

10. The Role of Phonetic Units in Speech Emotion Recognition

Author: Yuan, Jiahong, Cai, Xingyu, Zheng, Renjie, Huang, Liang, and Church, Kenneth
Subjects: Computer Science - Computation and Language
Abstract: We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. Different types of phonetic units are employed and compared in terms of accuracy and robustness of emotion recognition within and across datasets and languages. Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated in speech emotion recognition. The best performance is from using broad phonetic classes. Further research is needed to investigate the optimal set of broad phonetic classes for the task of emotion recognition. Finally, we found that Wav2vec 2.0 can be fine-tuned to recognize coarser-grained or larger phonetic units than phonemes, such as broad phonetic classes and syllables.
Published: 2021

11. Decoupling recognition and transcription in Mandarin ASR

Author: Yuan, Jiahong, Cai, Xingyu, Gao, Dongji, Zheng, Renjie, Huang, Liang, and Church, Kenneth
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Much of the recent literature on automatic speech recognition (ASR) is taking an end-to-end approach. Unlike English where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription of standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far., Comment: submitted to ASRU 2021
Published: 2021
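
Note on the CER figure quoted in this abstract: character error rate is conventionally the character-level edit distance between a reference transcript and the system output, divided by the reference length. The sketch below shows that standard definition; the function names and the example strings are illustrative assumptions, and the record does not say which scoring tool the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # delete a reference character
                cur[j - 1] + 1,          # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one wrong character out of four.
    print(cer("你好世界", "你好时界"))
```

Because the distance counts insertions as well as substitutions and deletions, CER can exceed 1.0 when the hypothesis is much longer than the reference.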

12. Automatic recognition of suprasegmentals in speech

Author: Yuan, Jiahong, Ryant, Neville, Cai, Xingyu, Church, Kenneth, and Liberman, Mark
Subjects: Computer Science - Computation and Language
Abstract: This study reports our efforts to improve automatic recognition of suprasegmentals by fine-tuning wav2vec 2.0 with CTC, a method that has been successful in automatic speech recognition. We demonstrate that the method can improve the state-of-the-art on automatic recognition of syllables, tones, and pitch accents. Utilizing segmental information, by employing tonal finals or tonal syllables as recognition units, can significantly improve Mandarin tone recognition. Language models are helpful when tonal syllables are used as recognition units, but not helpful when tones are recognition units. Finally, Mandarin tone recognition can benefit from English phoneme recognition by combining the two tasks in fine-tuning wav2vec 2.0., Comment: submitted to ASRU 2021
Published: 2021

13. Better than BERT but Worse than Baseline

Author: Liu, Boxiang, Huang, Jiaji, Cai, Xingyu, and Church, Kenneth
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: This paper compares BERT-SQuAD and Ab3P on the Abbreviation Definition Identification (ADI) task. ADI inputs a text and outputs short forms (abbreviations/acronyms) and long forms (expansions). BERT with reranking improves over BERT without reranking but fails to reach the Ab3P rule-based baseline. What is BERT missing? Reranking introduces two new features: charmatch and freq. The first feature identifies opportunities to take advantage of character constraints in acronyms and the second feature identifies opportunities to take advantage of frequency constraints across documents., Comment: 6 pages, 2 figures, 5 tables
Published: 2021

14. The Third DIHARD Diarization Challenge

Author: Ryant, Neville, Singh, Prachi, Krishnamohan, Venkat, Varma, Rajat, Church, Kenneth, Cieri, Christopher, Du, Jun, Ganapathy, Sriram, and Liberman, Mark
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain. Speaker diarization was evaluated under two speech activity conditions (diarization from a reference speech activity vs. diarization from scratch) and 11 diverse domains. The domains span a range of recording conditions and interaction types, including read audio-books, meeting speech, clinical interviews, web videos, and, for the first time, conversational telephone speech. A total of 30 organizations (forming 21 teams) from industry and academia submitted 499 valid system outputs. The evaluation results indicate that speaker diarization has improved markedly since DIHARD I, particularly for two-party interactions, but that for many domains (e.g., web video) the problem remains far from solved., Comment: arXiv admin note: text overlap with arXiv:1906.07839
Published: 2020

15. Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training

Author: Zheng, Renjie, Ma, Mingbo, Zheng, Baigong, Liu, Kaibo, Yuan, Jiahong, Church, Kenneth, and Huang, Liang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Simultaneous speech-to-speech translation is widely useful but extremely challenging, since it needs to generate target-language speech concurrently with the source-language speech, with only a few seconds delay. In addition, it needs to continuously translate a stream of sentences, but all recent solutions merely focus on the single-sentence scenario. As a result, current approaches accumulate latencies progressively when the speaker talks faster, and introduce unnatural pauses when the speaker talks slower. To overcome these issues, we propose Self-Adaptive Translation (SAT) which flexibly adjusts the length of translations to accommodate different source speech rates. At similar levels of translation quality (as measured by BLEU), our method generates more fluent target speech (as measured by the naturalness metric MOS) with substantially lower latency than the baseline, in both Zh <-> En directions., Comment: 10 pages, accepted by Findings of EMNLP 2020
Published: 2020

16. Third DIHARD Challenge Evaluation Plan

Author: Ryant, Neville, Church, Kenneth, Cieri, Christopher, Du, Jun, Ganapathy, Sriram, and Liberman, Mark
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: This paper introduces the third DIHARD challenge, the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain. The challenge comprises two tracks evaluating diarization performance when starting from a reference speech segmentation (track 1) and diarization from scratch on raw audio (track 2). We describe the task, metrics, datasets, and evaluation protocol., Comment: Version 1.2 - Planned schedule updated - Updated numbers in tables from final versions of development/evaluation sets - Corrected typo
Published: 2020

17. Exploring Long Tail Visual Relationship Recognition with Large Vocabulary

Author: Abdelkarim, Sherif, Agarwal, Aniket, Achlioptas, Panos, Chen, Jun, Huang, Jiaji, Li, Boyang, Church, Kenneth, and Elhoseiny, Mohamed
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning, I.2.10, I.5.0, I.4.0
Abstract: Several approaches have been proposed in recent literature to alleviate the long-tail problem, mainly in object classification tasks. In this paper, we make the first large-scale study concerning the task of Long-Tail Visual Relationship Recognition (LTVRR). LTVRR aims at improving the learning of structured visual relationships that come from the long-tail (e.g., "rabbit grazing on grass"). In this setup, the subject, relation, and object classes each follow a long-tail distribution. To begin our study and make a future benchmark for the community, we introduce two LTVRR-related benchmarks, dubbed VG8K-LT and GQA-LT, built upon the widely used Visual Genome and GQA datasets. We use these benchmarks to study the performance of several state-of-the-art long-tail models on the LTVRR setup. Lastly, we propose a visiolinguistic hubless (VilHub) loss and a Mixup augmentation technique adapted to LTVRR setup, dubbed as RelMix. Both VilHub and RelMix can be easily integrated on top of existing models and despite being simple, our results show that they can remarkably improve the performance, especially on tail classes. Benchmarks, code, and models have been made available at: https://github.com/Vision-CAIR/LTVRR.
Published: 2020

18. Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework

Author: Ma, Mingbo, Zheng, Baigong, Liu, Kaibo, Zheng, Renjie, Liu, Hairong, Peng, Kainan, Church, Kenneth, and Huang, Liang
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, where neural methods became capable of producing audio with high naturalness. However, these efforts still suffer from two types of latencies: (a) the computational latency (synthesizing time), which grows linearly with the sentence length even with parallel approaches, and (b) the input latency in scenarios where the input text is incrementally generated (such as in simultaneous translation, dialog generation, and assistive technologies). To reduce these latencies, we devise the first neural incremental TTS approach based on the recently proposed prefix-to-prefix framework. We synthesize speech in an online fashion, playing a segment of audio while generating the next, resulting in an $O(1)$ rather than $O(n)$ latency., Comment: Findings of EMNLP 2020
Published: 2019

19. The Second DIHARD Diarization Challenge: Dataset, task, and baselines

Author: Ryant, Neville, Church, Kenneth, Cieri, Christopher, Cristia, Alejandrina, Du, Jun, Ganapathy, Sriram, and Liberman, Mark
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language
Abstract: This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain. The challenge comprises four tracks evaluating diarization performance under two input conditions (single channel vs. multi-channel) and two segmentation conditions (diarization from a reference speech segmentation vs. diarization from scratch). In order to prevent participants from overtuning to a particular combination of recording conditions and conversational domain, recordings are drawn from a variety of sources ranging from read audiobooks to meeting speech, to child language acquisition recordings, to dinner parties, to web video. We describe the task and metrics, challenge design, datasets, and baseline systems for speech enhancement, speech activity detection, and diarization., Comment: Accepted by Interspeech 2019
Published: 2019

20. Language Modeling at Scale

Author: Patwary, Mostofa, Chabbi, Milind, Jun, Heewoo, Huang, Jiaji, Diamos, Gregory, and Church, Kenneth
Subjects: Computer Science - Computation and Language
Abstract: We show how Zipf's Law can be used to scale up language modeling (LM) to take advantage of more training data and more GPUs. LM plays a key role in many important natural language applications such as speech recognition and machine translation. Scaling up LM is important since it is widely accepted by the community that there is no data like more data. Eventually, we would like to train on terabytes (TBs) of text (trillions of words). Modern training methods are far from this goal, because of various bottlenecks, especially memory (within GPUs) and communication (across GPUs). This paper shows how Zipf's Law can address these bottlenecks by grouping parameters for common words and character sequences, because $U \ll N$, where $U$ is the number of unique words (types) and $N$ is the size of the training set (tokens). For a local batch size $K$ with $G$ GPUs and a $D$-dimension embedding matrix, we reduce the original per-GPU memory and communication asymptotic complexity from $\Theta(GKD)$ to $\Theta(GK + UD)$. Empirically, we find $U \propto (GK)^{0.64}$ on four publicly available large datasets. When we scale up the number of GPUs to 64, a factor of 8, training time speeds up by factors up to 6.7$\times$ (for character LMs) and 6.3$\times$ (for word LMs) with negligible loss of accuracy. Our weak scaling on 192 GPUs on the Tieba dataset shows a 35% improvement in LM prediction accuracy by training on 93 GB of data (2.5$\times$ larger than publicly available SOTA dataset), but taking only 1.25$\times$ increase in training time, compared to 3 GB of the same dataset running on 6 GPUs.
Published: 2018
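
Note on the complexity claim in this abstract: the sketch below plugs numbers into the two expressions quoted above, $GKD$ and $GK + UD$ with $U \propto (GK)^{0.64}$, to show how the gap can open up. The values of G, K and D are hypothetical, and the proportionality constant `c` is set to 1 purely for illustration (the abstract gives only the exponent, and Theta notation hides constants), so the printed quantities indicate growth trends rather than measured memory.

```python
def per_gpu_costs(G, K, D, exponent=0.64, c=1.0):
    """Evaluate the two asymptotic expressions from the abstract.

    G: number of GPUs, K: local batch size, D: embedding dimension.
    `c` is an assumed proportionality constant for U = c * (G*K)**exponent.
    """
    batch_tokens = G * K
    U = c * batch_tokens ** exponent   # unique words seen in the global batch
    dense = G * K * D                  # Theta(GKD): one embedding row per token
    grouped = G * K + U * D            # Theta(GK + UD): one row per unique word
    return U, dense, grouped


if __name__ == "__main__":
    U, dense, grouped = per_gpu_costs(G=64, K=1024, D=512)
    print(f"U ~ {U:.0f}, GKD = {dense}, GK + UD ~ {grouped:.0f}")
    print(f"ratio ~ {dense / grouped:.1f}")
```

Since U grows sublinearly in GK, the UD term grows more slowly than GKD as GPUs are added, which is the mechanism the abstract attributes to Zipf's Law.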

21. Progress in Machine Translation

Author: Wang, Haifeng, Wu, Hua, He, Zhongjun, Huang, Liang, and Church, Kenneth Ward
Published: 2022

22. Some Useful Things to Know When Combining IR and NLP: The Easy, the Hard and the Ugly

Author: Alonso, Omar, primary and Church, Kenneth, additional
Published: 2024

23. Statistical Models for Natural Language Processing

Author: Church, Kenneth and Mitkov, Ruslan, book editor
Published: 2022

24. Emerging trends: a gentle introduction to RAG.

Author: Church, Kenneth Ward, Sun, Jiameng, Yue, Richard, Vickers, Peter, Saba, Walid, and Chandrasekar, Raman
Subjects: LEGISLATIVE bills, UPLOADING of data, KNOWLEDGE base, CUSTOMER services, HALLUCINATIONS
Abstract: Retrieval-augmented generation (RAG) adds a simple but powerful feature to chatbots, the ability to upload files just-in-time. Chatbots are trained on large quantities of public data. The ability to upload files just-in-time makes it possible to reduce hallucinations by filling in gaps in the knowledge base that go beyond the public training data such as private data and recent events. For example, in a customer service scenario, with RAG, we can upload your private bill and then the bot can discuss questions about your bill as opposed to generic FAQ questions about bills in general. This tutorial will show how to upload files and generate responses to prompts; see https://github.com/kwchurch/RAG for multiple solutions based on tools from OpenAI, LangChain, HuggingFace transformers and VecML.
Published: 2024

25. Emerging trends: When can users trust GPT, and when should they intervene?

Author: Church, Kenneth, primary
Published: 2024

26. Using Statistics in Lexical Analysis

Author: Church, Kenneth, primary, Gale, William, additional, Hanks, Patrick, additional, and Hindle, Donald, additional
Published: 2021

27. Minsky, Chomsky and Deep Nets

Author: Church, Kenneth Ward, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Pandu Rangan, C., Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Sojka, Petr, editor, Horák, Aleš, editor, Kopeček, Ivan, editor, and Pala, Karel, editor
Published: 2018

28. 8. Jihad

Author: Church, Kenneth, primary
Published: 2020

29. An Additively Manufactured CPW-back-fed Wideband Circularly-Polarized Radix Metasurface Patch Antenna for X-Band Space Applications

Author: O’Keefe, John, primary, Roberts, Bake, additional, Gray, Bryce, additional, Church, Kenneth, additional, and Rojas-Nastrucci, Eduardo A., additional
Published: 2023

30. Emerging trends: Smooth-talking machines

Author: Church, Kenneth Ward, primary and Yue, Richard, additional
Published: 2023

31. Microdispensing Processes

Author: Church, Kenneth, primary
Published: 2020

32. Nonlinear Estimators and Tail Bounds for Dimension Reduction in $l_1$ Using Cauchy Random Projections

Author: Li, Ping, Hastie, Trevor J., and Church, Kenneth W.
Subjects: Computer Science - Data Structures and Algorithms, Computer Science - Information Retrieval, Computer Science - Learning
Abstract: For dimension reduction in $l_1$, the method of Cauchy random projections multiplies the original data matrix $\mathbf{A} \in\mathbb{R}^{n\times D}$ with a random matrix $\mathbf{R} \in \mathbb{R}^{D\times k}$ ($k\ll\min(n,D)$) whose entries are i.i.d. samples of the standard Cauchy C(0,1). Because of the impossibility results, one cannot hope to recover the pairwise $l_1$ distances in $\mathbf{A}$ from $\mathbf{B} = \mathbf{AR} \in \mathbb{R}^{n\times k}$, using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. The sample median estimator and the geometric mean estimator are asymptotically (as $k\to \infty$) equivalent but the latter is more accurate at small $k$. We derive explicit tail bounds for the geometric mean estimator and establish an analog of the Johnson-Lindenstrauss (JL) lemma for dimension reduction in $l_1$, which is weaker than the classical JL lemma for dimension reduction in $l_2$. Asymptotically, both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.
Published: 2006
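
Note on the projection step described in this abstract: the sketch below projects two rows with a standard Cauchy matrix and estimates their $l_1$ distance with the plain sample median of the absolute projected differences. It relies on two standard facts: a Cauchy-weighted sum of the coordinate differences is Cauchy with scale equal to the $l_1$ distance, and the median of the absolute value of such a variable equals that scale. This is the uncorrected median only; the bias-corrected estimators proposed in the paper are not reproduced here, and all function names and sizes are illustrative assumptions.

```python
import math
import random
import statistics


def cauchy_matrix(D, k, rng):
    """D x k matrix of i.i.d. standard Cauchy C(0,1) samples (inverse-CDF method)."""
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(k)]
            for _ in range(D)]


def project(row, R):
    """Compute row @ R for a length-D row and a D x k matrix R."""
    k = len(R[0])
    return [sum(a * R[i][j] for i, a in enumerate(row)) for j in range(k)]


def median_l1_estimate(b1, b2):
    """Plain (not bias-corrected) sample median estimate of the l1 distance."""
    return statistics.median(abs(x - y) for x, y in zip(b1, b2))


if __name__ == "__main__":
    rng = random.Random(0)
    D, k = 50, 2001                      # hypothetical sizes
    u = [rng.uniform(-1, 1) for _ in range(D)]
    v = [rng.uniform(-1, 1) for _ in range(D)]
    R = cauchy_matrix(D, k, rng)
    true_l1 = sum(abs(a - b) for a, b in zip(u, v))
    print(true_l1, median_l1_estimate(project(u, R), project(v, R)))
```

A sample mean of the absolute differences would not work here because the Cauchy distribution has no finite mean, which is one reason the abstract turns to nonlinear estimators.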

33. Corpus Methods in a Digitized World

Author: Church, Kenneth Ward, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, and Mitkov, Ruslan, editor
Published: 2017

34. Improved Contextualized Speech Representations for Tonal Analysis

Author: Yuan, Jiahong, primary, Cai, Xingyu, additional, and Church, Kenneth, additional
Published: 2023

35. An Example of (Too Much) Hyper-Parameter Tuning In Suicide Ideation Detection

Author: Marie Schoene, Annika, primary, Ortega, John, additional, Amir, Silvio, additional, and Church, Kenneth, additional
Published: 2023

36. K-vec: A New Approach for Aligning Parallel Texts

Author: Fung, Pascale and Church, Kenneth
Subjects: Computer Science - Computation and Language
Abstract: Various methods have been proposed for aligning texts in two or more languages such as the Canadian Parliamentary Debates (Hansards). Some of these methods generate a bilingual lexicon as a by-product. We present an alternative alignment strategy which we call K-vec, that starts by estimating the lexicon. For example, it discovers that the English word "fisheries" is similar to the French "pêches" by noting that the distribution of "fisheries" in the English text is similar to the distribution of "pêches" in the French. K-vec does not depend on sentence boundaries., Comment: 7 pages, uuencoded, compressed PostScript; Proc. COLING-94
Published: 1994

37. Emerging trends: Risks 3.0 and proliferation of spyware to 50,000 cell phones

Author: Church, Kenneth Ward, primary and Chandrasekar, Raman, additional
Published: 2023
Full Text: View/download PDF

38. On Memory Limitations in Natural Language Processing

Author: Szolovits, Peter and Church, Kenneth Ward
Abstract: This paper proposes a welcome hypothesis: a computationally simple device is sufficient for processing natural language. Traditionally it has been argued that processing natural language syntax requires very powerful machinery. Many engineers have come to this rather grim conclusion: almost all working parsers are actually Turing Machines (TM). For example, Woods specifically designed his Augmented Transition Networks (ATNs) to be Turing Equivalent.
Published: 2023

39. Coping with Syntactic Ambiguity or How to Put the Block in the Box on the Table

Author: Church, Kenneth and Patil, Ramesh
Abstract: Sentences are far more ambiguous than one might have thought. There may be hundreds, perhaps thousands, of syntactic parse trees for certain very natural sentences of English. This fact has been a major problem confronting natural language processing because it indicates that it may require a long time to construct a list of all the parse trees, and furthermore, it isn't clear what to do with the list once it has been constructed. This list may be so long that it is probably not the most convenient representation for communication with the semantic and pragmatic processing modules. In this paper we propose some methods for dealing with syntactic ambiguity in ways that take advantage of certain regularities among the alternative parse trees. These regularities will be expressed as linear combinations of ATN networks, and also as sums and products of formal power series. We will suggest some ways that a practical processor can take advantage of this modularity in order to deal more efficiently with combinatoric ambiguity. In particular, we will show how a processor can efficiently compute the ambiguity of an input sentence (or any portion thereof). Furthermore, we will show how to compile certain grammars into a form that can be processed more efficiently. In some cases, including the "every way ambiguous" grammar (e.g., conjunction, prepositional phrases, noun-noun modification), processing time will be reduced from O(n^3) to O(n). Finally, we will show how to uncompile certain highly optimized grammars into a form suitable for linguistic analysis.
Published: 2023

40. Minsky, Chomsky and Deep Nets

Author: Church, Kenneth Ward, primary
Published: 2018
Full Text: View/download PDF

41. Emerging trends: Unfair, biased, addictive, dangerous, deadly, and insanely profitable

Author: Church, Kenneth, primary, Schoene, Annika, additional, Ortega, John E., additional, Chandrasekar, Raman, additional, and Kordoni, Valia, additional
Published: 2022
Full Text: View/download PDF

42. Substring Statistics

Author: Umemura, Kyoji, Church, Kenneth, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, and Gelbukh, Alexander, editor
Published: 2009
Full Text: View/download PDF

43. Has Computational Linguistics Become More Applied?

Author: Church, Kenneth, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, and Gelbukh, Alexander, editor
Published: 2009
Full Text: View/download PDF

44. Corpus Methods in a Digitized World

Author: Church, Kenneth Ward, primary
Published: 2017
Full Text: View/download PDF

45. Nonlinear Estimators and Tail Bounds for Dimension Reduction in $l_1$ Using Cauchy Random Projections

Author: Li, Ping, Hastie, Trevor J., Church, Kenneth W., Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Bshouty, Nader H., editor, and Gentile, Claudio, editor
Published: 2007
Full Text: View/download PDF

46. Improving Random Projections Using Marginal Information

Author: Li, Ping, Hastie, Trevor J., Church, Kenneth W., Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Lugosi, Gábor, editor, and Simon, Hans Ulrich, editor
Published: 2006
Full Text: View/download PDF

47. Emerging trends: Deep nets thrive on scale

Author: Church, Kenneth Ward, primary
Published: 2022
Full Text: View/download PDF

48. Speech and Language Processing: Can We Use the Past to Predict the Future?

Author: Church, Kenneth, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Sojka, Petr, editor, Kopeček, Ivan, editor, and Pala, Karel, editor
Published: 2004
Full Text: View/download PDF

49. Approximate inference: A sampling based modeling technique to capture complex dependencies in a language model

Author: Deoras, Anoop, Mikolov, Tomáš, Kombrink, Stefan, and Church, Kenneth
Published: 2013
Full Text: View/download PDF

50. Virtual Data Warehousing, Data Publishing and Call Detail

Author: Belanger, David, Church, Kenneth, Hume, Andrew, Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, and Jonker, Willem, editor
Published: 2000
Full Text: View/download PDF
