Author: "Arya D" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Arya D" returned 582 results

1. FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning

Author: Niklaus, Joel, Zheng, Lucia, McCarthy, Arya D., Hahn, Christopher, Rosen, Brian M., Henderson, Peter, Ho, Daniel E., Honke, Garrett, Liang, Percy, and Manning, Christopher
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, 68T50, I.2
Abstract: Instruction tuning is an important step in making language models useful for direct user interaction. However, many legal tasks remain out of reach for most open LLMs and there do not yet exist any large-scale instruction datasets for the domain. This critically limits research in this application area. In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictions, 24 languages and a total of 12M examples. We present evidence that domain-specific pretraining and instruction tuning improve performance on LegalBench, including improving Flan-T5 XL by 8 points or 16% over the baseline. However, the effect does not generalize across all tasks, training regimes, model sizes, and other factors. LawInstruct is a resource for accelerating the development of models with stronger information processing and decision making capabilities in the legal domain.
Published: 2024

2. Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

Author: McCarthy, Arya D., Zhang, Hao, Kumar, Shankar, Stahlberg, Felix, and Wu, Ke
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Relative to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.
Comment: Accepted to the Findings of EMNLP 2023. arXiv admin note: text overlap with arXiv:2212.09895
Published: 2023

3. Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models

Author: Ebrahimi, Abteen, McCarthy, Arya D., Oncevay, Arturo, Chiruzzo, Luis, Ortega, John E., Giménez-Lugo, Gustavo A., Coto-Solano, Rolando, and Kann, Katharina
Subjects: Computer Science - Computation and Language
Abstract: Large multilingual models have inspired a new class of word alignment methods, which work well for the model's pretraining languages. However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data. In this work, we ask: How do modern aligners perform on unseen languages, and are they better than traditional methods? We contribute gold-standard alignments for Bribri--Spanish, Guarani--Spanish, Quechua--Spanish, and Shipibo-Konibo--Spanish. With these, we evaluate state-of-the-art aligners with and without model adaptation to the target language. Finally, we also evaluate the resulting alignments extrinsically through two downstream tasks: named entity recognition and part-of-speech tagging. We find that although transformer-based methods generally outperform traditional models, the two classes of approach remain competitive with each other.
Comment: EACL 2023
Published: 2023

4. Improved Long-Form Spoken Language Translation with Large Language Models

Author: McCarthy, Arya D., Zhang, Hao, Kumar, Shankar, Stahlberg, Felix, and Ng, Axel H.
Subjects: Computer Science - Computation and Language
Abstract: A challenge in spoken language translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we fine-tune a general-purpose, large language model to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We compare to several segmentation strategies and find that our approach improves BLEU score on three languages by an average of 2.7 BLEU overall compared to an automatic punctuation baseline. Further, we demonstrate the effectiveness of two constrained decoding strategies to improve well-formedness of the model output from above 99% to 100%.
Published: 2022

5. A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

Author: Kann, Katharina, Dudy, Shiran, and McCarthy, Arya D.
Subjects: Computer Science - Computation and Language
Abstract: The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible number of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products. However, this paper argues that we have been less successful than we should have been and reflects on where and how the field fails to tap its full potential. Specifically, we demonstrate that, in recent years, subpar time allocation has been a major obstacle for NLP research. We outline multiple concrete problems together with their negative consequences and, importantly, suggest remedies to improve the status quo. We hope that this paper will be a starting point for discussions around which common practices are -- or are not -- beneficial for NLP research.
Comment: To appear at EMNLP 2022
Published: 2022

6. UniMorph 4.0: Universal Morphology

Author: Batsuren, Khuyagbaatar, Goldman, Omer, Khalifa, Salam, Habash, Nizar, Kieraś, Witold, Bella, Gábor, Leonard, Brian, Nicolai, Garrett, Gorman, Kyle, Ate, Yustinus Ghanggo, Ryskina, Maria, Mielke, Sabrina J., Budianskaya, Elena, El-Khaissi, Charbel, Pimentel, Tiago, Gasser, Michael, Lane, William, Raj, Mohit, Coler, Matt, Samame, Jaime Rafael Montoya, Camaiteri, Delio Siticonatzi, Sagot, Benoît, Rojas, Esaú Zumaeta, Francis, Didier López, Oncevay, Arturo, Bautista, Juan López, Villegas, Gema Celeste Silva, Hennigen, Lucas Torroba, Ek, Adam, Guriel, David, Dirix, Peter, Bernardy, Jean-Philippe, Scherbakov, Andrey, Bayyr-ool, Aziyana, Anastasopoulos, Antonios, Zariquiey, Roberto, Sheifer, Karina, Ganieva, Sofya, Cruz, Hilaria, Karahóǧa, Ritván, Markantonatou, Stella, Pavlidis, George, Plugaryov, Matvey, Klyachko, Elena, Salehi, Ali, Angulo, Candy, Baxi, Jatayu, Krizhanovsky, Andrew, Krizhanovskaya, Natalia, Salesky, Elizabeth, Vania, Clara, Ivanova, Sardana, White, Jennifer, Maudslay, Rowan Hall, Valvoda, Josef, Zmigrod, Ran, Czarnowska, Paula, Nikkarinen, Irene, Salchak, Aelita, Bhatt, Brijesh, Straughn, Christopher, Liu, Zoey, Washington, Jonathan North, Pinter, Yuval, Ataman, Duygu, Wolinski, Marcin, Suhardijanto, Totok, Yablonskaya, Anna, Stoehr, Niklas, Dolatian, Hossep, Nuriah, Zahroh, Ratan, Shyam, Tyers, Francis M., Ponti, Edoardo M., Aiton, Grant, Arora, Aryaman, Hatcher, Richard J., Kumar, Ritesh, Young, Jeremiah, Rodionova, Daria, Yemelina, Anastasia, Andrushko, Taras, Marchenko, Igor, Mashkovtseva, Polina, Serova, Alexandra, Prud'hommeaux, Emily, Nepomniashchaya, Maria, Giunchiglia, Fausto, Chodroff, Eleanor, Hulden, Mans, Silfverberg, Miikka, McCarthy, Arya D., Yarowsky, David, Cotterell, Ryan, Tsarfaty, Reut, and Vylomova, Ekaterina
Subjects: Computer Science - Computation and Language
Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
Comment: LREC 2022; The first two authors made equal contributions
Published: 2022

7. Morphological Processing of Low-Resource Languages: Where We Are and What's Next

Author: Wiemerslage, Adam, Silfverberg, Miikka, Yang, Changbing, McCarthy, Arya D., Nicolai, Garrett, Colunga, Eliana, and Kann, Katharina
Subjects: Computer Science - Computation and Language
Abstract: Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages. Having long been multilingual, the field of computational morphology is increasingly moving towards approaches suitable for languages with minimal or no annotated resources. First, we survey recent developments in computational morphology with a focus on low-resource languages. Second, we argue that the field is ready to tackle the logical next challenge: understanding a language's morphology from raw text alone. We perform an empirical study on a truly unsupervised version of the paradigm completion task and show that, while existing state-of-the-art models bridged by two newly proposed models we devise perform reasonably, there is still much room for improvement. The stakes are high: solving this task will increase the language coverage of morphological resources by a number of magnitudes.
Comment: Findings of ACL 2022
Published: 2022

8. Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?

Author: Lee, En-Shiun Annie, Thillainathan, Sarubi, Nayak, Shravan, Ranathunga, Surangika, Adelani, David Ifeoluwa, Su, Ruisi, and McCarthy, Arya D.
Subjects: Computer Science - Computation and Language
Abstract: What can pre-trained multilingual sequence-to-sequence models like mBART contribute to translating low-resource languages? We conduct a thorough empirical experiment in 10 languages to ascertain this, considering five factors: (1) the amount of fine-tuning data, (2) the noise in the fine-tuning data, (3) the amount of pre-training data in the model, (4) the impact of domain mismatch, and (5) language typology. In addition to yielding several heuristics, the experiments form a framework for evaluating the data sensitivities of machine translation systems. While mBART is robust to domain differences, its translations for unseen and typologically distant languages remain below 3.0 BLEU. In answer to our title's question, mBART is not a low-resource panacea; we therefore encourage shifting the emphasis from new models to new data.
Comment: Accepted to Findings of ACL 2022
Published: 2022

9. Species richness and diversity of lepidoptera in an agricultural ecosystem of Bhabar region in district Nainital, Uttarakhand

Author: Rekha, Goswami, D., Arya, D., and Kaushal, B.R.
Published: 2023

10. Conclusions

Author: Dore, Giovanna Maria Dora, McCarthy, Arya D., and Scharf, James A.
Published: 2023

11. Non-rhetorical Tactics

Author: Dore, Giovanna Maria Dora, McCarthy, Arya D., and Scharf, James A.
Published: 2023

12. Introduction

Author: Dore, Giovanna Maria Dora, McCarthy, Arya D., and Scharf, James A.
Published: 2023

13. Methodological Approach and Data

Author: Dore, Giovanna Maria Dora, McCarthy, Arya D., and Scharf, James A.
Published: 2023

14. Rhetorical Tactics

Author: Dore, Giovanna Maria Dora, McCarthy, Arya D., and Scharf, James A.
Published: 2023

15. AirWare: Utilizing Embedded Audio and Infrared Signals for In-Air Hand-Gesture Recognition

Author: Lohia, Nibhrat, Mundada, Raunak, McCarthy, Arya D., and Larson, Eric C.
Subjects: Computer Science - Human-Computer Interaction
Abstract: We introduce AirWare, an in-air hand-gesture recognition system that uses the already embedded speaker and microphone in most electronic devices, together with embedded infrared proximity sensors. Gestures identified by AirWare are performed in the air above a touchscreen or a mobile phone. AirWare utilizes convolutional neural networks to classify a large vocabulary of hand gestures using multi-modal audio Doppler signatures and infrared (IR) sensor information. As opposed to other systems which use high frequency Doppler radars or depth cameras to uniquely identify in-air gestures, AirWare does not require any external sensors. In our analysis, we use openly available APIs to interface with the Samsung Galaxy S5 audio and proximity sensors for data collection. We find that AirWare is not reliable enough for a deployable interaction system when trying to classify a gesture set of 21 gestures, with an average true positive rate of only 50.5% per gesture. To improve performance, we train AirWare to identify subsets of the 21-gesture vocabulary based on possible usage scenarios. We find that AirWare can identify three gesture sets with average true positive rate greater than 80% using 4--7 gestures per set, which comprises a vocabulary of 16 unique in-air gestures.
Published: 2021

16. The SIGMORPHON 2022 Shared Task on Cross-lingual and Low-Resource Grapheme-to-Phoneme Conversion.

Author: Arya D. McCarthy, Jackson L. Lee, Alexandra DeLucia, Travis Bartley, Milind Agarwal, Lucas F. E. Ashby, Luca Del Signore, Cameron Gibson, Reuben Raff, and Winston Wu
Published: 2023

17. Meeting the Needs of Low-Resource Languages: The Value of Automatic Alignments via Pretrained Models.

Author: Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, John E. Ortega, Luis Chiruzzo, Gustavo Giménez Lugo, Rolando Coto-Solano, and Katharina Kann
Published: 2023

18. Theory-Grounded Computational Text Analysis.

Author: Arya D. McCarthy and Giovanna Maria Dora Dore
Published: 2023

19. Unsupervised Morphological Paradigm Completion

Author: Jin, Huiming, Cai, Liwei, Peng, Yihui, Xia, Chen, McCarthy, Arya D., and Kann, Katharina
Subjects: Computer Science - Computation and Language
Abstract: We propose the task of unsupervised morphological paradigm completion. Given only raw text and a lemma list, the task consists of generating the morphological paradigms, i.e., all inflected forms, of the lemmas. From a natural language processing (NLP) perspective, this is a challenging unsupervised task, and high-performing systems have the potential to improve tools for low-resource languages or to assist linguistic annotators. From a cognitive science perspective, this can shed light on how children acquire morphological knowledge. We further introduce a system for the task, which generates morphological paradigms via the following steps: (i) EDIT TREE retrieval, (ii) additional lemma retrieval, (iii) paradigm size discovery, and (iv) inflection generation. We perform an evaluation on 14 typologically diverse languages. Our system outperforms trivial baselines with ease and, for some languages, even obtains a higher accuracy than minimally supervised systems.
Comment: Accepted by ACL 2020
Published: 2020

20. Predicting Declension Class from Form and Meaning

Author: Williams, Adina, Pimentel, Tiago, McCarthy, Arya D., Blix, Hagen, Chodroff, Eleanor, and Cotterell, Ryan
Subjects: Computer Science - Computation and Language
Abstract: The noun lexica of many natural languages are divided into several declension classes with characteristic morphological properties. Class membership is far from deterministic, but the phonological form of a noun and/or its meaning can often provide imperfect clues. Here, we investigate the strength of those clues. More specifically, we operationalize this by measuring how much information, in bits, we can glean about declension class from knowing the form and/or meaning of nouns. We know that form and meaning are often also indicative of grammatical gender---which, as we quantitatively verify, can itself share information with declension class---so we also control for gender. We find for two Indo-European languages (Czech and German) that form and meaning respectively share significant amounts of information with class (and contribute additional information above and beyond gender). The three-way interaction between class, form, and meaning (given gender) is also significant. Our study is important for two reasons: First, we introduce a new method that provides additional quantitative support for a classic linguistic finding that form and meaning are relevant for the classification of nouns into declensions. Second, we show not only that individual declension classes vary in the strength of their clues within a language, but also that these variations themselves vary across languages.
Comment: 14 pages, 2 figures; this is the camera-ready version accepted at the 2020 Annual Conference of the Association for Computational Linguistics (ACL 2020)
Published: 2020

21. SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

Author: McCarthy, Arya D., Puzon, Liezl, and Pino, Juan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method compares favorably to SpecAugment on English→French and English→Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task. Further, in ablations, we show the benefits of both quantity and diversity in augmented data. Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English→French AST task. Our method is sufficiently general that it can be applied to other speech generation and analysis tasks.
Comment: Accepted to ICASSP 2020
Published: 2020

22. The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

Author: McCarthy, Arya D., Vylomova, Ekaterina, Wu, Shijie, Malaviya, Chaitanya, Wolf-Sonkin, Lawrence, Nicolai, Garrett, Kirov, Christo, Silfverberg, Miikka, Mielke, Sabrina J., Heinz, Jeffrey, Cotterell, Ryan, and Hulden, Mans
Subjects: Computer Science - Computation and Language
Abstract: The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages. The first task evolves past years' inflection tasks by examining transfer of morphological inflection knowledge from a high-resource language to a low-resource language. This year also presents a new second challenge on lemmatization and morphological feature analysis in context. All submissions featured a neural component and built on either this year's strong baselines or highly ranked systems from previous years' shared tasks. Every participating team improved in accuracy over the baselines for the inflection task (though not Levenshtein distance), and every team in the contextual analysis task improved on both state-of-the-art neural and non-neural baselines.
Comment: Presented at SIGMORPHON 2019
Published: 2019

23. Modeling Color Terminology Across Thousands of Languages

Author: McCarthy, Arya D., Wu, Winston, Mueller, Aaron, Watson, Bill, and Yarowsky, David
Subjects: Computer Science - Computation and Language
Abstract: There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Collectively, the 14 empirically-grounded computational linguistic metrics we design---as well as their aggregation---correlate strongly with both the Berlin and Kay basic/secondary color term partition (gamma=0.96) and their hypothesized universal acquisition sequence. The measures and result provide further empirical evidence from computational linguistics in support of their claims, as well as additional nuance: they suggest treating the partition as a spectrum instead of a dichotomy.
Comment: Accepted for presentation at EMNLP-IJCNLP 2019
Published: 2019

24. Improved Variational Neural Machine Translation by Promoting Mutual Information

Author: McCarthy, Arya D., Li, Xian, Gu, Jiatao, and Dong, Ning
Subjects: Computer Science - Computation and Language
Abstract: Posterior collapse plagues VAEs for text, especially for conditional text generation with strong autoregressive decoders. In this work, we address this problem in variational neural machine translation by explicitly promoting mutual information between the latent variables and the data. Our model extends the conditional variational autoencoder (CVAE) with two new ingredients: first, we propose a modified evidence lower bound (ELBO) objective which explicitly promotes mutual information; second, we regularize the probabilities of the decoder by mixing an auxiliary factorized distribution which is directly predicted by the latent variables. We present empirical results on the Transformer architecture and show that the proposed model effectively addresses posterior collapse: latent variables are no longer ignored in the presence of a powerful decoder. As a result, the proposed model yields improved translation quality while demonstrating superior performance in terms of data efficiency and robustness.
Published: 2019

25. Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade

Author: Pino, Juan, Puzon, Liezl, Gu, Jiatao, Ma, Xutai, McCarthy, Arya D., and Gopinath, Deepak
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: For automatic speech translation (AST), end-to-end approaches are outperformed by cascaded models that transcribe with automatic speech recognition (ASR), then translate with machine translation (MT). A major cause of the performance gap is that, while existing AST corpora are small, massive datasets exist for both the ASR and MT subsystems. In this work, we evaluate several data augmentation and pretraining approaches for AST, by comparing all on the same datasets. Simple data augmentation by translating ASR transcripts proves most effective on the English--French augmented LibriSpeech dataset, closing the performance gap from 8.2 to 1.4 BLEU, compared to a very strong cascade that could directly utilize copious ASR and MT data. The same end-to-end approach plus fine-tuning closes the gap on the English--Romanian MuST-C dataset from 6.7 to 3.7 BLEU. In addition to these results, we present practical recommendations for augmentation and pretraining approaches. Finally, we decrease the performance gap to 0.01 BLEU using a Transformer-based architecture.
Comment: IWSLT 2019
Published: 2019

26. Meaning to Form: Measuring Systematicity as Information

Author: Pimentel, Tiago, McCarthy, Arya D., Blasi, Damián E., Roark, Brian, and Cotterell, Ryan
Subjects: Computer Science - Computation and Language
Abstract: A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade? For instance, does the character bigram "gl" have any systematic relationship to the meaning of words like "glisten", "gleam" and "glow"? In this work, we offer a holistic quantification of the systematicity of the sign using mutual information and recurrent neural networks. We employ these in a data-driven and massively multilingual approach to the question, examining 106 languages. We find a statistically significant reduction in entropy when modeling a word form conditioned on its semantic representation. Encouragingly, we also recover well-attested English examples of systematic affixes. We conclude with the meta-point: Our approximate effect size (measured in bits) is quite small---despite some amount of systematicity between form and meaning, an arbitrary relationship and its resulting benefits dominate human language.
Comment: Accepted for publication at ACL 2019
Published: 2019

27. An Exact No Free Lunch Theorem for Community Detection

Author: McCarthy, Arya D., Chen, Tongfei, and Ebner, Seth
Subjects: Computer Science - Social and Information Networks, Computer Science - Discrete Mathematics
Abstract: A precondition for a No Free Lunch theorem is evaluation with a loss function which does not assume a priori superiority of some outputs over others. A previous result for community detection by Peel et al. (2017) relies on a mismatch between the loss function and the problem domain. The loss function computes an expectation over only a subset of the universe of possible outputs; thus, it is only asymptotically appropriate with respect to the problem size. By using the correct random model for the problem domain, we provide a stronger, exact No Free Lunch theorem for community detection. The claim generalizes to other set-partitioning tasks including core/periphery separation, $k$-clustering, and graph partitioning. Finally, we review the literature of proposed evaluation functions and identify functions which (perhaps with slight modifications) are compatible with an exact No Free Lunch theorem.
Published: 2019

28. Metrics matter in community detection

Author: McCarthy, Arya D., Chen, Tongfei, Rudinger, Rachel, and Matula, David W.
Subjects: Computer Science - Social and Information Networks, Physics - Physics and Society
Abstract: We present a critical evaluation of normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method's performance on weak communities: Does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences under relevant random models, and for evaluating community detection, we advise one-sided AMI under the $\mathbb{M}_{\mathrm{all}}$ model (all partitions of $n$ nodes). This work seeks (1) to start a conversation on robust measurements, and (2) to advocate evaluations which do not give "free lunch".
Published: 2019

29. Methodological Approach and Data

Author: Dore, Giovanna Maria Dora, primary, McCarthy, Arya D., additional, and Scharf, James A., additional
Published: 2023

30. Conclusions

Author: Dore, Giovanna Maria Dora, primary, McCarthy, Arya D., additional, and Scharf, James A., additional
Published: 2023

31. A Free Press, If You Can Keep It

Author: Dore, Giovanna Maria Dora, primary, McCarthy, Arya D., additional, and Scharf, James A., additional
Published: 2023

32. Rhetorical Tactics

Author: Dore, Giovanna Maria Dora, primary, McCarthy, Arya D., additional, and Scharf, James A., additional
Published: 2023

33. Non-rhetorical Tactics

Author: Dore, Giovanna Maria Dora, primary, McCarthy, Arya D., additional, and Scharf, James A., additional
Published: 2023

34. Introduction

Author: Dore, Giovanna Maria Dora, primary, McCarthy, Arya D., additional, and Scharf, James A., additional
Published: 2023

35. A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

Author: Katharina Kann, Shiran Dudy, and Arya D. McCarthy
Published: 2022

36. Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages.

Author: Georgie Botev, Arya D. McCarthy, Winston Wu, and David Yarowsky
Published: 2022

37. UniMorph 4.0: Universal Morphology.

Author: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieras, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Abbott Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóga, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer C. White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, and Ekaterina Vylomova
Published: 2022

38. Hong Kong: Longitudinal and Synchronic Characterisations of Protest News between 1998 and 2020.

Author: Arya D. McCarthy and Giovanna Maria Dora Dore
Published: 2022

39. UniMorph 2.0: Universal Morphology

Author: Kirov, Christo, Cotterell, Ryan, Sylak-Glassman, John, Walther, Géraldine, Vylomova, Ekaterina, Xia, Patrick, Faruqui, Manaal, Mielke, Sabrina J., McCarthy, Arya D., Kübler, Sandra, Yarowsky, David, Eisner, Jason, and Hulden, Mans
Subjects: Computer Science - Computation and Language
Abstract: The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema. Additional supporting data and tools are also released on a per-language basis when available. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland and is sponsored by the DARPA LORELEI program. This paper details advances made to the collection, annotation, and dissemination of project resources since the initial UniMorph release described at LREC 2016., Comment: LREC 2018
Published: 2018

40. The CoNLL--SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

Author: Cotterell, Ryan, Kirov, Christo, Sylak-Glassman, John, Walther, Géraldine, Vylomova, Ekaterina, McCarthy, Arya D., Kann, Katharina, Mielke, Sabrina J., Nicolai, Garrett, Silfverberg, Miikka, Yarowsky, David, Eisner, Jason, and Hulden, Mans
Subjects: Computer Science - Computation and Language
Abstract: The CoNLL--SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages. Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task. This second task featured seven languages. Task 1 received 27 submissions and task 2 received 6 submissions. Both tasks featured a low, medium, and high data condition. Nearly all submissions featured a neural component and built on highly-ranked systems from the earlier 2017 shared task. In the inflection task (task 1), 41 of the 52 languages present in last year's inflection task showed improvement by the best systems in the low-resource setting. The cloze task (task 2) proved to be difficult, and few submissions managed to consistently improve upon both a simple neural baseline system and a lemma-repeating baseline., Comment: CoNLL 2018. arXiv admin note: text overlap with arXiv:1706.09031
Published: 2018

41. Marrying Universal Dependencies and Universal Morphology

Author: McCarthy, Arya D., Silfverberg, Miikka, Cotterell, Ryan, Hulden, Mans, and Yarowsky, David
Subjects: Computer Science - Computation and Language
Abstract: The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects., Comment: UDW18
Published: 2018
Full Text: View/download PDF

42. Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

Author: Thompson, Brian, Khayrallah, Huda, Anastasopoulos, Antonios, McCarthy, Arya D., Duh, Kevin, Marvin, Rebecca, McNamee, Paul, Gwinnup, Jeremy, Anderson, Tim, and Koehn, Philipp
Subjects: Computer Science - Computation and Language
Abstract: To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation. We find that freezing any single component during continued training has minimal impact on performance, and that performance is surprisingly good when a single component is adapted while holding the rest of the model fixed. We also find that continued training does not move the model very far from the out-of-domain model, compared to a sensitivity analysis metric, suggesting that the out-of-domain model can provide a good generic initialization for the new domain., Comment: presented at WMT 2018. Please cite using the bib entry from here: http://www.statmt.org/wmt18/bib/WMT013.bib
Published: 2018
Full Text: View/download PDF

43. Jump-Starting Item Parameters for Adaptive Language Tests.

Author: Arya D. McCarthy, Kevin P. Yancey, Geoffrey T. LaFlair, Jesse Egbert, Manqian Liao, and Burr Settles
Published: 2021
Full Text: View/download PDF

44. Exergy analysis on micro-turbine combined cycle.

Author: Wahyuni, Fitri, Yunanto, Arya D., Febriansyah, Daffa, Alfaraby, M. Wiweko, Diraharja, Rifat S., Ragaskha, Rizky V., Yulia, Fayza, Rizal, Reda, and Julian, James
Subjects: *HEAT exchangers, *GAS turbines, *PRESSURE drop (Fluid dynamics), *ENTHALPY, *ENTROPY, *EXERGY
Abstract: Micro turbines are tiny gas turbines that can generate both electricity and heat, with small capacities ranging from 25 kW to 250 kW. This article aims to determine the enthalpy, entropy, energy, and exergy at each state, which are then used to calculate the efficiency percentage and exergy destruction of each component. The results show that the startup burner has the highest efficiency and the steam reformer the lowest. In terms of exergy destruction, the steam reformer has the largest and the startup burner the smallest. The analysis also shows that the optimal mass flow rate is one of the keys to designing a heat exchanger: pressure drop and convection coefficient both increase with increasing mass flow rate. However, the optimal mass flow rate also depends heavily on the heat exchanger dimensions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. Physiology of Ethanol Production by Clostridium thermocellum

Author: Arya, D. B., primary, Vincent, Salom Gnana Thanga, additional, and Balagurusamy, Nagamani, additional
Published: 2022
Full Text: View/download PDF

46. ERDOSTEINE REINSTATE MYOCARDIAL NECROSIS: INNOVATORY EFFECT THROUGH MODULATION OF MAPK AND NRF 2/HO 1 PATHWAY

Author: Kumaraguruparan, Nivetha, primary, Verma, Vipin Kumar, additional, Bhatia, Jagriti, additional, and Arya, D S, additional
Published: 2024
Full Text: View/download PDF

47. MORIN, A BIOFLAVONOID, ATTENUATES MYOCARDIAL ISCHEMIA-REPERFUSION INJURY BY ATTENUATING RISK/SAPK PATHWAY

Author: Bhatia, Jagriti, primary, Verma, Vipin Kumar, additional, and Arya, D S, additional
Published: 2024
Full Text: View/download PDF

48. DECIPHERING THE NOVEL ROLE OF ERDOSTEINE IN PREVENTION OF MYOCARDIAL ISCHEMIA-REPERFUSION INJURY IN RAT

Author: Bhardwaj, Priya, primary, Verma, Vipin Kumar, additional, Mutneja, Ekta, additional, Prajapati, Vaishali, additional, Bhatia, Jagriti, additional, and Arya, D S, additional
Published: 2024
Full Text: View/download PDF

49. THERAPEUTIC POTENTIAL OF ABATACEPT IN RAT MODEL OF CARDIAC HYPERTROPHY VIA LIDDING CD80 AND CD86: A PROFOUND EXPLORATION OF SIGNALING PATHWAYS

Author: Parjapati, Vaishali, primary, Verma, Vipin Kumar, additional, Bhatia, Jagriti, additional, and Arya, D. S., additional
Published: 2024
Full Text: View/download PDF

50. MOLECULAR UNDERSTANDING TOWARDS ANTI-INFLAMMATORY ROLE OF MORIN IN ISOPROTERENOL INDUCED MYOCARDIAL NECROSIS IN MURINE MODEL

Author: Dinesh, Drishya, primary, Verma, Vipin Kumar, additional, Bhatia, Jagriti, additional, and Arya, D S, additional
Published: 2024
Full Text: View/download PDF
