Author: "Gandhe, P." - Searchworks@Jio Institute Digital Library Search Results

Search keyword "Gandhe, P.": 145 results in total

1. Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

Author: Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gourav, Aditya, Gu, Yile, Gandhe, Ankur, Lee, Hung-yi, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: While textless Spoken Language Models (SLMs) have shown potential in end-to-end speech-to-speech modeling, they still lag behind text-based Large Language Models (LLMs) in terms of semantic coherence and relevance. This work introduces the Align-SLM framework, which leverages preference optimization inspired by Reinforcement Learning with AI Feedback (RLAIF) to enhance the semantic understanding of SLMs. Our approach generates multiple speech continuations from a given prompt and uses semantic metrics to create preference data for Direct Preference Optimization (DPO). We evaluate the framework using ZeroSpeech 2021 benchmarks for lexical and syntactic modeling, the spoken version of the StoryCloze dataset for semantic coherence, and other speech generation metrics, including the GPT4-o score and human evaluation. Experimental results show that our method achieves state-of-the-art performance for SLMs on most benchmarks, highlighting the importance of preference optimization to improve the semantics of SLMs.
Published: 2024

2. Speech Recognition Rescoring with Large Speech-Text Foundation Models

Author: Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gourav, Aditya, Gu, Yi, Gandhe, Ankur, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Sound
Abstract: Large language models (LLMs) have demonstrated the ability to understand human language by leveraging large amounts of text data. Automatic speech recognition (ASR) systems are often limited by available transcribed speech data and benefit from second-pass rescoring using LLMs. Recently, multi-modal large language models, particularly speech and text foundational models, have demonstrated strong spoken language understanding. Speech-text foundational models leverage large amounts of unlabelled and labelled data in both speech and text modalities to model human language. In this work, we propose novel techniques to use multi-modal LLMs for ASR rescoring. We also explore discriminative training to further improve the foundational model rescoring performance. We demonstrate that cross-modal knowledge transfer in speech-text LLMs can benefit rescoring. Our experiments demonstrate up to 20% relative improvements over Whisper large ASR and up to 15% relative improvements over text-only LLMs.
Published: 2024

3. Multi-Modal Retrieval For Large Language Model Based Speech Recognition

Author: Kolehmainen, Jari, Gourav, Aditya, Shivakumar, Prashanth Gurunath, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Strimel, Grant, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Information Retrieval, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Retrieval is a widely adopted approach for improving language models by leveraging external information. As the field moves towards multi-modal large language models, it is important to extend pure text-based methods to incorporate other modalities in retrieval as well, for applications across the wide spectrum of machine learning tasks and data types. In this work, we propose multi-modal retrieval with two approaches: kNN-LM and cross-attention techniques. We demonstrate the effectiveness of our retrieval approaches empirically by applying them to automatic speech recognition tasks with access to external information. Under this setting, we show that speech-based multi-modal retrieval outperforms text-based retrieval, and yields up to 50% improvement in word error rate over the multi-modal language model baseline. Furthermore, we achieve state-of-the-art recognition results on the Spoken-Squad question answering dataset.
Published: 2024

4. Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

Author: Yu, Yu, Yang, Chao-Han Huck, Dinh, Tuan, Ryu, Sungho, Kolehmainen, Jari, Ren, Roger, Filimonov, Denis, Shivakumar, Prashanth G., Gandhe, Ankur, Rastrow, Ariya, Xu, Jia, Bulyko, Ivan, and Stolcke, Andreas
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dataset and of 3.67% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based second-pass speech recognition models, we examine robustness against input perturbations. These perturbations are rooted in homophone replacements and a novel metric called N-best Perturbation-based Rescoring Robustness (NPRR), both designed to measure the relative degradation in the performance of rescoring models. Our experimental results indicate that while advanced variants of LoRA, such as dynamic rank-allocated LoRA, lead to performance degradation in 1-best perturbation, they alleviate the degradation in N-best perturbation. This finding is in comparison to fully-tuned models and vanilla LoRA tuning baselines, suggesting that a comprehensive selection is needed when using LoRA-based adaptation for compute-cost savings and robust language modeling.
Published: 2024

5. Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

Author: Everson, Kevin, Gu, Yile, Yang, Huck, Shivakumar, Prashanth Gurunath, Lin, Guan-Ting, Kolehmainen, Jari, Bulyko, Ivan, Gandhe, Ankur, Ghosh, Shalini, Hamza, Wael, Lee, Hung-yi, Rastrow, Ariya, and Stolcke, Andreas
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text. In real-world scenarios, prior to input into an LLM, an automated speech recognition (ASR) system generates an output transcript hypothesis, where inherent errors can degrade subsequent SLU tasks. Here we introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis, aiming to encapsulate speech ambiguities and enhance SLU outcomes. Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts with the help of word confusion networks from lattices, bridging the SLU performance gap between using the top ASR hypothesis and an oracle upper bound. Additionally, we delve into the LLM's robustness to varying ASR performance conditions and scrutinize the aspects of in-context learning which prove the most influential.
Comment: Accepted to ICASSP 2024
Published: 2024

6. Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Author: Lin, Guan-Ting, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Yang, Chao-Han Huck, Gu, Yile, Ghosh, Shalini, Stolcke, Andreas, Lee, Hung-yi, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech modalities to better model the linguistic content and paralinguistic attributes of spoken dialogue. The model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking multimodal framework. Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning. We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset. Experimental results indicate the proposed serialized multitasking method outperforms typical sequence classification techniques on current and response sentiment classification. Furthermore, leveraging conversational context and speech embeddings significantly improves both response text generation and sentiment prediction. Our proposed framework achieves relative improvements of 6.7%, 12.0%, and 3.5% in current sentiment accuracy, response sentiment accuracy, and response text BLEU score, respectively.
Comment: Accepted by ICASSP 2024. Camera-ready version
Published: 2023

7. Discriminative Speech Recognition Rescoring with Pre-trained Language Models

Author: Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Second-pass rescoring is a critical component of competitive automatic speech recognition (ASR) systems. Large language models have demonstrated their ability to use pre-trained information for better rescoring of ASR hypotheses. Discriminative training, which directly optimizes the minimum word-error-rate (MWER) criterion, typically improves rescoring. In this study, we propose and explore several discriminative fine-tuning schemes for pre-trained LMs. We propose two architectures based on different pooling strategies of output embeddings and compare them with probability-based MWER. We conduct detailed comparisons between pre-trained causal and bidirectional LMs in discriminative settings. Experiments on LibriSpeech demonstrate that all MWER training schemes are beneficial, giving additional gains of up to 8.5% WER. The proposed pooling variants achieve lower latency while retaining most improvements. Finally, our study concludes that bidirectionality is better utilized with discriminative training.
Comment: ASRU 2023
Published: 2023

8. Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Author: Yu, Yu, Yang, Chao-Han Huck, Kolehmainen, Jari, Shivakumar, Prashanth G., Gu, Yile, Ryu, Sungho, Ren, Roger, Luo, Qi, Gourav, Aditya, Chen, I-Fan, Liu, Yi-Chieh, Dinh, Tuan, Gandhe, Ankur, Filimonov, Denis, Ghosh, Shalini, Stolcke, Andreas, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6.
Comment: Accepted to IEEE ASRU 2023. Internal Review Approved. Revised 2nd version with Andreas and Huck. The first version is in Sep 29th. 8 pages
Published: 2023

9. Personalization for BERT-based Discriminative Speech Recognition Rescoring

Author: Kolehmainen, Jari, Gu, Yile, Gourav, Aditya, Shivakumar, Prashanth Gurunath, Gandhe, Ankur, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language
Abstract: Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a general test set by 1%.
Published: 2023

10. Scaling Laws for Discriminative Speech Recognition Rescoring Models

Author: Gu, Yile, Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gandhe, Ankur, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent studies have found that model performance has a smooth power-law relationship, or scaling laws, with training data and model size, for a wide range of problems. These scaling laws allow one to choose nearly optimal data and model sizes. We study whether this scaling property is also applicable to second-pass rescoring, which is an important component of speech recognition systems. We focus on RescoreBERT as the rescoring model, which uses a pre-trained Transformer-based architecture fine-tuned with an ASR discriminative loss. Using such a rescoring model, we show that the word error rate (WER) follows a scaling law for over two orders of magnitude as training data and model size increase. In addition, it is found that a pre-trained model would require less data than a randomly initialized model of the same size, representing effective data transferred from the pre-training step. This effective data transferred is found to also follow a scaling law with the data and model size.
Published: 2023

11. Distillation Strategies for Discriminative Speech Recognition Rescoring

Author: Shivakumar, Prashanth Gurunath, Kolehmainen, Jari, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Second-pass rescoring is employed in most state-of-the-art speech recognition systems. Recently, BERT-based models have gained popularity for re-ranking the n-best hypotheses by exploiting the knowledge from masked language model pre-training. Further, fine-tuning with a discriminative loss such as minimum word error rate (MWER) has been shown to perform better than likelihood-based loss. Streaming applications with low latency requirements impose significant constraints on the size of the models, thereby limiting the word error rate (WER) performance gains. In this paper, we propose effective strategies for distilling from large models discriminatively trained with the MWER objective. We experiment on Librispeech and a production-scale internal dataset for a voice assistant. Our results demonstrate relative improvements of up to 7% WER over student models trained with MWER. We also show that the proposed distillation can reduce the WER gap between the student and the teacher by 62% up to 100%.
Comment: Accepted at INTERSPEECH 2023
Published: 2023

12. Streaming Speech-to-Confusion Network Speech Recognition

Author: Filimonov, Denis, Pandey, Prabhat, Rastrow, Ariya, Gandhe, Ankur, and Stolcke, Andreas
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language
Abstract: In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency, as needed for interactive applications. We show that 1-best results of our model are on par with a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring to achieve 10-20% lower word error rate on the LibriSpeech task. We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
Comment: Submitted to Interspeech 2023
Published: 2023

13. Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition

Author: Fu, Xuandi, Sathyendra, Kanthashree Mysore, Gandhe, Ankur, Liu, Jing, Strimel, Grant P., McGowan, Ross, and Mouchtaris, Athanasios
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Attention-based contextual biasing approaches have shown significant improvements in the recognition of generic and/or personal rare words in End-to-End Automatic Speech Recognition (E2E ASR) systems like neural transducers. These approaches employ cross-attention to bias the model towards specific contextual entities injected as bias phrases to the model. Prior approaches typically relied on subword encoders for encoding the bias phrases. However, subword tokenizations are coarse and fail to capture granular pronunciation information, which is crucial for biasing based on acoustic similarity. In this work, we propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing guided by acoustic similarity between the audio and the contextual entities (termed acoustic biasing). We further integrate pretrained neural language model (NLM) based encoders to encode the utterance's semantic context along with contextual entities to perform biasing informed by the utterance's semantic context (termed semantic biasing). Experiments using a Conformer Transducer model on the Librispeech dataset show a 4.62% - 9.26% relative WER improvement on different biasing list sizes over the baseline contextual model when incorporating our proposed acoustic and semantic biasing approach. On a large-scale in-house dataset, we observe 7.91% relative WER improvement compared to our baseline model. On tail utterances, the improvements are even more pronounced, with 36.80% and 23.40% relative WER improvements on Librispeech rare words and an in-house test set, respectively.
Comment: Accepted at ICASSP 2023
Published: 2023

14. PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

Author: Pandey, Rahul, Ren, Roger, Luo, Qi, Liu, Jing, Rastrow, Ariya, Gandhe, Ankur, Filimonov, Denis, Strimel, Grant, Stolcke, Andreas, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Abstract: End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations, and in such cases, human knowledge in the form of a pronunciation lexicon can be useful. We propose a PROnunCiation-aware conTextual adaptER (PROCTER) that dynamically injects lexicon knowledge into an RNN-T model by adding a phonemic embedding along with a textual embedding. The experimental results show that the proposed PROCTER architecture outperforms the baseline RNN-T model by improving the word error rate (WER) by 44% and 57% when measured on personalized entities and personalized rare entities, respectively, while increasing the model size (number of trainable parameters) by only 1%. Furthermore, when evaluated in a zero-shot setting to recognize personalized device names, we observe 7% WER improvement with PROCTER, as compared to only 1% WER improvement with text-only contextual attention.
Comment: To appear in Proc. IEEE ICASSP
Published: 2023

15. On-the-fly Text Retrieval for End-to-End ASR Adaptation

Author: Yusuf, Bolaji, Gourav, Aditya, Gandhe, Ankur, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: End-to-end speech recognition models are improved by incorporating external text sources, typically by fusion with an external language model. Such language models have to be retrained whenever the corpus of interest changes. Furthermore, since they store the entire corpus in their parameters, rare words can be challenging to recall. In this work, we propose augmenting a transducer-based ASR model with a retrieval language model, which directly retrieves from an external text corpus plausible completions for a partial ASR hypothesis. These completions are then integrated into subsequent predictions by an adapter, which is trained once, so that the corpus of interest can be switched without incurring the computational overhead of retraining. Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets. Further, it outperforms shallow fusion on recognition of named entities by about 7% relative; when the two are combined, the relative improvement increases to 13%.
Comment: Accepted to ICASSP 2023; Appendix added to include ablations that could not fit into the conference 4-page limit
Published: 2023

16. HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Author: He, Yunzhong, Zhang, Cong, Kong, Ruoyan, Kulkarni, Chaitanya, Liu, Qing, Gandhe, Ashish, Nithianandan, Amit, and Prakash, Arul
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Query categorization at customer-to-customer e-commerce platforms like Facebook Marketplace is challenging due to the vagueness of search intent, noise in real-world data, and imbalanced training data across languages. Its deployment also needs to consider challenges in scalability and downstream integration in order to translate modeling advances into better search result relevance. In this paper we present HierCat, the query categorization system at Facebook Marketplace. HierCat addresses these challenges by leveraging multi-task pre-training of dual-encoder architectures with a hierarchical inference step to effectively learn from weakly supervised training data mined from searcher engagement. We show that HierCat not only outperforms popular methods in offline experiments, but also leads to 1.4% improvement in NDCG and 4.3% increase in searcher engagement at Facebook Marketplace Search in online A/B testing.
Comment: Accepted by WWW'2023
Published: 2023

17. Rational design of FXR agonists: a computational approach for NASH therapy

Author: Gandhe, Akshata, Kumari, Sonia, and Elizabeth Sobhia, Masilamani
Published: 2023

18. USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

Author: Yusuf, Bolaji, Gandhe, Ankur, and Sokolov, Alex
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Improving end-to-end speech recognition by incorporating external text data has been a longstanding research topic. There has been a recent focus on training E2E ASR models that get the performance benefits of external text data without incurring the extra cost of evaluating an external language model at inference time. In this work, we propose training an ASR model jointly with a set of text-to-text auxiliary tasks with which it shares a decoder and parts of the encoder. When we jointly train ASR and a masked language model with the 960-hour Librispeech and Opensubtitles data respectively, we observe WER reductions of 16% and 20% on test-other and test-clean respectively over an ASR-only baseline without any extra cost at inference time, and reductions of 6% and 8% compared to a stronger MUTE-L baseline which trains the decoder with the same text data as our model. We achieve further improvements when we train the masked language model on Librispeech data or when we use machine translation as the auxiliary task, without significantly sacrificing performance on the task itself.
Comment: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022)
Published: 2022

19. RescoreBERT: Discriminative Speech Recognition Rescoring with BERT

Author: Xu, Liyan, Gu, Yile, Kolehmainen, Jari, Khan, Haidar, Gandhe, Ankur, Rastrow, Ariya, Stolcke, Andreas, and Bulyko, Ivan
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
Abstract: Second-pass rescoring is an important component in automatic speech recognition (ASR) systems that is used to improve the outputs from a first-pass decoder by implementing a lattice rescoring or n-best re-ranking. While pretraining with a masked language model (MLM) objective has received great success in various natural language understanding (NLU) tasks, it has not gained traction as a rescoring model for ASR. Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored. Here we show how to train a BERT-based rescoring model with MWER loss, to incorporate the improvements of a discriminative loss into fine-tuning of deep bidirectional pretrained models for ASR. Specifically, we propose a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model. We further propose an alternative discriminative loss. This approach, which we call RescoreBERT, reduces WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without discriminative objective. We also evaluate our method on an internal dataset from a conversational agent and find that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
Comment: Accepted to ICASSP 2022
Published: 2022

20. A Likelihood Ratio based Domain Adaptation Method for E2E Models

Author: Choudhury, Chhavi, Gandhe, Ankur, Ding, Xiaohan, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: End-to-end (E2E) automatic speech recognition models like Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants. While E2E models are very effective at learning representations of the data they are trained on, their accuracy on unseen domains remains a challenging problem. Additionally, these models require paired audio and text training data, are computationally expensive, and are difficult to adapt to the fast-evolving nature of conversational speech. In this work, we explore a contextual biasing approach using likelihood ratio that leverages text data sources to adapt an RNN-T model to new domains and entities. We show that this method is effective in improving rare-word recognition, and results in a relative improvement of 10% in 1-best word error rate (WER) and 10% in n-best Oracle WER (n=8) on multiple out-of-domain datasets without any degradation on a general dataset. We also show that complementing the contextual biasing adaptation with adaptation of a second-pass rescoring model gives additive WER improvements.
Comment: Submitted to ICASSP 2022
Published: 2022

21. Exploring the Aggressiveness of Sarcomatoid Carcinoma of the Oral Cavity – an Institutional Experience

Author: Hussain, Mohsina, Gandhe, Sucheta, Manek, Dhruti, Pawar, Yogesh, Dhondge, Rajendra, Shaikh, Ahmer Arif, Roy, Sirshendu, and Nagarkar, Raj
Published: 2023

22. Depth of Invasion, Lymphovascular Invasion, and Perineural Invasion as Predictors of Neck Node Metastasis in Early Oral Cavity Cancers

Author: Pandit, Prakash, Patil, Roshankumar, Palwe, Vijay, Gandhe, Sucheta, Manek, Dhruti, Patil, Rahul, Roy, Sirshendu, Yasam, Venkata Ramesh, Nagarkar, Viren Raj, and Nagarkar, Raj
Published: 2023

23. Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems

Author: Dingliwal, Saket, Shenoy, Ashish, Bodapati, Sravan, Gandhe, Ankur, Gadde, Ravi Teja, and Kirchhoff, Katrin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains, creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) to a particular domain. Using this domain-adapted LM for rescoring ASR hypotheses can achieve 7-13% WER reduction for a new domain with just 1000 unlabeled textual domain-specific sentences. This improvement is comparable to or even better than that of fully fine-tuned models, even though just 0.02% of the parameters of the base LM are updated. Additionally, our method is deployment-friendly as the learnt domain embeddings are prefixed to the input to the model rather than changing the base model architecture. Therefore, our method is an ideal choice for on-the-fly adaptation of LMs used in ASR systems to progressively scale them to new domains.
Comment: Accepted at InterSpeech 2022
Published: 2021

24. Lattention: Lattice-attention in ASR rescoring

Author: Pandey, Prabhat, Torres, Sergio Duarte, Bayer, Ali Orkan, Gandhe, Ankur, and Leutnant, Volker
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, I.2.7
Abstract: Lattices form a compact representation of multiple hypotheses generated from an automatic speech recognition system and have been shown to improve performance of downstream tasks like spoken language understanding and speech translation, compared to using one-best hypothesis. In this work, we look into the effectiveness of lattice cues for rescoring n-best lists in second-pass. We encode lattices with a recurrent network and train an attention encoder-decoder model for n-best rescoring. The rescoring model with attention to lattices achieves 4-5% relative word error rate reduction over first-pass and 6-8% with attention to both lattices and acoustic features. We show that rescoring models with attention to lattices outperform models with attention to n-best hypotheses. We also study different ways to incorporate lattice weights in the lattice encoder and demonstrate their importance for n-best rescoring.
Comment: Submitted to ICASSP 2022
Published: 2021

25. Prompt-tuning in ASR systems for efficient domain-adaptation

Author: Dingliwal, Saket, Shenoy, Ashish, Bodapati, Sravan, Gandhe, Ankur, Gadde, Ravi Teja, and Kirchhoff, Katrin
Subjects: Computer Science - Computation and Language
Abstract: Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory and compute-efficient domain adaptation is obvious. Particularly, adapting parameter-heavy transformer-based language models used for rescoring ASR hypotheses is challenging. In this work, we overcome the problem using prompt-tuning, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain. With just a handful of extra parameters per domain, we achieve much better perplexity scores over the baseline of using an unadapted LM. Despite being parameter-efficient, these improvements are comparable to those of fully-fine-tuned models with hundreds of millions of parameters. We replicate our findings in perplexity numbers to Word Error Rate in a domain-specific ASR system for one such domain.
Comment: WeCNLP 2021 camera-ready
Published: 2021

26. ShopTalk: A System for Conversational Faceted Search

Author: Manku, Gurmeet, Lee-Thorp, James, Kanagal, Bhargav, Ainslie, Joshua, Feng, Jingchen, Pearson, Zach, Anjorin, Ebenezer, Gandhe, Sudeep, Eckstein, Ilya, Rosswog, Jim, Sanghai, Sumit, Pohl, Michael, Adams, Larry, and Sivakumar, D.
Subjects: Computer Science - Computation and Language
Abstract: We present ShopTalk, a multi-turn conversational faceted search system for shopping that is designed to handle large and complex schemas that are beyond the scope of state of the art slot-filling systems. ShopTalk decouples dialog management from fulfillment, thereby allowing the dialog understanding system to be domain-agnostic and not tied to the particular shopping application. The dialog understanding system consists of a deep-learned Contextual Language Understanding module, which interprets user utterances, and a primarily rules-based Dialog-State Tracker (DST), which updates the dialog state and formulates search requests intended for the fulfillment engine. The interface between the two modules consists of a minimal set of domain-agnostic "intent operators," which instruct the DST on how to update the dialog state. ShopTalk was deployed in 2020 on the Google Assistant for Shopping searches.
Published: 2021

27. An Anesthesiologist’s Conundrum! Venous and Arterial Cannulation in a Patient with Tattoos Posted for Coronary Artery Bypass Grafting

Author: Gargi Deshpande, Uday Gandhe, and Jitendra Bapat
Subjects: Anesthesiology, RD78.3-87.3, Diseases of the circulatory (Cardiovascular) system, RC666-701
Published: 2024
Full Text: View/download PDF

28. Attention-based Contextual Language Model Adaptation for Speech Recognition

Author: Martinez, Richard Diehl, Novotney, Scott, Bulyko, Ivan, Rastrow, Ariya, Stolcke, Andreas, and Gandhe, Ankur
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Language modeling (LM) for automatic speech recognition (ASR) does not usually incorporate utterance level contextual information. For some domains like voice assistants, however, additional context, such as the time at which an utterance was spoken, provides a rich input signal. We introduce an attention mechanism for training neural speech recognition language models on both text and non-linguistic contextual data. When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information. When evaluated on utterances extracted from the long tail of the dataset, our method improves perplexity by 9.0% relative over a standard LM and by over 2.8% relative when compared to a state-of-the-art model for contextual LM.
Published: 2021

29. DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections

Author: Zemlyanskiy, Yury, Gandhe, Sudeep, He, Ruining, Kanagal, Bhargav, Ravula, Anirudh, Gottweis, Juraj, Sha, Fei, and Eckstein, Ilya
Subjects: Computer Science - Computation and Language
Abstract: This paper explores learning rich self-supervised entity representations from large amounts of the associated text. Once pre-trained, these models become applicable to multiple entity-centric tasks such as ranked retrieval, knowledge base completion, question answering, and more. Unlike other methods that harvest self-supervision signals based merely on a local context within a sentence, we radically expand the notion of context to include any available text related to an entity. This enables a new class of powerful, high-capacity representations that can ultimately distill much of the useful information about an entity from multiple text sources, without any human supervision. We present several training strategies that, unlike prior approaches, learn to jointly predict words and entities -- strategies we compare experimentally on downstream tasks in the TV-Movies domain, such as MovieLens tag prediction from user reviews and natural language movie search. As evidenced by results, our models match or outperform competitive baselines, sometimes with little or no fine-tuning, and can scale to very large corpora. Finally, we make our datasets and pre-trained models publicly available. This includes Reviews2Movielens (see https://goo.gle/research-docent ), mapping the up to 1B word corpus of Amazon movie reviews (He and McAuley, 2016) to MovieLens tags (Harper and Konstan, 2016), as well as Reddit Movie Suggestions (see https://urikz.github.io/docent ) with natural language queries and corresponding community recommendations.
Comment: To appear in the proceedings of EACL 2021
Published: 2021

30. Personalization Strategies for End-to-End Speech Recognition Systems

Author: Gourav, Aditya, Liu, Linda, Gandhe, Ankur, Gu, Yile, Lan, Guitang, Huang, Xiangyang, Kalmane, Shashank, Tiwari, Gautam, Filimonov, Denis, Rastrow, Ariya, Stolcke, Andreas, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language
Abstract: The recognition of personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we demonstrate how first and second-pass rescoring strategies can be leveraged together to improve the recognition of such words. Following previous work, we use a shallow fusion approach to bias towards recognition of personalized content in the first-pass decoding. We show that such an approach can improve personalized content recognition by up to 16% with minimum degradation on the general use case. We describe a fast and scalable algorithm that enables our biasing models to remain at the word-level, while applying the biasing at the subword level. This has the advantage of not requiring the biasing models to be dependent on any subword symbol table. We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.
Comment: 5 pages, 5 tables, 1 figure
Published: 2021

31. Domain-aware Neural Language Models for Speech Recognition

Author: Liu, Linda, Gu, Yile, Gourav, Aditya, Gandhe, Ankur, Kalmane, Shashank, Filimonov, Denis, Rastrow, Ariya, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use-cases across different domains. We present a domain-aware rescoring framework suitable for achieving domain-adaptation during second-pass rescoring in production settings. In our framework, we fine-tune a domain-general neural language model on several domains, and use an LSTM-based domain classification model to select the appropriate domain-adapted model to use for second-pass rescoring. This domain-aware rescoring improves the word error rate by up to 2.4% and slot word error rate by up to 4.1% on three individual domains -- shopping, navigation, and music -- compared to domain general rescoring. These improvements are obtained while maintaining accuracy for the general use case.
Comment: ICASSP 2021
Published: 2021

32. Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

Author: Ravi, Vijay, Gu, Yile, Gandhe, Ankur, Rastrow, Ariya, Liu, Linda, Filimonov, Denis, Novotney, Scott, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: End-to-end automatic speech recognition (ASR) systems, such as recurrent neural network transducer (RNN-T), have become popular, but rare words remain a challenge. In this paper, we propose a simple, yet effective method called unigram shallow fusion (USF) to improve rare words for RNN-T. In USF, we extract rare words from RNN-T training data based on unigram count, and apply a fixed reward when the word is encountered during decoding. We show that this simple method can improve performance on rare words by 3.7% WER relative without degradation on general test set, and the improvement from USF is additive to any additional language model based rescoring. Then, we show that the same USF does not work on a conventional hybrid system. Finally, we reason that USF works by fixing errors in probability estimates of words due to Viterbi search used during decoding with subword-based RNN-T.
Published: 2020

33. Multi-task Language Modeling for Improving Speech Recognition of Rare Words

Author: Yang, Chao-Han Huck, Liu, Linda, Gandhe, Ankur, Gu, Yile, Raju, Anirudh, Filimonov, Denis, and Bulyko, Ivan
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even though the average accuracy of these systems may be high, the performance on rare content words often lags behind hybrid ASR systems. To address this problem, second-pass rescoring is often applied leveraging upon language modeling. In this paper, we propose a second-pass system with multi-task learning, utilizing semantic targets (such as intent and slot prediction) to improve speech recognition performance. We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test and by 2.6% on a rare word test set in terms of word-error-rate relative (WERR). Our best ASR system with multi-task LM shows 4.6% WERR reduction compared with RNN Transducer only ASR baseline for rare words recognition.
Comment: Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2021
Published: 2020

34. Audio-attention discriminative language model for ASR rescoring

Author: Gandhe, Ankur and Rastrow, Ariya
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound
Abstract: End-to-end approaches for automatic speech recognition (ASR) benefit from directly modeling the probability of the word sequence given the input audio stream in a single neural network. However, compared to conventional ASR systems, these models typically require more data to achieve comparable results. Well-known model adaptation techniques, to account for domain and style adaptation, are not easily applicable to end-to-end systems. Conventional HMM-based systems, on the other hand, have been optimized for various production environments and use cases. In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to rescore the output of a first-pass ASR system. We show that learning to rescore a list of potential ASR outputs is much simpler than learning to generate the hypothesis. The proposed model results in 8% improvement in word error rate even when the amount of training data is a fraction of data used for training the first-pass system.
Comment: 4 pages, 1 figure, Accepted at ICASSP 2020
Published: 2019

35. Solid Pseudopapillary Epithelial Neoplasm of the Pancreas: A Rare Entity with Diagnostic Dilemma

Author: Gandhe Gandhe, Patil Patil, Yasam Yasam, and Nagarkar Nagarkar
Subjects: pancreas, solid pseudopapillary epithelial neoplasm, diagnostic dilemma, immunohistochemistry, surgery, General works, R5-130.5, Science
Abstract: The solid pseudopapillary epithelial neoplasm (SPEN) of the pancreas is a relatively uncommon entity. The aim of the present study was to summarize our experiences with regard to diagnostic dilemma, surgery, postoperative follow-up, and management. The retrospective data were collected during the period from January 1, 2018 to December 31, 2020. A total of four patients (three females and one male) were identified within an age range of 13 to 25 years. All the patients presented with nonspecific symptoms such as abdominal lumps, swelling in the abdomen, and abdominal pain. To reach a definite diagnosis, imaging studies were conducted along with endoscopic ultrasound fine-needle aspiration (EUS-FNA) and biopsy. After confirmation of SPEN on biopsy, all the patients underwent surgery without any complications. Patients are on follow-up, and to date, no metastasis has been detected. SPEN is a rare pancreatic tumor with unusual pathological features leading to a diagnostic dilemma. The pathologist should be familiar with SPEN and its salient histological characteristics that differentiate it from other look-alike pancreatic tumors and can help in timely surgery and management.
Published: 2023
Full Text: View/download PDF

36. Scalable language model adaptation for spoken dialogue systems

Author: Gandhe, Ankur, Rastrow, Ariya, and Hoffmeister, Bjorn
Subjects: Computer Science - Computation and Language
Abstract: Language models (LM) for interactive speech recognition systems are trained on large amounts of data and the model parameters are optimized on past user data. New application intents and interaction types are released for these systems over time, imposing challenges to adapt the LMs since the existing training data is no longer sufficient to model the future user interactions. It is unclear how to adapt LMs to new application intents without degrading the performance on existing applications. In this paper, we propose a solution to (a) estimate n-gram counts directly from the hand-written grammar for training LMs and (b) use constrained optimization to optimize the system parameters for future use cases, while not degrading the performance on past usage. We evaluated our approach on new application intents for a personal assistant system and find that the adaptation improves the word error rate by up to 15% on new applications even when there is no adaptation data available for an application.
Comment: Accepted at SLT 2018
Published: 2018

37. Anesthetic management of mini sternotomy and excision of mediastinal neurogenic tumor: Brain–Heart crosstalk

Author: Arnab Paul, Joseph N Monteiro, Uday Gandhe, and Gargi Deshpande
Subjects: brachial plexus, mini sternotomy, neurogenic tumor, Anesthesiology, RD78.3-87.3, Diseases of the circulatory (Cardiovascular) system, RC666-701
Abstract: Brachial plexus tumors are rare and pose challenges for neurosurgeons due to their anatomical complexity. Retrosternal extension of a tumor makes it more difficult for the surgeons as well as for the anesthesiologists to secure a definitive airway. A cardiopulmonary bypass would be lifesaving in the event of acute cardiorespiratory decompensation. Multidisciplinary collaboration and cooperation between the neurosurgeon, oncosurgeon, cardiothoracic surgeon, and anesthesiologist are imperative to ensure good patient outcomes. Meticulous preoperative evaluation and operative planning are the key factors in anesthetic management. Here we report the successful management of a 49-year-old male patient who presented with a large painless mass arising from his right supraclavicular region, compressing the roots of the brachial plexus, trachea, and esophagus and extending up to the apex of the lungs, and who was posted for mini sternotomy and excision of the mass.
Published: 2023
Full Text: View/download PDF

38. Contextual Language Model Adaptation for Conversational Agents

Author: Raju, Anirudh, Hedayatnia, Behnam, Liu, Linda, Gandhe, Ankur, Khatri, Chandra, Metallinou, Angeliki, Venkatesh, Anu, and Rastrow, Ariya
Subjects: Computer Science - Computation and Language, I.2.7
Abstract: Statistical language models (LM) play a key role in Automatic Speech Recognition (ASR) systems used by conversational agents. These ASR systems should provide a high accuracy under a variety of speaking styles, domains, vocabulary and argots. In this paper, we present a DNN-based method to adapt the LM to each user-agent interaction based on generalized contextual information, by predicting an optimal, context-dependent set of LM interpolation weights. We show that this framework for contextual adaptation provides accuracy improvements under different possible mixture LM partitions that are relevant for both (1) Goal-oriented conversational agents where it's natural to partition the data by the requested application and for (2) Non-goal oriented conversational agents where the data can be partitioned using topic labels that come from predictions of a topic classifier. We obtain a relative WER improvement of 3% with a 1-pass decoding strategy and 6% in a 2-pass decoding framework, over an unadapted model. We also show up to a 15% relative improvement in recognizing named entities which is of significant value for conversational ASR systems.
Comment: Interspeech 2018 (accepted)
Published: 2018
Full Text: View/download PDF

39. Rare condition of unilateral submandibular gland aplasia and its diagnostic significance in oral cavity carcinoma

Author: Mangala Targe, Venkata Ramesh Yasam, Sucheta Gandhe, Dhruti Manek, and Raj Nagarkar
Subjects: Submandibular gland aplasia, Oral cavity carcinoma, Computed tomography, Magnetic resonance imaging, Case report, Medical physics. Medical radiology. Nuclear medicine, R895-920
Abstract: Background: Unilateral submandibular gland aplasia (agenesis) is a rare, asymptomatic condition usually discovered incidentally on imaging. It is commonly associated with compensatory hypertrophy of either the contralateral submandibular gland or the sublingual glands. Case presentation: We report the case of a 34-year-old male with incidentally detected unilateral submandibular gland aplasia associated with hypertrophy of the ipsilateral sublingual gland, demonstrated by imaging modalities. We highlight the diagnostic significance of such rare findings in oncology, particularly in oral cavity carcinoma cases where metastatic submandibular lymph nodes (level IB) mimic the submandibular gland. As a result, lymphadenopathy, an important part of staging and treatment planning, can be missed preoperatively. Conclusions: The aim of the present report is to create awareness of this rare entity among both clinicians and radiologists and to highlight the imaging features for correct identification and to avoid any diagnostic dilemmas.
Published: 2022
Full Text: View/download PDF

40. Rare condition of unilateral submandibular gland aplasia and its diagnostic significance in oral cavity carcinoma

Author: Targe, Mangala, Yasam, Venkata Ramesh, Gandhe, Sucheta, Manek, Dhruti, and Nagarkar, Raj
Published: 2022
Full Text: View/download PDF

41. Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding

Author: Kumar, Anjishnu, Gupta, Arpit, Chan, Julian, Tucker, Sam, Hoffmeister, Bjorn, Dreyer, Markus, Peshterliev, Stanislav, Gandhe, Ankur, Filiminov, Denis, Rastrow, Ariya, Monson, Christian, and Kumar, Agnika
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Computer Science - Software Engineering, 68T50
Abstract: This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK), a large-scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon's virtual assistant, Alexa. At Amazon, the infrastructure powers over 25,000 skills deployed through the ASK, as well as AWS's Amazon Lex SLU Service. The ASK emphasizes flexibility, predictability and a rapid iteration cycle for third party developers. It imposes inductive biases that allow it to learn robust SLU models from extremely small and sparse datasets and, in doing so, removes significant barriers to entry for software developers and dialogue systems researchers.
Comment: Published at the 1st Workshop on Conversational AI at NIPS 2017 (NIPS-WCAI)
Published: 2017

42. The predictive role of neutrophil-to-lymphocyte ratio in the outcomes of patients with sarcomatoid carcinoma of oral cavity

Author: Patil, Roshankumar, Pandit, Prakash, Palwe, Vijay, Patil, Rahul, Gandhe, Sucheta, Kate, Shruti, Yasam, Venkata Ramesh, and Nagarkar, Raj
Published: 2022
Full Text: View/download PDF

43. A survey on handwritten character recognition

Author: Patil, Shubham, Kokate, Shantanu, Mane, Vijay, and Gandhe, S. T.
Published: 2024
Full Text: View/download PDF

44. Synthetic aperture radar image enhancement for object detection

Author: Pawar, Sushant and Gandhe, Sanjay
Published: 2024
Full Text: View/download PDF

45. Enhancing user-helper interactions: A survey of middleware website powered by AI/ML

Author: Haridas, Sanjana, Gangurde, Gaurav, Nanne, Vaibhav, and Gandhe, S. T.
Published: 2024
Full Text: View/download PDF

46. Evaluation of Prognostic Factors that Affect Survival Outcomes of Breast Cancer Patients with Brain Metastases: A Single Institutional Experience

Author: Roshankumar Patil, Prakash Pandit, Vijay Palwe, Shruti Kate, Sucheta Gandhe, Rahul Patil, Yasam Venkata Ramesh, and Raj Nagarkar
Subjects: breast cancer, brain metastases, survival outcome, prognostic factors, ds-gpa score, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282, Medicine
Abstract: Objective: This study aimed to evaluate various prognostic factors that play a vital role in stratifying and guiding tailored treatment strategies and survival outcome in breast cancer patients with brain metastases (BM). Materials and Methods: Data regarding demography, clinical presentation, molecular subtypes, risk-stratification, treatment details, and outcomes were retrieved from medical records. All time-to-event (survival) outcomes were analyzed by the Kaplan-Meier method and compared using the log-rank test. Univariate and multivariate analyses of relevant prognostic factors were performed and p-values ≤0.05 were considered statistically significant. Results: A total of 88 patients (median age: 50 years) were included in this study. The median follow-up time of all surviving patients was ~20 months. During the follow-up, 82 (93.1%) patients died. The median survival of all patients was 12 months, with 1-year and 2-year overall survival (OS) rates of 51% and 22%, respectively. Based on univariate analysis, statistically significant prognostic factors for OS were molecular subtypes, number of BM, and Karnofsky Performance Status (KPS); however, number of BM and KPS emerged as independent predictors of survival based on multivariate analysis. Conclusion: We conclude that there are other important prognostic factors, such as number of BM, which may affect the OS of these patients, in addition to the variables included in the diagnosis-specific graded prognostic assessment score. Prospective studies evaluating these factors are necessary to further refine the stratification of patients, which will aid the initiation of appropriate treatment to improve the OS of patients.
Published: 2021
Full Text: View/download PDF

47. Association of bacillus calmette guerin vaccine strains with COVID-19 morbidity and mortality – evaluation of global data

Author: Vijaya Laxman Chaudhari, Charuta Jaykumar Godbole, Prajakta Parag Gandhe, Nithya Jaideep Gogtay, and Urmila Mukund Thatte
Subjects: hic, lower middle income, vaccination policy, Public aspects of medicine, RA1-1270
Abstract: Background: Literature suggests that the presence of the current Bacillus Calmette Guerin (BCG) policy appears to mitigate COVID-19 disease burden but no information exists on the nature of the BCG strain and disease burden. Objectives: To study the association between type of BCG strain, BCG coverage (%), and COVID-19 disease burden. Methodology: An audit of global data on strains and disease burden was done. Country-specific data for COVID-19 cases and deaths, BCG-related data, and income level were obtained from the online databases, and the association was analyzed using linear regression. Results: Data of 139 countries were studied and 117 (84%) had a current BCG policy. Data on BCG strains were available for 51 countries and 18/51 (35%) used the Danish strain. While the choice of strain did not impact COVID-19-related disease burden, the presence of a current BCG policy was significantly associated with lower COVID-19 mortality. Conclusion: The presence of current BCG policy is associated with decreased COVID-19-related disease burden, but the type of strain used by a country in its vaccination program does not impact disease burden.
Published: 2021
Full Text: View/download PDF

48. Neuroanesthesia Practice during COVID-19: A Single-Center Experience

Author: Rajashree U. Gandhe, Chinmaya P. Bhave, Neha T. Gedam, and Rashnita Sengupta
Subjects: aerosol-generating procedures, anesthesia, covid-19, neurointervention, neurosurgery, Anesthesiology, RD78.3-87.3
Abstract: The coronavirus disease 2019 (COVID-19) pandemic is a challenge for all health care providers (HCPs). Anesthesiologists are vulnerable to acquiring the disease during aerosol-generating procedures in operating theater and intensive care units. High index of suspicion, detailed history including travel history, strict hand hygiene, use of face masks, and appropriate personal protective equipment are some ways to minimize the risk of exposure to disease. Neurologic manifestations of COVID-19, modification of anesthesia regimen based on the procedure performed, and HCP safety are some implications relevant to a neuroanesthesiologist. National and international guidelines, recommendations, and position statements help in risk stratification, prioritization, and scheduling of neurosurgery and neurointervention procedures. Institutional protocols can be formulated based on the guidelines wherein each HCP has a definite role in this ever-changing scenario. Mental and physical well-being of HCPs is an integral part of successful management of patients. We present our experience in managing 143 patients during the lockdown period in India.
Published: 2020
Full Text: View/download PDF

49. Prevalence of Molecular Subtypes of Breast Cancer: A Single Institutional Experience of 2062 Patients

Author: Prakash Pandit, Roshankumar Patil, Vijay Palwe, Sucheta Gandhe, Rahul Patil, and Rajnish Nagarkar
Subjects: retrospective observational study, molecular classification, breast cancer, immunohistochemistry, tertiary cancer centre, Neoplasms. Tumors. Oncology. Including cancer and carcinogens, RC254-282, Medicine
Abstract: Objective: The aim of the study was to analyze the prevalence of molecular subtypes of all breast cancer patients treated at a tertiary cancer centre in West India in 12 years. Materials and Methods: A retrospective observational study was carried out in a Tertiary Cancer Care Centre in Western India. Electronic medical records of all breast cancer patients between March 2007 and March 2019 were retrieved from the hospital database. Patient characteristics, histological features and molecular subtypes were collected and analyzed. Results: A total of 2062 women fulfilled the criteria for this study and were analyzed. The median age of the study population was 51 years (range 22–100 years). Among these, 1357 (65.8%) were ≤55 years and 705 (34.2%) were over 55 years. Overall, 1162 (56.4%) patients were Hormonal Receptor‑positive (either estrogen-receptor (ER) or progesterone-receptor (PR) or both). The mean tumor size was 3.8 cm (range 0-18 cm). The most common histology was IDC (96%). Axillary nodes were positive in 62.5%. Luminal type A was present in 762 (37%) patients while Luminal type B was present in 157 (7.6%) patients. Basal-like subtype was observed in 537 (26%) patients while HER2 rich subtype was seen in 229 (11.1%). The incidence of Luminal A subtype increased with age and was highest (72%) among patients aged 70 years or more. Incidence of Basal-like subtype was highest in patients less than 30 years (52%). Conclusion: Luminal-like disease is the most common molecular subtype in India. Identification of Basal-like breast cancer, a highly aggressive, biologically and clinically distinct subtype different than its non-basal variant, is important for treatment planning and target therapy.
Published: 2020
Full Text: View/download PDF

50. Sarcomatoid Carcinoma of the Penis: An Uncommon Penile Neoplasm

Author: Sucheta Gandhe, Rahul Patil, and Raj Nagarkar
Subjects: sarcomatoid carcinoma, penis, immunohistochemistry, case report, Pathology, RB1-214
Abstract: Sarcomatoid squamous cell carcinomas are an extremely rare, high-grade, aggressive variant of penile cancers. Sarcomatoid carcinomas are biphasic neoplasms with a combination of both sarcomatoid components and carcinomatous elements. These neoplasms are very rare in the urogenital system. We report a 53-year-old male who presented with an ulcerated lesion on the glans penis. The rarity of this case reiterates the importance of thorough morphological and histological examination along with immunohistochemistry in diagnosing, staging, treatment and follow up of patients.
Published: 2020
Full Text: View/download PDF
