174 results on "Corrado, Greg"
Search Results
152. Understanding Neural Coding through the Model-Based Analysis of Decision Making
- Author
-
Corrado, Greg and Doya, Kenji
- Subjects
- Decision Making, Animals, Humans, Mini-Review, Models, Psychological, Nerve Net, Comprehension
- Abstract
The study of decision making poses new methodological challenges for systems neuroscience. Whereas our traditional approach linked neural activity to external variables that the experimenter directly observed and manipulated, many of the key elements that contribute to decisions are internal to the decider. Variables such as subjective value or subjective probability may be influenced by experimental conditions and manipulations but can neither be directly measured nor precisely controlled. Pioneering work on the neural basis of decision circumvented this difficulty by studying behavior in static conditions, in which knowledge of the average state of these quantities was sufficient. More recently, a new wave of studies has confronted the conundrum of internal decision variables more directly by leveraging quantitative behavioral models. When these behavioral models are successful in predicting a subject's choice, the model's internal variables may serve as proxies for the unobservable decision variables that actually drive behavior. This new methodology has allowed researchers to localize neural subsystems that encode hidden decision variables related to free choice and to study these variables under dynamic conditions.
- Published
- 2007
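As a rough illustration of the model-based approach this review describes, the sketch below simulates a two-option reward-learning task with a simple delta-rule value model and softmax choice, and records the model's hidden value estimates. In an actual model-based analysis those estimates, obtained by fitting the model to a subject's observed choices, would serve as regressors for neural activity; the task, model form, and parameters here are illustrative assumptions, not the specific models used in the reviewed studies.

```python
# Hedged illustration of the model-based approach: simulate a two-option
# reward-learning task with a delta-rule value model and softmax choice, and
# record the model's hidden value estimates. In a real analysis these values,
# obtained by fitting the model to observed choices, would be regressed against
# neural activity. Task, model form, and parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_trials = 500
alpha, beta = 0.2, 3.0               # learning rate, softmax inverse temperature
reward_prob = (0.7, 0.3)             # hypothetical payoff probabilities

q = np.zeros(2)                      # hidden decision variables (action values)
value_trace = []
for _ in range(n_trials):
    p_left = 1.0 / (1.0 + np.exp(-beta * (q[0] - q[1])))   # softmax over two options
    choice = 0 if rng.random() < p_left else 1
    reward = float(rng.random() < reward_prob[choice])
    q[choice] += alpha * (reward - q[choice])               # delta-rule update
    value_trace.append(q.copy())                            # proxy decision variables

print("final value estimates:", np.round(q, 3))
```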
153. Chapter 30 - The Trouble with Choice: Studying Decision Variables in the Brain
- Author
-
Corrado, Greg S., Sugrue, Leo P., Brown, Julian R., and Newsome, William T.
- Published
- 2009
- Full Text
- View/download PDF
154. Pathologist Validation of a Machine Learning–Derived Feature for Colon Cancer Risk Stratification.
- Author
-
L'Imperio, Vincenzo, Wulczyn, Ellery, Plass, Markus, Müller, Heimo, Tamini, Nicolò, Gianotti, Luca, Zucchini, Nicola, Reihs, Robert, Corrado, Greg S., Webster, Dale R., Peng, Lily H., Chen, Po-Hsuan Cameron, Lavitrano, Marialuisa, Liu, Yun, Steiner, David F., Zatloukal, Kurt, and Pagni, Fabio
- Published
- 2023
- Full Text
- View/download PDF
155. Distribution of Confidence Ratings for a Simple Perceptual Task
- Author
-
Lachter, Joel, Corrado, Greg S., Johnston, James C., and McClelland, James L.
- Published
- 2011
- Full Text
- View/download PDF
156. Stimulus onset quenches neural variability: a widespread cortical phenomenon
- Author
-
Churchland, Mark M, Yu, Byron M, Cunningham, John P, Sugrue, Leo P, Cohen, Marlene R, Corrado, Greg S, Newsome, William T, Clark, Andrew M, Hosseini, Paymon, Scott, Benjamin B, Bradley, David C, Smith, Matthew A, Kohn, Adam, Movshon, J Anthony, Armstrong, Katherine M, Moore, Tirin, Chang, Steve W, Snyder, Lawrence H, Lisberger, Stephen G, Priebe, Nicholas J, Finn, Ian M, Ferster, David, Ryu, Stephen I, Santhanam, Gopal, Sahani, Maneesh, and Shenoy, Krishna V
- Published
- 2010
- Full Text
- View/download PDF
157. Toward online measurement of decision state
- Author
-
Lachter, Joel, Johnston, James C., Corrado, Greg S., and McClelland, James L.
- Published
- 2009
- Full Text
- View/download PDF
158. Recursive Sparse, Spatiotemporal Coding
- Author
-
Dean, Thomas, Washington, Rich, and Corrado, Greg
- Published
- 2009
- Full Text
- View/download PDF
159. Choosing the greater of two goods: neural currencies for valuation and decision making
- Author
-
Sugrue, Leo P., Corrado, Greg S., and Newsome, William T.
- Published
- 2005
- Full Text
- View/download PDF
160. Early social distancing policies in Europe, changes in mobility & COVID-19 case trajectories: insights from Spring 2020
- Author
-
Woskie, Liana R., Hennessy, Jonathan, Espinosa, Valeria, Tsai, Thomas C., Vispute, Swapnil, Jacobson, Benjamin H., Cattuto, Ciro, Gauvin, Laetitia, Tizzoni, Michele, Fabrikant, Alex, Gadepalli, Krishna, Boulanger, Adam, Pearce, Adam, Kamath, Chaitanya, Schlosberg, Arran, Stanton, Charlotte, Bavadekar, Shailesh, Abueg, Matthew, Hogue, Michael, Oplinger, Andrew, Chou, Katherine, Corrado, Greg, Shekel, Tomer, Jha, Ashish K., Wellenius, Gregory A., and Gabrilovich, Evgeniy
- Abstract
Background: Social distancing policies have been widely used to mitigate community spread of SARS-CoV-2. We sought to quantify the impact of COVID-19 social distancing policies across 27 European countries in spring 2020 on population mobility and the subsequent trajectory of disease. Methods: We obtained data on national social distancing policies from the Oxford COVID-19 Government Response Tracker and aggregated and anonymized mobility data from Google. We used a pre-post comparison and two linear mixed-effects models to first assess the relationship between implementation of national policies and observed changes in mobility, and then to assess the relationship between changes in mobility and rates of COVID-19 infections in subsequent weeks. Results: Compared to a pre-COVID baseline, Spain saw the largest decrease in aggregate population mobility (~70%), as measured by the time spent away from residence, while Sweden saw the smallest decrease (~20%). The largest declines in mobility were associated with mandatory stay-at-home orders, followed by mandatory workplace closures, school closures, and non-mandatory workplace closures. While mandatory shelter-in-place orders were associated with 16.7% less mobility (95% CI: -23.7% to -9.7%), non-mandatory orders were only associated with an 8.4% decrease (95% CI: -14.9% to -1.8%). Large-gathering bans were associated with the smallest change in mobility compared with other policy types. Changes in mobility were in turn associated with changes in COVID-19 case growth. For example, a 10% decrease in time spent away from places of residence was associated with 11.8% (95% CI: 3.8%, 19.1%) fewer new COVID-19 cases. Discussion: This comprehensive evaluation across Europe suggests that mandatory stay-at-home orders and workplace closures had the largest impacts on population mobility and subsequent COVID-19 cases at the onset of the pandemic. With a better understanding of policies’ relative performance, countries can more effectively inve
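For readers unfamiliar with the modelling approach mentioned in the Methods, the sketch below shows one way a linear mixed-effects model relating policy indicators to mobility change, with a random intercept per country, could be set up. The variable names, synthetic data, and model specification are assumptions for illustration and do not reproduce the study's actual models.

```python
# Hedged sketch: a linear mixed-effects model relating social distancing policy
# indicators to mobility change, with a random intercept per country.
# Variable names and the synthetic data are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for i in range(27):                              # 27 hypothetical countries
    country_effect = rng.normal(0, 3)            # country-level random offset
    for week in range(8):
        stay_home = int(rng.integers(0, 2))
        workplace = int(rng.integers(0, 2))
        school = int(rng.integers(0, 2))
        mobility = (-15 * stay_home - 8 * workplace - 5 * school
                    + country_effect + rng.normal(0, 5))
        rows.append(dict(country=f"country_{i}", stay_at_home=stay_home,
                         workplace_closure=workplace, school_closure=school,
                         mobility_change=mobility))
df = pd.DataFrame(rows)

# Fixed effects for the policies, random intercept grouped by country.
model = smf.mixedlm(
    "mobility_change ~ stay_at_home + workplace_closure + school_closure",
    data=df,
    groups=df["country"],
)
print(model.fit().summary())
```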
161. A toolbox for surfacing health equity harms and biases in large language models.
- Author
-
Pfohl SR, Cole-Lewis H, Sayres R, Neal D, Asiedu M, Dieng A, Tomasev N, Rashid QM, Azizi S, Rostamzadeh N, McCoy LG, Celi LA, Liu Y, Schaekermann M, Walton A, Parrish A, Nagpal C, Singh P, Dewitt A, Mansfield P, Prakash S, Heller K, Karthikesalingam A, Semturs C, Barral J, Corrado G, Matias Y, Smith-Loud J, Horn I, and Singhal K
- Abstract
Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an artificial intelligence (AI) system promotes equitable health outcomes, we hope that it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare., (© 2024. The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
162. Using generative AI to investigate medical imagery models and datasets.
- Author
-
Lang O, Yaya-Stupp D, Traynis I, Cole-Lewis H, Bennett CR, Lyles CR, Lau C, Irani M, Semturs C, Webster DR, Corrado GS, Hassidim A, Matias Y, Liu Y, Hammel N, and Babenko B
- Subjects
- Humans, Cardiomegaly, Fundus Oculi, Artificial Intelligence, Algorithms, Cataract
- Abstract
Background: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that aren't yet known to experts., Methods: In this paper, we present a workflow for generating hypotheses to understand which visual signals in images are correlated with a classification model's predictions for a given task. This approach leverages an automatic visual explanation algorithm followed by interdisciplinary expert review. We propose the following 4 steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier ("StylEx"); (iii) Automatically detect, extract, and visualize the top visual attributes that the classifier is sensitive towards. For visualization, we independently modify each of these attributes to generate counterfactual visualizations for a set of images (i.e., what the image would look like with the attribute increased or decreased); (iv) Formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, present the discovered attributes and corresponding counterfactual visualizations to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health (e.g., whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries)., Findings: To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities-retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically-learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically-learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible, previously-unknown attributes based on the literature (e.g., differences in the fundus associated with self-reported sex, which were previously unknown)., Interpretation: Our approach enables hypotheses generation via attribute visualizations and has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models, as well as debug and design better datasets. Though not designed to infer causality, importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real world nature of healthcare delivery and socio-cultural factors, and hence interdisciplinary perspectives are critical in these investigations. 
Finally, we will release code to help researchers train their own StylEx models and analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes., Funding: Google., Competing Interests: Declaration of interests OL, DYS, HCL, CS, DRW, GSC, AH, YM, YL, NH and BB are current or past Google employees and may own Alphabet stock. IT, CRB, and CL are paid consultants to Google. All other authors declare no competing interests., (Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
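The abstract's step (iii) above, ranking the visual attributes a classifier is most sensitive to, can be illustrated with a toy stand-in: perturb each latent attribute of a generator and measure the change in the classifier's output. Everything below (the linear "generator", the "classifier", the perturbation size) is a made-up miniature, not the StylEx method itself.

```python
# Toy stand-in for step (iii) above: rank latent attributes by how strongly
# perturbing each one changes a classifier's output; the top-ranked attributes
# would then be visualized as counterfactual image pairs. The linear "generator"
# and "classifier" here are made-up miniatures, not StylEx.
import numpy as np

rng = np.random.default_rng(0)
n_attrs, img_dim = 10, 64
G = rng.normal(size=(n_attrs, img_dim))        # toy linear "generator"
w = rng.normal(size=img_dim)                   # toy linear "classifier" weights

def classifier_score(latent: np.ndarray) -> float:
    image = latent @ G                          # "generate" an image from attributes
    return float(1.0 / (1.0 + np.exp(-(image @ w))))

latent = rng.normal(size=n_attrs)
base = classifier_score(latent)

effects = []
for i in range(n_attrs):                        # perturb one attribute at a time
    bumped = latent.copy()
    bumped[i] += 1.0
    effects.append(abs(classifier_score(bumped) - base))

top = np.argsort(effects)[::-1][:3]
print("attributes the classifier is most sensitive to:", top.tolist())
```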
163. Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study.
- Author
-
Schaekermann M, Spitz T, Pyles M, Cole-Lewis H, Wulczyn E, Pfohl SR, Martin D Jr, Jaroensri R, Keeling G, Liu Y, Farquhar S, Xue Q, Lester J, Hughes C, Strachan P, Tan F, Bui P, Mermel CH, Peng LH, Matias Y, Corrado GS, Webster DR, Virmani S, Semturs C, Liu Y, Horn I, and Cameron Chen PH
- Abstract
Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study., Methods: Here, we propose a methodology to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes, that is complementary to existing fairness metrics. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria, and the resulting HEAL metric. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. A set of 5420 teledermatology cases (store-and-forward cases from patients of 20 years or older, submitted from primary care providers in the USA and skin cancer clinics in Australia), enriched for diversity in age, sex and race/ethnicity, was used to retrospectively evaluate the AI model's HEAL metric, defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes as compared to others. The likelihood that AI performance was anticorrelated to pre-existing health outcomes was estimated using bootstrap methods as the probability that the negated Spearman's rank correlation coefficient (i.e., "R") was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p (R >0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes as compared to others (presented as a percentage below). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case., Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritizing AI performance of racial/ethnic subpopulations based on YLLs, and 92.1% and 0.0% respectively for prioritizing AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared to a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritizing AI performance of age subpopulations based on DALYs., Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance for race/ethnicity, sex (all conditions) and age (cancer conditions) subpopulations with respect to pre-existing health disparities. More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions and to better understand how AI models can contribute towards improving equity in health outcomes., Funding: Google LLC., Competing Interests: This study was funded by Google LLC. MS, TS, HC, EW, SP, DM, RJ, GK, YL, SF, QX, CH, PS, FT, PB, LHP, CHM, YM, GSC, DW, SV, CS, YL, IH, PHCC are current or former employees of Google and own stock as part of the standard compensation package. 
MP and JL are paid consultants of Google., (© 2024 The Author(s).)
- Published
- 2024
- Full Text
- View/download PDF
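A minimal sketch of the HEAL metric as defined in the abstract above, p(R > 0) with R the negated Spearman rank correlation between pre-existing health burden and AI performance, estimated by bootstrap. The subgroup values are invented, and the resampling here is over subpopulations for brevity, a simplification of the study's case-level bootstrap.

```python
# Hedged sketch of the HEAL metric defined above: p(R > 0), where R is the
# negated Spearman rank correlation between pre-existing health burden and
# AI performance, estimated by bootstrap. Values are invented; resampling is
# over subpopulations here, a simplification of the study's case-level bootstrap.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

health_burden = np.array([120.0, 80.0, 60.0, 150.0, 95.0])   # e.g. DALYs per subgroup
ai_performance = np.array([0.82, 0.78, 0.75, 0.85, 0.80])     # e.g. top-3 agreement

def heal_metric(burden, performance, n_boot=5000):
    """Estimate p(R > 0) with R = -Spearman(burden, performance)."""
    n = len(burden)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                       # bootstrap resample
        rho, _ = spearmanr(burden[idx], performance[idx])
        if not np.isnan(rho) and -rho > 0:
            hits += 1
    return hits / n_boot

print(f"HEAL metric: {heal_metric(health_burden, ai_performance):.1%}")
```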
164. An intentional approach to managing bias in general purpose embedding models.
- Author
-
Weng WH, Sellergen A, Kiraly AP, D'Amour A, Park J, Pilgrim R, Pfohl S, Lau C, Natarajan V, Azizi S, Karthikesalingam A, Cole-Lewis H, Matias Y, Corrado GS, Webster DR, Shetty S, Prabhakara S, Eswaran K, Celi LAG, and Liu Y
- Subjects
- Humans, Bias, Algorithms, Machine Learning, Delivery of Health Care
- Abstract
Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components-GPPEs-from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended., Competing Interests: Declaration of interests W-HW, AS, APK, AD’A, JP, RP, SP, VN, SA, AK, HC-L, YM, GSC, DRW, SS, SP, KE, and YL are employees of Google and hold Alphabet stock. W-HW, AS, APK, AD’A, RP, SP, VN, SA, AK, HC-L, YM, GSC, DRW, SS, SP, KE, and YL have patents filed or in progress under Google, broadly related to machine learning and embedding models. CL performs work at Google as a medical consultant via Vituity and receives consulting fees for clinical perspective and guidance. LAGC receives support for educational events and meetings from the National Institutes of Health, Stanford University, University of California San Francisco, University of Toronto, College of Intensive Care Medicine of Australia and New Zealand, University of Bergen, Amsterdam University Medical Centers, Académie Nationale de Médecine (France), and the Doris Duke Foundation (for the Reconsidering Race in Clinical Algorithms workshop at the National Academy of Medicine, Washington, DC). LAGC is an Editor in Chief of PLOS Digital Health, and on the advisory board of the Lancet Digital Health., (Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. Published by Elsevier Ltd.. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
165. Large language models encode clinical knowledge.
- Author
-
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Schärli N, Chowdhery A, Mansfield P, Demner-Fushman D, Agüera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, and Natarajan V
- Subjects
- Bias, Clinical Competence, Comprehension, Datasets as Topic, Licensure, Patient Safety, Physicians, Benchmarking, Computer Simulation, Knowledge, Medicine methods, Medicine standards, Natural Language Processing
- Abstract
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
166. Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians.
- Author
-
Dvijotham KD, Winkens J, Barsbey M, Ghaisas S, Stanforth R, Pawlowski N, Strachan P, Ahmed Z, Azizi S, Bachrach Y, Culp L, Daswani M, Freyberg J, Kelly C, Kiraly A, Kohlberger T, McKinney S, Mustafa B, Natarajan V, Geras K, Witowski J, Qin ZZ, Creswell J, Shetty S, Sieniek M, Spitz T, Corrado G, Kohli P, Cemgil T, and Karthikesalingam A
- Subjects
- Reproducibility of Results, Workflow, Humans, Artificial Intelligence, Triage
- Abstract
Predictive artificial intelligence (AI) systems based on deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings, but can make errors in cases accurately diagnosed by clinicians and vice versa. We developed Complementarity-Driven Deferral to Clinical Workflow (CoDoC), a system that can learn to decide between the opinion of a predictive AI model and a clinical workflow. CoDoC enhances accuracy relative to clinician-only or AI-only baselines in clinical workflows that screen for breast cancer or tuberculosis (TB). For breast cancer screening, compared to double reading with arbitration in a screening program in the UK, CoDoC reduced false positives by 25% at the same false-negative rate, while achieving a 66% reduction in clinician workload. For TB triaging, compared to standalone AI and clinical workflows, CoDoC achieved a 5-15% reduction in false positives at the same false-negative rate for three of five commercially available predictive AI systems. To facilitate the deployment of CoDoC in novel futuristic clinical settings, we present results showing that CoDoC's performance gains are sustained across several axes of variation (imaging modality, clinical setting and predictive AI system) and discuss the limitations of our evaluation and where further validation would be needed. We provide an open-source implementation to encourage further research and application., (© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.)
- Published
- 2023
- Full Text
- View/download PDF
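To make the deferral idea described above concrete, here is a toy confidence-band rule in the spirit of CoDoC: accept the AI prediction when its score is decisively high or low, and defer to the clinician otherwise. CoDoC itself learns when to defer from data; the thresholds and interface below are illustrative assumptions, not the released implementation.

```python
# Hedged sketch of a confidence-band deferral rule in the spirit of CoDoC:
# accept the AI prediction when its score is decisively high or low, otherwise
# defer to the clinician. CoDoC itself learns when to defer from data; the
# thresholds and interface here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DeferralRule:
    lower: float = 0.1   # below this score, accept the AI's negative call
    upper: float = 0.9   # above this score, accept the AI's positive call

    def decide(self, ai_score: float, clinician_positive: bool) -> bool:
        """Return the final binary decision (True = positive finding)."""
        if ai_score >= self.upper:
            return True
        if ai_score <= self.lower:
            return False
        return clinician_positive        # uncertain band: defer to the clinician

rule = DeferralRule()
print(rule.decide(ai_score=0.95, clinician_positive=False))   # AI decides: True
print(rule.decide(ai_score=0.55, clinician_positive=True))    # deferred: True
```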
167. Lessons learned from translating AI from development to deployment in healthcare.
- Author
-
Widner K, Virmani S, Krause J, Nayar J, Tiwari R, Pedersen ER, Jeji D, Hammel N, Matias Y, Corrado GS, Liu Y, Peng L, and Webster DR
- Subjects
- Delivery of Health Care, Artificial Intelligence
- Published
- 2023
- Full Text
- View/download PDF
168. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging.
- Author
-
Azizi S, Culp L, Freyberg J, Mustafa B, Baur S, Kornblith S, Chen T, Tomasev N, Mitrović J, Strachan P, Mahdavi SS, Wulczyn E, Babenko B, Walker M, Loh A, Chen PC, Liu Y, Bavishi P, McKinney SM, Winkens J, Roy AG, Beaver Z, Ryan F, Krogue J, Etemadi M, Telang U, Liu Y, Peng L, Corrado GS, Webster DR, Fleet D, Hinton G, Houlsby N, Karthikesalingam A, Norouzi M, and Natarajan V
- Subjects
- Diagnostic Imaging, Supervised Machine Learning, Machine Learning
- Abstract
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates this 'out of distribution' performance problem and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging., (© 2023. The Author(s), under exclusive licence to Springer Nature Limited.)
- Published
- 2023
- Full Text
- View/download PDF
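The intermediate contrastive self-supervised step mentioned in the abstract above typically uses a SimCLR-style NT-Xent objective over two augmented views of each image. The NumPy sketch below implements that generic loss for illustration only; it is not the REMEDIS codebase, and the random arrays stand in for encoder outputs.

```python
# Hedged sketch of a SimCLR-style NT-Xent contrastive objective, the kind of
# loss commonly used for the intermediate self-supervised step described above.
# Generic illustration, not the REMEDIS implementation; random arrays stand in
# for encoder outputs of two augmented views per image.
import numpy as np

def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss over two batches of embeddings (one row per image view)."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)           # L2-normalize
    n = z1.shape[0]
    sim = (z @ z.T) / temperature                              # cosine similarities
    np.fill_diagonal(sim, -np.inf)                             # exclude self-pairs
    # Row i's positive example is the other augmented view of the same image.
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), targets].mean())

rng = np.random.default_rng(0)
view_a = rng.normal(size=(32, 128))                            # 32 images, view A
view_b = view_a + rng.normal(scale=0.1, size=(32, 128))        # augmented view B
print("NT-Xent loss:", round(nt_xent_loss(view_a, view_b), 4))
```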
169. Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning.
- Author
-
Krogue JD, Azizi S, Tan F, Flament-Auvigne I, Brown T, Plass M, Reihs R, Müller H, Zatloukal K, Richeson P, Corrado GS, Peng LH, Mermel CH, Liu Y, Chen PC, Gombar S, Montine T, Shen J, Steiner DF, and Wulczyn E
- Abstract
Background: Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors., Methods: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathological variables. We then analyze performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables., Results: The machine-learned extracted features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III)., Conclusion: This work demonstrates an effective approach to combine deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have important impact in prognostication and therapeutic decision making for LNM. Additionally, this general computational approach may prove useful in other contexts., (© 2023. The Author(s).)
- Published
- 2023
- Full Text
- View/download PDF
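A compact sketch of the pipeline the Methods above describe: cluster tumor-patch embeddings with k-means, summarize each case by its cluster proportions, and compare a logistic regression with and without those machine-learned features alongside baseline clinicopathologic variables. All arrays are synthetic placeholders, and the in-sample evaluation is only to keep the example short; the study uses held-out and external validation sets.

```python
# Hedged sketch of the described pipeline: k-means over deep-learning patch
# embeddings, per-case cluster proportions as machine-learned features, and a
# logistic regression with vs. without those features. All data are synthetic
# placeholders; the study evaluates on held-out and external cohorts.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_cases, patches_per_case, emb_dim, n_clusters = 200, 50, 64, 8

embeddings = rng.normal(size=(n_cases, patches_per_case, emb_dim))  # patch embeddings
baseline = rng.normal(size=(n_cases, 6))       # 6 baseline clinicopathologic variables
y = rng.integers(0, 2, size=n_cases)           # lymph node metastasis labels

# Cluster every patch, then describe each case by its cluster proportions.
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
patch_clusters = kmeans.fit_predict(embeddings.reshape(-1, emb_dim))
patch_clusters = patch_clusters.reshape(n_cases, patches_per_case)
cluster_props = np.stack(
    [(patch_clusters == k).mean(axis=1) for k in range(n_clusters)], axis=1)

for name, X in [("baseline only", baseline),
                ("baseline + machine-learned", np.hstack([baseline, cluster_props]))]:
    probs = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    print(name, "AUROC (in-sample, illustrative):", round(roc_auc_score(y, probs), 3))
```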
170. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge.
- Author
-
Bulten W, Kartasalo K, Chen PC, Ström P, Pinckaers H, Nagpal K, Cai Y, Steiner DF, van Boven H, Vink R, Hulsbergen-van de Kaa C, van der Laak J, Amin MB, Evans AJ, van der Kwast T, Allan R, Humphrey PA, Grönberg H, Samaratunga H, Delahunt B, Tsuzuki T, Häkkinen T, Egevad L, Demkin M, Dane S, Tan F, Valkonen M, Corrado GS, Peng L, Mermel CH, Ruusuvuori P, Litjens G, and Eklund M
- Subjects
- Algorithms, Biopsy, Cohort Studies, Humans, Male, Prostatic Neoplasms diagnosis, Reproducibility of Results, Neoplasm Grading, Prostatic Neoplasms pathology
- Abstract
Artificial intelligence (AI) has shown promise for diagnosing prostate cancer in biopsies. However, results have been limited to individual studies, lacking validation in multinational settings. Competitions have been shown to be accelerators for medical imaging innovations, but their impact is hindered by lack of reproducibility and independent validation. With this in mind, we organized the PANDA challenge-the largest histopathology competition to date, joined by 1,290 developers-to catalyze development of reproducible AI algorithms for Gleason grading using 10,616 digitized prostate biopsies. We validated that a diverse set of submitted algorithms reached pathologist-level performance on independent cross-continental cohorts, fully blinded to the algorithm developers. On United States and European external validation sets, the algorithms achieved agreements of 0.862 (quadratically weighted κ, 95% confidence interval (CI), 0.840-0.884) and 0.868 (95% CI, 0.835-0.900) with expert uropathologists. Successful generalization across different patient populations, laboratories and reference standards, achieved by a variety of algorithmic approaches, warrants evaluating AI-based Gleason grading in prospective clinical trials., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
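The agreement statistic reported above, quadratically weighted kappa between algorithm-assigned and uropathologist-assigned grades, can be computed directly with scikit-learn; the grade values below are made up for illustration.

```python
# Hedged sketch: the quadratically weighted kappa used to report agreement
# between algorithm and uropathologist grade groups. Grade values are invented.
from sklearn.metrics import cohen_kappa_score

uropathologist = [1, 2, 3, 5, 4, 2, 1, 3, 5, 4]   # reference grade groups
algorithm = [1, 2, 3, 4, 4, 2, 2, 3, 5, 4]        # model predictions

kappa = cohen_kappa_score(uropathologist, algorithm, weights="quadratic")
print(f"quadratically weighted kappa: {kappa:.3f}")
```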
171. Erratum: Publisher Correction: Development and validation of a deep learning algorithm for improving Gleason scoring of prostate cancer.
- Author
-
Nagpal K, Foote D, Liu Y, Chen PC, Wulczyn E, Tan F, Olson N, Smith JL, Mohtashamian A, Wren JH, Corrado GS, MacDonald R, Peng LH, Amin MB, Evans AJ, Sangoi AR, Mermel CH, Hipp JD, and Stumpe MC
- Abstract
[This corrects the article DOI: 10.1038/s41746-019-0112-2.]., (© The Author(s) 2019.)
- Published
- 2019
- Full Text
- View/download PDF
172. Erratum: Author Correction: Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program.
- Author
-
Ruamviboonsuk P, Krause J, Chotcomwongse P, Sayres R, Raman R, Widner K, Campana BJL, Phene S, Hemarat K, Tadarati M, Silpa-Archa S, Limwattanayingyong J, Rao C, Kuruvilla O, Jung J, Tan J, Orprayoon S, Kangwanwongpaisan C, Sukumalpaiboon R, Luengchaichawang C, Fuangkaew J, Kongsap P, Chualinpha L, Saree S, Kawinpanitan S, Mitvongsa K, Lawanasakol S, Thepchatri C, Wongpichedchai L, Corrado GS, Peng L, and Webster DR
- Abstract
[This corrects the article DOI: 10.1038/s41746-019-0099-8.].
- Published
- 2019
- Full Text
- View/download PDF
173. Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program.
- Author
-
Raumviboonsuk P, Krause J, Chotcomwongse P, Sayres R, Raman R, Widner K, Campana BJL, Phene S, Hemarat K, Tadarati M, Silpa-Archa S, Limwattanayingyong J, Rao C, Kuruvilla O, Jung J, Tan J, Orprayoon S, Kangwanwongpaisan C, Sukumalpaiboon R, Luengchaichawang C, Fuangkaew J, Kongsap P, Chualinpha L, Saree S, Kawinpanitan S, Mitvongsa K, Lawanasakol S, Thepchatri C, Wongpichedchai L, Corrado GS, Peng L, and Webster DR
- Abstract
Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Relative to human graders, for detecting referable DR (moderate NPDR or worse), the deep learning algorithm had significantly higher sensitivity (0.97 vs. 0.74, p < 0.001), and a slightly lower specificity (0.96 vs. 0.98, p < 0.001). Higher sensitivity of the algorithm was also observed for each of the categories of severe or worse NPDR, PDR, and DME (p < 0.001 for all comparisons). The quadratic-weighted kappa for determination of DR severity levels by the algorithm and human graders was 0.85 and 0.78 respectively (p < 0.001 for the difference). Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening., Competing Interests: J.K., R.S., K.W., B.J.L.C., G.S.C., L.P., S.P. and D.R.W. are Google employees and receive salary and stock as a part of the standard compensation package. O.K., J.J. and J.T. are consultants for Google.
- Published
- 2019
- Full Text
- View/download PDF
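The sensitivity and specificity figures quoted above are standard confusion-matrix quantities; a small sketch over synthetic referable-DR labels shows the computation.

```python
# Hedged sketch: sensitivity and specificity of the kind reported above,
# computed from a confusion matrix over synthetic referable-DR labels.
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
reference = rng.integers(0, 2, size=1000)              # adjudicated reference standard
grader = np.where(rng.random(1000) < 0.9, reference, 1 - reference)   # noisy grader

tn, fp, fn, tp = confusion_matrix(reference, grader).ravel()
print("sensitivity:", round(tp / (tp + fn), 3))
print("specificity:", round(tn / (tn + fp), 3))
```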
174. Understanding neural coding through the model-based analysis of decision making.
- Author
-
Corrado G and Doya K
- Subjects
- Animals, Humans, Comprehension physiology, Decision Making physiology, Models, Psychological, Nerve Net physiology
- Abstract
The study of decision making poses new methodological challenges for systems neuroscience. Whereas our traditional approach linked neural activity to external variables that the experimenter directly observed and manipulated, many of the key elements that contribute to decisions are internal to the decider. Variables such as subjective value or subjective probability may be influenced by experimental conditions and manipulations but can neither be directly measured nor precisely controlled. Pioneering work on the neural basis of decision circumvented this difficulty by studying behavior in static conditions, in which knowledge of the average state of these quantities was sufficient. More recently, a new wave of studies has confronted the conundrum of internal decision variables more directly by leveraging quantitative behavioral models. When these behavioral models are successful in predicting a subject's choice, the model's internal variables may serve as proxies for the unobservable decision variables that actually drive behavior. This new methodology has allowed researchers to localize neural subsystems that encode hidden decision variables related to free choice and to study these variables under dynamic conditions.
- Published
- 2007
- Full Text
- View/download PDF