77 results on '"Tourassi G"'
Search Results
2. Artificial Neural Networks as a Computer Aid for Lung Disease Detection and Classification in Ventilation-Perfusion Lung Scans
- Author
-
Tourassi, G. D., primary, Frederick, E. D., additional, and Coleman, R. E., additional
- Published
- 2001
3. Experimental detection of iron overload in liver through neutron stimulated emission spectroscopy
- Author
-
Kapadia, A J, primary, Tourassi, G D, additional, Sharma, A C, additional, Crowell, A S, additional, Kiser, M R, additional, and Howell, C R, additional
- Published
- 2008
4. Neutron stimulated emission computed tomography: a Monte Carlo simulation approach
- Author
-
Sharma, A C, primary, Harrawood, B P, additional, Bender, J E, additional, Tourassi, G D, additional, and Kapadia, A J, additional
- Published
- 2007
5. Quality evaluation of fast morphing interpolation model for 3D volume reconstruction.
- Author
-
Fadeev, A., Eltonsy, N., Tourassi, G., and Elmaghraby, A.
- Published
- 2005
6. A methodology for analysis extraction and visualization of CT scans.
- Author
-
Eltonsy, N., Tourassi, G., Desoky, A., and Elmaghraby, A.
- Published
- 2003
7. Acute pulmonary embolism: cost-effectiveness analysis of the effect of artificial neural networks on patient care.
- Author
-
Tourassi, G D, primary, Floyd, C E, additional, and Coleman, R E, additional
- Published
- 1998
8. Artificial neural network for diagnosis of acute pulmonary embolism: effect of case and observer selection.
- Author
-
Tourassi, G D, primary, Floyd, C E, additional, Sostman, H D, additional, and Coleman, R E, additional
- Published
- 1995
9. Acute pulmonary embolism: artificial neural network approach for diagnosis.
- Author
-
Tourassi, G D, primary, Floyd, C E, additional, Sostman, H D, additional, and Coleman, R E, additional
- Published
- 1993
10. Multifractal texture analysis of perfusion lung scans as a potential diagnostic tool for acute pulmonary embolism
- Author
-
Tourassi, G. D., Frederick, E. D., Floyd, C. E., Jr., and Coleman, R. E.
- Published
- 2001
11. Significance of MPEG-7 textural features for improved mass detection in mammography
- Author
-
Eltonsy, N. H., Tourassi, G. D., Fadeev, A., and Elmaghraby, A.
12. Investigating performance of a morphology-based CAD scheme in detecting architectural distortion in screening mammograms
- Author
-
Eltonsy, N., Tourassi, G., and Elmaghraby, A.
13. Generalized mutual information similarity metrics for multimodal biomedical image registration
- Author
-
Wachowiak, M. P., Smolíková, R., Tourassi, G. D., and Elmaghraby, A.
14. Feature and knowledge based analysis for reduction of false positives in the computerized detection of masses in screening mammography
- Author
-
Tourassi, G. D., Eltonsy, N. H., Graham, J. H., Floyd, C. E., and Elmaghraby, A.
15. Characterization of ultrasonic backscatter based on generalized entropy
- Author
-
Smolíková, R., Wachowiak, M. P., Tourassi, G. D., Elmaghraby, A., and Zurada, J. M.
16. Probabilistic framework for reliability analysis of information-theoretic CAD systems in mammography
- Author
-
Habas, P. A., Zurada, J. M., Elmaghraby, A., and Tourassi, G. D.
17. The United States Department of Energy and National Institutes of Health Collaboration: Medical Care Advances by Discovery in Radiation Detection.
- Author
-
Buchsbaum J, Capala J, Obcemea C, Keppel C, Asai M, Chen GH, Christy ME, Fakhri GE, Gueye P, Pogue B, Ruckman L, Tourassi G, Vetter K, Zhao W, Squires A, Saboury B, Wang G, Domurat-Sousa K, and Weisenberger A
- Abstract
A National Institutes of Health (NIH) and U.S. Department of Energy (DOE) Office of Science virtual workshop on shared general topics was held in July of 2021 and reported on in this publication in January of 2023. Following the inaugural 2021 joint meeting, representatives from the DOE Office of Science and NIH met to discuss organizing a second joint workshop that would concentrate on radiation detection to bring together teams from both agencies and their grantee populations to stimulate collaboration and efficiency. To meet this scientific mission within the NIH and DOE radiation detection space, the organizers assembled workshop sessions covering the state-of-the-art in cameras, detectors, and sensors for radiation external and internal (diagnostic and therapeutic) to humans, data acquisition and electronics, image reconstruction and processing, and the application of artificial intelligence. NIH and DOE are committed to continuing the process of convening a joint workshop every 12-24 months. This Special Report recaps the findings of this second workshop. Beyond showing only the innovations and areas of success, important gaps in our knowledge were defined and presented. We summarize by defining four areas of greatest opportunity and need that emerged from the unique, dynamic dialogue the in-person workshop provided the attendees. (© 2024 American Association of Physicists in Medicine. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.)
- Published
- 2024
18. Author Correction: BigNeuron: a resource to benchmark and predict performance of algorithms for automated tracing of neurons in light microscopy datasets.
- Author
-
Manubens-Gil L, Zhou Z, Chen H, Ramanathan A, Liu X, Liu Y, Bria A, Gillette T, Ruan Z, Yang J, Radojević M, Zhao T, Cheng L, Qu L, Liu S, Bouchard KE, Gu L, Cai W, Ji S, Roysam B, Wang CW, Yu H, Sironi A, Iascone DM, Zhou J, Bas E, Conde-Sousa E, Aguiar P, Li X, Li Y, Nanda S, Wang Y, Muresan L, Fua P, Ye B, He HY, Staiger JF, Peter M, Cox DN, Simonneau M, Oberlaender M, Jefferis G, Ito K, Gonzalez-Bellido P, Kim J, Rubel E, Cline HT, Zeng H, Nern A, Chiang AS, Yao J, Roskams J, Livesey R, Stevens J, Liu T, Dang C, Guo Y, Zhong N, Tourassi G, Hill S, Hawrylycz M, Koch C, Meijering E, Ascoli GA, and Peng H
- Published
- 2024
19. Machine learning and deep learning tools for the automated capture of cancer surveillance data.
- Author
-
Hsu E, Hanson H, Coyle L, Stevens J, Tourassi G, and Penberthy L
- Subjects
- Humans, United States epidemiology, Registries, National Cancer Institute (U.S.), Neoplasms diagnosis, Neoplasms epidemiology, Deep Learning, Machine Learning
- Abstract
The National Cancer Institute and the Department of Energy strategic partnership applies advanced computing and predictive machine learning and deep learning models to automate the capture of information from unstructured clinical text for inclusion in cancer registries. Applications include extraction of key data elements from pathology reports, determination of whether a pathology or radiology report is related to cancer, extraction of relevant biomarker information, and identification of recurrence. With the growing complexity of cancer diagnosis and treatment, capturing essential information with purely manual methods is increasingly difficult. These new methods for applying advanced computational capabilities to automate data extraction represent an opportunity to close critical information gaps and create a nimble, flexible platform on which new information sources, such as genomics, can be added. This will ultimately provide a deeper understanding of the drivers of cancer and outcomes in the population and increase the timeliness of reporting. These advances will enable better understanding of how real-world patients are treated and the outcomes associated with those treatments in the context of our complex medical and social environment. (Published by Oxford University Press 2024.)
- Published
- 2024
20. Diversity and scale: Genetic architecture of 2068 traits in the VA Million Veteran Program.
- Author
-
Verma A, Huffman JE, Rodriguez A, Conery M, Liu M, Ho YL, Kim Y, Heise DA, Guare L, Panickan VA, Garcon H, Linares F, Costa L, Goethert I, Tipton R, Honerlaw J, Davies L, Whitbourne S, Cohen J, Posner DC, Sangar R, Murray M, Wang X, Dochtermann DR, Devineni P, Shi Y, Nandi TN, Assimes TL, Brunette CA, Carroll RJ, Clifford R, Duvall S, Gelernter J, Hung A, Iyengar SK, Joseph J, Kember R, Kranzler H, Kripke CM, Levey D, Luoh SW, Merritt VC, Overstreet C, Deak JD, Grant SFA, Polimanti R, Roussos P, Shakt G, Sun YV, Tsao N, Venkatesh S, Voloudakis G, Justice A, Begoli E, Ramoni R, Tourassi G, Pyarajan S, Tsao P, O'Donnell CJ, Muralidhar S, Moser J, Casas JP, Bick AG, Zhou W, Cai T, Voight BF, Cho K, Gaziano JM, Madduri RK, Damrauer S, and Liao KP
- Subjects
- Humans, Male, Genetic Variation, Longitudinal Studies, Polymorphism, Single Nucleotide, United States, United States Department of Veterans Affairs, Female, Genetic Predisposition to Disease, Genome-Wide Association Study, Quantitative Trait Loci, Veterans
- Abstract
One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.
- Published
- 2024
21. Accelerating Genome- and Phenome-Wide Association Studies using GPUs - A case study using data from the Million Veteran Program.
- Author
-
Rodriguez A, Kim Y, Nandi TN, Keat K, Kumar R, Bhukar R, Conery M, Liu M, Hessington J, Maheshwari K, Schmidt D, Begoli E, Tourassi G, Muralidhar S, Natarajan P, Voight BF, Cho K, Gaziano JM, Damrauer SM, Liao KP, Zhou W, Huffman JE, Verma A, and Madduri RK
- Abstract
The expansion of biobanks has significantly propelled genomic discoveries yet the sheer scale of data within these repositories poses formidable computational hurdles, particularly in handling extensive matrix operations required by prevailing statistical frameworks. In this work, we introduce computational optimizations to the SAIGE (Scalable and Accurate Implementation of Generalized Mixed Model) algorithm, notably employing a GPU-based distributed computing approach to tackle these challenges. We applied these optimizations to conduct a large-scale genome-wide association study (GWAS) across 2,068 phenotypes derived from electronic health records of 635,969 diverse participants from the Veterans Affairs (VA) Million Veteran Program (MVP). Our strategies enabled scaling up the analysis to over 6,000 nodes on the Department of Energy (DOE) Oak Ridge Leadership Computing Facility (OLCF) Summit High-Performance Computer (HPC), resulting in a 20-fold acceleration compared to the baseline model. We also provide a Docker container with our optimizations that was successfully used on multiple cloud infrastructures on UK Biobank and All of Us datasets where we showed significant time and cost benefits over the baseline SAIGE model.
- Published
- 2024
22. AI and machine learning in medical imaging: key points from development to translation.
- Author
-
Samala RK, Drukker K, Shukla-Dave A, Chan HP, Sahiner B, Petrick N, Greenspan H, Mahmood U, Summers RM, Tourassi G, Deserno TM, Regge D, Näppi JJ, Yoshida H, Huo Z, Chen Q, Vergara D, Cha KH, Mazurchuk R, Grizzard KT, Huisman H, Morra L, Suzuki K, Armato SG 3rd, and Hadjiiski L
- Abstract
Innovation in medical imaging artificial intelligence (AI)/machine learning (ML) demands extensive data collection, algorithmic advancements, and rigorous performance assessments encompassing aspects such as generalizability, uncertainty, bias, fairness, trustworthiness, and interpretability. Achieving widespread integration of AI/ML algorithms into diverse clinical tasks will demand a steadfast commitment to overcoming issues in model design, development, and performance assessment. The complexities of AI/ML clinical translation present substantial challenges, requiring engagement with relevant stakeholders, assessment of cost-effectiveness for user and patient benefit, timely dissemination of information relevant to robust functioning throughout the AI/ML lifecycle, consideration of regulatory compliance, and feedback loops for real-world performance evidence. This commentary addresses several hurdles for the development and adoption of AI/ML technologies in medical imaging. Comprehensive attention to these underlying and often subtle factors is critical not only for tackling the challenges but also for exploring novel opportunities for the advancement of AI in radiology.
Competing Interests: Ravi K. Samala – nothing to disclose. Karen Drukker – receives royalties from Hologic. Amita Shukla-Dave – nothing to disclose. Heang-Ping Chan – nothing to disclose. Berkman Sahiner – nothing to disclose. Nicholas Petrick – nothing to disclose. Hayit Greenspan – nothing to disclose. Usman Mahmood – nothing to disclose. Ronald M Summers – received royalties for patents or software licenses from iCAD, Philips, ScanMed, Translation Holdings, PingAn and MGB, received research support from PingAn through a Cooperative Research and Development Agreement, not related to this work. Georgia Tourassi – nothing to disclose. Thomas M. Deserno – nothing to disclose. Daniele Regge – nothing to disclose. Janne J. Näppi – has received royalties from Hologic and from MEDIAN Technologies, through the University of Chicago licensing, not related to this work. Hiroyuki Yoshida – has received royalties from licensing fees to Hologic and MEDIAN Technologies through the University of Chicago licensing, not related to this work. Zhimin Huo – nothing to disclose. Quan Chen – has received compensations from Carina Medical LLC, not related to this work, provides consulting services for Reflexion Medical, not related to this work. Daniel Vergara – nothing to disclose. Kenny Cha – nothing to disclose. Richard Mazurchuk – nothing to disclose. Kevin T. Grizzard – nothing to disclose. Henkjan Huisman – has received grant support from Siemens Healthineers and Canon Medical for a scientific research project, not related to this work. Lia Morra – has received funding from HealthTriagesrl, not related to this work. Kenji Suzuki – provides consulting services for Canon Medical, not related to this work. Samuel G. Armato III – has received royalties and licensing fees for computer-aided diagnosis through the University of Chicago, not related to this work. (Published by Oxford University Press on behalf of the British Institute of Radiology 2024.)
- Published
- 2024
23. Artificial intelligence in medicine: mitigating risks and maximizing benefits via quality assurance, quality control, and acceptance testing.
- Author
-
Mahmood U, Shukla-Dave A, Chan HP, Drukker K, Samala RK, Chen Q, Vergara D, Greenspan H, Petrick N, Sahiner B, Huo Z, Summers RM, Cha KH, Tourassi G, Deserno TM, Grizzard KT, Näppi JJ, Yoshida H, Regge D, Mazurchuk R, Suzuki K, Morra L, Huisman H, Armato SG 3rd, and Hadjiiski L
- Abstract
The adoption of artificial intelligence (AI) tools in medicine poses challenges to existing clinical workflows. This commentary discusses the necessity of context-specific quality assurance (QA), emphasizing the need for robust QA measures with quality control (QC) procedures that encompass (1) acceptance testing (AT) before clinical use, (2) continuous QC monitoring, and (3) adequate user training. The discussion also covers essential components of AT and QA, illustrated with real-world examples. We also highlight what we see as the shared responsibility of manufacturers or vendors, regulators, healthcare systems, medical physicists, and clinicians to enact appropriate testing and oversight to ensure a safe and equitable transformation of medicine through AI.
Competing Interests: U.M., A.S.-D., H.-P.C., R.K.S., D.V., H.G., N.P., B.S., Z.H., K.C., G.T., T.M.D., D.R., R.M., and L.H. have nothing to disclose. K.D. receives royalties from Hologic. Q.C. has received compensations from Carina Medical LLC, not related to this work, provides consulting services for Reflexion Medical, not related to this work. R.M.S. received royalties for patents or software licenses from iCAD, Philips, ScanMed, Translation Holdings, PingAn, and MGB, and received research support from PingAn through a Cooperative Research and Development Agreement, not related to this work. J.J.N. has received royalties from Hologic and from MEDIAN Technologies, through the University of Chicago licensing, not related to this work. H.Y. has received royalties from licensing fees to Hologic and MEDIAN Technologies through the University of Chicago licensing, not related to this work. K.S. provides consulting services for Canon Medical, not related to this work. L.M. has received funding from HealthTriagesrl, not related to this work. H.H. has received funding from Siemens Healthineers for a scientific research project, not related to this work. S.G.A. III has received royalties and licensing fees for computer-aided diagnosis through the University of Chicago Consultant, Novartis, not related to this work. (© The Author(s) 2024. Published by Oxford University Press on behalf of the British Institute of Radiology.)
- Published
- 2024
24. Diversity and Scale: Genetic Architecture of 2,068 Traits in the VA Million Veteran Program.
- Author
-
Verma A, Huffman JE, Rodriguez A, Conery M, Liu M, Ho YL, Kim Y, Heise DA, Guare L, Panickan VA, Garcon H, Linares F, Costa L, Goethert I, Tipton R, Honerlaw J, Davies L, Whitbourne S, Cohen J, Posner DC, Sangar R, Murray M, Wang X, Dochtermann DR, Devineni P, Shi Y, Nandi TN, Assimes TL, Brunette CA, Carroll RJ, Clifford R, Duvall S, Gelernter J, Hung A, Iyengar SK, Joseph J, Kember R, Kranzler H, Levey D, Luoh SW, Merritt VC, Overstreet C, Deak JD, Grant SFA, Polimanti R, Roussos P, Sun YV, Venkatesh S, Voloudakis G, Justice A, Begoli E, Ramoni R, Tourassi G, Pyarajan S, Tsao PS, O'Donnell CJ, Muralidhar S, Moser J, Casas JP, Bick AG, Zhou W, Cai T, Voight BF, Cho K, Gaziano MJ, Madduri RK, Damrauer SM, and Liao KP
- Abstract
Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associating with one or more traits at experiment-wide P < 4.6 × 10⁻¹¹ significance; fine-mapping 6,318 signals identified from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.
Competing Interests: CJD and JPC are employed full-time by the Novartis Institute of Biomedical Interest (their major contributions to this project were while employed at VA Boston Healthcare System). H.K. is a member of advisory boards for Dicerna Pharmaceuticals, Sophrosyne Pharmaceuticals; Enthion Pharmaceuticals; and Clearmind Medicine; a consultant to Sobrera Pharmaceuticals; the recipient of research funding and medication supplies for an investigator-initiated study from Alkermes; a member of the American Society of Clinical Psychopharmacology’s Alcohol Clinical Trials Initiative, which was supported in the last three years by Alkermes, Dicerna, Ethypharm, Lundbeck, Mitsubishi, Otsuka, and Pear Therapeutics; and holder of U.S. patent 10,900,082 titled: “Genotype-guided dosing of opioid agonists,” issued 26 January 2021. JG and RP are paid for their editorial work in the journal Complex Psychiatry. RP reports a research grant from Alkermes. SD reports grants from Alnylam Pharmaceuticals, Inc, grants from Astellas Pharma, Inc; grants from AstraZeneca Pharmaceuticals LP; grants from Biodesix, grants from Celgene Corporation; grants from Cerner Enviza; grants from GlaxoSmithKline PLC, grants from Janssen Pharmaceuticals, Inc.; grants from Kantar Health; grants from Myriad Genetic Laboratories, Inc.; grants from Novartis International AG; grants from Parexel International Corporation through the University of Utah or Western Institute for Veteran Research outside the submitted work. SMD receives research support from RenalytixAI and Novo Nordisk, outside the scope of the current research, and is named as a co-inventor on a Government-owned US Patent application related to the use of genetic risk prediction for venous thromboembolic disease and for the use of PDE3B inhibition for preventing cardiovascular disease, both filed by the US Department of Veterans Affairs in accordance with Federal regulatory requirements.
- Published
- 2023
25. BigNeuron: a resource to benchmark and predict performance of algorithms for automated tracing of neurons in light microscopy datasets.
- Author
-
Manubens-Gil L, Zhou Z, Chen H, Ramanathan A, Liu X, Liu Y, Bria A, Gillette T, Ruan Z, Yang J, Radojević M, Zhao T, Cheng L, Qu L, Liu S, Bouchard KE, Gu L, Cai W, Ji S, Roysam B, Wang CW, Yu H, Sironi A, Iascone DM, Zhou J, Bas E, Conde-Sousa E, Aguiar P, Li X, Li Y, Nanda S, Wang Y, Muresan L, Fua P, Ye B, He HY, Staiger JF, Peter M, Cox DN, Simonneau M, Oberlaender M, Jefferis G, Ito K, Gonzalez-Bellido P, Kim J, Rubel E, Cline HT, Zeng H, Nern A, Chiang AS, Yao J, Roskams J, Livesey R, Stevens J, Liu T, Dang C, Guo Y, Zhong N, Tourassi G, Hill S, Hawrylycz M, Koch C, Meijering E, Ascoli GA, and Peng H
- Subjects
- Imaging, Three-Dimensional methods, Neurons physiology, Algorithms, Benchmarking, Microscopy methods
- Abstract
BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report generated gold standard manual annotations for a subset of the available imaging datasets and quantified tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information to obtain accurate results and developed a method to iteratively combine methods and generate consensus reconstructions. The consensus trees obtained provide estimates of the neuron structure ground truth that typically outperform single algorithms in noisy datasets. However, specific algorithms may outperform the consensus tree strategy in specific imaging conditions. Finally, to aid users in predicting the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings. (© 2023. The Author(s), under exclusive licence to Springer Nature America, Inc.)
- Published
- 2023
26. The United States Department of Energy and National Institutes of Health Collaboration: Medical Care Advances via Discovery in Physical Sciences.
- Author
-
Keppel C, Weisenberger A, Atanasijevic T, Wang S, Zubal G, Buchsbaum J, Brechbiel M, Capala J, Escorcia F, Obcemea C, Boehnlein A, Heyes G, Bourne P, Cherry S, Colby E, El Fakhri G, Gillo J, Gropler R, Gueye P, Tourassi G, Peggs S, and Woody C
- Subjects
- United States, Artificial Intelligence, National Institutes of Health (U.S.), Laboratories, Biomedical Research, Natural Science Disciplines
- Abstract
Over several months, representatives from the U.S. Department of Energy (DOE) Office of Science and National Institutes of Health (NIH) had a number of meetings that led to the conclusion that innovations in the Nation's health care could be realized by more directed interactions between NIH and DOE. It became clear that the expertise amassed and instrumentation advances developed at the DOE physical science laboratories to enable cutting-edge research in particle physics could also feed innovation in medical healthcare. To meet their scientific mission, the DOE laboratories created advances in such technologies as particle beam generation, radioisotope production, high-energy particle detection and imaging, superconducting particle accelerators, superconducting magnets, cryogenics, high-speed electronics, artificial intelligence, and big data. To move forward, NIH and DOE initiated the process of convening a joint workshop which occurred on July 12th and 13th, 2021. This Special Report presents a summary of the findings of the collaborative workshop and introduces the goals of the next one. (© 2023 American Association of Physicists in Medicine. This article has been contributed to by U.S. Government employees and their work is in the public domain in the USA.)
- Published
- 2023
27. AAPM task group report 273: Recommendations on best practices for AI and machine learning for computer-aided diagnosis in medical imaging.
- Author
-
Hadjiiski L, Cha K, Chan HP, Drukker K, Morra L, Näppi JJ, Sahiner B, Yoshida H, Chen Q, Deserno TM, Greenspan H, Huisman H, Huo Z, Mazurchuk R, Petrick N, Regge D, Samala R, Summers RM, Suzuki K, Tourassi G, Vergara D, and Armato SG 3rd
- Subjects
- Humans, Reproducibility of Results, Diagnostic Imaging, Machine Learning, Artificial Intelligence, Diagnosis, Computer-Assisted methods
- Abstract
Rapid advances in artificial intelligence (AI) and machine learning, and specifically in deep learning (DL) techniques, have enabled broad application of these methods in health care. The promise of the DL approach has spurred further interest in computer-aided diagnosis (CAD) development and applications using both "traditional" machine learning methods and newer DL-based methods. We use the term CAD-AI to refer to this expanded clinical decision support environment that uses traditional and DL-based AI methods. Numerous studies have been published to date on the development of machine learning tools for computer-aided, or AI-assisted, clinical tasks. However, most of these machine learning models are not ready for clinical deployment. It is of paramount importance to ensure that a clinical decision support tool undergoes proper training and rigorous validation of its generalizability and robustness before adoption for patient care in the clinic. To address these important issues, the American Association of Physicists in Medicine (AAPM) Computer-Aided Image Analysis Subcommittee (CADSC) is charged, in part, to develop recommendations on practices and standards for the development and performance assessment of computer-aided decision support systems. The committee has previously published two opinion papers on the evaluation of CAD systems and issues associated with user training and quality assurance of these systems in the clinic. With machine learning techniques continuing to evolve and CAD applications expanding to new stages of the patient care process, the current task group report considers the broader issues common to the development of most, if not all, CAD-AI applications and their translation from the bench to the clinic. The goal is to bring attention to the proper training and validation of machine learning algorithms that may improve their generalizability and reliability and accelerate the adoption of CAD-AI systems for clinical decision support. (© 2022 American Association of Physicists in Medicine.)
- Published
- 2023
28. Using ensembles and distillation to optimize the deployment of deep learning models for the classification of electronic cancer pathology reports.
- Author
-
De Angeli K, Gao S, Blanchard A, Durbin EB, Wu XC, Stroup A, Doherty J, Schwartz SM, Wiggins C, Coyle L, Penberthy L, Tourassi G, and Yoon HJ
- Abstract
Objective: We aim to reduce overfitting and model overconfidence by distilling the knowledge of an ensemble of deep learning models into a single model for the classification of cancer pathology reports.
Materials and Methods: We consider the text classification problem that involves 5 individual tasks. The baseline model consists of a multitask convolutional neural network (MtCNN), and the implemented ensemble (teacher) consists of 1000 MtCNNs. We performed knowledge transfer by training a single model (student) with soft labels derived through the aggregation of ensemble predictions. We evaluate performance based on accuracy and abstention rates by using softmax thresholding.
Results: The student model outperforms the baseline MtCNN in terms of abstention rates and accuracy, thereby allowing the model to be used with a larger volume of documents when deployed. The highest boost was observed for subsite and histology, for which the student model classified an additional 1.81% reports for subsite and 3.33% reports for histology.
Discussion: Ensemble predictions provide a useful strategy for quantifying the uncertainty inherent in labeled data and thereby enable the construction of soft labels with estimated probabilities for multiple classes for a given document. Training models with the derived soft labels reduces model confidence in difficult-to-classify documents, thereby leading to a reduction in the number of highly confident wrong predictions.
Conclusions: Ensemble model distillation is a simple tool to reduce model overconfidence in problems with extreme class imbalance and noisy datasets. These methods can facilitate the deployment of deep learning models in high-risk domains with low computational resources where minimizing inference time is required.
(© The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2022
29. Predictive Radiation Oncology - A New NCI-DOE Scientific Space and Community.
- Author
-
Buchsbaum JC, Jaffray DA, Ba D, Borkon LL, Chalk C, Chung C, Coleman MA, Coleman CN, Diehn M, Droegemeier KK, Enderling H, Espey MG, Greenspan EJ, Hartshorn CM, Hoang T, Hsiao HT, Keppel C, Moore NW, Prior F, Stahlberg EA, Tourassi G, and Willcox KE
- Subjects
- Academies and Institutes, Humans, National Cancer Institute (U.S.), United States, Radiation Oncology education
- Abstract
With a widely attended virtual kickoff event on January 29, 2021, the National Cancer Institute (NCI) and the Department of Energy (DOE) launched a series of 4 interactive, interdisciplinary workshops, and a final concluding "World Café" on March 29, 2021, focused on advancing computational approaches for predictive oncology in the clinical and research domains of radiation oncology. These events reflect 3,870 human hours of virtual engagement with representation from 8 DOE national laboratories and the Frederick National Laboratory for Cancer Research (FNL), 4 research institutes, 5 cancer centers, 17 medical schools and teaching hospitals, 5 companies, 5 federal agencies, 3 research centers, and 27 universities. Here we summarize the workshops by first describing the background for the workshops. Participants identified twelve key questions, and collaborative parallel ideas, as the focus of work going forward to advance the field. These were then used to define short-term and longer-term "Blue Sky" goals. In addition, the group determined key success factors for predictive oncology in the context of radiation oncology, if not the future of all of medicine. These are: cross-discipline collaboration, targeted talent development, development of mechanistic mathematical and computational models and tools, and access to high-quality multiscale data that bridges mechanisms to phenotype. The workshop participants reported feeling energized and highly motivated to pursue next steps together to address the unmet needs in radiation oncology specifically and in cancer research generally and that NCI and DOE project goals align at the convergence of radiation therapy and advanced computing. (©2022 by Radiation Research Society. All rights of reproduction in any form reserved.)
- Published
- 2022
- Full Text
- View/download PDF
30. Artificial intelligence in cancer research, diagnosis and therapy.
- Author
-
Elemento O, Leslie C, Lundin J, and Tourassi G
- Subjects
- Artificial Intelligence, Drug Discovery methods, Humans, Machine Learning, Medical Oncology, Biomedical Research, Neoplasms diagnosis, Neoplasms therapy
- Abstract
Standfirst: Artificial intelligence and machine learning techniques are breaking into biomedical research and health care, which importantly includes cancer research and oncology, where the potential applications are vast. These include detection and diagnosis of cancer, subtype classification, optimization of cancer treatment and identification of new therapeutic targets in drug discovery. While big data used to train machine learning models may already exist, leveraging this opportunity to realize the full promise of artificial intelligence in both the cancer research space and the clinical space will first require significant obstacles to be surmounted. In this Viewpoint article, we asked four experts for their opinions on how we can begin to implement artificial intelligence while ensuring standards are maintained so as to transform cancer diagnosis and the prognosis and treatment of patients with cancer and to drive biological discovery. (© 2021. ©UT-Battelle, LLC, under exclusive licence to Springer Nature Limited 2021.)
- Published
- 2021
- Full Text
- View/download PDF
31. Pharmacoepidemiology, Machine Learning, and COVID-19: An Intent-to-Treat Analysis of Hydroxychloroquine, With or Without Azithromycin, and COVID-19 Outcomes Among Hospitalized US Veterans.
- Author
-
Gerlovin H, Posner DC, Ho YL, Rentsch CT, Tate JP, King JT, Kurgansky KE, Danciu I, Costa L, Linares FA, Goethert ID, Jacobson DA, Freiberg MS, Begoli E, Muralidhar S, Ramoni RB, Tourassi G, Gaziano JM, Justice AC, Gagnon DR, and Cho K
- Subjects
- Aged, Aged, 80 and over, Anti-Bacterial Agents adverse effects, Azithromycin adverse effects, COVID-19 mortality, Drug Therapy, Combination, Female, Humans, Hydroxychloroquine adverse effects, Intention to Treat Analysis, Machine Learning, Male, Middle Aged, Pharmacoepidemiology, Retrospective Studies, SARS-CoV-2, Treatment Outcome, United States epidemiology, Anti-Bacterial Agents therapeutic use, Azithromycin therapeutic use, Hospitalization statistics & numerical data, Hydroxychloroquine therapeutic use, Veterans statistics & numerical data, COVID-19 Drug Treatment
- Abstract
Hydroxychloroquine (HCQ) was proposed as an early therapy for coronavirus disease 2019 (COVID-19) after in vitro studies indicated possible benefit. Previous in vivo observational studies have presented conflicting results, though recent randomized clinical trials have reported no benefit from HCQ among patients hospitalized with COVID-19. We examined the effects of HCQ alone and in combination with azithromycin in a hospitalized population of US veterans with COVID-19, using a propensity score-adjusted survival analysis with imputation of missing data. According to electronic health record data from the US Department of Veterans Affairs health care system, 64,055 US Veterans were tested for the virus that causes COVID-19 between March 1, 2020 and April 30, 2020. Of the 7,193 veterans who tested positive, 2,809 were hospitalized, and 657 individuals were prescribed HCQ within the first 48 hours of hospitalization for the treatment of COVID-19. There was no apparent benefit associated with HCQ receipt, alone or in combination with azithromycin, and there was an increased risk of intubation when HCQ was used in combination with azithromycin (hazard ratio = 1.55; 95% confidence interval: 1.07, 2.24). In conclusion, we assessed the effectiveness of HCQ with or without azithromycin in treatment of patients hospitalized with COVID-19, using a national sample of the US veteran population. Using rigorous study design and analytic methods to reduce confounding and bias, we found no evidence of a survival benefit from the administration of HCQ. (Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health 2021.)
- Published
- 2021
- Full Text
- View/download PDF
32. Limitations of Transformers on Clinical Text Classification.
- Author
-
Gao S, Alawad M, Young MT, Gounley J, Schaefferkoetter N, Yoon HJ, Wu XC, Durbin EB, Doherty J, Stroup A, Coyle L, and Tourassi G
- Subjects
- Humans, Natural Language Processing, Neural Networks, Computer
- Abstract
Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state-of-the-art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures - a word-level convolutional neural network and a hierarchical self-attention network - and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT - pretraining and WordPiece tokenization - may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases rather than understanding the contextual meaning of sequences of text.
- Published
- 2021
- Full Text
- View/download PDF
33. Privacy-Preserving Deep Learning NLP Models for Cancer Registries.
- Author
-
Alawad M, Yoon HJ, Gao S, Mumphrey B, Wu XC, Durbin EB, Jeong JC, Hands I, Rust D, Coyle L, Penberthy L, and Tourassi G
- Abstract
Population cancer registries can benefit from Deep Learning (DL) to automatically extract cancer characteristics from the high volume of unstructured pathology text reports they process annually. The success of DL to tackle this and other real-world problems is proportional to the availability of large labeled datasets for model training. Although collaboration among cancer registries is essential to fully exploit the promise of DL, privacy and confidentiality concerns are the main obstacles to data sharing across cancer registries. Moreover, DL for natural language processing (NLP) requires sharing a vocabulary dictionary for the embedding layer, which may contain patient identifiers. Thus, even distributing the trained models across cancer registries causes a privacy violation issue. In this paper, we propose DL NLP model distribution via privacy-preserving transfer learning approaches without sharing sensitive data. These approaches are used to distribute a multitask convolutional neural network (MT-CNN) NLP model among cancer registries. The model is trained to extract six key cancer characteristics - tumor site, subsite, laterality, behavior, histology, and grade - from cancer pathology reports. Using 410,064 pathology documents from two cancer registries, we compare our proposed approach to conventional transfer learning without privacy preservation, single-registry models, and a model trained on centrally hosted data. The results show that transfer learning approaches including data sharing and model distribution significantly outperform the single-registry model. In addition, the best performing privacy-preserving model distribution approach achieves statistically indistinguishable average micro- and macro-F1 scores across all extraction tasks (0.823, 0.580) as compared to the centralized model (0.827, 0.585).
- Published
- 2021
- Full Text
- View/download PDF
34. COVID-19 Evidence Accelerator: A parallel analysis to describe the use of Hydroxychloroquine with or without Azithromycin among hospitalized COVID-19 patients.
- Author
-
Stewart M, Rodriguez-Watson C, Albayrak A, Asubonteng J, Belli A, Brown T, Cho K, Das R, Eldridge E, Gatto N, Gelman A, Gerlovin H, Goldberg SL, Hansen E, Hirsch J, Ho YL, Ip A, Izano M, Jones J, Justice AC, Klesh R, Kuranz S, Lam C, Mao Q, Mataraso S, Mera R, Posner DC, Rassen JA, Siefkas A, Schrag A, Tourassi G, Weckstein A, Wolf F, Bhat A, Winckler S, Sigal EV, and Allen J
- Subjects
- Data Management methods, Drug Therapy, Combination methods, Female, Hospitalization, Humans, Male, SARS-CoV-2 drug effects, Antiviral Agents therapeutic use, Azithromycin therapeutic use, Hydroxychloroquine therapeutic use, Pandemics prevention & control, COVID-19 Drug Treatment
- Abstract
Background: The COVID-19 pandemic remains a significant global threat. However, despite urgent need, there remains uncertainty surrounding best practices for pharmaceutical interventions to treat COVID-19. In particular, conflicting evidence has emerged surrounding the use of hydroxychloroquine and azithromycin, alone or in combination, for COVID-19. The COVID-19 Evidence Accelerator convened by the Reagan-Udall Foundation for the FDA, in collaboration with Friends of Cancer Research, assembled experts from health systems research, regulatory science, data science, and epidemiology to participate in a large parallel analysis of different data sets to further explore the effectiveness of these treatments. Methods: Electronic health record (EHR) and claims data were extracted from seven separate databases. Parallel analyses were undertaken on data extracted from each source. Each analysis examined time to mortality in hospitalized patients treated with hydroxychloroquine, azithromycin, and the two in combination as compared to patients not treated with either drug. Cox proportional hazards models were used, and propensity score methods were undertaken to adjust for confounding. Frequencies of adverse events in each treatment group were also examined. Results: Neither hydroxychloroquine nor azithromycin, alone or in combination, was significantly associated with time to mortality among hospitalized COVID-19 patients. No treatment groups appeared to have an elevated risk of adverse events. Conclusion: Administration of hydroxychloroquine, azithromycin, and their combination appeared to have no effect on time to mortality in hospitalized COVID-19 patients. Continued research is needed to clarify best practices surrounding treatment of COVID-19. Competing Interests: The authors have read the journal’s policy and the authors of this manuscript have the following competing interests: AA is a paid employee and stockholder at Health Catalyst.
JA is a paid employee and stockholder at Gilead Sciences. AB is a paid employee of COTA, Inc with ownership interest (equity). TB is a paid employee of Syapse. NG is a paid employee and shareholder of Aetion, Inc. SG has equity ownership with COTA, Inc. EH is a paid employee of COTA, Inc. with ownership interest (equity). JH is Founder and President of Syapse with pharmaceutical company funders including Roche, Amgen, Merck & Co. (Syapse employees engaged in design, collection, analysis, interpretation, writing, and the decision to submit for publication). JH also reported being an advisor for Freenome. MI is a paid employee of Syapse. SK is a paid employee of TriNetX, LLC. RM is a paid employee and shareholder of Gilead Sciences. JR is a paid employee of and shareholder in Aetion, Inc., a company that makes software for the analysis of real-world data. AW is an employee and shareholder of Aetion, Inc., a company that makes software for the analysis of real-world data. This does not alter our adherence to PLOS One policies on sharing data and material. There are no patents, products in development, or marketed products associated with this research to declare.
- Published
- 2021
- Full Text
- View/download PDF
35. Deep active learning for classifying cancer pathology reports.
- Author
-
De Angeli K, Gao S, Alawad M, Yoon HJ, Schaefferkoetter N, Wu XC, Durbin EB, Doherty J, Stroup A, Coyle L, Penberthy L, and Tourassi G
- Subjects
- Algorithms, Humans, Neural Networks, Computer, Machine Learning, Neoplasms genetics, Neoplasms pathology
- Abstract
Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model. Results: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes. Conclusions: Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
- Published
- 2021
- Full Text
- View/download PDF
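The abstract above names marginal and ratio uncertainty sampling without defining them. The sketch below uses the common textbook definitions (difference and ratio of the two highest class probabilities), which is an assumption about what the authors meant; the function names and the example pool are hypothetical.

```python
from typing import List, Sequence


def margin_score(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; a smaller gap means more uncertainty."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def ratio_score(probs: Sequence[float]) -> float:
    """Second-highest probability divided by the highest; closer to 1 means more uncertainty."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def select_for_labeling(pool_probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick the k unlabelled samples with the smallest margin (hypothetical selection step)."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin_score(pool_probs[i]))
    return ranked[:k]


# Predicted class probabilities for three unlabelled reports (made-up values).
pool = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
chosen = select_for_labeling(pool, k=2)
```

In an active learning loop, the chosen indices would be sent to annotators, added to the labelled set, and the model retrained before the next round of scoring.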
36. Knowledge Graph-Enabled Cancer Data Analytics.
- Author
-
Hasan SMS, Rivera D, Wu XC, Durbin EB, Christian JB, and Tourassi G
- Subjects
- Adult, Aged, Aged, 80 and over, Algorithms, Databases, Factual, Female, Humans, Incidence, Male, Middle Aged, Knowledge Bases, Neoplasms diagnosis, Neoplasms epidemiology, Neoplasms physiopathology, Registries
- Abstract
Cancer registries collect unstructured and structured cancer data for surveillance purposes which provide important insights regarding cancer characteristics, treatments, and outcomes. Cancer registry data typically (1) categorize each reportable cancer case or tumor at the time of diagnosis, (2) contain demographic information about the patient such as age, gender, and location at time of diagnosis, (3) include planned and completed primary treatment information, and (4) may contain survival outcomes. As structured data is being extracted from various unstructured sources, such as pathology reports, radiology reports, and medical records, and stored for reporting and other needs, the associated information representing a reportable cancer is constantly expanding and evolving. While some popular analytic approaches including SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing cancer registry data. Our approach offers unique advantages for timely data analysis and presentation and visualization of valuable information. This knowledge graph approach semantically enriches the data, and easily enables linking with third-party data which can help explain variation in cancer incidence patterns, disparities, and outcomes. We developed a prototype knowledge graph based on the Louisiana Tumor Registry dataset. We present the advantages of the knowledge graph approach by examining: i) scenario-specific queries, ii) links with openly available external datasets, iii) schema evolution for iterative analysis, and iv) data visualization. Our results demonstrate that this graph-based solution can perform complex queries, improve query run-time performance by up to 76%, and more easily conduct iterative analyses to enhance researchers' understanding of cancer registry data.
- Published
- 2020
- Full Text
- View/download PDF
37. Using case-level context to classify cancer pathology reports.
- Author
-
Gao S, Alawad M, Schaefferkoetter N, Penberthy L, Wu XC, Durbin EB, Coyle L, Ramanathan A, and Tourassi G
- Subjects
- Histological Techniques, Humans, Natural Language Processing, SEER Program, Electronic Health Records classification, Neoplasms pathology
- Abstract
Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information regarding the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied towards a wide range of other clinical text-based tasks. Competing Interests: Author LC is employed by the commercial company Information Management Services Inc (IMS). This does not alter our adherence to PLOS ONE policies on sharing data and materials.
- Published
- 2020
- Full Text
- View/download PDF
38. Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.
- Author
-
Alawad M, Gao S, Qiu JX, Yoon HJ, Blair Christian J, Penberthy L, Mumphrey B, Wu XC, Coyle L, and Tourassi G
- Subjects
- Humans, Neoplasms classification, Support Vector Machine, Information Storage and Retrieval methods, Machine Learning, Natural Language Processing, Neoplasms pathology, Neural Networks, Computer, Registries
- Abstract
Objective: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficiency. Materials and Methods: Multitask CNN (MTCNN) attempts to tackle document information extraction by learning to extract multiple key cancer characteristics simultaneously. We trained our MTCNN to perform 5 information extraction tasks: (1) primary cancer site (65 classes), (2) laterality (4 classes), (3) behavior (3 classes), (4) histological type (63 classes), and (5) histological grade (5 classes). We evaluated the performance on a corpus of 95 231 pathology documents (71 223 unique tumors) obtained from the Louisiana Tumor Registry. We compared the performance of the MTCNN models against single-task CNN models and 2 traditional machine learning approaches, namely support vector machine (SVM) and random forest classifier (RFC). Results: MTCNNs offered superior performance across all 5 tasks in terms of classification accuracy as compared with the other machine learning models. Based on retrospective evaluation, the hard parameter sharing and cross-stitch MTCNN models correctly classified 59.04% and 57.93% of the pathology reports respectively across all 5 tasks. The baseline models achieved 53.68% (CNN), 46.37% (RFC), and 36.75% (SVM). Based on prospective evaluation, the percentages of correctly classified cases across the 5 tasks were 60.11% (hard parameter sharing), 58.13% (cross-stitch), 51.30% (single-task CNN), 42.07% (RFC), and 35.16% (SVM).
Moreover, hard parameter sharing MTCNNs outperformed the other models in computational efficiency by using about the same number of trainable parameters as a single-task CNN. Conclusions: The hard parameter sharing MTCNN offers superior classification accuracy for automated coding support of pathology documents across a wide range of cancers and multiple information extraction tasks while maintaining similar training and inference time as those of a single task-specific model. (© The Author(s) 2019. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2020
- Full Text
- View/download PDF
39. Classifying cancer pathology reports with hierarchical self-attention networks.
- Author
-
Gao S, Qiu JX, Alawad M, Hinkle JD, Schaefferkoetter N, Yoon HJ, Christian B, Fearn PA, Penberthy L, Wu XC, Coyle L, Tourassi G, and Ramanathan A
- Subjects
- Deep Learning, Humans, Natural Language Processing, Neoplasms classification, Neural Networks, Computer, Neoplasms pathology
- Abstract
We introduce a deep learning architecture, hierarchical self-attention networks (HiSANs), designed for classifying pathology reports and show how its unique architecture leads to a new state-of-the-art in accuracy, faster training, and clear interpretability. We evaluate performance on a corpus of 374,899 pathology reports obtained from the National Cancer Institute's (NCI) Surveillance, Epidemiology, and End Results (SEER) program. Each pathology report is associated with five clinical classification tasks - site, laterality, behavior, histology, and grade. We compare the performance of the HiSAN against other machine learning and deep learning approaches commonly used on medical text data - Naive Bayes, logistic regression, convolutional neural networks, and hierarchical attention networks (the previous state-of-the-art). We show that HiSANs are superior to other machine learning and deep learning text classifiers in both accuracy and macro F-score across all five classification tasks. Compared to the previous state-of-the-art, hierarchical attention networks, HiSANs not only are an order of magnitude faster to train, but also achieve about 1% better relative accuracy and 5% better relative macro F-score. (Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.)
- Published
- 2019
- Full Text
- View/download PDF
40. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records.
- Author
-
Savova GK, Danciu I, Alamudun F, Miller T, Lin C, Bitterman DS, Tourassi G, and Warner JL
- Subjects
- Electronic Health Records, Humans, Machine Learning, Natural Language Processing, Phenotype, Data Mining methods, Medical Oncology methods
- Abstract
Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data. (©2019 American Association for Cancer Research.)
- Published
- 2019
- Full Text
- View/download PDF
41. AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing.
- Author
-
Bhattacharya T, Brettin T, Doroshow JH, Evrard YA, Greenspan EJ, Gryshuk AL, Hoang TT, Lauzon CBV, Nissley D, Penberthy L, Stahlberg E, Stevens R, Streitz F, Tourassi G, Xia F, and Zaki G
- Abstract
The application of data science in cancer research has been boosted by major advances in three primary areas: (1) Data: diversity, amount, and availability of biomedical data; (2) Advances in Artificial Intelligence (AI) and Machine Learning (ML) algorithms that enable learning from complex, large-scale data; and (3) Advances in computer architectures allowing unprecedented acceleration of simulation and machine learning algorithms. These advances help build in silico ML models that can provide transformative insights from data including: molecular dynamics simulations, next-generation sequencing, omics, imaging, and unstructured clinical text documents. Unique challenges persist, however, in building ML models related to cancer, including: (1) access, sharing, labeling, and integration of multimodal and multi-institutional data across different cancer types; (2) developing AI models for cancer research capable of scaling on next generation high performance computers; and (3) assessing robustness and reliability in the AI models. In this paper, we review the National Cancer Institute (NCI)-Department of Energy (DOE) collaboration, Joint Design of Advanced Computing Solutions for Cancer (JDACS4C), a multi-institution collaborative effort focused on advancing computing and data technologies to accelerate cancer research on three levels: molecular, cellular, and population. This collaboration integrates various types of generated data, pre-exascale compute resources, and advances in ML models to increase understanding of basic cancer biology, identify promising new treatment options, predict outcomes, and eventually prescribe specialized treatments for patients with cancer. (Copyright © 2019 Bhattacharya, Brettin, Doroshow, Evrard, Greenspan, Gryshuk, Hoang, Lauzon, Nissley, Penberthy, Stahlberg, Stevens, Streitz, Tourassi, Xia and Zaki.)
- Published
- 2019
- Full Text
- View/download PDF
42. Deep Transfer Learning Across Cancer Registries for Information Extraction from Pathology Reports.
- Author
-
Alawad M, Gao S, Qiu J, Schaefferkoetter N, Hinkle JD, Yoon HJ, Christian JB, Wu XC, Durbin EB, Jeong JC, Hands I, Rust D, and Tourassi G
- Abstract
Automated text information extraction from cancer pathology reports is an active area of research to support national cancer surveillance. A well-known challenge is how to develop information extraction tools with robust performance across cancer registries. In this study we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained with single-registry data is capable of transferring knowledge to another registry or whether developing a cross-registry knowledge database produces a more effective and generalizable model. Using data from two cancer registries and primary tumor site and topography as the information extraction task of interest, our study showed that TL results in 6.90% and 17.22% improvement of classification macro F-score over the baseline single-registry models. Detailed analysis illustrated that the observed improvement is evident in the low prevalence classes.
- Published
- 2019
- Full Text
- View/download PDF
43. CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules.
- Author
-
Hengartner N, Cuellar L, Wu XC, Tourassi G, Qiu J, Christian B, and Bhattacharya T
- Subjects
- Bayes Theorem, Humans, Information Storage and Retrieval, Computers trends, Databases, Factual trends, Triage methods
- Abstract
Background: Manual extraction of information from electronic pathology (epath) reports to populate the Surveillance, Epidemiology, and End Results (SEER) database is labor intensive. Systematizing the data extraction automatically using machine-learning (ML) and natural language processing (NLP) is desirable to reduce the human labor required to populate the SEER database and to improve the timeliness of the data. This enables scaling up registry efficiency and collection of new data elements. To ensure the integrity, quality, and continuity of the SEER data, the misclassification error of ML and NLP algorithms needs to be negligible. Current algorithms fail to achieve the precision of human experts who can bring additional information in their assessments. Differences in registry format and the desire to develop a common information extraction platform further complicate the ML/NLP tasks. The purpose of our study is to develop triage rules to partially automate registry workflow to improve the precision of the auto-extracted information. Results: This paper presents a mathematical framework to improve the precision of a classifier beyond that of the Bayes classifier by selectively classifying items that are most likely to be correct. This results in a triage rule that only classifies a subset of the items. We characterize the optimal triage rule and demonstrate its usefulness in the problem of classifying cancer site from electronic pathology reports to achieve a desired precision. Conclusions: From the mathematical formalism, we propose a heuristic estimate for the triage rule based on post-processing the soft-max output from standard machine learning algorithms. We show, in test cases, that the triage rule significantly improves the classification accuracy.
- Published
- 2018
- Full Text
- View/download PDF
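The abstract of the preceding entry describes a triage rule built by post-processing soft-max output. A minimal sketch of that idea follows, assuming the simplest possible rule: accept the top class only when its soft-max probability clears a fixed threshold, and otherwise defer the item to a human. The function names and the 0.9 threshold are illustrative assumptions, not the authors' estimator.

```python
import math

def softmax(logits):
    """Numerically stable soft-max over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logits, threshold=0.9):
    """Return the predicted class index, or None to defer to a human.

    Assumption: a single fixed confidence threshold; the paper's
    optimal rule may differ.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

# A confident item is classified; an ambiguous one is deferred.
confident = triage([5.0, 0.0, 0.0])
ambiguous = triage([1.0, 0.9, 0.8])
```

Raising the threshold classifies fewer items automatically but should raise precision on the items that are classified, which is the trade-off the abstract describes.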
44. Modeling sequential context effects in diagnostic interpretation of screening mammograms.
- Author
-
Alamudun F, Paulus P, Yoon HJ, and Tourassi G
- Abstract
Prior research has shown that physicians' medical decisions can be influenced by sequential context, particularly in cases where successive stimuli exhibit similar characteristics when analyzing medical images. This type of systematic error is known to psychophysicists as sequential context effect as it indicates that judgments are influenced by features of and decisions about the preceding case in the sequence of examined cases, rather than being based solely on the peculiarities unique to the present case. We determine if radiologists experience some form of context bias, using screening mammography as the use case. To this end, we explore correlations between previous perceptual behavior and diagnostic decisions and current decisions. We hypothesize that a radiologist's visual search pattern and diagnostic decisions in previous cases are predictive of the radiologist's current diagnostic decisions. To test our hypothesis, we tasked 10 radiologists of varied experience to conduct blind reviews of 100 four-view screening mammograms. Eye-tracking data and diagnostic decisions were collected from each radiologist under conditions mimicking clinical practice. Perceptual behavior was quantified using the fractal dimension of gaze scanpath, which was computed using the Minkowski-Bouligand box-counting method. To test the effect of previous behavior and decisions, we conducted a multifactor fixed-effects ANOVA. Further, to examine the predictive value of previous perceptual behavior and decisions, we trained and evaluated a predictive model for radiologists' current diagnostic decisions. ANOVA tests showed that previous visual behavior, characterized by fractal analysis, previous diagnostic decisions, and image characteristics of previous cases are significant predictors of current diagnostic decisions. Additionally, predictive modeling of diagnostic decisions showed an overall improvement in prediction error when the model is trained on additional information about previous perceptual behavior and diagnostic decisions.
- Published
- 2018
- Full Text
- View/download PDF
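The preceding entry quantifies gaze scanpaths with the Minkowski-Bouligand box-counting dimension. A rough sketch of box counting on a set of 2D points is given below; how the authors sampled or rasterized scanpaths is not stated in the abstract, so the point-set input, the function names, and the choice of box sizes are all assumptions.

```python
import math

def box_count(points, eps):
    """Number of eps-sized grid boxes that contain at least one point."""
    return len({(math.floor(x / eps), math.floor(y / eps)) for x, y in points})

def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(eps) against log(1/eps)."""
    xs = [math.log(1.0 / e) for e in sizes]
    ys = [math.log(box_count(points, e)) for e in sizes]
    n = len(sizes)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Sanity checks on shapes with known dimension (hypothetical data):
sizes = [1 / 4, 1 / 8, 1 / 16]
line = [(i / 1024, 0.0) for i in range(1024)]                   # dimension 1
square = [(i / 32, j / 32) for i in range(32) for j in range(32)]  # dimension 2
dim_line = box_counting_dimension(line, sizes)
dim_square = box_counting_dimension(square, sizes)
```

A real scanpath would fall between these two extremes, with more space-filling search patterns giving values closer to 2.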
45. A novel web informatics approach for automated surveillance of cancer mortality trends.
- Author
-
Tourassi G, Yoon HJ, and Xu S
- Subjects
- Breast Neoplasms, Humans, Incidence, Lung Neoplasms, Mortality, United States epidemiology, Internet, Medical Informatics, Neoplasms mortality, Population Surveillance, SEER Program
- Abstract
Cancer surveillance data are collected every year in the United States via the National Program of Cancer Registries (NPCR) and the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute (NCI). General trends are closely monitored to measure the nation's progress against cancer. The objective of this study was to apply a novel web informatics approach for enabling fully automated monitoring of cancer mortality trends. The approach involves automated collection and text mining of online obituaries to derive the age distribution, geospatial, and temporal trends of cancer deaths in the US. Using breast and lung cancer as examples, we mined 23,850 cancer-related and 413,024 general online obituaries spanning the timeframe 2008-2012. There was high correlation between the web-derived mortality trends and the official surveillance statistics reported by NCI with respect to the age distribution (ρ=0.981 for breast; ρ=0.994 for lung), the geospatial distribution (ρ=0.939 for breast; ρ=0.881 for lung), and the annual rates of cancer deaths (ρ=0.661 for breast; ρ=0.839 for lung). Additional experiments investigated the effect of sample size on the consistency of the web-based findings. Overall, our study findings support web informatics as a promising, cost-effective way to dynamically monitor spatiotemporal cancer mortality trends., Competing Interests: The authors declared that there is no conflict of interest., (Copyright © 2016 Elsevier Inc. All rights reserved.)
- Published
- 2016
- Full Text
- View/download PDF
46. The utility of web mining for epidemiological research: studying the association between parity and cancer risk.
- Author
-
Tourassi G, Yoon HJ, Xu S, and Han X
- Subjects
- Adult, Age Distribution, Aged, Aged, 80 and over, Breast Neoplasms epidemiology, Case-Control Studies, Colonic Neoplasms epidemiology, Female, Humans, Lung Neoplasms epidemiology, Middle Aged, Ovarian Neoplasms epidemiology, Pancreatic Neoplasms epidemiology, Risk Factors, Young Adult, Data Mining methods, Epidemiologic Methods, Internet, Neoplasms epidemiology, Parity
- Abstract
Background: The World Wide Web has emerged as a powerful data source for epidemiological studies related to infectious disease surveillance. However, its potential for cancer-related epidemiological discoveries is largely unexplored., Methods: Using advanced web crawling and tailored information extraction procedures, the authors automatically collected and analyzed the text content of 79 394 online obituary articles published between 1998 and 2014. The collected data included 51 911 cancer (27 330 breast; 9470 lung; 6496 pancreatic; 6342 ovarian; 2273 colon) and 27 483 non-cancer cases. With the derived information, the authors replicated a case-control study design to investigate the association between parity (i.e., childbearing) and cancer risk. Age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each cancer type and compared to those reported in large-scale epidemiological studies., Results: Parity was found to be associated with a significantly reduced risk of breast cancer (OR = 0.78, 95% CI, 0.75-0.82), pancreatic cancer (OR = 0.78, 95% CI, 0.72-0.83), colon cancer (OR = 0.67, 95% CI, 0.60-0.74), and ovarian cancer (OR = 0.58, 95% CI, 0.54-0.62). Marginal association was found for lung cancer risk (OR = 0.87, 95% CI, 0.81-0.92). The linear trend between increased parity and reduced cancer risk was dramatically more pronounced for breast and ovarian cancer than the other cancers included in the analysis., Conclusion: This large web-mining study on parity and cancer risk produced findings very similar to those reported with traditional observational studies. It may be used as a promising strategy to generate study hypotheses for guiding and prioritizing future epidemiological studies., (© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.)
- Published
- 2016
- Full Text
- View/download PDF
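The preceding entry reports odds ratios with 95% confidence intervals from a case-control design. As a hedged illustration, the sketch below computes an unadjusted odds ratio from a 2x2 table with a Woolf (log-scale) interval. The study reports age-adjusted ORs, which this sketch does not reproduce, and the counts used here are invented for the example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (logit) confidence interval.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts, not taken from the study.
estimate, lower, upper = odds_ratio_ci(20, 80, 40, 60)
```

An interval that lies entirely below 1, as in the results quoted above for breast or ovarian cancer, indicates that the exposure is associated with reduced odds of disease.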
47. Investigating the Association Between Sociodemographic Factors and Lung Cancer Risk Using Cyber Informatics.
- Author
-
Yoon HJ and Tourassi G
- Abstract
Openly available online sources can be very valuable for executing in silico case-control epidemiological studies. Adjustment for confounding factors to isolate the association between an observed factor and disease is essential for such studies. However, such information is not always readily available online. This paper suggests natural language processing methods for extracting socio-demographic information from content openly available online. Feasibility of the suggested method is demonstrated by performing a case-control study focusing on the association between age, gender, and income level and lung cancer risk. The study shows a stronger association of older age and lower socioeconomic status with higher lung cancer risk, which is consistent with the findings reported in traditional cancer epidemiology studies.
- Published
- 2016
- Full Text
- View/download PDF
48. Predicting Lung Cancer Incidence from Air Pollution Exposures Using Shapelet-based Time Series Analysis.
- Author
-
Yoon HJ, Xu S, and Tourassi G
- Abstract
In this paper we investigated whether the geographical variation of lung cancer incidence can be predicted through examining the spatiotemporal trend of particulate matter air pollution levels. Regional trends of air pollution levels were analyzed by a novel shapelet-based time series analysis technique. First, we identified U.S. counties with reportedly high and low lung cancer incidence between 2008 and 2012 via the State Cancer Profiles provided by the National Cancer Institute. Then, we collected particulate matter exposure levels (PM2.5 and PM10) of the counties for the previous decade (1998-2007) via the AirData dataset provided by the Environmental Protection Agency. Using shapelet-based time series pattern mining, regional environmental exposure profiles were examined to identify frequently occurring sequential exposure patterns. Finally, a binary classifier was designed to predict whether a U.S. region is expected to experience high lung cancer incidence based on the region's PM2.5 and PM10 exposure the decade prior. The study confirmed the association between prolonged PM exposure and lung cancer risk. In addition, the study findings suggest that not only cumulative exposure levels but also the temporal variability of PM exposure influence lung cancer risk.
- Published
- 2016
- Full Text
- View/download PDF
49. Residential Mobility and Lung Cancer Risk: Data-Driven Exploration Using Internet Sources.
- Author
-
Yoon HJ, Tourassi G, and Xu S
- Abstract
Frequent relocation has been linked to health decline, particularly with respect to emotional and psychological wellbeing. In this paper we investigate whether there is an association between frequent relocation and lung cancer risk. For the initial investigation we used web crawling and tailored text mining to collect cancer and control subjects from online data sources. One data source includes online obituaries. The second data source includes augmented LinkedIn profiles. For each data source, the subjects' spatiotemporal history is reconstructed from the available information provided in the obituaries and from the education and work experience provided in the LinkedIn profiles. The study shows that lung cancer subjects have higher mobility frequency than the control group. This trend is consistent for both data sources.
- Published
- 2015
- Full Text
- View/download PDF
50. Detecting Rumors Through Modeling Information Propagation Networks in a Social Media Environment.
- Author
-
Liu Y, Xu S, and Tourassi G
- Abstract
In the midst of today's pervasive influence of social media content and activities, information credibility has increasingly become a major issue. Accordingly, identifying false information, e.g. rumors circulated in social media environments, attracts expanding research attention and growing interest. Many previous studies have exploited user-independent features for rumor detection. These prior investigations uniformly treat all users relevant to the propagation of a social media message as instances of a generic entity. Such a modeling approach usually adopts a homogeneous network to represent all users, the practice of which ignores the variety across an entire user population in a social media environment. Recognizing this limitation of modeling methodologies, this study explores user-specific features in a social media environment for rumor detection. The new approach hypothesizes that whether a user tends to spread a rumor is dependent upon specific attributes of the user in addition to content characteristics of the message itself. Under this hypothesis, information propagation patterns of rumors versus those of credible messages in a social media environment are systematically differentiable. To explore and exploit this hypothesis, we develop a new information propagation model based on a heterogeneous user representation for rumor recognition. The new approach is capable of differentiating rumors from credible messages through observing distinctions in their respective propagation patterns in social media. Experimental results show that the new information propagation model based on heterogeneous user representation can effectively distinguish rumors from credible social media content.
- Published
- 2015
- Full Text
- View/download PDF