24 results for "James Zou"
Search Results
2. Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens
- Author
Zhenqin Wu, Alexandro E. Trevino, Eric Wu, Kyle Swanson, Honesty J. Kim, H. Blaize D’Angio, Ryan Preska, Gregory W. Charville, Piero D. Dalerba, Ann Marie Egloff, Ravindra Uppaluri, Umamaheswar Duvvuri, Aaron T. Mayer, and James Zou
- Subjects
Biomedical Engineering, Medicine (miscellaneous), Bioengineering, Computer Science Applications, Biotechnology
- Published
- 2022
- Full Text
- View/download PDF
3. Advances, challenges and opportunities in creating data for trustworthy AI
- Author
Weixin Liang, Girmaw Abebe Tadesse, Daniel Ho, L. Fei-Fei, Matei Zaharia, Ce Zhang, and James Zou
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Computer Vision and Pattern Recognition, Software
- Published
- 2022
- Full Text
- View/download PDF
4. Deep learning-based electrocardiographic screening for chronic kidney disease
- Author
Lauri Holmstrom, Matthew Christensen, Neal Yuan, J. Weston Hughes, John Theurer, Melvin Jujjavarapu, Pedram Fatehi, Alan Kwan, Roopinder K. Sandhu, Joseph Ebinger, Susan Cheng, James Zou, Sumeet S. Chugh, and David Ouyang
- Abstract
Background: Undiagnosed chronic kidney disease (CKD) is a common and usually asymptomatic disorder that causes a high burden of morbidity and early mortality worldwide. We developed a deep learning model for CKD screening from routinely acquired ECGs.
Methods: We collected data from a primary cohort of 111,370 patients with 247,655 ECGs recorded between 2005 and 2019. Using these data, we developed, trained, validated, and tested a deep learning model to predict whether an ECG was taken within one year of the patient receiving a CKD diagnosis. The model was additionally validated using an external cohort from another healthcare system with 312,145 patients and 896,620 ECGs recorded between 2005 and 2018.
Results: Using 12-lead ECG waveforms, our deep learning algorithm achieves discrimination for CKD of any stage with an AUC of 0.767 (95% CI 0.760–0.773) in a held-out test set and an AUC of 0.709 (0.708–0.710) in the external cohort. Our 12-lead ECG-based model's performance is consistent across CKD severity, with an AUC of 0.753 (0.735–0.770) for mild CKD, 0.759 (0.750–0.767) for moderate-severe CKD, and 0.783 (0.773–0.793) for ESRD. In patients under 60 years old, our model achieves high performance in detecting any-stage CKD with both 12-lead (AUC 0.843 [0.836–0.852]) and 1-lead ECG waveforms (AUC 0.824 [0.815–0.832]).
Conclusions: Our deep learning algorithm detects CKD from ECG waveforms, with stronger performance in younger patients and at more severe CKD stages. This ECG algorithm has the potential to augment screening for CKD.
- Published
- 2023
- Full Text
- View/download PDF
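The AUC values quoted in this abstract are rank statistics: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below shows the standard Mann-Whitney estimator with a percentile-bootstrap confidence interval; it is a generic illustration, not the paper's code, and the function names are invented.

```python
import numpy as np

def auc(scores, labels):
    """Mann-Whitney estimate: P(random positive outranks random negative)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count half
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def auc_ci(scores, labels, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for the AUC."""
    rng = np.random.default_rng(seed)
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(scores), len(scores))
        if labels[idx].min() == labels[idx].max():
            continue  # resample contained only one class; skip it
        stats.append(auc(scores[idx], labels[idx]))
    return np.percentile(stats, [2.5, 97.5])
```

In practice the resampling is done at the patient level rather than the ECG level when patients contribute multiple recordings.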
5. A spectral method for assessing and combining multiple data visualizations
- Author
Rong Ma, Eric D. Sun, and James Zou
- Subjects
FOS: Computer and information sciences, Machine Learning (cs.LG), Machine Learning (stat.ML), Statistics - Applications (stat.AP), Statistics - Methodology (stat.ME), FOS: Biological sciences, Quantitative Methods (q-bio.QM), Multidisciplinary, General Physics and Astronomy, General Chemistry, General Biochemistry, Genetics and Molecular Biology
- Abstract
Dimension reduction and data visualization aim to project a high-dimensional dataset to a low-dimensional space while capturing its intrinsic structure. They are an indispensable part of modern data science, and many dimension-reduction and visualization algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it critically important to evaluate their relative performance for a given dataset and to leverage and combine their individual strengths. In this paper, we propose an efficient spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure, the visualization eigenscore, of the relative performance of the visualizations in preserving the structure around each data point. It then leverages the eigenscores to obtain a consensus visualization, which has much-improved quality over the individual visualizations in capturing the underlying true data structure. Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple simulated and real-world datasets from diverse applications to demonstrate the effectiveness of the eigenscores for evaluating visualizations and the superiority of the proposed consensus visualization. Furthermore, we establish rigorous theoretical justification of our method based on a general statistical framework, yielding fundamental principles behind the empirical success of consensus visualization along with practical guidance. (Under revision at Nature Communications.)
- Published
- 2023
- Full Text
- View/download PDF
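The eigenscore idea described above can be illustrated with a minimal sketch: score each embedding by the leading eigenvector of a matrix measuring how strongly the embeddings agree on the distance profile around one data point. The z-score normalization and the correlation-based agreement measure below are simplifying assumptions, not the paper's exact construction.

```python
import numpy as np

def eigenscores(embeddings, i):
    """Score how well each embedding preserves structure around point i.

    embeddings: list of (n, d) arrays, one per visualization.
    Returns a length-K vector of nonnegative weights (leading eigenvector
    of the cross-visualization agreement matrix), scaled to sum to 1.
    """
    profiles = []
    for E in embeddings:
        d = np.linalg.norm(E - E[i], axis=1)  # distances from point i
        d = np.delete(d, i)                   # drop the self-distance
        d = (d - d.mean()) / d.std()          # normalize scale across embeddings
        profiles.append(d)
    P = np.stack(profiles)                    # (K, n-1)
    S = np.abs(np.corrcoef(P))                # K x K agreement matrix
    vals, vecs = np.linalg.eigh(S)            # eigh returns ascending eigenvalues
    v = np.abs(vecs[:, -1])                   # leading eigenvector, made nonnegative
    return v / v.sum()
```

A distance-preserving transform of an embedding (e.g. a rotation) gets the same score as the original, while a heavily corrupted embedding is down-weighted.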
6. Analyses of canine cancer mutations and treatment outcomes using real-world clinico-genomics data of 2119 dogs
- Author
Kevin Wu, Lucas Rodrigues, Gerald Post, Garrett Harvey, Michelle White, Aubrey Miller, Lindsay Lambert, Benjamin Lewis, Christina Lopes, and James Zou
- Subjects
Cancer Research, Oncology
- Abstract
Spontaneous tumors in canines share significant genetic and histological similarities with human tumors, positioning them as valuable models to guide drug development. However, current translational studies have limited real-world evidence, as cancer outcomes are dispersed across veterinary clinics and genomic tests are rarely performed on dogs. In this study, we aim to expand the value of canine models by systematically characterizing genetic mutations in tumors and their response to targeted treatments. In total, we collect and analyze survival outcomes for 2119 tumor-bearing dogs and the prognostic effect of genomic alterations in a subset of 1108 dogs. Our analysis identifies prognostic concordance between canines and humans for several key oncogenes, including TP53 and PIK3CA. We also find that several targeted treatments designed for humans are associated with a positive prognosis when used to treat canine tumors with specific genomic alterations, underscoring the value of canine models in advancing drug discovery for personalized oncology.
- Published
- 2023
- Full Text
- View/download PDF
7. Comprehensive analysis of 2.4 million patent-to-research citations maps the biomedical innovation and translation landscape
- Author
Shuchen Song, Hongyu Li, Arya Gowda, Anoop Manjunath, Nathan Kahrobai, Shu Liu, James Zou, Angelina Seffens, Zhixing Zhang, and Ishan Kumar
- Subjects
Biomedical Research, Biomedical Engineering, MEDLINE, Bioengineering, Applied Microbiology and Biotechnology, Data science, United States, Patents as Topic, Translational Research, Biomedical, Geography, Humans, Molecular Medicine, Citation, Productivity, Demography, Biotechnology, Diversity (politics)
- Abstract
A citation map connecting patents to biomedical publications provides insights that can be used to better evaluate productivity, diversity and translational impact.
- Published
- 2021
- Full Text
- View/download PDF
8. Prostate cancer therapy personalization via multi-modal deep learning on randomized phase III clinical trials
- Author
Andre Esteva, Jean Feng, Douwe van der Wal, Shih-Cheng Huang, Jeffry P. Simko, Sandy DeVries, Emmalyn Chen, Edward M. Schaeffer, Todd M. Morgan, Yilun Sun, Amirata Ghorbani, Nikhil Naik, Dhruv Nathawani, Richard Socher, Jeff M. Michalski, Mack Roach, Thomas M. Pisansky, Jedidiah M. Monson, Farah Naz, James Wallace, Michelle J. Ferguson, Jean-Paul Bahary, James Zou, Matthew Lungren, Serena Yeung, Ashley E. Ross, Howard M. Sandler, Phuoc T. Tran, Daniel E. Spratt, Stephanie Pugh, Felix Y. Feng, and Khalil Katato
- Subjects
Health Information Management ,Medicine (miscellaneous) ,Health Informatics ,Computer Science Applications - Abstract
Prostate cancer is the most frequent cancer in men and a leading cause of cancer death. Determining a patient's optimal therapy is a challenge: oncologists must select the therapy with the highest likelihood of success and the lowest likelihood of toxicity. International standards for prognostication rely on non-specific and semi-quantitative tools, commonly leading to over- and under-treatment. Tissue-based molecular biomarkers have attempted to address this, but most have limited validation in prospective randomized trials and expensive processing costs, posing substantial barriers to widespread adoption. There remains a significant need for accurate and scalable tools to support therapy personalization. Here we demonstrate prostate cancer therapy personalization by predicting long-term, clinically relevant outcomes with a multimodal deep learning architecture trained on clinical data and digital histopathology from prostate biopsies. We train and validate models using five phase III randomized trials conducted across hundreds of clinical centers. Histopathological data were available for 5654 of 7764 randomized patients (71%), with a median follow-up of 11.4 years. Compared to the most common risk-stratification tool, the risk groups developed by the National Comprehensive Cancer Network (NCCN), our models have superior discriminatory performance across all endpoints, ranging from 9.2% to 14.6% relative improvement in a held-out validation set. This artificial intelligence-based tool improves prognostication over standard tools and allows oncologists to computationally predict the likeliest outcomes of specific patients to determine optimal treatment. Outfitted with digital scanners and internet access, any clinic could offer such capabilities, enabling global access to therapy personalization.
- Published
- 2022
- Full Text
- View/download PDF
9. Evaluating eligibility criteria of oncology trials using real-world data and AI
- Author
William B. Capra, Navdeep Pal, Ying Lu, James Zou, Samuel Whipple, Brandon Arnieri, Shemra Rizzo, Michael Lu, Ruishan Liu, Ryan Copping, and Arturo Lopez Pineda
- Subjects
Lung Neoplasms, MEDLINE, Datasets as Topic, Medical Oncology, Patient safety, Artificial Intelligence, Common Criteria, Carcinoma, Non-Small-Cell Lung, Electronic Health Records, Humans, Medical physics, Lung cancer, Proportional Hazards Models, Clinical Trials as Topic, Multidisciplinary, Clinical Laboratory Techniques, Patient Selection, Hazard ratio, Reproducibility of Results, Cancer, Clinical trial, Patient Safety, Real world data
- Abstract
There is a growing focus on making clinical trials more inclusive, but the design of trial eligibility criteria remains challenging [1-3]. Here we systematically evaluate the effect of different eligibility criteria on cancer-trial populations and outcomes with real-world data, using the computational framework of Trial Pathfinder. We apply Trial Pathfinder to emulate completed trials of advanced non-small-cell lung cancer using data from a nationwide database of electronic health records comprising 61,094 patients with advanced non-small-cell lung cancer. Our analyses reveal that many common criteria, including exclusions based on several laboratory values, had a minimal effect on the trial hazard ratios. When we used a data-driven approach to broaden restrictive criteria, the pool of eligible patients more than doubled on average and the hazard ratio of the overall survival decreased by an average of 0.05. This suggests that many patients who were not eligible under the original trial criteria could potentially benefit from the treatments. We further support our findings through analyses of other types of cancer and patient-safety data from diverse clinical trials. Our data-driven methodology for evaluating eligibility criteria can facilitate the design of more-inclusive trials while maintaining safeguards for patient safety.
- Published
- 2021
- Full Text
- View/download PDF
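The core bookkeeping in this kind of analysis, encoding each eligibility criterion as a boolean filter and measuring how the eligible pool changes when a criterion is relaxed, can be sketched as follows. The cohort, column names, and cutoffs are invented for illustration; this is not the Trial Pathfinder code.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
# Synthetic cohort (columns and distributions are made up for this sketch)
cohort = {
    "age": rng.normal(65, 10, n),
    "creatinine": rng.lognormal(0.1, 0.4, n),  # mg/dL
    "ecog": rng.integers(0, 4, n),             # performance status 0-3
}

def eligible(c, creatinine_max):
    """Each criterion is a boolean mask; eligibility is their conjunction."""
    return (
        (c["age"] >= 18)
        & (c["ecog"] <= 1)
        & (c["creatinine"] <= creatinine_max)
    )

strict = eligible(cohort, creatinine_max=1.2)
relaxed = eligible(cohort, creatinine_max=2.0)
print(f"strict pool: {strict.sum()}, relaxed pool: {relaxed.sum()}")
```

The paper's contribution goes further, emulating each trial's survival analysis under every relaxed criterion set; this sketch only shows the pool-size accounting.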
10. Machine Learning Prediction of Clinical Trial Operational Efficiency
- Author
Kevin Wu, Eric Wu, Michael DAndrea, Nandini Chitale, Melody Lim, Marek Dabrowski, Klaudia Kantor, Hanoor Rangi, Ruishan Liu, Marius Garmhausen, Navdeep Pal, Chris Harbron, Shemra Rizzo, Ryan Copping, and James Zou
- Subjects
Machine Learning, Clinical Trials as Topic, Patient Selection, Humans, Pharmaceutical Science, Forecasting
- Abstract
Clinical trials are the gatekeepers and bottlenecks of progress in medicine. In recent years, they have become increasingly complex and expensive, driven by a growing number of stakeholders requiring more endpoints, more diverse patient populations, and a stringent regulatory environment. Trial designers have historically relied on investigator expertise and legacy norms established within sponsor companies to improve operational efficiency while achieving study goals. Data-driven forecasts of operational metrics can therefore be a useful complement for trial design and planning. We develop a machine learning model to predict clinical trial operational efficiency using a novel dataset from Roche containing over 2,000 clinical trials across 20 years and multiple disease areas. The data includes important operational metrics related to patient recruitment and trial duration, as well as a variety of trial features such as the number of procedures, eligibility criteria, and endpoints. Our results demonstrate that operational efficiency can be predicted robustly using trial features, which can provide useful insights to trial designers on the potential impact of their decisions on patient recruitment success and trial duration.
- Published
- 2022
- Full Text
- View/download PDF
11. Large language models associate Muslims with violence
- Author
James Zou, Maheen Farooqi, and Abubakar Abid
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Computer science, Software deployment, Computer Vision and Pattern Recognition, Applications of artificial intelligence, Language model, Data science, Software
- Abstract
Large language models, which are increasingly used in AI applications, display undesirable stereotypes such as persistent associations between Muslims and violence. New approaches are needed to systematically reduce the harmful bias of language models in deployment.
- Published
- 2021
- Full Text
- View/download PDF
12. Variation in COVID-19 Data Reporting Across India: 6 Months into the Pandemic
- Author
Varsha Sankar, James Zou, Abeynaya Gnanasekaran, Siddarth A. Vasudevan, and Varun Vasudevan
- Subjects
2019-20 coronavirus outbreak, Multidisciplinary, Coronavirus disease 2019 (COVID-19), Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Public health, Article, Geography, Pandemic, Data reporting, Socioeconomics
- Abstract
Background. Transparent and accessible reporting of COVID-19 data is critical for public health efforts. Each state and union territory (UT) of India has its own mechanism for reporting COVID-19 data, and the quality of their reporting has not been systematically evaluated. We present a comprehensive assessment of the quality of COVID-19 data reporting done by the Indian state and union territory governments. This assessment informs the public health efforts in India and serves as a guideline for pandemic data reporting by other governments.
Methods. We designed a semi-quantitative framework to assess the quality of COVID-19 data reporting done by the states and union territories of India. This framework captures four key aspects of public health data reporting: availability, accessibility, granularity, and privacy. We then used this framework to calculate a COVID-19 Data Reporting Score (CDRS, ranging from 0 to 1) for 29 states based on the quality of COVID-19 data reporting done by each state during the two-week period from 19 May to 1 June 2020. States that reported fewer than 10 total confirmed cases as of May 18 were excluded from the study.
Findings. Our results indicate a strong disparity in the quality of COVID-19 data reporting done by the state governments in India. CDRS varies from 0.61 (good) in Karnataka to 0.0 (poor) in Bihar and Uttar Pradesh, with a median value of 0.26. Only ten states provide a visual representation of the trend in COVID-19 data. Ten states do not report any data stratified by age, gender, comorbidities or districts. In addition, we identify that Punjab and Chandigarh compromised the privacy of individuals under quarantine by releasing their personally identifiable information on the official websites. Across the states, CDRS is positively associated with the state's sustainable development index for good health and well-being (Pearson correlation: r=0.630, p=0.0003).
Interpretation. The disparity in CDRS across states highlights three important findings at the national, state, and individual level. At the national level, it shows the lack of a unified framework for reporting COVID-19 data in India and highlights the need for a central agency to monitor or audit the quality of data reporting done by the states. Without a unified framework, it is difficult to aggregate the data from different states, gain insights from them, and coordinate an effective nationwide response to the pandemic. Moreover, it reflects the inadequacy of coordination and resource sharing among the states in India. Coordination among states is particularly important as more people start moving across states in the coming months. The disparate reporting scores also reflect inequality in individual access to public health information and privacy protection based on the state of residence.
Funding. J.Z. is supported by NSF CCF 1763191, NIH R21 MD012867-01, NIH P30AG059307, NIH U01MH098953 and grants from the Silicon Valley Foundation and the Chan-Zuckerberg Initiative.
- Published
- 2020
- Full Text
- View/download PDF
13. Patient Experience Surveys Reveal Gender-Biased Descriptions of Their Care Providers
- Author
Dylan Haynes, Kathryn Schwarzenberger, Teri M. Greiling, James Zou, Michael Heath, Anusri Pampari, and Christina Topham
- Subjects
Male, Health Personnel, Medicine (miscellaneous), Health Informatics, Health Information Management, Surveys and Questionnaires, Patient experience, Humans, Quality (business), Retrospective Studies, Retrospective cohort study, University hospital, Transparency (behavior), Patient Outcome Assessment, Patient Satisfaction, Family medicine, Female, Metric (unit), Psychology, Delivery of Health Care, Healthcare providers, Information Systems
- Abstract
Patient experience surveys (PES) are collected by healthcare systems as a surrogate marker of quality and published unedited online for the purpose of transparency, but these surveys may reflect gender biases directed toward healthcare providers. This retrospective study evaluated PES at a single university hospital between July 2016 and June 2018. Surveys were stratified by overall provider rating and self-identified provider gender. Adjectives from free-text survey comments were extracted using natural language processing techniques and fed to a statistical machine learning model to identify descriptors predictive of provider gender. A total of 109,994 surveys were collected; 17,395 contained free-text comments describing 687 unique providers. The mean overall rating of male (8.84, n = 8558) and female (8.80, n = 8837) providers did not differ (p = 0.149). However, highly rated male providers were more often described for their agentic qualities, using adjectives such as “informative,” “forthright,” “superior,” and “utmost” (OR 1.48, p
- Published
- 2021
- Full Text
- View/download PDF
14. Author Correction: Advances, challenges and opportunities in creating data for trustworthy AI
- Author
Weixin Liang, Girmaw Abebe Tadesse, Daniel Ho, L. Fei-Fei, Matei Zaharia, Ce Zhang, and James Zou
- Subjects
Human-Computer Interaction, Artificial Intelligence, Computer Networks and Communications, Computer Vision and Pattern Recognition, Software
- Published
- 2022
- Full Text
- View/download PDF
15. An online platform for interactive feedback in biomedical machine learning
- Author
Ali Abid, Ali Abdalla, James Zou, Abubakar Abid, Dawood Khan, and Abdulrahman Alfozan
- Subjects
Computer Networks and Communications, Computer science, Machine learning, Test (assessment), Human-Computer Interaction, Interactive feedback, Artificial Intelligence, Computer Vision and Pattern Recognition, Software, Reliability (statistics)
- Abstract
Machine learning models have great potential in biomedical applications. A new platform called GradioHub offers an interactive and intuitive way for clinicians and biomedical researchers to try out models and test their reliability on real-world, out-of-training data.
- Published
- 2020
- Full Text
- View/download PDF
16. Sex and gender analysis improves science and engineering
- Author
Rob Ellis, Friederike Anne Eyssel, James Zou, Cara Tannenbaum, and Londa Schiebinger
- Subjects
Male, Science, Science and engineering, Scientific discovery, Engineering, Sex Factors, Artificial Intelligence, Animals, Humans, Gender analysis, Molecular Targeted Therapy, Sex Characteristics, Multidisciplinary, Reproducibility of Results, Societal impact of nanotechnology, Marine Biology (journal), Medical research, Research Design, Sample Size, Female, Engineering ethics, Social equality
- Abstract
The goal of sex and gender analysis is to promote rigorous, reproducible and responsible science. Incorporating sex and gender analysis into experimental design has enabled advancements across many disciplines, such as improved treatment of heart disease and insights into the societal impact of algorithmic bias. Here we discuss the potential for sex and gender analysis to foster scientific discovery, improve experimental efficiency and enable social equality. We provide a roadmap for sex and gender analysis across scientific disciplines and call on researchers, funding agencies, peer-reviewed journals and universities to coordinate efforts to implement robust methods of sex and gender analysis.
- Published
- 2019
- Full Text
- View/download PDF
17. Feedback GAN for DNA optimizes protein functions
- Author
James Zou and Anvita Gupta
- Subjects
Computer Networks and Communications, Computer science, Drug discovery, In silico, Antimicrobial peptides, Genomics, Computational biology, DNA sequencing, Human-Computer Interaction, Synthetic biology, Data point, Artificial Intelligence, Computer Vision and Pattern Recognition, Software, Coding (social sciences)
- Abstract
Generative adversarial networks (GANs) represent an attractive and novel approach to generating realistic data, such as genes, proteins or drugs, in synthetic biology. Here, we apply GANs to generate synthetic DNA sequences encoding proteins of variable length. We propose a novel feedback-loop architecture, feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyser. The proposed architecture also has the advantage that the analyser does not need to be differentiable. We apply the feedback-loop mechanism to two examples: generating synthetic genes coding for antimicrobial peptides, and optimizing synthetic genes for the secondary structure of their resulting peptides. A suite of metrics, calculated in silico, demonstrates that the GAN-generated proteins have desirable biophysical properties. The FBGAN architecture can also be used to optimize GAN-generated data points for useful properties in domains beyond genomics. Generative machine learning models are used in synthetic biology to find new structures such as DNA sequences, proteins and other macromolecules, with applications in drug discovery, environmental treatment and manufacturing. Gupta and Zou propose, and demonstrate in silico, a feedback-loop architecture to optimize the output of a generative adversarial network that generates synthetic genes, producing ones that specifically code for antimicrobial peptides.
- Published
- 2019
- Full Text
- View/download PDF
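The feedback-loop mechanism in FBGAN (generate candidates, score them with an external, non-differentiable analyser, feed the top scorers back into the generator's training data) can be illustrated with a deliberately tiny stand-in: a position-weight-matrix sampler in place of the GAN, and GC content in place of the real analyser. Everything below is a toy assumption for illustration, not the FBGAN model itself.

```python
import numpy as np

BASES = np.array(list("ACGT"))

def gc_content(seq):
    """External analyser: fraction of G/C bases (non-differentiable is fine)."""
    return (seq == "G").mean() + (seq == "C").mean()

def feedback_loop(length=30, pop=200, keep=40, rounds=15, seed=0):
    rng = np.random.default_rng(seed)
    # Stand-in "generator": independent per-position base probabilities
    pwm = np.full((length, 4), 0.25)
    for _ in range(rounds):
        # Sample a batch of sequences from the current generator
        seqs = np.stack([
            BASES[[rng.choice(4, p=pwm[j]) for j in range(length)]]
            for _ in range(pop)
        ])
        scores = np.array([gc_content(s) for s in seqs])
        top = seqs[np.argsort(scores)[-keep:]]  # analyser feedback: keep the best
        # Refit the generator on the surviving sequences (with add-one smoothing)
        counts = np.stack([(top == b).sum(axis=0) for b in BASES], axis=1) + 1.0
        pwm = counts / counts.sum(axis=1, keepdims=True)
    return pwm

pwm = feedback_loop()
gc_prob = pwm[:, 1:3].sum(axis=1).mean()  # mean P(C or G) per position
print(f"mean GC probability after feedback: {gc_prob:.2f}")
```

After a few rounds the generator drifts toward the analyser's preference, which is the essential behaviour of the feedback loop; FBGAN replaces the sampler with a trained GAN and the scorer with a learned function analyser.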
18. A primer on deep learning in genomics
- Author
Amalio Telenti, Ali Torkamani, Pejman Mohammadi, Abubakar Abid, James Zou, and Mikael Huss
- Subjects
Class (computer programming), Extramural, Deep learning, Genomics, Biology, Pathogenicity, Data science, Online tutorial, Genetics, Artificial intelligence
- Abstract
Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.
- Published
- 2018
- Full Text
- View/download PDF
19. AI can be sexist and racist — it’s time to make it fair
- Author
James Zou and Londa Schiebinger
- Subjects
Male, Multidisciplinary, Training set, Computer science, Sexism, Datasets as Topic, Information technology, Data science, Race Factors, Machine Learning, Racism, Sex Factors, Artificial Intelligence, Social Justice, Humans, Female, Neural Networks, Computer, Natural Language Processing
- Abstract
Computer scientists must identify sources of bias, de-bias training data and develop artificial-intelligence algorithms that are robust to skews in the data, argue James Zou and Londa Schiebinger.
- Published
- 2018
- Full Text
- View/download PDF
20. DeepTag: inferring diagnoses from veterinary clinical notes
- Author
Arturo Lopez Pineda, Yuhui Zhang, Rodney L. Page, Ashley M. Zehnder, Carlos Bustamante, James Zou, Allen Nie, and Manuel A. Rivas
- Subjects
Veterinary medicine, Computer science, Medicine (miscellaneous), Health Informatics, Translational research, Medical classification, lcsh:Computer applications to medicine. Medical informatics, Article, Annotation, Health Information Management, Preprocessor, Medical diagnosis, Public health, Deep learning, Health services, Computer Science Applications, lcsh:R858-859.7, Artificial intelligence, Diagnosis code, Coding (social sciences)
- Abstract
Large-scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resources to annotate patient records with standard medical diagnostic codes, and most veterinary visits are captured in free-text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impediment to cross-species translational research, which relies on the ability to accurately identify patient cohorts with specific diagnostic criteria in humans and animals. To reduce the coding burden for veterinary clinical practice and aid translational research, we have developed a deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free-text notes. DeepTag is trained on a newly curated dataset of 112,558 veterinary notes manually annotated by experts. DeepTag extends a multitask LSTM with an improved hierarchical objective that captures the semantic structure between diseases. To foster human-machine collaboration, DeepTag also learns to abstain on examples where it is uncertain and defers them to human experts, resulting in improved performance. DeepTag accurately infers disease codes from free text even in challenging cross-hospital settings where the text comes from different clinical settings than the ones used for training. It enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing. The technical framework in this work can be applied in other medical domains that currently lack medical coding resources.
- Published
- 2018
- Full Text
- View/download PDF
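The abstention behaviour described in this abstract (defer to a human when model confidence is low, trading coverage for accuracy on the cases the model does answer) can be sketched independently of the underlying LSTM. The confidence threshold and the synthetic data below are assumptions for illustration, not DeepTag's actual abstention mechanism.

```python
import numpy as np

def selective_predict(probs, threshold):
    """probs: (n, k) class probabilities. Returns (predictions, accepted mask).
    Examples whose top probability falls below the threshold are deferred."""
    conf = probs.max(axis=1)
    return probs.argmax(axis=1), conf >= threshold

rng = np.random.default_rng(1)
n, k = 1000, 5
# Toy setup: the true class receives a variable amount of extra evidence,
# so high-confidence predictions tend to be the correct ones.
labels = rng.integers(0, k, n)
logits = rng.normal(size=(n, k))
logits[np.arange(n), labels] += rng.gamma(2.0, 1.5, n)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

preds, accepted = selective_predict(probs, threshold=0.7)
overall = (preds == labels).mean()
selective = (preds[accepted] == labels[accepted]).mean()
print(f"coverage {accepted.mean():.2f}, accuracy {overall:.2f} -> {selective:.2f} on accepted")
```

Raising the threshold lowers coverage (more cases deferred to humans) and raises accuracy on the accepted cases, which is the trade-off the paper exploits.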
21. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies
- Author
Yael Baran, Elior Rahmani, Donglei Hu, James Zou, Eran Halperin, Noah Zaitlen, Celeste Eng, Esteban G. Burchard, Eleazar Eskin, Joshua Galanter, and Sam S. Oh
- Subjects
Computer science, Genetic heterogeneity, Sparse PCA, Genome-wide association study, Cell Biology, Epigenome, Computational biology, Bioinformatics, Biochemistry, Article, Principal component analysis, False positive paradox, Molecular Biology, Biotechnology, Epigenomics, Genetic association
- Abstract
In epigenome-wide association studies (EWAS), different methylation profiles of distinct cell types may lead to false discoveries. We introduce ReFACTor, a method based on principal component analysis (PCA) and designed for the correction of cell type heterogeneity in EWAS. ReFACTor does not require knowledge of cell counts, and it provides improved estimates of cell type composition, resulting in improved power and control for false positives in EWAS. Corresponding software is available at http://www.cs.tau.ac.il/~heran/cozygene/software/refactor.html.
- Published
- 2016
- Full Text
- View/download PDF
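The correction strategy shared by this line of work is to estimate latent cell-composition structure as leading principal components of the methylation matrix and remove it before association testing. The sketch below shows the generic PCA recipe; it deliberately omits ReFACTor's distinctive feature-selection step, which first restricts to the most informative CpG sites, and the data are synthetic.

```python
import numpy as np

def regress_out_pcs(M, n_pcs=5):
    """M: (samples, sites) methylation matrix. Removes the top principal
    components, a generic proxy for unobserved cell-type composition."""
    Mc = M - M.mean(axis=0)
    U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
    pcs = U[:, :n_pcs]                  # sample-level components
    # Project out the span of the leading components
    return Mc - pcs @ (pcs.T @ Mc)

rng = np.random.default_rng(3)
n, p = 100, 500
confounder = rng.normal(size=(n, 2))    # latent cell-type structure (toy)
loadings = rng.normal(size=(2, p))
M = confounder @ loadings + 0.1 * rng.normal(size=(n, p))

R = regress_out_pcs(M, n_pcs=2)
# The residual matrix R is orthogonal to the removed components and can be
# used for downstream association testing in place of M.
```

Choosing `n_pcs` matters in practice: too few leaves confounding, too many removes biological signal.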
22. Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation
- Author
Joshua Galanter, Eleazar Eskin, Noah Zaitlen, Eran Halperin, Yael Baran, Sam S. Oh, James Zou, Celeste Eng, Donglei Hu, Esteban G. Burchard, and Elior Rahmani
- Subjects
Genetics, Cell type, Extramural, Cell Biology, Biology, Biochemistry, Article, CpG site, DNA methylation, Epigenetics, Molecular Biology, Biotechnology, Epigenesis
- Published
- 2017
- Full Text
- View/download PDF
23. Epigenome-wide association studies without the need for cell-type composition
- Author
Martin J. Aryee, Christoph Lippert, Jennifer Listgarten, David Heckerman, and James Zou
- Subjects
Epigenomics, Cell type composition, Cells, Linear model, Cell Biology, Epigenome, Bioinformatics, Biochemistry, Software, Methylation analysis, Linear Models, Genome informatics, Humans, Data mining, Explicit knowledge, Molecular Biology, Genome-Wide Association Study, Biotechnology, Genetic association
- Abstract
In epigenome-wide association studies, cell-type composition often differs between cases and controls, yielding associations that simply tag cell type rather than reveal fundamental biology. Current solutions require actual or estimated cell-type composition, information that is not easily obtainable for many samples of interest. We propose a method, FaST-LMM-EWASher, that automatically corrects for cell-type composition without the need for explicit knowledge of it, and we validate our method by comparison with the state-of-the-art approach. Corresponding software is available from http://www.microsoft.com/science/.
- Published
- 2014
- Full Text
- View/download PDF
24. Undesired usage and the robust self-assembly of heterogeneous structures
- Author
James Zou, Arvind Murugan, and Michael Brenner
- Subjects
Models, Molecular, Multidisciplinary, Platelet Aggregation, Computer science, Molecular Conformation, General Physics and Astronomy, Nanotechnology, General Chemistry, General Biochemistry, Genetics and Molecular Biology, Supramolecular assembly, Kinetics, Multiprotein Complexes, Colloids, Self-assembly, Biophysical chemistry
- Abstract
Inspired by multiprotein complexes in biology and recent successes in synthetic DNA tile and colloidal self-assembly, we study the spontaneous assembly of structures made of many kinds of components. The major challenge in achieving high assembly yield is eliminating incomplete or incorrectly bound structures. Here, we find that such undesired structures rapidly degrade yield with increasing structural size and complexity in diverse models of assembly, if component concentrations reflect the composition (that is, the stoichiometry) of the desired structure. But this yield catastrophe can be mitigated by using highly non-stoichiometric concentrations. Our results support a general principle of 'undesired usage': concentrations of components should be chosen to account for how they are 'used' by undesired structures, not just by the desired structure. This principle could improve synthetic assembly methods, but it also raises new questions about the expression levels of proteins that form biological complexes such as the ribosome.
- Published
- 2015
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library