256 results on '"David Madigan"'
Search Results
102. Interpreting observational studies: why empirical calibration is needed to correct p-values
- Author
-
William DuMouchel, Patrick B. Ryan, David Madigan, Marc A. Suchard, and Martijn J. Schuemie
- Subjects
Statistics and Probability ,Male ,Data Interpretation ,Epidemiology ,Calibration (statistics) ,Statistics & Probability ,Bias ,Statistical significance ,Statistics ,hypothesis testing ,Econometrics ,Isoniazid ,Humans ,Spurious relationship ,observational studies ,Research Articles ,Statistical hypothesis testing ,Confounding ,Replicate ,Statistical ,calibration ,Observational Studies as Topic ,Research Design ,Data Interpretation, Statistical ,negative controls ,Bias (Epidemiology) ,Serotonin Uptake Inhibitors ,Public Health and Health Services ,Observational study ,Female ,Chemical and Drug Induced Liver Injury ,Null hypothesis ,Psychology ,Gastrointestinal Hemorrhage ,Selective Serotonin Reuptake Inhibitors - Abstract
Often the literature makes assertions of medical product effects on the basis of p-values falling below the conventional 0.05 threshold. Because observational studies are subject to systematic error that traditional significance testing does not account for, the authors argue that p-values should be empirically calibrated against estimates from negative controls before such assertions are made.
- Published
- 2013
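To illustrate the calibration idea described in entry 102 above, here is a minimal sketch, not the authors' implementation: fit a Gaussian "empirical null" to log effect estimates from negative controls, then compute a calibrated two-sided p-value against that null instead of against the theoretical null. All numbers are invented and the assumed standard error is hypothetical.

```python
# Minimal sketch of empirical p-value calibration using negative controls.
# The empirical null is a Gaussian fitted to log relative risks observed for
# drug-outcome pairs believed to have no causal association.
import math

def fit_empirical_null(negative_control_log_rrs):
    """Mean and SD of the empirical null from negative-control estimates."""
    n = len(negative_control_log_rrs)
    mu = sum(negative_control_log_rrs) / n
    var = sum((x - mu) ** 2 for x in negative_control_log_rrs) / (n - 1)
    return mu, math.sqrt(var)

def calibrated_p_value(log_rr, null_mu, null_sd):
    """Two-sided p-value of log_rr under the empirical null N(null_mu, null_sd^2)."""
    z = (log_rr - null_mu) / null_sd
    return math.erfc(abs(z) / math.sqrt(2.0))   # 2 * (1 - Phi(|z|))

# Hypothetical log relative risks for negative controls.
negatives = [0.15, -0.05, 0.22, 0.30, 0.10, 0.05, 0.18, 0.25, -0.02, 0.12]
mu, sd = fit_empirical_null(negatives)

log_rr_of_interest = math.log(1.5)     # estimate for the pair under study
assumed_se = 0.15                      # hypothetical standard error
naive_p = math.erfc(abs(log_rr_of_interest / assumed_se) / math.sqrt(2.0))
print(f"empirical null: mean={mu:.3f}, sd={sd:.3f}")
print(f"naive p = {naive_p:.4f}, calibrated p = {calibrated_p_value(log_rr_of_interest, mu, sd):.4f}")
```

When the negative controls show a systematic positive shift, as in this toy example, the calibrated p-value is considerably less extreme than the naive one.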
103. Does Design Matter? Systematic Evaluation of the Impact of Analytical Choices on Effect Estimates in Observational Studies
- Author
-
David Madigan, Patrick B. Ryan, and Martijn J. Schuemie
- Subjects
Healthcare database ,business.industry ,Reference database ,Electronic medical record ,Medicine ,Pharmacology (medical) ,Observational study ,Health records ,business ,Health outcomes ,Data science ,Administrative claims ,Original Research - Abstract
Background: Clinical studies that use observational databases, such as administrative claims and electronic health records, to evaluate the effects of medical products have become commonplace. These studies begin by selecting a particular study design, such as a case-control, cohort, or self-controlled design, and different authors can and do choose different designs for the same clinical question. Furthermore, published papers invariably report the study design but do not discuss the rationale for the specific choice. Studies of the same clinical question with different designs, however, can generate different results, sometimes with strikingly different implications. Even within a specific study design, authors make many different analytic choices, and these too can profoundly impact results. In this paper, we systematically study heterogeneity due to the type of study design and due to analytic choices within study design. Methods and findings: We conducted our analysis in 10 observational healthcare databases but mostly present our results in the context of the GE Centricity EMR database, an electronic health record database containing data for 11.2 million lives. We considered the impact of three different study design choices on estimates of associations between bisphosphonates and four particular health outcomes for which there is no evidence of an association. We show that applying alternative study designs can yield discrepant results, in terms of direction and significance of association. We also highlight that while traditional univariate sensitivity analysis may not show substantial variation, systematic assessment of all analytical choices within a study design can yield inconsistent results ranging from statistically significant decreased risk to statistically significant increased risk. Our findings show that clinical studies using observational databases can be sensitive both to study design choices and to specific analytic choices within study design. Conclusion: More attention is needed to how design choices may affect results and, when possible, investigators should examine a wide array of possible choices to confirm that significant findings are consistently identified.
- Published
- 2013
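The systematic sensitivity assessment described in entry 103 above can be sketched as a grid of fully specified analyses for one clinical question. In the sketch below, estimate_effect is a hypothetical stand-in: a real study would execute the chosen design against the database, whereas here it returns simulated (relative risk, p-value) pairs so only the bookkeeping is demonstrated.

```python
# Sketch: run one clinical question across a grid of design and analytic choices
# and summarize how much the estimates move.
import itertools, random

random.seed(1)

designs = ["cohort", "case-control", "self-controlled"]
washout_days = [0, 30, 180]
covariate_adjustment = ["none", "age_sex", "propensity_score"]

def estimate_effect(design, washout, adjustment):
    """Hypothetical stand-in for running one fully specified analysis."""
    rr = random.lognormvariate(0.0, 0.35)
    p = random.random()
    return rr, p

results = []
for design, washout, adjustment in itertools.product(designs, washout_days, covariate_adjustment):
    rr, p = estimate_effect(design, washout, adjustment)
    results.append({"design": design, "washout": washout, "adjustment": adjustment, "rr": rr, "p": p})

rrs = [r["rr"] for r in results]
sig_up = sum(1 for r in results if r["p"] < 0.05 and r["rr"] > 1)
sig_down = sum(1 for r in results if r["p"] < 0.05 and r["rr"] < 1)
print(f"{len(results)} analyses; RR range {min(rrs):.2f} to {max(rrs):.2f}")
print(f"significant increased risk: {sig_up}, significant decreased risk: {sig_down}")
```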
104. Empirical Performance of the Calibrated Self-Controlled Cohort Analysis Within Temporal Pattern Discovery: Lessons for Developing a Risk Identification and Analysis System
- Author
-
G. Niklas Norén, Patrick B. Ryan, Martijn J. Schuemie, Tomas Bergvall, Kristina Juhlin, and David Madigan
- Subjects
Research design ,Databases, Factual ,Drug-Related Side Effects and Adverse Reactions ,Toxicology ,Bioinformatics ,030226 pharmacology & pharmacy ,Risk Assessment ,Cohort Studies ,03 medical and health sciences ,0302 clinical medicine ,Bias ,Statistics ,Medicine ,Electronic Health Records ,Humans ,Pharmacology (medical) ,030212 general & internal medicine ,Set (psychology) ,Pharmacology ,Receiver operating characteristic ,business.industry ,Confounding ,3. Good health ,Data set ,Test case ,Research Design ,Area Under Curve ,Calibration ,Observational study ,Chemical and Drug Induced Liver Injury ,business ,Cohort study - Abstract
Observational healthcare data offer the potential to identify adverse drug reactions that may be missed by spontaneous reporting. The self-controlled cohort analysis within the Temporal Pattern Discovery framework compares the observed-to-expected ratio of medical outcomes during post-exposure surveillance periods with the corresponding ratio during a set of distinct pre-exposure control periods in the same patients. It utilizes an external control group to account for systematic differences between the different time periods, thus combining within- and between-patient confounder adjustment in a single measure. Our objective was to evaluate the performance of the calibrated self-controlled cohort analysis within Temporal Pattern Discovery as a tool for risk identification in observational healthcare data. Different implementations of the calibrated self-controlled cohort analysis were applied to 399 drug-outcome pairs (165 positive and 234 negative test cases across 4 health outcomes of interest) in 5 real observational databases (four with administrative claims and one with electronic health records). Performance was evaluated on real data through sensitivity/specificity, the area under the receiver operating characteristic curve (AUC), and bias. The calibrated self-controlled cohort analysis achieved good predictive accuracy across the outcomes and databases under study. The optimal design based on this reference set uses a 360-day surveillance period and a single control period 180 days prior to new prescriptions. It achieved an average AUC of 0.75 and AUC >0.70 in all but one scenario. A design with three separate control periods performed better for the electronic health records database and for acute renal failure across all data sets. The estimates for negative test cases were generally unbiased, but a minor negative bias of up to 0.2 on the relative-risk scale was observed with the configurations using multiple control periods, for acute liver injury and upper gastrointestinal bleeding. The calibrated self-controlled cohort analysis within Temporal Pattern Discovery shows promise as a tool for risk identification; it performs well at discriminating positive from negative test cases. The optimal parameter configuration may vary with the data set and medical outcome of interest.
- Published
- 2013
- Full Text
- View/download PDF
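The core within-patient contrast in entry 104 above can be sketched as an observed-to-expected ratio between a post-exposure surveillance window and a pre-exposure control window. This is a simplification: the full Temporal Pattern Discovery method additionally shrinks and calibrates the ratio against an external control group, which is omitted here, and the counts below are invented.

```python
# Simplified self-controlled cohort contrast: event rate after exposure vs.
# the rate in a pre-exposure control window in the same patients.
import math

def observed_to_expected(post_events, post_person_days, pre_events, pre_person_days, shrinkage=0.5):
    """IC-style log2 observed-to-expected ratio with a simple shrinkage constant."""
    expected = pre_events * (post_person_days / pre_person_days)
    return math.log2((post_events + shrinkage) / (expected + shrinkage))

# Hypothetical data: 360-day surveillance window vs. a 180-day pre-exposure window.
post_events, post_days = 42, 360 * 1000    # events and person-days after exposure
pre_events, pre_days = 15, 180 * 1000      # events and person-days before exposure

ic = observed_to_expected(post_events, post_days, pre_events, pre_days)
print(f"log2 observed-to-expected ratio: {ic:.2f}")   # > 0 suggests an elevated post-exposure rate
```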
105. Empirical Performance of the Self-Controlled Case Series Design: Lessons for Developing a Risk Identification and Analysis System
- Author
-
Shawn E. Simpson, David Madigan, Martijn J. Schuemie, Patrick B. Ryan, Ivan Zorych, and Marc A. Suchard
- Subjects
Research design ,Multivariate statistics ,Drug-Related Side Effects and Adverse Reactions ,Coverage probability ,MEDLINE ,Toxicology ,Risk Assessment ,030226 pharmacology & pharmacy ,03 medical and health sciences ,0302 clinical medicine ,Bias ,Health care ,Econometrics ,Humans ,Medicine ,Pharmacology (medical) ,030212 general & internal medicine ,Probability ,Pharmacology ,business.industry ,Outcome (probability) ,3. Good health ,Research Design ,Area Under Curve ,Relative risk ,Observational study ,business - Abstract
Background The self-controlled case series (SCCS) offers potential as a statistical method for risk identification involving medical products from large-scale observational healthcare data. However, analytic design choices remain in encoding longitudinal health records into the SCCS framework, and its risk identification performance across real-world databases is unknown. Objectives To evaluate the performance of SCCS and its design choices as a tool for risk identification in observational healthcare data. Research Design We examined the risk identification performance of SCCS across five design choices using 399 drug-health outcome pairs in five real observational databases (four administrative claims and one electronic health records). In these databases, the pairs involve 165 positive controls and 234 negative controls. We also consider several synthetic databases with known relative risks between drug-outcome pairs. Measures We evaluate risk identification performance by estimating the area under the receiver operating characteristic curve (AUC), and bias and coverage probability in the synthetic examples. Results The SCCS achieves strong predictive performance. Twelve of the twenty health outcome-database scenarios return AUCs >0.75 across all drugs. Including all adverse events instead of just the first per patient and applying a multivariate adjustment for concomitant drug use are the most important design choices. However, the SCCS as applied here returns relative risk point estimates biased towards the null value of 1 with low coverage probability. Conclusions The SCCS, recently extended to apply a multivariate adjustment for concomitant drug use, offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but ongoing work should correct this shortcoming.
- Published
- 2013
- Full Text
- View/download PDF
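A minimal sketch of the self-controlled case series likelihood discussed in entry 105 above, restricted to a single binary exposure rather than the paper's multivariate adjustment: conditioning on each person's total event count, events fall in the exposed window with probability rho*e/(rho*e + u), where e and u are exposed and unexposed person-time and rho is the incidence rate ratio. The data and the crude grid search below are for illustration only.

```python
# Single-exposure SCCS: maximize the conditional (multinomial) likelihood
# over the log incidence rate ratio.
import math

# (events_exposed, events_unexposed, exposed_days, unexposed_days) per person
people = [(1, 0, 30, 335), (0, 1, 30, 335), (2, 1, 60, 305), (1, 2, 30, 335), (1, 0, 90, 275)]

def log_likelihood(log_rho):
    rho, ll = math.exp(log_rho), 0.0
    for x, y, e, u in people:
        p = rho * e / (rho * e + u)          # probability an event lands in the exposed window
        ll += x * math.log(p) + y * math.log(1.0 - p)
    return ll

# Crude grid search for the maximum likelihood estimate of log(rho).
grid = [i / 100.0 for i in range(-300, 301)]
mle = max(grid, key=log_likelihood)
print(f"estimated incidence rate ratio: {math.exp(mle):.2f}")
```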
106. Association Between Trauma Center Type and Mortality Among Injured Adolescent Patients
- Author
-
Randall S. Burd, Avery B. Nathens, Chethan Sathya, David Madigan, Sushil Mittal, Elizabeth A. Carter, Jichaun Wang, Rachel B. Webman, and Michael L. Nance
- Subjects
Male ,Pediatrics ,medicine.medical_specialty ,Adolescent ,Poison control ,Wounds, Penetrating ,Wounds, Nonpenetrating ,Article ,03 medical and health sciences ,Young Adult ,0302 clinical medicine ,Blunt ,Age Distribution ,Trauma Centers ,030225 pediatrics ,Injury prevention ,Medicine ,Humans ,Young adult ,Cause of death ,Abbreviated Injury Scale ,business.industry ,Trauma center ,030208 emergency & critical care medicine ,medicine.disease ,United States ,Surgery ,Pediatrics, Perinatology and Child Health ,Female ,business ,Emergency Service, Hospital ,Pediatric trauma - Abstract
Although data obtained from regional trauma systems demonstrate improved outcomes for children treated at pediatric trauma centers (PTCs) compared with those treated at adult trauma centers (ATCs), differences in mortality have not been consistently observed for adolescents. Because trauma is the leading cause of death and acquired disability among adolescents, it is important to better define differences in outcomes among injured adolescents by using national data. To use a national data set to compare mortality of injured adolescents treated at ATCs, PTCs, or mixed trauma centers (MTCs) that treat both pediatric and adult trauma patients and to determine the final discharge disposition of survivors at different center types. Data from level I and II trauma centers participating in the 2010 National Trauma Data Bank (January 1 to December 31, 2010) were used to create multilevel models accounting for center-specific effects to evaluate the association of center characteristics (PTC, ATC, or MTC) with mortality among patients aged 15 to 19 years who were treated for a blunt or penetrating injury. The models controlled for sex; mechanism of injury (blunt vs penetrating); injuries sustained, based on the Abbreviated Injury Scale scores (post-dot values <3 or ≥3 by body region); initial systolic blood pressure; and Glasgow Coma Scale scores. Missing data were managed using multiple imputation, accounting for multilevel data structure. Data analysis was conducted from January 15, 2013, to March 15, 2016. Type of trauma center. Mortality at each center type. Among 29 613 injured adolescents (mean [SD] age, 17.3 [1.4] years; 72.7% male), most were treated at ATCs (20 402 [68.9%]), with the remainder at MTCs (7572 [25.6%]) or PTCs (1639 [5.5%]). Adolescents treated at PTCs were more likely to be injured by a blunt than penetrating injury mechanism (91.4%) compared with those treated at ATCs (80.4%) or MTCs (84.6%). Mortality was higher among adolescents treated at ATCs and MTCs than those treated at PTCs (3.2% and 3.5% vs 0.4%; P < .001). The adjusted odds of mortality were higher at ATCs (odds ratio, 4.19; 95% CI, 1.30-13.51) and MTCs (odds ratio, 6.68; 95% CI, 2.03-21.99) compared with PTCs but did not differ between level I and II centers (odds ratio, 0.76; 95% CI, 0.59-0.99). Mortality among injured adolescents was lower among those treated at PTCs compared with those treated at ATCs and MTCs. Defining resource and patient features that account for these observed differences is needed to optimize adolescent outcomes after injury.
- Published
- 2016
107. Under-reporting of cardiovascular events in the rofecoxib Alzheimer disease studies
- Author
-
Jerry Avorn, Daniel W. Sigelman, James W. Mayer, David Madigan, and Curt D. Furberg
- Subjects
Drug ,medicine.medical_specialty ,media_common.quotation_subject ,Placebo ,Lactones ,Alzheimer Disease ,Risk Factors ,Internal medicine ,Under-reporting ,medicine ,Adverse Drug Reaction Reporting Systems ,Humans ,Sulfones ,Rofecoxib ,Randomized Controlled Trials as Topic ,media_common ,Cyclooxygenase 2 Inhibitors ,business.industry ,Thrombosis ,medicine.disease ,Intention to Treat Analysis ,Surgery ,Clinical trial ,Cardiovascular Diseases ,Relative risk ,Alzheimer's disease ,Cardiology and Cardiovascular Medicine ,business ,medicine.drug - Abstract
Background In September 2004, rofecoxib (Vioxx) was removed from the market after it was found to produce a near doubling of cardiovascular thrombotic (CVT) events in a placebo-controlled study. Its manufacturer stated that this was the first clear evidence of such risk and criticized previous analyses of earlier CVT risk for focusing on investigator-reported events. We studied contemporaneously adjudicated CVT events to assess the information on cardiovascular risk available while the drug was in widespread use. Methods Using an intention-to-treat analysis of adjudicated CVT deaths, we analyzed detailed patient-level data collected during 3 randomized placebo-controlled trials of rofecoxib versus placebo that had been designed to define the drug's possible role in the prevention or treatment of Alzheimer disease. All trials had been completed by April 2003. Results In the 3 studies combined, the data indicated that rofecoxib more than tripled the risk of confirmed CVT death (risk ratio = 3.57 [1.48-9.72], P = .004). This finding reached the P < .05 threshold of statistical significance while the drug was still being marketed. Conclusion Intention-to-treat analysis of placebo-controlled studies of rofecoxib for Alzheimer disease demonstrated that the drug produced a significant increase in confirmed CVT deaths nearly 40 months before it was removed from the market. By contrast, published analyses of these trials were restricted to on-treatment analyses (ending 14 days after cessation of treatment) that did not reveal this risk. Intention-to-treat analyses of clinical trial data can reveal important information about potential drug risks and should be performed routinely and reported in a timely manner.
- Published
- 2012
- Full Text
- View/download PDF
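The pooled summary in entry 107 above is a risk ratio with a Wald confidence interval on the log scale. A short sketch of that arithmetic follows; the counts are invented and do not reproduce the published estimate.

```python
# Risk ratio and 95% CI from 2x2 counts (Wald interval on the log scale).
import math

def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lo, hi = math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

rr, lo, hi = risk_ratio_ci(events_trt=21, n_trt=1500, events_ctl=7, n_ctl=1600)
print(f"risk ratio = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```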
108. Novel Data-Mining Methodologies for Adverse Drug Event Discovery and Analysis
- Author
-
Patrick B. Ryan, William DuMouchel, Nigam H. Shah, Carol Friedman, David Madigan, and Rave Harpaz
- Subjects
Pharmacology ,Research design ,Databases, Factual ,Drug-Related Side Effects and Adverse Reactions ,Extramural ,Computer science ,MEDLINE ,computer.software_genre ,Databases, Bibliographic ,Article ,Cohort Studies ,Pharmacovigilance ,Patient safety ,Artificial Intelligence ,Research Design ,Adverse drug event ,Case-Control Studies ,Multivariate Analysis ,Data Mining ,Humans ,Pharmacology (medical) ,Data mining ,Drug toxicity ,computer - Abstract
An important goal of the health system is to identify new adverse drug events (ADEs) in the postapproval period. Data-mining methods that can transform data into meaningful knowledge to inform patient safety have proven essential for this purpose. New opportunities have emerged to harness data sources that have not been used within the traditional framework. This article provides an overview of recent methodological innovations and data sources used to support ADE discovery and analysis.
- Published
- 2012
- Full Text
- View/download PDF
109. Machine learning and data mining: strategies for hypothesis generation
- Author
-
David Madigan, Maria A. Oquendo, Hanga Galfalvy, Fernando Perez-Cruz, Antonio Artés-Rodríguez, Hilario Blasco-Fontecilla, N Duan, and Enrique Baca-García
- Subjects
Data collection ,business.industry ,Mental Disorders ,As is ,Electronic medical record ,Contrast (statistics) ,Biology ,computer.software_genre ,Machine learning ,Models, Biological ,Pipeline (software) ,Cellular and Molecular Neuroscience ,Psychiatry and Mental health ,Artificial Intelligence ,Data Mining ,Humans ,Data farming ,Artificial intelligence ,Data mining ,business ,Molecular Biology ,computer ,Pace - Abstract
Strategies for generating knowledge in medicine have included observation of associations in clinical or research settings and, more recently, development of pathophysiological models based on molecular biology. Although critically important, they limit hypothesis generation to an incremental pace. Machine learning and data mining are alternative approaches to identifying new vistas to pursue, as is already evident in the literature. In concert with these analytic strategies, novel approaches to data collection can enhance the hypothesis pipeline as well. In data farming, data are obtained in an 'organic' way, in the sense that they are entered by patients themselves and are available for harvesting. In contrast, in evidence farming (EF), it is the provider who enters medical data about individual patients. EF differs from regular electronic medical record systems because frontline providers can use it to learn from their own past experience. In addition to the possibility of generating large databases with farming approaches, it is likely that we can further harness the power of large data sets collected using either farming or more standard techniques through implementation of data-mining and machine-learning strategies. Exploiting large databases to develop new hypotheses regarding neurobiological and genetic underpinnings of psychiatric illness is useful in itself, but also affords the opportunity to identify novel mechanisms to be targeted in drug discovery and development.
- Published
- 2012
- Full Text
- View/download PDF
110. When should case-only designs be used for safety monitoring of medical products?
- Author
-
Malcolm Maclure, Bruce Fireman, Jennifer C. Nelson, Wei Hua, Azadeh Shoaibi, David Madigan, and Antonio Paredes
- Subjects
Selection bias ,Epidemiology ,business.industry ,media_common.quotation_subject ,Crossover ,Confounding ,MEDLINE ,Crossover study ,Risk analysis (engineering) ,Statistics ,Cohort ,Medicine ,Pharmacology (medical) ,Observational study ,business ,Safety monitoring ,media_common - Abstract
Purpose To assess case-only designs for surveillance with administrative databases. Methods We reviewed literature on two designs that are observational analogs to crossover experiments: the self-controlled case series (SCCS) and the case-crossover (CCO) design. Results SCCS views the 'experiment' prospectively, comparing outcome risks in windows with different exposures. CCO retrospectively compares exposure frequencies in case and control windows. The main strength of case-only designs is that they entail self-controlled analyses that eliminate confounding and selection bias by time-invariant characteristics not recorded in healthcare databases. They also protect privacy and are computationally efficient, as they require fewer subjects and variables. They are better than cohort designs for investigating transient effects of accurately recorded preventive agents, for example, vaccines. They are problematic if timing of self-administration is sporadic and dissociated from dispensing times, for example, analgesics. They tend to have less exposure misclassification bias and time-varying confounding if exposures are brief. Standard SCCS designs are bidirectional (using time both before and after the first exposure event), so they are more susceptible than CCOs to reverse-causality bias, including immortal-time bias. This is true also for sequence symmetry analysis, a simplified SCCS. Unidirectional CCOs use only time before the outcome, so they are less affected by reverse causality but susceptible to exposure-trend bias. Modifications of SCCS and CCO partially deal with these biases. The head-to-head comparison of multiple products helps to control residual biases. Conclusion The case-only analyses of intermittent users complement the cohort analyses of prolonged users because their different biases compensate for one another. Key words: methods; safety monitoring; self-controlled designs; crossover. Pharmacoepidemiologists who monitor the safety of medical products using healthcare administrative databases are increasingly interested to know when case-only designs can or cannot be used. To address this question, we (i) defined case-only designs in relation to each other; (ii) examined their main strength: self-controlled comparisons; (iii) discussed the major difference among the designs: directionality; (iv) described the range of medical products assessed with these designs in relation to their susceptibility to exposure misclassification; and (v) made recommendations to safety surveillance programs.
- Published
- 2012
- Full Text
- View/download PDF
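The unidirectional case-crossover contrast described in entry 110 above compares exposure in a hazard window just before the outcome with exposure in an earlier control window in the same person. With one control window per case, the matched-pair odds ratio reduces to the ratio of discordant counts, as in this sketch with invented exposure indicators.

```python
# Case-crossover with one control window per case: matched-pair odds ratio
# from discordant pairs.
cases = [
    # (exposed_in_hazard_window, exposed_in_control_window)
    (1, 0), (1, 0), (0, 0), (1, 1), (0, 1), (1, 0), (0, 0), (1, 0), (1, 1), (0, 1),
]

exposed_only_hazard = sum(1 for h, c in cases if h == 1 and c == 0)
exposed_only_control = sum(1 for h, c in cases if h == 0 and c == 1)

odds_ratio = exposed_only_hazard / exposed_only_control
print(f"discordant pairs: {exposed_only_hazard} vs {exposed_only_control}; OR = {odds_ratio:.2f}")
```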
111. Dynamic Logistic Regression and Dynamic Model Averaging for Binary Classification
- Author
-
David Madigan, Adrian E. Raftery, Randall S. Burd, and Tyler H. McCormick
- Subjects
Statistics and Probability ,Computer science ,Bayesian inference ,Machine learning ,computer.software_genre ,Article ,General Biochemistry, Genetics and Molecular Biology ,Pattern Recognition, Automated ,Prevalence ,Appendectomy ,Humans ,Computer Simulation ,Child ,Hidden Markov model ,Data collection ,General Immunology and Microbiology ,Markov chain ,business.industry ,Applied Mathematics ,Regression analysis ,General Medicine ,Appendicitis ,Logistic Models ,Treatment Outcome ,Binary classification ,Posterior predictive distribution ,Dynamic Extension ,Regression Analysis ,Laparoscopy ,Artificial intelligence ,Data mining ,General Agricultural and Biological Sciences ,business ,computer - Abstract
We propose an online binary classification procedure for cases when there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through dynamic model averaging, a dynamic extension of Bayesian model averaging in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each model and we allow the data-generating model to change over time according to a Markov chain. Calibrating a "forgetting" factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm that adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times. We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the 7 years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality.
- Published
- 2011
- Full Text
- View/download PDF
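A simplified sketch of the dynamic model averaging idea in entry 111 above: each candidate model is a logistic regression on a different feature subset, its coefficients are updated with a single stochastic-gradient step per observation (a crude stand-in for the paper's state-space update), and the model probabilities are raised to a "forgetting" exponent before being multiplied by each model's predictive likelihood. The data, candidate models, and tuning constants are all invented.

```python
# Online binary classification with dynamic model averaging and forgetting.
import math, random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

feature_subsets = [[0], [1], [0, 1]]                      # the candidate models
coefs = [[0.0] * (len(s) + 1) for s in feature_subsets]   # intercept + slopes per model
model_prob = [1.0 / len(feature_subsets)] * len(feature_subsets)
alpha, learning_rate = 0.99, 0.1                          # forgetting factor, SGD step size
log_loss = 0.0

def predict(k, x):
    s = feature_subsets[k]
    z = coefs[k][0] + sum(c * x[j] for c, j in zip(coefs[k][1:], s))
    return sigmoid(z)

for t in range(500):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    # the data-generating process drifts: feature 1 matters only in the second half
    true_p = sigmoid(1.5 * x[0] + (2.0 * x[1] if t >= 250 else 0.0))
    y = 1 if random.random() < true_p else 0

    # model-averaged prediction made before y is observed
    p_hat = sum(w * predict(k, x) for k, w in enumerate(model_prob))
    log_loss -= y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat)

    # update model probabilities: apply forgetting, then weight by predictive likelihood
    likes = [predict(k, x) if y == 1 else 1.0 - predict(k, x) for k in range(len(coefs))]
    raw = [(w ** alpha) * like for w, like in zip(model_prob, likes)]
    total = sum(raw)
    model_prob = [r / total for r in raw]

    # one stochastic-gradient step on each candidate model's coefficients
    for k, s in enumerate(feature_subsets):
        err = y - predict(k, x)
        coefs[k][0] += learning_rate * err
        for i, j in enumerate(s):
            coefs[k][i + 1] += learning_rate * err * x[j]

print("final model probabilities:", [round(w, 3) for w in model_prob])
print(f"average log-loss: {log_loss / 500:.3f}")
```

After the drift at t = 250, the probability mass should migrate toward the model that includes both features, which is the behavior the forgetting factor is designed to allow.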
112. Efficient sequential decision-making algorithms for container inspection operations
- Author
-
Fred S. Roberts, Sushil Mittal, and David Madigan
- Subjects
Incremental decision tree ,Mathematical optimization ,Computer science ,Binary decision diagram ,Control (management) ,Decision tree ,Ocean Engineering ,Management Science and Operations Research ,Task (computing) ,Search algorithm ,Modeling and Simulation ,Container (abstract data type) ,Boolean function ,Algorithm - Abstract
Following work of Stroud and Saeger (Proceedings of ISI, Springer Verlag, New York, 2006) and Anand et al. (Proceedings of Computer, Communication and Control Technologies, 2003), we formulate a port-of-entry inspection sequencing task as a problem of finding an optimal binary decision tree for an appropriate Boolean decision function. We report on new algorithms for finding such optimal trees that are more efficient computationally than those presented by Stroud and Saeger and by Anand et al. We achieve these efficiencies through a combination of specific numerical methods for finding optimal thresholds for sensor functions and two novel binary decision tree search algorithms that operate on a space of potentially acceptable binary decision trees. The improvements enable us to analyze substantially larger applications than was previously possible.
- Published
- 2011
- Full Text
- View/download PDF
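A toy sketch of the optimization underlying entry 112 above, restricted to the simplest Boolean decision function: a container is flagged only if every sensor reads positive (a pure conjunction), so sensors can be applied in sequence, stopping at the first negative reading, and the task is to order them to minimize expected inspection cost. The paper's algorithms search over binary decision trees for arbitrary Boolean functions; the sensor names, costs, and probabilities here are invented.

```python
# Brute-force search for the cheapest inspection order under a series
# (conjunctive) decision function.
from itertools import permutations

# (name, unit cost, probability the sensor reads "positive" and inspection continues)
sensors = [("gamma", 1.0, 0.10), ("x-ray", 5.0, 0.30), ("manual", 20.0, 0.05)]

def expected_cost(order):
    cost, p_reach = 0.0, 1.0
    for _, c, p_pos in order:
        cost += p_reach * c          # pay for this sensor only if we got this far
        p_reach *= p_pos             # continue only on a positive reading
    return cost

best = min(permutations(sensors), key=expected_cost)
print("best order:", [name for name, _, _ in best])
print(f"expected cost: {expected_cost(best):.2f}")
```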
113. Improving reproducibility by using high-throughput observational studies with empirical calibration
- Author
-
George Hripcsak, Martijn J. Schuemie, Patrick B. Ryan, David Madigan, and Marc A. Suchard
- Subjects
0301 basic medicine ,medicine ,Computer science ,General Mathematics ,Best practice ,General Physics and Astronomy ,Machine learning ,computer.software_genre ,03 medical and health sciences ,Consistency (database systems) ,0302 clinical medicine ,030212 general & internal medicine ,Set (psychology) ,reproducibility ,Throughput (business) ,Reliability (statistics) ,publication bias ,business.industry ,General Engineering ,Articles ,observational research ,030104 developmental biology ,Null (SQL) ,Paradigm shift ,Observational study ,Artificial intelligence ,business ,computer ,Research Article - Abstract
Concerns over reproducibility in science extend to research using existing healthcare data; many observational studies investigating the same topic produce conflicting results, even when using the same data. To address this problem, we propose a paradigm shift. The current paradigm centres on generating one estimate at a time using a unique study design with unknown reliability and publishing (or not) one estimate at a time. The new paradigm advocates for high-throughput observational studies using consistent and standardized methods, allowing evaluation, calibration and unbiased dissemination to generate a more reliable and complete evidence base. We demonstrate this new paradigm by comparing all depression treatments for a set of outcomes, producing 17 718 hazard ratios, each using methodology on par with current best practice. We furthermore include control hypotheses to evaluate and calibrate our evidence generation process. Results show good transitivity and consistency between databases, and agree with four out of the five findings from clinical trials. The distribution of effect size estimates reported in the literature reveals an absence of small or null effects, with a sharp cut-off at p = 0.05. No such phenomena were observed in our results, suggesting more complete and more reliable evidence. This article is part of a discussion meeting issue ‘The growing ubiquity of algorithms in society: implications, impacts and innovations’.
- Published
- 2018
- Full Text
- View/download PDF
114. Challenges and opportunities in high-dimensional choice data analyses
- Author
-
Eric T. Bradlow, Prasad A. Naik, Lynd Bacon, Michel Wedel, David Madigan, Peter Lenk, Alan L. Montgomery, Wagner A. Kamakura, Anand V. Bodapati, and Jeffrey Kreulen
- Subjects
Marketing ,Economics and Econometrics ,Digital marketing ,business.industry ,Computer science ,High dimensional ,Data science ,Common value auction ,The Internet ,Geodemographic segmentation ,Business and International Management ,Dimension (data warehouse) ,business ,Marketing research ,Relationship marketing - Abstract
Modern businesses routinely capture data on millions of observations across subjects, brand SKUs, time periods, predictor variables, and store locations, thereby generating massive high-dimensional datasets. For example, Netflix has choice data on billions of movies selected, user ratings, and geodemographic characteristics. Similar datasets emerge in retailing with potential use of RFIDs, online auctions (e.g., eBay), social networking sites (e.g., mySpace), product reviews (e.g., ePinion), customer relationship marketing, internet commerce, and mobile marketing. We envision massive databases as four-way VAST matrix arrays of Variables × Alternatives × Subjects × Time where at least one dimension is very large. Predictive choice modeling of such massive databases poses novel computational and modeling issues, and the failure of academic research to address them will result in a disconnect from marketing practice and an impoverishment of marketing theory. To address these issues, we discuss and identify the challenges and opportunities for both practicing and academic marketers. Thus, we offer an impetus for advancing research in this nascent area and for fostering collaboration across scientific disciplines to improve the practice of marketing in information-rich environments.
- Published
- 2008
- Full Text
- View/download PDF
115. Bayesian Logistic Injury Severity Score: A Method for Predicting Mortality Using International Classification of Disease-9 Codes
- Author
-
David Madigan, Ming Ouyang, and Randall S. Burd
- Subjects
Adult ,Male ,Databases, Factual ,Calibration (statistics) ,Bayesian probability ,Poison control ,Injury Severity Score ,International Classification of Diseases ,Predictive Value of Tests ,Injury prevention ,Statistics ,Humans ,Medicine ,Receiver operating characteristic ,business.industry ,Bayes Theorem ,General Medicine ,Middle Aged ,Regression ,Survival Rate ,Logistic Models ,Predictive value of tests ,Emergency Medicine ,Wounds and Injuries ,Female ,business - Abstract
OBJECTIVES: Owing to the large number of injury International Classification of Disease-9 revision (ICD-9) codes, it is not feasible to use standard regression methods to estimate the independent risk of death for each injury code. Bayesian logistic regression is a method that can select among a large number of predictors without loss of model performance. The purpose of this study was to develop a model for predicting in-hospital trauma deaths based on this method and to compare its performance with the ICD-9-based Injury Severity Score (ICISS). METHODS: The authors used Bayesian logistic regression to train and test models for predicting mortality based on injury ICD-9 codes (2,210 codes) and injury codes with two-way interactions (243,037 codes and interactions) using data from the National Trauma Data Bank (NTDB). They evaluated discrimination using the area under the receiver operating characteristic curve (AUC) and calibration with the Hosmer-Lemeshow (HL) h-statistic. The authors compared performance of these models with one developed using ICISS. RESULTS: The discrimination of a model developed using individual ICD-9 codes was similar to that of a model developed using individual codes and their interactions (AUC = 0.888 vs. 0.892). Inclusion of injury interactions, however, improved model calibration (HL h-statistic = 2,737 vs. 1,347). A model based on ICISS had similar discrimination (AUC = .855) but showed worse calibration (HL h-statistic = 45,237) than those based on regression. CONCLUSIONS: A model that incorporates injury interactions had better predictive performance than one based only on individual injuries. A regression approach to predicting injury mortality based on injury ICD-9 codes yields models with better predictive performance than ICISS.
- Published
- 2008
- Full Text
- View/download PDF
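A sketch of the modeling approach in entry 115 above using scikit-learn: an L1-penalized logistic regression on binary injury-code indicators (the L1 penalty standing in for the sparsity-inducing prior of the Bayesian logistic regression used in the paper), with discrimination summarized by the AUC. The indicator matrix and outcome are simulated; a real analysis would build one column per ICD-9 injury code, and optionally one per code pair for interactions.

```python
# Sparse logistic regression on simulated injury-code indicators.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients, n_codes = 5000, 200
X = (rng.random((n_patients, n_codes)) < 0.03).astype(float)   # sparse code indicators
true_beta = np.zeros(n_codes)
true_beta[:10] = rng.normal(1.5, 0.5, 10)                      # 10 codes truly raise mortality risk
logit = -3.0 + X @ true_beta
y = rng.random(n_patients) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5, max_iter=1000)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"nonzero coefficients: {(model.coef_ != 0).sum()} of {n_codes}; test AUC = {auc:.3f}")
```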
116. Detecting adverse drug reactions following long-term exposure in longitudinal observational data: The exposure-adjusted self-controlled case series
- Author
-
Gianluca Trifirò, Patrick B. Ryan, Martijn J. Schuemie, Preciosa M. Coloma, and David Madigan
- Subjects
Statistics and Probability ,Drug ,medicine.medical_specialty ,Databases, Factual ,Drug-Related Side Effects and Adverse Reactions ,Epidemiology ,media_common.quotation_subject ,Adverse drug reactions ,Self-controlled case series ,Receiver operator characteristics curve ,computer.software_genre ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,Health Information Management ,Health care ,medicine ,Adverse drug reactions, Claims databases, Methods analysis, Receiver operator characteristics curve, Self-controlled case series ,Humans ,Longitudinal Studies ,030212 general & internal medicine ,0101 mathematics ,Intensive care medicine ,Set (psychology) ,Adverse effect ,Drug Labeling ,media_common ,Safety monitoring ,Insurance Claim Reporting ,Series (stratigraphy) ,Insurance, Health ,business.industry ,Methods analysis ,Term (time) ,Observational Studies as Topic ,ROC Curve ,Research Design ,Observational study ,Data mining ,Claims databases ,business ,computer - Abstract
Most approaches used in postmarketing drug safety monitoring, including spontaneous reporting and statistical risk identification using electronic health care records, are primarily suited to detecting only acute adverse drug effects. With the availability of increasingly larger electronic health record and administrative claims databases comes the opportunity to monitor for potential adverse effects that occur only after prolonged exposure to a drug, but analysis methods are lacking. We propose an adaptation of the self-controlled case series design that uses the notion of accumulated exposure to capture long-term effects of drugs, and we evaluate extensions to correct for age and recurrent events. Several variations of the approach are tested on simulated data and two large insurance claims databases. To evaluate performance, a set of positive and negative control drug–event pairs was created by medical experts based on drug product labels and review of the literature. Performance on the real data was measured using the area under the receiver operating characteristic curve (AUC). The best performing method achieved an AUC of 0.86 in the largest database using a spline model, adjustment for age, and ignoring recurrent events, but it appears this performance can only be achieved with very large data sets.
- Published
- 2016
117. Data Mining in Pharmacovigilance
- Author
-
Donald J. O’Hara, Manfred Hauben, David Madigan, Alan M. Hochberg, and Stephanie J. Reisinger
- Subjects
Pharmacology ,business.industry ,Pharmacovigilance ,Medicine ,business ,Data science - Published
- 2007
- Full Text
- View/download PDF
118. Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model
- Author
-
David Madigan, Cynthia Rudin, Benjamin Letham, and Tyler H. McCormick
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,Multivariate statistics ,Computer science ,Feature vector ,Bayesian probability ,Posterior probability ,Bayesian analysis ,Machine Learning (stat.ML) ,02 engineering and technology ,Decision list ,Machine learning ,computer.software_genre ,Statistics - Applications ,Machine Learning (cs.LG) ,Statistics - Machine Learning ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Applications (stat.AP) ,Interpretability ,Structure (mathematical logic) ,business.industry ,Generative model ,Computer Science - Learning ,classification ,Modeling and Simulation ,020201 artificial intelligence & image processing ,Artificial intelligence ,Statistics, Probability and Uncertainty ,business ,interpretability ,computer - Abstract
We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS2 score, actively used in clinical practice for estimating the risk of stroke in patients who have atrial fibrillation. Our model is as interpretable as CHADS2, but more accurate. (Published in the Annals of Applied Statistics, http://dx.doi.org/10.1214/15-AOAS848.)
- Published
- 2015
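To make concrete the kind of model entry 118 above produces, here is a sketch of a decision list: an ordered set of if-then rules over binary patient attributes, evaluated top to bottom, each rule carrying a risk estimate. The rules, attribute names, and probabilities below are invented for illustration; they are not the fitted list from the paper.

```python
# Evaluating a decision list of the form produced by Bayesian Rule Lists.
decision_list = [
    ({"hemiplegia", "age>60"}, 0.58),
    ({"cerebrovascular_disorder"}, 0.47),
    ({"transient_ischaemic_attack"}, 0.24),
    ({"age>75"}, 0.10),
    (set(), 0.04),                       # default rule: applies if nothing above matched
]

def predicted_risk(patient_attributes):
    for conditions, risk in decision_list:
        if conditions <= patient_attributes:   # all conditions of the rule are present
            return risk
    return decision_list[-1][1]

patient = {"age>60", "hypertension", "cerebrovascular_disorder"}
print(f"predicted stroke risk: {predicted_risk(patient):.2f}")
```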
119. Causal Inference for Meta-Analysis and Multi-Level Data Structures, with Application to Randomized Studies of Vioxx
- Author
-
Wei Wang, Michael E. Sobel, and David Madigan
- Subjects
Male ,Psychometrics ,Randomized experiment ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,chemistry.chemical_compound ,Lactones ,0302 clinical medicine ,Meta-Analysis as Topic ,Risk Factors ,Econometrics ,Humans ,030212 general & internal medicine ,Sulfones ,0101 mathematics ,General Psychology ,Randomized Controlled Trials as Topic ,Nonsteroidal ,Cyclooxygenase 2 Inhibitors ,Applied Mathematics ,Individual participant data ,Multi level data ,chemistry ,Cardiovascular Diseases ,Meta-analysis ,Causal inference ,Psychology - Abstract
We construct a framework for meta-analysis and other multi-level data structures that codifies the sources of heterogeneity between studies or settings in treatment effects and examines their implications for analyses. The key idea is to consider, for each of the treatments under investigation, the subject's potential outcome in each study or setting were he to receive that treatment. We consider four sources of heterogeneity: (1) response inconsistency, whereby a subject's response to a given treatment would vary across different studies or settings, (2) the grouping of nonequivalent treatments, where two or more treatments are grouped and treated as a single treatment under the incorrect assumption that a subject's responses to the different treatments would be identical, (3) nonignorable treatment assignment, and (4) response-related variability in the composition of subjects in different studies or settings. We then examine how these sources affect heterogeneity/homogeneity of conditional and unconditional treatment effects. To illustrate the utility of our approach, we re-analyze individual participant data from 29 randomized placebo-controlled studies on the cardiovascular risk of Vioxx, a Cox-2 selective nonsteroidal anti-inflammatory drug approved by the FDA in 1999 for the management of pain and withdrawn from the market in 2004.
- Published
- 2015
120. Probabilistic Temporal Reasoning.
- Author
-
Steve Hanks and David Madigan
- Published
- 2005
- Full Text
- View/download PDF
121. Response to Comment on ‘Empirical assessment of methods for risk identification in healthcare data’
- Author
-
J. Marc Overhage, Paul E. Stang, Judith A. Racoosin, Patrick B. Ryan, Abraham G. Hartzema, and David Madigan
- Subjects
Statistics and Probability ,Empirical assessment ,Actuarial science ,Drug-Related Side Effects and Adverse Reactions ,Epidemiology ,Computer science ,Pharmacoepidemiology ,Product Surveillance, Postmarketing ,Risk identification ,Econometrics ,Electronic Health Records ,Humans ,Healthcare data - Published
- 2013
- Full Text
- View/download PDF
122. Effective directed tests for models with ordered categorical data
- Author
-
David Madigan, Harold B. Sackrowitz, and Arthur Cohen
- Subjects
Statistics and Probability ,Contingency table ,Multivariate statistics ,Pearson's chi-squared test ,computer.software_genre ,Data modeling ,symbols.namesake ,Sample size determination ,Econometrics ,symbols ,Nuisance parameter ,Data mining ,Statistics, Probability and Uncertainty ,Focus (optics) ,Categorical variable ,computer ,Mathematics - Abstract
Summary This paper offers a new method for testing one-sided hypotheses in discrete multivariate data models. One-sided alternatives mean that there are restrictions on the multidimensional parameter space. The focus is on models dealing with ordered categorical data. In particular, applications are concerned with R×C contingency tables. The method has advantages over other general approaches. All tests are exact in the sense that no large sample theory or large sample distribution theory is required. Testing is unconditional although its execution is done conditionally, section by section, where a section is determined by marginal totals. This eliminates any potential nuisance parameter issues. The power of the tests is more robust than the power of the typical linear tests often recommended. Furthermore, computer programs are available to carry out the tests efficiently regardless of the sample sizes or the order of the contingency tables. Both censored data and uncensored data models are discussed.
- Published
- 2003
- Full Text
- View/download PDF
123. [Untitled]
- Author
-
Greg Ridgeway and David Madigan
- Subjects
Computer Networks and Communications ,Computer science ,Bayesian probability ,Posterior probability ,Markov chain Monte Carlo ,Mixture model ,Bayesian inference ,computer.software_genre ,Computer Science Applications ,Reduction (complexity) ,symbols.namesake ,symbols ,Data mining ,Particle filter ,computer ,Importance sampling ,Information Systems - Abstract
Markov chain Monte Carlo (MCMC) techniques revolutionized statistical practice in the 1990s by providing an essential toolkit for making the rigor and flexibility of Bayesian analysis computationally practical. At the same time the increasing prevalence of massive datasets and the expansion of the field of data mining has created the need for statistically sound methods that scale to these large problems. Except for the most trivial examples, current MCMC methods require a complete scan of the dataset for each iteration eliminating their candidacy as feasible data mining techniques. In this article we present a method for making Bayesian analysis of massive datasets computationally feasible. The algorithm simulates from a posterior distribution that conditions on a smaller, more manageable portion of the dataset. The remainder of the dataset may be incorporated by reweighting the initial draws using importance sampling. Computation of the importance weights requires a single scan of the remaining observations. While importance sampling increases efficiency in data access, it comes at the expense of estimation efficiency. A simple modification, based on the “rejuvenation” step used in particle filters for dynamic systems models, sidesteps the loss of efficiency with only a slight increase in the number of data accesses. To show proof-of-concept, we demonstrate the method on two examples. The first is a mixture of transition models that has been used to model web traffic and robotics. For this example we show that estimation efficiency is not affected while offering a 99% reduction in data accesses. The second example applies the method to Bayesian logistic regression and yields a 98% reduction in data accesses.
- Published
- 2003
- Full Text
- View/download PDF
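A minimal sketch of the strategy in entry 123 above: simulate from a posterior that conditions on a manageable subset of the data, then reweight those draws by the likelihood of the remaining observations, which requires only one pass over the remainder. A conjugate Beta-Bernoulli model is used here so the subset posterior can be sampled directly; the paper applies the same idea to mixtures of transition models and to Bayesian logistic regression.

```python
# Subset posterior sampling followed by importance reweighting over the
# remainder of a large dataset.
import math, random

random.seed(0)

data = [1 if random.random() < 0.3 else 0 for _ in range(100_000)]
subset, remainder = data[:1_000], data[1_000:]

# Posterior for theta given the subset, under a Beta(1, 1) prior.
a = 1 + sum(subset)
b = 1 + len(subset) - sum(subset)
draws = [random.betavariate(a, b) for _ in range(2_000)]

# Importance weights: likelihood of the remaining data under each draw,
# computed on the log scale in a single scan of `remainder`.
successes, failures = sum(remainder), len(remainder) - sum(remainder)
log_w = [successes * math.log(t) + failures * math.log(1 - t) for t in draws]
m = max(log_w)
w = [math.exp(lw - m) for lw in log_w]
total = sum(w)

posterior_mean = sum(wi * t for wi, t in zip(w, draws)) / total
ess = total ** 2 / sum(wi ** 2 for wi in w)
print(f"reweighted posterior mean: {posterior_mean:.4f}; effective sample size: {ess:.0f}")
```

The effective sample size degrades when the remainder dominates the subset, which is exactly the loss of estimation efficiency the paper's particle-filter-style rejuvenation step is designed to mitigate.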
124. Birth month affects lifetime disease risk: a phenome-wide method
- Author
-
David Madigan, Nicholas P. Tatonetti, Mary Regina Boland, Zachary Shahn, and George Hripcsak
- Subjects
Adult ,Male ,Risk ,Pediatrics ,medicine.medical_specialty ,Adolescent ,Health Informatics ,Disease ,Logistic regression ,Research and Applications ,Young Adult ,Pregnancy ,maternal exposure ,medicine ,Attention deficit hyperactivity disorder ,Data Mining ,Electronic Health Records ,Humans ,Young adult ,embryonic and fetal development ,Aged ,Aged, 80 and over ,seasons ,business.industry ,Incidence (epidemiology) ,Incidence ,Birth Month ,personalized medicine ,Middle Aged ,medicine.disease ,3. Good health ,cardiovascular diseases ,Logistic Models ,Prenatal Exposure Delayed Effects ,prenatal nutritional physiological phenomena ,Population study ,Female ,business ,Algorithms ,Demography - Abstract
Objective An individual’s birth month has a significant impact on the diseases they develop during their lifetime. Previous studies reveal relationships between birth month and several diseases including atherothrombosis, asthma, attention deficit hyperactivity disorder, and myopia, leaving most diseases completely unexplored. This retrospective population study systematically explores the relationship between seasonal effects at birth and lifetime disease risk for 1688 conditions. Methods We developed a hypothesis-free method that minimizes publication and disease selection biases by systematically investigating disease-birth month patterns across all conditions. Our dataset includes 1 749 400 individuals with records at New York-Presbyterian/Columbia University Medical Center born between 1900 and 2000 inclusive. We modeled associations between birth month and 1688 diseases using logistic regression. Significance was tested using a chi-squared test with multiplicity correction. Results We found 55 diseases that were significantly dependent on birth month. Of these, 19 were previously reported in the literature (P < .001). Conclusions Lifetime disease risk is affected by birth month. Seasonally dependent early developmental mechanisms may play a role in increasing lifetime risk of disease.
- Published
- 2015
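A sketch of the hypothesis-free screen in entry 124 above: for each condition, test whether disease status depends on birth month, then correct for the number of conditions tested. A chi-squared test on a 2 x 12 table stands in for the paper's logistic-regression modeling, the counts are simulated, and one condition is given an artificial seasonal pattern so the screen has something to find.

```python
# Phenome-wide style screen: one birth-month test per condition, with a
# Bonferroni-style multiplicity correction.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_conditions = 50
births_per_month = rng.integers(140_000, 150_000, size=12)

results = []
for c in range(n_conditions):
    base_rate = 0.002
    if c == 0:
        # condition 0 gets a genuine birth-month effect; the rest are null
        month_rates = base_rate * (1 + 0.15 * np.sin(np.arange(12) / 12 * 2 * np.pi))
    else:
        month_rates = np.full(12, base_rate)
    cases = rng.binomial(births_per_month, month_rates)
    table = np.vstack([cases, births_per_month - cases])   # 2 x 12: cases vs non-cases by month
    chi2, p, _, _ = chi2_contingency(table)
    results.append((c, p))

alpha = 0.05 / n_conditions
significant = [c for c, p in results if p < alpha]
print("conditions with a significant birth-month dependence:", significant)
```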
125. Bayesian Variable and Transformation Selection in Linear Regression
- Author
-
Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting
- Subjects
Statistics and Probability ,Proper linear model ,Bayesian probability ,Posterior probability ,Markov chain Monte Carlo ,symbols.namesake ,Transformation (function) ,Linear regression ,Econometrics ,symbols ,Discrete Mathematics and Combinatorics ,Applied mathematics ,Sensitivity analysis ,Statistics, Probability and Uncertainty ,Selection (genetic algorithm) ,Mathematics - Abstract
This article suggests a method for variable and transformation selection based on posterior probabilities. Our approach allows for consideration of all possible combinations of untransformed and transformed predictors along with transformed and untransformed versions of the response. To transform the predictors in the model, we use a change-point model, or “change-point transformation,” which can yield more interpretable models and transformations than the standard Box–Tidwell approach. We also address the problem of model uncertainty in the selection of models. By averaging over models, we account for the uncertainty inherent in inference based on a single model chosen from the set of models under consideration. We use a Markov chain Monte Carlo model composition (MC3) method which allows us to average over linear regression models when the space of models under consideration is very large. This considers the selection of variables and transformations at the same time. In an example, we show that model averaging over both variables and transformations can improve predictive performance relative to inference based on any single selected model.
- Published
- 2002
- Full Text
- View/download PDF
126. [Untitled]
- Author
-
William DuMouchel, Christian Posse, Nandini Raghavan, Martha Nason, David Madigan, and Greg Ridgeway
- Subjects
Computer Networks and Communications ,Computer science ,Statistical model ,ComputerSystemsOrganization_PROCESSORARCHITECTURES ,Lossy compression ,computer.software_genre ,Computer Science Applications ,Statistical analyses ,Statistical analysis ,Data mining ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,Hardware_REGISTER-TRANSFER-LEVELIMPLEMENTATION ,computer ,Information Systems ,Data compression - Abstract
Squashing is a lossy data compression technique that preserves statistical information. Specifically, squashing compresses a massive dataset to a much smaller one so that outputs from statistical analyses carried out on the smaller (squashed) dataset reproduce outputs from the same statistical analyses carried out on the original dataset. Likelihood-based data squashing (LDS) differs from a previously published squashing algorithm insofar as it uses a statistical model to squash the data. The results show that LDS provides excellent squashing performance even when the target statistical analysis departs from the model used to squash the data.
- Published
- 2002
- Full Text
- View/download PDF
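A rough sketch of the squashing idea in entry 126 above: replace a large dataset with a much smaller weighted pseudo-dataset and check that a downstream analysis gives a similar answer. Here the pseudo-points are k-means centroids built within each outcome class and weighted by cluster size; the paper's likelihood-based squashing instead constructs pseudo-points and weights to match the likelihood directly, so this is only an analog, and the data are simulated.

```python
# Squash 50,000 rows into 100 weighted pseudo-points and refit a logistic model.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 50_000, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, -0.5, 0.0, 0.8, 0.0])
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ beta)))).astype(int)

squashed_X, squashed_y, squashed_w = [], [], []
for label in (0, 1):
    Xc = X[y == label]
    km = KMeans(n_clusters=50, n_init=5, random_state=0).fit(Xc)
    counts = np.bincount(km.labels_, minlength=50)
    squashed_X.append(km.cluster_centers_)
    squashed_y.append(np.full(50, label))
    squashed_w.append(counts)
squashed_X = np.vstack(squashed_X)
squashed_y = np.concatenate(squashed_y)
squashed_w = np.concatenate(squashed_w).astype(float)

full_fit = LogisticRegression(max_iter=1000).fit(X, y)
squash_fit = LogisticRegression(max_iter=1000).fit(squashed_X, squashed_y, sample_weight=squashed_w)
print("full-data coefficients:    ", np.round(full_fit.coef_[0], 2))
print("squashed-data coefficients:", np.round(squash_fit.coef_[0], 2))
```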
127. Book Reviews
- Author
-
David Madigan
- Subjects
Computational Mathematics ,Applied Mathematics ,Library science ,Sociology ,Theoretical Computer Science - Published
- 2002
- Full Text
- View/download PDF
128. Commentary: What Can We Really Learn From Observational Studies?
- Author
-
David Madigan and Patrick B. Ryan
- Subjects
Empirical assessment ,Safety surveillance ,Medical education ,Empirical research ,Epidemiology ,business.industry ,Comparative effectiveness research ,MEDLINE ,Medicine ,Observational study ,business - Published
- 2011
- Full Text
- View/download PDF
129. Optimizing the leveraging of real-world data to improve the development and use of medicines
- Author
-
Marc L. Berger, Craig Lipset, David Madigan, Alex Gutteridge, Kirsten Axelsen, and Prasun Subedi
- Subjects
HRHIS ,Comparative Effectiveness Research ,Knowledge management ,real-world data ,business.industry ,Computer science ,Information Dissemination ,Health Policy ,Big data ,Comparative effectiveness research ,Public Health, Environmental and Occupational Health ,Health technology ,Research Personnel ,Data sharing ,data access ,Data access ,health research ,Pharmaceutical Preparations ,big data ,Health care ,Humans ,business ,Delivery of Health Care ,Health policy - Abstract
Health research, including health outcomes and comparative effectiveness research, is on the cusp of a golden era of access to digitized real-world data, catalyzed by the adoption of electronic health records and the integration of clinical and biological information with other data. This era promises more robust insights into what works in health care. Several barriers, however, will need to be addressed if the full potential of these new data is to be realized; doing so will involve both policy solutions and stakeholder cooperation. Although a number of these issues have been widely discussed, we focus on the one we believe is the most important—the facilitation of greater openness among public and private stakeholders toward collaboration, connecting information, and data sharing, with the goal of making robust and complete data accessible to all researchers. In this way, we can better understand the consequences of health care delivery, improve the effectiveness and efficiency of health care systems, and develop advancements in health technologies. Early real-world data initiatives illustrate both the potential and the need for future progress, as well as the essential role of collaboration and data sharing. Health policies critical to progress will include those that promote open source data standards, expand access to the data, increase data capture and connectivity, and facilitate communication of findings.
- Published
- 2014
130. Development and Evaluation of Infrastructure and Analytic Methods for Systematic Drug Safety Surveillance: Lessons and Resources from the Observational Medical Outcomes Partnership
- Author
-
Abraham G. Hartzema, Christian G. Reich, Emily Welebob, Patrick B. Ryan, J. Marc Overhage, Paul E. Stang, Thomas Scarnecchia, and David Madigan
- Subjects
Safety surveillance ,Knowledge management ,business.industry ,Management science ,General partnership ,Simulated data ,Claims data ,Medicine ,Observational study ,business - Published
- 2014
- Full Text
- View/download PDF
131. Alternative Markov Properties for Chain Graphs
- Author
-
Steen A. Andersson, Michael D. Perlman, and David Madigan
- Subjects
Statistics and Probability ,Combinatorics ,Continuous-time Markov chain ,Markov kernel ,Markov chain ,Variable-order Markov model ,Additive Markov chain ,Markov property ,Graph theory ,Statistics, Probability and Uncertainty ,Markov model ,Mathematics - Abstract
Graphical Markov models use graphs to represent possible dependences among statistical variables. Lauritzen, Wermuth, and Frydenberg (LWF) introduced a Markov property for chain graphs (CG): graphs that can be used to represent both structural and associative dependences simultaneously and that include both undirected graphs (UG) and acyclic directed graphs (ADG) as special cases. Here an alternative Markov property (AMP) for CGs is introduced and shown to be the Markov property satisfied by a block-recursive linear system with multivariate normal errors. This model can be decomposed into a collection of conditional normal models, each of which combines the features of multivariate linear regression models and covariance selection models, facilitating the estimation of its parameters. In the general case, necessary and sufficient conditions are given for the equivalence of the LWF and AMP Markov properties of a CG, for the AMP Markov equivalence of two CGs, for the AMP Markov equivalence of a CG to some ADG or decomposable UG, and for other equivalences. For CGs, in some ways the AMP property is a more direct extension of the ADG Markov property than is the LWF property.
- Published
- 2001
- Full Text
- View/download PDF
132. Analgesia for Colposcopy
- Author
-
David Madigan, Lynn M. Oliver, Lili Church, Allan Ellsworth, and Sharon A. Dobie
- Subjects
Adult ,Visual analogue scale ,Benzocaine ,Analgesic ,Administration, Oral ,Ibuprofen ,Placebo ,law.invention ,Randomized controlled trial ,Double-Blind Method ,law ,medicine ,Humans ,Local anesthesia ,Pain Measurement ,Colposcopy ,medicine.diagnostic_test ,business.industry ,Obstetrics and Gynecology ,Anesthesia ,Female ,business ,Gels ,medicine.drug - Abstract
Objective: To evaluate pain relief effectiveness of oral ibuprofen and topical benzocaine gel during colposcopy. Methods: In a double-masked, randomized controlled trial, women who attended a family medicine colposcopy clinic received one of four treatments: 800 mg of oral ibuprofen, 20% topical benzocaine, both, or placebos. Using visual analog scales, women recorded their pain after speculum placement, endocervical curettage (ECC), and cervical biopsy. Participants were 18–55 years old, spoke English, and were not taking other pain or psychotropic medications. Demographic and historical information was collected from each participant. Results: Ninety-nine subjects participated. Twenty-five received oral ibuprofen and topical benzocaine (median pain scores on a 10-point scale for speculum placement, ECC, and biopsy were 0.75, 3.00, and 3.38, respectively), 24 received oral placebo and topical benzocaine (1.00, 3.75, and 2.63), 24 received oral ibuprofen and topical placebo (0.63, 3.75, and 2.25), and 26 received oral and topical placebos (0.75, 3.50, and 3.00). There were no statistically significant differences in patient visual analog pain scale scores across the four groups (statistical power: ECC = 0.74, cervical biopsy = 0.62). Younger women and women who had pain with speculum placement were more likely to have increased pain during ECC. Increased pain during biopsy was associated with a history of severe dysmenorrhea but no other demographic or historical factors. Women overall reported ECC and biopsy to be mildly painful, with median scores of 3.5 for ECC and 2.75 for biopsy on a 10-point scale. The range in pain scores was large, with some women reporting severe pain (ECC: minimum = 0.25, maximum = 10.0; biopsy: minimum = 0.0, maximum = 9.0). Conclusion: Colposcopy is perceived as somewhat painful, but oral ibuprofen and topical benzocaine gel, alone or together, provided no advantage over placebo in decreasing colposcopy pain.
- Published
- 2001
- Full Text
- View/download PDF
133. Studies on the Effects of Common Process Variables on the Colloidal Stability of Beer
- Author
- Clare Mcenroe, Dympna Harmey, Sara L. Matthews, David Madigan, Roger J. Kelly, and Henry Byrne
- Subjects
0106 biological sciences ,Haze ,Sorbent ,Chromatography ,Polyvinylpolypyrrolidone ,Chemistry ,food and beverages ,04 agricultural and veterinary sciences ,Shelf life ,040401 food science ,01 natural sciences ,Applied Microbiology and Biotechnology ,law.invention ,Warehouse ,Colloid ,chemistry.chemical_compound ,0404 agricultural biotechnology ,law ,010608 biotechnology ,Scientific method ,Filtration ,Food Science ,Biotechnology - Abstract
The effect of simple changes in process variables on the colloidal stability of beer was examined. Lager beers were stabilized using silica hydrogel (SHG) at 25-125 g/hl or polyvinylpolypyrrolidone (PVPP) at 10-50 g/hl. The effect of storage time (up to six days) and storage temperature (0 and 4°C), before stabilization and filtration, on the resultant colloidal stability of the beers was examined. The results were used to optimize the stabilization process with respect to sorbent dosage rate and beer storage time. Haze precursors were measured using the standard rapid assays: tannoids, total polyphenols, total flavanols, sensitive proteins, and alcohol-chill haze. Colloidal shelf life was measured as the time taken to reach a total haze of 2 EBC units, expressed in weeks at 37°C or in months (of 28 days) at 18°C. Colloidal shelf life was directly proportional to sorbent dosage rate for both SHG and PVPP. The rapid assays were poor predictors of beer shelf life but were effective monitors of the stabilization process. Of the storage times examined, the optimum for lager beer before filtration was three days. Increasing the storage temperature from 0 to 4°C could be compensated for, in terms of measured colloidal shelf life, by an increased stabilizer dosage rate.
- Published
- 2000
- Full Text
- View/download PDF
134. Furanic Aldehyde Analysis by HPLC as a Method to Determine Heat-Induced Flavor Damage to Beer
- Author
- Michael Clements, Adela Perez, and David Madigan
- Subjects
0106 biological sciences ,chemistry.chemical_classification ,Heat induced ,Chromatography ,Chemistry ,Organoleptic ,04 agricultural and veterinary sciences ,Furfural ,040401 food science ,01 natural sciences ,Applied Microbiology and Biotechnology ,Aldehyde ,High-performance liquid chromatography ,Warehouse ,chemistry.chemical_compound ,0404 agricultural biotechnology ,010608 biotechnology ,Volatile organic compound ,Flavor ,Food Science ,Biotechnology - Abstract
The temperature at which packaged beer is stored in transit from the brewery to the consumer may profoundly influence the flavor of the product upon consumption. High-temperature storage usually re...
- Published
- 1998
- Full Text
- View/download PDF
135. Bayesian Model Averaging in Proportional Hazard Models: Assessing the Risk of a Stroke
- Author
- David Madigan, Richard A. Kronmal, Adrian E. Raftery, and Chris Volinsky
- Subjects
Statistics and Probability ,Variable (computer science) ,Proportional hazards model ,Posterior probability ,Statistics ,Econometrics ,Context (language use) ,Feature selection ,Statistics, Probability and Uncertainty ,Bayesian inference ,Selection (genetic algorithm) ,Standard model (cryptography) ,Mathematics - Abstract
SUMMARY In the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for strokes, we apply Bayesian model averaging to the selection of variables in Cox proportional hazard models. We use an extension of the leaps-and-bounds algorithm for locating the models that are to be averaged over and make available S-PLUS software to implement the methods. Bayesian model averaging provides a posterior probability that each variable belongs in the model, a more directly interpretable measure of variable importance than a P-value. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable and do not account for model uncertainty. We introduce the partial predictive score to evaluate predictive performance. For the Cardiovascular Health Study, Bayesian model averaging predictively outperforms standard model selection and does a better job of assessing who is at high risk for a stroke.
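The bookkeeping behind the model averaging can be sketched independently of any particular survival fitter. Assuming each candidate Cox model has already been scored (the BIC values below are hypothetical; the paper uses a leaps-and-bounds search and S-PLUS software), approximate posterior model weights and per-variable posterior inclusion probabilities follow directly:

```python
import math

# Hypothetical candidate models (variable subsets) and their BIC scores.
# In the paper the candidates come from a leaps-and-bounds search over Cox
# proportional hazard fits; the numbers below are invented for illustration.
models = {
    ("age", "sbp"):            1012.4,
    ("age", "sbp", "smoker"):  1010.1,
    ("age",):                  1018.9,
    ("age", "smoker"):         1013.7,
}

# Approximate posterior model probabilities: w_k proportional to exp(-BIC_k / 2).
best = min(models.values())
weights = {m: math.exp(-(bic - best) / 2) for m, bic in models.items()}
total = sum(weights.values())
post_model = {m: w / total for m, w in weights.items()}

# Posterior inclusion probability of each variable: sum of the weights of the
# models containing it, the quantity the abstract contrasts with a P-value.
variables = {v for m in models for v in m}
post_incl = {v: sum(p for m, p in post_model.items() if v in m) for v in variables}

for v, p in sorted(post_incl.items(), key=lambda kv: -kv[1]):
    print(f"P({v} in model | data) ~= {p:.2f}")
```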
- Published
- 1997
- Full Text
- View/download PDF
136. Evaluation of Rapid Colloidal Stabilization with Polyvinylpolypyrrolidone (PVPP)
- Author
- Ian McMurrough, Roger J. Kelly, and David Madigan
- Subjects
0106 biological sciences ,Chromatography ,Polyvinylpolypyrrolidone ,Chemistry ,food and beverages ,macromolecular substances ,04 agricultural and veterinary sciences ,Laboratory scale ,040401 food science ,01 natural sciences ,Applied Microbiology and Biotechnology ,law.invention ,chemistry.chemical_compound ,Colloid ,0404 agricultural biotechnology ,law ,010608 biotechnology ,Filtration ,Food Science ,Biotechnology - Abstract
Several colorimetric, chromatographic, and nephelometric tests were compared as tools for monitoring the rapid (
- Published
- 1997
- Full Text
- View/download PDF
137. Bayesian Model Averaging for Linear Regression Models
- Author
- David Madigan, Adrian E. Raftery, and Jennifer A. Hoeting
- Subjects
Statistics and Probability ,Markov chain ,Linear model ,Bayes factor ,Regression analysis ,Markov chain Monte Carlo ,Bayesian inference ,symbols.namesake ,Linear regression ,Econometrics ,symbols ,Applied mathematics ,Sensitivity analysis ,Statistics, Probability and Uncertainty ,Mathematics - Abstract
We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of interest. This approach is often not practical. In this article we offer two alternative approaches. First, we describe an ad hoc procedure, “Occam's window,” which indicates a small set of models over which a model average can be computed. Second, we describe a Markov chain Monte Carlo approach that directly approximates the exact solution. In the presence of model uncertainty, both of these model averaging procedures provide better predictive performance than any single model that might reasonably have been selected. In the extreme case where there are many candidate predictors but ...
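A minimal sketch of the idea on a toy linear-regression problem: enumerate all predictor subsets, approximate each model's posterior probability from its BIC, apply an Occam's-window style cutoff, and average. The synthetic data, the 1/20 cutoff, and the BIC approximation are illustrative assumptions rather than the paper's exact prior specification.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, names = 200, ["x1", "x2", "x3", "x4"]
X = rng.normal(size=(n, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)   # x2, x4 irrelevant

def bic(cols):
    """BIC of the Gaussian linear model using the given predictor columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

subsets = [c for r in range(5) for c in itertools.combinations(range(4), r)]
scores = {c: bic(c) for c in subsets}

# Posterior model probabilities via the BIC approximation.
best = min(scores.values())
w = {c: np.exp(-(s - best) / 2) for c, s in scores.items()}

# Occam's window: discard models much less probable than the best one.
cutoff = 1 / 20
w = {c: v for c, v in w.items() if v / max(w.values()) > cutoff}
total = sum(w.values())

for j, name in enumerate(names):
    p = sum(v for c, v in w.items() if j in c) / total
    print(f"P({name} in model | data) ~= {p:.2f}")
```

Trimming the sum to the Occam's-window models is what keeps the averaging cheap when the predictor set grows.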
- Published
- 1997
- Full Text
- View/download PDF
138. On the Markov Equivalence of Chain Graphs, Undirected Graphs, and Acyclic Digraphs
- Author
- Steen A. Andersson, Michael D. Perlman, and David Madigan
- Subjects
Discrete mathematics ,Statistics and Probability ,Markov chain ,Directed graph ,Computer Science::Computational Geometry ,Markov model ,Directed acyclic graph ,Combinatorics ,Indifference graph ,Chordal graph ,Graphical model ,Statistics, Probability and Uncertainty ,Random variable ,Mathematics - Abstract
Graphical Markov models use undirected graphs (UDGs), acyclic directed graphs (ADGs), or (mixed) chain graphs to represent possible dependencies among random variables in a multivariate distribution. Whereas a UDG is uniquely determined by its associated Markov model, this is not true for ADGs or for general chain graphs (which include both UDGs and ADGs as special cases). This paper addresses three questions regarding the equivalence of graphical Markov models: when is a given chain graph Markov equivalent (1) to some UDG? (2) to some (at least one) ADG? (3) to some decomposable UDG? The answers are obtained by means of an extension of Frydenberg’s (1990) elegant graph-theoretic characterization of the Markov equivalence of chain graphs.
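For the purely directed special case, Markov equivalence of two ADGs reduces to a simple operational test: identical skeletons and identical immoralities (unshielded colliders). A small hand-coded sketch of that check is shown below; the chain-graph criterion treated in the paper is more general, and the example graphs are invented.

```python
def skeleton(dag):
    """Undirected edge set of a DAG given as {node: set(parents)}."""
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}

def immoralities(dag):
    """Unshielded colliders a -> c <- b with a and b non-adjacent."""
    skel = skeleton(dag)
    vs = set()
    for c, ps in dag.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    vs.add((frozenset((a, b)), c))
    return vs

def markov_equivalent(d1, d2):
    return skeleton(d1) == skeleton(d2) and immoralities(d1) == immoralities(d2)

# a -> b -> c  versus  a <- b <- c : same skeleton, no immoralities -> equivalent.
g1 = {"a": set(), "b": {"a"}, "c": {"b"}}
g2 = {"a": {"b"}, "b": {"c"}, "c": set()}
# a -> b <- c : an immorality, hence not equivalent to the chains above.
g3 = {"a": set(), "b": {"a", "c"}, "c": set()}

print(markov_equivalent(g1, g2), markov_equivalent(g1, g3))   # True False
```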
- Published
- 1997
- Full Text
- View/download PDF
139. Bayesian methods for estimation of the size of a closed population
- Author
- Jeremy C. York and David Madigan
- Subjects
Statistics and Probability ,Estimation ,education.field_of_study ,Applied Mathematics ,General Mathematics ,Bayesian probability ,Population ,Interval (mathematics) ,Agricultural and Biological Sciences (miscellaneous) ,Bayesian statistics ,Frequentist inference ,Statistics ,Covariate ,Econometrics ,Statistics, Probability and Uncertainty ,General Agricultural and Biological Sciences ,education ,Bayesian average ,Mathematics - Abstract
SUMMARY A Bayesian methodology for estimating the size of a closed population from multiple incomplete administrative lists is proposed. The approach allows for a variety of dependence structures between the lists, can make use of covariates, and explicitly accounts for model uncertainty. Interval estimates from this approach are compared to frequentist and previously published Bayesian approaches. Several examples are considered.
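A minimal two-list version of the approach can be written in closed form: assuming independent lists, Beta(1,1) priors on the two capture probabilities, and a flat prior on the population size N over a grid, the capture probabilities integrate out analytically. The counts below are invented; the paper's models additionally handle several lists, dependence structures, covariates, and model uncertainty.

```python
from math import lgamma, exp

# Hypothetical two-list counts: on both lists, list 1 only, list 2 only.
n11, n10, n01 = 60, 40, 30
n_obs = n11 + n10 + n01

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_post(N):
    """Log posterior (up to a constant) for population size N, flat prior on N."""
    if N < n_obs:
        return float("-inf")
    n1, n2 = n11 + n10, n11 + n01          # totals caught by each list
    log_comb = (lgamma(N + 1) - lgamma(n11 + 1) - lgamma(n10 + 1)
                - lgamma(n01 + 1) - lgamma(N - n_obs + 1))
    # Beta(1,1) priors on the capture probabilities integrate out analytically.
    return log_comb + log_beta(n1 + 1, N - n1 + 1) + log_beta(n2 + 1, N - n2 + 1)

grid = range(n_obs, 1000)
lp = [log_post(N) for N in grid]
m = max(lp)
post = [exp(v - m) for v in lp]
norm = sum(post)
mean_N = sum(N * p for N, p in zip(grid, post)) / norm
print(f"posterior mean of N ~= {mean_N:.0f}")
```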
- Published
- 1997
- Full Text
- View/download PDF
140. [Untitled]
- Author
- Christopher M. Triggs, David Madigan, Michael D. Perlman, and Steen A. Andersson
- Subjects
Combinatorics ,Transitive relation ,Markov chain ,Conditional independence ,Artificial Intelligence ,Applied Mathematics ,Linear regression ,Multivariate normal distribution ,Seemingly unrelated regressions ,Markov model ,Missing data ,Mathematics - Abstract
Lattice conditional independence (LCI) models for multivariate normal data recently have been introduced for the analysis of non-monotone missing data patterns and of non-nested dependent linear regression models (≡ seemingly unrelated regressions). It is shown here that the class of LCI models coincides with a subclass of the class of graphical Markov models determined by acyclic digraphs (ADGs), namely, the subclass of transitive ADG models. An explicit graph-theoretic characterization of those ADGs that are Markov equivalent to some transitive ADG is obtained. This characterization allows one to determine whether a specific ADG D is Markov equivalent to some transitive ADG, hence to some LCI model, in polynomial time, without an exhaustive search of the (possibly superexponentially large) equivalence class [D]. These results do not require the existence or positivity of joint densities.
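Transitivity of an ADG, the property that ties it to an LCI model, is straightforward to check directly: every directed path of length two must be closed by an edge. A tiny illustrative check on hand-coded example graphs:

```python
def is_transitive(dag):
    """dag maps each node to the set of its children; check a->b->c implies a->c."""
    return all(c in dag[a]
               for a in dag
               for b in dag[a]
               for c in dag[b])

chain    = {"a": {"b"}, "b": {"c"}, "c": set()}          # a->b->c but no a->c
complete = {"a": {"b", "c"}, "b": {"c"}, "c": set()}     # a->b->c and a->c

print(is_transitive(chain), is_transitive(complete))     # False True
```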
- Published
- 1997
- Full Text
- View/download PDF
141. [Untitled]
- Author
- Clark Glymour, Daryl Pregibon, Padhraic Smyth, and David Madigan
- Subjects
Computer Networks and Communications ,Computer science ,Interface (computing) ,Data mining ,Variance (accounting) ,computer.software_genre ,computer ,Data science ,Field (computer science) ,Computer Science Applications ,Information Systems - Abstract
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
- Published
- 1997
- Full Text
- View/download PDF
142. Empirical performance of LGPS and LEOPARD: lessons for developing a risk identification and analysis system
- Author
- Patrick B. Ryan, David Madigan, Martijn J. Schuemie, and Medical Informatics
- Subjects
Shrinkage estimator ,Databases, Factual ,Drug-Related Side Effects and Adverse Reactions ,Coverage probability ,Toxicology ,Machine learning ,computer.software_genre ,Poisson distribution ,030226 pharmacology & pharmacy ,Risk Assessment ,03 medical and health sciences ,symbols.namesake ,0302 clinical medicine ,Bias ,biology.animal ,Medicine ,Humans ,Pharmacology (medical) ,030212 general & internal medicine ,Healthcare data ,Probability ,Retrospective Studies ,Pharmacology ,biology ,business.industry ,Active monitoring ,Risk identification ,Leopard ,3. Good health ,Research Design ,Area Under Curve ,symbols ,Observational study ,Artificial intelligence ,business ,computer - Abstract
Background The availability of large-scale observational healthcare data allows for the active monitoring of safety of drugs, but research is needed to determine which statistical methods are best suited for this task. Recently, the Longitudinal Gamma Poisson Shrinker (LGPS) and Longitudinal Evaluation of Observational Profiles of Adverse events Related to Drugs (LEOPARD) methods were developed specifically for this task. LGPS applies Bayesian shrinkage to an estimated incidence rate ratio, and LEOPARD aims to detect and discard associations due to protopathic bias. The operating characteristics of these methods still need to be determined. Objective Establish the operating characteristics of LGPS and LEOPARD for large scale observational analysis in drug safety. Research Design We empirically evaluated LGPS and LEOPARD in five real observational healthcare databases and six simulated datasets. We retrospectively studied the predictive accuracy of the methods when applied to a collection of 165 positive control and 234 negative control drug-outcome pairs across four outcomes: acute liver injury, acute myocardial infarction, acute kidney injury, and upper gastrointestinal bleeding. Results In contrast to earlier findings, we found that LGPS and LEOPARD provide weak discrimination between positive and negative controls, although the use of LEOPARD does lead to higher performance in this respect. Furthermore, the methods produce biased estimates and confidence intervals that have poor coverage properties. Conclusions For the four outcomes we examined, LGPS and LEOPARD may not be the designs of choice for risk identification.
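The shrinkage step can be illustrated with a deliberately simplified Gamma-Poisson calculation: treat the observed number of events during exposure as Poisson with mean equal to the expected events times the rate ratio, and place a single Gamma prior on the rate ratio. The real LGPS fits its prior empirically (and LEOPARD adds a separate protopathic-bias filter), so the prior parameters and counts below are assumptions for illustration only.

```python
# Simplified Gamma-Poisson shrinkage of an incidence rate ratio (IRR).
# observed ~ Poisson(expected * IRR), IRR ~ Gamma(alpha, beta)  =>
# IRR | data ~ Gamma(alpha + observed, beta + expected).
alpha, beta = 1.0, 1.0          # weakly informative prior centred near IRR = 1
observed, expected = 12, 4.5    # hypothetical observed and expected event counts

raw_irr = observed / expected
shrunk_irr = (alpha + observed) / (beta + expected)   # posterior mean

print(f"raw IRR    = {raw_irr:.2f}")
print(f"shrunk IRR = {shrunk_irr:.2f}")   # pulled towards 1 when data are sparse
```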
- Published
- 2013
143. Large-Scale Parametric Survival Analysis†
- Author
- David Madigan, Sushil Mittal, Jerry Q. Cheng, and Randall S. Burd
- Subjects
Statistics and Probability ,Adolescent ,Epidemiology ,Computer science ,Calibration (statistics) ,Computation ,Scale (descriptive set theory) ,Breast Neoplasms ,Overfitting ,Machine learning ,computer.software_genre ,Article ,Data acquisition ,Humans ,Child ,Parametric statistics ,Models, Statistical ,business.industry ,Middle Aged ,Survival Analysis ,Range (mathematics) ,Child, Preschool ,Data Interpretation, Statistical ,Parametric model ,Wounds and Injuries ,Female ,Artificial intelligence ,business ,computer - Abstract
Survival analysis has been a topic of active statistical research in the past few decades with applications spread across several areas. Traditional applications usually consider data with only a small number of predictors and a few hundred or a few thousand observations. Recent advances in data acquisition techniques and computation power have led to considerable interest in analyzing very-high-dimensional data where the number of predictor variables and the number of observations range between 10^4 and 10^6. In this paper, we present a tool for performing large-scale regularized parametric survival analysis using a variant of the cyclic coordinate descent method. Through our experiments on two real data sets, we show that application of regularized models to high-dimensional data avoids overfitting and can provide improved predictive performance and calibration over corresponding low-dimensional models.
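The flavour of the algorithm can be conveyed with a small sketch: a ridge-penalized exponential proportional-hazards model fitted by cyclic coordinate-wise Newton updates. This is a toy dense-data version on invented synthetic data, not the paper's implementation, which targets sparse, very-high-dimensional problems and other parametric families.

```python
import numpy as np

def fit_exponential_ph(X, time, event, penalty=1.0, sweeps=50):
    """Ridge-penalized exponential PH model: hazard_i = exp(x_i . beta).

    Maximizes sum_i [event_i * x_i.beta - time_i * exp(x_i.beta)] - penalty * ||beta||^2
    by cycling through coordinates and taking one Newton step per coordinate.
    """
    n, p = X.shape
    beta = np.zeros(p)
    eta = X @ beta
    for _ in range(sweeps):
        for j in range(p):
            mu = time * np.exp(eta)                      # expected event counts
            grad = X[:, j] @ (event - mu) - 2 * penalty * beta[j]
            hess = -(X[:, j] ** 2) @ mu - 2 * penalty
            step = -grad / hess
            beta[j] += step
            eta += X[:, j] * step                        # keep linear predictor in sync
    return beta

# Synthetic check: two real effects, three noise predictors.
rng = np.random.default_rng(2)
n, p = 2000, 5
X = rng.normal(size=(n, p))
true_beta = np.array([0.7, -0.7, 0.0, 0.0, 0.0])
rate = np.exp(X @ true_beta)
time = rng.exponential(1 / rate)
censor = rng.exponential(2.0, size=n)
event = (time <= censor).astype(float)
time = np.minimum(time, censor)

print(np.round(fit_exponential_ph(X, time, event, penalty=0.1), 2))
```

Updating the linear predictor in place after each coordinate keeps a full sweep linear in the data size, which is the kind of per-sweep cost that makes coordinate-wise updates attractive at scale.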
- Published
- 2013
144. CONTROL OF FERULIC ACID AND 4-VINYL GUAIACOL IN BREWING
- Author
- Gerard P. Hennigan, Ian McMurrough, Niall McNulty, June Hurley, Dan Donnelly, Malcolm R. Smyth, Ann-Marie Doyle, and David Madigan
- Subjects
Chromatography ,biology ,Decarboxylation ,business.industry ,food and beverages ,Phenolic acid ,biology.organism_classification ,Saccharomyces ,Yeast ,Ferulic acid ,chemistry.chemical_compound ,chemistry ,Brewing ,Guaiacol ,Phenols ,business ,Food Science - Abstract
Phenolic acids in beer are important because they can be decarboxylated to phenols, which usually impart off-flavours. An improved high performance liquid chromatographic system was used to monitor phenolic acids and phenols during the brewing process. Ferulic acid was the most significant phenolic acid found in beers prepared from malted barley. Extraction of ferulic acid from malt involved an enzymatic release mechanism with an optimum temperature of about 45°C. Mashing-in at 65°C significantly decreased the release of free ferulic acid into the wort. Wort boiling produced 4-vinyl guaiacol by thermal decarboxylation, in amounts (0.3 mg/L) close to its taste threshold, from worts that contained high contents of free ferulic acid (> 6 mg/L). The capacity of yeasts to decarboxylate phenolic acids (Pof+ phenotype) was strong in wild strains of Saccharomyces and absent in all lager brewing yeasts and most ale brewing yeasts. Some top-fermenting strains, especially those used in wheat beer production, possessed a weak decarboxylating activity (i.e., Pof±). During storage of beers there were appreciable temperature-dependent losses of 4-vinyl guaiacol. These results indicated that the production of 4-vinyl guaiacol is amenable to close technological control.
- Published
- 1996
- Full Text
- View/download PDF
145. A method for simultaneous variable selection and outlier identification in linear regression
- Author
- David Madigan, Jennifer A. Hoeting, and Adrian E. Raftery
- Subjects
Statistics and Probability ,Markov chain ,business.industry ,Applied Mathematics ,Posterior probability ,Prediction interval ,Pattern recognition ,Markov chain Monte Carlo ,Feature selection ,Bayesian inference ,Computational Mathematics ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,Linear regression ,Outlier ,symbols ,Artificial intelligence ,business ,Mathematics - Abstract
We suggest a method for simultaneous variable selection and outlier identification based on the computation of posterior model probabilities. This avoids the problem that the model you select depends upon the order in which variable selection and outlier identification are carried out. Our method can find multiple outliers and appears to be successful in identifying masked outliers. We also address the problem of model uncertainty via Bayesian model averaging. For problems where the number of models is large, we suggest a Markov chain Monte Carlo approach to approximate the Bayesian model average over the space of all possible variables and outliers under consideration. Software for implementing this approach is described. In an example, we show that model averaging via simultaneous variable selection and outlier identification improves predictive performance and provides more accurate prediction intervals as compared to any single model that might reasonably be selected.
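For tiny problems the joint search can be done by brute force, which makes the idea concrete: each candidate model is a pair (predictor subset, outlier subset), outliers enter as mean-shift dummy variables, and BIC-based weights yield posterior probabilities both for variables and for individual points being outliers. The data, the candidate-outlier screen, and the BIC approximation are illustrative assumptions; the paper uses proper priors and Markov chain Monte Carlo when the model space is large.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 40
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)
y[5] += 6.0                                  # plant one gross outlier

def bic(var_idx, out_idx):
    cols = [np.ones(n)] + [X[:, j] for j in var_idx]
    cols += [(np.arange(n) == i).astype(float) for i in out_idx]   # mean-shift dummies
    Z = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

candidate_outliers = range(8)                # pretend a screening step flagged these
models = [(v, o)
          for v in [(), (0,), (1,), (0, 1)]
          for r in range(3)
          for o in itertools.combinations(candidate_outliers, r)]
scores = {m: bic(*m) for m in models}
best = min(scores.values())
w = {m: np.exp(-(s - best) / 2) for m, s in scores.items()}
total = sum(w.values())

p_x1 = sum(v for (vs, _), v in w.items() if 0 in vs) / total
p_out5 = sum(v for (_, os), v in w.items() if 5 in os) / total
print(f"P(x1 in model) ~= {p_x1:.2f},  P(obs 5 is an outlier) ~= {p_out5:.2f}")
```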
- Published
- 1996
- Full Text
- View/download PDF
146. The Role of Flavanoid Polyphenols in Beer Stability
- Author
- Malcolm R. Smyth, David Madigan, Roger J. Kelly, and Ian McMurrough
- Subjects
0106 biological sciences ,Polyvinylpolypyrrolidone ,Organoleptic ,Flavour ,04 agricultural and veterinary sciences ,040401 food science ,01 natural sciences ,Applied Microbiology and Biotechnology ,chemistry.chemical_compound ,0404 agricultural biotechnology ,chemistry ,Polyphenol ,010608 biotechnology ,Amide ,Organic chemistry ,Chemical composition ,Flavor ,Food Science ,Biotechnology - Abstract
Treatment of different beers with polyvinylpolypyrrolidone (PVPP) at 100 g/hl was shown to remove polyphenols and simultaneously to diminish the endogenous reducing capacities by 9-38%. Polyphenols...
- Published
- 1996
- Full Text
- View/download PDF
147. Bayesian model averaging and model selection for markov equivalence classes of acyclic digraphs
- Author
- Chris Volinsky, Steen A. Andersson, Michael D. Perlman, and David Madigan
- Subjects
Statistics and Probability ,Combinatorics ,symbols.namesake ,Markov chain ,Model selection ,symbols ,Bayesian network ,Graph (abstract data type) ,Markov chain Monte Carlo ,Statistical model ,Bayesian inference ,Equivalence class ,Mathematics - Abstract
Acyclic digraphs (ADGs) are widely used to describe dependences among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. There may, however, be many ADGs that determine the same dependence (= Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markov-equivalence classes, each class being associated with a unique statistical model. Statistical procedures, such as model selection or model averaging, that fail to take into account these equivalence classes, may incur substantial computational or other inefficiencies. Recent results have shown that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself Markov-equivalent simultaneously to all ADGs in the equivalence class. Here we propose t...
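For very small graphs the essential graph can be built by brute force, which makes the definition concrete: enumerate every acyclic orientation of the skeleton, keep those Markov equivalent to the starting ADG (same skeleton and same immoralities), and direct an edge in the essential graph only when all class members orient it the same way. The toy graph below is invented and the enumeration is exponential in the number of edges, so this is purely illustrative rather than the paper's construction.

```python
import itertools

def skeleton(edges):
    """Undirected edge set of a digraph given as a set of (parent, child) pairs."""
    return {frozenset(e) for e in edges}

def immoralities(edges):
    """Unshielded colliders x -> c <- y with x and y non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    return {(frozenset((x, y)), c)
            for c, ps in parents.items()
            for x in ps for y in ps
            if x < y and frozenset((x, y)) not in skel}

def acyclic(edges, nodes):
    children = {v: {b for a, b in edges if a == v} for v in nodes}
    seen, done = set(), set()
    def dfs(v):
        if v in done:
            return True
        if v in seen:
            return False          # back edge -> cycle
        seen.add(v)
        ok = all(dfs(w) for w in children[v])
        done.add(v)
        return ok
    return all(dfs(v) for v in nodes)

# Toy ADG given as directed edges (a, b) meaning a -> b.
dag = {("a", "b"), ("b", "c"), ("a", "c")}
nodes = {v for e in dag for v in e}
skel = sorted(tuple(sorted(e)) for e in skeleton(dag))

# Enumerate all acyclic orientations of the skeleton; keep the Markov-equivalent ones.
equiv_class = []
for choice in itertools.product([0, 1], repeat=len(skel)):
    edges = {(u, v) if keep else (v, u) for (u, v), keep in zip(skel, choice)}
    if acyclic(edges, nodes) and immoralities(edges) == immoralities(dag):
        equiv_class.append(edges)

# Essential graph: an edge is directed iff every member of the class agrees on it.
for u, v in skel:
    if all((u, v) in m for m in equiv_class):
        print(f"{u} -> {v}")
    elif all((v, u) in m for m in equiv_class):
        print(f"{v} -> {u}")
    else:
        print(f"{u} -- {v}")
```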
- Published
- 1996
- Full Text
- View/download PDF
148. Semipreparative Chromatographic Procedure for the Isolation of Dimeric and Trimeric Proanthocyanidins from Barley
- Author
- Ian McMurrough, David Madigan, and Malcolm R. Smyth
- Subjects
chemistry.chemical_compound ,Chromatography ,chemistry ,Elution ,Size-exclusion chromatography ,Catechin ,General Chemistry ,Procyanidin B3 ,Reversed-phase chromatography ,General Agricultural and Biological Sciences ,Procyanidin C2 ,High-performance liquid chromatography ,Prodelphinidin B3 - Abstract
A semipreparative chromatographic method for the isolation of small amounts (10−20 μg) of dimeric and trimeric proanthocyanidins from barley is described. Concentrated extracts of barley were injected onto a high-performance gel filtration column (Superdex 75 HR) and were eluted with methanol. This procedure resolved the dimeric proanthocyanidins (prodelphinidin B3 and procyanidin B3), as well as the trimeric procyanidin C2 and three other trimeric prodelphinidins. The separated flavanoid peaks were collected and their contents were estimated by UV spectrophotometry, reaction with p-dimethylaminocinnamaldehyde, and reversed-phase HPLC with electrochemical detection. This method produced proanthocyanidins in sufficient amounts to calibrate a system for direct injection chromatographic analysis of beers and barley extracts. The method described may be optimized for the isolation of dimeric proanthocyanidins only, in which case the preparation can take as little as 3 h; alternatively, by extending the chroma...
- Published
- 1996
- Full Text
- View/download PDF
149. Application of Gradient Ion Chromatography with Pulsed Electrochemical Detection to the Analysis of Carbohydrates in Brewing
- Author
- Ian McMurrough, Malcolm R. Smyth, and David Madigan
- Subjects
0106 biological sciences ,Chromatography ,Chemistry ,business.industry ,Ion chromatography ,food and beverages ,04 agricultural and veterinary sciences ,Electrochemical detection ,040401 food science ,01 natural sciences ,Applied Microbiology and Biotechnology ,High-performance liquid chromatography ,Amperometry ,Ion ,0404 agricultural biotechnology ,010608 biotechnology ,Brewing ,Separation method ,Fermentation ,business ,Food Science ,Biotechnology - Abstract
A rapid ion chromatographic method for the determination of carbohydrates in samples collected at all stages of beer production is described. Carbohydrates were separated by direct-injection anion ...
- Published
- 1996
- Full Text
- View/download PDF
150. Bayesian methods for design and analysis of safety trials
- Author
- Mani Lakshminarayanan, John Scott, David H. Manner, David Madigan, Laura Thompson, James D. Stamey, H. Amy Xia, and Karen L. Price
- Subjects
Statistics and Probability ,Research design ,medicine.medical_specialty ,Drug-Related Side Effects and Adverse Reactions ,Bayesian probability ,Alternative medicine ,computer.software_genre ,Risk Assessment ,Food and drug administration ,Bayes' theorem ,Meta-Analysis as Topic ,medicine ,Humans ,Pharmacology (medical) ,Pharmacology ,Clinical Trials as Topic ,business.industry ,Bayes Theorem ,Risk analysis (engineering) ,Medical product ,Research Design ,Sample Size ,Data mining ,business ,Risk assessment ,computer - Abstract
Safety assessment is essential throughout medical product development. There has been increased awareness of the importance of safety trials recently, in part due to US Food and Drug Administration guidance related to thorough assessment of cardiovascular risk in the treatment of type 2 diabetes. Bayesian methods provide great promise for improving the conduct of safety trials. In this paper, the safety subteam of the Drug Information Association Bayesian Scientific Working Group evaluates challenges associated with current methods for designing and analyzing safety trials and provides an overview of several suggested Bayesian opportunities that may increase the efficiency of safety trials, along with relevant case examples.
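One of the simplest Bayesian tools for safety assessment is a posterior probability statement about an excess adverse-event rate, for example from independent Beta-Binomial models for the treatment and control arms. The counts, priors, and thresholds below are hypothetical and chosen only to illustrate the calculation; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical adverse-event counts: events / patients in each arm.
events_trt, n_trt = 18, 400
events_ctl, n_ctl = 10, 400

# Independent Beta(1, 1) priors give Beta posteriors for each arm's AE rate.
rate_trt = rng.beta(1 + events_trt, 1 + n_trt - events_trt, size=100_000)
rate_ctl = rng.beta(1 + events_ctl, 1 + n_ctl - events_ctl, size=100_000)

risk_diff = rate_trt - rate_ctl
print(f"P(treatment AE rate exceeds control)     = {np.mean(risk_diff > 0):.3f}")
print(f"P(risk difference > 1 percentage point)  = {np.mean(risk_diff > 0.01):.3f}")
```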
- Published
- 2013