72 results on '"REGRESSION-MODELS"'
Search Results
2. Celebrating 40 years of panel data analysis
- Author
-
Tom Wansbeek, Vasilis Sarafidis, and Research programme EEF
- Subjects
Economics and Econometrics ,Temporal effects ,History ,Multi-dimensional data ,Common factor models ,BIAS REDUCTION ,Panel data analysis ,TESTING SLOPE HOMOGENEITY ,Library science ,TIME-SERIES ,REGRESSION-MODELS ,Unobserved heterogeneity ,Multi-level data ,HETEROGENEITY ,SPECIFICATION ,Aggregation bias ,Nonlinear models ,Cross-sectional dependence ,IDENTIFICATION ,Applied Mathematics ,Multi level data ,Bias reduction ,Dynamic models ,Econometric and statistical methods ,LINEAR-MODELS ,Dynamic relationships ,Incidental parameter problem ,Omitted variables ,Econometrics not elsewhere classified ,DYNAMIC-MODELS ,CROSS-SECTION ,Multi dimensional data ,Panel data
- Abstract
The present special issue features a collection of papers presented at the 2017 International Panel Data Conference, hosted by the University of Macedonia in Thessaloniki, Greece. The conference marked the 40th anniversary of the inaugural International Panel Data Conference, which was held in 1977 at INSEE in Paris, under the auspices of the French National Centre for Scientific Research. As a collection, the papers appearing in this special issue of the Journal of Econometrics continue to advance the analysis of panel data, and paint a state-of-the-art picture of the field. (c) 2020 Elsevier B.V. All rights reserved.
- Published
- 2021
3. Body size and weight change over adulthood and risk of breast cancer by menopausal and hormone receptor status
- Author
-
Julie E. Buring, Rashmi Sinha, Molin Wang, Gary G. Goodman, Kim Robien, Niclas Håkansson, Alpa V. Patel, M. Elena Martinez, Yu Chen, Shoichiro Tsugane, Walter C. Willett, Anne Zeleniuch-Jacquotte, Anthony B. Miller, Kala Visvanathan, Loic Le Marchand, Claudia Agnoli, Regina G. Ziegler, Thomas E. Rohan, Vittorio Krogh, Kami K. White, Alicja Wolk, Jeanine M. Genkinger, Stephanie A. Smith-Warner, Piet A. van den Brandt, Lauren R. Teras, Leslie Bernstein, Ruifeng Li, Hans-Olov Adami, Marian L. Neuhouser, Gretchen L. Gierach, Graham G. Giles, Norie Sawada, Tao Hou, I-Min Lee, A. Heather Eliassen, Linda M. Liao, Avonne E. Connor, Leo J. Schouten, Roger L. Milne, Elisabete Weiderpass, Rachael Z. Stolzenberg-Solomon, Anna E. Prizment, Epidemiologie, RS: GROW - R1 - Prevention, and RS: CAPHRI - R5 - Optimising Patient Care
- Subjects
Adult ,0301 basic medicine ,Epidemiology ,Physiology ,Weight Gain ,Body Mass Index ,03 medical and health sciences ,EARLY-LIFE ,0302 clinical medicine ,Breast cancer ,MASS INDEX ,REGRESSION-MODELS ,Risk Factors ,Humans ,Medicine ,Estrogen receptor ,Mass index ,Prospective Studies ,Prospective cohort study ,METAANALYSIS ,Aged ,business.industry ,Weight change ,PROGESTERONE ,Middle Aged ,Body weight ,medicine.disease ,Body height ,ESTROGEN ,030104 developmental biology ,PHYSICAL-ACTIVITY ,Premenopause ,Receptors, Estrogen ,POSTMENOPAUSAL WOMEN ,030220 oncology & carcinogenesis ,Cohort ,Cohort studies ,GROWTH ,Female ,SEX-HORMONES ,Menopause ,medicine.symptom ,Breast neoplasms ,business ,Weight gain ,Body mass index ,Meta-Analysis ,Cohort study
- Abstract
Associations between anthropometric factors and breast cancer (BC) risk have varied inconsistently by estrogen and/or progesterone receptor (ER/PR) status. Associations between prediagnostic anthropometric factors and risk of premenopausal and postmenopausal BC overall and ER/PR status subtypes were investigated in a pooled analysis of 20 prospective cohorts, including 36,297 BC cases among 1,061,915 women, using multivariable Cox regression analyses, controlling for reproductive factors, diet and other risk factors. We estimated dose–response relationships and tested for nonlinear associations using restricted cubic splines. Height showed positive, linear associations for premenopausal and postmenopausal BC risk (6–7% RR increase per 5 cm increment), with stronger associations for receptor-positive subtypes. Body mass index (BMI) at cohort baseline was strongly inversely associated with premenopausal BC risk, and strongly positively, and nonlinearly, associated with postmenopausal BC (especially among women who never used hormone replacement therapy). This was primarily observed for receptor-positive subtypes. Early adult BMI (at 18–20 years) showed inverse, linear associations for premenopausal and postmenopausal BC risk (21% and 11% RR decrease per 5 kg/m², respectively) with stronger associations for receptor-negative subtypes. Adult weight gain since 18–20 years was positively associated with postmenopausal BC risk, stronger for receptor-positive subtypes, and among women who were leaner in early adulthood. Women heavier in early adulthood generally had reduced premenopausal BC risk, independent of later weight gain. Positive associations between height, baseline (adult) BMI, adult weight gain and postmenopausal BC risk were substantially stronger for hormone receptor-positive versus negative subtypes. Premenopausal BC risk was positively associated with height, but inversely with baseline BMI and weight gain (mostly in receptor-positive subtypes). Inverse associations with early adult BMI seemed stronger in receptor-negative subtypes of premenopausal and postmenopausal BC.
Electronic supplementary material: The online version of this article (10.1007/s10654-020-00688-3) contains supplementary material, which is available to authorized users.
- Published
- 2021
4. Robust estimation of large panels with factor structures
- Author
-
Avarucci, M and Zaffaroni, P
- Subjects
Statistics and Probability ,GLS ,Science & Technology ,Factor structure ,Statistics & Probability ,0104 Statistics ,1603 Demography ,REGRESSION-MODELS ,Physical Sciences ,1403 Econometrics ,Panel ,GROWTH ,INFERENCE ,Statistics, Probability and Uncertainty ,Robustness ,Mathematics ,Weighted least squares estimation ,ERROR
- Abstract
This article studies estimation of linear panel regression models with heterogeneous coefficients using a class of weighted least squares estimators, when both the regressors and the error possibly contain a common latent factor structure. Our theory is robust to the specification of such a factor structure because it does not require any information on the number of factors or estimation of the factor structure itself. Moreover, our theory is efficient, in certain circumstances, because it nests the GLS principle. We first show how our unfeasible weighted-estimator provides a bias-adjusted estimator with the conventional limiting distribution, for situations in which the OLS is affected by a first-order bias. The technical challenge resolved in the article consists of showing how these properties are preserved for the feasible weighted estimator in a double-asymptotics setting. Our theory is illustrated by extensive Monte Carlo experiments and an empirical application that investigates the link between capital accumulation and economic growth in an international setting. Supplementary materials for this article are available online.
- Published
- 2022
5. Analysing point patterns on networks — A review
- Author
-
Adrian Baddeley, Gopalan Nair, Suman Rakshit, Greg McSwiggan, and Tilman Davies
- Abstract
We review recent research on statistical methods for analysing spatial patterns of points on a network of lines, such as road accident locations along a road network. Due to geometrical complexities, the analysis of such data is extremely challenging, and we describe several common methodological errors. The intrinsic lack of homogeneity in a network militates against the traditional methods of spatial statistics based on stationary processes. Topics include kernel density estimation, relative risk estimation, parametric and non-parametric modelling of intensity, second-order analysis using the K-function and pair correlation function, and point process model construction. An important message is that the choice of distance metric on the network is pivotal in the theoretical development and in the analysis of real data. Challenges for statistical computation are discussed and open-source software is provided.
- Published
- 2021
6. The single‐index/Cox mixture cure model
- Author
-
Ingrid Van Keilegom, Catherine Legrand, Maïlis Amico, and UCL - SSH/LIDAM/ISBA - Institut de Statistique, Biostatistique et Sciences Actuarielles
- Subjects
Life Sciences & Biomedicine - Other Topics ,proportional hazards model ,Logistic regression ,01 natural sciences ,survival analysis ,010104 statistics & probability ,REGRESSION-MODELS ,Statistics ,MAXIMUM-LIKELIHOOD ,050205 econometrics ,Event (probability theory) ,Mathematics ,Likelihood Functions ,education.field_of_study ,Applied Mathematics ,05 social sciences ,General Medicine ,3. Good health ,Survival function ,Physical Sciences ,Kernel smoother ,Cure models ,Female ,General Agricultural and Biological Sciences ,Life Sciences & Biomedicine ,Statistics and Probability ,kernel smoothing ,Statistics & Probability ,Population ,Breast Neoplasms ,General Biochemistry, Genetics and Molecular Biology ,0502 economics and business ,Expectation–maximization algorithm ,Covariate ,Humans ,Computer Simulation ,0101 mathematics ,EM algorithm ,education ,Biology ,Proportional Hazards Models ,logistic model ,Science & Technology ,Models, Statistical ,General Immunology and Microbiology ,Proportional hazards model ,Survival Analysis ,Logistic Models ,Mathematical & Computational Biology
- Abstract
In survival analysis, it often happens that a certain fraction of the subjects under study never experience the event of interest, that is, they are considered "cured." In the presence of covariates, a common model for this type of data is the mixture cure model, which assumes that the population consists of two subpopulations, namely the cured and the non-cured ones, and it writes the survival function of the whole population given a set of covariates as a mixture of the survival function of the cured subjects (which equals one), and the survival function of the non-cured ones. In the literature, one usually assumes that the mixing probabilities follow a logistic model. This is, however, a strong modeling assumption, which might not be met in practice. Therefore, in order to have a flexible model which at the same time does not suffer from curse-of-dimensionality problems, we propose in this paper a single-index model for the mixing probabilities. For the survival function of the non-cured subjects we assume a Cox proportional hazards model. We estimate this model using a maximum likelihood approach. We also carry out a simulation study, in which we compare the estimators under the single-index model and under the logistic model for various model settings, and we apply the new model and estimation method on a breast cancer data set.
Published in: Biometrics, vol. 75, issue 2, pages 452-462. Location: United States. Status: published.
- Published
- 2019
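The mixture structure described in the abstract of entry 6 can be written out as a short formula. This is only a sketch in assumed notation: the symbols $p$, $g$, $\gamma$, $\beta$, $S_u$, $S_0$ and the covariate vectors $x$, $z$ are illustrative choices, not taken from the catalog record.

```latex
% Survival function of the whole population: cured subjects have survival 1,
% non-cured subjects have survival S_u; p(x) is the probability of being non-cured.
S(t \mid x, z) = \bigl(1 - p(x)\bigr) \cdot 1 + p(x)\, S_u(t \mid z)

% Usual logistic assumption for the mixing probability:
p(x) = \frac{\exp(\gamma^\top x)}{1 + \exp(\gamma^\top x)}

% Single-index alternative: the link g is left unspecified and estimated from the data.
p(x) = g(\gamma^\top x)

% Cox proportional hazards model for the non-cured subjects, with baseline survival S_0:
S_u(t \mid z) = S_0(t)^{\exp(\beta^\top z)}
```

Under this reading, the logistic model is the special case of the single-index model in which $g$ is fixed to the logistic function.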
7. Inference on zero inflated ordinal models with semiparametric link
- Author
-
Kalyan Das and Ujjwal Das
- Subjects
Statistics and Probability ,Ordinal data ,POISSON MODELS ,GENERALIZED LINEAR-MODELS ,Inference ,Scale (descriptive set theory) ,Zero inflation ,01 natural sciences ,010104 statistics & probability ,REGRESSION-MODELS ,RATIO ,Sieve MLE ,0502 economics and business ,Statistics ,Covariate ,Statistical inference ,Semiparametric regression ,0101 mathematics ,050205 econometrics ,Mathematics ,Applied Mathematics ,05 social sciences ,Estimator ,Semiparametric model ,Computational Mathematics ,COUNT DATA ,Computational Theory and Mathematics ,TESTS ,Knot selection
- Abstract
In socioeconomic or biological studies, individuals are often observed longitudinally on a Likert-type scale with a substantially large proportion of zeros. This leads to a special case of mixture-structured data where extra variation occurs. The standard ordinal data analysis then fails to provide appropriate statistical inference. We propose a suitable zero-inflated semiparametric ordinal model that takes into account the nonlinear link between the ordinal response and a covariate. A sieve maximum likelihood estimator (MLE) is proposed for the regression parameter of interest. We also propose a test for the zero proportion in this semiparametric model. A simulation study has been carried out to investigate the performance of the estimator as well as the test. We illustrate the methodology using data from a survey on tuberculosis patients in and around Kolkata, India. (C) 2018 Elsevier B.V. All rights reserved.
- Published
- 2018
8. Semiparametric transition models
- Author
-
Chao Hui Koo, Pavel Čížek, Econometrics and Operations Research, and Research Group: Econometrics
- Subjects
Economics and Econometrics ,Statistics::Theory ,Local linear estimation ,nonlinear time series ,Statistics::Applications ,Transition function ,semiparametric estimation ,Transition (fiction) ,NONLINEARITIES ,REGRESSION-MODELS ,Statistics::Methodology ,Statistical physics ,regime-switching models ,Time series ,COEFFICIENT ,Mathematics
- Abstract
A new semiparametric time series model is introduced, the semiparametric transition (SETR) model, which generalizes the threshold and smooth transition models by allowing the transition function to be of an unknown form. Estimation is based on a combination of (local) least squares estimation of the transition function and of the regression parameters. The asymptotic behavior of the regression coefficient estimator of the SETR model is established, including its oracle property. Monte Carlo simulations demonstrate that the proposed estimator is more robust to the form of the transition function than parametric threshold and smooth transition methods and more precise than varying coefficient estimators.
- Published
- 2021
9. Experimental investigation and prediction of performance and emission responses of a CI engine fuelled with different metal-oxide based nanoparticles-diesel blends using different machine learning algorithms
- Author
-
Ali Etem Gürel, Suat Sarıdemir, Ümit Ağbulut, and [Belirlenecek]
- Subjects
Vegetable-Oil ,Thermal efficiency ,Materials science ,Support Vector Machine ,020209 energy ,Artificial Neural-Networks ,Combustion ,02 engineering and technology ,Machine learning ,computer.software_genre ,Nanodiesel ,Industrial and Manufacturing Engineering ,Emission ,Diesel fuel ,Brake specific fuel consumption ,chemistry.chemical_compound ,Nanoparticle ,Thermal-Conductivity ,020401 chemical engineering ,Expanded Perlite ,Engine performance ,0202 electrical engineering, electronic engineering, information engineering ,0204 chemical engineering ,Electrical and Electronic Engineering ,NOx ,Civil and Structural Engineering ,business.industry ,Mechanical Engineering ,Exhaust gas ,Building and Construction ,Exhaust Emissions ,Regression-Models ,Pollution ,Global Solar-Radiation ,General Energy ,chemistry ,Nitrogen oxide ,Artificial intelligence ,Biodiesel ,Combustion chamber ,business ,Prediction ,computer ,Algorithm
- Abstract
Deep learning (DL), Artificial Neural Network (ANN), Kernel Nearest Neighbor (k-NN), and Support Vector Machine (SVM) have been applied to numerous fields owing to their high accuracy and ability to analyze non-linear problems. In this study, these machine learning algorithms (MLAs) are used to predict emission and performance characteristics of a CI engine fuelled with various metal-oxide based nanoparticles (Al2O3, CuO, and TiO2) at a mass fraction of 200 ppm. Assessed parameters in the study are carbon monoxide (CO), nitrogen oxide (NOx), exhaust gas temperature (EGT), brake specific fuel consumption (BSFC), and brake thermal efficiency (BTE). To evaluate the success of the algorithms, four metrics (R², RMSE, rRMSE, and MBE) are discussed in detail. Tests were performed at engine speeds varying from 1500 rpm to 3400 rpm at intervals of 100 rpm. The addition of nanoparticles simultaneously reduced CO and NOx emissions because they ensured more complete combustion thanks to their inherent oxygen, higher surface-to-volume ratio, superior thermal conductivity, and catalytic activity. Further, the nano-sized particles ensured an accelerated heat transfer from the combustion chamber. In comparison with neat diesel fuel, the reduction in NOx is found to be 3.28, 7.53, and 10.05%, and the reduction in CO is found to be 8.3, 11.6, and 15.5% for TiO2, Al2O3, and CuO test fuels, respectively. Moreover, the presence of nanoparticles in the test fuels improved engine performance. Compared with neat diesel fuel, the doping of nanoparticles lowers the BSFC value by 5.54, 7.89, and 9.96% for TiO2, CuO, and Al2O3, respectively, and raises the BTE value by 6.15, 8.87, and 11.23% for TiO2, CuO, and Al2O3, respectively. All algorithms gave very satisfying results in the prediction of CI engine responses. All R² values ranged between 0.901 and 0.994, and DL gave the highest R² value for each engine response. In terms of rRMSE, all results (except for one result in k-NN) are categorized as excellent according to the classification in the literature. Considering all metrics together, DL gives the best results in the prediction of engine responses for the dataset used in this paper, closely followed by the ANN, SVM, and k-NN algorithms, respectively. In conclusion, this paper shows that nanoparticle addition for ICEs significantly reduces exhaust pollutants and improves engine performance, and that the results can be successfully predicted with machine learning algorithms. (c) 2020 Elsevier Ltd. All rights reserved.
Funding: This work is funded by the Duzce University Scientific Research Projects Coordination Unit under project numbers 2020.07.04.1097 and 2019.07.04.1049. The authors thank Duzce University for its financial support, and express special thanks to PhD Candidate Melahat Sevgul Bakay from the Biomedical Engineering Department at Duzce University for her valuable suggestions in the prediction stages of the paper. WOS:000596172500006, 2-s2.0-85094937244
- Published
- 2021
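The four evaluation metrics named in the abstract of entry 9 (R², RMSE, rRMSE, and MBE) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `regression_metrics` is hypothetical, rRMSE is assumed to be RMSE divided by the mean of the observed values and expressed in percent, and MBE is assumed to use the sign convention predicted minus observed. The paper may define either differently.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """Return R^2, RMSE, rRMSE (percent) and MBE for paired observations.

    Assumed conventions (not confirmed by the source):
      - rRMSE = 100 * RMSE / mean(observed)
      - MBE   = mean(predicted - observed)
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    residuals = predicted - observed
    rmse = np.sqrt(np.mean(residuals ** 2))

    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rrmse_percent": 100.0 * rmse / observed.mean(),
        "mbe": residuals.mean(),
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the paper.
    print(regression_metrics([2.0, 4.0, 6.0], [3.0, 5.0, 7.0]))
```

A positive MBE under this convention indicates that the model over-predicts on average.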
10. Geographic clusters of objectively measured physical activity and the characteristics of their built environment in a Swiss urban area
- Author
-
Juan R. Vallarta-Robledo, Stéphane Joost, Marco André Vieira Ruas, Cédric Gubelmann, Peter Vollenweider, Pedro Marques-Vidal, and Idris Guessous
- Subjects
Adult ,Male ,obesity ,Spatial Analysis ,Multidisciplinary ,Urban Population ,attributes ,Walking ,Middle Aged ,outcomes ,Cross-Sectional Studies ,inactivity ,Socioeconomic Factors ,Residence Characteristics ,regression-models ,impact ,Humans ,Female ,Healthy Lifestyle ,Longitudinal Studies ,Built Environment ,Aged ,Follow-Up Studies ,Switzerland
- Abstract
Introduction: Evidence suggests that the built environment can influence the intensity of physical activity. However, despite the importance of the geographic context, most studies do not consider the spatial framework of this association. We aimed to assess individual spatial dependence of objectively measured moderate and vigorous physical activity (MVPA) and describe the characteristics of the built environment among spatial clusters of MVPA. Methods: Cross-sectional data from the second follow-up (2014-2017) of CoLaus|PsyCoLaus, a longitudinal population-based study of the Lausanne area (Switzerland), were used to objectively measure MVPA using accelerometers. Local Moran's I was used to assess the spatial dependence of MVPA and detect geographic clusters of low and high MVPA. Additionally, the characteristics of the built environment observed in the clusters based on raw MVPA and MVPA adjusted for socioeconomic and demographic factors were compared. Results: Data from 1,889 participants (median age 63, 55% women) were used. The geographic distribution of MVPA and the characteristics of the built environment among clusters were similar for raw and adjusted MVPA. In the adjusted model, we found a low concentration of individuals within spatial clusters of high MVPA (median: 38.5 min; 3% of the studied population) and low MVPA (median: 10.9 min; 2% of the studied population). Yet, clear differences were found in both models between clusters regarding the built environment; high MVPA clusters were located in areas where specific compositions of the built environment favor physical activity. Conclusions: Our results suggest the built environment may influence local spatial patterns of MVPA independently of socioeconomic and demographic factors. Interventions in the built environment should be considered to promote physically active behaviors in urban areas.
- Published
- 2022
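The Local Moran's I statistic mentioned in the abstract of entry 10 can be illustrated with a small sketch. It assumes a dense, row-standardized spatial weights matrix and the normalization m2 = sum(z²)/n; some software divides by n − 1 instead. The function name `local_morans_i` and the toy data are hypothetical and unrelated to the CoLaus|PsyCoLaus data.

```python
import numpy as np


def local_morans_i(values, weights):
    """Local Moran's I for each location.

    I_i = (z_i / m2) * sum_j w_ij * z_j, with z the mean-centred values and
    m2 = sum(z**2) / n (an assumed normalization; conventions differ).
    `weights` is an n x n spatial weights matrix with a zero diagonal,
    assumed here to be row-standardized by the caller.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)

    z = x - x.mean()                # deviations from the overall mean
    m2 = np.sum(z ** 2) / len(x)    # second moment used for scaling
    spatial_lag = w @ z             # weighted sum of neighbours' deviations
    return (z / m2) * spatial_lag


if __name__ == "__main__":
    # Four locations on a line; each is a neighbour of the adjacent ones.
    toy_values = [1.0, 1.0, 5.0, 5.0]
    toy_weights = [
        [0.0, 1.0, 0.0, 0.0],
        [0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5],
        [0.0, 0.0, 1.0, 0.0],
    ]
    print(local_morans_i(toy_values, toy_weights))
```

Positive values mark locations whose neighbours have similar deviations from the mean (low-low or high-high clusters). Assessing significance, for example by permutation, is not covered by this sketch.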
11. Association of income and education with fecundability in a North American preconception cohort
- Author
-
Ellen M. Mikkelsen, Elizabeth E. Hatch, Kenneth J. Rothman, Nina L. Schrager, Amelia K. Wesselink, Tanran R. Wang, Lauren A. Wise, and Renée Boynton-Jarrett
- Subjects
Adult ,Epidemiology ,UNITED-STATES ,TRANSITIONS ,UNINTENDED PREGNANCY ,WHITE WOMEN ,BEHAVIORS ,01 natural sciences ,Article ,Education ,INFERTILITY ,03 medical and health sciences ,Social determinants of health ,0302 clinical medicine ,REGRESSION-MODELS ,DISPARITIES ,Pregnancy ,Surveys and Questionnaires ,Humans ,Medicine ,Prospective Studies ,030212 general & internal medicine ,0101 mathematics ,Prospective cohort study ,Socioeconomic status ,Time to pregnancy ,SELECTION BIAS ,Fecundability ,business.industry ,PREGNANCY INTENTIONS ,010102 general mathematics ,Confounding ,medicine.disease ,United States ,Confidence interval ,Time-to-Pregnancy ,Fertility ,Reproductive Health ,Cohort ,Income ,Educational Status ,Household income ,Female ,business ,Maternal Age ,Demography ,Female pregnancy
- Abstract
Purpose: The purpose of this study is to evaluate socioeconomic determinants of fecundability. Methods: Among 8654 female pregnancy planners from Pregnancy Study Online, a North American prospective cohort study (2013–2019), we examined associations between socioeconomic status and fecundability (the per-cycle probability of conception). Information on income and education was collected via baseline questionnaires. Bimonthly follow-up questionnaires were used to ascertain pregnancy status. We estimated fecundability ratios (FRs) and 95% confidence intervals (CIs) using proportional probabilities regression, controlling for potential confounders. Results: Relative to an annual household income of greater than or equal to $150,000, adjusted FRs were 0.91 (95% CI: 0.83–1.01) for less than $50,000, 0.99 (95% CI: 0.92–1.07) for $50,000–$99,000, and 1.09 (95% CI: 1.01–1.18) for $100,000–$149,000. FRs for less than 12, 13–15, and 16 years of education, relative to greater than or equal to 17 years, were 0.90 (95% CI: 0.76–1.08), 0.84 (95% CI: 0.78–0.91), and 0.89 (95% CI: 0.84–0.95), respectively. Slightly stronger associations for income and education were seen among older women. Conclusions: Lower levels of education and income were associated with modestly reduced fecundability. These results demonstrate the presence of socioeconomic disparities in fecundability.
- Published
- 2020
12. Infectious diseases epidemiology, quantitative methodology, and clinical research in the midst of the COVID-19 pandemic : perspective from a European country
- Author
-
Geert Molenberghs, Marc Buyse, Steven Abrams, Ariel Alonso Abad, Catherine Legrand, Pierre Van Damme, Stefanie De Buyser, Herman Goossens, Koen Pepermans, Sereina A. Herzog, Thomas Neyens, Ingrid Van Keilegom, Niko Speybroeck, Geert Verbeke, Philippe Beutels, Christel Faes, Heidi Theeten, Frank Hulstaert, Niel Hens, and UCL - SSS/IRSS - Institut de recherche santé et société
- Subjects
Immunity, Herd ,Economic growth ,Biomedical Research ,Time Factors ,IMPACT ,Psychological intervention ,CORONAVIRUS ,Antiviral therapy ,Public opinion ,Factorial designs ,REGRESSION-MODELS ,0302 clinical medicine ,COVID-19 Testing ,Cause of Death ,Pandemic ,Medicine and Health Sciences ,Prevalence ,Medicine ,Pharmacology (medical) ,030212 general & internal medicine ,Nowcasting ,Randomized Controlled Trials as Topic ,Pragmatic trials ,Mathematical epidemiology ,Pharmacology. Therapy ,Age Factors ,General Medicine ,Infection fatality rate ,Europe ,Diagnostic testing ,INFLUENZA ,Seasons ,0305 other medical science ,COVID-19 Vaccines ,Drug Industry ,TRANSMISSION ,Endpoint Determination ,IMMUNITY ,ERRORS ,Biostatistics ,Mathematical modelling of infectious disease ,Article ,Vaccine development ,03 medical and health sciences ,Sex Factors ,Drug Development ,Humans ,Platform trials ,Mortality ,Health communication ,Pandemics ,Estimation ,030505 public health ,Non-pharmaceutical intervention ,business.industry ,SARS-CoV-2 ,COVID-19 ,Models, Theoretical ,Clinical trial ,Health Communication ,Public Opinion ,Communicable Disease Control ,Mathematical modeling ,Data sharing ,Human medicine ,business ,Epidemiologic Methods
- Abstract
Starting from historic reflections, the current SARS-CoV-2 induced COVID-19 pandemic is examined from various perspectives, in terms of what it implies for the implementation of non-pharmaceutical interventions, the modeling and monitoring of the epidemic, the development of early-warning systems, the study of mortality, prevalence estimation, diagnostic and serological testing, vaccine development, and ultimately clinical trials. Emphasis is placed on how the pandemic has led to unprecedented speed in methodological and clinical development, the pitfalls thereof, but also the opportunities that it engenders for national and international collaboration, and how it has simplified and sped up procedures. We also study the impact of the pandemic on clinical trials in other indications. We note that it has placed biostatistics, epidemiology, virology, infectiology, vaccinology, and related fields in the spotlight in an unprecedented way, implying great opportunities, but also the need to communicate effectively, often amidst controversy.
Corresponding author: Buyse, M, Hasselt Univ, Interuniv Inst Biostat & Stat Bioinformat, Data Sci Inst, Martelarenlaan 42, Hasselt, Belgium; marc.buyse@iddi.com
- Published
- 2020
13. Forecasting China's wastewater discharge using dynamic factors and mixed-frequency data
- Author
-
Meng Han, Zhanlei Lv, Lili Ding, Wei Wang, and Xin Zhao
- Subjects
China ,Mixed frequency ,010504 meteorology & atmospheric sciences ,IMPROVE ,Health, Toxicology and Mutagenesis ,010501 environmental sciences ,Wastewater ,MIDAS regression ,Toxicology ,01 natural sciences ,Gross domestic product ,SUSTAINABILITY ,REGRESSION-MODELS ,POLLUTION ,Benchmark (surveying) ,Forecast combination ,Wastewater discharge ,EMISSIONS ,0105 earth and related environmental sciences ,ECONOMIC-GROWTH ,Environmental engineering ,Sampling (statistics) ,Regression analysis ,General Medicine ,Models, Theoretical ,PERFORMANCE ,NETWORKS ,SCARCITY ,Environmental science ,Sewage treatment ,VOLATILITY ,Forecasting
- Abstract
Forecasting wastewater discharge is the basis for wastewater treatment and policy formulation. This paper proposes a novel mixed-data sampling regression model, i.e., combination-MIDAS model to forecast quarterly wastewater emissions in China based on dynamic factors at different frequencies. The results show that a significant auto-correlation for wastewater emissions exists and that water consumption per ten thousand gross domestic product is the best predictor of wastewater emissions. The forecast performances of the combination-MIDAS models are robust and better than those of the benchmark models. Therefore, the combination-MIDAS models can better capture the characteristics of wastewater emissions, suggesting that the proposed method is a good method to deal with model misspecification and uncertainty for the control and management of wastewater discharge in China. (C) 2019 Elsevier Ltd. All rights reserved.
- Published
- 2019
14. Using ecological propensity score to adjust for missing confounders in small area studies
- Author
-
Yingbo Wang, Sylvia Richardson, Marta Blangiardo, Anna Hansell, Monica Pirani, Medical Research Council, and Medical Research Council (MRC)
- Subjects
Statistics and Probability ,MULTIPLE-IMPUTATION ,Propensity score ,Statistics & Probability ,Missing data ,Coronary Disease ,Biostatistics ,01 natural sciences ,010104 statistics & probability ,03 medical and health sciences ,0302 clinical medicine ,REGRESSION-MODELS ,Patient Admission ,VALIDATION DATA ,Observational study ,Air Pollution ,London ,Humans ,Computer Simulation ,030212 general & internal medicine ,0101 mathematics ,Spatial analysis ,Small-Area Analysis ,0604 Genetics ,Science & Technology ,Spatial statistics ,Ecology ,Confounding ,0104 Statistics ,Regression analysis ,General Medicine ,AIR-POLLUTION ,Articles ,LONG-TERM EXPOSURE ,3. Good health ,Data Interpretation, Statistical ,Propensity score matching ,Physical Sciences ,Mathematical & Computational Biology ,Environmental epidemiology ,Statistics, Probability and Uncertainty ,Epidemiologic Methods ,Life Sciences & Biomedicine ,Hierarchical model ,Mathematics
- Abstract
Summary: Small area ecological studies are commonly used in epidemiology to assess the impact of area level risk factors on health outcomes when data are only available in an aggregated form. However, the resulting estimates are often biased due to unmeasured confounders, which typically are not available from the standard administrative registries used for these studies. Extra information on confounders can be provided through external data sets such as surveys or cohorts, where the data are available at the individual level rather than at the area level; however, such data typically lack the geographical coverage of administrative registries. We develop a framework of analysis which combines ecological and individual level data from different sources to provide an adjusted estimate of area level risk factors which is less biased. Our method (i) summarizes all available individual level confounders into an area level scalar variable, which we call ecological propensity score (EPS), (ii) implements a hierarchical structured approach to impute the values of EPS whenever they are missing, and (iii) includes the estimated and imputed EPS into the ecological regression linking the risk factors to the health outcome. Through a simulation study, we show that integrating individual level data into small area analyses via EPS is a promising method to reduce the bias intrinsic in ecological studies due to unmeasured confounders; we also apply the method to a real case study to evaluate the effect of air pollution on coronary heart disease hospital admissions in Greater London.
- Published
- 2017
15. Efficacy of Reduced-Intensity Chemotherapy With Oxaliplatin and Capecitabine on Quality of Life and Cancer Control Among Older and Frail Patients With Advanced Gastroesophageal Cancer
- Author
-
Pei Loo Ow, Angel Garcia, Lesley Samuel, Rajarshi Roy, Adam McGeoch, D. Fyfe, W Saku, Simon Aird Grumett, Jonathan Nicoll, Jo Dent, Tom Samuel Waddell, Jo Webster, Christine Allmark, Tania Tillett, Colin Askill, Justin S. Waters, C. Handforth, Erica Beaumont, Vallipuram Vigneswaran, Sharon Ruddock, Nick Wadd, Syed Zubair, Kinnari Patel, Vanessa Potter, Daniel Propper, Olwyn Williams, Marc Jones, Kamalnayan Guptal, Peter Hall, Gareth Griffiths, Joseph Mano, Juan W. Valle, Sheela Rao, David A Cairns, Go Trial Investigators, Eszter Katona, Nick Maisey, Chris Twelves, Daniel Swinson, Nicholas Reed, Heike I. Grabsch, Joanne Askey, Jonathan Wadsley, Tom Roques, Sue Cheeseman, Stephen Falk, Louise Medley, Arshad Jamil, Emma Cattell, Victori Kunene, Matthew R. Sydes, Charles Candish, Claire Hobbs, Rebecca Herbertson, Jo Parkinson, Nicholas S. Reed, Louise Brook, Zuzana Stokes, Mohammed Khan, Ann Crossley, Elin Jones, George Bozas, Sebastian Cummins, Anirban Chatterjee, Michael Bennet, Helen Marshall, Pavel Bezecny, David Sherriff, Matthew T. Seymour, Lauren Gorf, Galina Velikova, Jean Gall, Kamposioras Konstantinos-Velios, Sally Clive, Eleanor James, Fiona Collinson, Dunca Wilkins, Simon Lord, Julia Brown, Serena Hilman, A. Robinson, Richard Ellis, Alaaeldin Shablak, Russell D Petty, Sherif Raouf, Helen Howard, RS: GROW - R2 - Basic and Translational Cancer Biology, and Pathologie
- Subjects
Cancer Research ,medicine.medical_specialty ,Randomization ,EUROPEAN-ORGANIZATION ,law.invention ,II TRIAL ,Capecitabine ,ESOPHAGOGASTRIC JUNCTION ,03 medical and health sciences ,REGRESSION-MODELS ,0302 clinical medicine ,Randomized controlled trial ,Quality of life ,law ,Internal medicine ,medicine ,030212 general & internal medicine ,ELDERLY-PATIENTS ,ADVANCED GASTRIC-CANCER ,business.industry ,Hazard ratio ,ADENOCARCINOMA ,Combination chemotherapy ,Chemotherapy regimen ,FLUOROURACIL ,Oxaliplatin ,METASTATIC COLORECTAL-CANCER ,Oncology ,030220 oncology & carcinogenesis ,1ST-LINE THERAPY ,business ,medicine.drug - Abstract
Importance: Older and/or frail patients are underrepresented in landmark cancer trials. Tailored research is needed to address this evidence gap. Objective: The GO2 randomized clinical trial sought to optimize chemotherapy dosing in older and/or frail patients with advanced gastroesophageal cancer, and explored baseline geriatric assessment (GA) as a tool for treatment decision-making. Design, Setting, and Participants: This multicenter, noninferiority, open-label randomized trial took place at oncology clinics in the United Kingdom with nurse-led geriatric health assessment. Patients were recruited for whom full-dose combination chemotherapy was considered unsuitable because of advanced age and/or frailty. Interventions: There were 2 randomizations that were performed: CHEMO-INTENSITY compared oxaliplatin/capecitabine at Level A (oxaliplatin 130 mg/m² on day 1, capecitabine 625 mg/m² twice daily on days 1-21, on a 21-day cycle), Level B (doses 0.8 times A), or Level C (doses 0.6 times A). Alternatively, if the patient and clinician agreed the indication for chemotherapy was uncertain, the patient could instead enter CHEMO-BSC, comparing Level C vs best supportive care. Main Outcomes and Measures: First, broad noninferiority of the lower doses vs reference (Level A) was assessed using a permissive boundary of 34 days reduction in progression-free survival (PFS) (hazard ratio, HR = 1.34), selected as acceptable by a forum of patients and clinicians. Then, the patient experience was compared using Overall Treatment Utility (OTU), which combines efficacy, toxic effects, quality of life, and patient value/acceptability. For CHEMO-BSC, the main outcome measure was overall survival. Results: A total of 514 patients entered CHEMO-INTENSITY, of whom 385 (75%) were men and 299 (58%) were severely frail, with median age 76 years. Noninferior PFS was confirmed for Levels B vs A (HR = 1.09 [95% CI, 0.89-1.32]) and C vs A (HR = 1.10 [95% CI, 0.90-1.33]).
Level C produced fewer toxic effects and better OTU than A or B. No subgroup benefited from higher doses: Level C produced better OTU even in younger or less frail patients. A total of 45 patients entered the CHEMO-BSC randomization: overall survival was nonsignificantly longer with chemotherapy: median 6.1 vs 3.0 months (HR = 0.69 [95% CI, 0.32-1.48], P = .34). In multivariate analysis in 522 patients with all variables available, baseline frailty, quality of life, and neutrophil to lymphocyte ratio were independently associated with OTU, and can be combined in a model to estimate the probability of different outcomes. Conclusions and Relevance: This phase 3 randomized clinical trial found that reduced-intensity chemotherapy provided a better patient experience without significantly compromising cancer control and should be considered for older and/or frail patients. Baseline geriatric assessment can help predict the utility of chemotherapy but did not identify a group benefiting from higher-dose treatment. Trial Registration: isrctn.org Identifier: ISRCTN44687907.
- Published
- 2021
16. Econometric Analysis of Panel Data Models with Multifactor Error Structures
- Author
-
Franz Palm, Jean-Pierre Urbain, Hande Karabiyik, QE Econometrics, and RS: GSBE Theme Data-Driven Decision-Making
- Subjects
Shrinkage estimator ,Economics and Econometrics ,UNIT-ROOT TESTS ,nonstationary panels ,COINTEGRATION ,Maximum likelihood ,TIME-SERIES ,panel data ,REGRESSION-MODELS ,0502 economics and business ,Econometrics ,cross-sectional dependence ,050207 economics ,stationary panels ,050205 econometrics ,Mathematics ,principal components ,common correlated effects ,Cointegration ,05 social sciences ,factor-augmented panel regression ,Econometric analysis ,Regression analysis ,CCE ESTIMATION ,MAXIMUM-LIKELIHOOD-ESTIMATION ,CROSS-SECTIONAL DEPENDENCE ,LARGE NUMBER ,Principal component analysis ,BAYESIAN SHRINKAGE ,SLOPE HOMOGENEITY ,Panel data - Abstract
Economic panel data often exhibit cross-sectional dependence, even after conditioning on appropriate explanatory variables. Two approaches to modeling cross-sectional dependence in economic panel data are often used: the spatial dependence approach, which explains cross-sectional dependence in terms of distance among units, and the residual multifactor approach, which explains cross-sectional dependence by common factors that affect individuals to a different extent. This article reviews the theory on estimation and statistical inference for stationary and nonstationary panel data with cross-sectional dependence, particularly for models with a multifactor error structure. Tests and diagnostics for testing for unit roots, slope homogeneity, cointegration, and the number of factors are provided. We discuss issues such as estimating common factors, dealing with parameter plethora in practice, testing for structural stability and nonlinearity, and dealing with model and parameter uncertainty. Finally, we address issues related to the use of these economic panel models.
- Published
- 2019
17. Variable Prioritization in Nonlinear Black Box Methods: A Genetic Association Case Study
- Author
-
Mike West, Lorin Crawford, Daniel E. Runcie, and Seth Flaxman
- Subjects
0301 basic medicine ,SELECTION ,FOS: Computer and information sciences ,Computer science ,Gaussian processes ,computer.software_genre ,01 natural sciences ,Quantitative Biology - Quantitative Methods ,010104 statistics & probability ,REGRESSION-MODELS ,Statistics - Machine Learning ,Black box ,variable prioritization ,POPULATION ,Quantitative Methods (q-bio.QM) ,education.field_of_study ,QUANTITATIVE TRAIT LOCI ,0104 Statistics ,Linear model ,Regression analysis ,stat.ML ,BODY-WEIGHT ,stat.ME ,Modeling and Simulation ,Physical Sciences ,statistical genetics ,Statistics, Probability and Uncertainty ,Statistics and Probability ,Statistics & Probability ,Population ,Bayesian probability ,Feature selection ,Machine Learning (stat.ML) ,Machine learning ,Statistics - Applications ,Generalized linear mixed model ,Article ,LINEAR MIXED MODELS ,Methodology (stat.ME) ,03 medical and health sciences ,1403 Econometrics ,Nonlinear regression ,EPISTASIS ,Applications (stat.AP) ,0101 mathematics ,GENOME-WIDE ASSOCIATION ,education ,stat.AP ,Statistics - Methodology ,Science & Technology ,q-bio.QM ,business.industry ,COMPLEX TRAITS ,centrality measures ,Nonparametric regression ,030104 developmental biology ,FOS: Biological sciences ,genome-wide association studies ,STATISTICAL-METHODS ,Artificial intelligence ,business ,computer ,Mathematics - Abstract
The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the "RelATive cEntrality" (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other "black box" methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance. Comment: 28 pages, 5 figures, 1 table; Supplementary Material
- Published
- 2018
18. An approximate marginal logistic distribution for the analysis of longitudinal ordinal data
- Author
-
Ernst Wit, Nazanin Nooraee, Johan Ormel, Fentaw Abegaz, Edwin R. van den Heuvel, Stochastic Operations Research, Statistics, and Stochastic Studies and Statistics
- Subjects
Statistics and Probability ,Ordinal data ,CATEGORICAL-DATA ,Visual Analog Scale ,Flexible correlation matrix ,Marginal model ,Latent variable ,COMPUTATION ,01 natural sciences ,Ordinal regression ,Sensitivity and Specificity ,General Biochemistry, Genetics and Molecular Biology ,010104 statistics & probability ,REGRESSION-MODELS ,Statistics ,Econometrics ,0501 psychology and cognitive sciences ,Computer Simulation ,Longitudinal Studies ,0101 mathematics ,Categorical variable ,Latent variable models ,Mathematics ,Likelihood Functions ,Population-averaged (marginal) models ,General Immunology and Microbiology ,Logistic distribution ,BINARY DATA ,Applied Mathematics ,05 social sciences ,t-distribution ,Reproducibility of Results ,Regression analysis ,General Medicine ,MULTIVARIATE-T-PROBABILITIES ,LIFE ,Logistic Models ,Data Interpretation, Statistical ,BIVARIATE ,Generalized estimating equations ,Multivariate logistic distribution ,INFERENCE ,General Agricultural and Biological Sciences ,Elliptical distribution ,Algorithms ,050104 developmental & child psychology ,Maximum likelihood ,RESPONSES - Abstract
Subject-specific and marginal models have been developed for the analysis of longitudinal ordinal data. Subject-specific models often lack a population-average interpretation of the model parameters due to the conditional formulation of random intercepts and slopes. Marginal models frequently lack an underlying distribution for ordinal data, in particular when generalized estimating equations (GEE) are applied. To overcome these issues, latent variable models underneath the ordinal outcomes with a multivariate logistic distribution can be applied. In this article, we extend the work of O'Brien and Dunson (2004), who studied the multivariate t-distribution with marginal logistic distributions. We use maximum likelihood, instead of a Bayesian approach, and incorporate covariates in the correlation structure, in addition to the mean model. We compare our method with GEE and demonstrate that it performs better than GEE with respect to fixed effect parameter estimation when the latent variables have an approximately elliptical distribution, and at least as well as GEE for other types of latent variable distributions.
- Published
- 2016
19. Haem iron intake and risk of lung cancer in the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort
- Author
-
Renée T. Fortner, José María Huerta, Roel Vermeulen, Anouar Fanidi, Franco Berrino, H. Bas Bueno-de-Mesquita, Miguel Rodríguez-Barranco, Therese Haugdahl Nøst, Carlo La Vecchia, Rudolf Kaaks, Gianluca Severi, Anne Tjønneland, Eva Ardanaz, Elio Riboli, Antonia Trichopoulou, Kjell Grankvist, Aurélie Affret, Magdalena Stepien, Antonio Agudo, Nerea Larrañaga, Mikael Johansson, David C. Muller, Mattias Johansson, Vittorio Krogh, Elisabete Weiderpass, Hans Brunnström, Domenico Palli, Heinz Freisling, Melissa A. Merritt, Paula Jakszyn, Isabel Drake, Ruth C. Travis, Heiner Boeing, Amanda J. Cross, Marie-Christine Boutron-Ruault, Fulvio Ricceri, Salvatore Panico, Torkjel M. Sandanger, Anastasia Kotanidou, Christina C. Dahm, Julia Whitman, Rosario Tumino, Petra H.M. Peeters, Louise Hansen, Kim Overvad, Heather Ward, José Ramón Quirós, One Health Chemisch, dIRAS RA-2, Department of Medical and Clinical Genetics, University of Helsinki, Faculty of Medicine, and Commission of the European Communities
- Subjects
0301 basic medicine ,Male ,Lung Neoplasms ,1106 Human Movement and Sports Sciences ,Physiology ,Medicine (miscellaneous) ,MEAT CONSUMPTION ,VDP::Medical disciplines: 700::Health sciences: 800::Nutrition: 811 ,Haem iron ,Cohort Studies ,ZINC ,0302 clinical medicine ,REGRESSION-MODELS ,WORLD ,Risk Factors ,non-haem iron ,Cigarette smokers ,Fumadors ,Prospective Studies ,Prospective cohort study ,2. Zero hunger ,Nutrition and Dietetics ,Confounding ,Hazard ratio ,digestive, oral, and skin physiology ,MUTAGENS ,cohort ,Middle Aged ,3. Good health ,European Prospective Investigation into Cancer and Nutrition ,Europe ,Red meat ,LIFE-STYLE ,Female ,dietary iron ,3143 Nutrition ,Lung cancer ,Life Sciences & Biomedicine ,Iron, Dietary ,MINERAL INTAKE ,SMOKERS ,030209 endocrinology & metabolism ,Heme ,Risk Assessment ,Article ,DIET ,VDP::Medisinske Fag: 700::Helsefag: 800::Ernæring: 811 ,03 medical and health sciences ,medicine ,Journal Article ,Humans ,cotinine ,Proportional Hazards Models ,haem iron ,Science & Technology ,030109 nutrition & dietetics ,Nutrition & Dietetics ,VDP::Medical disciplines: 700::Clinical medical disciplines: 750::Oncology: 762 ,business.industry ,Proportional hazards model ,Case-control study ,medicine.disease ,PRODUCTS ,VDP::Medisinske Fag: 700::Klinisk medisinske fag: 750::Onkologi: 762 ,lung cancer ,Nutrition Assessment ,Risk factors ,Case-Control Studies ,Càncer de pulmó ,1111 Nutrition and Dietetics ,3111 Biomedicine ,business ,EPIC ,0908 Food Sciences - Abstract
Accepted manuscript version. Published version available at https://doi.org/10.1038/s41430-018-0271-2. Background: Epidemiological studies suggest that haem iron, which is found predominantly in red meat and increases endogenous formation of carcinogenic N-nitroso compounds, may be positively associated with lung cancer. The objective was to examine the relationship between haem iron intake and lung cancer risk using detailed smoking history data and serum cotinine to control for potential confounding. Methods: In the European Prospective Investigation into Cancer and Nutrition (EPIC), 416,746 individuals from 10 countries completed demographic and dietary questionnaires at recruitment. Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for incident lung cancer (n = 3731) risk relative to haem iron, non-haem iron, and total dietary iron intake. A corresponding analysis was conducted among a nested subset of 800 lung cancer cases and 1489 matched controls for whom serum cotinine was available. Results: Haem iron was associated with lung cancer risk, including after adjustment for details of smoking history (time since quitting, number of cigarettes per day): as a continuous variable (HR per 0.3 mg/1000 kcal 1.03, 95% CI 1.00–1.07), and in the highest versus lowest quintile (HR 1.16, 95% CI 1.02–1.32; trend across quintiles: P = 0.035). In contrast, non-haem iron intake was related inversely with lung cancer risk; however, this association attenuated after adjustment for smoking history. Additional adjustment for serum cotinine did not considerably alter the associations detected in the nested case–control subset. Conclusions: Greater haem iron intake may be modestly associated with lung cancer risk.
- Published
- 2018
20. Probabilistic partial least squares model: Identifiability, estimation and application
- Author
-
Caroline Hayward, Hae-Won Uh, Jeanine J. Houwing-Duistermaat, Geurt Jongbloed, and Said el Bouhaddani
- Subjects
0301 basic medicine ,Statistics and Probability ,FOS: Computer and information sciences ,INFORMATION ,Probabilistic partial least squares ,Generalized least squares ,01 natural sciences ,Methodology (stat.ME) ,010104 statistics & probability ,03 medical and health sciences ,REGRESSION-MODELS ,Inference ,Statistics ,Partial least squares regression ,Expectation–maximization algorithm ,TOOL ,Identifiability ,0101 mathematics ,Total least squares ,MAXIMUM-LIKELIHOOD ,EM algorithm ,METAANALYSIS ,Statistics - Methodology ,Mathematics ,Numerical Analysis ,Probabilistic logic ,Statistical model ,030104 developmental biology ,Non-linear least squares ,Dimension reduction ,Probability and Uncertainty ,Statistics, Probability and Uncertainty ,Algorithm - Abstract
With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts. Comment: Accepted in Journal of Multivariate Analysis
- Published
- 2018
21. Lessons learned from IDeAl - 33 recommendations from the IDeAl-net about design and analysis of small population clinical trials
- Author
-
Ralf-Dieter Hilgers, Malgorzata Bogdan, Carl-Fredrik Burman, Holger Dette, Mats Karlsson, Franz König, Christoph Male, France Mentré, Geert Molenberghs, and Stephen Senn
- Subjects
Research design ,Work package ,Computer science ,lcsh:Medicine ,Review ,Research & Experimental Medicine ,01 natural sciences ,010104 statistics & probability ,REGRESSION-MODELS ,0302 clinical medicine ,Pharmacology (medical) ,030212 general & internal medicine ,Genetics (clinical) ,Genetics & Heredity ,Clinical Trials as Topic ,education.field_of_study ,Management science ,RESEARCH-AND-DEVELOPMENT ,Statistical methodology ,MIXED-EFFECTS MODELS ,Small population clinical trials ,General Medicine ,3. Good health ,Medicine, Research & Experimental ,Work (electrical) ,Research Design ,Statistical analysis ,Data Interpretation, Statistical ,SIMULATION ,PARAMETER UNCERTAINTY ,Statistical design ,ADAPTIVE DESIGNS ,Life Sciences & Biomedicine ,Relation (database) ,Population ,03 medical and health sciences ,Rare Diseases ,TEST DECISIONS ,Humans ,DISTRIBUTIONS ,0101 mathematics ,education ,SELECTION BIAS ,Science & Technology ,Ideal (set theory) ,lcsh:R ,Clinical trial ,DRUG RESEARCH ,Rare disease - Abstract
BACKGROUND: IDeAl (Integrated designs and analysis of small population clinical trials) is an EU funded project developing new statistical design and analysis methodologies for clinical trials in small population groups. Here we provide an overview of IDeAl findings and give recommendations to applied researchers. METHOD: The description of the findings is broken down by the nine scientific IDeAl work packages and summarizes results from the project's more than 60 publications to date in peer reviewed journals. In addition, we applied text mining to evaluate the publications and the IDeAl work packages' output in relation to the design and analysis terms derived from the IRDiRC task force report on small population clinical trials. RESULTS: The results are summarized, describing the developments from an applied viewpoint. The main result presented here is a set of 33 practical recommendations drawn from the work, giving researchers comprehensive guidance on the improved methodology. In particular, the findings will help design and analyse efficient clinical trials in rare diseases with a limited number of patients available. We developed a network representation relating the hot topics developed by the IRDiRC task force on small population clinical trials to IDeAl's work as well as relating important methodologies by IDeAl's definition necessary to consider in design and analysis of small-population clinical trials. These network representations establish a new perspective on design and analysis of small-population clinical trials. CONCLUSION: IDeAl has provided a huge number of options to refine the statistical methodology for small-population clinical trials from various perspectives. A total of 33 recommendations developed and related to the work packages help researchers design small-population clinical trials.
The route to improvements is displayed in the IDeAl-network, which represents important statistical methodological skills necessary for the design and analysis of small-population clinical trials. The methods are ready for use. Published in: Orphanet Journal of Rare Diseases, vol. 13, issue 1.
- Published
- 2018
22. Prospective association between tobacco smoking and death by suicide: a competing risks hazard analysis in a large twin cohort with 35-year follow-up
- Author
-
Tellervo Korhonen, A. E. Evins, Taru H. Kinnunen, Jaakko Kaprio, Clinicum, Department of Public Health, Institute for Molecular Medicine Finland, University of Helsinki, Genetic Epidemiology, and School of Medicine / Public Health
- Subjects
Male ,Nicotine dependence ,Poison control ,prospective cohort studies ,Suicide prevention ,tobacco ,Tobacco smoke ,3124 Neurology and psychiatry ,0302 clinical medicine ,REGRESSION-MODELS ,030212 general & internal medicine ,Registries ,Prospective cohort study ,Applied Psychology ,POPULATION ,Finland ,education.field_of_study ,Incidence ,Hazard ratio ,PSYCHIATRIC-DISORDERS ,Middle Aged ,3142 Public health care science, environmental and occupational health ,3. Good health ,Psychiatry and Mental health ,Suicide ,Cohort ,Female ,Medical emergency ,BEHAVIOR ,Adult ,Risk ,Adolescent ,515 Psychology ,Population ,NATIONAL EPIDEMIOLOGIC SURVEY ,COMPLETED SUICIDE ,smoking ,Cigarette Smoking ,03 medical and health sciences ,Young Adult ,medicine ,Humans ,education ,nicotine dependence ,LIFE SATISFACTION ,suicide ,METAANALYSIS ,Aged ,business.industry ,Odds ratio ,Original Articles ,medicine.disease ,CIGARETTE-SMOKING ,business ,NICOTINE WITHDRAWAL ,030217 neurology & neurosurgery ,Demography - Abstract
The relationship between smoking and suicide remains controversial. A total of 16 282 twin pairs born before 1958 in Finland and alive in 1974 were queried with detailed health and smoking questionnaires in 1975 and 1981, with response rates of 89% and 84%. Smoking status and dose, marital, employment, and socio-economic status, and indicators of psychiatric and somatic illness were assessed at both time points. Emergent psychiatric and medical illness and vital status, including suicide determined by forensic autopsy, were evaluated over 35-year follow-up through government registries. The association between smoking and suicide was determined in competing risks hazard models. In twin pairs discordant for smoking and suicide, the prospective association between smoking and suicide was determined using a matched case–control design. Smokers had a higher cumulative suicide incidence than former or never smokers. Heavy smokers had significantly higher suicide risk [hazard ratio (HR) 3.47, 95% confidence interval (CI) 2.31–5.22] than light smokers (HR 2.30, 95% CI 1.61–3.23) (p = 0.017). Compared with never smokers, smokers, but not former smokers, had increased suicide risk (HR 2.56, 95% CI 1.43–4.59), adjusting for depressive symptoms, alcohol and sedative–hypnotic use, and excluding those who developed serious somatic or psychiatric illness. In twin pairs discordant for smoking and suicide, suicide was more likely in smokers [odds ratio (OR) 6.0, 95% CI 2.06–23.8]. Adults who smoked tobacco were more likely to die by suicide, with a large, dose-dependent effect. This effect remained after consideration of many known predictors of suicide and shared familial effects, consistent with the hypothesis that exposure to tobacco smoke increases the risk of suicide. Published version, peer reviewed.
- Published
- 2017
23. Weight randomization test for the selection of the number of components in PLS models
- Author
-
Lutgarde M. C. Buydens, Lionel Blanchet, Thanh N. Tran, Jan Gerretzen, Nelson Lee Afanador, Ewa Szymańska, RS: NUTRIM - R4 - Gene-environment interaction, and RS: NUTRIM - R3 - Respiratory & Age-related Health
- Subjects
0301 basic medicine ,Computer science ,PARTIAL LEAST-SQUARES ,01 natural sciences ,Analytical Chemistry ,CHEMOMETRICS ,03 medical and health sciences ,REGRESSION-MODELS ,Resampling ,Linear regression ,Partial least squares regression ,partial least squares ,Null distribution ,number of components ,Preprocessor ,DISTRIBUTIONS ,MULTIVARIATE CALIBRATION ,CROSS-VALIDATION ,OPTIMIZATION ,Selection (genetic algorithm) ,Sequence ,SPECTROSCOPY ,ION MOBILITY SPECTROMETRY ,Applied Mathematics ,010401 analytical chemistry ,VARIABLE IMPORTANCE ,randomization test ,0104 chemical sciences ,030104 developmental biology ,Metric (mathematics) ,Algorithm - Abstract
The selection of the optimal number of components remains a difficult but essential task in partial least squares (PLS). Randomization tests have the advantage of being automatic and they make use of the entire dataset, in contrast to the widely used cross-validation approaches. Partial least squares modeling may include component(s) with a large amount of irrelevant data variation, and this might affect the model, depending on the assigned y-loading (which is the regression coefficient in the latent domain). This has recently been indicated by us in the basic sequence framework with respect to the underlying theory of the PLS algorithm and presented to the chemometrics society. We will show in this work that this irrelevant data variation is the root cause of the difficulty in current methods for selecting the optimal number of components. For randomization tests, PLS models with nonsignificant components may result in false positive tests because of the incorrect assumption that "the components enter the model in a natural order". In this work, we introduce a new randomization test, the weight randomization test, for selecting the optimal number of components in PLS in light of the underlying theory of the PLS algorithm. In the proposed method the null distribution is well characterized and efficiently determined, taking into account a newly defined model quality metric: the number of consecutive non-significant components (CNC). We illustrate the effectiveness of the weight randomization test in optimization of preprocessing as well as in classification models, where results are compared with the double cross-validation procedure for the latter. This is an important step towards the full automation of PLS model development and routine updates.
- Published
- 2017
24. Robust Designs in Generalized Linear Models: A Quantile Dispersion Graphs Approach
- Author
-
I. Das, M.L. Aggarwal, and Siuli Mukhopadhyay
- Subjects
Statistics and Probability ,Generalized linear model ,Statistics::Theory ,Mathematical optimization ,Family Of Link Functions ,Logistic Link ,Linear prediction ,Generalized linear mixed model ,Transformation ,Families ,Parameter Orthogonality ,Kriging ,Statistics ,Statistics::Methodology ,Statistical dispersion ,Mathematics ,Statistics::Applications ,Generalized additive model ,Logistic-Models ,Function (mathematics) ,Regression-Models ,Standardization ,Statistics::Computation ,Modeling and Simulation ,Prediction ,Quantile - Abstract
This article studies design selection for generalized linear models (GLMs) using the quantile dispersion graphs (QDGs) approach in the presence of misspecification in the link and/or linear predictor. The uncertainty in the linear predictor is represented by an unknown function and estimated using kriging. For addressing misspecified link functions, a generalized family of link functions is used. Numerical examples are shown to illustrate the proposed methodology.
- Published
- 2014
25. Efficient bootstrap with weakly dependent processes
- Author
-
Francesco Bravo and Federico Crudu
- Subjects
Statistics and Probability ,Statistics::Theory ,Heteroscedasticity ,alpha-mixing ,GENERALIZED-METHOD ,α-mixing ,symbols.namesake ,REGRESSION-MODELS ,mixing, Consumption CAPM, GEL, GMM, Hypothesis testing ,Econometrics ,Statistics::Methodology ,Applied mathematics ,MOMENTS ESTIMATORS ,HETEROSKEDASTICITY ,GMM ,EMPIRICAL LIKELIHOOD ,Statistical hypothesis testing ,Mathematics ,Consumption CAPM ,GEL ,Covariance matrix ,Applied Mathematics ,ASSET PRICING-MODELS ,jel:C12 ,Estimator ,jel:C13 ,jel:C58 ,Moment (mathematics) ,Computational Mathematics ,Nonlinear system ,Empirical likelihood ,Hypothesis testing ,Computational Theory and Mathematics ,Lagrange multiplier ,WILD BOOTSTRAP ,TESTS ,symbols ,COVARIANCE-MATRIX - Abstract
The efficient bootstrap methodology is developed for overidentified moment conditions models with weakly dependent observations. The resulting bootstrap procedure is shown to be asymptotically valid and can be used to approximate the distributions of t-statistics, the J-statistic for overidentifying restrictions, and Wald, Lagrange multiplier and distance statistics for nonlinear hypotheses. The asymptotic validity of the efficient bootstrap based on a computationally less demanding approximate k-step estimator is also shown. The finite sample performance of the proposed bootstrap is assessed using simulations in an intertemporal consumption-based asset pricing model. (C) 2010 Elsevier B.V. All rights reserved.
- Published
- 2012
26. W-based versus latent variables spatial autoregressive models
- Author
-
An Liu, Henk Folmer, Johan H. L. Oud, Faculty of Spatial Sciences, and Urban and Regional Studies Institute
- Subjects
Social Sciences(all) ,General Social Sciences ,Regression analysis ,ECONOMETRICS ,Latent variable ,econometrics ,Latent class model ,Structural equation modeling ,Urban Economics ,specification ,REGRESSION-MODELS ,WEIGHTS MATRIX ,Environmental Science(all) ,MGS ,regression-models ,Statistics ,Spatial econometrics ,Spatial dependence ,Latent variable model ,SPECIFICATION ,Developmental Psychopathology ,Spatial analysis ,weights matrix ,General Environmental Science ,Mathematics - Abstract
In this paper, we compare by means of Monte Carlo simulations two approaches to take spatial autocorrelation into account: the classical spatial autoregressive model and the structural equations model with latent variables. The former accounts for spatial dependence and spillover effects in georeferenced data by means of a spatial weights matrix W. The latter represents spatial dependence and spillover effects by means of a latent variable in the structural (regression) model while the observed spatially lagged variables are related to the latent spatial dependence variable in the measurement model. The simulation results based on Anselin's Columbus, Ohio, crime data set show that the misspecified latent variables approach slightly trails the correctly specified classical approach in terms of bias and root mean squared error of the coefficient estimators.
- Published
- 2011
27. Leadership effectiveness and recorded sickness absence among nursing staff
- Subjects
REGRESSION-MODELS ,HEALTHY WORK ENVIRONMENTS ,PRODUCTIVITY ,LEAVE ,leadership flexibility ,healthcare ,situational leadership ,EMPLOYEES ,leadership effectiveness ,sickness absence ,COMMITMENT ,JOB-SATISFACTION ,BEHAVIOR - Abstract
Aim To investigate nurse managers' leadership behaviour in relation to the sickness absence records of nursing staff. Background Sickness absence is high in healthcare and interferes with nursing efficiency and quality. Nurse managers' leadership behaviour may be associated with nursing staff sickness absence. Method Six nurse managers completed the Leadership Effectiveness and Adaptability Description (LEAD) questionnaire, which assesses leadership behaviour in terms of leadership flexibility (i.e. the range of leadership styles) and effectiveness (i.e. using the leadership style that is appropriate for a given situation). LEAD scores were linked to the number of recorded days of sickness absence and both short (1-7 days) and long (>7 days) episodes of sickness absence in the nursing teams. Results Leadership flexibility of nurse managers was not associated with sickness absence among nurses. High leadership effectiveness was associated with fewer days and fewer short episodes of sickness absence. Leadership effectiveness was unrelated to the number of long episodes of sickness absence. Conclusion Effective nurse managers had less short-term sickness absence in their nursing teams. Implications for nursing management If these tentative cross-sectional associations are confirmed in longitudinal studies including more departments, then training effective leadership may improve the management of short-term sickness absence.
- Published
- 2011
28. Linking intronic polymorphism on the CHD1-Z gene with fitness correlates in Black-tailed Godwits Limosa l. limosa
- Author
-
Marco van der Velde, Theunis Piersma, Christiaan Both, Oliver Haddrath, Jos C.E.W. Hooijmeijer, Julia Schroeder, Rosemarie Kentie, Allan J. Baker, Piersma group, and Both group
- Subjects
Population ,Zoology ,shorebirds ,Overdominance ,Biology ,LAPWING VANELLUS-VANELLUS ,REGRESSION-MODELS ,Polymorphism (computer science) ,Genetic linkage ,Allele ,education ,Ecology, Evolution, Behavior and Systematics ,Genetics ,education.field_of_study ,breeding plumage coloration ,PLUMAGE ,NON-RATITE BIRDS ,MARKED ANIMALS ,population structure ,DNA ,EGG SIZE ,biology.organism_classification ,CHICK SURVIVAL ,SEX IDENTIFICATION ,Plumage ,Godwit ,Animal Science and Zoology ,intronic polymorphism ,Limosa limosa ,PARENTAL QUALITY ,molecular sexing - Abstract
We show that variation in an intronic length polymorphism in the CHD1-Z gene in Black-tailed Godwits Limosa l. limosa is associated with fitness correlates. This is the second example of the CHD1-Z gene being correlated with fitness, a previous study having established that Moorhens Gallinula chloropus carrying the rare Z* allele have reduced survival. In Godwits, however, carriers of the Z* allele (374 bp) fared better than those with the more frequent Z allele (378 bp) with respect to body mass, plumage ornamentation, reproductive parameters and habitat quality. The Z* allele was found in 14% of 251 adult birds from nature reserves, but was absent from 33 birds breeding in intensively managed agricultural lands. Males and females with the Z* allele had less extensive breeding plumage, and females had a higher body mass, bred earlier and had larger eggs. There were no significant differences in annual survival between birds with and without the Z* allele. DNA isolated from museum skins demonstrated that this polymorphism was present at low frequency in 1929. We speculate that strong asymmetrical overdominance may explain the low frequency of the Z* allele and that genetic linkage to causal genes might be an explanation for the phenotypic correlations. Our findings suggest a degree of cryptic genetic population structuring in the Dutch Godwit population.
- Published
- 2010
29. Analysis of Incomplete Data Using Inverse Probability Weighting and Doubly Robust Estimators
- Author
-
James R. Carpenter, Stijn Vansteelandt, and Michael G. Kenward
- Subjects
NONRESPONSE ,REPEATED OUTCOMES ,multiple imputation ,Inverse probability weighting ,extrapolation ,General Social Sciences ,Inference ,Estimator ,Regression analysis ,Missing data ,doubly robust estimation ,Outcome (probability) ,Horvitz–Thompson estimator ,extreme weights ,missing data ,Mathematics and Statistics ,REGRESSION-MODELS ,Statistics ,Covariate ,Econometrics ,INFERENCE ,Horvitz-Thompson estimator ,inverse probability weighting ,General Psychology ,Mathematics - Abstract
This article reviews inverse probability weighting methods and doubly robust estimation methods for the analysis of incomplete data sets. We first consider methods for estimating a population mean when the outcome is missing at random, in the sense that measured covariates can explain whether or not the outcome is observed. We then sketch the rationale of these methods and elaborate on their usefulness in the presence of influential inverse weights. We finally outline how to apply these methods in a variety of settings, such as for fitting regression models with incomplete outcomes or covariates, emphasizing the use of standard software programs.
- Published
- 2010
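As a rough illustration of the first setting in the abstract above (estimating a population mean when the outcome is missing at random), the sketch below weights each observed outcome by the inverse of its probability of being observed. It is not taken from the article: the function names, the toy data, and the assumption that the observation probabilities are already known (in practice they would be estimated from covariates) are all hypothetical.

```python
# Minimal sketch of inverse probability weighting (IPW) for a population mean.
# Assumption: propensities[i] is the (known or previously estimated) probability
# that unit i's outcome is observed; all names and data here are hypothetical.

def ipw_mean(outcomes, observed, propensities):
    """Horvitz-Thompson style estimate: sum of y_i / p_i over observed units, divided by n."""
    n = len(outcomes)
    total = 0.0
    for y, r, p in zip(outcomes, observed, propensities):
        if r:  # only observed outcomes contribute
            total += y / p
    return total / n


def normalized_ipw_mean(outcomes, observed, propensities):
    """Normalized variant: divides by the sum of the weights instead of n."""
    weighted_sum = 0.0
    weight_total = 0.0
    for y, r, p in zip(outcomes, observed, propensities):
        if r:
            weighted_sum += y / p
            weight_total += 1.0 / p
    return weighted_sum / weight_total


# Toy data: the second group has higher outcomes and is observed less often.
outcomes = [10.0, 12.0, None, 20.0, None, 22.0]
observed = [True, True, False, True, False, True]
propensities = [0.8, 0.8, 0.8, 0.5, 0.5, 0.5]

complete_case = sum(y for y, r in zip(outcomes, observed) if r) / sum(observed)
ht_estimate = ipw_mean(outcomes, observed, propensities)
normalized_estimate = normalized_ipw_mean(outcomes, observed, propensities)

print(complete_case, ht_estimate, normalized_estimate)
```

In this toy example the complete-case mean ignores that high-outcome units are under-observed, while both weighted estimates shift upward. A small propensity produces a large weight, which is the kind of influential inverse weight the abstract mentions.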
30. The impact of weather and atmospheric circulation on O3 and PM10 levels at a rural mid-latitude site
- Subjects
particulate matter ,Meteorologie en Luchtkwaliteit ,platteland ,concentration ,WIMEK ,Meteorology and Air Quality ,meteorological factors ,rural areas ,time-series ,air pollution ,surface ozone concentrations ,air-quality ,ozone ,meteorologische factoren ,urban air ,regression-models ,ozon ,artificial neural-networks ,multilayer perceptron ,sulfur-dioxide concentrations ,luchtverontreiniging ,fijn stof ,pattern-classification ,concentratie - Abstract
In spite of the strict EU regulations, concentrations of surface ozone and PM10 often exceed the pollution standards for the Netherlands and Europe. Their concentrations are controlled by (precursor) emissions, social and economic developments and a complex combination of meteorological factors. This study tackles the latter, and provides insight into the meteorological processes that play a role in O3 and PM10 levels at rural mid-latitude sites in the Netherlands. The relations between meteorological factors and air quality are studied on a local scale based on observations from four rural sites and are determined by a comprehensive correlation analysis and a multiple linear regression (MLR) analysis in two modes, with and without air quality variables as predictors. Furthermore, the objective Lamb Weather Type approach is used to assess the influence of the large-scale circulation on air quality. Keeping in mind its future use in downscaling future climate scenarios for air quality purposes, special emphasis is given to an appropriate selection of the regressor variables readily available from operational meteorological forecasts or AOGCMs (Atmosphere-Ocean coupled General Circulation Models). The regression models perform satisfactorily, especially for O3, with an R2 of 57.0%, versus 25.0% for PM10. Including previous-day air quality information significantly increases the models' performance, by 15% (O3) and 18% (PM10). The Lamb weather types show a seasonally distinct pattern for high (low) episodes of average O3 and PM10 concentrations, and these are clearly related to the meteorology-air quality correlation analysis. Although using a circulation type approach can bring important additional physical relations forward, our analysis reveals that the circulation method is limited in terms of short-term air quality forecasting for both O3 and PM10 (R2 between 0.12 and 23%).
In summary, it is concluded that the use of a regression model is more promising for short-term downscaling from climate scenarios than the use of a weather type classification approach.
- Published
- 2009
31. Mixture of inhomogeneous matrix models for species-rich ecosystems
- Author
-
Mortier, Frederic, Ouedraogo, Dakis-Yaoba, Claeys, Florian, Tadesse, Mahlet G., Cornu, Guillaume, Baya, Fidele, Benedet, Fabrice, Freycon, Vincent, Gourlet-Fleury, Sylvie, Picard, Nicolas, Biens et services des écosystèmes forestiers tropicaux : l'enjeu du changement global (Cirad-Es-UPR 105 BSEF), Département Environnements et Sociétés (Cirad-ES), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad), Laboratoire d'Economie Forestière (LEF), AgroParisTech-Institut National de la Recherche Agronomique (INRA), Department Mathematic and Statistic, Georgetown University, Ministère Centrafricain des Eaux, Forêts, Chasses et Pêches (MEFCP), CoForChange project - ERA-Net BiodivERsA, ANR (France), NERC (UK), CoForTips project - ERA-Net BiodivERsA, FWF (Austria), BelSPO (Belgium), Biens et services des écosystèmes forestiers tropicaux : l'enjeu du changement global (UPR BSEF), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad), and Institut National de la Recherche Agronomique (INRA)-AgroParisTech
- Subjects
DYNAMICS ,F40 - Écologie végétale ,[SDV]Life Sciences [q-bio] ,F62 - Physiologie végétale - Croissance et développement ,Mortalité ,POPULATION-MODELS ,CO2-INDUCED CLIMATE-CHANGE ,mixture models ,lasso selection ,species-rich ecosystems ,usher models ,TROPICAL RAIN-FORESTS ,VARIABLE SELECTION ,REGRESSION-MODELS ,FUNCTIONAL-GROUPS ,FINITE MIXTURES ,GAP MODEL ,GROWTH ,Écologie forestière ,K01 - Foresterie - Considérations générales ,Dynamique des populations ,Forêt tropicale humide ,Croissance ,Méthode statistique ,U10 - Informatique, mathématiques et statistiques ,Composition botanique ,Modèle de simulation ,Régénération naturelle ,Biodiversité ,Modèle mathématique - Abstract
Understanding how environmental factors could impact population dynamics is of primary importance for species conservation. Matrix population models are widely used to predict population dynamics. However, in species-rich ecosystems with many rare species, the small population sizes hinder a good fit of species-specific models. In addition, classical matrix models do not take into account environmental variability. We propose a mixture of regression models with variable selection allowing the simultaneous clustering of species into groups according to vital rate information (recruitment, growth and mortality) and the identification of group-specific explicative environmental variables. We develop an inference method coupling the R packages flexmix and glmnet. We first highlight the effectiveness of the method on simulated datasets. Next, we apply it to data from a tropical rain forest in the Central African Republic. We demonstrate the accuracy of the inhomogeneous mixture matrix model in successfully reproducing stand dynamics and classifying tree species into well-differentiated groups with clear ecological interpretations. Copyright (c) 2014 John Wiley & Sons, Ltd.
- Published
- 2015
32. Development and Validation of a Score for Adjusting Health Care Costs in General Practice
- Author
-
Claudio Cricelli, Giampiero Mazzaglia, Elisa Bianchini, Francesco Lapi, Iacopo Cricelli, Gianluca Trifirò, Lapi, F, Bianchini, E, Cricelli, I, Trifiro, G, Mazzaglia, G, and Cricelli, C
- Subjects
Budgets ,Male ,Gerontology ,Time Factors ,case-mix ,costs adjustment ,health care costs ,HSM Index ,Adolescent ,Adult ,Aged ,Chronic Disease ,Comorbidity ,Cost-Benefit Analysis ,Databases, Factual ,Female ,General Practice ,Health Care Rationing ,Health Services Needs and Demand ,Health Services Research ,Humans ,Italy ,Linear Models ,Middle Aged ,Models, Economic ,National Health Programs ,Needs Assessment ,Primary Health Care ,Reproducibility of Results ,Young Adult ,Health Care Costs ,Health Policy ,Public Health, Environmental and Occupational Health ,Medicine (all) ,Margin of error ,Health care rationing ,REGRESSION-MODELS ,Models ,Statistics ,Health care ,Medicine ,education.field_of_study ,Cost–benefit analysis ,Health services research ,health care cost ,Explained variation ,SETTING BUDGETS ,Public Health ,MEASURING PERFORMANCE ,Population ,Economic ,Databases ,MORBIDITY ,Case mix index ,education ,Factual ,business.industry ,Environmental and Occupational Health ,RISK-ADJUSTMENT ,CHRONIC DISEASE ,CAPITATION PAYMENTS ,business ,COMORBIDITY ,PREDICT COSTS - Abstract
Objective To develop and validate the Italian Health Search Morbidity (HSM) Index to adjust health care costs in general practice. Methods The study population comprised 1,076,311 patients registered in the Health Search CSD Longitudinal Patient Database between January 1, 2008, and December 31, 2010. We randomly selected 538,254 and 538,057 patients to form the development and validation cohorts, respectively. To ensure model convergence, 5% of the aforementioned cohorts were selected randomly to create development and validation samples. The outcome was the total direct health care costs covered by the national health system. Interaction between age and sex, chronic diseases, and acute diseases were entered in a multilevel generalized linear latent mixed model with random intercepts (province of residence and general practitioner) to identify determinants associated with increased or decreased costs. The estimated coefficients were linearly combined to create the HSM Index for individual patients. The score was applied to the validation sample, and measures of predictive accuracy, explained variance, and the observed/predicted ratio were computed to evaluate the model's accuracy. Results The mean yearly cost was €414.57 per patient, and the HSM Index had a median value of 5.08 (25th–75th range 4.44–5.98). The HSM Index explained 50.17% of the variation in costs. Concerning calibration, in 80% of the population, the margin of error in the estimation of costs was around 10%. Conclusions The HSM Index is a reliable case-mix system that could be implemented in general practice for costs adjustment. This tool should ensure fairer scrutiny of resource use and allocation of budgets among general practitioners.
- Published
- 2015
33. Efficiency and Bootstrap in the Promotion Time Cure Model
- Author
-
Ingrid Van Keilegom, Anouar El Ghouch, Francois Portier, and UCL - SSH/IMMAQ/ISBA - Institut de Statistique, Biostatistique et Sciences Actuarielles
- Subjects
Statistics and Probability ,Mathematical optimization ,Statistics::Theory ,Semiparametric efficiency ,Statistics & Probability ,promotion time cure model ,PROPORTIONAL HAZARDS MODEL ,01 natural sciences ,Asymptotic inference ,FRACTION ,CONSISTENCY ,010104 statistics & probability ,REGRESSION-MODELS ,0502 economics and business ,Applied mathematics ,Statistics::Methodology ,Semiparametric regression ,ASYMPTOTIC THEORY ,0101 mathematics ,bootstrap ,050205 econometrics ,Mathematics ,Parametric statistics ,Science & Technology ,05 social sciences ,Nonparametric statistics ,Estimator ,Promotion time cure model ,Maximization ,asymptotic inference ,Asymptotic theory (statistics) ,MAXIMUM-LIKELIHOOD-ESTIMATION ,semiparametric efficiency ,TRANSFORMATION ,Bootstrap ,Semiparametric model ,Delta method ,Cox model ,Physical Sciences ,SURVIVAL-DATA - Abstract
© 2017 ISI/BS. In this paper, we consider a semiparametric promotion time cure model and study the asymptotic properties of its nonparametric maximum likelihood estimator (NPMLE). First, by relying on a profile likelihood approach, we show that the NPMLE may be computed by a single maximization over a set whose dimension equals the dimension of the covariates plus one. Next, using Z-estimation theory for semiparametric models, we derive the asymptotics of both the parametric and nonparametric components of the model and show their efficiency. We also express the asymptotic variance of the estimator of the parametric component. Since the variance is difficult to estimate, we develop a weighted bootstrap procedure that allows for a consistent approximation of the asymptotic law of the estimators. As in the Cox model, it turns out that suitable tools are the martingale theory for counting processes and the infinite dimensional Z-estimation theory. Finally, by means of simulations, we show the accuracy of the bootstrap approximation. ispartof: BERNOULLI vol:23 issue:4B pages:3437-3468 status: published
- Published
- 2015
34. Novel unified framework for latent modeling and its interpretation
- Author
-
Lutgarde M. C. Buydens, Thanh N. Tran, Nelson Lee Afanador, Lionel Blanchet, Farmacologie en Toxicologie, and RS: NUTRIM - R4 - Gene-environment interaction
- Subjects
Computer science ,PARTIAL LEAST-SQUARES ,Big data ,PLS ,Machine learning ,computer.software_genre ,Field (computer science) ,Chemometrics theory ,Analytical Chemistry ,CHEMOMETRICS ,Chemometrics ,REGRESSION-MODELS ,Partial least squares ,Partial least squares regression ,Latent variable model ,Implementation ,Spectroscopy ,Interpretability ,PHARMACEUTICAL APPLICATIONS ,SELECTIVITY RATIO PLOT ,business.industry ,Process Chemistry and Technology ,TARGET PROJECTION ,Interpretation ,VARIABLE IMPORTANCE ,MASS-SPECTROMETRY ,Data science ,BIOMARKER DISCOVERY ,Computer Science Applications ,Range (mathematics) ,Multivariate analysis ,Artificial intelligence ,business ,computer ,Software ,Principal component regression - Abstract
An important characteristic of chemometrics has been its need to manage the tradeoff between computational, mathematical and statistical performance against data interpretability. Additionally, being mostly seen as a conglomeration of data analytic methods that target the solution to real-world problems, the development of chemometrics as an independent and well-defined field has been hampered by its applied nature. Consequently, the broad range and diversity of application of chemometric tools has hindered the development of a unified theory able to propel it beyond its current use in analytical and industrial chemistry to larger and more complex data problems. In this paper, we provide a mathematical vehicle for the understanding and improvement of current methods popular in chemometrics. Starting from a historical solution to matrix factorization we develop a novel unified framework for the fundamentals of latent variable modeling methods, elucidate major properties and clarify controversies between major PLS implementations and interpretations. The concepts presented in this work aim at contributing to a deeper understanding of the underlying theory of chemometrics methods, and at strengthening their use in practice. Furthermore, this effort attempts to bridge the gap between chemometrics and big data problems and contribute to the development and acceptance of chemometrics as a mature and independent scientific field by the broader data analytic community. (C) 2015 Elsevier B.V. All rights reserved.
- Published
- 2015
35. Development and Validation of a Score for Adjusting Health Care Costs in General Practice
- Author
-
Lapi, F, Bianchini, E, Cricelli, I, Trifiro, G, Mazzaglia, G, and Cricelli, C
- Abstract
Objective To develop and validate the Italian Health Search Morbidity (HSM) Index to adjust health care costs in general practice. Methods The study population comprised 1,076,311 patients registered in the Health Search CSD Longitudinal Patient Database between January 1, 2008, and December 31, 2010. We randomly selected 538,254 and 538,057 patients to form the development and validation cohorts, respectively. To ensure model convergence, 5% of the aforementioned cohorts were selected randomly to create development and validation samples. The outcome was the total direct health care costs covered by the national health system. Interaction between age and sex, chronic diseases, and acute diseases were entered in a multilevel generalized linear latent mixed model with random intercepts (province of residence and general practitioner) to identify determinants associated with increased or decreased costs. The estimated coefficients were linearly combined to create the HSM Index for individual patients. The score was applied to the validation sample, and measures of predictive accuracy, explained variance, and the observed/predicted ratio were computed to evaluate the model's accuracy. Results The mean yearly cost was €414.57 per patient, and the HSM Index had a median value of 5.08 (25th-75th range 4.44-5.98). The HSM Index explained 50.17% of the variation in costs. Concerning calibration, in 80% of the population, the margin of error in the estimation of costs was around 10%. Conclusions The HSM Index is a reliable case-mix system that could be implemented in general practice for costs adjustment. This tool should ensure fairer scrutiny of resource use and allocation of budgets among general practitioners.
- Published
- 2015
36. Variability in axillary lymph node dissection for breast cancer
- Author
-
Pax H.B. Willemse, Renée Otter, Elisabeth G.E. de Vries, Vaclav Fidler, J. Grond, Pieter L de Vogel, Winette T. A. van der Graaf, Michael Schaapveld, Faculteit Medische Wetenschappen/UMCG, and Guided Treatment in Optimal Selected Cancer Patients (GUTS)
- Subjects
medicine.medical_specialty ,Axillary lymph nodes ,I-II CARCINOMA ,medicine.medical_treatment ,Population ,Breast Neoplasms ,Mastectomy, Segmental ,breast cancer ,REGRESSION-MODELS ,Breast cancer ,STAGE ,medicine ,Humans ,axillary lymph node dissection ,education ,MULTIVARIATE-ANALYSIS ,Neoplasm Staging ,Observer Variation ,education.field_of_study ,pattern of care ,business.industry ,Axillary Lymph Node Dissection ,regional variation ,staging ,General Medicine ,Sentinel node ,medicine.disease ,Surgery ,Radiation therapy ,Axilla ,METASTASES ,medicine.anatomical_structure ,Oncology ,Lymphatic Metastasis ,Multivariate Analysis ,SURVIVAL ,BIOPSY ,ARM ,Lymph Node Excision ,Female ,Radiotherapy, Adjuvant ,Lymph Nodes ,Radiology ,business ,SENTINEL-NODE ,Mastectomy ,RADIOTHERAPY - Abstract
Background: The axillary nodal status may influence the prognosis and the choice of adjuvant treatment of individual breast cancer patients. The variation in number of reported axillary lymph nodes and its effect on the axillary nodal stage were studied and the implications are discussed. Methods: Between 1994 and 1997, a total of 4,806 axillary dissections for invasive breast cancers in 4,715 patients were performed in hospitals in the North-Netherlands. The factors associated with the number of reported nodes and the relation of this number with the nodal status and the number of positive nodes were studied. Results: The number of reported nodes varied significantly between pathology laboratories; the median number of nodes ranged from 9 to 15. The individual hospitals explained even more variability in the number of nodes than pathology laboratories (range in median number 8-15, P 20 nodes were examined, the percentage of tumors with >4 positive nodes increased from 4 to 31%. Multivariate analysis confirmed these results. Conclusions: This population-based study showed a large variation in the number of reported lymph nodes between hospitals. A more extensive surgical dissection or histopathological examination of the specimen generally resulted in a higher number of positive nodes. Although the impact of misclassification on adjuvant treatment will have varied, the impact with regard to adjuvant regional radiotherapy may have been considerable. (C) 2004 Wiley-Liss, Inc.
- Published
- 2004
37. Dairy products and pancreatic cancer risk: a pooled analysis of 14 cohort studies
- Author
-
Anthony B. Miller, Walter C. Willett, Graham G. Giles, Regina G. Ziegler, Anita Koushik, Pamela L. Horn-Ross, Kristin E. Anderson, Charles S. Fuchs, Thomas E. Rohan, Demetrius Albanes, Stephanie A. Smith-Warner, Kim Robien, Niclas Håkansson, Debra T. Silverman, Marjorie L. McCullough, Catherine Schairer, Jeanine M. Genkinger, Jo L. Freudenheim, James R. Marshall, Leslie Bernstein, Susan M. Gapstur, Ruifeng Li, Jarmo Virtamo, Molin Wang, Rachael Z. Stolzenberg-Solomon, Dallas R. English, R.A. Goldbohm, P.A. van den Brandt, Alicja Wolk, Epidemiologie, RS: CAPHRI School for Public Health and Primary Care, RS: CAPHRI - Clinical epidemiology, RS: CAPHRI - Occupational Epidemiology, RS: GROW - Oncology, and RS: GROW - R1 - Prevention
- Subjects
Male ,25-HYDROXYVITAMIN D ,pancreatic cancer ,NUTRITIONAL FACTORS ,VITAMIN-D STATUS ,LS - Life Style ,DIETARY ASSESSMENT ,Cohort Studies ,Cancer risk ,REGRESSION-MODELS ,Cheese ,Food intake ,Risk Factors ,calcium intake ,Medicine ,Vitamin D ,Prospective cohort study ,Risk assessment ,Hazard ratio ,Smoking ,Hematology ,Milk ,Oncology ,Body mass ,Health ,Yoghurt ,FOOD-FREQUENCY QUESTIONNAIRE ,Female ,Sex ,Cohort analysis ,pooled analysis ,Pancreas adenocarcinoma ,Healthy Living ,Cohort study ,Human ,Adult ,Ice cream ,medicine.medical_specialty ,NUTRIENT INTAKE ,Adolescent ,Reviews ,Major clinical study ,Vitamin intake ,Calcium intake ,Pooled analysis ,Behavioural Changes ,Internal medicine ,Pancreatic cancer ,Humans ,Risk factor ,Aged ,Proportional Hazards Models ,business.industry ,Proportional hazards model ,dairy products ,Very elderly ,medicine.disease ,Diet ,Pancreatic Neoplasms ,Endocrinology ,PHYSICAL-ACTIVITY ,CALIFORNIA TEACHERS ,GLYCEMIC INDEX ,ELSS - Earth, Life and Social Sciences ,Healthy for Life ,business ,Body mass index ,Dairy products - Abstract
Pancreatic cancer has few early symptoms, is usually diagnosed at late stages, and has a high case-fatality rate. Identifying modifiable risk factors is crucial to reducing pancreatic cancer morbidity and mortality. Prior studies have suggested that specific foods and nutrients, such as dairy products and constituents, may play a role in pancreatic carcinogenesis. In this pooled analysis of the primary data from 14 prospective cohort studies, 2212 incident pancreatic cancer cases were identified during follow-up among 862 680 individuals. Adjusting for smoking habits, personal history of diabetes, alcohol intake, body mass index (BMI), and energy intake, multivariable study-specific hazard ratios (MVHR) and 95% confidence intervals (CIs) were calculated using the Cox proportional hazards models and then pooled using a random effects model. There was no association between total milk intake and pancreatic cancer risk (MVHR = 0.98, 95% CI = 0.82-1.18 comparing ≥500 with 1-69.9 g/day). Similarly, intakes of low-fat milk, whole milk, cheese, cottage cheese, yogurt, and ice cream were not associated with pancreatic cancer risk. No statistically significant association was observed between dietary (MVHR = 0.96, 95% CI = 0.77-1.19) and total calcium (MVHR = 0.89, 95% CI = 0.71-1.12) intake and pancreatic cancer risk overall when comparing intakes ≥1300 with
- Published
- 2014
38. “The Smarts That Counts?”: Psychologists' Decision-Making in Personnel Selection
- Author
-
Zysberg, Leehu and Nevo, Baruch
- Published
- 2004
39. A systematic approach to obtain validated partial least square models for predicting lipoprotein subclasses from serum NMR spectra
- Author
-
Johan A. Westerhuis, Velitchka V. Mihaleva, Age K. Smilde, F.A. van Dorsten, Doris M. Jacobs, J.P.M. van Duynhoven, A. de Graaf, D.B. van Schalkwijk, Jacques Vervoort, Biosystems Data Analysis (SILS, FNWI), and Faculteit der Geneeskunde
- Subjects
Male ,Very low-density lipoprotein ,Biomedical Innovation ,Blood lipids ,Lipoproteins, VLDL ,Biochemistry ,Analytical Chemistry ,chemistry.chemical_compound ,Life ,plasma-lipoproteins ,Middle Aged ,NMR spectra database ,Lipoproteins, LDL ,Biofysica ,nuclear-magnetic-resonance ,Low-density lipoprotein ,chromatography ,Female ,lipids (amino acids, peptides, and proteins) ,abnormalities ,Lipoproteins, HDL ,Healthy Living ,medicine.medical_specialty ,spectroscopy ,Biophysics ,Biochemie ,insulin-resistance ,Insulin resistance ,Double-Blind Method ,Internal medicine ,medicine ,Humans ,Least-Squares Analysis ,Biology ,Nuclear Magnetic Resonance, Biomolecular ,Aged ,VLAG ,Triglyceride ,Cholesterol ,medicine.disease ,chemometrics ,quantification ,MSB - Microbiology and Systems Biology ,Endocrinology ,chemistry ,low-density lipoprotein ,regression-models ,ELSS - Earth, Life and Social Sciences ,Lipoprotein ,Forecasting - Abstract
A systematic approach is described for building validated PLS models that predict cholesterol and triglyceride concentrations in lipoprotein subclasses in fasting serum from a normolipidemic, healthy population. The PLS models were built on diffusion-edited 1H NMR spectra and calibrated on HPLC-derived lipoprotein subclasses. The PLS models were validated using an independent test set. In addition to total VLDL, LDL, and HDL lipoproteins, statistically significant PLS models were obtained for 13 subclasses, including 5 VLDLs (particle size 64-31.3 nm), 4 LDLs (particle size 28.6-20.7 nm) and 4 HDLs (particle size 13.5-9.8 nm). The best models were obtained for triglycerides in VLDL (0.82 < Q2
- Published
- 2014
40. Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques
- Author
-
Stefan Liess, Devashish Kumar, Arindam Banerjee, Robert J. Oglesby, So O. Chatterjee, Jaya Kawale, Shyam Boriah, James H. Faghmous, Evan Kodra, Alok Choudhary, Katharine Hayhoe, Ankit Agrawal, Vipin Kumar, R. Mawalagedara, Qiang Fu, Karsten Steinhaeuser, Kaustubh Salvi, William Hendrix, Poulomi Ganguli, D. Wang, Sn N. Chatterjee, C. Hays, Auroop R. Ganguly, Peter K. Snyder, Subimal Ghosh, Wei-keng Liao, Debasish Das, Varun Mithal, and Donald J. Wuebbles
- Subjects
Process (engineering) ,Big data ,Climate change ,computer.software_genre ,Downscaling Extremes ,Cluster-Analysis ,Warming Environment ,lcsh:Science ,Typhoon Tracks ,Grand Challenges ,Part I ,Small data ,Emergency management ,business.industry ,Humanitarian aid ,Ocean Model ,lcsh:QC801-809 ,Regression-Models ,Carbon-Cycle ,Natural resource ,lcsh:QC1-999 ,lcsh:Geophysics. Cosmic physics ,13. Climate action ,lcsh:Q ,Data mining ,Tropical Cyclones ,business ,computer ,lcsh:Physics ,Precipitation Extremes - Abstract
Extreme events such as heat waves, cold spells, floods, droughts, tropical cyclones, and tornadoes have potentially devastating impacts on natural and engineered systems and human communities worldwide. Stakeholder decisions about critical infrastructures, natural resources, emergency preparedness and humanitarian aid typically need to be made at local to regional scales over seasonal to decadal planning horizons. However, credible climate change attribution and reliable projections at more localized and shorter time scales remain grand challenges. Long-standing gaps include inadequate understanding of processes such as cloud physics and ocean–land–atmosphere interactions, limitations of physics-based computer models, and the importance of intrinsic climate system variability at decadal horizons. Meanwhile, the growing size and complexity of climate data from model simulations and remote sensors increases opportunities to address these scientific gaps. This perspectives article explores the possibility that physically cognizant mining of massive climate data may lead to significant advances in generating credible predictive insights about climate extremes and in turn translating them to actionable metrics and information for adaptation and policy. Specifically, we propose that data mining techniques geared towards extremes can help tackle the grand challenges in the development of interpretable climate projections, predictability, and uncertainty assessments. To be successful, scalable methods will need to handle what has been called "big data" to tease out elusive but robust statistics of extremes and change from what is ultimately small data. Physically based relationships (where available) and conceptual understanding (where appropriate) are needed to guide methods development and interpretation of results. 
Such approaches may be especially relevant in situations where computer models may not be able to fully encapsulate current process understanding, yet the wealth of data may offer additional insights. Large-scale interdisciplinary team efforts, involving domain experts and individual researchers who span disciplines, will be necessary to address the challenge.
- Published
- 2014
41. GEE for longitudinal ordinal data: Comparing R-geepack, R-multgee, R-repolr, SAS-GENMOD, SPSS-GENLIN
- Author
-
Edwin R. van den Heuvel, Geert Molenberghs, and Nazanin Nooraee
- Subjects
Statistics and Probability ,Ordinal data ,Multivariate statistics ,Ordinal regression ,Copula (probability theory) ,GENERALIZED ESTIMATING EQUATIONS ,REGRESSION-MODELS ,Statistics ,Econometrics ,MAXIMUM-LIKELIHOOD ,Generalized estimating equation ,Mathematics ,BINARY DATA ,Applied Mathematics ,Regression analysis ,Bridge distribution ,ASSOCIATION ,Confidence interval ,SCORE DATA ,Computational Mathematics ,Computational Theory and Mathematics ,Copula ,LEAST-SQUARES ,Binary data ,Multivariate logistic distribution ,correlated ordinal data ,generalized estimating equations ,copula ,multivariate logistic distribution ,bridge distribution ,ORDERED CATEGORICAL-DATA ,ODDS RATIO ,STATISTICAL SOFTWARE PACKAGES ,Correlated ordinal data - Abstract
Studies in epidemiology and social sciences are often longitudinal and outcome measures are frequently obtained by questionnaires in ordinal scales. To understand the relationship between explanatory variables and outcome measures, generalized estimating equations can be applied to provide a population-averaged interpretation and address the correlation between outcome measures. It can be performed by different software packages, but a motivating example showed differences in the output. This paper investigated the performance of GEE in R (version 3.0.2), SAS (version 9.4), and SPSS (version 22.0.0) using simulated data under default settings. Multivariate logistic distributions were used in the simulation to generate correlated ordinal data. The simulation study demonstrated substantial bias in the parameter estimates and numerical issues for data sets with a relatively small number of subjects. The unstructured working association matrix requires larger numbers of subjects than the independence and exchangeable working association matrices to reduce the bias and diminish numerical issues. The coverage probabilities of the confidence intervals for fixed parameters were satisfactory for the independence and exchangeable working association matrix, but they were frequently liberal for the unstructured option. Based on the performance and the available options, SPSS, and multgee and repolr in R, all perform quite well for relatively large sample sizes (e.g. 300 subjects), but multgee seems to do a little better than SPSS and repolr in most settings. (C) 2014 Elsevier B.V. All rights reserved.
- Published
- 2014
42. Dynamic models in space and time
- Subjects
UNEMPLOYMENT ,REGRESSION-MODELS ,DEPENDENCE ,AUTOREGRESSIVE MODELS ,RATES ,SPECIFICATION ,LABOR-FORCE PARTICIPATION ,ESTIMATORS - Abstract
This paper presents a first-order autoregressive distributed lag model in both space and time. It is shown that this model encompasses a wide series of simpler models frequently used in the analysis of space-time data as well as models that better fit the data and have never been used before. A framework is developed to determine which model is the most likely candidate to study space-time data. As an application, the relationship between the labor force participation rate and the unemployment rate is estimated using regional data of Germany, France, and the United Kingdom derived from Eurostat, 1983-1993.
- Published
- 2001
43. Analysis of Covariance with Incomplete Data Via Semiparametric Model Transformations
- Author
-
Matteo Grigoletto and Michael G. Akritas
- Subjects
Adult ,Statistics and Probability ,Biometry ,Time Factors ,Adolescent ,General Biochemistry, Genetics and Molecular Biology ,REGRESSION-MODELS ,BOOTSTRAP ,Statistics ,Covariate ,Odds Ratio ,Econometrics ,Humans ,Statistics::Methodology ,Truncation (statistics) ,Semiparametric regression ,Child ,Proportional Hazards Models ,Mathematics ,Acquired Immunodeficiency Syndrome ,Analysis of Variance ,Models, Statistical ,General Immunology and Microbiology ,Proportional hazards model ,Applied Mathematics ,Age Factors ,Nonparametric statistics ,General Medicine ,Middle Aged ,ADDITIVE RISK MODEL ,Censoring (statistics) ,Nonparametric regression ,Semiparametric model ,AIDS ,Child, Preschool ,Data Interpretation, Statistical ,Regression Analysis ,TRUNCATED SURVIVAL-DATA ,CENSORED-DATA ,General Agricultural and Biological Sciences - Abstract
We propose a method for fitting semiparametric models such as the proportional hazards (PH), additive risks (AR), and proportional odds (PO) models. Each of these semiparametric models implies that some transformation of the conditional cumulative hazard function (at each t) depends linearly on the covariates. The proposed method is based on nonparametric estimation of the conditional cumulative hazard function, forming a weighted average over a range of t-values, and subsequent use of least squares to estimate the parameters suggested by each model. An approximation to the optimal weight function is given. This allows semiparametric models to be fitted even in incomplete data cases where the partial likelihood fails (e.g., left censoring, right truncation). However, the main advantage of this method rests in the fact that neither the interpretation of the parameters nor the validity of the analysis depends on the appropriateness of the PH or any of the other semiparametric models. In fact, we propose an integrated method for data analysis where the role of the various semiparametric models is to suggest the best fitting transformation. A single continuous covariate and several categorical covariates (factors) are allowed. Simulation studies indicate that the test statistics and confidence intervals have good small-sample performance. A real data set is analyzed.
- Published
- 1999
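The abstract of entry 43 builds on a nonparametric estimate of a cumulative hazard function. As a rough illustration of that building block only, the sketch below computes the standard Nelson-Aalen estimate for right-censored data without covariates. This is a swapped-in textbook estimator, not the authors' conditional estimator, and it does not handle left censoring or truncation; the function name `nelson_aalen` and the toy data are hypothetical.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard H(t).

    times  -- observed follow-up times
    events -- 1 if the event was observed at that time, 0 if right-censored
    Returns a list of (t, H(t)) pairs at the distinct event times.
    Assumption: unconditional (no covariates), right censoring only.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    # Number of observed events at each distinct time.
    deaths = Counter(t for t, e in zip(times, events) if e)
    # Number of subjects leaving the risk set (event or censoring) at each time.
    leaving = Counter(times)
    at_risk = len(times)
    cumulative = 0.0
    estimate = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Hazard increment: events at t divided by subjects still at risk.
            cumulative += d / at_risk
            estimate.append((t, cumulative))
        at_risk -= leaving[t]
    return estimate


if __name__ == "__main__":
    # Hypothetical toy data: five subjects, two of them censored.
    for t, h in nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(h, 4))
```

In the method the abstract describes, an estimate of this kind would be computed conditionally on the covariates, transformed, averaged over a range of t-values with weights, and then fitted by least squares; none of those later steps is shown here.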
44. Multiway calibration in 3D QSAR
- Subjects
multilinear PLS ,ANALYSIS COMFA ,SELECTION ,REGRESSION-MODELS ,multiway calibration ,BINDING ,MOLECULAR-FIELD ANALYSIS ,PLS ,leverage ,3D QSAR ,PREDICTIVE ABILITY ,PARAFAC - Abstract
We have introduced multilinear PLS in 3D QSAR and applied it to GRID descriptors from a set of benzamides with affinity to the dopamine D-3 receptor subtype, synthesized as potential drugs against schizophrenia. The key issue in 3D QSAR modelling is to obtain a predictive model that is easy to interpret. Each component in the multilinear PLS model explains clearly defined details, e.g. substituent positions, while the bilinear PLS solution is general and more difficult to interpret. The best models were obtained after four components with multilinear PLS (Q² = 51%) and after only one component with bilinear PLS (Q² = 50%). The external test set was predicted better with multilinear PLS (Q² = 31%) than with bilinear PLS (Q² = 25%). With multilinear PLS one loses in fit and gains in stability and simplicity owing to the fewer parameters that need to be estimated as compared with bilinear PLS. Finally, multilinear PLS is also less influenced by insignificant variation in the descriptor block, which is an advantage in 3D QSAR modelling. (C) 1997 John Wiley & Sons, Ltd.
- Published
- 1997
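The abstract of entry 44 compares models by their predictive Q² values. The abstract does not state the formula, so the sketch below assumes the common definition Q² = 1 − PRESS / SS_tot, where PRESS is the sum of squared differences between observed values and held-out predictions. The function name `q_squared` and the example numbers are hypothetical, and some authors use the training-set mean rather than the mean of the predicted samples in the denominator.

```python
def q_squared(observed, predicted):
    """Predictive Q^2 = 1 - PRESS / SS_tot.

    observed  -- measured responses for the held-out samples
    predicted -- model predictions for the same samples
    Assumption: SS_tot is taken around the mean of `observed`.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Prediction error sum of squares.
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total sum of squares around the observed mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; Q^2 is undefined")
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    print(q_squared([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]))
```

Under this definition a value of 1 means perfect prediction and a value of 0 means the model predicts no better than the observed mean, which gives a scale for reading the 25% to 51% figures quoted in the abstract.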
45. Investigating the association between birth weight and complementary air pollution metrics: A cohort study
- Author
-
Judith H. Chung, Olivier Laurent, Lianfa Li, Jun Wu, and Scott M. Bartell
- Subjects
Male ,Socioeconomic-Status ,Exposure Assessment ,Health, Toxicology and Mutagenesis ,Residential Proximity ,Air pollution ,010501 environmental sciences ,medicine.disease_cause ,01 natural sciences ,California ,Cohort Studies ,0302 clinical medicine ,Medicine and Health Sciences ,Birth Weight ,030212 general & internal medicine ,Vehicle Emissions ,Air Pollutants ,Carbon Monoxide ,1. No poverty ,Life Sciences ,Regression analysis ,Environmental exposure ,Regression-Models ,Motor Vehicles ,Female ,Nitrogen Oxides ,medicine.symptom ,Risk assessment ,Cohort study ,Environmental Monitoring ,Risk ,Birth weight ,03 medical and health sciences ,Fetal-Growth ,Oxidants, Photochemical ,Ozone ,Ozone Los-Angeles-County ,Preterm ,Environmental health ,Air Pollution ,medicine ,Humans ,Traffic ,0105 earth and related environmental sciences ,Exposure assessment ,Research ,Public Health, Environmental and Occupational Health ,Infant, Newborn ,Pregnancy Outcomes ,Models, Theoretical ,Traffic Exposure ,Low birth weight ,13. Climate action ,Environmental science ,Particulate Matter - Abstract
Background: Exposure to air pollution is frequently associated with reductions in birth weight but results of available studies vary widely, possibly in part because of differences in air pollution metrics. Further insight is needed to identify the air pollution metrics most strongly and consistently associated with birth weight. Methods: We used a hospital-based obstetric database of more than 70,000 births to study the relationships between air pollution and the risk of low birth weight (LBW, Results: Increased risks of LBW were associated with ambient O3 concentrations as measured by monitoring stations, as well as traffic density and proximity to major roadways. LBW was not significantly associated with other air pollution metrics, except that a decreased risk was associated with ambient NO2 concentrations as measured by monitoring stations. When birth weight was analyzed as a continuous variable, small increases in mean birth weight were associated with most air pollution metrics (3 concentrations. Conclusions: We found contrasting results according to the different air pollution metrics examined. Unmeasured confounders and/or measurement errors might have produced spurious positive associations between birth weight and some air pollution metrics. Despite this, ambient O3 was associated with a decrement in mean birth weight and significant increases in the risk of LBW were associated with traffic density, proximity to roads and ambient O3. This suggests that in our study population, these air pollution metrics are more likely related to increased risks of LBW than the other metrics we studied. Further studies are necessary to assess the consistency of such patterns across populations.
- Published
- 2013
46. Intronic variation at the CHD1-Z gene in Black-tailed Godwits Limosa limosa limosa: correlations with fitness components revisited
- Author
-
Krijn B. Trimbos, Theunis Piersma, Carola Poley, Jos C.E.W. Hooijmeijer, Geert R. de Snoo, Rosemarie Kentie, Marco van der Velde, C.J.M. Musters, Piersma group, and Both group
- Subjects
animal structures ,intron ,L.-LIMOSA ,NETHERLANDS ,CHD1-Z ,Zoology ,selection ,Biology ,Reproductive cycle ,REGRESSION-MODELS ,Polymorphism (computer science) ,Seasonal breeder ,MANAGEMENT ,Allele ,Gene ,Ecology, Evolution, Behavior and Systematics ,Genetics ,BIRDS ,sample size ,POLYMORPHISM ,SEX-CHROMOSOME ,CHICKS ,embryonic structures ,SURVIVAL ,GROWTH ,Animal Science and Zoology ,Limosa limosa limosa ,Body condition ,molecular sexing ,neutral variation - Abstract
Recently, Schroeder et al. (2010, Ibis 152: 368-377) suggested that intronic variation in the CHD1-Z gene of Black-tailed Godwits breeding in southwest Friesland, The Netherlands, correlated with fitness components. Here we re-examine this surprising result using an expanded dataset (2088 birds sampled from 2004 to 2010 vs. 284 birds from 2004 to 2007). We find that the presence of the Z* allele (9% of the birds) is not associated with breeding habitat type, egg size, adult survival, adult body mass or adult body condition. The results presented here, when used in synergy with the previously reported results by Schroeder et al., suggest that there might be a tendency towards female adults with the Z* allele laying earlier clutches than adult females without the Z* allele. The occurrence of the Z* allele was also associated with a higher chick body mass and return rate. Chicks with the Z* allele that had hatched early in the breeding season were heavier at birth than chicks without the Z* allele and chicks with the Z* allele that had hatched late. Collectively, the results suggest that variation in the CHD1-Z gene may indeed have arisen as a byproduct of selection acting on females during the egg phase and on chicks during the rearing stages of the reproductive cycle.
- Published
- 2013
47. Can species distribution models be used to describe plant abundance patterns?
- Author
-
Rosalinde Van Couwenberghe, Catherine Collet, Kris Verheyen, Jean-Claude Pierrat, Jean-Claude Gégout, Laboratoire d'Etudes des Ressources Forêt-Bois (LERFoB), Institut National de la Recherche Agronomique (INRA)-AgroParisTech, Dept Forest & Water Management, Lab Forestry, Ghent University [Belgium] (UGENT), Lorraine Region, Office National des Forets (ONF) [533-2007], Agence de l'Environnement et la Maitrise de l'Energie, the Inst. des Sciences et Industries du Vivant et de l'Environnement, ONF, AgroParisTech-Institut National de la Recherche Agronomique (INRA), and Universiteit Gent = Ghent University [Belgium] (UGENT)
- Subjects
0106 biological sciences ,[SDV.SA]Life Sciences [q-bio]/Agricultural sciences ,RESOURCE SELECTION FUNCTIONS ,Species distribution ,CONSERVATION ,Biology ,ECOLOGY ,010603 evolutionary biology ,01 natural sciences ,REGRESSION-MODELS ,Abundance (ecology) ,PRESENCE-ABSENCE ,HABITAT ,Relative species abundance ,Occupancy–abundance relationship ,Ecology, Evolution, Behavior and Systematics ,Relative abundance distribution ,ComputingMilieux_MISCELLANEOUS ,POPULATION ,Ecology ,010604 marine biology & hydrobiology ,NICHE ,Species diversity ,15. Life on land ,ASSUMPTIONS ,CLIMATE ,Species richness ,Rank abundance curve ,[SDE.BE]Environmental Sciences/Biodiversity and Ecology - Abstract
In recent years, there has been increasing interest in modelling of species abundance data in addition to presence data. In this study, we assessed the similarities and differences between presence-absence distributions and abundance distributions along similar environmental gradients, derived, respectively, from presence-absence and abundance data. Moreover, we examined the possibility of using presence-absence distribution models to derive abundance distributions. For this purpose, we used Braun-Blanquet abundance scores for 243 vascular species at 10 996 French forest sites. Species distribution models were used to analyse the link between the patterns of occurrence, low abundance and high abundance for each species with regard to mean annual temperature, June water balance, and soil pH. For each species, differences in the modelled distributions were characterised by the ecological optimum and ecological amplitude. A comparison of the presence-absence and abundance distributions for all species revealed similar optima and different amplitudes along the three ecological factors. An abundant-centre distribution was observed in environmental space, with species abundance being greatest at the optimal conditions and lower at less favourable conditions of the species occurrence response. Geographical habitat mapping also shows centred, high-abundance suitability within the presence habitat of each species. We conclude that species distribution models derived from presence-absence data provide useful information about the ecological optima of abundance distributions but overestimate the range of habitats suitable for high species abundance. This study demonstrates the utility of presence-absence data for ecologists and conservation biologists when they are interested in the optimal conditions of high species abundance.
- Published
- 2013
48. Comparison of designs for generalized linear models under model misspecification
- Author
-
Siuli Mukhopadhyay and André I. Khuri
- Subjects
Statistics and Probability ,Generalized linear model ,Statistics::Theory ,Mathematical optimization ,Proper linear model ,Linear Predictor ,Linear prediction ,Generalized linear mixed model ,Bias ,Kriging ,Robustness (computer science) ,Statistics::Methodology ,Applied mathematics ,Response surface methodology ,Mean-Squared Error Of Prediction ,Response-Surface Designs ,Mean Squared Error ,Mathematics ,Criterion ,Regression-Models ,Model Bias ,Robust Designs ,Response Surface Methodology ,Prediction ,Simulation ,Quantile - Abstract
The purpose of this article is to demonstrate the use of the quantile dispersion graphs (QDGs) approach for comparing candidate designs for generalized linear models in the presence of model misspecification in the linear predictor. The proposed design criterion is based on the mean-squared error of prediction which incorporates the prediction variance and the bias caused by fitting the wrong model. The method of kriging is used to estimate the unknown function assumed to be the cause of model misspecification. The QDGs approach is also useful in assessing the robustness of a given design to values of the unknown parameters in the linear predictor. Three numerical examples are presented to illustrate the application of the proposed methodology. (C) 2011 Elsevier B.V. All rights reserved.
- Published
- 2012
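The design criterion in the abstract of entry 48 combines prediction variance with the bias caused by fitting the wrong model. This is the standard decomposition of the mean-squared error of prediction, written out below as a sketch. The symbols are assumptions for illustration and do not come from the abstract: \hat{y}(x) is the prediction at design point x and \mu(x) is the true mean response there.

```latex
\[
\mathrm{MSEP}(x)
  = E\!\left[\bigl(\hat{y}(x) - \mu(x)\bigr)^{2}\right]
  = \underbrace{\operatorname{Var}\!\left[\hat{y}(x)\right]}_{\text{prediction variance}}
  + \underbrace{\bigl(E\!\left[\hat{y}(x)\right] - \mu(x)\bigr)^{2}}_{\text{squared bias}}
\]
```

When the fitted linear predictor is correctly specified, the bias term is small and the criterion is dominated by the prediction variance; under misspecification the bias term does not shrink as the sample grows, which is why a criterion of this form can rank designs differently from a variance-only criterion.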
49. On generalized multinomial models and joint percentile estimation
- Author
-
I. Das and Siuli Mukhopadhyay
- Subjects
Statistics and Probability ,FOS: Computer and information sciences ,Percentile ,I/Ii Clinical-Trials ,Response model ,Interval Estimation ,Mathematics - Statistics Theory ,Statistics Theory (math.ST) ,Statistics - Applications ,Transformation ,Parameter Orthogonality ,Statistics ,FOS: Mathematics ,Applications (stat.AP) ,Neighbourhood (mathematics) ,Confidence Regions ,Parametric statistics ,Mathematics ,Multicategorical Logistic Link ,Applied Mathematics ,Dose-Response ,Logistic-Models ,Scale invariance ,Regression-Models ,Regression ,Standardization ,Binary Response Models ,Linear-Models ,Designs ,Misspecification ,Multinomial distribution ,Statistics, Probability and Uncertainty ,Parametric family - Abstract
This article proposes a family of link functions for the multinomial response model. The link family includes the multicategorical logistic link as one of its members. Conditions for the local orthogonality of the link and the regression parameters are given. It is shown that local orthogonality of the parameters in a neighbourhood makes the link family location and scale invariant. Confidence regions for jointly estimating the percentiles based on the parametric family of link functions are also determined. A numerical example based on a combination drug study is used to illustrate the proposed parametric link family and the confidence regions for joint percentile estimation. Comment: 28 pages, 4 tables and 1 figure
- Published
- 2012
50. Software Cost Modelling and Estimation Using Artificial Neural Networks Enhanced by Input Sensitivity Analysis
- Author
-
E. Papatheocharous and Andreas Andreou
- Subjects
Regression-models ,Prediction systems ,Artificial neural networks ,Validation ,Engineering and Technology ,Input sensitivity analysis ,Electrical Engineering - Electronic Engineering - Information Engineering ,software cost estimation ,artificial neural networks ,input sensitivity analysis ,Accuracy ,Software cost estimation - Abstract
This paper addresses the issue of Software Cost Estimation (SCE), providing an alternative approach to modelling and prediction using Artificial Neural Networks (ANN) and Input Sensitivity Analysis (ISA). The overall aim is to identify and investigate the effect of the leading factors in SCE, through ISA. The factors identified decisively influence software effort in the models examined and their ability to provide sufficiently accurate SCEs is examined. ANN of variable topologies are trained to predict effort devoted to software development based on past (finished) projects recorded in two publicly available historical datasets. The main difference from related studies is that the proposed approach extracts the most influential cost drivers that best describe the effort devoted to development activities using the weights of the network connections. The approach is validated on known software cost data and the results obtained are assessed and compared. The ANN constructed efficiently generalise the knowledge acquired during training, providing accurate effort predictions. The validation process included predictions with only the most highly ranked attributes among the original cost attributes of the datasets and revealed that accuracy performance was maintained at the same levels. The results showed that the combination of ANN and ISA is an effective method for evaluating the contribution of cost factors, whereas the subsets of factors selected did not compromise the accuracy of the prediction results.
- Published
- 2012