141 results for "Jeffrey S. Simonoff"
Search Results
2. Using Conditional Inference Trees to (Re)Explore Nonprofit Board Composition
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Subjects
Social Sciences (miscellaneous)
- Abstract
This Research Note introduces nonprofit scholars to the contemporary analytical tool of conditional inference trees as a means to shed more light on the institutional forces behind the changing composition of nonprofit boards of trustees. Revisiting the data of the Six-Cities Cultures of Trusteeship Project, this note illustrates the illuminating power of conditional inference trees for analyzing data (particularly categorical data), not well served by significance testing. Applying these popular models adds depth, nuance, and increased clarity to some of the original findings from the Six-Cities research project. This empirical case serves as a how-to for future researchers hoping to more flexibly model the relative impact of institutional (and other) variables on nonprofit organization structures, as well as expand their methodological toolkit when dealing with all sorts of regression problems.
- Published
- 2022
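As a companion to the entry above, here is a minimal sketch of the variable-selection step that conditional inference trees are built on: at each node, test the association between every candidate predictor and the response, adjust for multiple testing, and split on the most significant predictor only if it clears a threshold. The full algorithm (permutation-test framework, recursive partitioning) is what R's partykit::ctree implements; the data frame and column names below are hypothetical.

```python
# Sketch of conditional-inference-style split-variable selection for
# categorical data. Column names ("board_type", "city", ...) are hypothetical.
import pandas as pd
from scipy.stats import chi2_contingency

def select_split_variable(df, response, predictors, alpha=0.05):
    pvals = {}
    for x in predictors:
        table = pd.crosstab(df[x], df[response])      # contingency table
        _, p, _, _ = chi2_contingency(table)          # association test
        pvals[x] = p
    best = min(pvals, key=pvals.get)
    adjusted = min(1.0, pvals[best] * len(predictors))  # Bonferroni adjustment
    return (best, adjusted) if adjusted < alpha else (None, adjusted)

# Example call with hypothetical board-composition variables:
# var, p = select_split_variable(boards, "board_type", ["city", "industry", "era"])
```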
3. Joint latent class trees: A tree-based approach to modeling time-to-event and longitudinal data
- Author
-
Ningshan Zhang and Jeffrey S. Simonoff
- Subjects
Statistics - Methodology (stat.ME), Statistics and Probability, Likelihood Functions, Statistical Models, Latent Class Analysis, Health Information Management, Epidemiology, Longitudinal Studies
- Abstract
In this paper, we propose a semiparametric, tree-based joint latent class model for the joint behavior of longitudinal and time-to-event data. Existing joint latent class approaches are parametric and can suffer from high computational cost. The most common parametric approach, the joint latent class model, further restricts analysis to using time-invariant covariates in modeling survival risks and latent class memberships. The proposed tree method (joint latent class tree) is fast to fit, and permits time-varying covariates in all of its modeling components. We demonstrate the prognostic value of using time-varying covariates, and therefore the advantage of joint latent class tree over joint latent class model on simulated data. We apply joint latent class tree to a well-known data set (the PAQUID data set) and confirm its superior prediction performance and orders-of-magnitude speedup over joint latent class model.
- Published
- 2022
4. Dynamic estimation with random forests for discrete‐time survival data
- Author
-
Jeffrey S. Simonoff, Halina Frydman, Denis Larocque, Weichi Yao, and Hoora Moradian
- Subjects
Statistics and Probability, Hazard function, Time-varying covariate, Estimation, Pooling, Random forest, Discrete time, Covariate, Econometrics, Statistics, Probability and Uncertainty
- Abstract
Time-varying covariates are often available in survival studies and estimation of the hazard function needs to be updated as new information becomes available. In this paper, we investigate several different easy-to-implement ways that random forests can be used for dynamic estimation of the survival or hazard function from discrete-time survival data. The results from a simulation study indicate that all methods can perform well, and that none dominates the others. In general, situations that are more difficult from an estimation point of view (such as weaker signals and less data) favour a global fit, pooling over all time points, while situations that are easier from an estimation point of view (such as stronger signals and more data) favor local fits.
- Published
- 2021
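A minimal sketch of the "global fit, pooling over all time points" idea described in the entry above: expand discrete-time survival records into person-period rows with a binary event indicator, then fit one random forest classifier whose predicted probabilities serve as hazard estimates. The data frame layout and column names are hypothetical, and time-invariant covariates are assumed for brevity.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def person_period(df, id_col="id", time_col="time", event_col="event"):
    # One row per subject per observed period; y = 1 only at the event period.
    rows = []
    for _, r in df.iterrows():
        for t in range(1, int(r[time_col]) + 1):
            rows.append({id_col: r[id_col], "period": t,
                         "y": int(t == r[time_col] and r[event_col] == 1),
                         "x1": r["x1"], "x2": r["x2"]})
    return pd.DataFrame(rows)

# pp = person_period(surv_df)                     # surv_df is hypothetical
# rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=20)
# rf.fit(pp[["period", "x1", "x2"]], pp["y"])
# hazard_hat = rf.predict_proba(pp[["period", "x1", "x2"]])[:, 1]
```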
5. Analysis of Electrical Power and Oil and Gas Pipeline Failures.
- Author
-
Jeffrey S. Simonoff, Carlos E. Restrepo, Rae Zimmerman, and Zvia Naphtali
- Published
- 2007
- Full Text
- View/download PDF
6. Handbook of Regression Analysis
- Author
-
Samprit Chatterjee, Jeffrey S. Simonoff
- Published
- 2013
7. Unraveling Geographic Interdependencies in Electric Power Infrastructure.
- Author
-
Carlos E. Restrepo, Jeffrey S. Simonoff, and Rae Zimmerman
- Published
- 2006
- Full Text
- View/download PDF
8. The Potential for Nonparametric Joint Latent Class Modeling of Longitudinal and Time-to-Event Data
- Author
-
Ningshan Zhang and Jeffrey S. Simonoff
- Subjects
Nonparametric statistics, Machine learning, Latent class model, Tree-based methods, Covariate, Parametric statistics
- Abstract
Joint latent class modeling (JLCM) of longitudinal and time-to-event data is a parametric approach of particular interest in clinical studies. JLCM has the flexibility to uncover complex data-dependent latent classes, but it suffers high computational cost, and it does not use time-varying covariates in modeling time-to-event and latent class membership. In this work, we explore in more detail both the strengths and weaknesses of JLCM. We then discuss the sort of nonparametric joint modeling approach that could address some of JLCM’s weaknesses. In particular, a tree-based approach is fast to fit, and can use any type of covariates in modeling both the time-to-event and the latent class membership, thus serving as an alternative method for JLCM with great potential.
- Published
- 2020
9. Survival trees for interval-censored survival data
- Author
-
Wei Fu and Jeffrey S. Simonoff
- Subjects
Statistics and Probability, Epidemiology, Proportional hazards model, Monte Carlo method, Inference, Survival data, Tree structure, Censoring (clinical trials), Survival tree
- Abstract
Interval-censored data, in which the event time is only known to lie in some time interval, arise commonly in practice, for example, in a medical study in which patients visit clinics or hospitals at prescheduled times and the events of interest occur between visits. Such data are appropriately analyzed using methods that account for this uncertainty in event time measurement. In this paper, we propose a survival tree method for interval-censored data based on the conditional inference framework. Using Monte Carlo simulations, we find that the tree is effective in uncovering underlying tree structure, performs similarly to an interval-censored Cox proportional hazards model fit when the true relationship is linear, and performs at least as well as (and in the presence of right-censoring outperforms) the Cox model when the true relationship is not linear. Further, the interval-censored tree outperforms survival trees based on imputing the event time as an endpoint or the midpoint of the censoring interval. We illustrate the application of the method on tooth emergence data.
- Published
- 2017
10. An Ensemble Method for Interval-Censored Time-to-Event Data
- Author
-
Jeffrey S. Simonoff, Halina Frydman, and Weichi Yao
- Subjects
Statistics and Probability, Proportional hazards model, Monte Carlo method, Inference, General Medicine, Interval censoring, Tree structure, Statistics - Methodology (stat.ME)
- Abstract
Summary Interval-censored data analysis is important in biomedical statistics for any type of time-to-event response where the time of response is not known exactly, but rather only known to occur between two assessment times. Many clinical trials and longitudinal studies generate interval-censored data; one common example occurs in medical studies that entail periodic follow-up. In this article, we propose a survival forest method for interval-censored data based on the conditional inference framework. We describe how this framework can be adapted to the situation of interval-censored data. We show that the tuning parameters have a non-negligible effect on the survival forest performance and guidance is provided on how to tune the parameters in a data-dependent way to improve the overall performance of the method. Using Monte Carlo simulations, we find that the proposed survival forest is at least as effective as a survival tree method when the underlying model has a tree structure, performs similarly to an interval-censored Cox proportional hazards model fit when the true relationship is linear, and outperforms the survival tree method and Cox model when the true relationship is nonlinear. We illustrate the application of the method on a tooth emergence data set.
- Published
- 2019
11. Discussion: Deterioration of performance of the lasso with many predictors
- Author
-
Clifford M. Hurvich, Cheryl Flynn, and Jeffrey S. Simonoff
- Subjects
Statistics and Probability, Estimator, Regularization, Lasso, Oracle inequality, Econometrics, Applied mathematics
- Abstract
Oracle inequalities provide probability loss bounds for the lasso estimator at a deterministic choice of the regularization parameter and are commonly cited as theoretical justification for the lasso and its ability to handle high-dimensional settings. Unfortunately, in practice, the regularization parameter is not selected to be a deterministic quantity, but is instead chosen using a random, data-dependent procedure, often making these inequalities misleading in their implications. We discuss general results and demonstrate empirically for data using categorical predictors that the amount of deterioration in performance of the lasso as the number of unnecessary predictors increases can be far worse than the oracle inequalities suggest, but imposing structure on the form of the estimates can reduce this deterioration substantially.
- Published
- 2016
12. Cultivating Innovative Entrepreneurs for the Twenty-First Century: A Study of U.S. and German Students
- Author
-
Benjamin S. Selznick, Stephen Vassallo, Matthew J. Mayhew, Jeffrey S. Simonoff, and William J. Baumol
- Subjects
Technology education, Entrepreneurship, Teaching methods, Exploratory research, Education, German, Pedagogy, Cross-cultural research, Statistical analysis, Sociology
- Abstract
The purpose of this exploratory study was to examine the cultivation of innovative entrepreneurial intentions among students in three distinctive educational settings: a U.S. undergraduate four-year environment, a U.S. M.B.A two-year environment, and a German five-year business and technology environment. Results suggested that innovative entrepreneurial intentions varied based on educational setting. Implications for theory, research, and practice are discussed.
- Published
- 2016
13. Nonprofit Trusteeship in Different Contexts
- Author
-
Rikki Abzug and Jeffrey S. Simonoff
- Subjects
- Nonprofit organizations--Management, Trusts and trustees, Corporate governance
- Abstract
Critical interest in the characteristics, make-up and management of nonprofit organizations has seldom been higher. As this impetus grows, this important book draws on advances in neo-institutional organizational theory to explore the environmental and contextual influences on the structure and composition of boards of nonprofit organizations. Using information theoretic modelling, the book studies the interactions of time, place and organizational types (including faith affiliation) on US nonprofit boards, using unique quantitative data, collected from over 300 prestigious nonprofit organizations in a range of major US cities. With examples drawn from a variety of nonprofit sectors, including hospitals, museums, orchestras, universities, family services and community foundations, the book examines how boards evolve over time, in often unexpected ways; and in ways which reflect the regional, industrial and religious differences in the same period. Detailing the important implications for theory, practice and policy, this is the first book-length treatment of this topic to feature such a range of industries, geographic areas, and time frames. It offers a refreshing narrative and scientific approach; new and comprehensive subject matter; and a sweeping new time frame for literature in the field.
- Published
- 2018
14. Unbiased regression trees for longitudinal and clustered data
- Author
-
Wei Fu and Jeffrey S. Simonoff
- Subjects
Statistics and Probability, Applied Mathematics, Linear model, Decision tree, Random effects model, Tree structure, Computational Mathematics
- Abstract
A new version of the RE-EM regression tree method for longitudinal and clustered data is presented. The RE-EM tree is a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. The RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. The previously-suggested methodology used the CART tree algorithm for tree building, and therefore that RE-EM regression tree method inherits the tendency of CART to split on variables with more possible split points at the expense of those with fewer split points. A revised version of the RE-EM regression tree corrects for this bias by using the conditional inference tree as the underlying tree algorithm instead of CART. Simulation studies show that the new version is indeed unbiased, and has several improvements over the original RE-EM regression tree in terms of prediction accuracy and the ability to recover the correct tree structure.
- Published
- 2015
15. Survival of Broadway shows: An empirical investigation of recent trends
- Author
-
Nikolay Kulmatitskiy, Jeffrey S. Simonoff, Jing Cao, Lan Ma Nygren, and Kjell Nygren
- Subjects
Statistics and Probability ,Engineering ,021103 operations research ,business.industry ,Applied Mathematics ,0211 other engineering and technologies ,Attendance ,02 engineering and technology ,Logistic regression ,01 natural sciences ,010104 statistics & probability ,Proportional hazards regression ,Linear regression ,Econometrics ,0101 mathematics ,business ,Analysis ,Period (music) - Abstract
Using data on Broadway performances during the recent decade, we investigate the factors relating to the survival of Broadway shows. We assess the special structure of the Broadway season and build our analysis to accommodate this structure. Three modeling approaches are employed: logistic regression, proportional hazards regression, and (log-)linear regression for censored data. All three approaches reveal persistent positive effects of attendance, awards, and nominations on survival of Broadway shows. Musicals stay open longer than comparable nonmusicals, while nonrevival shows outperform revival shows, for shows that open during the June–February “pre-season” time period. Written reviews appear to matter only for shows that open during the time period between the announcement of the Tony award nominations and the announcement of the awards, with favorable reviews in the Daily News positively related to success in that period. Musicals and nonrevivals also do better during that time period. Econ...
- Published
- 2015
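A hedged sketch of the proportional hazards component of the analysis described in the entry above, using the lifelines package; the file name, columns, and predictors (weeks_run, closed, attendance, musical, revival) are hypothetical stand-ins for the Broadway data.

```python
import pandas as pd
from lifelines import CoxPHFitter

shows = pd.read_csv("broadway_shows.csv")   # hypothetical file
cph = CoxPHFitter()
cph.fit(shows[["weeks_run", "closed", "attendance", "musical", "revival"]],
        duration_col="weeks_run", event_col="closed")
cph.print_summary()   # hazard ratios for attendance, show type, revival status
```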
16. Effect Coding as a Mechanism for Improving the Accuracy of Measuring Students Who Self-Identify with More than One Race
- Author
-
Matthew J. Mayhew and Jeffrey S. Simonoff
- Subjects
Higher education, Education, Educational research, Mathematics education, Statistical analysis, Categorical variable, Cognitive psychology, Coding (social sciences)
The purpose of this paper is to describe effect coding as an alternative quantitative practice for analyzing and interpreting categorical, multi-raced independent variables in higher education research. Not only may effect coding enable researchers to get closer to respondents' original intentions, but it also allows for more accurate analyses of all race-based categories.
- Published
- 2015
17. Non-White, No More: Effect Coding as an Alternative to Dummy Coding With Implications for Higher Education Researchers
- Author
-
Jeffrey S. Simonoff and Matthew J. Mayhew
- Subjects
Higher education, Regression analysis, Education, Educational research, Statistics, Categorical variable, Reference group, Coding (social sciences)
The purpose of this article is to describe effect coding as an alternative quantitative practice for analyzing and interpreting categorical, race-based independent variables in higher education research. Unlike indicator (dummy) codes that imply that one group will be a reference group, effect codes use average responses as a means for interpreting information. This technique is especially appropriate for examining race, as such a process enables raced subgroups to be compared to each other and does not position responses of any raced group as normative, the standard against which all other race effects are interpreted. The issues raised here apply in any research context where a categorical variable without a natural reference group (e.g., college major) is a potential predictor in a regression model.
- Published
- 2015
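The entry above (and entry 16) contrasts effect coding with dummy coding. A minimal sketch using statsmodels/patsy contrasts: with C(group, Sum), each coefficient is a deviation from the unweighted mean of the group means rather than from an arbitrary reference group. The tiny data frame is illustrative only.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"score": [3.1, 2.8, 3.9, 4.2, 3.5, 3.0],
                   "group": ["A", "A", "B", "B", "C", "C"]})

dummy = smf.ols("score ~ C(group, Treatment)", data=df).fit()   # reference group A
effect = smf.ols("score ~ C(group, Sum)", data=df).fit()        # deviations from grand mean
print(dummy.params)    # coefficients relative to group A
print(effect.params)   # coefficients relative to the mean of the group means
```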
18. Regional Cultures of Trusteeship
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Published
- 2017
19. Statistical Models and Model Selection
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Subjects
Model selection, Statistical model, Machine learning
- Published
- 2017
20. Summary and Conclusions
- Author
-
Rikki Abzug and Jeffrey S. Simonoff
- Published
- 2017
21. Industry Cultures of Trusteeship
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Published
- 2017
22. The Six Cities Trusteeship Project Dataset
- Author
-
Rikki Abzug and Jeffrey S. Simonoff
- Published
- 2017
23. What Difference Does Faith Make?
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Subjects
Faith, Sociology, Religious studies
- Published
- 2017
24. Nonprofit Trusteeship in Different Contexts
- Author
-
Rikki Abzug and Jeffrey S. Simonoff
- Published
- 2017
25. From Whence Structure: Time Period Imprinting
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Subjects
Imprinting
- Published
- 2017
26. Boards of Trustees and Their Intellectual Environment
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Published
- 2017
27. Boards and Their Varying Nature
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Published
- 2017
28. Analyses of Trusteeship in Different Contexts
- Author
-
Jeffrey S. Simonoff and Rikki Abzug
- Published
- 2017
29. On the Sensitivity of the Lasso to the Number of Predictor Variables
- Author
-
Clifford M. Hurvich, Cheryl J. Flynn, and Jeffrey S. Simonoff
- Subjects
Statistics and Probability, High-dimensional data, Estimator, Machine Learning (stat.ML), Regularization, Regression, Least absolute shrinkage and selection operator (Lasso), Oracle inequalities, Orthonormal basis, Sensitivity, Applied mathematics
- Abstract
The Lasso is a computationally efficient regression regularization procedure that can produce sparse estimators when the number of predictors (p) is large. Oracle inequalities provide probability loss bounds for the Lasso estimator at a deterministic choice of the regularization parameter. These bounds tend to zero if p is appropriately controlled, and are thus commonly cited as theoretical justification for the Lasso and its ability to handle high-dimensional settings. Unfortunately, in practice the regularization parameter is not selected to be a deterministic quantity, but is instead chosen using a random, data-dependent procedure. To address this shortcoming of previous theoretical work, we study the loss of the Lasso estimator when tuned optimally for prediction. Assuming orthonormal predictors and a sparse true model, we prove that the probability that the best possible predictive performance of the Lasso deteriorates as p increases is positive and can be arbitrarily close to one given a sufficiently high signal to noise ratio and sufficiently large p. We further demonstrate empirically that the amount of deterioration in performance can be far worse than the oracle inequalities suggest and provide a real data example where deterioration is observed.
- Published
- 2017
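A rough simulation in the spirit of the result described in the entry above: with a sparse true model, track the best achievable test error of the lasso over a grid of penalty values (oracle tuning) as irrelevant predictors are added. The sample sizes, coefficients, and grid are illustrative choices, not those of the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, beta = 200, np.array([3.0, -2.0, 1.5])          # three true signals
alphas = np.logspace(-3, 1, 50)                    # penalty grid

for p in (10, 50, 200, 500):
    X = rng.standard_normal((n, p))                # training predictors
    Xt = rng.standard_normal((n, p))               # test predictors
    y = X[:, :3] @ beta + rng.standard_normal(n)
    yt = Xt[:, :3] @ beta + rng.standard_normal(n)
    # "Oracle" tuning: best test MSE attainable over the whole penalty grid.
    best = min(np.mean((yt - Lasso(alpha=a, max_iter=10000).fit(X, y).predict(Xt)) ** 2)
               for a in alphas)
    print(f"p = {p:4d}  best achievable test MSE = {best:.3f}")
```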
30. Regression tree-based diagnostics for linear multilevel models
- Author
-
Jeffrey S. Simonoff
- Subjects
Statistics and Probability, Nonlinearity, Heteroscedasticity, Goodness of fit, Multilevel model, Econometrics, Clustered data, Decision tree, Random effects model
- Abstract
Longitudinal and clustered data, where multiple observations for individuals are observed, require special models that reflect their hierarchical structure. The most commonly used such model is the linear multilevel model, which combines a linear model for the population-level fixed effects, a linear model for normally distributed individual-level random effects and normally distributed observation-level errors with constant variance. It has the advantage of simplicity of interpretation, but if the assumptions of the model do not hold inferences drawn can be misleading. In this paper, we discuss the use of regression trees that are designed for multilevel data to construct goodness-of-fit tests for this model that can be used to test for nonlinearity of the fixed effects or heteroscedasticity of the errors. Simulations show that the resultant tests are slightly conservative as 0.05 level tests, and have good power to identify explainable model violations (that is, ones that are related to available covariate information in the data). Application of the tests is illustrated on two real datasets.
- Published
- 2013
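An informal sketch of the diagnostic idea in the entry above: fit a linear random-intercept model, then fit a regression tree to its residuals using the available covariates; splits that explain residual variation point toward nonlinearity or other lack of fit. This is only the intuition, not the calibrated goodness-of-fit test in the paper; the data frame and column names are hypothetical.

```python
import statsmodels.formula.api as smf
from sklearn.tree import DecisionTreeRegressor, export_text

# df is a hypothetical long-format data frame with columns y, x1, x2, subject.
mixed = smf.mixedlm("y ~ x1 + x2", data=df, groups=df["subject"]).fit()
resid = df["y"] - mixed.fittedvalues               # residuals from the multilevel fit

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=30)
tree.fit(df[["x1", "x2"]], resid)
print(export_text(tree, feature_names=["x1", "x2"]))  # strong splits suggest model misfit
```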
31. Efficiency for Regularization Parameter Selection in Penalized Likelihood Estimation of Misspecified Models
- Author
-
Jeffrey S. Simonoff, Cheryl J. Flynn, and Clifford M. Hurvich
- Subjects
Statistics and Probability, Generalized linear model, Penalized likelihood, Penalized regression, Machine Learning (stat.ML), Regression, Lasso, Sample size determination, Consistent estimator, Applied mathematics, Population variance
- Abstract
It has been shown that AIC-type criteria are asymptotically efficient selectors of the tuning parameter in non-concave penalized regression methods under the assumption that the population variance is known or that a consistent estimator is available. We relax this assumption to prove that AIC itself is asymptotically efficient and we study its performance in finite samples. In classical regression, it is known that AIC tends to select overly complex models when the dimension of the maximum candidate model is large relative to the sample size. Simulation studies suggest that AIC suffers from the same shortcomings when used in penalized regression. We therefore propose the use of the classical corrected AIC (AICc) as an alternative and prove that it maintains the desired asymptotic properties. To broaden our results, we further prove the efficiency of AIC for penalized likelihood methods in the context of generalized linear models with no dispersion parameter. Similar results exist in the literature but only for a restricted set of candidate models. By employing results from the classical literature on maximum-likelihood estimation in misspecified models, we are able to establish this result for a general set of candidate models. We use simulations to assess the performance of AIC and AICc, as well as that of other selectors, in finite samples for both SCAD-penalized and Lasso regressions and a real data example is considered.
- Published
- 2013
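A minimal sketch of the kind of selector discussed in the entry above: choose the lasso penalty by AIC or corrected AIC (AICc), using the number of nonzero coefficients as the estimated degrees of freedom. The simulated data, penalty grid, and the use of the lasso (rather than SCAD) are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

def aic_aicc(y, yhat, df):
    # Gaussian AIC with unknown variance; AICc adds the small-sample correction.
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    aic = n * np.log(rss / n) + 2 * df
    aicc = aic + 2 * df * (df + 1) / max(n - df - 1, 1)
    return aic, aicc

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
y = 2 * X[:, 0] - X[:, 1] + rng.standard_normal(100)

scores = []
for a in np.logspace(-3, 0, 30):
    fit = Lasso(alpha=a, max_iter=10000).fit(X, y)
    df_hat = np.count_nonzero(fit.coef_)           # standard lasso df estimate
    scores.append((a, *aic_aicc(y, fit.predict(X), df_hat)))
best_by_aicc = min(scores, key=lambda s: s[2])
print("penalty chosen by AICc:", best_by_aicc[0])
```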
32. Exploring Innovative Entrepreneurship and Its Ties to Higher Educational Experiences
- Author
-
William J. Baumol, Matthew J. Mayhew, Batia M. Wiesenfeld, Michael W. Klein, and Jeffrey S. Simonoff
- Subjects
Entrepreneurship, Higher education, Teaching methods, Education, Politics, Pedagogy, Personality, Big Five personality traits, Psychology
- Abstract
The purpose of this paper was to explore innovative entrepreneurship and to gain insight into the educational practices and experiences that increase the likelihood that a student would graduate with innovative entrepreneurial intentions. To this end, we administered a battery of assessments to 3,700 undergraduate seniors who matriculated in the spring of 2007; these students attended one of five institutions participating in this study. Results showed that, after controlling for a host of personality, demographic, educational, and political covariates, taking an entrepreneurial course and the assessments faculty use as pedagogical strategies for teaching course content were significantly related to innovation intentions. Implications for higher education stakeholders are discussed.
- Published
- 2012
33. Asthma Hospital Admissions and Ambient Air Pollutant Concentrations in New York City
- Author
-
Jeffrey S. Simonoff, George D. Thurston, Carlos E. Restrepo, and Rae Zimmerman
- Subjects
Pollution, Heat index, Meteorology, Population, Air pollution, Relative risk, Risk factor, Demography, Asthma
- Abstract
Air pollution is considered a risk factor for asthma. In this paper, we analyze the association between daily hospital admissions for asthma and ambient air pollution concentrations in four New York City counties. Negative binomial regression is used to model the association between daily asthma hospital admissions and ambient air pollution concentrations. Potential confounding factors such as heat index, day of week, holidays, yearly population changes, and seasonal and long-term trends are controlled for in the models. Nitrogen dioxide (NO2), sulfur dioxide (SO2) and carbon monoxide (CO) show the most consistent statistically significant associations with daily hospitalizations for asthma during the entire period (1996-2000). The associations are stronger for children (0 - 17 years) than for adults (18 - 64 years). Relative risks (RR) for the inter-quartile range (IQR) of same day 24-hour average pollutant concentration and asthma hospitalizations for children for the four county hospitalization totals were: NO2 (IQR = 0.011 ppm, RR = 1.017, 95% CI = 1.001, 1.034), SO2 (IQR = 0.008 ppm, RR = 1.023, 95% CI = 1.004, 1.042), CO (IQR = 0.232 ppm, RR = 1.014, 95% CI = 1.003, 1.025). In the case of ozone (O3) and particulate matter (PM2.5) statistically significant associations were found for daily one-hour maxima values and children’s asthma hospitalization in models that used lagged values for air pollution concentrations. Five-day weighted average lag models resulted in these estimates: O3 (one-hour maxima) (IQR = 0.025 ppm, RR = 1.049, 95% CI = 1.002, 1.098), PM2.5 (one-hour maxima) (IQR = 16.679 μg/m3, RR = 1.055, 95% CI = 1.008, 1.103). In addition, seasonal variations were also explored for PM2.5 and statistically significant associations with daily hospital admissions for asthma were found during the colder months (November-March) of the year. Important differences in pollution effects were found across pollutants, counties, and age groups. The results for PM2.5 suggest that the composition of PM is important to this health outcome, since the major sources of NYC PM differ between winter and summer months.
- Published
- 2012
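A hedged sketch of the count-regression setup described in the entry above: daily admissions regressed on a pollutant concentration with confounder controls via a negative binomial GLM. statsmodels' GLM family takes a fixed dispersion parameter here (a simplification), and the data frame and column names are hypothetical.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# "daily" is a hypothetical data frame of daily counts and covariates.
model = smf.glm("admissions ~ no2 + heat_index + C(day_of_week) + C(season)",
                data=daily,
                family=sm.families.NegativeBinomial(alpha=1.0)).fit()
print(model.summary())
# Relative risk for an interquartile-range increase in NO2 (iqr_no2 hypothetical):
# rr = np.exp(model.params["no2"] * iqr_no2)
```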
34. News from Your Journal: Statistical Modelling
- Author
-
Brian D. Marx, Arnošt Komárek, and Jeffrey S. Simonoff
- Subjects
Statistics and Probability, Econometrics, Statistical model
- Published
- 2017
35. Color-emotion associations in the pharmaceutical industry: Understanding Universal and local themes
- Author
-
Jeffrey S. Simonoff, Anat Lechner, and Leslie Harrington
- Subjects
General Chemical Engineering, Human Factors and Ergonomics, General Chemistry, Color emotion, Preference, Age and gender, Product marketing, Marketing, Psychology, Pharmaceutical industry
The strong shift toward operating in global markets has posed enormous adaptation challenges for product marketing especially with regard to universality and consistency of brand design decisions. The color-in-product design decision is also susceptible to this global–local tension. A pharmaceutical film coating formulator supplier to leading local and global pharmaceutical companies was interested in developing a solid validated global color preference database to enable informed brand decision making for its customers. The following study reports results from a global survey that examined the color–brand attribute associations within the global pharmaceutical industry. Data were collected from a multigeography gender and age balanced sample of 2021 subjects, revealing a strikingly powerful color language comprised of universally consistent associations and local contextual patterns that are each critical to global brand decision makers within this industry. © 2011 Wiley Periodicals,Inc. Col Res Appl, 2012
- Published
- 2011
36. RE-EM trees: a data mining approach for longitudinal and clustered data
- Author
-
Rebecca J. Sela and Jeffrey S. Simonoff
- Subjects
Mixed model, Autocorrelation, Linear model, Fixed effects model, Random effects model, Generalized linear mixed model, Tree-based methods, Data mining, Software, Parametric statistics
Longitudinal data refer to the situation where repeated observations are available for each sampled object. Clustered data, where observations are nested in a hierarchical structure within objects (without time necessarily being involved) represent a similar type of situation. Methodologies that take this structure into account allow for the possibilities of systematic differences between objects that are not related to attributes and autocorrelation within objects across time periods. A standard methodology in the statistics literature for this type of data is the mixed effects model, where these differences between objects are represented by so-called "random effects" that are estimated from the data (population-level relationships are termed "fixed effects," together resulting in a mixed effects model). This paper presents a methodology that combines the structure of mixed effects models for longitudinal and clustered data with the flexibility of tree-based estimation methods. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree is less sensitive to parametric assumptions and provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also apply it to a smaller data set examining accident fatalities, and show that the RE-EM tree strongly outperforms a tree without random effects while performing comparably to a linear model with random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to using linear models with random effects in more general situations.
- Published
- 2011
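A simplified sketch of the RE-EM alternation described in the entry above, restricted to a random intercept: fit a tree to responses adjusted for the current random effects, then re-estimate each object's intercept from the tree residuals with shrinkage, and iterate. The actual RE-EM tree estimates the random effects with a linear mixed-effects fit; the shrinkage rule below is an illustrative stand-in.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def re_em_sketch(X, y, groups, n_iter=10, shrink=0.8):
    # X: (n, p) array, y: (n,) array, groups: (n,) array of object identifiers.
    b = {g: 0.0 for g in np.unique(groups)}            # random intercepts
    tree = DecisionTreeRegressor(min_samples_leaf=20)
    for _ in range(n_iter):
        adj = y - np.array([b[g] for g in groups])      # remove current random effects
        tree.fit(X, adj)                                # tree on adjusted responses
        resid = y - tree.predict(X)
        for g in b:                                     # shrunken group-mean update
            b[g] = shrink * resid[groups == g].mean()
    return tree, b
```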
37. Resource allocation, emergency response capability, and infrastructure concentration around vulnerable sites
- Author
-
Carlos E. Restrepo, Rae Zimmerman, Henry H. Willis, Jeffrey S. Simonoff, and Zvia Naphtali
- Subjects
Emergency management, Strategy and Management, Environmental resource management, Critical infrastructure protection, Critical infrastructure, Industrial accidents, Preparedness, Resource allocation, Mutual aid, Safety, Risk, Reliability and Quality, Natural disaster, Environmental planning
Public and private decision‐makers continue to seek risk‐based approaches to allocate funds to help communities respond to disasters, accidents, and terrorist attacks involving critical infrastructure facilities. The requirements for emergency response capability depend both upon risks within a region's jurisdiction and mutual aid agreements that have been made with other regions. In general, regions in close proximity to infrastructure would benefit more from resources to improve preparedness because there is a greater potential for an event requiring emergency response to occur if there are more facilities at which such events could occur. Thus, a potentially important input into decisions about allocating funds for security is the proximity of a community to high concentrations of infrastructure systems that potentially could be at risk to an industrial accident, natural disaster, or terrorist attack. In this paper, we describe a methodology for measuring a region's exposure to infrastructure‐related r...
- Published
- 2011
38. Model selection in regression based on pre-smoothing
- Author
-
Niel Hens, Marc Aerts, and Jeffrey S. Simonoff
- Subjects
Akaike information criterion, Statistics and Probability, Polynomial regression, Model selection, Latent variable model, Generalized linear mixed model, Nonparametric regression, Fractional polynomial, Bayesian information criterion, Log-linear model, Pre-smoothing
In this paper, we investigate the effect of pre-smoothing on model selection. Christobal et al. [6] showed the beneficial effect of pre-smoothing on estimating the parameters in a linear regression model. Here, in a regression setting, we show that smoothing the response data prior to model selection by Akaike's information criterion can lead to an improved selection procedure. The bootstrap is used to control the magnitude of the random error structure in the smoothed data. The effect of pre-smoothing on model selection is shown in simulations. The method is illustrated in a variety of settings, including the selection of the best fractional polynomial in a generalized linear model. We also gratefully acknowledge the support from the IAP research network nr P5/24 of the Belgian Government (Belgian Science Policy). The research of Niel Hens has been financially supported by the Fund of Scientific Research (FWO, Research Grant # G039304) of Flanders, Belgium.
- Published
- 2010
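A rough sketch of pre-smoothing as described in the entry above: smooth the response nonparametrically (lowess here), then run AIC-based selection, in this case over polynomial degrees, on the smoothed response. The bootstrap step that the paper uses to control the error structure of the smoothed data is omitted, and the simulated data and candidate models are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 3, 150))
y = np.sin(2 * x) + rng.normal(scale=0.4, size=150)

# Pre-smooth the response before model selection.
y_smooth = lowess(y, x, frac=0.3, return_sorted=False)

for degree in (1, 2, 3, 4):
    design = np.vander(x, degree + 1)              # polynomial candidate model
    fit = sm.OLS(y_smooth, design).fit()
    print(f"degree {degree}: AIC = {fit.aic:.1f}")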
39. Risk management of cost consequences in natural gas transmission and distribution infrastructures
- Author
-
Carlos E. Restrepo, Rae Zimmerman, and Jeffrey S. Simonoff
- Subjects
Engineering, General Chemical Engineering, Risk measure, Energy Engineering and Power Technology, Management Science and Operations Research, Industrial and Manufacturing Engineering, Pipelines, Risk analysis, Control and Systems Engineering, Safety, Risk, Reliability and Quality, Risk management
A critical aspect of risk management in energy systems is minimizing pipeline incidents that can potentially affect life, property and economic well-being. Risk measures and scenarios are developed in this paper in order to better understand how consequences of pipeline failures are linked to causes and other incident characteristics. An important risk measure for decision-makers in this field is the association between incident cause and cost consequences. Data from the Office of Pipeline Safety (OPS) on natural gas transmission and distribution pipeline incidents are used to analyze the association between various characteristics of the incidents and product loss cost and property damage cost. The data for natural gas transmission incidents are for the period 2002 through May 2009 and include 959 incidents. In the case of natural gas distribution incidents the data include 823 incidents that took place during the period 2004 through May 2009. A two-step approach is used in the statistical analyses to model the consequences and the costs associated with pipeline incidents. In the first step the probability that there is a nonzero consequence associated with an incident is estimated as a function of the characteristics of the incident. In the second step the magnitudes of the consequence measures, given that there is a nonzero outcome, are evaluated as a function of the characteristics of the incidents. It is found that the important characteristics of an incident for risk management can be quite different depending on whether the incident involves a transmission or distribution pipeline, and the type of cost consequence being modeled. The application of this methodology could allow decision-makers in the energy industry to construct scenarios to gain a better understanding of how cost consequence measures vary depending on factors such as incident cause and incident type.
- Published
- 2010
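A minimal sketch of the two-step approach described in the entry above: a logistic model for whether an incident has a nonzero cost, then a regression for the (log) magnitude among incidents with positive cost. The data frame and column names are hypothetical.

```python
import numpy as np
import statsmodels.formula.api as smf

# "incidents" is a hypothetical data frame of pipeline incidents.
# Step 1: probability of a nonzero property-damage cost.
step1 = smf.logit("nonzero_cost ~ C(cause) + C(system_part) + ignition",
                  data=incidents).fit()

# Step 2: magnitude of the cost, given that it is nonzero.
positive = incidents[incidents["nonzero_cost"] == 1].copy()
positive["log_cost"] = np.log(positive["property_damage"])
step2 = smf.ols("log_cost ~ C(cause) + C(system_part) + ignition",
                data=positive).fit()

print(step1.summary())
print(step2.summary())
```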
40. Causes, cost consequences, and risk implications of accidents in US hazardous liquid pipeline infrastructure
- Author
-
Jeffrey S. Simonoff, Carlos E. Restrepo, and Rae Zimmerman
- Subjects
Engineering, Information Systems and Management, Regression analysis, Critical infrastructure, Pipeline transport, Hazardous liquids, Risk analysis, Modeling and Simulation, Operations management, Safety, Risk, Reliability and Quality, Risk assessment, Risk management
In this paper the causes and consequences of accidents in US hazardous liquid pipelines that result in the unplanned release of hazardous liquids are examined. Understanding how different causes of accidents are associated with consequence measures can provide important inputs into risk management for this (and other) critical infrastructure systems. Data on 1582 accidents related to hazardous liquid pipelines for the period 2002–2005 are analyzed. The data were obtained from the US Department of Transportation’s Office of Pipeline Safety (OPS). Of the 25 different causes of accidents included in the data the most common ones are equipment malfunction, corrosion, material and weld failures, and incorrect operation. This paper focuses on one type of consequence–various costs associated with these pipeline accidents–and causes associated with them. The following economic consequence measures related to accident cost are examined: the value of the product lost; public, private, and operator property damage; and cleanup, recovery, and other costs. Logistic regression modeling is used to determine what factors are associated with nonzero product loss cost, nonzero property damage cost and nonzero cleanup and recovery costs. The factors examined include the system part involved in the accident, location characteristics (offshore versus onshore location, occurrence in a high consequence area), and whether there was liquid ignition, an explosion, and/or a liquid spill. For the accidents associated with nonzero values for these consequence measures (weighted) least squares regression is used to understand the factors related to them, as well as how the different initiating causes of the accidents are associated with the consequence measures. The results of these models are then used to construct illustrative scenarios for hazardous liquid pipeline accidents. These scenarios suggest that the magnitude of consequence measures such as value of product lost, property damage and cleanup and recovery costs are highly dependent on accident cause and other accident characteristics. The regression models used to construct these scenarios constitute an analytical tool that industry decision-makers can use to estimate the possible consequences of accidents in these pipeline systems by cause (and other characteristics) and to allocate resources for maintenance and to reduce risk factors in these systems.
- Published
- 2009
41. Transportation Density and Opportunities for Expediting Recovery to Promote Security
- Author
-
Rae Zimmerman and Jeffrey S. Simonoff
- Subjects
Engineering, Expediting, Accident prevention, Transport engineering, Terrorism, Safety, Risk, Reliability and Quality, Law, Safety Research
New York State ranks prominently among other states in the nation in the size and scope of its transportation system, with most of the usage of that system concentrated in and around New York City. Areas of infrastructure density and bottlenecks pose security challenges. Moreover, transportation is highly dependent on other infrastructure. Research addressing the reduction of security threats is proposed in terms of transportation operations and expediting recovery. Existing research is presented on transit recovery in the New York area after September 11, 2001 as a guide for future research into prevention of, and recovery from, disruptions to transit.
- Published
- 2008
42. Tobit model estimation and sliced inverse regression
- Author
-
Lexin Li, Chih-Ling Tsai, and Jeffrey S. Simonoff
- Subjects
Statistics and Probability, Heteroscedasticity, Homoscedasticity, Linear model, Regression analysis, Censoring (statistics), Truncation (statistics), Sliced inverse regression, Tobit model
- Abstract
It is not unusual for the response variable in a regression model to be subject to censoring or truncation. Tobit regression models are specific examples of such a situation, where for some observations the observed response is not the actual response, but the censoring value (often zero), and an indicator that censoring (from below) has occurred. It is well-known that the maximum likelihood estimator for such a linear model assuming Gaussian errors is not consistent if the error term is not homoscedastic and normally distributed. In this paper, we consider estimation in the Tobit regression context when homoscedasticity and normality of errors do not hold, as well as when the true response is an unspecified nonlinear function of linear terms, using sliced inverse regression (SIR). The properties of SIR estimation for Tobit models are explored both theoretically and based on extensive Monte Carlo simulations.We show that the SIR estimator is a strong competitor to other Tobit regression estimators, in that it has good properties when the usual linear model assumptions hold, and can be much more effective than other Tobit model estimators when those assumptions break down. An example related to household charitable donations demonstrates the usefulness of the SIR estimator.
- Published
- 2007
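A compact numpy sketch of generic sliced inverse regression (SIR), the estimator at the center of the entry above: standardize the predictors, average them within slices of the observed response, and take the leading eigenvectors of the between-slice covariance, mapped back to the original scale. The Tobit-specific treatment of censoring in the paper is not reproduced here.

```python
import numpy as np

def sir(X, y, n_slices=10, n_dirs=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    root_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T    # Sigma^{-1/2}
    Z = Xc @ root_inv                                      # standardized predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):    # slice by the response
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)               # between-slice covariance
    w, v = np.linalg.eigh(M)
    return root_inv @ v[:, ::-1][:, :n_dirs]               # leading directions, original scale

# beta_hat = sir(X, y_observed, n_slices=10, n_dirs=1)
```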
43. Risk-Management and Risk-Analysis-Based Decision Tools for Attacks on Electric Power
- Author
-
Carlos E. Restrepo, Rae Zimmerman, and Jeffrey S. Simonoff
- Subjects
Risk analysis, Engineering, Operations research, Reliability, Negative binomial distribution, Logistic regression, Electric power, Safety, Risk, Reliability and Quality, Risk management
Incident data about disruptions to the electric power grid provide useful information that can be used as inputs into risk management policies in the energy sector for disruptions from a variety of origins, including terrorist attacks. This article uses data from the Disturbance Analysis Working Group (DAWG) database, which is maintained by the North American Electric Reliability Council (NERC), to look at incidents over time in the United States and Canada for the period 1990-2004. Negative binomial regression, logistic regression, and weighted least squares regression are used to gain a better understanding of how these disturbances varied over time and by season during this period, and to analyze how characteristics such as number of customers lost and outage duration are related to different characteristics of the outages. The results of the models can be used as inputs to construct various scenarios to estimate potential outcomes of electric power outages, encompassing the risks, consequences, and costs of such outages.
- Published
- 2007
44. Robust weighted LAD regression
- Author
-
Avi Giloni, Jeffrey S. Simonoff, and Bhaskar Sengupta
- Subjects
Statistics and Probability, Applied Mathematics, Estimator, Regression analysis, Least squares, Robust regression, Linear regression, Leverage (statistics), Computational Mathematics
The least squares linear regression estimator is well-known to be highly sensitive to unusual observations in the data, and as a result many more robust estimators have been proposed as alternatives. One of the earliest proposals was least-sum of absolute deviations (LAD) regression, where the regression coefficients are estimated through minimization of the sum of the absolute values of the residuals. LAD regression has been largely ignored as a robust alternative to least squares, since it can be strongly affected by a single observation (that is, it has a breakdown point of 1/n, where n is the sample size). In this paper we show that judicious choice of weights can result in a weighted LAD estimator with much higher breakdown point. We discuss the properties of the weighted LAD estimator, and show via simulation that its performance is competitive with that of high breakdown regression estimators, particularly in the presence of outliers located at leverage points. We also apply the estimator to several data sets.
- Published
- 2006
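A hedged sketch of weighted LAD estimation as discussed in the entry above: for positive weights, minimizing the weighted sum of absolute residuals is the same as ordinary LAD (median regression) on row-rescaled data, so statsmodels' quantile regression at q = 0.5 can be reused. The leverage-based weights in the comment are an illustrative choice, not the weights derived in the paper.

```python
import numpy as np
import statsmodels.api as sm

def weighted_lad(X, y, w):
    # Minimize sum_i w_i * |y_i - x_i'b| by rescaling each row by w_i > 0.
    Xc = sm.add_constant(X)
    Xw = Xc * w[:, None]          # includes the intercept column
    yw = y * w
    return sm.QuantReg(yw, Xw).fit(q=0.5)

# Illustrative choice: downweight high-leverage rows.
# Xc = sm.add_constant(X)
# h = np.diag(Xc @ np.linalg.pinv(Xc.T @ Xc) @ Xc.T)   # leverage values
# fit = weighted_lad(X, y, w=1.0 / (1.0 + h))
# print(fit.params)
```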
45. A mathematical programming approach for improving the robustness of least sum of absolute deviations regression
- Author
-
Bhaskar Sengupta, Jeffrey S. Simonoff, and Avi Giloni
- Subjects
Mathematical optimization, Robustness, Estimator, Integer programming, Knapsack problem, Modeling and Simulation, Management Science and Operations Research, Regression, Robust regression
This paper discusses a novel application of mathematical programming techniques to a regression problem. While least squares regression techniques have been used for a long time, it is known that their robustness properties are not desirable. Specifically, the estimators are known to be too sensitive to data contamination. In this paper we examine regressions based on Least-sum of Absolute Deviations (LAD) and show that the robustness of the estimator can be improved significantly through a judicious choice of weights. The problem of finding optimum weights is formulated as a nonlinear mixed integer program, which is too difficult to solve exactly in general. We demonstrate that our problem is equivalent to a mathematical program with a single functional constraint resembling the knapsack problem and then solve it for a special case. We then generalize this solution to general regression designs. Furthermore, we provide an efficient algorithm to solve the general nonlinear, mixed integer programming problem when the number of predictors is small. We show the efficacy of the weighted LAD estimator using numerical examples. © 2006 Wiley Periodicals, Inc. Naval Research Logistics, 2006
- Published
- 2006
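A minimal sketch of the linear-programming formulation that LAD regression rests on, the starting point for the mathematical program in the entry above: split each residual into nonnegative parts u and v and minimize their sum subject to Xb + u - v = y. The weight-selection mixed integer program from the paper is not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def lad_lp(X, y):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])       # minimize sum(u + v)
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])             # X b + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)      # b free, u, v >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]                                         # LAD coefficients

# b_hat = lad_lp(np.column_stack([np.ones(len(y)), x]), y)   # include an intercept column
```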
46. 'Last Licks'
- Author
-
Jeffrey S. Simonoff and Gary Simon
- Subjects
Statistics and Probability, General Mathematics, Statistics, Probability and Uncertainty
Much has been written about the home field advantage in sports. Baseball and softball are unusual games, in that the rules are explicitly different for home versus visiting teams, since by rule home teams bat second in each inning (they have last licks). This is generally considered to be an advantage, which seems to be contradicted by the apparent weakness of the home field advantage in baseball compared to that in other sports. In this article we examine the effect of last licks on baseball and softball team success using neutral site college baseball and softball playoff games. We find little evidence of an effect in baseball, but much greater evidence in softball, related to whether a game is close late in the game. In softball games that are tied at the end of an inning, batting last seems to be disadvantageous later in the game, apparently related to the chances of the team scoring first to break the tie. Since the database also includes games where one team is playing on its home field, we are also able to say something about benefits from playing at home that are not related to last licks.
- Published
- 2006
47. Analyzing Categorical Data
- Author
-
Jeffrey S. Simonoff
- Subjects
- Sociology—Methodology, Probabilities, Statistics, Social sciences—Statistical methods
- Abstract
Categorical data arise often in many fields, including biometrics, economics, management, manufacturing, marketing, psychology, and sociology. This book provides an introduction to the analysis of such data. The coverage is broad, using the loglinear Poisson regression model and logistic binomial regression models as the primary engines for methodology. Topics covered include count regression models, such as Poisson, negative binomial, zero-inflated, and zero-truncated models; loglinear models for two-dimensional and multidimensional contingency tables, including for square tables and tables with ordered categories; and regression models for two-category (binary) and multiple-category target variables, such as logistic and proportional odds models. All methods are illustrated with analyses of real data examples, many from recent subject area journal articles. These analyses are highlighted in the text, and are more detailed than is typical, providing discussion of the context and background of the problem, model checking, and scientific implications. More than 200 exercises are provided, many also based on recent subject area literature. Data sets and computer code are available at a web site devoted to the text. Adopters of this book may request a solutions manual from: textbook@springer-ny.com.
From the reviews:
'Jeff Simonoff's book is at the top of the heap of categorical data analysis textbooks... The examples are superb. Student reactions in a class I taught from this text were uniformly positive, particularly because of the examples and exercises. Additional materials related to the book, particularly code for S-Plus, SAS, and R, useful for analysis of examples, can be found at the author's Web site at New York University. I liked this book for this reason, and recommend it to you for pedagogical purposes.' (Stanley Wasserman, The American Statistician, August 2006, Vol. 60, No. 3)
'The book has various noteworthy features. The examples used are from a variety of topics, including medicine, economics, sports, mining, weather, as well as social aspects like needle-exchange programs. The examples motivate the theory and also illustrate nuances of data analytical procedures. The book also incorporates several newer methods for analyzing categorical data, including zero-inflated Poisson models, robust analysis of binomial and poisson models, sandwich estimators, multinomial smoothing, ordinal agreement tables… this is definitely a good reference book for any researcher working with categorical data.' (Technometrics, May 2004)
'This guide provides a practical approach to the appropriate analysis of categorical data and would be a suitable purchase for individuals with varying levels of statistical understanding.' (Paediatric and Perinatal Epidemiology, 2004, 18)
'This book gives a fresh approach to the topic of categorical data analysis. The presentation of the statistical methods exploits the connection to regression modeling with a focus on practical features rather than formal theory... There is much to learn from this book. Aside from the ordinary materials such as association diagrams, Mantel-Haenszel estimators, or overdispersion, the reader will also find some less-often presented but interesting and stimulating topics... [T]his is an excellent book, giving an up-to-date introduction to the wide field of analyzing categorical data.' (Biometrics, September 2004)
'...It is of great help to data analysts, practitioners and researchers who deal with categorical data and need to get a necessary insight into the methods of analysis as well as practical guidelines for solving problems.' (International Journal of General Systems, August 2004)
'The author has succeeded in writing a useful and readable textbook combining most of general theory and practice of count data.' (Kwantitatieve Methoden)
'The book esp
- Published
- 2013
48. The SAGE Handbook of Multilevel Modeling
- Author
-
Marc A. Scott, Jeffrey S. Simonoff, and Brian D. Marx
- Subjects
- Multilevel models (Statistics)
- Abstract
In this important new Handbook, the editors have gathered together a range of leading contributors to introduce the theory and practice of multilevel modeling. The Handbook establishes the connections in multilevel modeling, bringing together leading experts from around the world to provide a roadmap for applied researchers linking theory and practice, as well as a unique arsenal of state-of-the-art tools. It forges vital connections that cross traditional disciplinary divides and introduces best practice in the field. Part I establishes the framework for estimation and inference, including chapters dedicated to notation, model selection, fixed and random effects, and causal inference. Part II develops variations and extensions, such as nonlinear, semiparametric and latent class models. Part III includes discussion of missing data and robust methods, assessment of fit and software. Part IV consists of exemplary modeling and data analyses written by methodologists working in specific disciplines. Combining practical pieces with overviews of the field, this Handbook is essential reading for any student or researcher looking to apply multilevel techniques in their own research.
- Published
- 2013
49. An Empirical Study of Factors Relating to the Success of Broadway Shows*
- Author
-
Lan Ma and Jeffrey S. Simonoff
- Subjects
Economics and Econometrics, Empirical research, Proportional hazards model, Longevity, Attendance, Statistics, Probability and Uncertainty, Business and International Management
This article uses the Cox proportional hazards model to analyze recent Broadway show data to investigate the factors that relate to the longevity of shows. The type of show, whether a show is a revival, and first-week attendance for the show are predictive for longevity. Favorable critic reviews in the New York Daily News are related to greater success, but reviews in the New York Times are not. Winning major Tony Awards is associated with a longer run for a show, but being nominated for Tonys and then losing is associated with a shorter postaward run.
- Published
- 2003
50. Score Tests for the Single Index Model
- Author
-
Chih-Ling Tsai and Jeffrey S. Simonoff
- Subjects
Statistics and Probability ,Heteroscedasticity ,Single-index model ,Applied Mathematics ,Modeling and Simulation ,Model selection ,Autocorrelation ,Linear regression ,Monte Carlo method ,Statistics ,Nonparametric statistics ,Nonlinear regression ,Mathematics - Abstract
The single index model is a generalization of the linear regression model with E(y|x) = g(x′β), where g is an unknown function. The model provides a flexible alternative to the linear regression model while providing more structure than a fully nonparametric approach. Although the fitting of single index models does not require distributional assumptions on the error term, the properties of the estimates depend on such assumptions, as does practical application of the model. In this article score tests are derived for three potential misspecifications of the single index model: heteroscedasticity in the errors, autocorrelation in the errors, and the omission of an important variable in the linear index. These tests have a similar structure to corresponding tests for nonlinear regression models. Monte Carlo simulations demonstrate that the first two tests hold their nominal size well and have good power properties in identifying model violations, often outperforming other tests. Testing for the need for ad...
- Published
- 2002