47 results
Search Results
2. Spatial analysis of child labour in India
- Author
- Prashad, Lokender, Dutta, Mili, and Dash, Bishnu Mohan
- Published
- 2021
- Full Text
- View/download PDF
3. Excess-Risk Consistency of Group-hard Thresholding Estimator in Robust Estimation of Gaussian Mean.
- Author
- Minasyan, A. G.
- Abstract
In this work we introduce the notion of the excess risk in the setup of estimation of the Gaussian mean when the observations are corrupted by outliers. It is known that the sample mean loses its good properties in the presence of outliers [5-6]. In addition, even the sample median is not minimax-rate-optimal in the multivariate setting. The optimal rate of the minimax risk in this setting was established by [1]. However, even these minimax-rate-optimality results do not quantify how fast the risk in the contaminated model approaches the risk in the uncontaminated model when the rate of contamination goes to zero. The present paper takes a first step toward filling this gap by showing that the group hard thresholding estimator has an excess risk that goes to zero as the corruption rate approaches zero. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
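To fix notation for record 3, here is a minimal sketch of how such an excess-risk statement is usually set up, in our own symbols (the paper's exact contamination model and risk may differ). One observes $Y_1,\dots,Y_n$, an unknown $\varepsilon$-fraction of which are arbitrary outliers while the rest are drawn from $\mathcal{N}(\theta, I_d)$:

\[
  \mathcal{E}_\varepsilon(\widehat\theta)
    \;=\; \underbrace{\sup_{\text{outliers}} \mathbb{E}\,\lVert \widehat\theta - \theta \rVert^2}_{\text{risk, contaminated model}}
    \;-\; \underbrace{\mathbb{E}_{\varepsilon=0}\,\lVert \widehat\theta - \theta \rVert^2}_{\text{risk, uncontaminated model}} .
\]

The abstract's claim is then that the group hard thresholding estimator satisfies $\mathcal{E}_\varepsilon(\widehat\theta) \to 0$ as $\varepsilon \to 0$.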
4. The statistical complexity of early-stopped mirror descent.
- Author
- Kanade, Varun, Rebeschini, Patrick, and Vaškevičius, Tomas
- Subjects
- OPTIMIZATION algorithms, MIRRORS, OPTIMAL stopping (Mathematical statistics)
- Abstract
Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early-stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk. We consider the set-up of learning linear models and kernel methods for strongly convex and Lipschitz loss functions while imposing only boundedness conditions on the unknown data-generating mechanism. By completing an inequality that characterizes convexity for the squared loss, we identify an intrinsic link between offset Rademacher complexities and potential-based convergence analysis of mirror descent methods. Our observation immediately yields excess risk guarantees for the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step size and the number of iterations. We apply our theory to recover, in a clean and elegant manner via rather short proofs, some of the recent results in the implicit regularization literature while also showing how to improve upon them in some settings. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
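Record 4 studies the iterate path of unconstrained mirror descent run on the unregularized empirical risk, with early stopping acting as implicit regularization. Below is a minimal runnable sketch under assumptions of ours, not the paper's: Euclidean mirror map (so the update is plain gradient descent), logistic loss on synthetic linear data, and stopping chosen by held-out risk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d) / np.sqrt(d)
y = np.where(X @ w_star + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)
X_val = rng.normal(size=(400, d))                     # proxy for the population risk
y_val = np.where(X_val @ w_star + 0.5 * rng.normal(size=400) > 0, 1.0, -1.0)

def risk(w, X, y):
    # logistic loss: convex and Lipschitz on bounded data, as in the abstract
    return float(np.mean(np.log1p(np.exp(-y * (X @ w)))))

def grad(w, X, y):
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))             # sigmoid(-y * <x, w>)
    return X.T @ (-y * s) / len(y)

w = np.zeros(d)      # initialization point (enters the theory via the mirror map)
eta = 0.5            # step size
best_t, best_risk = 0, np.inf
for t in range(1, 201):
    w -= eta * grad(w, X, y)                          # unregularized ERM step
    r = risk(w, X_val, y_val)
    if r < best_risk:
        best_t, best_risk = t, r
print(f"early stopping at t={best_t}, held-out risk {best_risk:.4f}")
```

The paper bounds the excess risk along this path via offset Rademacher complexities determined by the mirror map, initialization, step size, and iteration count; the held-out stopping rule above is only a practical stand-in.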
5. Establishing a valid approach for estimating familial risk of cancer explained by common genetic variants.
- Author
- Weigl, Korbinian, Chang‐Claude, Jenny, Hsu, Li, Hoffmeister, Michael, and Brenner, Hermann
- Subjects
- COLON cancer, FAMILY history (Medicine), CANCER, GENETIC epidemiology
- Abstract
We critically examined existing approaches for the estimation of the excess familial risk of cancer that can be attributed to identified common genetic risk variants and propose an alternative, more straightforward approach for calculating this proportion using well‐established epidemiological methodology. We applied the underlying equations of the traditional approaches and the new epidemiological approach for colorectal cancer (CRC) in a large population‐based case–control study in Germany with 4,447 cases and 3,480 controls, who were recruited from 2003 to 2016 and for whom interview, medical and genomic data were available. Having a family history of CRC (FH) was associated with a 1.77‐fold risk increase in our study population (95% CI 1.52–2.07). Traditional approaches yielded estimates of the FH‐associated risk explained by 97 common genetic variants ranging from 9.6% to 23.1%, depending on various assumptions. Our alternative approach resulted in smaller and more consistent estimates of this proportion, ranging from 5.4% to 14.3%. Commonly employed methods may lead to strongly divergent and possibly exaggerated estimates of the excess familial risk of cancer explained by associated known common genetic variants. Our results suggest that familial risk and risk associated with known common genetic variants might reflect two complementary major sources of risk. What's new? Today's methods to evaluate excess familial risk of cancer explained by associated known common genetic variants may lead to strongly divergent and possibly exaggerated estimates. This paper presents an alternative, more straightforward approach using well‐established epidemiological methodology. Application in a large population‐based case–control study of colorectal cancer supports suggestions that this proportion may be substantially smaller than previously assumed and highly dependent on SNP pruning methods, the assumed risk for having a family history of CRC, and the number of identified SNPs. Rather than reflecting a major subcomponent of familial risk, common genetic variants appear to reflect substantial complementary risk. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
6. Reduction based similarity learning for high dimensional problems
- Author
- Iofina, G. V. and Maximov, Yu. V.
- Published
- 2016
- Full Text
- View/download PDF
7. Deep learning for ψ-weakly dependent processes.
- Author
- Kengne, William and Wade, Modou
- Subjects
- DEEP learning, ARTIFICIAL neural networks, CAUSAL models, TIME series analysis
- Abstract
In this paper, we study deep neural networks for learning stationary ψ-weakly dependent processes. This weak-dependence property covers a class of weak dependence conditions such as mixing and association, and the setting considered here covers many commonly used situations, such as regression estimation, time series prediction, and time series classification. The consistency of the empirical risk minimization algorithm in the class of deep neural network predictors is established. We achieve the generalization bound and obtain an asymptotic learning rate, which is less than $\mathcal{O}(n^{-1/\alpha})$, for all $\alpha > 2$. A bound on the excess risk, for a wide class of target functions, is also derived. Applications to binary time series classification and prediction in affine causal models with exogenous covariates are carried out. Some simulation results are provided, as well as an application to the US recession data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Fast generalization rates for distance metric learning.
- Author
- Ye, Han-Jia, Zhan, De-Chuan, and Jiang, Yuan
- Subjects
- MACHINE learning, METRIC spaces, K-nearest neighbor classification, PAIRED comparisons (Mathematics), SUPERVISED learning
- Abstract
Distance metric learning (DML) aims to find a suitable measure to compute a distance between instances. Facilitated by side information, the learned metric can often improve the performance of similarity- or distance-based methods such as kNN. Theoretical analyses of DML focus on the learning effectiveness for the squared Mahalanobis distance: specifically, whether the Mahalanobis metric learned from empirically sampled pairwise constraints is in accordance with the optimal metric optimized over the paired samples generated from the true distribution, and what the sample complexity of this process is. The excess risk measures the quality of the generalization, i.e., the gap between the expected objective of the empirical metric learned from a regularized objective with a convex loss function and that of the optimal metric. Given N training examples, existing analyses of this non-i.i.d. learning problem have proved that the excess risk of DML converges to zero at a rate of $O(1/\sqrt{N})$. In this paper, we obtain a faster convergence rate of DML, $O(1/N)$, when learning the distance metric with a smooth loss function and a strongly convex objective. In addition, when the problem is relatively easy and the number of training samples is large enough, this rate can be further improved to $O(1/N^2)$. Synthetic experiments validate that DML can achieve the specified faster generalization rates, and results under various settings help explore the theoretical properties of DML. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
9. ℓ1-Norm support vector machine for ranking with exponentially strongly mixing sequence.
- Author
- Chen, Di-Rong and Huang, Shou-You
- Subjects
- SUPPORT vector machines, EXPONENTIAL functions, SEQUENCE analysis, UBIQUITOUS computing, PROBLEM solving
- Abstract
The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. Ranking from binary comparisons is a ubiquitous problem in modern machine learning applications. In this paper, we consider the ℓ1-norm SVM for ranking. As is well known, learning with ℓ1-norm restrictions usually leads to sparsity. Moreover, instead of an independently drawn sample sequence, we are given a sample from an exponentially strongly mixing sequence. Under some mild conditions, a learning rate is established. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
10. An error analysis for deep binary classification with sigmoid loss.
- Author
- Li, Changshi, Jiao, Yuling, and Yang, Jerry Zhijian
- Subjects
- ARTIFICIAL neural networks, CONVEX functions, BUSINESS losses
- Abstract
Deep neural networks have demonstrated remarkable efficacy in diverse classification tasks. In this paper, we specifically focus on the predictive performance in deep binary classification problems with the sigmoid loss. Given that the sigmoid loss is a non-convex and bounded loss function, it exhibits potential resilience against the disruptive impact of outlier noise. We first derive the convergence rate of the excess misclassification risk for deep ReLU neural networks with the sigmoid loss, a result that attains minimax optimality. To the best of our knowledge, we are the first to derive the convergence rate for the sigmoid loss. Moreover, we extend our analysis to derive a faster convergence rate under margin assumptions. This renders our findings comparable to those for commonly employed convex loss functions operating under analogous assumptions. Lastly, we undertake a comprehensive validation of the robustness inherent in the sigmoid loss across diverse datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Spatio-temporal variability and possible source identification of criteria pollutants from Ahmedabad-a megacity of Western India
- Author
- Bano, Shahana, Anand, Vrinda, Kalbande, Ritesh, Beig, Gufran, and Rathore, Devendra Singh
- Published
- 2024
- Full Text
- View/download PDF
12. Air pollution characteristics, health risks, and typical pollution processes in autumn and winter in a central city of China
- Author
- Wang, Qianheng, Yao, Sen, Tao, Jie, Xu, Yifei, Yan, Huijiao, Zhang, Hanyu, Yang, Shushen, and Fan, Fengjuan
- Published
- 2023
- Full Text
- View/download PDF
13. An observational study on risk of secondary cancers in chronic myeloid leukemia patients in the TKI era in the United States.
- Author
- Kumar, Vivek, Garg, Mohit, Chaudhary, Neha, and Chandra, Abhinav Binod
- Subjects
- CHRONIC myeloid leukemia, PROTEIN-tyrosine kinase inhibitors, EPIDEMIOLOGY, DISEASE incidence, MEDICAL databases
- Abstract
Introduction: Treatment with tyrosine kinase inhibitors (TKIs) has drastically improved the outcome of chronic myeloid leukemia (CML) patients. This study was conducted to examine the risk of secondary cancers (SCs) in CML patients who were diagnosed and treated in the TKI era in the United States. Methods: The Surveillance, Epidemiology, and End Results (SEER) database was used to identify CML patients who were diagnosed and received treatment during January 2002-December 2014. Standardized incidence ratios (SIRs) and absolute excess risks (AERs) were calculated. Results: Overall, 511 SCs (excluding acute leukemia) developed in 9,200 CML patients followed for 38,433 person-years. The risk of developing SCs in CML patients was 30% higher than in the age-, sex- and race-matched standard population (SIR 1.30, 95% CI: 1.2-1.40; p < 0.001). The SIRs for CLL (SIR 3.4, 95% CI: 2-5.5; p < 0.001), thyroid (SIR 2.2, 95% CI: 1.2-3.5; p < 0.001), small intestine (SIR 3.1, 95% CI: 1.1-7; p = 0.004), gingiva (SIR 3.7, 95% CI: 1.2-8.7; p = 0.002), stomach (SIR 2.1, 95% CI: 1.1-3.5; p = 0.005), lung (SIR 1.4, 95% CI: 1.1-1.7; p = 0.006) and prostate (SIR 1.3, 95% CI: 1.02-1.6; p = 0.026) cancer among CML patients were significantly higher than in the general population. The risk of SCs was higher irrespective of age and was highest in the period 2-12 months after the diagnosis of CML. The risk of SCs in women was similar to that of the general population. Conclusion: CML patients diagnosed and treated in the TKI era in the United States are at an increased risk of developing a second malignancy. The increased risk of SCs in the early period after CML diagnosis suggests that the risk of SCs may be increased due to factors other than TKI treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
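Record 13's headline figures follow from the usual registry definitions: the standardized incidence ratio is SIR = O/E (observed over expected events) and the absolute excess risk is the excess event count per person-time. A minimal sketch; the expected count is back-calculated from the reported SIR, and the Poisson-approximation CI is ours, not necessarily the paper's exact method.

```python
import math

observed = 511                 # secondary cancers (record 13)
person_years = 38_433
sir_reported = 1.30
expected = observed / sir_reported         # ~393, back-calculated for illustration

# AER: excess events per 10,000 person-years
aer = (observed - expected) / person_years * 10_000

# Rough large-sample 95% CI for the SIR, treating E as fixed:
se_log_sir = 1.0 / math.sqrt(observed)
lo = sir_reported * math.exp(-1.96 * se_log_sir)
hi = sir_reported * math.exp(+1.96 * se_log_sir)
print(f"SIR = {sir_reported:.2f} (95% CI {lo:.2f}-{hi:.2f}), AER = {aer:.1f} per 10,000 PY")
```

Running this gives a CI of roughly 1.19-1.42, consistent with the reported 1.2-1.40.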
14. Minimum Excess Risk in Bayesian Learning.
- Author
- Xu, Aolin and Raginsky, Maxim
- Subjects
- PROBABILISTIC generative models, EPISTEMIC uncertainty, INFORMATION modeling, PARAMETRIC modeling, PREDICTION models
- Abstract
We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known. The definition of MER provides a principled way to define different notions of uncertainties in Bayesian learning, including the aleatoric uncertainty and the minimum epistemic uncertainty. Two methods for deriving upper bounds for the MER are presented. The first method, generally suitable for Bayesian learning with a parametric generative model, upper-bounds the MER by the conditional mutual information between the model parameters and the quantity being predicted given the observed data. It allows us to quantify the rate at which the MER decays to zero as more data becomes available. Under realizable models, this method also relates the MER to the richness of the generative function class, notably the VC dimension in binary classification. The second method, particularly suitable for Bayesian learning with a parametric predictive model, relates the MER to the minimum estimation error of the model parameters from data via various continuity arguments. We also extend the definition and analysis of MER to the setting with multiple model families and the setting with nonparametric models. Along the discussions we draw some comparisons between the MER in Bayesian learning and the excess risk in frequentist learning. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
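Record 14 defines the MER as a gap between two optimal losses; in our notation (the paper's symbols may differ), with $W$ the model realization, $D$ the observed data, and $\ell$ a loss:

\[
  \mathrm{MER}
    \;=\; \underbrace{\inf_{\psi}\ \mathbb{E}\,\ell\bigl(Y,\psi(X,D)\bigr)}_{\text{best predictor learned from data}}
    \;-\; \underbrace{\mathbb{E}\Bigl[\inf_{\phi}\ \mathbb{E}\bigl[\ell(Y,\phi(X))\mid W\bigr]\Bigr]}_{\text{best predictor given the model }W} .
\]

The second term isolates the aleatoric uncertainty, so the MER itself is the minimum epistemic uncertainty; the abstract's first bounding method controls it by the conditional mutual information between $W$ and the predicted quantity given the observed data.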
15. Indoor environmental quality in naturally ventilated schools of a dusty region: Excess health risks and effect of heating and desert dust transport.
- Author
- Sahin, Cagri, Rastgeldi Dogan, Tuba, Yildiz, Melek, and Sofuoglu, Sait C.
- Subjects
- ENVIRONMENTAL quality, DUST, PARTICULATE matter, INDOOR air quality, AIR pollution, DESERTS, HUMIDITY
- Abstract
Indoor air quality (IAQ) is impacted by polluted outdoor air in naturally ventilated schools, especially in places where both anthropogenic and natural sources of ambient air pollution exist. CO2, PM2.5, PM10, temperature, relative humidity (RH), and noise were measured in five naturally ventilated primary schools in the city of Sanliurfa, in a dusty region of Turkey (Southeast Anatolia). Excess risk levels were estimated for particulate matter. The investigation spanned an educational year covering two seasonal contrasts: an anthropogenic one (heating vs. non-heating) and a natural one (desert dust transport vs. no dust transport). The median CO2 concentration was >1000 ppm in all seasons and schools. Temperature and RH fell outside the comfort zone in October–December, during which pollutant concentrations increased considerably, specifically in November, when the heating and dust-transport periods coincide. The overall mean indoor PM10 and PM2.5 levels were 58 and 31.8 μg/m3, respectively. Risk assessment indicates that both short-term (incidence of asthma symptoms in asthmatic children) and long-term (prevalence of bronchitis) effects are considerable, at 10.9 (2.4–19.6)% and 19.5 (2.2–38.8)%, respectively. The findings suggest that mechanical ventilation retrofitting with particle filtration is needed to mitigate potential negative health consequences for children. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
16. Learning rate of support vector machine for ranking.
- Author
- Chen, Heng and Chen, Di-Rong
- Subjects
- SUPPORT vector machines, RANKING (Statistics), MACHINE learning, U-statistics, PROBLEM solving, LEARNING curve
- Abstract
The ranking problem has received increasing attention in both the statistical and machine learning literature. This paper considers support vector machines for ranking. Under some mild conditions, a learning rate is established. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
17. On Oracle Inequalities Related to High Dimensional Linear Models.
- Author
- Golubev, Yuri
- Abstract
We consider the problem of estimating an unknown vector θ from the noisy data Y = Aθ + ε, where A is a known m × n matrix and ε is white Gaussian noise. It is assumed that n is large and A is ill-posed. Therefore, in order to estimate θ, a spectral regularization method is used, and our goal is to choose a spectral regularization parameter with the help of the data Y. We study data-driven regularization methods based on the empirical risk minimization principle and provide some new oracle inequalities related to this approach. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
18. Effect of ambient O3 on mortality due to circulatory and respiratory diseases in a high latitude city of northeast China
- Author
- Zhang, Yifan, Ma, Yuxia, Shen, Jiahui, Li, Heping, Wang, Hang, Cheng, Bowen, and Ma, Liya
- Published
- 2022
- Full Text
- View/download PDF
19. Short-term effects of multiple ozone metrics on outpatient visits for urticaria in Lanzhou, China
- Author
- Zhang, Jing, He, Yuan, and Shi, Chunrui
- Published
- 2022
- Full Text
- View/download PDF
20. On Ranking and Generalization Bounds.
- Author
- Rejchel, Wojciech
- Subjects
- MATHEMATICAL bounds, EMPIRICAL research, DATA analysis, GENERALIZATION, RANKING (Statistics), ESTIMATION theory
- Abstract
The problem of ranking is to predict or to guess the ordering between objects on the basis of their observed features. In this paper we consider ranking estimators that minimize the empirical convex risk. We prove generalization bounds for the excess risk of such estimators with rates that are faster than $1/\sqrt{n}$. We apply our results to commonly used ranking algorithms, for instance boosting or support vector machines. Moreover, we study the performance of the considered estimators on real data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2012
21. Spatiotemporal analysis of dengue fever in Burkina Faso from 2016 to 2019
- Author
- Ouattara, Cheick Ahmed, Traore, Seydou, Sangare, Ibrahim, Traore, Tiandiogo Isidore, Meda, Ziemlé Clément, and Savadogo, Léon G. Blaise
- Published
- 2022
- Full Text
- View/download PDF
22. Short-term effects of outdoor particulate matter pollution on outpatient visits for urticaria in Lanzhou, China
- Author
- He, Yuan, Shi, Chunrui, Ling, Feifei, Qi, Jinjie, Guang, Qi, Luo, Zhicheng, and Xi, Qun
- Published
- 2021
- Full Text
- View/download PDF
23. Adaptive sequential machine learning.
- Author
- Wilson, Craig, Bu, Yuheng, and Veeravalli, Venugopal V.
- Subjects
- SEQUENTIAL learning, MACHINE learning, PROCESS optimization, LEARNING problems, RANDOM forest algorithms, SUPPORT vector machines
- Abstract
A framework previously introduced in Wilson et al. (2018) for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification. The stochastic optimization problems arising in these machine learning problems are solved using algorithms such as stochastic gradient descent (SGD). A method based on estimates of the change in the minimizers and properties of the optimization algorithm is introduced for adaptively selecting the number of samples at each time step to ensure that the excess risk—that is, the expected gap between the loss achieved by the approximate minimizer produced by the optimization algorithm and the exact minimizer—does not exceed a target level. A bound is developed to show that the estimate of the change in the minimizers is nontrivial provided that the excess risk is small enough. Extensions relevant to the machine learning setting are considered, including a cost-based approach to select the number of samples with a cost budget over a fixed horizon, and an approach to applying cross-validation for model selection. Finally, experiments with synthetic and real data are used to validate the algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
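Record 23's core device is to pick, at each time step, a sample size large enough that an excess-risk bound stays below a target once the estimated change in the minimizers (drift) is accounted for. A toy sketch of that logic; the bound form a/sqrt(n) + drift and the constant are stand-ins, not the paper's estimates.

```python
import math

def samples_needed(target, drift_estimate, a=1.0):
    """Smallest n with a/sqrt(n) + drift_estimate <= target (toy excess-risk bound)."""
    budget = target - drift_estimate   # excess-risk budget left after the drift term
    if budget <= 0:
        return None                    # target unattainable at this drift level
    return math.ceil((a / budget) ** 2)

target = 0.10
for t, drift in enumerate([0.00, 0.02, 0.05, 0.08]):
    n_t = samples_needed(target, drift)
    print(f"step {t}: estimated minimizer change {drift:.2f} -> draw n_t = {n_t}")
```

The paper's bound additionally shows the drift estimate itself is nontrivial when the excess risk is small, which is what makes this feedback loop sound.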
24. Patient, tumor, and healthcare factors associated with regional variability in lung cancer survival: a Spanish high-resolution population-based study
- Author
- Rodríguez-Barranco, M., Salamanca-Fernández, E., Fajardo, M. L., Bayo, E., Chang-Chan, Y.-L., Expósito, J., García, C., Tallón, J., Minicozzi, P., Sant, M., Petrova, D., Luque-Fernandez, M. A., and Sánchez, M.-J.
- Published
- 2019
- Full Text
- View/download PDF
25. Examples of Excess Risk Bounds in Prediction Problems
- Author
- Koltchinskii, Vladimir
- Published
- 2011
- Full Text
- View/download PDF
26. Introduction
- Author
- Koltchinskii, Vladimir
- Published
- 2011
- Full Text
- View/download PDF
27. Weighted least squares estimation for exchangeable binary data.
- Author
- Bowman, Dale and George, E.
- Subjects
- LEAST squares, MAXIMUM likelihood statistics, COMPUTATIONAL complexity, PARAMETRIC equations, NEWTON-Raphson method
- Abstract
Parametric models of discrete data with exchangeable dependence structure present substantial computational challenges for maximum likelihood estimation. Coordinate descent algorithms, such as Newton's method, are usually unstable, making estimation a hit-or-miss affair that depends on initialization with a good starting value. We propose a method for computing maximum likelihood estimates of parametric models for finitely exchangeable binary data, formalized as an iterative weighted least squares algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
28. Adaptive spectral regularizations of high dimensional linear models
- Author
- Yuri Golubev
- Subjects
- Statistics and Probability, Statistics Theory (math.ST), 62C10, 62G05, empirical risk minimization, linear model, high dimensional, ordered smoother, spectral regularization, regularization, oracle inequality, additive white Gaussian noise, excess risk, applied mathematics
- Abstract
This paper focuses on recovering an unknown vector $\beta$ from the noisy data $Y=X\beta +\sigma\xi$, where $X$ is a known $n\times p$-matrix, $\xi$ is a standard white Gaussian noise, and $\sigma$ is an unknown noise level. In order to estimate $\beta$, a spectral regularization method is used, and our goal is to choose its regularization parameter with the help of the data $Y$. In this paper, we deal solely with regularization methods based on the so-called ordered smoothers and provide some oracle inequalities in the case where the noise level is unknown.
- Published
- 2011
29. Generalized Scalar-on-Image Regression Models via Total Variation.
- Author
- Wang, Xiao and Zhu, Hongtu
- Subjects
- SCALAR field theory, LINEAR statistical models, FUNCTIONAL analysis, REGRESSION analysis, ALZHEIMER'S disease, BRAIN imaging
- Abstract
The use of imaging markers to predict clinical outcomes can have a great impact in public health. The aim of this article is to develop a class of generalized scalar-on-image regression models via total variation (GSIRM-TV), in the sense of generalized linear models, for scalar response and imaging predictor in the presence of scalar covariates. A key novelty of GSIRM-TV is the assumption that the slope function (or image) belongs to the space of bounded total variation, to explicitly account for the piecewise smooth nature of most imaging data. We develop an efficient penalized total variation optimization to estimate the unknown slope function and other parameters. We also establish nonasymptotic error bounds on the excess risk. These bounds are explicitly specified in terms of sample size, image size, and image smoothness. Our simulations demonstrate a superior performance of GSIRM-TV against many existing approaches. We apply GSIRM-TV to the analysis of hippocampus data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
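Record 29's estimator can be written schematically (our notation) as a total-variation-penalized generalized-linear-model fit,

\[
  \widehat\beta \;=\; \arg\min_{\beta}\;
    \frac{1}{n}\sum_{i=1}^{n} -\log p\bigl(y_i \mid \langle X_i,\beta\rangle + z_i^{\top}\gamma\bigr)
    \;+\; \lambda\,\mathrm{TV}(\beta),
\]

where $X_i$ is the imaging predictor, $z_i$ the scalar covariates, and $\mathrm{TV}(\beta)$ the total variation of the slope image, which encodes the piecewise-smooth prior the abstract describes; the nonasymptotic excess-risk bounds are then stated in terms of sample size, image size, and image smoothness.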
30. Weighted least squares estimation for exchangeable binary data
- Author
- Bowman, Dale and George, E. Olusegun
- Published
- 2016
- Full Text
- View/download PDF
31. Spatiotemporal analysis of dengue fever in Nepal from 2010 to 2014.
- Author
- Acharya, Bipin Kumar, Cao, ChunXiang, Lakes, Tobia, Chen, Wei, and Naeem, Shahid
- Subjects
- SPATIOTEMPORAL processes, DENGUE, CLUSTER analysis (Statistics), SPACETIME, COMPUTER software, PUBLIC health, PUBLIC health surveillance, STATISTICS, DISEASE incidence
- Abstract
Background: Due to its recent emergence, dengue is becoming one of the major public health problems in Nepal. The number of reported dengue cases and the area with reported cases have both increased continuously in recent years. However, spatiotemporal patterns and clusters of dengue have not yet been investigated. This study aims to fill this gap by analyzing spatiotemporal patterns based on monthly surveillance data aggregated at the district level. Methods: Dengue cases from 2010 to 2014 at the district level were collected from the Nepal government's health and mapping agencies. GeoDa software was used to map crude incidence, excess hazard and spatially smoothed incidence. Cluster analysis was performed in SaTScan software to explore spatiotemporal clusters of dengue during the above-mentioned time period. Results: The spatiotemporal distribution of dengue fever in Nepal from 2010 to 2014 was mapped at the district level in terms of crude incidence, excess risk and spatially smoothed incidence. Results show that the distribution of dengue fever was not random but clustered in space and time. Chitwan district was identified as the most likely cluster and Jhapa district as the first secondary cluster in both the spatial and spatiotemporal scans. July to September of 2010 was identified as a significant temporal cluster. Conclusion: This study assessed and mapped for the first time the spatiotemporal pattern of dengue fever in Nepal. Two districts, Chitwan and Jhapa, were found to be highly affected by dengue fever. The current study also demonstrated the importance of a geospatial approach in epidemiological research. The initial results on dengue patterns and risk may assist institutions and policy makers in developing better preventive strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
32. Stochastic mechanistic interaction.
- Author
- Berzuini, Carlo and Dawid, A. Philip
- Subjects
- INFERENTIAL statistics, SUPERADDITIVITY, MATHEMATICAL inequalities, GRAPH theory, PROBABILITY theory
- Abstract
We define mechanistic interaction between the effects of two variables on an outcome in terms of departure of these effects from a generalized noisy-OR model in a stratum of the population. We develop a fully probabilistic framework for the observational identification of this type of interaction via excess risk or superadditivity, one novel feature of which is its applicability when the interacting variables have been generated by arbitrarily dichotomizing continuous exposures. The method allows for stochastic mediators of the interacting effects. The required assumptions are provided in the form of conditional independencies between the problem variables, which may relate to a causal-graph representation of the problem. We also develop a theory of mechanistic interaction between effects associated with specific paths of the causal graph. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
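Record 32's "excess risk or superadditivity" criterion has a classical two-exposure form (stated here in standard notation; the paper generalizes it to dichotomized continuous exposures and stochastic mediators): mechanistic interaction is signaled when the interaction contrast is positive,

\[
  p_{11} - p_{10} - p_{01} + p_{00} \;>\; 0,
  \qquad
  p_{ab} \;=\; \Pr\bigl(Y=1 \mid A=a,\, B=b\bigr),
\]

i.e. the joint effect of $A$ and $B$ exceeds the sum of their separate effects, which under the paper's conditional-independence assumptions indicates departure from the generalized noisy-OR model.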
33. Excess risk bounds in robust empirical risk minimization
- Author
- Mathieu, Timothée and Minsker, Stanislav
- Subjects
- Statistics and Probability, Machine Learning (stat.ML, cs.LG), empirical risk minimization, median-of-means, excess risk, estimator, classification, regression, robust estimation, outlier, stochastic process, sample size determination, marginal distribution, 62G35
- Abstract
This paper investigates robust versions of the general empirical risk minimization algorithm, one of the core techniques underlying modern statistical methods. The success of empirical risk minimization rests on the fact that for a 'well-behaved' stochastic process $\left \{ f(X), \ f\in \mathscr F\right \}$ indexed by a class of functions $f\in \mathscr F$, averages $\frac{1}{N}\sum _{j=1}^N f(X_j)$ evaluated over a sample $X_1,\ldots ,X_N$ of i.i.d. copies of $X$ provide good approximation to the expectations $\mathbb E f(X)$, uniformly over large classes $f\in \mathscr F$. However, this might no longer be true if the marginal distributions of the process are heavy tailed or if the sample contains outliers. We propose a version of empirical risk minimization based on the idea of replacing sample averages by robust proxies of the expectations and obtain high-confidence bounds for the excess risk of the resulting estimators. In particular, we show that the excess risk of robust estimators can converge to $0$ at fast rates with respect to the sample size $N$, meaning rates faster than $N^{-1/2}$. We discuss implications of the main results for the linear and logistic regression problems and evaluate the numerical performance of the proposed methods on simulated and real data.
- Published
- 2021
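Record 33's subject terms name median-of-means as the robust proxy that replaces sample averages. A minimal sketch of that estimator (the block count k is a tuning choice; the paper's construction is more general):

```python
import numpy as np

def median_of_means(x, k=10, seed=0):
    """Split x into k random blocks, average each block, return the median of means."""
    x = np.random.default_rng(seed).permutation(np.asarray(x, dtype=float))
    return float(np.median([b.mean() for b in np.array_split(x, k)]))

rng = np.random.default_rng(1)
clean = rng.normal(loc=1.0, size=990)       # true mean 1.0
outliers = np.full(10, 1e4)                 # gross corruption
x = np.concatenate([clean, outliers])
print(f"sample mean = {x.mean():.2f}, median-of-means = {median_of_means(x, k=30):.2f}")
```

With 30 blocks, at most 10 block means are contaminated, so the median of the 30 means stays near 1.0 while the plain average is dragged to about 101; plugging such proxies into ERM is what yields the paper's high-confidence excess-risk bounds.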
34. Optimal Exponential Bounds on the Accuracy of Classification.
- Author
- Kerkyacharian, G., Tsybakov, A., Temlyakov, V., Picard, D., and Koltchinskii, V.
- Subjects
- EXPONENTIAL functions, BINARY number system, REGRESSION analysis, ERROR analysis in mathematics, DISTRIBUTION (Probability theory), DATA analysis
- Abstract
Consider a standard binary classification problem, in which $(X,Y)$ is a random couple in $\mathcal{X}\times \{0,1\}$, and the training data consist of $n$ i.i.d. copies of $(X,Y)$. Given a binary classifier $f:\mathcal{X}\mapsto \{0,1\}$, the generalization error of $f$ is defined by $R(f)={\mathbb P}\{Y\ne f(X)\}$. Its minimum $R^*$ over all binary classifiers $f$ is called the Bayes risk and is attained at a Bayes classifier. The performance of any binary classifier $\hat{f}_n$ based on the training data is characterized by the excess risk $R(\hat{f}_n)-R^*$. We study Bahadur-type exponential bounds on the following minimax accuracy confidence function based on the excess risk: [Equation not available: see fulltext.] where the supremum is taken over all distributions $P$ of $(X,Y)$ from a given class of distributions $\mathcal{M}$ and the infimum is over all binary classifiers $\hat{f}_n$ based on the training data. We study how this quantity depends on the complexity of the class of distributions $\mathcal{M}$, characterized by exponents of entropies of the class of regression functions or of the class of Bayes classifiers corresponding to the distributions from $\mathcal{M}$. We also study its dependence on margin parameters of the classification problem. In particular, we show that, in the case when $\mathcal{X}=[0,1]^d$ and $\mathcal{M}$ is the class of all distributions satisfying the margin condition with exponent $\alpha >0$ and such that the regression function $\eta$ belongs to a given Hölder class of smoothness $\beta >0$: [Equation not available: see fulltext.] for some constants $D,\lambda _0>0$. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
35. Deep determinism and the assessment of mechanistic interaction.
- Author
- Berzuini, Carlo and Dawid, A. Philip
- Subjects
- HEART diseases, SYMMETRY (Biology), ACQUISITION of data, BIOMECHANICS, EPISTASIS (Genetics), BIOMETRY
- Abstract
Given two variables that causally influence a binary response, we formalize the idea that their effects operate through a common mechanism, in which case we say that the two variables interact mechanistically. We introduce a mechanistic interaction relationship of “interference” that is asymmetric in the two causal factors. Conditions and assumptions under which such mechanistic interaction can be tested under a given regime of data collection, be it interventional or observational, are expressed in terms of conditional independence relationships between the problem variables, which can be manipulated with the aid of causal diagrams. The proposed method is able, under appropriate conditions, to test for interaction between direct effects, and to deal with the situation where one of the two factors is a dichotomized version of a continuous variable. The method is illustrated with the aid of a study on heart disease. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
36. The impact of model uncertainty on benchmark dose estimation.
- Author
- West, R. Webster, Piegorsch, Walter W., Peña, Edsel A., An, Lingling, Wu, Wensong, Wickens, Alissa A., Xiong, Hui, and Chen, Wenhai
- Subjects
- TOXICITY testing, DOSE-response relationship in biochemistry, UNCERTAINTY, BENCHMARKING (Management), EXPERIMENTAL toxicology
- Abstract
We study the popular benchmark dose (BMD) approach for estimation of low exposure levels in toxicological risk assessment, focusing on dose-response experiments with quantal data. In such settings, representations of the risk are traditionally based on a specified, parametric, dose-response model. It is a well-known concern, however, that uncertainty can exist in specification and selection of the model. If the chosen parametric form is in fact misspecified, this can lead to inaccurate, and possibly unsafe, low-dose inferences. We study the effects of model selection and possible misspecification on the BMD, on its corresponding lower confidence limit (BMDL), and on the associated extra risks achieved at these values, via large-scale Monte Carlo simulation. It is seen that an uncomfortably high percentage of instances can occur where the true extra risk at the BMDL under a misspecified or incorrectly selected model can surpass the target benchmark response, exposing potential dangers of traditional strategies for model selection when calculating BMDs and BMDLs. Copyright © 2012 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
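Record 36's benchmark dose for quantal data solves, under a chosen dose-response model, the extra-risk equation $R(d) = (P(d) - P(0))/(1 - P(0)) = \mathrm{BMR}$. A minimal sketch with an assumed two-parameter logistic model (the parameter values are illustrative only):

```python
import math
from scipy.optimize import brentq

def p(d, a=-3.0, b=1.5):
    """Assumed logistic dose-response: P(response at dose d)."""
    return 1.0 / (1.0 + math.exp(-(a + b * d)))

def extra_risk(d):
    return (p(d) - p(0.0)) / (1.0 - p(0.0))

bmr = 0.10                                   # 10% extra-risk benchmark
bmd = brentq(lambda d: extra_risk(d) - bmr, 1e-9, 10.0)
print(f"BMD at BMR = {bmr:.0%}: {bmd:.3f}")
```

Record 36's warning is precisely that this value is conditional on the model: fit a misspecified curve and the BMDL can sit at a dose whose true extra risk exceeds the target BMR.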
37. General population versus disease-specific event rate and cost estimates: potential bias for economic appraisals.
- Published
- 2010
- Full Text
- View/download PDF
38. Aalen's linear model for sampled risk set data: a large sample study.
- Author
- Zhang, Jian and Borgan, Ornulf
- Abstract
Borgan and Langholz (1997) describe a method for estimating the parameter functions in Aalen's linear hazard regression model from sampled risk set data. Using a counting process formulation and the martingale central limit theorem, we provide a study of the asymptotic distributional properties of the estimator. The results are applied to study the efficiencies of the nested case-control and counter-matched designs relative to a full cohort analysis. [ABSTRACT FROM AUTHOR]
- Published
- 1999
- Full Text
- View/download PDF
39. Two-sample goodness-of-fit tests for additive risk models with censored observations.
- Author
- Kim, Jinheum and Lee, Seung-Yeoun
- Subjects
- GOODNESS-of-fit tests, STATISTICS, STATISTICAL hypothesis testing, CENSORING (Statistics), MARTINGALES (Mathematics)
- Abstract
The additive risk model assumes that the hazard function associated with a set of covariates is the sum of the baseline hazard function and the regression function of covariates. We propose two different test procedures for checking the adequacy of two-sample additive risk models for randomly censored observations. One is based on the martingale residuals and the other on the difference between weighted estimators of the excess risk. The test statistics are shown to be asymptotically normal under appropriate regularity conditions and consistent under any model misspecifications. Finally, two real examples are provided, along with results of a simulation study. [ABSTRACT FROM AUTHOR]
- Published
- 1998
- Full Text
- View/download PDF
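Records 38, 39, and 47 all concern the additive risk model the abstract describes (hazard = baseline plus a regression function of covariates). In standard notation:

\[
  \lambda(t \mid Z) \;=\; \lambda_0(t) + \beta^{\top} Z(t),
\]

so covariates contribute an additive excess risk $\beta^{\top}Z(t)$ on top of the baseline hazard $\lambda_0(t)$, in contrast to the multiplicative Cox model $\lambda_0(t)\exp(\beta^{\top}Z(t))$; Aalen's version (record 38) further lets the coefficients vary with time, $\beta(t)$.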
40. Improved classification rates under refined margin conditions
- Author
- Ingo Steinwart and Ingrid Blaschzyk
- Subjects
- Statistics and Probability, Statistics Theory (math.ST), 62H30, 62G20, 68T05, statistical learning, statistical classification, fast rates of convergence, histogram rule, support vector machine, margin (machine learning), decision boundary, noise, excess risk
- Abstract
In this paper we present a simple partitioning-based technique to refine the statistical analysis of classification algorithms. The core of our idea is to divide the input space into two parts such that the first part contains a suitable vicinity around the decision boundary, while the second part is sufficiently far away from the decision boundary. Using a set of margin conditions we are then able to control the classification error on both parts separately. By balancing out these two error terms we obtain a refined error analysis in a final step. We apply this general idea to the histogram rule and show that even for this simple method we obtain, under certain assumptions, better rates than the ones known for support vector machines, for certain plug-in classifiers, and for a recently analyzed tree-based adaptive-partitioning ansatz. Moreover, we show that a margin condition which sets the critical noise in relation to the decision boundary makes it possible to improve the optimal rates proven for distributions without this margin condition.
- Published
- 2018
41. Epidemiologische und statistische Methoden der Risikoabschätzung [Epidemiological and statistical methods of risk assessment]
- Author
- Behrens, T., Pigeot, I., and Ahrens, W.
- Published
- 2009
- Full Text
- View/download PDF
42. Communicating epidemiological results through alternative indicators: Cognitive interviewing to assess a questionnaire on risk perception in a high environmental risk area
- Author
- Domenica Farinella, Annibale Biggeri, Gianna Terni, and Michela Baccini
- Subjects
- risk communication, questionnaire validation, cognitive interviews, cognitive interviewing, risk perception, excess risk, time needed to harm, high risk area, Livorno, pollution, environment, health, health impact, statistical uncertainty, summary statistics, cognition, epidemiology, environmental epidemiology, applied psychology, social psychology
- Abstract
Participatory approaches to environmental research and decision-making require that all social stakeholders be involved from the onset of the debate. In such a setting, communication among people with different expertise is crucial, but language and technicalities may represent a barrier. In the clinical setting, decisions regarding treatment preferences may be influenced by the summary statistics used, but, according to the literature, no study has compared different statistical indicators for risk communication in environmental epidemiology. In this paper, we report the qualitative results of cognitive interviews conducted to assess two questionnaires devoted to investigating risk perception when selected epidemiological results are communicated using different statistical indicators of health impact and uncertainty. The initial questionnaires were tested on 15 people residing in the high environmental risk area of Livorno (Italy). Cognitive interviewing led to substantial revision of the initial drafts. Moreover, it highlighted the difficulty of communicating statistical uncertainty and the need to account for the complex interaction between mathematical skills, affective factors and individual a priori knowledge in environmental risk perception.
- Published
- 2017
43. Dose-Response in Radiation Carcinogenesis:Human Studies
- Author
- Hoel, David G., Preston, Dale L., and Castellani, Amleto, editor
- Published
- 1985
- Full Text
- View/download PDF
44. Binomial Distribution Sample Confidence Intervals Estimation 6. Excess Risk
- Author
- Sorana Bolboacă and Andrei Achimaş Cadariu
- Subjects
- Excess risk, Confidence interval estimation, Risk factors assessments
- Abstract
We present the problem of confidence interval estimation for the excess risk (the Y/n − X/m fraction), a parameter which allows evaluation of the specificity of an association between predisposing or causal factors and disease in medical studies. The parameter is computed from a 2×2 contingency table of qualitative variables. The aim of this paper is to introduce new methods of computing confidence intervals for the excess risk, called DAC, DAs, DAsC, DBinomial, and DBinomialC, and to compare their performance with that of the asymptotic method, called here DWald. In order to assess the methods, a program was created in the PHP programming language. The performance of each method for different sample sizes and different values of the binomial variables was assessed using a set of criteria. First, the upper and lower boundaries for a given X, Y and a specified sample size were computed for the chosen methods. Second, the average and standard deviation of the experimental errors were assessed, as was the deviation relative to the imposed significance level α = 5%. The methods were assessed on random values of the binomial variables for sample sizes from 4 to 1000. The experiments show that the DAC methods perform well in confidence interval estimation for the excess risk.
- Published
- 2004
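Record 44's excess risk is the difference of two proportions, Y/n − X/m, from a 2×2 table, and DWald is described as the asymptotic comparator. The standard Wald interval for that difference, presumably what DWald corresponds to (we have not verified the paper's exact formula), looks like this:

```python
import math

def excess_risk_wald_ci(y, n, x, m, z=1.96):
    """Wald CI for the excess risk Y/n - X/m (difference of two proportions)."""
    p1, p2 = y / n, x / m
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)
    return d, d - z * se, d + z * se

# Hypothetical 2x2 data: 30/100 exposed cases vs. 12/120 unexposed.
d, lo, hi = excess_risk_wald_ci(30, 100, 12, 120)
print(f"excess risk {d:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

The paper's point is that for small samples or extreme proportions such Wald intervals misbehave, motivating the alternative constructions it evaluates.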
45. On universal oracle inequalities related to high-dimensional linear models
- Author
- Yuri Golubev
- Subjects
- Statistics and Probability, Statistics Theory (math.ST), 62C10, 62G05, spectral regularization, ordered smoother, oracle inequality, empirical risk minimization, linear model, white noise, additive white Gaussian noise, spectral method, numerical linear algebra, excess risk
- Abstract
This paper deals with recovering an unknown vector $\theta$ from the noisy data $Y=A\theta+\sigma\xi$, where $A$ is a known $(m\times n)$-matrix and $\xi$ is white Gaussian noise. It is assumed that $n$ is large and $A$ may be severely ill-posed. Therefore, in order to estimate $\theta$, a spectral regularization method is used, and our goal is to choose its regularization parameter with the help of the data $Y$. For spectral regularization methods related to the so-called ordered smoothers [see Kneip, Ann. Statist. 22 (1994) 835--866], we propose new penalties in the principle of empirical risk minimization. The heuristical idea behind these penalties is related to balancing excess risks. Based on this approach, we derive a sharp oracle inequality controlling the mean square risks of data-driven spectral regularization methods.
- Published
- 2010
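Records 17, 28, and 45 pursue one program: estimate $\theta$ in $Y = A\theta + \sigma\xi$ within a family of spectral regularizers and pick the regularization parameter by penalized empirical risk minimization. Schematically (notation ours):

\[
  \widehat\theta_\alpha \;=\; H_\alpha(A^{\top}A)\,A^{\top}Y,
  \qquad
  \widehat\alpha \;=\; \arg\min_{\alpha}\ \Bigl\{\, \lVert Y - A\widehat\theta_\alpha \rVert^2 + \mathrm{pen}(\alpha) \,\Bigr\},
\]

where $H_\alpha$ ranges over the ordered smoothers and, per record 45, the penalty is calibrated by balancing the excess risks of the smoothers, which yields the sharp oracle inequality for the data-driven choice $\widehat\alpha$.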
46. Nonparametric Tests of the Markov Model for Survival Data
- Author
- Jones, Michael P. and Crowley, John
- Published
- 1992
- Full Text
- View/download PDF
47. Semiparametric Analysis of the Additive Risk Model
- Author
- Lin, D. Y. and Ying, Zhiliang
- Published
- 1994
- Full Text
- View/download PDF