38 results for '"non-parametric methods"'
Search Results
2. Risk Assessment in Monitoring of Water Analysis of a Brazilian River
- Author
-
Luciene Pires Brandão, Vanilson Fragoso Silva, Marcelo Bassi, and Elcio Cruz de Oliveira
- Subjects
Manganese ,Organic Chemistry ,Pharmaceutical Science ,Water ,Risk Assessment ,Analytical Chemistry ,biochemical oxygen demand ,manganese molar concentration ,guard bands ,pH ,non-parametric methods ,Escherichia coli ,Rivers ,Chemistry (miscellaneous) ,Water Quality ,Drug Discovery ,Molecular Medicine ,Physical and Theoretical Chemistry ,Brazil ,Environmental Monitoring - Abstract
This study aimed to introduce non-parametric tests and guard bands to assess the compliance of some river water properties with Brazilian environmental regulations. Because the measurands pH, Biochemical Oxygen Demand (BOD), manganese molar concentration, and Escherichia coli are heterogeneous and non-Gaussian, and extreme values could be wrongly treated as outliers, robust methods were used to calculate the measurement uncertainty. Next, based on guard bands, the compliance assessment was evaluated using this uncertainty information. For these four measurands, partial overlaps between their uncertainties and the specification limit could generate doubts about compliance. The non-parametric approach for calculating the uncertainty, connected to the guard-bands concept, classified pH and BOD as “conform”, with a risk to the consumer of up to 4.0% and 4.9%, respectively; in contrast, manganese molar concentration and Escherichia coli were “not conform”, with a risk to the consumer of up to 25% and 7.4%, respectively. The proposed methodology was satisfactory because it considered the natural heterogeneity of data with non-Gaussian behavior instead of wrongly excluding outliers. In an unprecedented way, two connected statistical approaches shed light on the measurement uncertainty in the compliance assessment of water analysis. (A schematic code sketch of the guard-band idea follows this entry.)
- Published
- 2022
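A minimal, illustrative sketch of the idea behind entry 2: estimate the uncertainty of a water-quality indicator non-parametrically (here by bootstrapping the median) and combine it with a guard band to reach a conformity decision. The BOD values and the regulatory limit below are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical BOD measurements (mg/L) for one monitoring site; the
# regulatory upper limit is also illustrative, not taken from the paper.
bod = np.array([3.1, 3.4, 2.9, 3.8, 4.6, 3.2, 3.0, 5.1, 3.5, 3.3])
upper_limit = 5.0

# Non-parametric (bootstrap) uncertainty of the median: resample the data
# and look at the spread of the resampled medians.
boot_medians = np.array([
    np.median(rng.choice(bod, size=bod.size, replace=True))
    for _ in range(10_000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])  # 95 % interval

# Guard band: shrink the acceptance limit by the upper half-width of the
# uncertainty interval, then compare the median against the guarded limit.
guarded_limit = upper_limit - (hi - np.median(bod))
conforms = np.median(bod) <= guarded_limit

# "Consumer risk": share of the bootstrap distribution above the limit.
consumer_risk = np.mean(boot_medians > upper_limit)

print(f"median = {np.median(bod):.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
print(f"decision with guard band: {'conform' if conforms else 'not conform'}")
print(f"estimated consumer risk: {consumer_risk:.1%}")
```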
3. On data-driven design of LPV controllers with flexible reference models
- Author
-
Simone Formentin, M. van Meer, Tom Oomen, and Valentina Breschi
- Subjects
Hyperparameter ,Control and Systems Engineering ,Computer science ,Control theory ,Prior probability ,Benchmark (computing) ,Non-parametric methods ,Data driven control ,Design methods ,Reference model ,Selection (genetic algorithm) ,Data-driven - Abstract
Many data-driven control design methods require the a-priori selection of a reference model to be tracked. In case of limited priors on the plant, such a blind choice might ultimately compromise the overall performance. In this work, we propose a nested strategy for the direct design of Linear Parameter Varying (LPV) controllers from data, in which the reference model is treated as a hyperparameter to be tuned. The proposed approach allows one to jointly optimize the reference model and learn an LPV controller, solely based on soft specifications on the desired closed-loop. The effectiveness of the proposed technique is assessed on a benchmark case study, with the obtained results showing its potential advantages over a state-of-the-art method.
- Published
- 2021
- Full Text
- View/download PDF
4. Gender wage discrimination by distribution of income in Mexico, 2005-2020
- Author
-
Miguel Ángel Mendoza González
- Subjects
sticky floors, income distribution, gender wage gap, Wage, Distribution (economics), Income distribution, Economics, Demographic economics, gender wage discrimination, non-parametric methods, J16, J24, J31, J71, C14, O54, General Economics, Econometrics and Finance - Abstract
This paper aims to analyze the hourly gender wage gap between men and women in Mexico for the period 2005-2020. To this end, a number of variables is selected to reflect workers' human capital, household circumstances and workplace characteristics; then, a novel non-parametric method decomposes wage differentials between men and women into its composition and structure effects throughout the distribution of labor income. Results are consistent with the sticky-floor hypothesis, where male workers earn higher hourly wages than female workers at low income levels. However, differentials decrease in the upper part of the distribution and may even reverse, favoring women over men at the highest income levels.
- Published
- 2020
- Full Text
- View/download PDF
5. Time-Varying Assets Clustering via Identity-Link Latent-Space Infinite Mixture: An Application on DAX Components
- Author
-
Antonio Peruzzi and Roberto Casarin
- Subjects
Latent space models, Bayesian inference, Non-parametric methods, Settore SECS-P/05 - Econometria, Settore SECS-S/01 - Statistica - Published
- 2022
6. Climate change impacts on the water flow to the reservoir of the Dez Dam basin
- Author
-
Nima Norouzi
- Subjects
Environmental Engineering, Water flow, Drainage basin, Climate change, Structural basin, Shift change, Water source, Non-parametric methods, Engineering (miscellaneous), Water Science and Technology, Hydrology, Global warming, Zagros region, Trend analysis, Flow regime, Environmental science, Surface runoff - Abstract
Climate change has led to changes in the water flow of the Dez Dam basin in recent decades. The non-parametric Mann-Kendall trend test and two change-point detection tests (Pettitt and Buishand) were applied to the discharge time series at the outlets of the Tire, Marbore, Sazar, and Bakhtiari sub-basins to identify monotonic and abrupt changes. The Mann-Kendall test showed a significant negative (decreasing) trend of the flow in three sub-basins. Investigation of the flow of the Dez Basin over the past decades shows significant monotonic and abrupt changes, mostly toward a decrease in the basin's potential runoff. Considering this evidence, the basin is likely to face a reduction in discharge, and the results emphasize the need to modify water management strategies to adapt to climate change. (A code sketch of the Mann-Kendall and Pettitt tests follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
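A brief sketch of the two ingredients named in entry 6, the Mann-Kendall trend test and the Pettitt change-point test, implemented from their textbook definitions (no tie correction, approximate p-values) and applied to a synthetic discharge series; it is not the authors' code or data.

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    """Two-sided Mann-Kendall trend test (no tie correction, for brevity)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # S counts concordant minus discordant pairs over all i < j.
    s = np.sum([np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n)])
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * (1 - norm.cdf(abs(z)))
    return s, z, p

def pettitt(x):
    """Pettitt change-point test: most likely change index, K statistic, approx. p."""
    x = np.asarray(x, dtype=float)
    n = x.size
    u = np.array([np.sum(np.sign(x[t + 1:, None] - x[None, :t + 1])) for t in range(n)])
    k = np.max(np.abs(u))
    tau = int(np.argmax(np.abs(u)))
    p_approx = 2 * np.exp(-6 * k**2 / (n**3 + n**2))
    return tau, k, p_approx

# Synthetic annual discharge series with a gentle decline (illustrative only).
rng = np.random.default_rng(0)
years = np.arange(1980, 2020)
flow = 120 - 0.8 * np.arange(years.size) + rng.normal(0, 8, years.size)

print("Mann-Kendall (S, Z, p):", mann_kendall(flow))
print("Pettitt (change index, K, approx p):", pettitt(flow))
```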
7. Does Institutional Quality Matter for Infrastructure Provision? A Non-parametric Analysis for Italian Municipalities
- Author
-
Marina Cavalieri, Domenico Lisi, Calogero Guccio, and Ilde Rizzo
- Subjects
media_common.quotation_subject ,Institutional quality ,Sample (statistics) ,Efficiency ,Environmental economics ,Public works contracts ,Municipalities ,Municipal level ,Non-parametric methods ,Semi-parametric truncated regression ,Test (assessment) ,Procurement ,European integration ,Non parametric analysis ,Quality (business) ,Business ,General Economics, Econometrics and Finance ,media_common - Abstract
This study explores the relationship between different dimensions of regional institutional quality and the efficient provision of transport infrastructure. A two-stage semi-parametric approach is applied to a large sample of public works procured by about 1700 Italian municipalities in the 2000–2014 period. First, we estimate the performance in contract execution; then, we test the impact of different measures and dimensions of institutional quality at both regional and provincial level. The results provide evidence that the quality of institutional environment matters in infrastructure procurement, though some specific dimensions of institutional quality appear to be more relevant than others in affecting performance in contract execution. Overall, the estimates are robust to alternative measures of institutional quality, alternative model specifications, and different sample selections.
- Published
- 2019
- Full Text
- View/download PDF
8. Multivariate and multi-scale generator based on non-parametric stochastic algorithms
- Author
-
Đurica Marković, Jasna Plavšić, Dragutin Pavlović, Nesa Ilich, and Siniša Ilić
- Subjects
non-parametric methods ,Atmospheric Science ,Multivariate statistics ,Generator (computer programming) ,010504 meteorology & atmospheric sciences ,Scale (ratio) ,Computer science ,0207 environmental engineering ,Nonparametric statistics ,serial correlation ,cross-correlation ,02 engineering and technology ,Stochastic algorithms ,Geotechnical Engineering and Engineering Geology ,01 natural sciences ,stochastic data generation ,Applied mathematics ,Hydroinformatics ,hydrologic time series ,020701 environmental engineering ,0105 earth and related environmental sciences ,Civil and Structural Engineering ,Water Science and Technology - Abstract
A method for generating combined multivariate time series at multiple locations and at different time scales is presented. The procedure is based on three steps: first, Monte Carlo generation of data with statistical properties as close as possible to those of the observed series; second, rearrangement of the order of the simulated data in the series to achieve target correlations; and third, permutation of the series for correlation adjustment between consecutive years. The method is non-parametric and retains, to a satisfactory degree, the properties of the observed time series at the selected simulation time scale and at coarser time scales. The new approach is tested on two case studies, where it is applied to log-transformed streamflow and precipitation at weekly and monthly time scales. Special attention is given to the extrapolation of non-parametric cumulative frequency distributions in their tail zones. The results show a good agreement of stochastic properties between the simulated and observed data. For example, for one of the case studies, the average relative errors of the observed and simulated weekly precipitation and streamflow statistics (up to the skewness coefficient) are in the range of 0.1–9.2% and 0–5.4%, respectively. (A schematic sketch of the rank-reordering step follows this entry.) This is the submitted version of the article: Đ. Marković, S. Ilić, D. Pavlović, J. Plavšić, and N. Ilich, ‘Multivariate and multi-scale generator based on non-parametric stochastic algorithms’, Journal of Hydroinformatics, vol. 21, no. 6, pp. 1102–1117, Nov. 2019, [https://doi.org/10.2166/hydro.2019.071]
- Published
- 2019
- Full Text
- View/download PDF
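The second step in entry 8 (reordering simulated values so that they reproduce the dependence structure of the observed series) can be illustrated with a simple rank-reordering trick. This is only a sketch of the general idea, with made-up data for two sites, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weekly flows at two correlated sites (illustration only).
n = 520
site_a = np.exp(rng.normal(3.0, 0.4, n))
site_b = 0.6 * site_a + np.exp(rng.normal(2.0, 0.5, n))
observed = np.column_stack([site_a, site_b])

# Step 1: simulate each marginal independently (plain bootstrap of each column),
# which matches the marginal distributions but destroys cross-correlation.
simulated = np.column_stack([rng.choice(observed[:, k], size=n, replace=True)
                             for k in range(observed.shape[1])])

# Step 2: rearrange each simulated column so that its ranks follow the ranks
# of the corresponding observed column; the rank pattern (and hence the
# Spearman cross-correlation) of the observed data is thereby transferred.
rearranged = np.empty_like(simulated)
for k in range(observed.shape[1]):
    obs_ranks = observed[:, k].argsort().argsort()       # rank of each week
    rearranged[:, k] = np.sort(simulated[:, k])[obs_ranks]

def spearman(u, v):
    ru, rv = u.argsort().argsort(), v.argsort().argsort()
    return np.corrcoef(ru, rv)[0, 1]

print("observed rank corr.  :", round(spearman(*observed.T), 3))
print("independent bootstrap:", round(spearman(*simulated.T), 3))
print("after rearrangement  :", round(spearman(*rearranged.T), 3))
```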
9. TEST OF INTERACTION IN THE ANALYSIS OF MOLECULAR VARIANCE
- Author
-
M.E. Videla, Cecilia Bruno, and M. Balzarini
- Subjects
non-parametric methods ,0106 biological sciences ,lcsh:QH426-470 ,010604 marine biology & hydrobiology ,distances matrix ,Biology ,010603 evolutionary biology ,01 natural sciences ,Applied Microbiology and Biotechnology ,Analysis of molecular variance ,Test (assessment) ,lcsh:Genetics ,genetic variability ,Statistics ,Genetics ,amova ,Genetics (clinical) ,Biotechnology - Abstract
The genomic diversity, expressed in the differences between molecular haplotypes of a group of individuals, can be divided into components of variability between and within some factor of classification of the individuals. For such variance partitioning, the analysis of molecular variance (AMOVA) is used, which is constructed from the multivariate distances between pairs of haplotypes. The classical AMOVA allows the evaluation of the statistical significance of two or more hierarchical factors, and consequently there is no interaction test between factors. However, there are situations where the factors that classify individuals are crossed rather than nested, that is, all the levels of a factor are represented in each level of the other one. This paper proposes a statistical test to evaluate the interaction between crossed factors in a non-hierarchical AMOVA. The null hypothesis of interaction establishes that the molecular differences between individuals of different levels of a factor are the same for all the levels of the other factor that classifies them. The proposed analysis of interaction in a non-hierarchical AMOVA includes: calculation of the distance matrix and its partition into blocks, subsequent calculation of residuals, and non-parametric analysis of variance on the residuals. Its implementation is illustrated in simulated and real scenarios. The results suggest that the proposed interaction test for the non-hierarchical AMOVA presents high power. Key words: genetic variability, non-parametric methods, distances matrix, AMOVA.
- Published
- 2019
- Full Text
- View/download PDF
10. Data-Driven Predictive Control for Linear Parameter-Varying Systems
- Author
-
Hossam S. Abbas, Sofie Haesaert, Chris Verhoek, and Roland Tóth
- Subjects
Scheme (programming language) ,0209 industrial biotechnology ,Computer science ,Systems and Control (eess.SY) ,02 engineering and technology ,Electrical Engineering and Systems Science - Systems and Control ,Scheduling (computing) ,Data-driven ,Linear parameter-varying systems ,020901 industrial engineering & automation ,Control theory ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,Non-parametric methods ,Predictive control ,computer.programming_language ,eess.SY ,Data-driven control ,Nonparametric statistics ,QA75 Electronic computers. Computer science / számítástechnika, számítógéptudomány ,Extension (predicate logic) ,cs.SY ,Data set ,Nonlinear system ,Model predictive control ,Control and Systems Engineering ,020201 artificial intelligence & image processing ,computer - Abstract
Based on the extension of the behavioral theory and the Fundamental Lemma for Linear Parameter-Varying (LPV) systems, this paper introduces a Data-driven Predictive Control (DPC) scheme capable of ensuring reference tracking and satisfaction of Input-Output (IO) constraints for an unknown system under the conditions that (i) the system can be represented in an LPV form and (ii) an informative data set containing measured IO and scheduling trajectories of the system is available. It is shown that if the data set satisfies a persistence of excitation condition, then a data-driven LPV predictor of future trajectories of the system can be constructed from the IO data set and online measured data. The approach represents the first step towards a DPC solution for nonlinear and time-varying systems, due to the potential of the LPV framework to represent them. Two illustrative examples, including reference tracking control of a nonlinear system, are provided to demonstrate that the data-based LPV-DPC scheme achieves similar performance as LPV model-based predictive control. Accepted to the 4th IFAC Workshop on Linear Parameter-Varying Systems. (A code sketch of the persistence-of-excitation check follows this entry.)
- Published
- 2021
- Full Text
- View/download PDF
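Entry 10 relies on a persistence-of-excitation condition, which in the LTI version of Willems' fundamental lemma amounts to a rank condition on a Hankel matrix built from the input data. The sketch below checks that condition for a random input sequence; the LPV-specific, scheduling-dependent parts of the paper's condition are omitted, and the horizon and state-dimension bound are arbitrary choices.

```python
import numpy as np

def hankel_matrix(signal, depth):
    """Stack 'depth'-long windows of a scalar sequence as columns of a Hankel matrix."""
    signal = np.asarray(signal, dtype=float)
    cols = signal.size - depth + 1
    return np.column_stack([signal[i:i + depth] for i in range(cols)])

# Hypothetical input sequence applied to the unknown system during the experiment.
rng = np.random.default_rng(0)
u = rng.normal(size=200)

L = 10          # prediction horizon used by the predictive controller (assumed)
n_states = 4    # assumed upper bound on the state dimension
H = hankel_matrix(u, depth=L + n_states)

# The LTI fundamental lemma asks this Hankel matrix to have full row rank;
# the LPV extension adds scheduling-dependent conditions, omitted in this sketch.
rank = np.linalg.matrix_rank(H)
print(f"Hankel matrix is {H.shape[0]}x{H.shape[1]}, rank = {rank}")
print("persistently exciting of order", L + n_states, ":", rank == H.shape[0])
```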
11. Analysis of the Level of Detail in Classifications of Urban Areas with Optical VHR and Hyperspectral Images Using C4.5 Decision Tree and Random Forest Methods
- Author
-
Ronaldo C. Prati, Camila Souza dos Anjos, Marielcio Goncalves Lacerda, Carlos Roberto de Souza Filho, Lênio Soares Galvão, and Cláudia Maria de Almeida
- Subjects
Métodos Não Paramétricos ,lcsh:QB275-343 ,010504 meteorology & atmospheric sciences ,Classificação de Cobertura do Solo Urbano ,lcsh:Geodesy ,lcsh:Geography. Anthropology. Recreation ,0211 other engineering and technologies ,lcsh:G1-922 ,02 engineering and technology ,01 natural sciences ,lcsh:G ,WorldView-2 ,ProSpecTIR V-S ,General Earth and Planetary Sciences ,Non-parametric Methods ,Urban Land Cover Classification ,lcsh:Geography (General) ,021101 geological & geomatics engineering ,0105 earth and related environmental sciences - Abstract
Urban environments represent one of the most challenging areas for remote sensing analyses due to the great diversity of land cover materials found on their surface. The fusion of high spatial and high spectral resolution images arises as an alternative for urban applications, since the combination of these two characteristics allows better detection and discrimination of urban targets. The present work has a twofold objective: i) to evaluate two datasets for the fine classification of urban targets at two levels of legend (with 11 and 38 land cover classes), one of them exclusively consisting of an orbital multispectral image (WV-2) and the other exclusively comprising an airborne hyperspectral image (SpecTIR); and ii) to assess the performance of two different image classification methods, C4.5 Decision Tree and Random Forest, for both levels of legend. Eight classification experiments were executed to meet these objectives and investigate the efficacy of the sensors and classification methods at the two levels of detail. The obtained classifications attained high accuracy. For all adopted levels of legend and methods, the classifications using SpecTIR data presented results significantly superior to those obtained with the WV-2 data. (A classification code sketch follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
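A small scikit-learn sketch of the single-tree versus Random Forest comparison described in entry 11. Synthetic data stand in for the per-pixel spectra, and scikit-learn's DecisionTreeClassifier implements CART rather than C4.5, so this only mirrors the structure of the experiment, not the paper's exact classifiers or data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for per-pixel spectra: 200 synthetic "bands" and 11 land-cover classes.
X, y = make_classification(n_samples=3000, n_features=200, n_informative=40,
                           n_classes=11, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Single decision tree (CART, an approximation of the C4.5 baseline) vs. a forest.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("single tree accuracy  :", accuracy_score(y_te, tree.predict(X_te)))
print("random forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
```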
12. An epistatic interaction between pre-natal smoke exposure and socioeconomic status has a significant impact on bronchodilator drug response in African American youth with asthma
- Author
-
Jennifer R. Elhawary, Eunice Y. Lee, Lesly-Anne Samedy-Bates, Neeta Thakur, Joaquin Magana, Maria G. Contreras, Angel C.Y. Mak, Donglei Hu, Celeste Eng, Andrew M. Zeiger, Scott Huntsman, Pagé C. Goddard, Marquitta J. White, Sandra Salazar, Kevin L. Keys, Oona Risse-Adams, Esteban G. Burchard, and Ting Hu
- Subjects
Artificial Intelligence and Image Processing ,medicine.drug_class ,Epistatic interactions ,Ethnic group ,lcsh:Analysis ,Medical Biochemistry and Metabolomics ,lcsh:Computer applications to medicine. Medical informatics ,Interaction ,Biochemistry ,Non-parametric methods ,Asthma drug response ,Health disparities ,Pediatric asthma ,03 medical and health sciences ,0302 clinical medicine ,Bronchodilator ,Genetics ,medicine ,Molecular Biology ,Socioeconomic status ,Lung ,030304 developmental biology ,Asthma ,African american ,Pediatric ,0303 health sciences ,business.industry ,Research ,lcsh:QA299.6-433 ,medicine.disease ,Health equity ,Computer Science Applications ,Computational Mathematics ,Good Health and Well Being ,030228 respiratory system ,Computational Theory and Mathematics ,Respiratory ,lcsh:R858-859.7 ,Epistasis ,Specialist Studies in Education ,business ,Demography - Abstract
Background Asthma is one of the leading chronic illnesses among children in the United States. Asthma prevalence is higher among African Americans (11.2%) compared to European Americans (7.7%). Bronchodilator medications are part of the first-line therapy, and the rescue medication, for acute asthma symptoms. Bronchodilator drug response (BDR) varies substantially among different racial/ethnic groups. Asthma prevalence in African Americans is only 3.5% higher than that of European Americans; however, asthma mortality among African Americans is four times that of European Americans, and variation in BDR may play an important role in explaining this health disparity. To improve our understanding of disparate health outcomes in complex phenotypes such as BDR, it is important to consider interactions between environmental and biological variables. Results We evaluated the impact of pairwise and three-variable interactions between environmental, social, and biological variables on BDR in 233 African American youth with asthma using Visualization of Statistical Epistasis Networks (ViSEN). ViSEN is a non-parametric entropy-based approach able to quantify interaction effects using an information-theory metric known as Information Gain (IG). We performed analyses in the full dataset and in sex-stratified subsets. Our analyses identified several interaction models significantly, and suggestively, associated with BDR. The strongest interaction significantly associated with BDR was a pairwise interaction between pre-natal smoke exposure and socioeconomic status (full dataset IG: 2.78%, p = 0.001; female IG: 7.27%, p = 0.004). Sex-stratified analyses yielded divergent results for females and males, indicating the presence of sex-specific effects. Conclusions Our study identified novel interaction effects significantly, and suggestively, associated with BDR in African American children with asthma. Notably, we found that all of the interactions identified by ViSEN were “pure” interaction effects, in that they were not the result of strong main effects on BDR, highlighting the complexity of the network of biological and environmental factors impacting this phenotype. Several associations uncovered by ViSEN would not have been detected using regression-based methods, thus emphasizing the importance of employing statistical methods optimized to detect both additive and non-additive interaction effects when studying complex phenotypes such as BDR. The information gained in this study increases our understanding and appreciation of the complex nature of the interactions between environmental and health-related factors that influence BDR and will be invaluable to biomedical researchers designing future studies. (An entropy-based interaction-gain code sketch follows this entry.)
- Published
- 2020
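A toy illustration of the entropy-based interaction measure used in entry 12: the information that two discrete variables carry jointly about a phenotype beyond their individual main effects. This is a generic interaction-information calculation (reported in bits rather than ViSEN's normalised percentage) on made-up data, not the ViSEN software itself.

```python
import numpy as np
import pandas as pd

def entropy(labels):
    """Shannon entropy (in bits) of a discrete sample."""
    p = pd.Series(labels).value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete samples."""
    joint = pd.Series(list(zip(x, y)))
    return entropy(x) + entropy(y) - entropy(joint)

def pairwise_interaction_gain(a, b, phenotype):
    """Information gained about the phenotype by A and B jointly,
    beyond their two individual (main) effects."""
    ab = pd.Series(list(zip(a, b)))
    return mutual_info(ab, phenotype) - mutual_info(a, phenotype) - mutual_info(b, phenotype)

# Toy example: a binary phenotype driven mostly by the XOR of two exposures,
# i.e. a "pure" interaction with weak main effects.
rng = np.random.default_rng(3)
a = rng.integers(0, 2, 500)
b = rng.integers(0, 2, 500)
phenotype = np.where(rng.random(500) < 0.9, a ^ b, rng.integers(0, 2, 500))

print(f"IG(A,B -> phenotype) = {pairwise_interaction_gain(a, b, phenotype):.3f} bits")
print(f"I(A; phenotype)      = {mutual_info(a, phenotype):.3f} bits")
print(f"I(B; phenotype)      = {mutual_info(b, phenotype):.3f} bits")
```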
13. Statistical And Intelligent Methods Of Medical Data Processing
- Author
-
G.R. Shakhmametova
- Subjects
medical data ,data processing ,non-parametric methods ,data mining - Abstract
A new approach to the analysis of medical, in particular toxicological, data is considered. To realize a multilevel data-processing system, a three-stage data-analysis technique is proposed that provides insight into the data structure, extracts patterns, yields new and previously unknown knowledge, and increases the efficiency of the data-analysis process. The results of the research are discussed.
- Published
- 2018
- Full Text
- View/download PDF
14. Collective feature selection to identify crucial epistatic variants
- Author
-
Binglan Li, Xinyuan Zhang, Dokyoon Kim, Scott M. Dudek, Jason H. Moore, Marylyn D. Ritchie, Anastasia Lucas, Yogasudha Veturi, Shefali S. Verma, Ryan J. Urbanowicz, and Ruowang Li
- Subjects
0301 basic medicine ,Computer science ,Complex disease ,Feature selection ,lcsh:Analysis ,lcsh:Computer applications to medicine. Medical informatics ,Machine learning ,computer.software_genre ,Biochemistry ,03 medical and health sciences ,0302 clinical medicine ,Genetics ,Non-parametric methods ,Obesity ,Molecular Biology ,Parametric statistics ,business.industry ,Research ,Parametric methods ,Nonparametric statistics ,lcsh:QA299.6-433 ,Small sample ,Limiting ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Epistasis ,lcsh:R858-859.7 ,Non-additive effects ,Gradient boosting ,Artificial intelligence ,business ,computer ,030217 neurology & neurosurgery - Abstract
Background Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex diseases/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called “short fat data” problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Results Through our simulation study we propose a collective feature selection approach to select features that are in the “union” of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables, based on a user-defined percentage of variants selected from each method, to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criterion for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects, also in low effect size datasets and different genetic architectures. Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~44,000 samples obtained from Geisinger’s MyCode Community Health Initiative (on behalf of the DiscovEHR collaboration). Conclusions In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection, via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity. (A collective-feature-selection code sketch follows this entry.)
- Published
- 2018
- Full Text
- View/download PDF
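A minimal sketch of the "union of top-ranked features" idea from entry 14, using three off-the-shelf selectors (univariate F-test, random forest importance, gradient boosting importance) on synthetic data. The selectors and the top-k cutoff are illustrative stand-ins, not the specific panel of methods used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import f_classif

# Synthetic genotype-like data: 500 samples, 1000 candidate variants.
X, y = make_classification(n_samples=500, n_features=1000, n_informative=15,
                           random_state=0)
top_k = 10  # keep the "top 1%" of features from each method

def top_features(scores, k=top_k):
    """Indices of the k highest-scoring features."""
    return set(np.argsort(scores)[::-1][:k])

# Three heterogeneous selectors, stand-ins for the paper's larger panel.
f_scores, _ = f_classif(X, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(random_state=0).fit(X, y)

# Collective selection: the union of the top features of every method.
selected = (top_features(f_scores)
            | top_features(rf.feature_importances_)
            | top_features(gb.feature_importances_))

print(f"{len(selected)} features in the union, passed on to the epistasis search:")
print(sorted(selected))
```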
15. Moving Object Detection Based on Non-parametric Methods and Frame Difference for Traceability Video Analysis
- Author
-
Bo Mao, Jie Cao, and Zhang Jianshu
- Subjects
Traceability ,business.industry ,Computer science ,020209 energy ,Real-time computing ,Nonparametric statistics ,Process (computing) ,02 engineering and technology ,Object detection ,Traceability analysis ,Video tracking ,Object Detection ,0202 electrical engineering, electronic engineering, information engineering ,Non-parametric methods ,General Earth and Planetary Sciences ,Production (economics) ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,business ,Frame Difference ,General Environmental Science - Abstract
Traceability using video is a new trend in the production of food and other agriculture-related materials. However, in these applications the bandwidth and computation capacity are limited, so it is necessary to improve traditional object detection methods for these applications. In this paper, we present an algorithm combining a non-parametric method and frame difference for traceability video analysis. According to the experimental results, the proposed method performs better than traditional frame difference and GMM approaches. (A background-subtraction code sketch follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
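A toy sketch of the combination in entry 15: a frame-difference cue intersected with a simple non-parametric background model (a per-pixel median over recent frames stands in for kernel-density background estimation). The synthetic video and thresholds are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic grey-scale video: static noisy background plus a small moving block.
H, W, T = 60, 80, 30
frames = rng.normal(100, 3, (T, H, W))
for t in range(T):
    frames[t, 20:30, 2 * t:2 * t + 10] += 80  # the "object" moves to the right

history = 10          # frames kept for the background model
diff_thr, bg_thr = 20, 25

masks = []
for t in range(history, T):
    # Frame difference: large change between consecutive frames.
    moving = np.abs(frames[t] - frames[t - 1]) > diff_thr
    # Non-parametric background: per-pixel median over the recent history.
    background = np.median(frames[t - history:t], axis=0)
    foreground = np.abs(frames[t] - background) > bg_thr
    masks.append(moving & foreground)  # keep pixels flagged by both cues

print("detected foreground pixels per frame:", [int(m.sum()) for m in masks[:5]])
```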
16. Non-parametric Stochastic Generation of Streamflow Series at Multiple Locations
- Author
-
Siniša Ilić, Đurica Marković, Nesa Ilich, and Jasna Plavšić
- Subjects
Hydrogeology ,Cross-correlation ,Series (mathematics) ,Serial correlation ,Autocorrelation ,Monte Carlo method ,Nonparametric statistics ,Permutation ,Stochastic streamflow generation ,Streamflow ,Statistics ,Non-parametric methods ,Environmental science ,Water Science and Technology ,Civil and Structural Engineering - Abstract
A non-parametric method for generating stationary weekly hydrologic time series at multiple locations is presented. The procedure has three distinct steps: first, the Monte Carlo method is used to obtain 1000 years of simulated weekly flows having statistical properties as close as possible to the observed series; second, rearranging the order of simulated data in the series to achieve target spatial and temporal correlations within each simulated year; and third, the permutation of annual partial series to adjust the correlation of weekly streamflows at the beginning of a year with that at the end of a previous year while also adjusting the auto-correlation of annual flows. In this paper the method is applied for the first time on log-transformed data, and contributes to this methodology by introducing an additional criterion related to the possibility to obtain a desired frequency of occurrence of extremely dry years in the simulated series. The method is tested in two case studies, which use data from three hydrologic stations on the Studenica River in Serbia, and from seven stations in the Oldman River basin in Southern Alberta, Canada. The results show that the simulated data correspond to the observed data in all their stochastic properties and that they can be consequently used in the studies related to planning and design of reservoirs and other water management systems.
- Published
- 2015
- Full Text
- View/download PDF
17. A new method for analysing and representing ground temperature variations in cold environments. The Fuegian Andes, Tierra del Fuego, Argentina
- Author
-
Rosa M. Crujeiras-Casais, Francisco Castillo Rodríguez, Augusto Pérez Alberti, María Oliveira Pérez, and Alberto Rodríguez Casal
- Subjects
0106 biological sciences ,Circular data ,Datos circulares ,010504 meteorology & atmospheric sciences ,Philosophy ,Geography, Planning and Development ,Métodos paramétricos ,Environmental Science (miscellaneous) ,01 natural sciences ,Tierra ,Sensores de temperatura ,Diagrama de Taylor ,Active layer ,Capa activa ,Taylor diagram ,Ground temperature ,Earth and Planetary Sciences (miscellaneous) ,Non-parametric methods ,Temperature sensors ,Tierra del Fuego (Argentina) ,Humanities ,010606 plant biology & botany ,0105 earth and related environmental sciences - Abstract
The thermal response of soils in cold environments has been investigated in numerous studies. The data considered here were obtained in a study carried out in Tierra del Fuego, Argentina, as part of the IV International Polar Year. Temperature sensors were installed at ground level (0) and at depths of -10, -20 and -60 cm in the study area, with the aim of characterizing the thermal response by detecting diurnal and annual variations. The study has two main aims. The first is to present and discuss the study findings regarding the thermal response of a soil in a sub-Antarctic environment by using classical descriptive analysis. The second, closely related, aim is to apply some novel statistical tools that would help improve this description. The study of freeze-thaw patterns can be approached from a non-parametric perspective, while taking into account the cyclical nature of the data. Data are considered cyclical when they can be represented on a unit circle, as with the hours in which certain events occur throughout a day (e.g. freezing and thawing). Analysis of this type of data is very different from the analysis of scalar data, as regards both descriptive and graphical measures. The application in this study of methods used to represent and analyse cyclical data improved visualization of the data and interpretation of the analytical findings. The main contribution of the present study is the use of kernel-type density estimators and derived techniques, such as the CircSiZer map, which enabled identification of significant freeze-thaw patterns. In addition, the relationships between the temperature recordings at different points were analysed using Taylor diagrams. (A circular kernel density code sketch follows this entry.)
This research was funded by the Ministry of Education and Science (Spain) projects (POL2006-09071) as a contribution to the fourth International Polar Year. Research by M. Oliveira, R.M. Crujeiras and A. Rodríguez was partially supported by grants MTM2013-41383P and MTM2016-76929P awarded by the Ministry of Economy and Competitiveness and AEI/FEDER, Spain.
- Published
- 2018
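Entry 17 analyses event times on the circle (hours of day). Below is a generic circular kernel density estimate with a von Mises kernel, the standard building block behind tools such as CircSiZer; the freeze/thaw hours, the concentration parameter and the grid are made up, and this is not the CircSiZer implementation itself.

```python
import numpy as np
from scipy.stats import vonmises

def circular_kde(angles, grid, kappa=20.0):
    """Kernel density on the circle using a von Mises kernel of concentration kappa.
    'angles' and 'grid' are in radians on [0, 2*pi)."""
    # Average the von Mises kernels centred at each observation.
    return np.mean([vonmises.pdf(grid, kappa, loc=a) for a in angles], axis=0)

# Hypothetical freeze/thaw event times (hours of day), converted to angles.
hours = np.array([2.5, 3.0, 3.5, 4.0, 4.5, 15.0, 15.5, 16.0, 16.5, 17.0, 17.5])
angles = 2 * np.pi * hours / 24.0

grid = np.linspace(0, 2 * np.pi, 240, endpoint=False)
density = circular_kde(angles, grid)

# The density should integrate to ~1 over the circle; its mode is the most
# likely event hour of the day.
total = float(np.sum(density) * (grid[1] - grid[0]))
mode_hour = float(grid[np.argmax(density)] * 24 / (2 * np.pi))
print("total probability ~ 1:", round(total, 3))
print("estimated most likely event hour:", round(mode_hour, 1))
```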
18. Segmentation of Inhomogeneous Skin Tissues in High-frequency 3D Ultrasound Images, the Advantage of Non-parametric Log-likelihood Methods
- Author
-
Philippe Delachartre, P. Ceccato, Benoit Guibert, Lester Cowell, Thibaut Dambry, and Bruno Sciolla
- Subjects
Computer science ,Scale-space segmentation ,Boundary (topology) ,02 engineering and technology ,Physics and Astronomy(all) ,Level-set ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,symbols.namesake ,Segmentation ,0302 clinical medicine ,Level set ,Optics ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Non-parametric methods ,3D ultrasound ,Rayleigh scattering ,3D ultrasound imaging ,medicine.diagnostic_test ,business.industry ,Nonparametric statistics ,Speckle noise ,Pattern recognition ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,business - Abstract
We propose a multi-purpose level-set segmentation algorithm to detect the boundary of tumors and tissues in high-frequency 3D ultrasound images of the skin. Whereas most proposed algorithms assume a specific (e.g. Rayleigh) distribution of the speckle noise, we do not make such an assumption and instead use non-parametric Parzen estimates of the distribution. We discuss the advantage of the method on synthetic and clinical images of the skin and tumors.
- Published
- 2015
- Full Text
- View/download PDF
19. A general permutation approach for analyzing repeated measures ANOVA and mixed-model designs
- Author
-
Olivier Renaud and Sara Kherad-Pajouh
- Subjects
Statistics and Probability ,Mixed model ,ANOVA ,Design of experiments ,Repeated measures design ,Experimental design ,Permutation ,ddc:150 ,Resampling ,Statistics ,Non-parametric methods ,Permutation test ,Analysis of variance ,Statistics, Probability and Uncertainty ,Mathematics ,Statistical hypothesis testing ,Parametric statistics - Abstract
Repeated measures ANOVA and mixed-model designs are the main classes of experimental designs used in psychology. The usual analysis relies on some parametric assumptions (typically Gaussianity). In this article, we propose methods to analyze the data when the parametric conditions do not hold. The permutation test, which is a non-parametric test, is suitable for hypothesis testing and can be applied to experimental designs. The application of permutation tests in simpler experimental designs such as factorial ANOVA or ANOVA with only between-subject factors has already been considered. The main purpose of this paper is to focus on more complex designs that include only within-subject factors (repeated measures) or designs that include both within-subject and between-subject factors (mixed-model designs). First, a general approximate permutation test (permutation of the residuals under the reduced model, or reduced residuals) is proposed for any repeated measures and mixed-model design, for any number of repetitions per cell, any number of subjects and factors, and for both balanced and unbalanced designs (all-cell-filled). Next, a permutation test that uses residuals that are exchangeable up to the second moment is introduced for balanced cases in the same class of experimental designs. This permutation test is therefore exact for spherical data. Finally, we provide simulation results for the comparison of the level and the power of the proposed methods. (A code sketch of the reduced-residual permutation idea follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
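A compact illustration of the "permutation of residuals under the reduced model" strategy from entry 19, for the simplest case of a one-way repeated-measures design. The F statistic, the toy data and the number of permutations are illustrative; the paper's general procedure covers far more designs than this sketch.

```python
import numpy as np

def rm_anova_f(y):
    """F statistic for a one-way repeated-measures ANOVA.
    y has shape (n_subjects, n_conditions)."""
    n, k = y.shape
    grand = y.mean()
    ss_subj = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_cond = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((y - grand) ** 2).sum() - ss_subj - ss_cond
    return (ss_cond / (k - 1)) / (ss_err / ((n - 1) * (k - 1)))

def permutation_p_value(y, n_perm=5000, seed=0):
    """Approximate permutation test: permute residuals of the reduced model
    (subject effects only), rebuild the data and recompute F."""
    rng = np.random.default_rng(seed)
    subj_means = y.mean(axis=1, keepdims=True)   # reduced model: no condition effect
    resid = y - subj_means                       # reduced residuals
    f_obs = rm_anova_f(y)
    f_perm = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(resid.ravel()).reshape(y.shape)
        f_perm[b] = rm_anova_f(subj_means + shuffled)
    return f_obs, (1 + np.sum(f_perm >= f_obs)) / (1 + n_perm)

# Toy data: 12 subjects, 3 conditions, with a modest condition effect.
rng = np.random.default_rng(1)
subject_effect = rng.normal(0, 2, (12, 1))
condition_effect = np.array([0.0, 0.8, 1.2])
y = 10 + subject_effect + condition_effect + rng.normal(0, 1, (12, 3))

f_obs, p = permutation_p_value(y)
print(f"observed F = {f_obs:.2f}, permutation p = {p:.4f}")
```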
20. An assessment of the waste effects of corruption on infrastructure provision
- Author
-
Calogero Guccio, Ilde Rizzo, and Massimo Finocchiaro Castro
- Subjects
Economics and Econometrics ,Corruption ,media_common.quotation_subject ,Nonparametric statistics ,Inference ,Microeconomics ,Consistency (database systems) ,Order (exchange) ,Robustness (computer science) ,Infrastructure provision ,Accounting ,Econometrics ,Economics ,Non-parametric methods ,Finance ,Parametric statistics ,media_common ,Public finance - Abstract
In this paper, we investigate the association between the efficiency of infrastructure provision and the level of corruption, in the province in which the infrastructure takes place, employing a large dataset on Italian public works contracts. We, first, estimate efficiency in public contracts’ execution using a smoothed DEA bootstrap procedure that ensures consistency of our estimates. Then, we evaluate the effects of corruption using a semi-parametric technique that produces a robust inference for an unknown serial correlation between efficiency scores. In order to test the robustness of our results, the parametric stochastic frontier approach has also been employed. The results from both nonparametric and parametric techniques show that greater corruption, in the area where the infrastructure provision is localised, is associated with lower efficiency in public contracts execution.
- Published
- 2014
- Full Text
- View/download PDF
21. Efficiency Measure of Insurance vs. Takaful Firms Using DEA Approach: A Case of Pakistan
- Author
-
Khan, Atiquzzafar and Noreen, Uzma
- Subjects
Takaful, insurance, Comparative Analysis, Non-Parametric Methods
This study aims at comparing Pakistan's Takaful and conventional insurance companies in terms of efficiency and productivity for the period 2006-2010. We apply the Data Envelopment Analysis approach to estimate technical, allocative and cost efficiencies. The results indicate that the insurance industry as a whole is cost inefficient due to high allocative inefficiency. However, technical efficiency components show improving trends. Results further indicate that Takaful firms are more efficient than conventional insurance firms. The Malmquist productivity index shows a significant improvement in scale efficiency. However, we do not find any considerable contribution of technology to improving overall productivity. The study suggests the introduction of innovative and diversified products in the insurance industry of Pakistan, particularly for Takaful companies. (A small DEA code sketch follows this entry.)
- Published
- 2014
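A self-contained sketch of an input-oriented, constant-returns-to-scale DEA model (the CCR envelopment form) of the kind referred to in entry 21, solved as a small linear program per firm. The firms, inputs and outputs are invented; the paper's full analysis (allocative and cost efficiency, Malmquist index) is not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y):
    """Input-oriented CCR (constant returns to scale) technical efficiency.
    X: (n_firms, n_inputs), Y: (n_firms, n_outputs). Returns one score per firm."""
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]
        c = np.r_[1.0, np.zeros(n)]
        # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.r_[np.zeros(m), -Y[o]],
                      bounds=[(0, None)] * (n + 1),
                      method="highs")
        scores.append(res.x[0])
    return np.array(scores)

# Illustrative data: 6 firms, 2 inputs (labour, capital), 1 output (premiums).
X = np.array([[20., 30.], [15., 25.], [40., 55.], [25., 20.], [30., 40.], [18., 28.]])
Y = np.array([[100.], [90.], [160.], [110.], [120.], [95.]])

for firm, score in enumerate(dea_ccr_input(X, Y), start=1):
    print(f"firm {firm}: technical efficiency = {score:.3f}")
```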
22. Detecting dependence between spatial processes (Detección de dependencia entre procesos espaciales)
- Author
-
Marcos Herrera, Manuel Ruiz, and Jesús Mur
- Subjects
Non-parametric methods ,Spatial bootstrapping ,Spatial independence ,Symbolic dynamics ,Generalization ,Geography, Planning and Development ,jel:C21 ,Bivariate analysis ,Economía y Negocios ,CIENCIAS SOCIALES ,Earth and Planetary Sciences (miscellaneous) ,Econometrics ,Applied mathematics ,Spatial analysis ,Independence (probability theory) ,Statistic ,Mathematics ,Null (mathematics) ,Nonparametric statistics ,jel:C12 ,Economía, Econometría ,jel:C15 ,jel:R12 ,Statistics, Probability and Uncertainty ,Null hypothesis ,General Economics, Econometrics and Finance - Abstract
Testing the assumption of independence between variables is a crucial aspect of spatial data analysis. However, the literature is limited and somewhat confusing. To our knowledge, we can mention only the bivariate generalization of Moran's statistic. This test suffers from several restrictions: it is applicable only to pairs of variables, a weighting matrix and the assumption of linearity are needed, and the null hypothesis of the test is not totally clear. Given these limitations, we develop a new non-parametric test, Υ(m), based on symbolic dynamics, with better properties. We show that the Υ(m) test can be extended to a multivariate framework, it is robust to departures from linearity, it does not need a weighting matrix, and it can be adapted to different specifications of the null. The test is consistent, computationally simple, and has good size and power, as shown by a Monte Carlo experiment. An application to the productivity of the manufacturing sector in the Ebro Valley illustrates our approach.
- Published
- 2013
- Full Text
- View/download PDF
23. Searching for the source of technical inefficiency in Italian judicial districts: an empirical investigation
- Author
-
Massimo Finocchiaro Castro and Calogero Guccio
- Subjects
Economics and Econometrics ,Nonparametric statistics ,Commercial law ,Inference ,Civil courts efficiency ,Economic Justice ,Simar ,Microeconomics ,Econometrics ,Data envelopment analysis ,Economics ,Non-parametric methods ,Law enforcement ,Business and International Management ,Inefficiency ,Law ,Public finance - Abstract
In this paper, we conducted a two-stage analysis of technical efficiency in Italian judicial districts by focusing on civil cases in 2006. Unlike most of the works that apply the Data Envelopment Analysis technique to study the justice sector, in the first stage, we employed the smoothed bootstrap procedure to generate unbiased technical efficiency estimates. In the second stage, we used a semi-parametric technique (Simar and Wilson in J Econom 136(1): 31–64, 2007) that produces a robust inference for an unknown serial correlation between efficiency scores. Our results show that technical efficiency is explained by demand factors and supports the conclusion that opportunistic behaviour from both claimants and lawyers negatively affects technical efficiency in Italian judicial districts.
- Published
- 2012
- Full Text
- View/download PDF
24. Scalable Rough-fuzzy Weighted Leader based Non-parametric Methods for Large Data Sets
- Author
-
Bidyut Kr. Patra, P. Viswanath, Suresh Veluru, and V. Jayachandra Naidu
- Subjects
DBSCAN ,Nonparametric statistics ,Classification ,computer.software_genre ,Fuzzy logic ,Data set ,Scalability ,k-nearest neighbor classifier ,Non-parametric methods ,General Earth and Planetary Sciences ,Point (geometry) ,Rough-fuzzy weighted leaders clustering ,Data mining ,Cluster analysis ,Time complexity ,computer ,General Environmental Science ,Mathematics - Abstract
Popular non-parametric methods such as the k-nearest neighbor classifier and density-based clustering methods such as DBSCAN show good performance when data set sizes are large. The time complexity of finding the density at a point in the data set is O(n), where n is the size of the data set, hence these non-parametric methods are not scalable for large data sets. A two-level rough-fuzzy weighted leader based classifier has been developed which is a scalable and efficient method for classification. However, a generalized model does not exist to estimate density non-parametrically that can be used for density-based classification and clustering. This paper presents a generalized model which proposes a single-level rough-fuzzy weighted leader clustering method to condense the data set in order to reduce the computational burden, and uses these rough-fuzzy weighted leaders to estimate the density at a point in the data set for classification and clustering. We show that the proposed rough-fuzzy weighted leader based non-parametric methods are fast and efficient when compared with related existing methods in terms of accuracy and computational time.
- Published
- 2012
- Full Text
- View/download PDF
25. An exact permutation method for testing any effect in balanced and unbalanced fixed effect ANOVA
- Author
-
Sara Kherad-Pajouh and Olivier Renaud
- Subjects
Statistics and Probability ,Analysis of covariance ,ANOVA ,Applied Mathematics ,Nonparametric statistics ,Random permutation ,01 natural sciences ,Experimental design ,Nominal level ,010104 statistics & probability ,03 medical and health sciences ,Computational Mathematics ,Permutation ,Exact test ,0302 clinical medicine ,ddc:150 ,Computational Theory and Mathematics ,Resampling ,Non-parametric methods ,Permutation test ,Analysis of variance ,0101 mathematics ,Algorithm ,030217 neurology & neurosurgery ,Mathematics - Abstract
The ANOVA method and permutation tests, two heritages of Fisher, have been extensively studied. Several permutation strategies have been proposed by others to obtain a distribution-free test for factors in a fixed effect ANOVA (i.e., single error term ANOVA). The resulting tests are either approximate or exact. However, there exists no universal exact permutation test which can be applied to an arbitrary design to test a desired factor. An exact permutation strategy applicable to fixed effect analysis of variance is presented. The proposed method can be used to test any factor, even in the presence of higher-order interactions. In addition, the method has the advantage of being applicable in unbalanced designs (all-cell-filled), which is a very common situation in practice, and it is the first method with this capability. Simulation studies show that the proposed method has an actual level which stays remarkably close to the nominal level, and its power is always competitive. This is the case even with very small datasets, strongly unbalanced designs and non-Gaussian errors. No other competitor shows such enviable behavior.
- Published
- 2010
- Full Text
- View/download PDF
26. Specialization and risk sharing in European regions
- Author
-
Roberto Basile and Alessandro Girardi
- Subjects
Planning and Development, Economics and Econometrics, Geography, Risk sharing, European regions, Non-parametric methods, Spatial econometrics, Specialization, Specialization (functional), International trade, Industrial organization - Abstract
Economic theory emphasizes that risk sharing makes it possible to exploit benefits from comparative advantages and economies of scale. An estimated regional-specific index of risk sharing is used as a covariate in a model of industrial specialization. Allowing for nonlinearities and spatial dependence, we show that industrial specialization is positively affected by risk-sharing measures even controlling for other relevant regressors.
- Published
- 2009
- Full Text
- View/download PDF
27. Models to date the business cycle: The Italian case
- Author
-
Giancarlo Bruno and Edoardo Otranto
- Subjects
non-parametric methods ,turning points ,parametric methods ,Economics and Econometrics ,Multivariate statistics ,Operations research ,Nonparametric statistics ,business cycle ,Economics ,Econometrics ,Business cycle ,Parametric methods ,Parametric statistics - Abstract
The problem of dating the business cycle has recently attracted many contributions proposing statistical methodologies, both parametric and non-parametric. In general, these methods are not used in official dating, which is carried out by experts who use their subjective evaluations of the state of the economy. In this work we apply several statistical procedures to obtain an automatic dating of the Italian business cycle over the last 30 years, checking differences among the various methodologies and with the ISAE chronology. The purpose of this exercise is to verify whether purely statistical methods can reproduce the turning-point detection proposed by economists, so that they could be fruitfully used in official dating. To this end, parametric as well as non-parametric methods are employed. The analysis is carried out both by aggregating results from single time series and directly in a multivariate framework. The different methods are also evaluated with respect to their ability to track turning points (ex post) in a timely manner. (A simple turning-point detection sketch follows this entry.)
- Published
- 2008
- Full Text
- View/download PDF
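A deliberately naive turning-point rule in the spirit of the non-parametric dating algorithms discussed in entry 27 (a stripped-down Bry-Boschan-type local-extremum search, without censoring rules or minimum phase and cycle lengths), run on a synthetic monthly index. It is not the ISAE chronology nor any of the paper's specific procedures.

```python
import numpy as np

def turning_points(series, window=5):
    """A month is a candidate peak (trough) if it is the maximum (minimum)
    of the surrounding +/- `window` months."""
    x = np.asarray(series, dtype=float)
    peaks, troughs = [], []
    for t in range(window, x.size - window):
        segment = x[t - window:t + window + 1]
        if x[t] == segment.max():
            peaks.append(t)
        elif x[t] == segment.min():
            troughs.append(t)
    return peaks, troughs

# Synthetic monthly index with a cycle of roughly four years plus noise.
rng = np.random.default_rng(0)
months = np.arange(360)
index = 100 + 5 * np.sin(2 * np.pi * months / 48) + np.cumsum(rng.normal(0, 0.2, 360))

peaks, troughs = turning_points(index)
print("candidate peaks  :", peaks[:6])
print("candidate troughs:", troughs[:6])
```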
28. Collaborative learning for recommender systems by boosting and non-parametric methods
- Author
-
Chowdhury, Nipa
- Subjects
Collaborative Filtering ,Matrix Factorisation ,Non-parametric methods ,Boosting ,Recommender Systems - Abstract
As the Internet becomes larger in size, its information content threatens to become overwhelming. Therefore, recommender systems have gained much attention in information retrieval to guide users when searching for information. However, the emergence of digital information in recent decades poses challenges for recommender systems. They must accurately reflect the users' tastes and preferences and generate many recommendations for millions of users and items. To cope with large-scale data and satisfy users' personal preferences, new recommendation technologies are needed. The objective of this thesis is threefold: developing accurate prediction models, developing an efficient top-N recommendation model and developing a reliable group-based top-N recommendation model. At first, this research considers the task of recommendation as a collaborative learning problem and proposes a novel method named gradient weighted matrix factorisation that incorporates user-item importance weights while learning to solve the collaborative learning problem for matrix factorisation. The research proposes a method for rating prediction, named matrix factorisation with factor selection. This is developed under a boosting framework that seeks the intrinsic and outstanding factors that determine users' preferences and also systematically reinforces the contribution generated by these factors. The research then views the recommendation problem as a top-N recommendation problem rather than a rating prediction problem and proposes a novel method named boosted matrix factorisation. The proposed method formulates factorisation as a learning problem and integrates boosting into factorisation. Finally, this research presents NBPLFM, a non-parametric Bayesian probabilistic latent factor model for generating recommendations to a group of users. NBPLFM extends Bayesian probabilistic matrix factorisation to learn the collective users' tastes and preferences for group recommendation by exploiting user interaction within a group, and is able to handle a variety of group sizes and similarity levels. The research reports positive empirical evidence confirming significantly better recommendation accuracy for the proposed methods in different experiments compared to state-of-the-art baseline methods. The complexity analysis indicates that the proposed approaches can be applied to large-scale datasets, since they scale linearly with the number of users and items. (A basic matrix factorisation code sketch follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
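A bare-bones matrix factorisation baseline of the kind the thesis in entry 28 builds on, trained with stochastic gradient descent on a handful of invented (user, item, rating) triples. The weighting, boosting and Bayesian group extensions described in the thesis are not implemented here.

```python
import numpy as np

def factorise(ratings, n_factors=8, lr=0.01, reg=0.05, n_epochs=30, seed=0):
    """Plain matrix factorisation trained with SGD on observed (user, item, rating)
    triples; a baseline on top of which weighting/boosting schemes can be layered."""
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = rng.normal(0, 0.1, (n_users, n_factors))   # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))   # item factors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Tiny toy rating matrix: (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 4), (3, 2, 2)]
P, Q = factorise(ratings)

rmse = np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]))
print(f"training RMSE: {rmse:.3f}")
print("predicted score for user 0 on unseen item 2:", round(float(P[0] @ Q[2]), 2))
```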
29. Smooth generalized linear models for aggregated data
- Author
-
Ayma Anza, Diego Armando, Durbán Reguera, María Luz, and Lee, Dae-Jin
- Subjects
Aggregated data ,CLMM ,Linear model ,Non-parametric methods ,Composite link mixed model ,Métodos estadísticos ,Modelos lineales ,Estadística ,Smoothing - Abstract
Mención Internacional en el título de doctor Aggregated data commonly appear in areas such as epidemiology, demography, and public health. Generally, the aggregation process is done to protect the privacy of patients, to facilitate compact presentation, or to make it comparable with other coarser datasets. However, this process may hinder the visualization of the underlying distribution that follows the data. Also, it prohibits the direct analysis of relationships between aggregated data and potential risk factors, which are commonly measured at a finer resolution. Therefore, it is of interest to develop statistical methodologies that deal with the disaggregation of coarse health data at a finer scale. For example, in the spatial setting, it could be desirable to obtain estimates, from coarse areal data, at a fine spatial grid or units less coarser than the original ones. These two cases are known as the area-to-point (ATP) and area-to-area (ATA) cases, respectively, which are illustrated in the first chapter of this thesis. Moreover, we can have spatial data recorded at coarse units over time. In some cases, the temporal dimension can also be in an aggregated form, hindering the visualization of the evolution of the underlying process over time. In this thesis we propose the use of a novel non-parametric method that we called composite link mixed model or, more succinctly, CLMM. In our proposed model, we look at the observed data as indirect observations of an underlying process (defined at a finer resolution than observed data), which we want to estimate. The mixed model formulation of our proposal allow us to include fine-scale population information and complex structures as random effects as parts of the modelling of the underlying trend. Since the CLMM is based on the approach given by Eilers (2007), called penalized composite link model (PCLM), we briefly review the PCLM approach in the first section of the second chapter of this thesis. Then, in the second section of this chapter, we introduce the CLMM approach under an univariate setting, which can be seen as a reformulation of the PCLM into a mixed model framework. This is achieved by following the mixed model reformulation of P-splines proposed in Currie and Durbán (2002) and Currie et al. (2006), which is also reviewed here. Then, the parameter estimation of the CLMM can be done under the framework of mixed model theory. This offers another alternative for the estimation of the PCLM, avoiding the use of information criteria for smoothing parameter selection. In the third section of the second chapter, we extend the CLMM approach to the multidimensional (array) case, where Kronecker products are involved in the extended model formulation. Illustrations for the univariate and the multidimensional array settings are presented throughout the second chapter, using mortality and fertility datasets. In the third chapter, we present a new methodology for the analysis of spatially aggregated data, by extending the CLMM approach developed in the second chapter to the spatial case. The spatial CLMM provides smoothed solutions for the ATP and ATA cases described in the first chapter, i.e., it gives a smoothed estimation for the underlying spatial trend, from aggregated data, at a finer resolution. The ATP and ATA cases are illustrated using several mortality (or morbidity) datasets, and simulation studies of the prediction performance between our approach and the area-to-point Poisson kriging of Goovaerts (2006) are realized. 
Also, in the third chapter we provide a methodology to deal with the overdispersion problem, based on the PRIDE (‘penalized regression with individual deviance effects’) approach of Perperoglou and Eilers (2010). In the fourth chapter, we generalize the methodology developed in the third chapter to the analysis of spatio-temporally aggregated data. Under this framework, we adapt the SAP (‘separation of anisotropic penalties’) algorithm of Rodríguez-Álvarez et al. (2015) and the GLAM (‘generalized linear array model’) algorithms given in Currie et al. (2006) and Eilers et al. (2006) to the CLMM context. The use of these efficient algorithms allows us to avoid possible storage problems and to speed up the computation of the model estimates. We illustrate the methodology presented in this chapter using a Q fever incidence dataset recorded in the Netherlands at the municipality level and by month. Our aim is to estimate smoothed incidences on a fine spatial grid over the study area throughout the 53 weeks of 2009. A simulation study is provided at the end of chapter four, in order to evaluate the prediction performance of our approach under three different aggregation scenarios, using a detailed (and confidential) Q fever incidence dataset. Finally, the fifth chapter summarizes the main contributions of this thesis and outlines further work. 
The work presented in this thesis was supported by the Spanish Ministry of Economy and Competitiveness grants MTM2011-28285-C02-02 and MTM2014-52184-P. Programa Oficial de Doctorado en Ingeniería Matemática. Thesis committee: Miguel Ángel Martínez Beneito (Chair), Irene Albarrán Lozano (Secretary), Jutta Gampe (Member).
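The CLMM described above builds on the penalized composite link model (PCLM) of Eilers (2007). As a rough orientation to that starting point only, the sketch below implements the basic univariate PCLM iteration in Python/NumPy: observed aggregated counts are treated as indirect (composed) observations of a finer latent vector, which is estimated by penalized IRWLS with a difference penalty. This is a minimal sketch of the PCLM, not the mixed-model (CLMM) estimation proposed in the thesis; the function name, the fixed smoothing parameter, the identity basis, and the toy data are illustrative assumptions.

```python
import numpy as np

def pclm(y, C, B, lam=100.0, d=2, max_iter=100, tol=1e-7):
    """Minimal penalized composite link model (Eilers, 2007) sketch.

    y   : observed aggregated counts (length m)
    C   : composition matrix mapping the fine latent vector to coarse data (m x n)
    B   : basis for the latent log-trend, e.g. a B-spline or identity basis (n x k)
    lam : fixed smoothing parameter (in the CLMM it would be estimated instead)
    """
    k = B.shape[1]
    D = np.diff(np.eye(k), n=d, axis=0)          # d-th order difference matrix
    P = lam * D.T @ D                            # smoothness penalty
    # Crude initialization: a constant latent level matching the total count.
    beta0 = np.log(max(y.sum(), 1) / C.shape[1])
    beta = np.linalg.lstsq(B, np.full(B.shape[0], beta0), rcond=None)[0]
    for _ in range(max_iter):
        gamma = np.exp(B @ beta)                 # latent fine-scale expectations
        mu = C @ gamma                           # composed (observed-scale) expectations
        X = ((C * gamma) @ B) / mu[:, None]      # working design: W^-1 C diag(gamma) B
        W = np.diag(mu)
        lhs = X.T @ W @ X + P
        rhs = X.T @ W @ X @ beta + X.T @ (y - mu)
        beta_new = np.linalg.solve(lhs, rhs)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return np.exp(B @ beta)                      # estimated latent distribution

# Toy usage: 20 latent cells observed only as 4 aggregated bins of 5 cells each.
rng = np.random.default_rng(1)
truth = np.exp(np.sin(np.linspace(0, 3, 20))) * 30
C = np.kron(np.eye(4), np.ones((1, 5)))
y = rng.poisson(C @ truth)
print(np.round(pclm(y, C, np.eye(20)), 1))
```

The thesis replaces the fixed `lam` above with a variance component estimated through the mixed-model representation, which is precisely what avoids selecting the smoothing parameter by an information criterion.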
- Published
- 2016
30. Effect of sociodemographic, clinical-prophylactic and therapeutic procedures on survival of AIDS patients assisted in a Brazilian outpatient clinic
- Author
-
Marilia Sá Carvalho, Jorge Francisco da Cunha Pinto, Dario José Hart Pontes Signorini, Cláudia Torres Codeço, Michelle Carreira Miranda Monteiro, Marion de Fátima Castro de Andrade, Dayse Pereira Campos, and Carlos Alberto Morais de Sá
- Subjects
Anti-HIV agents ,Survival rate ,Pediatrics ,medicine.medical_specialty ,Hospital mortality ,Epidemiology ,Fatal outcome ,Acquired immunodeficiency syndrome (AIDS) ,Non-parametric methods ,Medicine ,Outpatient clinic ,Epidemiological studies ,business.industry ,Proportional hazards model ,Mortality rate ,Public Health, Environmental and Occupational Health ,Proportional risk models ,General Medicine ,Survival analysis ,medicine.disease ,Regimen ,Cohort ,Cohort studies ,business ,Brazil ,Cohort study - Abstract
The Brazilian AIDS Program offers free and universal access to antiretroviral therapy. This study investigates the influence of sociodemographic, clinical-prophylactic and therapeutic factors on survival, after AIDS diagnosis, in an open cohort of 1,420 patients assisted in a university hospital in the city of Rio de Janeiro (1995-2002). Kaplan-Meier estimates and Cox proportional hazards models were used to assess the effect of variables in the three dimensions studied. The overall survival time of the upper quartile was 24 months (95% CI: 20.5-27.5), increasing from 14 months in 1995 to 46 months in 1998. We found a protective effect of heterosexual behavior against death that could be attributed to the increasing female-to-male sex ratio in the cohort, which coincided with the introduction of therapy. Low schooling, hospital admission and lack of follow-up were identified as risk factors for death, whereas prophylaxis against PCP and toxoplasmosis was protective. The number of attempts required to consolidate the antiretroviral therapy showed no significant effect on survival. The full model, which includes the number of antiretroviral drugs in the regimen, confirmed triple therapy as the best regimen. This study brings important information for designing guidelines to deal with different aspects of the practical management of patients and their behavior, thus contributing to the success of the program of free access to antiretroviral therapy implemented in Brazil.
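For orientation, the non-parametric half of the study's survival toolkit, the Kaplan-Meier product-limit estimator, can be written in a few lines of plain NumPy. This is a generic sketch with invented follow-up times, not the study's data or model; the Cox regression step would typically be fitted with a dedicated package (e.g., lifelines' CoxPHFitter) and is not reproduced here.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival curve from follow-up times and event indicators.

    time  : follow-up time for each patient (e.g., months)
    event : 1 if death was observed, 0 if the observation was censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    curve = []
    s = 1.0
    for t in np.unique(time[event == 1]):          # distinct event times
        at_risk = np.sum(time >= t)                # patients still under follow-up
        deaths = np.sum((time == t) & (event == 1))
        s *= 1.0 - deaths / at_risk                # product-limit update
        curve.append((t, s))
    return np.array(curve)                         # columns: time, S(t)

# Toy example: 6 patients; the third and fifth are censored observations.
print(kaplan_meier([3, 8, 12, 14, 20, 24], [1, 1, 0, 1, 0, 1]))
```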
- Published
- 2005
- Full Text
- View/download PDF
31. kruX: matrix-based non-parametric eQTL discovery
- Author
-
Hassan Foroughi Asl, Johan Björkegren, Jianlong Qi, and Tom Michoel
- Subjects
Genotype ,Computer science ,Quantitative Trait Loci ,eQTL ,Polymorphism, Single Nucleotide ,Matrix algebra ,Biochemistry ,Statistics, Nonparametric ,Structural Biology ,Genetic model ,Test statistic ,Humans ,Non-parametric methods ,Quantitative Biology - Genomics ,Molecular Biology ,Statistical hypothesis testing ,Parametric statistics ,Genomics (q-bio.GN) ,Genome ,business.industry ,Applied Mathematics ,Nonparametric statistics ,Linear model ,Computational Biology ,Reproducibility of Results ,Pattern recognition ,Computer Science Applications ,FOS: Biological sciences ,Expression quantitative trait loci ,Outlier ,Artificial intelligence ,business ,Algorithms ,Software - Abstract
The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data, owing to its robustness against variations in the underlying genetic model and in the expression trait distribution; however, testing billions of marker-trait combinations one-by-one can become computationally prohibitive. We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several million marker-trait combinations at once. kruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test with eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative in calling additive linear associations. In summary, kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure. Software is available at http://krux.googlecode.com.
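The speed-up rests on expressing per-genotype rank sums as matrix products. The sketch below is a simplified NumPy illustration of that idea, not the kruX code itself: it assumes genotypes coded 0/1/2 with no missing values, and it omits the tie correction, missing-data handling, and per-marker degree-of-freedom adjustments that kruX implements.

```python
import numpy as np
from scipy.stats import rankdata, chi2

def kw_matrix(expr, geno, n_classes=3):
    """Kruskal-Wallis H for every trait x marker pair via matrix products.

    expr : (n_traits, n_samples) expression matrix
    geno : (n_markers, n_samples) genotypes coded 0, 1, 2 (no missing values)
    """
    n_traits, n = expr.shape
    R = np.apply_along_axis(rankdata, 1, expr)       # within-trait ranks
    H = np.zeros((n_traits, geno.shape[0]))
    for g in range(n_classes):
        I = (geno == g).astype(float)                # indicator of genotype class g
        counts = I.sum(axis=1)                       # group size per marker
        S = R @ I.T                                  # rank sums, all pairs at once
        with np.errstate(divide="ignore", invalid="ignore"):
            H += np.where(counts > 0, S**2 / counts, 0.0)
    H = 12.0 / (n * (n + 1)) * H - 3.0 * (n + 1)
    p = chi2.sf(H, df=n_classes - 1)                 # illustrative: fixed k-1 df
    return H, p

# Tiny toy example: 2 traits, 3 markers, 8 samples.
rng = np.random.default_rng(0)
H, p = kw_matrix(rng.normal(size=(2, 8)), rng.integers(0, 3, size=(3, 8)))
```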
- Published
- 2014
- Full Text
- View/download PDF
32. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine
- Author
-
Mousa Shamsi and Sheyda Bahrami
- Subjects
Self-organizing map ,lcsh:Medical technology ,Computer science ,Biomedical Engineering ,Word error rate ,Health Informatics ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,Contrast-to-noise ratio ,Dimension (vector space) ,Computer Science (miscellaneous) ,Radiology, Nuclear Medicine and imaging ,Parametric statistics ,non-parametric methods ,Radiological and Ultrasound Technology ,business.industry ,Pattern recognition ,Support vector machine ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,lcsh:R855-855.5 ,classification ,Feature (computer vision) ,FMRI ,self-organizing map (SOM) ,Original Article ,support vector machine (SVM) ,Artificial intelligence ,business ,030217 neurology & neurosurgery - Abstract
Functional magnetic resonance imaging (fMRI) is a popular method for probing the functional organization of the brain through hemodynamic responses. In this method, volume images of the entire brain are obtained with very good spatial resolution but low temporal resolution, and the resulting datasets are high-dimensional from the point of view of classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) to obtain a feature-based classification: the SOM is used for feature extraction and for labeling the datasets, and a linear-kernel SVM is then used to detect the active areas. The SOM has two major advantages: (i) it reduces the dimensionality of the data, lowering computational complexity, and (ii) it is useful for identifying brain regions with small onset differences in their hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI datasets with block-design inputs, a contrast-to-noise ratio (CNR) of 0.6, and a contrast of 1-4% in the active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.
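As a rough, self-contained sketch of this kind of SOM-then-SVM pipeline (not the authors' implementation; the synthetic block-design data, the 6x6 grid, and the use of the third-party MiniSom package are all assumptions), each voxel time course can be summarised by its winning SOM unit and those low-dimensional coordinates fed to a linear-kernel SVM:

```python
import numpy as np
from minisom import MiniSom          # third-party SOM package (assumed available)
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic "voxel time courses": 200 voxels x 60 time points; half follow a
# block-design response (active), half are pure noise (inactive).
block = np.tile(np.repeat([0.0, 1.0], 10), 3)             # on/off block paradigm
active = 0.6 * block + rng.normal(0, 1, (100, 60))
inactive = rng.normal(0, 1, (100, 60))
X = np.vstack([active, inactive])
y = np.array([1] * 100 + [0] * 100)

# 1) SOM as an unsupervised feature extractor: each voxel is represented by the
#    grid coordinates of its best-matching unit.
som = MiniSom(6, 6, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 2000)
features = np.array([som.winner(v) for v in X], dtype=float)

# 2) Linear-kernel SVM on the SOM features to separate active from inactive voxels.
clf = SVC(kernel="linear").fit(features, y)
print("training accuracy:", clf.score(features, y))
```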
- Published
- 2017
- Full Text
- View/download PDF
33. Consumer confidence and consumption forecast: a non-parametric approach
- Author
-
Giancarlo Bruno
- Subjects
Consumer expenditure ,Consumption (economics) ,Actuarial science ,Relation (database) ,jel:C53 ,Geography, Planning and Development ,Nonparametric statistics ,jel:E21 ,Development ,jel:C14 ,Forecasting ,Consumer confidence ,Non-parametric methods ,Non linear methods ,Empirical research ,Order (exchange) ,Economics ,Econometrics ,Consumer confidence index ,Public finance - Abstract
The consumer confidence index is a closely watched indicator among short-term analysts and news reporters, and it is generally considered to convey useful information about the short-term evolution of consumer expenditure. Notwithstanding this, its usefulness in forecasting aggregate consumption is sometimes questioned in empirical studies, the overall conclusion being that the extensive press coverage of this indicator is somewhat undue. Nevertheless, attention periodically returns to consumer confidence, especially when turning points of the business cycle are expected and/or abrupt changes in the indicator occur. Some authors argue that in such events consumer confidence is a more relevant variable for predicting consumption. This can be a signal that a linear functional form is inadequate to capture the relationship between the two variables; the choice of a suitable non-linear model, however, is not straightforward. Here I propose a non-parametric model as a possible choice, in order to explore the usefulness of confidence in forecasting consumption without making overly restrictive assumptions about the functional form.
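One generic non-parametric regression that avoids fixing a functional form is the Nadaraya-Watson kernel estimator. The sketch below is a plain illustration of that idea, not the paper's model: the bandwidth, the made-up series, and the single lagged-confidence predictor are all assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_new, h=1.0):
    """Gaussian-kernel regression: E[y | x] estimated as a locally weighted average."""
    x_train = np.asarray(x_train, float)
    y_train = np.asarray(y_train, float)
    x_new = np.atleast_1d(np.asarray(x_new, float))
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)  # kernel weights
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

# Made-up example: lagged confidence index vs. consumption growth (%).
confidence_lag = np.array([95, 98, 101, 103, 99, 96, 104, 107, 102, 100])
consumption_growth = np.array([0.1, 0.3, 0.5, 0.6, 0.4, 0.2, 0.7, 0.9, 0.5, 0.4])
print(nadaraya_watson(confidence_lag, consumption_growth, [97, 105], h=2.0))
```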
- Published
- 2012
34. Multimodality in the distribution of GDP and the absolute convergence hypothesis
- Author
-
Leone Leonida and Giovanni Caggiano
- Subjects
Statistics and Probability ,Economics and Econometrics ,Convergence, Neoclassical growth model, Income polarization, Non-parametric methods ,business.industry ,Polarization (politics) ,Nonparametric statistics ,Distribution (economics) ,OUTPUT CONVERGENCE ,Convergence (economics) ,Nonparametric Methods ,multimodality ,Income polarization ,Per capita income ,Absolute convergence ,Mathematics (miscellaneous) ,Econometrics ,Per capita ,Economics ,Production (economics) ,Non-parametric methods ,business ,Convergence ,Neoclassical growth model ,Social Sciences (miscellaneous) - Abstract
This article shows that, contrary to common wisdom, the emergence of multiple clusters in the distribution of income is not necessarily evidence against the hypothesis of absolute convergence. Using data for the world economies, the US states, the EU regions, and the Italian regions, we find that although the distribution of per capita income is multimodal for both the world economies and the Italian regions, absolute convergence can be rejected only in the former case. Similarly, although the distributions for the EU regions and the US states are both unimodal, convergence is unambiguously taking place only in the latter case. We show that these results are consistent with the neoclassical model of growth in the presence of non-convexities in production. We conclude that polarization in the distribution of per capita incomes is neither a sufficient nor a necessary condition for rejecting the absolute convergence hypothesis.
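Multimodality in an income distribution is typically read off a kernel density estimate. As a generic illustration only (not the article's test; the bandwidth rule, the simulated "twin peaks" incomes, and the mode-counting rule are assumptions), one can count local maxima of a Gaussian KDE as follows:

```python
import numpy as np
from scipy.stats import gaussian_kde

def count_modes(sample, grid_size=512):
    """Count local maxima of a Gaussian kernel density estimate."""
    kde = gaussian_kde(sample)                       # Scott's rule bandwidth by default
    grid = np.linspace(sample.min(), sample.max(), grid_size)
    dens = kde(grid)
    interior = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
    return int(interior.sum())

# Simulated per capita log incomes: a mixture of two groups ("twin peaks").
rng = np.random.default_rng(0)
incomes = np.concatenate([rng.normal(8.0, 0.3, 400), rng.normal(10.0, 0.4, 300)])
print(count_modes(incomes))                          # expected to report 2 modes
```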
- Published
- 2012
35. Nonparametric Estimators of Dose-Response Functions
- Author
-
BIA Michela, FLORES Carlos A., and MATTEI Alessandra
- Subjects
Continuous treatment ,Dose-response function ,Generalized Propensity Score ,Non-parametric methods ,R&D investment ,jel:C13 ,jel:J70 ,jel:J31 - Abstract
We propose two semiparametric estimators of the dose-response function based on spline techniques. In many observational studies the treatment is neither binary nor categorical, and one may be interested in estimating the dose-response function in a setting with a continuous treatment. Under unconfoundedness, the generalized propensity score can be used to estimate dose-response functions (DRF) and marginal treatment effect functions. We evaluate the performance of the proposed estimators using Monte Carlo simulation methods. The simulation results suggest that the estimated DRF is robust to the specific semiparametric estimator used, while the parametric estimates of the DRF are sensitive to model mis-specification. We apply our approach to the problem of evaluating the effect on innovation sales of Research and Development (R&D) financial aids received by Luxembourgish firms in 2004 and 2005.
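For orientation, the generalized-propensity-score workflow can be sketched in its simplest, fully parametric Hirano-Imbens form. The paper's spline-based semiparametric estimators are not reproduced here; the normal treatment model, the quadratic outcome model, and all variable names are illustrative assumptions.

```python
import numpy as np

def gps_dose_response(X, t, y, t_grid):
    """Parametric GPS dose-response sketch (Hirano-Imbens style).

    X : (n, p) covariates, t : (n,) continuous treatment, y : (n,) outcome
    t_grid : treatment levels at which to evaluate the dose-response function
    """
    n = len(t)
    Xd = np.column_stack([np.ones(n), X])
    # Step 1: treatment model t | X ~ Normal(Xd @ b, sigma2) -> GPS density.
    b, *_ = np.linalg.lstsq(Xd, t, rcond=None)
    m = Xd @ b
    sigma2 = np.mean((t - m) ** 2)
    gps = np.exp(-(t - m) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    # Step 2: flexible outcome model in the treatment and the GPS.
    D = np.column_stack([np.ones(n), t, t**2, gps, gps**2, t * gps])
    a, *_ = np.linalg.lstsq(D, y, rcond=None)
    # Step 3: for each dose, average predictions over the sample's GPS values.
    drf = []
    for td in t_grid:
        g = np.exp(-(td - m) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        Dg = np.column_stack([np.ones(n), np.full(n, td), np.full(n, td**2),
                              g, g**2, td * g])
        drf.append(float((Dg @ a).mean()))
    return np.array(drf)
```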
- Published
- 2011
36. Genotyping of Saccharomyces cerevisiae strains by interdelta sequence typing using automated microfluidics
- Author
-
Manuel A. S. Santos, Inês Mendes, Ana C. Gomes, Ricardo Franco-Duarte, Bruno de Sousa, Dorit Elisabeth Schuller, and Universidade do Minho
- Subjects
Genotype ,Retroelements ,DNA polymerase ,Sequence analysis ,Clinical Biochemistry ,Retrotransposon ,Computational biology ,Saccharomyces cerevisiae ,Interdelta sequences ,Biochemistry ,Polymerase Chain Reaction ,Statistics, Nonparametric ,Analytical Chemistry ,law.invention ,Capillary electrophoresis ,03 medical and health sciences ,chemistry.chemical_compound ,Automation ,law ,Non-parametric methods ,Typing ,Particle Size ,Genotyping ,Polymerase chain reaction ,030304 developmental biology ,Genetics ,Electrophoresis, Agar Gel ,0303 health sciences ,Reproducibility ,Analysis of Variance ,Science & Technology ,biology ,030306 microbiology ,Electrophoresis, Capillary ,Reproducibility of Results ,Sequence Analysis, DNA ,Microfluidic Analytical Techniques ,3. Good health ,chemistry ,biology.protein ,DNA - Abstract
Amplification of genomic sequences flanked by delta elements of the retrotransposons TY1 and TY2 is a reliable method for the characterization of Saccharomyces cerevisiae strains. The aim of this study is to evaluate the usefulness of microfluidic electrophoresis (Caliper LabChip®) for assessing the factors that affect the interlaboratory reproducibility of interdelta sequence typing for S. cerevisiae strain delimitation. We carried out experiments in two laboratories, using varying combinations of Taq DNA polymerases and thermal cyclers. The reproducibility of the technique is evaluated using non-parametric statistical tests, and we show that the source of Taq DNA polymerase and technical differences between laboratories have the highest impact on reproducibility, whereas thermal cyclers have little impact. We also show that the comparative analysis of interdelta patterns is more reliable when fragment sizes are compared than when the absolute and relative DNA concentrations of each band are considered. Interdelta analysis based on a smaller fraction of bands with intermediate sizes, between 100 and 1000 bp, yields the highest reproducibility. Funding: Fundação para a Ciência e Tecnologia.
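As a generic illustration of the kind of non-parametric comparison used for such reproducibility questions (not the study's actual analysis; the fragment-size vectors and the grouping by polymerase are invented), band sizes obtained with different Taq polymerases could be compared with a Kruskal-Wallis test from SciPy:

```python
from scipy.stats import kruskal

# Hypothetical fragment sizes (bp) for the same strain amplified with
# three different Taq DNA polymerases.
taq_a = [212, 305, 487, 640, 915]
taq_b = [210, 309, 492, 655, 930]
taq_c = [198, 290, 470, 610, 880]

stat, p = kruskal(taq_a, taq_b, taq_c)
print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3f}")
# A large p-value is consistent with reproducible fragment-size profiles across
# polymerases; a small one would flag a systematic difference between them.
```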
- Published
- 2010
37. Testing Non-parametric Methods to Estimate Cod (Gadus morhua) Recruitment in NAFO Divisions 3NO
- Author
-
Paz, J. and Larrañeta, M. G.
- Subjects
Gadus morhua ,NAFO Divisions 3NO ,Non-parametric methods ,Recruitment ,Cod - Abstract
Recognizing that non-parametric methods for estimating fish stock recruitment are generally simple and need not be based on ecological hypotheses, four non-parametric methods (the probability transition matrix and three algorithms for estimating recruitment probability density functions) were tested on cod (Gadus morhua) data from NAFO Div. 3NO. The transition matrix method was inadequate because the cod stock failed to meet the primary Markovian assumption: the transition probability must be constant and depend only on the previous state. Of the three algorithm methods (the fixed-interval, the New England and the Cauchy), only the New England algorithm was appropriate for calculating recruitment with these stock data. A regression coefficient of r = 0.556 (d.f. = 23, P = 0.003) was obtained when the observed data were compared with the estimates. The validity of estimates of future recruitment using the New England algorithm depends on biotic and abiotic environmental conditions being similar in both the pre-recruit and the observation periods.
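For context only, a recruitment probability transition matrix of the kind rejected in the study can be built by discretizing a recruitment series into states and counting state-to-state transitions. The sketch below uses an invented series and three arbitrary states; it does not reproduce the fixed-interval, New England, or Cauchy algorithms.

```python
import numpy as np

def transition_matrix(series, n_states=3):
    """Estimate a Markov transition matrix from a discretized recruitment series."""
    series = np.asarray(series, dtype=float)
    # Discretize into n_states roughly equally populated classes (low/medium/high).
    edges = np.quantile(series, np.linspace(0, 1, n_states + 1)[1:-1])
    states = np.digitize(series, edges)
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1                         # transition from state a to state b
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)       # empty rows stay all-zero

# Invented recruitment index, one value per year.
recruitment = [12, 30, 25, 8, 40, 35, 22, 15, 50, 28, 9, 33]
print(transition_matrix(recruitment))
```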
- Published
- 1993
38. Position of chosen European Union countries in respect of financial efficiency of higher education in the area of didactics
- Author
-
Monika Mościbrodzka and Anna Ćwiąkała-Małys
- Subjects
non-parametric methods ,Ward method ,financial efficiency ,I23 ,ddc:330 ,didactical process ,C14 ,C38 ,I21 ,European Union higher education - Abstract
This article continues the authors' research on evaluating the financial efficiency of higher education in the area of didactics in European Union countries. Based on the results of earlier studies that used a non-parametric approach, the member countries are classified into homogeneous groups with respect to the studied feature.
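The grouping step relies on Ward's hierarchical clustering. As a generic sketch only (the country codes, the two efficiency scores per country, and the choice of three groups are invented for illustration), such a classification can be reproduced with SciPy as follows:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical standardized efficiency indicators for a handful of EU countries
# (rows: countries, columns: efficiency measures).
countries = ["AT", "BE", "CZ", "DE", "ES", "FI", "FR", "PL"]
scores = np.array([
    [0.82, 0.75],
    [0.79, 0.70],
    [0.55, 0.60],
    [0.88, 0.81],
    [0.61, 0.58],
    [0.90, 0.85],
    [0.77, 0.72],
    [0.52, 0.55],
])

Z = linkage(scores, method="ward")                 # Ward's minimum-variance linkage
groups = fcluster(Z, t=3, criterion="maxclust")    # cut the dendrogram into 3 groups
for country, g in zip(countries, groups):
    print(country, g)
```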