31 results for "Kolmogorov-Smirnov Distance"
Search Results
2. Discriminating between and within (semi)continuous classes of both Tweedie and geometric Tweedie models.
- Author
-
Abid, Rahma and Kokonendji, Célestin C.
- Subjects
- *
GEOMETRIC modeling , *GEOMETRIC distribution , *LIKELIHOOD ratio tests , *GAMMA distributions - Abstract
In both Tweedie and geometric Tweedie models, the common power parameter p ∉ (0, 1) works as an automatic distribution selection. It separates two subclasses of semicontinuous (1 < p < 2) ones; and two datasets for illustration purposes are investigated. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
3. Rate of convergence for traditional Pólya urns.
- Author
-
Janson, Svante
- Abstract
Consider a Pólya urn with balls of several colours, where balls are drawn sequentially and each drawn ball is immediately replaced together with a fixed number of balls of the same colour. It is well known that the proportions of balls of the different colours converge in distribution to a Dirichlet distribution. We show that the rate of convergence is $\Theta(1/n)$ in the minimal $L_p$ metric for any $p\in[1,\infty]$ , extending a result by Goldstein and Reinert; we further show the same rate for the Lévy distance, while the rate for the Kolmogorov distance depends on the parameters, i.e. on the initial composition of the urn. The method used here differs from the one used by Goldstein and Reinert, and uses direct calculations based on the known exact distributions. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
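The limit result quoted in the entry above can be probed numerically. The sketch below (not from the paper) simulates a two-colour urn with one ball added per draw, in which case the limiting proportion of the first colour is Beta-distributed, and estimates the Kolmogorov distance to that limit with `scipy.stats.kstest`; the urn composition, number of draws, and replicate count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)

def polya_proportion(a=1, b=1, add=1, draws=1000):
    """Simulate one two-colour Pólya urn and return the final proportion of colour 1."""
    n1, n2 = a, b
    for _ in range(draws):
        if rng.random() < n1 / (n1 + n2):  # draw a ball uniformly at random...
            n1 += add                      # ...and add `add` balls of the same colour
        else:
            n2 += add
    return n1 / (n1 + n2)

# Starting from one ball of each colour with one ball added per draw,
# the limiting proportion of colour 1 is Beta(1, 1), i.e. uniform on (0, 1).
props = np.array([polya_proportion() for _ in range(2000)])
ks_stat = kstest(props, beta(1, 1).cdf).statistic
print(f"Kolmogorov distance to the Beta limit after 1000 draws: {ks_stat:.4f}")
```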
4. On Discrimination Between the Lindley and xgamma Distributions
- Author
-
Sen, Subhradev, Al-Mofleh, Hazem, and Maiti, Sudhansu S.
- Published
- 2021
- Full Text
- View/download PDF
5. K-Medoids Clustering of Data Sequences With Composite Distributions.
- Author
-
Wang, Tiexing, Li, Qunwei, Bucci, Donald J., Liang, Yingbin, Chen, Biao, and Varshney, Pramod K.
- Subjects
- *
ERROR probability , *PARALLEL algorithms , *CONTINUOUS distributions , *INFINITY (Mathematics) - Abstract
This paper studies clustering of data sequences using the k-medoids algorithm. All the data sequences are assumed to be generated from unknown continuous distributions, which form clusters with each cluster containing a composite set of closely located distributions (based on a certain distance metric between distributions). The maximum intracluster distance is assumed to be smaller than the minimum intercluster distance, and both values are assumed to be known. The goal is to group the data sequences together if their underlying generative distributions (which are unknown) belong to one cluster. K-medoids algorithms based on distribution distance metrics are proposed for known and unknown numbers of distribution clusters. Upper bounds on the error probability and convergence results in the large sample regime are also provided. It is shown that the error probability decays exponentially fast as the number of samples in each data sequence goes to infinity. The error exponent has a simple form regardless of the distance metric applied when certain conditions are satisfied. In particular, the error exponent is characterized when either the Kolmogorov–Smirnov distance or the maximum mean discrepancy is used as the distance metric. Simulation results are provided to validate the analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
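As a rough illustration of the approach described in the entry above, the sketch below builds a matrix of two-sample Kolmogorov–Smirnov distances between data sequences with `scipy.stats.ks_2samp` and runs a plain k-medoids (Voronoi iteration) on it. The generating distributions, sequence lengths, number of clusters, and initialisation are assumptions made for the example, not the authors' setup.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)

# Toy data: ten sequences drawn from two groups of nearby distributions.
seqs = [rng.normal(0.0, 1.0, 500) for _ in range(5)] + \
       [rng.normal(2.0, 1.2, 500) for _ in range(5)]

# Pairwise KS distances between the empirical distributions of the sequences.
n = len(seqs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = ks_2samp(seqs[i], seqs[j]).statistic

def k_medoids(D, k=2, iters=20):
    """Basic k-medoids (Voronoi iteration) on a precomputed distance matrix."""
    medoids = list(range(k))                       # naive initialisation
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)  # assign each sequence to its nearest medoid
        new_medoids = []
        for c in range(k):
            members = np.flatnonzero(labels == c)
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

medoids, labels = k_medoids(D)
print("medoids:", medoids, "cluster labels:", labels.tolist())
```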
6. Nonparametric Composite Hypothesis Testing in an Asymptotic Regime.
- Author
-
Li, Qunwei, Wang, Tiexing, Bucci, Donald J., Liang, Yingbin, Chen, Biao, and Varshney, Pramod K.
- Abstract
We investigate the nonparametric, composite hypothesis testing problem for arbitrary unknown distributions in the asymptotic regime where both the sample size and the number of hypotheses grow exponentially large. Such asymptotic analysis is important in many practical problems, where the number of variations that can exist within a family of distributions can be countably infinite. We introduce the notion of discrimination capacity, which captures the largest exponential growth rate of the number of hypotheses relative to the sample size so that there exists a test with asymptotically vanishing probability of error. Our approach is based on various distributional distance metrics in order to incorporate the generative model of the data. We provide analyses of the error exponent using the maximum mean discrepancy and Kolmogorov–Smirnov distance and characterize the corresponding discrimination rates, i.e., lower bounds on the discrimination capacity, for these tests. Finally, an upper bound on the discrimination capacity based on Fano's inequality is developed. Numerical results are presented to validate the theoretical results. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
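Both distance metrics named in the entry above are easy to compute from samples. The sketch below (an illustration under assumed data, not the paper's test) attributes a test sequence to the closer of two reference hypotheses using the two-sample KS distance and a Gaussian-kernel estimate of the squared maximum mean discrepancy; the hypotheses, sample sizes, and kernel bandwidth are made up for the example.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)

def mmd2_rbf(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    k = lambda a, b: np.exp(-np.subtract.outer(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Each composite hypothesis is represented here by a reference sample.
references = {"H0": rng.normal(0.0, 1.0, 400), "H1": rng.normal(0.7, 1.0, 400)}
test_seq = rng.normal(0.7, 1.0, 400)

by_ks = min(references, key=lambda h: ks_2samp(test_seq, references[h]).statistic)
by_mmd = min(references, key=lambda h: mmd2_rbf(test_seq, references[h]))
print("nearest hypothesis by KS distance:", by_ks, "| by MMD:", by_mmd)
```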
7. Discriminating among Weibull, log-normal, and log-logistic distributions.
- Author
-
Raqab, Mohammad Z., Al-Awadhi, Shafiqah A., and Kundu, Debasis
- Subjects
- *
WEIBULL distribution , *DISTRIBUTION (Probability theory) , *PROBABILITY theory , *MONTE Carlo method , *NUMERICAL calculations - Abstract
In this article, we consider the problem of model selection/discrimination among three different positively skewed lifetime distributions. All three distributions, namely the Weibull, log-normal, and log-logistic, have been used quite effectively to analyze positively skewed lifetime data. In this article, we have used three different methods to discriminate among these three distributions. We have used the maximized likelihood method to choose the correct model and computed the asymptotic probability of correct selection. We have further obtained the Fisher information matrices of these three distributions and compared them for complete and censored observations. These measures can be used to discriminate among these three distributions. We have also proposed to use the Kolmogorov-Smirnov distance to choose the correct model. Extensive simulations have been performed to compare the performances of the three different methods. It is observed that each method performs better than the other two for some distributions and for certain ranges of parameters. Further, the loss of information due to censoring is compared for these three distributions. The analysis of a real dataset has been performed for illustrative purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
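The KS-based selection rule mentioned in the entry above amounts to fitting each candidate family and keeping the one whose fitted CDF lies closest to the empirical CDF. The sketch below does this with `scipy.stats` (the log-logistic is `fisk` in scipy); the simulated data set and the choice to fix the location parameter at zero are assumptions of the example, and the paper's likelihood-ratio and Fisher-information criteria are not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Assumed positively skewed lifetime data (true model here: Weibull, shape 1.5).
data = stats.weibull_min.rvs(1.5, scale=2.0, size=200, random_state=rng)

candidates = {
    "Weibull":      stats.weibull_min,
    "log-normal":   stats.lognorm,
    "log-logistic": stats.fisk,
}

ks = {}
for name, family in candidates.items():
    params = family.fit(data, floc=0)                    # ML fit, location fixed at 0
    ks[name] = stats.kstest(data, family(*params).cdf).statistic

best = min(ks, key=ks.get)
print({k: round(v, 4) for k, v in ks.items()}, "-> selected:", best)
```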
8. Quantifying Schumann resonances' variation over time through statistical differences.
- Author
-
Soler-Ortiz, Manuel, Fernández-Ros, Manuel, Novas-Castellano, Nuria, and Gázquez-Parra, Jose A.
- Subjects
- *
DISTRIBUTION (Probability theory) , *RESONANCE , *FAST Fourier transforms , *STOCHASTIC processes - Abstract
Schumann resonances' statistical parameters vary over time, which gives way to analyzing them as a stochastic process. By splitting the Schumann resonance's records into segments and obtaining their empirical distribution functions, the differences can be evaluated by calculating the Kolmogorov–Smirnov distance between them. The analysis' results allow for the characterization of the Schumann resonance's variations, along with the typical values it reaches under the chosen metric. It is shown how divergence is mitigated through sample averaging, and also how divergent samples impact the Fast Fourier Transform algorithm. Divergence quantification adds a layer of information for data processing. Knowing the changes experienced over time in Schumann resonances gives a way to know what kind of mathematical procedures can be applied to the signal. Quantification of signal variations over time can identify error sources in specific procedures, filter out samples unfit under certain analyses, or serve as a stopping criterion for cumulative analyses. • Schumann Resonances' temporal segments compared with the Kolmogorov–Smirnov distance. • Flexible methodology to compare segments of different duration, allowing overlap. • Methodology tested over real data, showing a relationship with lightning activity. • Kolmogorov–Smirnov distance as support for further analysis procedures discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
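A minimal sketch of the segment-comparison step described above, assuming the record is available as a one-dimensional NumPy array (here replaced by synthetic noise whose variance drifts over time). The segment length and overlap are illustrative choices; `scipy.stats.ks_2samp` supplies the two-sample Kolmogorov–Smirnov distance between segments.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)

# Stand-in for a Schumann-resonance record: noise whose spread drifts over time.
t = np.arange(120_000)
record = rng.normal(0.0, 1.0 + 0.5 * np.sin(2 * np.pi * t / t.size), size=t.size)

def segment_ks_matrix(x, seg_len=12_000, overlap=0.5):
    """Pairwise KS distances between the empirical distributions of (overlapping) segments."""
    step = int(seg_len * (1 - overlap))
    segs = [x[s:s + seg_len] for s in range(0, len(x) - seg_len + 1, step)]
    n = len(segs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = ks_2samp(segs[i], segs[j]).statistic
    return D

D = segment_ks_matrix(record)
print(f"{D.shape[0]} segments, largest segment-to-segment KS distance: {D.max():.3f}")
```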
9. Novel feature extraction method of pipeline signals based on multi-scale dispersion entropy partial mean of multi-modal component.
- Author
-
Zhou, Yina, Lu, Jingyi, Hu, Zhongrui, Dong, Hongli, Yan, Wendi, and Yang, Dandi
- Subjects
- *
FEATURE extraction , *ENTROPY , *MACHINE learning , *DISPERSION (Chemistry) , *PHOTOACOUSTIC spectroscopy - Abstract
This paper considers the problem of feature extraction from pipeline acoustic signals under different working conditions. A novel method based on the multi-scale dispersion entropy partial mean (MDEPM) of multi-modal components is proposed to extract features of pipeline signals. First, the variational mode decomposition (VMD) algorithm is applied to decompose the acoustic signals into several mode components. Then, the Kolmogorov–Smirnov distance (KSD) is introduced as the index to measure the correlation between each mode component and the original signal, and the mode component with a smaller KSD is selected as the feature component. Finally, the MDEPM of the feature component is calculated to form the feature vector, which realizes signal feature extraction. In the experiments, 13 signals collected under different working conditions are divided into three categories; the results show that the proposed method can extract the characteristics of the different pipeline acoustic signals. Furthermore, the extracted features can be accurately identified and classified by an extreme learning machine (ELM) under different working conditions, and comparison with other methods verifies the feasibility and superiority of the proposed method. • The feature components are selected by analyzing the Kolmogorov–Smirnov distance between each mode component and the original signal. • The multi-scale dispersion entropy partial mean can characterize the features of different pipeline signals at different scales. • The MMC-MDEPM method can be used to extract the different signal features. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
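The component-selection step described above reduces to a few lines once the decomposition is available. The sketch below assumes the VMD stage has already produced a list of mode components (faked here with simple moving-average filters of a noise signal so the snippet is self-contained) and keeps the component whose amplitude distribution has the smallest two-sample KS distance to the original signal.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)

# Assumed inputs: an acoustic record and mode components from a prior VMD step
# (faked here with moving-average filters of different widths).
signal = rng.normal(size=4096)
modes = [np.convolve(signal, np.ones(w) / w, mode="same") for w in (3, 15, 63)]

# KS distance between each mode's amplitude distribution and the original signal;
# the mode with the smallest distance is retained as the feature component.
ksd = [ks_2samp(mode, signal).statistic for mode in modes]
feature_idx = int(np.argmin(ksd))
print("KS distances:", [round(d, 3) for d in ksd], "-> feature component:", feature_idx)
```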
10. Website Clickstream Data Visualization Using Improved Markov Chain Modelling In Apache Flume
- Author
-
Frhan Amjad Jumaah
- Subjects
Clickstream data ,VizClick ,WebClickviz ,Apache Flume ,Markov chain ,Kolmogorov-Smirnov distance ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
Clickstream data analysis is the process of collecting, analysing and reporting aggregate data about the web pages a visitor clicks. Visualizing clickstream data has gained significant importance in many applications such as web marketing, customer prediction and product management. Most existing works employ different visualization tools along with techniques like Markov chain modelling. However, the accuracy of these methods can be improved once their shortcomings are resolved. Markov chain modelling suffers from occlusion and is unable to provide a clear visual display of the data. These issues are resolved by improving the Markov chain model with a heuristic method based on the Kolmogorov–Smirnov distance and a maximum likelihood estimator for visualization. These measures are employed between the underlying distribution states to minimize the Markov distribution. The proposed model, named WebClickviz, is implemented in Hadoop Apache Flume, a highly advanced tool. The clickstream data visualization accuracy can be improved when Apache Flume tools are used. The performance evaluation is carried out on clickstream data from a specific website and shows that the proposed visualization model performs better than existing models such as VizClick.
- Published
- 2017
- Full Text
- View/download PDF
11. Regression Estimator for the Tail Index
- Author
-
Németh, László and Zempléni, András
- Published
- 2020
- Full Text
- View/download PDF
12. A Recursive Algorithm For the Single and Product Moments of Order Statistics From the Exponential-geometric Distribution and Some Estimation Methods.
- Author
-
Balakrishnan, N., Zhu, Xiaojun, and Al-Zahrani, Bander
- Subjects
- *
RECURSIVE functions , *ORDER statistics , *EXPONENTIAL functions , *ESTIMATION theory , *GEOMETRIC distribution , *MONTE Carlo method - Abstract
The exponential-geometric distribution has been proposed as a simple and useful reliability model for analyzing lifetime data. For this distribution, some recurrence relations are established for the single moments and product moments of order statistics. Using these recurrence relations, the means, variances and covariances of all order statistics can be computed for all sample sizes in a simple and efficient recursive manner. Next, we discuss the maximum likelihood estimation of the model parameters as well as some simple modified methods of estimation. Then, a Monte Carlo simulation study is carried out to evaluate the performance of all these methods of estimation in terms of their bias and mean square error as well as the percentage of times the estimates converged. Two illustrative examples are finally presented to illustrate all the inferential results developed here. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
13. Distribution Analyzer, a methodology for identifying and clustering outlier conditions from single-cell distributions, and its application to a Nanog reporter RNAi screen.
- Author
-
Gingold, Julian A., Coakley, Ed S., Jie Su, Dung-Fang Lee, Zerlina Lau, Hongwei Zhou, Felsenfeld, Dan P., Schaniel, Christoph, and Lemischka, Ihor R.
- Subjects
- *
SINGLE cell proteins , *CELL morphology , *SMALL interfering RNA , *CHEMINFORMATICS , *RNA interference , *PARTITION coefficient (Chemistry) - Abstract
Background: Chemical or small interfering (si) RNA screens measure the effects of many independent experimental conditions, each applied to a population of cells (e.g., all of the cells in a well). High-content screens permit a readout (e.g., fluorescence, luminescence, cell morphology) from each cell in the population. Most analysis approaches compare the average effect on each population, precluding identification of outliers that affect the distribution of the reporter in the population but not its average. Other approaches only measure changes to the distribution with a single parameter, precluding accurate distinction and clustering of interesting outlier distributions. Results: We describe a methodology to identify outlier conditions by considering the cell-level measurements from each condition as a sample of an underlying distribution. With appropriate selection of a distance metric, all effects can be embedded in a fixed-dimensionality Euclidean basis, facilitating identification and clustering of biologically interesting outliers. We demonstrate that measurement of distances with the Hellinger distance metric offers substantial computational efficiencies over alternative metrics. We validate this methodology using an RNA interference (RNAi) screen in mouse embryonic stem cells (ESC) with a Nanog reporter. The methodology clusters effects of multiple control siRNAs into their true identities better than conventional approaches describing the median cell fluorescence or the commonly used Kolmogorov-Smirnov distance between the observed fluorescence distribution and the null distribution. It identifies outlier genes with effects on the reporter distribution that would have been missed by other methods. Among them, siRNA targeting Chek1 leads to a wider Nanog reporter fluorescence distribution. Similarly, siRNA targeting Med14 or Med27 leads to a narrower Nanog reporter fluorescence distribution. We confirm the roles of these three genes in regulating pluripotency by mRNA expression and alkaline phosphatase staining using independent short hairpin (sh) RNAs. Conclusions: Using our methodology, we describe each experimental condition by a probability distribution. Measuring distances between probability distributions permits a multivariate rather than univariate readout. Clustering points derived from these distances allows us to obtain greater biological insight than methods based solely on single parameters. We find several outliers from a mouse ESC RNAi screen that we confirm to be pluripotency regulators. Many of these outliers would have been missed by other analysis methods. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
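To make the distance comparison above concrete, the sketch below computes both a histogram-based Hellinger distance and the two-sample KS distance between two synthetic per-cell fluorescence distributions; the gamma-distributed readouts, bin grid, and sample sizes are assumptions of the example rather than the screen's actual data.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)

def hellinger(x, y, bins=80, support=(0.0, 25.0)):
    """Hellinger distance between two samples, estimated from shared-bin histograms."""
    p, _ = np.histogram(x, bins=bins, range=support)
    q, _ = np.histogram(y, bins=bins, range=support)
    p = p / p.sum()                      # bin probabilities
    q = q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Stand-ins for per-cell reporter fluorescence under two conditions:
control = rng.gamma(4.0, 1.0, 5000)
knockdown = rng.gamma(4.0, 1.3, 5000)    # broader, shifted reporter distribution

print("Hellinger:", round(hellinger(control, knockdown), 3),
      "| KS:", round(ks_2samp(control, knockdown).statistic, 3))
```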
14. Distribution of the sum of a complex Gaussian and the product of two complex Gaussians.
- Author
-
Betlehem, T. and Coulson, A.J.
- Abstract
The probability density function of the sum of a complex Gaussian and the product of two complex Gaussians is derived. This distribution occurs in wireless communications where Gaussian signals are transmitted over Rayleigh channels. The result is validated using the Kolmogorov-Smirnov distance, and the accuracy of several approximations is compared, including fitted gamma and Nakagami distributions. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
- Full Text
- View/download PDF
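A quick Monte Carlo sketch of the validation idea in the entry above: simulate the envelope of a complex Gaussian plus a product of two complex Gaussians, fit candidate approximations, and score them with the one-sample KS distance. The unit variances, the envelope (rather than power) view, and the gamma/Nakagami candidates fitted with `scipy.stats` are assumptions of this example, not the paper's exact derivation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def cgauss(n, var=1.0):
    """Circularly symmetric complex Gaussian samples with the given variance."""
    return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Z = X + Y1 * Y2 with X, Y1, Y2 independent complex Gaussians; |Z| is the envelope.
n = 50_000
envelope = np.abs(cgauss(n) + cgauss(n) * cgauss(n))

# Fit simple approximations to the envelope and compare them via the KS distance.
for name, dist in [("Nakagami", stats.nakagami), ("gamma", stats.gamma)]:
    params = dist.fit(envelope, floc=0)
    d = stats.kstest(envelope, dist(*params).cdf).statistic
    print(f"{name:9s} approximation, KS distance: {d:.4f}")
```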
15. Did the Michigan Supreme Court Appreciate the Implications of Adopting the "Disparity of the Risk" Measure of Minority Representation in Jury Pools in People v. Bryant?
- Author
-
Gastwirth, Joseph L., Xu, Wenjing, and Pan, Qing
- Subjects
- *
AFRICAN American jurors , *JURY , *COURTS , *MINORITIES , *KOLMOGOROV complexity - Abstract
Due to an error in the computer program used by Kent County, Michigan, for about 15 months, the African-American proportion of jury pools was about one-half their proportion of the eligible population. Subsequently, a number of defendants appealed their convictions because the jury pool did not fairly represent the community. Although a Federal Court of Appeals found that the statistical evidence helped the defendant establish that African-Americans were under-represented; in a different case, People v. Bryant, the Michigan Supreme Court found the statistics insufficient. The different conclusions arose because the Michigan Court adopted a new criterion that the "disparity of the risk" measure should be at least 0.50. The statistical properties of the measure will be described and it will be seen that it is equivalent to requiring that the Kolmogorov-Smirnov distance between the two relevant distributions is at least 0.50. If one is comparing two normal distributions, with different means and the same variance, the requirement implies that the effect size would need to be at least 1.35 before one could conclude that they differed. Since effect sizes of 0.8 are considered "large," the criteria used by the Michigan Court are far too stringent and if adopted nation-wide would allow individuals to have trials in front of juries where minorities are substantially under-represented. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
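The 1.35 figure quoted in the entry above can be reproduced directly: for two normal distributions with common variance and standardized mean gap $d$, the Kolmogorov-Smirnov distance is attained midway between the means and equals $2\Phi(d/2) - 1$, so a distance of 0.50 corresponds to $d = 2\Phi^{-1}(0.75) \approx 1.35$. A one-line check with scipy:

```python
from scipy.stats import norm

# KS distance between N(mu1, s^2) and N(mu2, s^2) with d = |mu1 - mu2| / s is
# 2 * Phi(d / 2) - 1; solving 2 * Phi(d / 2) - 1 = 0.5 for d:
d = 2 * norm.ppf(0.75)
print(f"effect size giving a KS distance of 0.5: {d:.3f}")  # ~1.349
```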
16. Law recognitions by information criteria for the statistical modeling of small scale fading of the radio mobile channel
- Author
-
Alata, O., Olivier, C., and Pousset, Y.
- Subjects
- *
MOBILE radio stations , *STATISTICAL models , *RADIO transmitter fading , *DIGITAL communications , *HISTOGRAMS , *ELECTROMAGNETISM - Abstract
Information criteria based methods are proposed to select the best probability law to model the distribution of samples resulting from the small-scale fading of the propagation channel. The first is based on the estimation of an optimal histogram approximating the probability density function. The second one employs the direct use of an information criterion. Indeed, the modelling of the radio mobile channel small-scale fading is crucial in digital communications. It is the reason why several propagation models have been implemented to take into account the electromagnetic phenomena inherent in radio wave channels. Amongst these models is the family of statistical distributions which is rapid in computation time. In the context of this study our concern is to find, among different probability laws, the one which best coincides with radio channel behaviour. The experimental results show that the proposed methods are better than those methods already employed, such as the classical Kolmogorov–Smirnov test using cumulative distribution functions, or methods using different estimators of probability density functions, like the kernel density estimator and the Gaussian mixture model. Results are provided in supervised and unsupervised contexts. [Copyright Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
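As a rough sketch of the model-recognition task above, the snippet below fits a few candidate fading laws to simulated envelope samples with `scipy.stats`, scores each with an AIC-style information criterion (and, for comparison, the KS distance), and keeps the best-scoring law. The Rice ground truth, the candidate set, and fixing the location at zero are assumptions of the example; the paper's histogram-based criterion is not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Assumed small-scale-fading envelope samples (ground truth here: a Rice law).
env = stats.rice.rvs(1.4, scale=0.7, size=2000, random_state=rng)

candidates = {"Rayleigh": stats.rayleigh, "Rice": stats.rice, "Nakagami": stats.nakagami}
for name, dist in candidates.items():
    params = dist.fit(env, floc=0)
    loglik = np.sum(dist.logpdf(env, *params))
    n_free = len(params) - 1                      # loc was fixed at 0
    aic = 2 * n_free - 2 * loglik                 # information-criterion score
    ks = stats.kstest(env, dist(*params).cdf).statistic
    print(f"{name:9s} AIC = {aic:8.1f}   KS distance = {ks:.4f}")
```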
17. Discriminating between the Weibull and log-normal distributions for Type-II censored data.
- Author
-
Dey, ArabinKumar and Kundu, Debasis
- Subjects
- *
WEIBULL distribution , *GAUSSIAN distribution , *ASYMPTOTIC distribution , *MAXIMUM likelihood statistics , *PROBABILITY theory , *SIMULATION methods & models , *MATHEMATICAL models - Abstract
Log-normal and Weibull distributions are the two most popular distributions for analysing lifetime data. In this paper, we consider the problem of discriminating between the two distribution functions. It is assumed that the data are coming either from log-normal or Weibull distributions and that they are Type-II censored. We use the difference of the maximized log-likelihood functions, in discriminating between the two distribution functions. We obtain the asymptotic distribution of the discrimination statistic. It is used to determine the probability of correct selection in this discrimination process. We perform some simulation studies to observe how the asymptotic results work for different sample sizes and for different censoring proportions. It is observed that the asymptotic results work quite well even for small sizes if the censoring proportions are not very low. We further suggest a modified discrimination procedure. Two real data sets are analysed for illustrative purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
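For a complete (uncensored) sample, the ratio-of-maximized-likelihoods statistic used in the entry above reduces to the difference of the two maximized log-likelihoods; the Type-II censoring handled in the paper is not treated here. A sketch under assumed data, with the location fixed at zero for both fits:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

# Assumed lifetime data (true model here: log-normal).
x = stats.lognorm.rvs(0.5, scale=1.0, size=150, random_state=rng)

wb = stats.weibull_min.fit(x, floc=0)
ln = stats.lognorm.fit(x, floc=0)
T = (np.sum(stats.weibull_min.logpdf(x, *wb))
     - np.sum(stats.lognorm.logpdf(x, *ln)))    # difference of maximised log-likelihoods
print(f"T = {T:.2f} ->", "choose Weibull" if T > 0 else "choose log-normal")
```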
18. Discriminating Between the Log-Normal and Log-Logistic Distributions.
- Author
-
Dey, ArabinKumar and Kundu, Debasis
- Subjects
- *
LOGISTIC distribution (Probability) , *DISTRIBUTION (Probability theory) , *ASYMPTOTIC distribution , *ASYMPTOTIC expansions , *PROBABILITY theory - Abstract
Log-normal and log-logistic distributions are often used to analyze lifetime data. For certain ranges of the parameters, the shape of the probability density functions or the hazard functions can be very similar in nature. It might be very difficult to discriminate between the two distribution functions. In this article, we consider the discrimination procedure between the two distribution functions. We use the ratio of maximized likelihood for discrimination purposes. The asymptotic properties of the proposed criterion are investigated. It is observed that the asymptotic distributions are independent of the unknown parameters. The asymptotic distributions are used to determine the minimum sample size needed to discriminate between these two distribution functions for a user specified probability of correct selection. We perform some simulation experiments to see how the asymptotic results work for small sizes. For illustrative purpose, two data sets are analyzed. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
19. Discriminating Among the Log-Normal, Weibull, and Generalized Exponential Distributions.
- Author
-
Dey, Arabin Kumar and Kundu, Debasis
- Subjects
- *
DISTRIBUTION (Probability theory) , *STATISTICAL sampling , *LITERATURE , *DATA distribution , *ASYMPTOTIC distribution , *PROBABILITY theory - Abstract
We consider model selection and discrimination among three important lifetime distributions. These three distributions have been used quite effectively to analyze lifetime data. We study the probability of correct selection using the maximized likelihood method, as it has been used in the literature. We further compute the asymptotic probability of correct selection, and compare the theoretical, and simulation results for different sample sizes, and for different model parameters. The results have been extended for Type-I censored data also. The theoretical, and simulation results match quite well. Two real data sets have been analyzed for illustrative purposes. We also suggest a method to determine the minimum sample size required to discriminate among the three distributions for a given probability of correct selection, and a user specified protection level. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
20. Normal theory likelihood ratio statistic for mean and covariance structure analysis under alternative hypotheses
- Author
-
Yuan, Ke-Hai, Hayashi, Kentaro, and Bentler, Peter M.
- Subjects
- *
REASONING , *STATISTICAL sampling , *MATRICES (Mathematics) , *HYPOTHESIS - Abstract
The normal distribution based likelihood ratio (LR) statistic is widely used in structural equation modeling. Under a sequence of local alternative hypotheses, this statistic has been shown to asymptotically follow a noncentral chi-square distribution. In practice, the population mean vector and covariance matrix as well as the model and sample size are always fixed. It is hard to justify the validity of the noncentral chi-square distribution for the resulting LR statistic even when data are normally distributed and sample size is large. By extending results in the literature, this paper develops normal distributions to describe the behavior of the LR statistic for mean and covariance structure analysis. A sequence of local alternative hypotheses is not necessary for the proposed distributions to be asymptotically valid. When the effect size is medium and above or when the model is not trivially misspecified, empirical results indicate that a refined normal distribution describes the behavior of the LR statistic better than the commonly used noncentral chi-square distribution, as measured by the Kolmogorov–Smirnov distance. Quantile–quantile plots are also provided to better understand the different distributions. [Copyright Elsevier]
- Published
- 2007
- Full Text
- View/download PDF
21. Quicksort asymptotics
- Author
-
Fill, James Allen and Janson, Svante
- Subjects
- *
ALGORITHMS , *ASYMPTOTIC expansions , *SORTING (Electronic computers) - Abstract
The number of comparisons $X_n$ used by Quicksort to sort an array of $n$ distinct numbers has mean $\mu_n$ of order $n \log n$ and standard deviation of order $n$. Using different methods, Régnier and Rösler each showed that the normalized variate $Y_n := (X_n - \mu_n)/n$ converges in distribution, say to $Y$; the distribution of $Y$ can be characterized as the unique fixed point with zero mean of a certain distributional transformation. We provide the first rates of convergence for the distribution of $Y_n$ to that of $Y$, using various metrics. In particular, we establish the bound $2n^{-1/2}$ in the $d_2$-metric, and the rate $O(n^{\epsilon - 1/2})$ for Kolmogorov–Smirnov distance, for any positive $\epsilon$. [Copyright Elsevier]
- Published
- 2002
- Full Text
- View/download PDF
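The convergence statement above can be probed with a small simulation: generate the comparison count of randomised Quicksort through the usual recursion on the pivot's rank, normalise by $n$, and compare the empirical laws at different $n$ with the two-sample KS distance. The sample sizes, the use of the empirical mean in place of $\mu_n$, and the choice of comparing $n$ against $4n$ are assumptions of this sketch, and the finite number of replications limits how small the reported distances can appear.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(8)

def quicksort_comparisons(n):
    """Comparison count of randomised Quicksort on n distinct keys (pivot-rank recursion)."""
    total, stack = 0, [n]
    while stack:
        m = stack.pop()
        if m <= 1:
            continue
        total += m - 1              # pivot is compared with the other m - 1 keys
        r = int(rng.integers(m))    # rank of the pivot, uniform on 0 .. m-1
        stack.extend((r, m - 1 - r))
    return total

def normalised_sample(n, reps=1500):
    x = np.array([quicksort_comparisons(n) for _ in range(reps)])
    return (x - x.mean()) / n       # empirical stand-in for (X_n - mu_n) / n

# The distance between the normalised laws at n and 4n tends to shrink as n grows,
# consistent with convergence of Y_n (sampling noise of order reps**-0.5 remains).
for n in (64, 256):
    d = ks_2samp(normalised_sample(n), normalised_sample(4 * n)).statistic
    print(f"n = {n:4d}: KS distance between laws at n and 4n ≈ {d:.3f}")
```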
22. Quantitative stability in stochastic programming.
- Author
-
Shapiro, Alexander
- Abstract
In this paper we study stability of optimal solutions of stochastic programming problems with fixed recourse. An upper bound for the rate of convergence is given in terms of the objective functions of the associated deterministic problems. As an example it is shown how it can be applied to derivation of the Law of Iterated Logarithm for the optimal solutions. It is also shown that in the case of simple recourse this upper bound implies upper Lipschitz continuity of the optimal solutions with respect to the Kolmogorov-Smirnov distance between the corresponding cumulative probability distribution functions. [ABSTRACT FROM AUTHOR]
- Published
- 1994
- Full Text
- View/download PDF
23. Instantaneous Transfer Entropy for the Study of Cardiovascular and Cardio-Respiratory Nonstationary Dynamics
- Author
-
Michele Orini, Riccardo Barbieri, Luca Citi, Gaetano Valenza, Luca Faes, Valenza, Gaetano, Faes, Luca, Citi, Luca, Orini, Michele, and Barbieri, Riccardo
- Subjects
Adult ,Male ,Information transfer ,History ,Heartbeat ,Databases, Factual ,Physiology ,Entropy ,0206 medical engineering ,Complex system ,Biomedical Engineering ,Heart Rate Variability ,Probability density function ,02 engineering and technology ,01 natural sciences ,Point process ,Statistics, Nonparametric ,Electrocardiography ,Young Adult ,0103 physical sciences ,Entropy (information theory) ,Humans ,Statistical physics ,Transfer Entropy ,010306 general physics ,Biomedical measurement ,Mathematics ,business.industry ,Hemodynamics ,Models, Cardiovascular ,Heart beat ,Signal Processing, Computer-Assisted ,Complexity ,Baroreflex ,020601 biomedical engineering ,Kolmogorov-Smirnov Distance ,Respiratory Sinus Arrhythmia ,Heart rate variability ,Point Process ,Discrete time and continuous time ,Point Proce ,Settore ING-INF/06 - Bioingegneria Elettronica E Informatica ,Transfer entropy ,Female ,Artificial intelligence ,business - Abstract
Objective: Measures of transfer entropy (TE) quantify the direction and strength of coupling between two complex systems. Standard approaches assume stationarity of the observations, and therefore are unable to track time-varying changes in nonlinear information transfer with high temporal resolution. In this study, we aim to define and validate novel instantaneous measures of TE to provide an improved assessment of complex nonstationary cardiorespiratory interactions. Methods: We here propose a novel instantaneous point-process TE (ipTE) and validate its assessment as applied to cardiovascular and cardiorespiratory dynamics. In particular, heartbeat and respiratory dynamics are characterized through discrete time series, and modeled with probability density functions predicting the time of the next physiological event as a function of the past history. Likewise, nonstationary interactions between heartbeat and blood pressure dynamics are characterized as well. Furthermore, we propose a new measure of information transfer, the instantaneous point-process information transfer (ipInfTr), which is directly derived from point-process-based definitions of the Kolmogorov–Smirnov distance. Results and Conclusion: Analysis on synthetic data, as well as on experimental data gathered from healthy subjects undergoing postural changes, confirms that both the ipTE and ipInfTr measures are able to dynamically track changes in physiological systems coupling. Significance: This novel approach opens new avenues in the study of hidden, transient, nonstationary physiological states involving multivariate autonomic dynamics in cardiovascular health and disease. The proposed method can also be tailored for the study of complex multisystem physiology (e.g., brain–heart or, more generally, brain–body interactions).
- Published
- 2018
24. Discriminating between Weibull and generalized exponential distributions
- Author
-
Gupta, Rameshwar D. and Kundu, Debasis
- Subjects
- *
THEORY of distributions (Functional analysis) , *LOGARITHMIC functions - Abstract
Recently the two-parameter generalized exponential (GE) distribution was introduced by the authors. It is observed that a GE distribution can be considered for situations where a skewed distribution for a non-negative random variable is needed. The ratio of the maximized likelihoods (RML) is used in discriminating between Weibull and GE distributions. Asymptotic distributions of the logarithm of the RML under null hypotheses are obtained and they are used to determine the minimum sample size required in discriminating between two overlapping families of distributions for a user specified probability of correct selection and tolerance limit. [Copyright Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
25. Website Clickstream Data Visualization Using Improved Markov Chain Modelling In Apache Flume
- Author
-
Amjad Jumaah Frhan
- Subjects
Apache Flume ,Markov chain ,Heuristic (computer science) ,Process (engineering) ,business.industry ,Computer science ,05 social sciences ,computer.software_genre ,Online advertising ,Visualization ,Clickstream data ,Kolmogorov-Smirnov distance ,lcsh:TA1-2040 ,WebClickviz ,Product management ,Aggregate data ,Data mining ,0509 other social sciences ,050904 information & library sciences ,business ,VizClick ,lcsh:Engineering (General). Civil engineering (General) ,computer ,Clickstream - Abstract
Clickstream data analysis is the process of collecting, analysing and reporting aggregate data about the web pages a visitor clicks. Visualizing clickstream data has gained significant importance in many applications such as web marketing, customer prediction and product management. Most existing works employ different visualization tools along with techniques like Markov chain modelling. However, the accuracy of these methods can be improved once their shortcomings are resolved. Markov chain modelling suffers from occlusion and is unable to provide a clear visual display of the data. These issues are resolved by improving the Markov chain model with a heuristic method based on the Kolmogorov–Smirnov distance and a maximum likelihood estimator for visualization. These measures are employed between the underlying distribution states to minimize the Markov distribution. The proposed model, named WebClickviz, is implemented in Hadoop Apache Flume, a highly advanced tool. The clickstream data visualization accuracy can be improved when Apache Flume tools are used. The performance evaluation is carried out on clickstream data from a specific website and shows that the proposed visualization model performs better than existing models such as VizClick.
- Published
- 2017
26. Non Parametric Decision Trees by Bayesian Approach
- Author
-
Celeux, G., Lechevallier, Y., Caussinus, H., editor, Ettinger, P., editor, and Tomassone, R., editor
- Published
- 1982
- Full Text
- View/download PDF
27. Normal theory likelihood ratio statistic for mean and covariance structure analysis under alternative hypotheses
- Author
-
Ke-Hai Yuan, Peter M. Bentler, and Kentaro Hayashi
- Subjects
Statistics and Probability ,Statistics::Theory ,Numerical Analysis ,Noncentral chi-squared distribution ,Noncentral chi-square distribution ,Noncentral F-distribution ,Sampling distribution ,Noncentral t-distribution ,Kolmogorov–Smirnov distance ,Ancillary statistic ,Statistics ,Test statistic ,Statistics::Methodology ,Quantile–quantile plot ,Statistics, Probability and Uncertainty ,Normal distribution ,Structural model ,Statistic ,Sufficient statistic ,Mathematics - Abstract
The normal distribution based likelihood ratio (LR) statistic is widely used in structural equation modeling. Under a sequence of local alternative hypotheses, this statistic has been shown to asymptotically follow a noncentral chi-square distribution. In practice, the population mean vector and covariance matrix as well as the model and sample size are always fixed. It is hard to justify the validity of the noncentral chi-square distribution for the resulting LR statistic even when data are normally distributed and sample size is large. By extending results in the literature, this paper develops normal distributions to describe the behavior of the LR statistic for mean and covariance structure analysis. A sequence of local alternative hypotheses is not necessary for the proposed distributions to be asymptotically valid. When the effect size is medium and above or when the model is not trivially misspecified, empirical results indicate that a refined normal distribution describes the behavior of the LR statistic better than the commonly used noncentral chi-square distribution, as measured by the Kolmogorov–Smirnov distance. Quantile–quantile plots are also provided to better understand the different distributions.
- Published
- 2007
- Full Text
- View/download PDF
28. Distribution Analyzer, a methodology for identifying and clustering outlier conditions from single-cell distributions, and its application to a Nanog reporter RNAi screen
- Author
-
Dung Fang Lee, Hongwei Zhou, Zerlina Lau, Christoph Schaniel, Ihor R. Lemischka, Ed S. Coakley, Dan P. Felsenfeld, Jie Su, and Julian A. Gingold
- Subjects
Homeobox protein NANOG ,Hellinger distance ,Population ,High-content screening methodology ,Tretinoin ,Computational biology ,Fluorescence distribution ,Biology ,Cell morphology ,Nanog RNAi screen ,Biochemistry ,Cell Line ,Mice ,Structural Biology ,Genes, Reporter ,Null distribution ,Animals ,Cluster Analysis ,RNA, Small Interfering ,Cluster analysis ,education ,Promoter Regions, Genetic ,Molecular Biology ,Genetics ,Homeodomain Proteins ,education.field_of_study ,Genome ,Mediator Complex ,Applied Mathematics ,Computational Biology ,Cell Differentiation ,Mouse Embryonic Stem Cells ,Nanog Homeobox Protein ,Genome-scale screen analysis ,Computer Science Applications ,Kolmogorov-Smirnov distance ,Outlier ,Probability distribution ,RNA Interference ,Research Article - Abstract
Background: Chemical or small interfering (si) RNA screens measure the effects of many independent experimental conditions, each applied to a population of cells (e.g., all of the cells in a well). High-content screens permit a readout (e.g., fluorescence, luminescence, cell morphology) from each cell in the population. Most analysis approaches compare the average effect on each population, precluding identification of outliers that affect the distribution of the reporter in the population but not its average. Other approaches only measure changes to the distribution with a single parameter, precluding accurate distinction and clustering of interesting outlier distributions. Results: We describe a methodology to identify outlier conditions by considering the cell-level measurements from each condition as a sample of an underlying distribution. With appropriate selection of a distance metric, all effects can be embedded in a fixed-dimensionality Euclidean basis, facilitating identification and clustering of biologically interesting outliers. We demonstrate that measurement of distances with the Hellinger distance metric offers substantial computational efficiencies over alternative metrics. We validate this methodology using an RNA interference (RNAi) screen in mouse embryonic stem cells (ESC) with a Nanog reporter. The methodology clusters effects of multiple control siRNAs into their true identities better than conventional approaches describing the median cell fluorescence or the commonly used Kolmogorov-Smirnov distance between the observed fluorescence distribution and the null distribution. It identifies outlier genes with effects on the reporter distribution that would have been missed by other methods. Among them, siRNA targeting Chek1 leads to a wider Nanog reporter fluorescence distribution. Similarly, siRNA targeting Med14 or Med27 leads to a narrower Nanog reporter fluorescence distribution. We confirm the roles of these three genes in regulating pluripotency by mRNA expression and alkaline phosphatase staining using independent short hairpin (sh) RNAs. Conclusions: Using our methodology, we describe each experimental condition by a probability distribution. Measuring distances between probability distributions permits a multivariate rather than univariate readout. Clustering points derived from these distances allows us to obtain greater biological insight than methods based solely on single parameters. We find several outliers from a mouse ESC RNAi screen that we confirm to be pluripotency regulators. Many of these outliers would have been missed by other analysis methods. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0636-7) contains supplementary material, which is available to authorized users.
- Published
- 2015
29. Increased Nonstationarity of Neonatal Heart Rate Before the Clinical Diagnosis of Sepsis
- Author
-
Cao, Hanqing, Lake, Douglas E., Griffin, M. Pamela, and Moorman, J. Randall
- Published
- 2004
- Full Text
- View/download PDF
30. Law recognitions by information criteria for the statistical modeling of small scale fading of the radio mobile channel
- Author
-
Christian Olivier, Olivier Alata, Yannis Pousset, Laboratoire Hubert Curien [Saint Etienne] (LHC), Institut d'Optique Graduate School (IOGS)-Université Jean Monnet [Saint-Étienne] (UJM)-Centre National de la Recherche Scientifique (CNRS), SIC (XLIM-SIC), Université de Poitiers-XLIM (XLIM), and Université de Limoges (UNILIM)-Centre National de la Recherche Scientifique (CNRS)-Université de Limoges (UNILIM)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Kernel density estimation ,Probability density function ,02 engineering and technology ,Model selection ,01 natural sciences ,010104 statistics & probability ,[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing ,Histogram ,0202 electrical engineering, electronic engineering, information engineering ,Fading ,0101 mathematics ,Electrical and Electronic Engineering ,Radio mobile propagation channel ,Mathematics ,Cumulative distribution function ,Probability density function approximation (histogram ,020206 networking & telecommunications ,Statistical model ,Mixture model ,Gaussian mixture) ,Kernel method ,Kolmogorov-Smirnov distance ,Control and Systems Engineering ,Law ,Signal Processing ,Probability distribution ,Information criteria ,Computer Vision and Pattern Recognition ,[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing ,Software - Abstract
Information criteria based methods are proposed to select the best probability law to model the distribution of samples resulting from the small-scale fading of the propagation channel. The first is based on the estimation of an optimal histogram approximating the probability density function. The second one employs the direct use of an information criterion. Indeed, the modelling of the radio mobile channel small-scale fading is crucial in digital communications. It is the reason why several propagation models have been implemented to take into account the electromagnetic phenomena inherent in radio wave channels. Amongst these models is the family of statistical distributions which is rapid in computation time. In the context of this study our concern is to find, among different probability laws, the one which best coincides with radio channel behaviour. The experimental results show that the proposed methods are better than those methods already employed, such as the classical Kolmogorov-Smirnov test using cumulative distribution functions, or methods using different estimators of probability density functions, like the kernel density estimator and the Gaussian mixture model. Results are provided in supervised and unsupervised contexts. Highlights: • Accurate results in supervised and unsupervised law recognition from samples. • Histogram-based method using Information criterion in supervised case. • Information criterion based method in unsupervised case. • Comparison with other non-parametric and parametric methods. • Samples with ground truth and simulated from LOS and NLOS communications.
- Published
- 2013
31. Sharp Rates of Convergence of Minimum Penalized Distance Estimators
- Author
-
Reiss, R.-D.
- Published
- 1986