118 results on '"Fokianos, K."'
Search Results
2. Testing independence for multivariate time series via the auto-distance correlation matrix
- Author
-
Fokianos, K. and Pitsillou, M.
- Published
- 2018
3. Consistent Testing for Pairwise Dependence in Time Series
- Author
-
Fokianos, K. and Pitsillou, M.
- Published
- 2017
4. Increased radiation exposure by granite used as natural tiling rock in Cypriot houses
- Author
-
Fokianos, K., Sarrou, I., and Pashalidis, I.
- Published
- 2007
5. Binary time series models driven by a latent process
- Author
-
Fokianos, K. and Moysiadis, T.
- Abstract
The problem of ergodicity, stationarity and maximum likelihood estimation is studied for binary time series models that include a latent process. General models are considered, covered by different specifications of a link function. Maximum likelihood estimation is discussed and it is shown that the MLE satisfies standard asymptotic theory. The logistic and probit models, routinely employed for the analysis of binary time series data, are of special importance in this study. The results are applied to simulated and real data.
- Published
- 2017
6. The effect of resveratrol on hypertension: A clinical trial
- Author
-
Theodotou, M., Fokianos, K., Mouzouridou, A., Konstantinou, C., Aristotelous, A., Prodromou, D., and Chrysikou, A.
- Abstract
The aim of this clinical trial was to investigate the effects of Evelor, a micronized formulation of resveratrol (RESV; 3,5,4'-trihydroxy-trans-stilbene), in patients with primary hypertension. RESV is a stilbenoid and phytoalexin produced by several plants in response to injury or attack by pathogens, such as bacteria and fungi. Patients included in the clinical trial were split into the following two groups, based on the severity of their disease: Group A (n=46), stage I hypertension [systolic blood pressure (SBP), 140‑159 mmHg; diastolic blood pressure (DBP), 90‑99 mmHg] and Group B (n=51), stage II hypertension (SBP, 160‑179 mmHg; DBP, 100‑109 mmHg). Each group was divided into two subgroups: A1 and B1, patients treated with standard antihypertensive therapy (A1, 10 mg Dapril; B1, 20 mg Dapril), and A2 and B2, patients treated with antihypertensive therapy (Dapril) plus Evelor. The present study aimed to determine the effects of Evelor, in addition to the standard hypertension treatment, and its effect on the hepatic enzymes serum glutamate-pyruvate transaminase (SGPT) and gamma‑glutamyl transferase (gamma‑GT). Following the trial, which lasted two years (October 2010 to October 2012), the mean blood pressure of both groups lay within the normal range, indicating that blood pressure was efficiently controlled. The results of the present study demonstrate that the addition of RESV to standard antihypertensive therapy is sufficient to reduce blood pressure to normal levels, without the need for additional antihypertensive drugs. In addition, statistical analysis of the results identified a significant reduction in plasma concentration levels of SGPT (P<0.001) and gamma‑GT (P<0.001) with the addition of RESV, indicating that RESV prevents liver damage.
- Published
- 2017
7. Editorial for the special issue in honour of Paul Doukhan
- Author
-
Bardet, J.-M., Fokianos, K., and Neumann, M.H.
- Published
- 2017
8. Asymptotic properties of quasi-maximum likelihood estimators in observation-driven time series models
- Author
-
Douc, R., Fokianos, K., and Moulines, E.
- Abstract
We study a general class of quasi-maximum likelihood estimators for observation-driven time series models. Our main focus is on models related to the exponential family of distributions like Poisson based models for count time series or duration models. However, the proposed approach is more general and covers a variety of time series models including the ordinary GARCH model which has been studied extensively in the literature. We provide general conditions under which quasi-maximum likelihood estimators can be analyzed for this class of time series models and we prove that these estimators are consistent and asymptotically normally distributed regardless of the true data generating process. We illustrate our results using classical examples of quasi-maximum likelihood estimation including standard GARCH models, duration models, Poisson type autoregressions and ARMA models with GARCH errors. Our contribution unifies the existing theory and gives conditions for proving consistency and asymptotic normality in a variety of situations.
- Published
- 2017
9. On Locally Dyadic Stationary Processes
- Author
-
Moysiadis, T. and Fokianos, K.
- Abstract
We introduce the concept of local dyadic stationarity, to account for nonstationary time series, within the framework of Walsh-Fourier analysis. We define and study time-varying, dyadic, autoregressive, moving average (tvDARMA) models. It is proven that the general tvDARMA process can be approximated locally by either a time-varying dyadic moving average or a time-varying dyadic autoregressive process.
- Published
- 2017
10. tscount: An R package for analysis of count time series following generalized linear models
- Author
-
Liboschik, T., Fokianos, K., and Fried, R.
- Abstract
The R package tscount provides likelihood-based estimation methods for analysis and modeling of count time series following generalized linear models. This is a flexible class of models which can describe serial correlation in a parsimonious way. The conditional mean of the process is linked to its past values, to past observations and to potential covariate effects. The package allows for models with the identity and with the logarithmic link function. The conditional distribution can be Poisson or negative binomial. An important special case of this class is the so-called INGARCH model and its log-linear extension. The package includes methods for model fitting and assessment, prediction and intervention analysis. This paper summarizes the theoretical background of these methods. It gives details on the implementation of the package and provides simulation results for models which have not been studied theoretically before. The usage of the package is illustrated by two data examples. Additionally, we provide a review of R packages which can be used for count time series analysis. This includes a detailed comparison of tscount to those packages.
- Published
- 2017
11. Binary and Count Time Series Analysis
- Author
-
Fokianos, K.
- Subjects
SOCIAL SCIENCES::Informatics [BSU Electronic Library]
- Abstract
PLENARY LECTURES
- Published
- 2016
12. Mallows’ quasi-likelihood estimation for log-linear Poisson autoregressions
- Author
-
Kitromilidou, S. and Fokianos, K.
- Abstract
We consider the problems of robust estimation and testing for a log-linear model with feedback for the analysis of count time series. We study inference for contaminated data with transient shifts, level shifts and additive outliers. It turns out that the case of additive outliers deserves special attention. We propose a robust method for estimating the regression coefficients in the presence of interventions. The resulting robust estimators are asymptotically normally distributed under some regularity conditions. A robust score type test statistic is also examined. The methodology is applied to real and simulated data.
- Published
- 2016
13. Modelling interventions in INGARCH processes
- Author
-
Liboschik, T., Kerschke, P., Fokianos, K., and Fried, R.
- Abstract
We study different approaches for modelling intervention effects in time series of counts, focusing on the so-called integer-valued GARCH models. A previous study treated a model where an intervention affects the non-observable underlying mean process at the time point of its occurrence and additionally the whole process thereafter via its dynamics. As an alternative, we consider a model where an intervention directly affects the observation at its occurrence, but not the underlying mean, and then also enters the dynamics of the process. While the former definition describes an internal change of the system, the latter can be understood as an external effect on the observations due to e.g. immigration. For our alternative model we develop conditional likelihood estimation and, based on this, tests and detection procedures for intervention effects. Both models are compared analytically and using simulated and real data examples. We study the effect of model misspecification and computational issues.
- Published
- 2016
14. dCovTS: Distance covariance/correlation for time series
- Author
-
Pitsillou, M. and Fokianos, K.
- Abstract
The distance covariance function is a new measure of dependence between random vectors. We drop the assumption of iid data to introduce distance covariance for time series. The R package dCovTS provides functions that compute and plot distance covariance and correlation functions for both univariate and multivariate time series. Additionally it includes functions for testing serial independence based on distance covariance. This paper describes the theoretical background of distance covariance methodology in time series and discusses in detail the implementation of these methods with the R package dCovTS.
- Published
- 2016
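The distance covariance idea in the abstract above can be illustrated with a minimal sketch. It computes the plain empirical (V-statistic) squared distance covariance between a univariate series and its lagged copy, using double-centred distance matrices. The function names are hypothetical, this is not the dCovTS API, and the normalisation used for the correlation is the generic one, which is an assumption rather than the package's documented choice.

```python
import math


def _centered_distances(values):
    """Double-centre the pairwise distance matrix |v_r - v_l|."""
    m = len(values)
    dist = [[abs(u - v) for v in values] for u in values]
    row_means = [sum(row) / m for row in dist]  # equals column means (symmetric matrix)
    grand_mean = sum(row_means) / m
    return [
        [dist[r][l] - row_means[r] - row_means[l] + grand_mean for l in range(m)]
        for r in range(m)
    ]


def _dcov2(u, v):
    """Squared sample distance covariance of two equally long samples."""
    m = len(u)
    a = _centered_distances(u)
    b = _centered_distances(v)
    return sum(a[r][l] * b[r][l] for r in range(m) for l in range(m)) / m**2


def auto_distance_covariance(x, lag):
    """Squared sample distance covariance between X_t and X_{t+lag} (hypothetical helper)."""
    if not 0 <= lag < len(x) - 1:
        raise ValueError("lag must satisfy 0 <= lag < len(x) - 1")
    m = len(x) - lag
    return _dcov2(x[:m], x[lag:])


def auto_distance_correlation(x, lag):
    """Distance correlation between X_t and X_{t+lag}; 0.0 if either part is constant."""
    m = len(x) - lag
    u, v = x[:m], x[lag:]
    denom = math.sqrt(_dcov2(u, u) * _dcov2(v, v))
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(auto_distance_covariance(x, lag), 0.0) / denom)
```

A value near zero at every lag is consistent with serial independence. Turning this into a formal test requires a reference distribution, which the sketch does not attempt.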
15. Robust estimation methods for a class of log-linear count time series models
- Author
-
Kitromilidou, S. and Fokianos, K.
- Abstract
We study robust estimation of a log-linear Poisson model for count time series analysis. More specifically, we study robust versions of maximum likelihood estimators (MLEs) under three different forms of interventions: additive outliers (AOs), transient shifts (TSs) and level shifts (LSs). We estimate the parameters using the MLE, the conditionally unbiased bounded-influence estimator and the Mallows quasi-likelihood estimator and compare all three estimators in terms of their mean-square error, bias and mean absolute error. Our empirical results illustrate that under a LS or a TS there are no significant differences among the three estimators and the most interesting results are obtained in the presence of AOs. The results are complemented by a real data example.
- Published
- 2016
16. Likelihood Estimation for the INAR(p) Model by Saddlepoint Approximation
- Author
-
Pedeli, X., Davison, A.C., and Fokianos, K.
- Abstract
Saddlepoint techniques have been used successfully in many applications, owing to the high accuracy with which they can approximate intractable densities and tail probabilities. This article concerns their use for the estimation of high-order integer-valued autoregressive, INAR(p), processes. Conditional least squares estimation and maximum likelihood estimation have been proposed for INAR(p) models, but the first is inefficient for estimating parametric models, and the second becomes difficult to implement as the order p increases. We propose a simple saddlepoint approximation to the log-likelihood that performs well even in the tails of the distribution and with complicated INAR models. We consider Poisson and negative binomial innovations, and show empirically that the estimator that maximises the saddlepoint approximation behaves very similarly to the maximum likelihood estimator in realistic settings. The approach is applied to data on meningococcal disease counts. Supplementary materials for this article are available online.
- Published
- 2015
17. On count time series prediction
- Author
-
Christou, V. and Fokianos, K.
- Abstract
We consider the problem of assessing prediction for count time series based on either the Poisson distribution or the negative binomial distribution. By a suitable parametrization we employ both distributions with the same mean. We regress the mean on its past values and the values of the response and after obtaining consistent estimators of the regression parameters, regardless of the response distribution, we employ different criteria to study the prediction problem. We show by simulation and data examples that scoring rules and diagnostic graphs that have been proposed for independent but not identically distributed data can be adapted in the setting of count dependent data.
- Published
- 2015
18. Estimation and testing linearity for non-linear mixed Poisson autoregressions
- Author
-
Christou, V. and Fokianos, K.
- Abstract
Non-linear mixed Poisson autoregressive models are studied for the analysis of count time series. Given a correct mean specification of the model, we discuss quasi maximum likelihood estimation based on Poisson log-likelihood function. A score testing procedure for checking linearity of the mean process is developed. We consider the cases of identifiable and non identifiable parameters under the null hypothesis. When the parameters are identifiable then a chi-square approximation to the distribution of the score test is obtained. In the case of non identifiable parameters, a supremum score type test statistic is employed for checking linearity of the mean process. The methodology is applied to simulated and real data.
- Published
- 2015
19. Semiparametric inference for the two-way layout under order restrictions
- Author
-
Davidov, O., Fokianos, K., and Iliopoulos, G.
- Abstract
There are many situations in which a researcher would like to analyse data from a two‐way layout. Often, the assumptions of linearity and normality may not hold. To address such situations, we introduce a semiparametric model. The model extends the well‐known density ratio model from the one‐way to the two‐way layout and provides a useful framework for semiparametric analysis of variance type problems under order restrictions. In particular, the likelihood ratio order is emphasized. The model enables highly efficient inference without resorting to fully parametric assumptions or the use of transformations. Estimation and testing procedures under order restrictions are developed and investigated in detail. It is shown that the model is robust to misspecification, and several simulations suggest that it performs well in practice. The methodology is illustrated using two data examples; in the first, the response variable is discrete, whereas in the second, it is continuous.
- Published
- 2014
20. Quasi-likelihood inference for negative binomial time series models
- Author
-
Christou, V. and Fokianos, K.
- Abstract
We study inference and diagnostics for count time series regression models that include a feedback mechanism. In particular, we are interested in negative binomial processes for count time series. We study probabilistic properties and quasi‐likelihood estimation for this class of processes. We show that the resulting estimators are consistent and asymptotically normally distributed. These facts enable us to construct probability integral transformation plots for assessing any assumed distributional assumptions. The key observation in developing the theory is a mean parameterized form of the negative binomial distribution. For transactions data, it is seen that the negative binomial distribution offers a better fit than the Poisson distribution. This is an immediate consequence of the fact that transactions can be represented as a collection of individual activities that correspond to different trading strategies.
- Published
- 2014
21. Retrospective change detection for binary time series models
- Author
-
Fokianos, K., Gombay, E., and Hussein, A.
- Abstract
Detection of changes in health care performance, financial markets, and industrial processes has recently gained momentum due to the increased availability of complex data in real-time. As a consequence, there has been a growing demand for statistically rigorous methodologies for change-point detection in various types of data. In many practical situations, the data being monitored for the purpose of detecting changes are autocorrelated binary time series. We propose a new statistical procedure based on the partial likelihood score process for the retrospective detection of change in the coefficients of a logistic regression model with AR(p)-type autocorrelations. We carry out some Monte Carlo experiments to evaluate the power of the detection procedure as well as its probability of false alarm (type I error). We illustrate its utility using data on 30-day mortality rates after cardiac surgery and data on IBM share transactions.
- Published
- 2014
22. On binary and categorical time series models with feedback
- Author
-
Moysiadis, T. and Fokianos, K.
- Abstract
We study the problem of ergodicity, stationarity and maximum likelihood estimation for multinomial logistic models that include a latent process. Our work includes various models that have been proposed for the analysis of binary and, more generally, categorical time series. We give verifiable ergodicity and stationarity conditions for the analysis of such time series data. In addition, we study maximum likelihood estimation and prove that, under mild conditions, the estimator is asymptotically normally distributed. These results are applied to real and simulated data.
- Published
- 2014
23. A goodness-of-fit test for Poisson count processes
- Author
-
Fokianos, K. and Neumann, M.H.
- Abstract
We are studying a novel class of goodness-of-fit tests for parametric count time series regression models. These test statistics are formed by considering smoothed versions of the empirical process of the Pearson residuals. Our construction yields test statistics which are consistent against Pitman’s local alternatives and they converge weakly at the usual parametric rate. To approximate the asymptotic null distribution of the test statistics, we propose a parametric bootstrap method and we study its properties. The methodology is applied to simulated and real data.
- Published
- 2013
24. Retrospective Bayesian outlier detection in INGARCH series
- Author
-
Fried, R., Agueusop, I., Bornkamp, B., Fokianos, K., Fruth, J., and Ickstadt, K.
- Abstract
INGARCH models for time series of counts arising, e.g., in epidemiology or finance assume the observations to be Poisson distributed conditionally on the past, with the conditional mean being an affine-linear function of the previous observations and the previous conditional means. We model outliers within such processes, assuming that we observe a contaminated process with additive Poisson distributed contamination, affecting each observation with a small probability. Our particular concern is additive outliers, which do not enter the dynamics of the process and can represent measurement artifacts and other singular events influencing a single observation. Retrospective analysis of such outliers is difficult within a non-Bayesian framework since the uncontaminated values entering the dynamics of the process at contaminated time points are unobserved. We propose a Bayesian approach to outlier modeling in INGARCH processes, approximating the posterior distribution of the model parameters by application of a componentwise Metropolis-Hastings algorithm. Analyzing real and simulated data sets, we find Bayesian outlier detection with non-informative priors to work well in practice when there are some outliers in the data.
- Published
- 2013
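The INGARCH definition in the abstract above (Poisson observations whose conditional mean is an affine-linear function of the previous observation and the previous conditional mean) can be sketched as a small simulator. This is a minimal illustration of the first-order, uncontaminated case only. The function names, the parameter names `d`, `a`, `b`, and the choice to start at the stationary mean are assumptions for illustration, not taken from the paper.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's product method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_ingarch(n, d, a, b, seed=0):
    """Simulate Y_t | past ~ Poisson(lam_t), with lam_t = d + a * lam_{t-1} + b * Y_{t-1}."""
    if not (d > 0 and a >= 0 and b >= 0 and a + b < 1):
        raise ValueError("need d > 0, a >= 0, b >= 0 and a + b < 1")
    rng = random.Random(seed)
    lam = d / (1.0 - a - b)  # assumed start: the stationary mean
    y = poisson_draw(rng, lam)
    ys, lams = [], []
    for _ in range(n):
        lam = d + a * lam + b * y  # affine-linear update of the conditional mean
        y = poisson_draw(rng, lam)
        ys.append(y)
        lams.append(lam)
    return ys, lams
```

An additive outlier in the sense of the abstract would be added to a reported observation only, without feeding back into `lam`. The sketch leaves that contamination step out.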
25. Aspect in the L2 and L3 Acquisition of Greek
- Author
-
Gabryś-Barker, D, Karpava, S, Grohmann, K, and Fokianos, K
- Abstract
This paper investigates different facets of the second language acquisition of Modern Greek by native speakers of Russian and Georgian, both adults and children, in the domain of aspectual marking in embedded clauses. The study investigates experimentally the interaction of lexical and grammatical aspect in those embedded sentential environments which are a locus of difference between Modern Greek and Russian: The former permits only perfective aspect of the finite complement verb in the context under consideration, while the latter allows either perfective or imperfective aspect of the infinitival complement verb. The results of the experimental study reveal that L2 learners can reach native-like attainment, though there is L1 interference at the initial stage of L2 acquisition, thus providing evidence in support of the Full Transfer/Full Access Hypothesis. The large number of participants and the different groups investigated further allow us to distinguish other variables relevant for L2 acquisition, such as age of onset, length of residence, and so on, which were gathered through a detailed language history questionnaire. The results are interpreted statistically for all relevant facets of the languages and participants involved, shedding some light on a number of intertwined issues involved in (early vs late) L2/L3 acquisition.
- Published
- 2012
26. On weak dependence conditions: The case of discrete valued processes
- Author
-
Doukhan, P., Fokianos, K., and Li, X.
- Abstract
We investigate the relationship between weak dependence and mixing for discrete valued processes. We show that weak dependence implies mixing conditions under natural assumptions. The results specialize to the case of Markov processes. Several examples of integer valued processes are discussed and their weak dependence properties are investigated by means of a contraction principle. In fact, we show the stronger result that the mixing coefficients for infinite memory weakly dependent models decay geometrically fast. Hence, all integer valued models that we consider have weak dependence coefficients which decay geometrically fast.
- Published
- 2012
27. Nonlinear Poisson autoregression
- Author
-
Fokianos, K. and Tjøstheim, D.
- Abstract
We study statistical properties of a class of non-linear models for regression analysis of count time series. Under mild conditions, it is shown that a perturbed version of the model is geometrically ergodic and possesses moments of any order. This result turns out to be instrumental in deriving large sample properties of the maximum likelihood estimators of the regression parameters. The theory is illustrated with examples.
- Published
- 2012
28. Interventions in log-linear Poisson autoregression
- Author
-
Fokianos, K. and Fried, R.
- Abstract
We consider the problem of estimating and detecting outliers in count time series data following a log-linear observation driven model. Log-linear models for count time series arise naturally because they correspond to the canonical link function of the Poisson distribution. They yield both positive and negative dependence, and covariate information can be conveniently incorporated. Within this framework, we establish test procedures for detection of unusual events (‘interventions’) leading to different kinds of outliers, we implement joint maximum likelihood estimation of model parameters and outlier sizes and we derive formulae for correcting the data for detected interventions. The effectiveness of the proposed methodology is illustrated with two real data examples. The first example offers a fresh data analytic point of view towards the polio data. Our methodology identifies different forms of outliers in these data by an observation-driven model. The second example deals with some campylobacterosis data which we analyzed in a previous communication, by a different model. The results are reconfirmed by the new model that we put forward in this communication. The reliability of the procedure is verified using artificial data examples.
- Published
- 2012
29. On weak dependence conditions for Poisson autoregressions
- Author
-
Doukhan, P., Fokianos, K., and Tjøstheim, D.
- Abstract
We consider generalized linear models for regression modeling of count time series. We give easily verifiable conditions for obtaining weak dependence for such models. These results enable the development of maximum likelihood inference under minimal conditions. Some examples which are useful to applications are discussed in detail.
- Published
- 2012
30. Biological applications of time series frequency domain clustering
- Author
-
Fokianos, K. and Promponas, V.J.
- Abstract
Clustering methods are used routinely to form groups of objects with similar characteristics. Collections of time series datasets appear in several biological applications. Some of these applications require grouping the observed time series data to homogeneous clusters. We review methods for time series frequency domain based clustering with emphasis on applications. Our point of view is that an appropriate notion of clustering for time series data can be developed by means of the spectral density function and its sample counterpart, the periodogram. For the development of frequency domain based clustering algorithms, it is required to define suitable similarity (or dissimilarity) measures. We review several such measures and we discuss various clustering algorithms in this context. Biological applications of time series frequency domain clustering are studied along with interesting complementary approaches.
- Published
- 2012
31. Comments on: Some recent theory for autoregressive count time series
- Author
-
Fokianos, K.
- Published
- 2012
32. Count Time Series Models
- Author
-
Subba Rao, Tata, Subba Rao, Suhasini, Rao, C.R., and Fokianos, K.
- Published
- 2012
33. Kernel Discrimination and Explicative Features: an Operative Approach
- Author
-
Colubi, A, Fokianos, K, González-Rodríguez, J, Kontoghiorghes, EJ, Liberati, C, Camillo, F, and Saporta, G
- Abstract
Kernel-based methods such as SVMs and LS-SVMs have been successfully used for solving various supervised classification and pattern recognition problems in machine learning. Unfortunately, they are heavily dependent on the choice of the optimal kernel function and on tuning parameters. Their solutions, in fact, suffer from a complete lack of interpretation in terms of input variables. That is not a trivial problem, especially when the learning task is related to a critical asset of a business, like credit scoring, where deriving a classification rule has to respect an international regulation. The following strategy is proposed for solving problems using categorical predictors: replace the predictors by components obtained from MCA, choose the best kernel among several (linear, RBF, Laplace, Cauchy, etc.), and approximate the classifier through a linear model. The loss of performance due to such approximation is balanced by better interpretability for the end user, who can use it to understand and rank the influence of each category of the variables in the prediction. This strategy has been applied to real credit-risk data of small enterprises. The Cauchy kernel was found to be the best and leads to a score much more efficient than classical ones, even after approximation.
- Published
- 2012
34. Some recent progress in count time series
- Author
-
Fokianos, K.
- Abstract
We review some regression models for the analysis of count time series. These models have been the focus of several investigations over the last years, but only recently simple conditions for stationarity and ergodicity were worked out in detail. This advancement makes possible the development of the maximum-likelihood estimation theory under minimal assumptions.
- Published
- 2011
35. Log-linear Poisson autoregression
- Author
-
Fokianos, K. and Tjøstheim, D.
- Abstract
We consider a log-linear model for time series of counts. This type of model provides a framework where both negative and positive association can be taken into account. In addition time dependent covariates are accommodated in a straightforward way. We study its probabilistic properties and maximum likelihood estimation. It is shown that a perturbed version of the process is geometrically ergodic, and, under some conditions, it approaches the non-perturbed version. In addition, it is proved that the maximum likelihood estimator of the vector of unknown parameters is asymptotically normal with a covariance matrix that can be consistently estimated. The results are based on minimal assumptions and can be extended to the case of log-linear regression with continuous exogenous variables. The theory is applied to aggregated financial transaction time series. In particular, we discover positive association between the number of transactions and the volatility process of a certain stock.
- Published
- 2011
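The abstract above does not state the model equation. The sketch below assumes the commonly cited first-order form, in which the log-intensity follows nu_t = d + a * nu_{t-1} + b * log(1 + Y_{t-1}) and the conditional mean is exp(nu_t). Both that form and the names `d`, `a`, `b`, `nu0` are assumptions for illustration. Because the recursion acts on the log scale, `b` may be negative, which is how negative association can be accommodated.

```python
import math


def loglinear_intensities(ys, d, a, b, nu0=0.0):
    """Return lam_1..lam_n for observations y_0..y_{n-1} under the assumed recursion.

    nu_t = d + a * nu_{t-1} + b * log(1 + y_{t-1}),  lam_t = exp(nu_t).
    """
    nu = nu0
    lams = []
    for y in ys:
        nu = d + a * nu + b * math.log(1.0 + y)  # log(1 + y) keeps zero counts finite
        lams.append(math.exp(nu))
    return lams


# With b < 0 a large previous count lowers the next conditional mean.
after_large = loglinear_intensities([10], d=1.0, a=0.0, b=-0.5)[0]
after_zero = loglinear_intensities([0], d=1.0, a=0.0, b=-0.5)[0]
```

The intensity stays positive for any real `d`, `a`, `b`, so no sign constraints are needed to keep the model well defined, in contrast to a linear specification.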
36. Interventions in INGARCH processes
- Author
-
Fokianos, K. and Fried, R.
- Abstract
We study the problem of intervention effects generating various types of outliers in a linear count time‐series model. This model belongs to the class of observation‐driven models and extends the class of Gaussian linear time‐series models within the exponential family framework. Studies about effects of covariates and interventions for count time‐series models have largely fallen behind, because the underlying process, whose behaviour determines the dynamics of the observed process, is not observed. We suggest a computationally feasible approach to these problems, focusing especially on the detection and estimation of sudden shifts and outliers. We consider three different scenarios, namely the detection of an intervention effect of a known type at a known time, the detection of an intervention effect when the type and the time are both unknown and the detection of multiple intervention effects. We develop score tests for the first scenario and a parametric bootstrap procedure based on the maximum of the different score test statistics for the second scenario. The third scenario is treated by a stepwise procedure, where we detect and correct intervention effects iteratively. The usefulness of the proposed methods is illustrated using simulated and real data examples.
- Published
- 2010
37. Spectral density ratio based clustering methods for the binary segmentation of protein sequences: A comparative study
- Author
-
Ioannou, A., Fokianos, K., and Promponas, V.J.
- Abstract
We compare several spectral domain based clustering methods for partitioning protein sequence data. The main instrument for this exercise is the spectral density ratio model, which specifies that the logarithmic ratio of two or more unknown spectral density functions has a parametric linear combination of cosines. Maximum likelihood inference is worked out in detail and it is shown that its output yields several distance measures among independent stationary time series. These similarity indices are suitable for clustering time series data based on their second order properties. Other spectral domain based distances are investigated as well; and we compare all methods and distances to the problem of producing segmentations of bacterial outer membrane proteins consistent with their transmembrane topology. Protein sequences are transformed to time series data by employing numerical scales of physicochemical parameters. We also present interesting results on the prediction of transmembrane β-strands, based on the clustering outcome, for a representative set of bacterial outer membrane proteins with given three-dimensional structure.
- Published
- 2010
38. Order-restricted semiparametric inference for the power bias model
- Author
-
Davidov, O., Fokianos, K., and Iliopoulos, G.
- Abstract
The power bias model, a generalization of length‐biased sampling, is introduced and investigated in detail. In particular, attention is focused on order‐restricted inference. We show that the power bias model is an example of the density ratio model, or in other words, it is a semiparametric model that is specified by assuming that the ratio of several unknown probability density functions has a parametric form. Estimation and testing procedures under constraints are developed in detail. It is shown that the power bias model can be used for testing for, or against, the likelihood ratio ordering among multiple populations without resorting to any parametric assumptions. Examples and real data analysis demonstrate the usefulness of this approach.
- Published
- 2010
39. Spectral estimation
- Author
-
Fokianos, K.
- Abstract
We review spectral analysis and its application in inference for stationary processes. As can be seen from the list of references, the practice of spectral analysis is widespread in diverse scientific and engineering fields, particularly in signal processing and communications. One of the most striking characteristics of time series is their oscillatory behavior. This behavior is manifested, for example, in electroencephalogram (EEG) records, weekly sales, monthly environmental data, hourly financial indices, and in numerous economic data observed periodically in time. When observing such data the intuitive notion of periodicity is inescapable, and this led to the statistical problem of estimation of ‘hidden periodicities’ in time series. Schuster [47] was among the first who studied the problem seriously, and is credited with the invention of the so‐called periodogram, a tool for discovering periodicities in oscillatory data. Consequently, spectral analysis and its ramification was further advanced by the pioneering works of Slutsky, Yule, Khintchine, Wiener, Cramer, Kolmogorov, Bartlett, Tukey, Parzen, Rosenblatt, Grenander, Koopmans, Brillinger, and Hannan. The goal of this communication is to introduce the reader to the topic of spectral analysis, and to review some state‐of‐the‐art developments. It is of course not possible to give a full account of the literature on spectral analysis within this limited space. The selection of the references has been influenced by my own personal research interests.
- Published
- 2010
40. Integer-valued time series
- Author
-
Fokianos, K.
- Abstract
Integer‐valued time series data appear in several diverse applications. However, modeling and inference for these types of dependent data pose several questions and interesting problems. The method of generalized linear models turns out to provide a sound framework for modeling and estimation whereby all computations are carried out by well‐established software. I review this area of research and propose some other models.
- Published
- 2009
41. Poisson autoregression
- Author
-
Fokianos, K., Rahbek, A., and Tjøstheim, D.
- Abstract
In this article we consider geometric ergodicity and likelihood-based inference for linear and nonlinear Poisson autoregression. In the linear case, the conditional mean is linked linearly to its past values, as well as to the observed values of the Poisson process. This also applies to the conditional variance, making possible interpretation as an integer-valued generalized autoregressive conditional heteroscedasticity process. In a nonlinear conditional Poisson model, the conditional mean is a nonlinear function of its past values and past observations. As a particular example, we consider an exponential autoregressive Poisson model for time series. Under geometric ergodicity, the maximum likelihood estimators are shown to be asymptotically Gaussian in the linear model. In addition, we provide a consistent estimator of their asymptotic covariance matrix. Our approach to verifying geometric ergodicity proceeds via Markov theory and irreducibility. Finding transparent conditions for proving ergodicity turns out to be a delicate problem in the original model formulation. This problem is circumvented by allowing a perturbation of the model. We show that as the perturbations can be chosen to be arbitrarily small, the differences between the perturbed and nonperturbed versions vanish as far as the asymptotic distribution of the parameter estimates is concerned. This article has supplementary material online.
- Published
- 2009
42. Safe density ratio modeling
- Author
-
Konis, K. and Fokianos, K.
- Abstract
An important problem in logistic regression modeling is the existence of the maximum likelihood estimators. In particular, when the sample size is small, the maximum likelihood estimator of the regression parameters does not exist if the data are completely, or quasicompletely separated. Recognizing that this phenomenon has a serious impact on the fitting of the density ratio model–which is a semiparametric model whose profile empirical log-likelihood has the logistic form because of the equivalence between prospective and retrospective sampling–we suggest a linear programming methodology for examining whether the maximum likelihood estimators of the finite dimensional parameter vector of the model exist. It is shown that the methodology can be effectively utilized in the analysis of case–control gene expression data by identifying cases where the density ratio model cannot be applied. It is demonstrated that naive application of the density ratio model yields erroneous conclusions.
- Published
- 2009
43. Comparing two samples by penalized logistic regression
- Author
-
Fokianos, K.
- Abstract
Inference based on the penalized density ratio model is proposed and studied. The model under consideration is specified by assuming that the log–likelihood function of two unknown densities is of some parametric form. The model has been extended to cover multiple samples problems while its theoretical properties have been investigated using large sample theory. A main application of the density ratio model is testing whether two, or more, distributions are equal. We extend these results by arguing that the penalized maximum empirical likelihood estimator has less mean square error than that of the ordinary maximum likelihood estimator, especially for small samples. In fact, penalization resolves any existence problems of estimators and a modified Wald type test statistic can be employed for testing equality of the two distributions. A limited simulation study supports further the theory.
- Published
- 2008
44. A note on Monte Carlo maximization by the density ratio model
- Author
-
Fokianos, K. and Qin, J.
- Abstract
It is well known that intractable normalizing constants of probability density functions complicate the calculation of maximum likelihood estimators. Usually numerical or Monte Carlo methods are employed in order to obtain an approximation to the solution of the likelihood equations. We propose a new statistical method for carrying out the calculations regarding maximum likelihood estimation by avoiding the explicit calculation of any normalizing constant. We formulate the problem within the framework of semiparametric maximum likelihood estimation for a two-sample model, where the ratio of two densities is known up to some parameters, but the form of the two densities is unknown and one of the sample sizes can be chosen arbitrarily large. The two-sample semiparametric model, which is referred to as the density ratio model, arises naturally in case-control studies. Statistical inference techniques are developed for this model. Comparisons between the proposed method and the conventional estimated pseudo-likelihood method are studied.
- Published
- 2008
45. Clustering of biological time series by cepstral coefficients based distances
- Author
-
Savvides, A., Promponas, V.J., and Fokianos, K.
- Abstract
Clustering of stationary time series has become an important tool in many scientific applications, like medicine, finance, etc. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are either computed in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome we resort to spectral domain methods. A new measure of distance is proposed and it is based on the so-called cepstral coefficients which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.
- Published
- 2008
46. On comparing several spectral densities
- Author
-
Fokianos, K. and Savvides, A.
- Abstract
We investigated the problem of testing equality among spectral densities of several independent stationary processes. Our main methodological contribution is the introduction of a novel semiparametric log-linear model that links all of the spectral densities under consideration. This model is motivated by the asymptotic properties of the periodogram ordinates and specifies that the logarithmic ratio of G − 1 spectral density functions with respect to the Gth is linear in some parameters. Then the problem of testing equality of several spectral density functions is reduced to a parametric problem. Under this assumption, the large-sample theory of the maximum likelihood estimator was studied, and it was found that the estimator is asymptotically normal even when the model is misspecified. The development of the asymptotic theory is based on a new contrast function that might be useful for other spectral domain time series problems. The results are applicable to a variety of models, including linear and nonlinear processes. Simulations and data analysis support further the theoretical findings.
- Published
- 2008
47. Density ratio model selection
- Author
-
Fokianos, K.
- Abstract
The density ratio model presumes that the log-likelihood ratio of two unknown densities is of some known parametric linear form. However, the choice of the functional form has an impact on both estimation and testing. The problem of over/underfitting in the context of the density ratio model is examined and the theory shows that bias and loss of efficiency are introduced when the model is misspecified. The problem of identifying the appropriate functional form for an application of the density ratio model is addressed by means of model selection criteria, which perform reasonably well. Several simulations integrate the presentation.
- Published
- 2007
48. On the effect of misspecifying the density ratio model
- Author
-
Fokianos, K. and Kaimi, I.
- Abstract
The density ratio model specifies that the log-likelihood ratio of two unknown densities is of known linear form which depends on some finite dimensional parameters. The model can be broadened to allow for m-samples in a quite natural way. Estimation of both parametric and nonparametric part of the model is carried out by the method of empirical likelihood. However the assumed linear form has an impact on the estimation and testing for the parametric part. The goal of this study is to quantify the effect of choosing an incorrect linear form and its impact to inference. The issue of misspecification is addressed by embedding the unknown linear form to some parametric transformation family which yields ultimately to its identification. Simulated examples and data analysis integrate the presentation.
- Published
- 2006
49. A two-sample model for the comparison of radiation doses
- Author
-
Fokianos, K., Sarrou, I., and Pashalidis, I.
- Abstract
The t-test is one of the most well known parametric statistical procedures that can be applied to the problem of two-sample comparison. However, it relies on several assumptions that might not be satisfied in practice and therefore alternative methods are called for. The contribution of this article is to present a relatively new technique for the comparison of two samples in the context of semiparametric statistical inference and apply the new method to the comparison of external radiation doses in the region of Cyprus. Accordingly, without specifying the parametric form of the distribution of the two samples, it is assumed that their log likelihood ratio is linear in some parameters. This in turn leads to empirical likelihood estimation and comparison. Real data analysis shows that the external dose rate does not vary upon the type of rock formation—a fact which does not hold for the terrestrial radiation.
- Published
- 2005
50. Statistical comparison of algorithms
- Author
-
Kedem, B., Wolff, D.B., and Fokianos, K.
- Abstract
A "reference" algorithm or instrument and its various "distortions" are considered, where the distortions carry some valid information about the reference. The objective is to combine data from the reference and the distortions together in some manner in order to extract information from both the reference, as well as the distortions, and produce improved inference about the true reference algorithm. This is illustrated in terms of m precipitation radars and semiparametric estimation of the reference distribution and the distortion parameters.
- Published
- 2004