7 results for "Sakhanenko, Nikita A."
Search Results
2. The Information Content of Discrete Functions and Their Application in Genetic Data Analysis.
- Author
  - Sakhanenko, Nikita A., Kunert-Graf, James, and Galas, David J.
- Subjects
  - DEPENDENT variables; DATA analysis; MATHEMATICAL variables; DISCRETE geometry; INFORMATION theory
- Abstract
The complex of central problems in data analysis consists of three components: (1) detecting the dependence of variables using quantitative measures, (2) defining the significance of these dependence measures, and (3) inferring the functional relationships among dependent variables. We have argued previously that an information theory approach allows separation of the detection problem from the inference of functional form problem. We approach here the third component, inferring functional forms based on information encoded in the functions. We present a direct method for classifying the functional forms of discrete functions of three variables represented in data sets. Discrete variables are frequently encountered in data analysis, both as the result of inherently categorical variables and from the binning of continuous numerical variables into discrete alphabets of values. The fundamental question of how much information is contained in a given function is answered for these discrete functions, and their surprisingly complex relationships are illustrated. The all-important effect of noise on the inference of function classes is found to be highly heterogeneous and reveals some unexpected patterns. We apply this classification approach to an important area of biological data analysis: the inference of genetic interactions. Genetic analysis provides a rich source of real and complex biological data analysis problems, and our general methods provide an analytical basis and tools for characterizing genetic problems and for analyzing genetic data. We illustrate the functional description and the classes of a number of common genetic interaction modes and also show how different modes vary widely in their sensitivity to noise. [ABSTRACT FROM AUTHOR]
- Published
  - 2017
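The classification idea summarized in the abstract above can be hinted at with a short sketch (an illustration only, not the paper's procedure): for a discrete function z = f(x, y) over a ternary alphabet with uniform, independent inputs, the profile of mutual informations (I(X;Z), I(Y;Z), I(X,Y;Z)) captures how the function distributes its information, and functions sharing a profile behave alike. The function and alphabet below are assumptions for illustration.

```python
from itertools import product
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a distribution given as a Counter."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def information_profile(f, alphabet=(0, 1, 2)):
    """Mutual informations of z = f(x, y) under uniform, independent x and y.

    Returns (I(X;Z), I(Y;Z), I(X,Y;Z)); functions with the same profile
    fall into the same information class.
    """
    table = [(x, y, f(x, y)) for x, y in product(alphabet, repeat=2)]

    def h(idx):  # joint entropy of the projected columns
        return entropy(Counter(tuple(t[i] for i in idx) for t in table))

    I_xz = h((0,)) + h((2,)) - h((0, 2))
    I_yz = h((1,)) + h((2,)) - h((1, 2))
    I_xyz = h((0, 1)) + h((2,)) - h((0, 1, 2))
    return I_xz, I_yz, I_xyz

# Ternary addition, an XOR-like function: z depends on x and y only jointly.
print(information_profile(lambda x, y: (x + y) % 3))
# ≈ (0, 0, log2 3 ≈ 1.585): x and y alone say nothing about z; together they fix it.
```

A fully synergistic function like this one is exactly the case where pairwise measures fail and a three-variable information measure is needed.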
3. Biological Data Analysis as an Information Theory Problem: Multivariable Dependence Measures and the Shadows Algorithm.
- Author
  - Sakhanenko, Nikita A. and Galas, David J.
- Subjects
  - COMPUTATIONAL biology; SYSTEMS biology; BIOINFORMATICS; COMPUTER simulation of biological systems; APPLIED mathematics
- Abstract
Information theory is valuable in multiple-variable analysis for being model-free and nonparametric, and for its modest sensitivity to undersampling. We previously introduced a general approach to finding multiple dependencies that provides accurate measures of levels of dependency for subsets of variables in a data set, which are significantly nonzero only if the subset of variables is collectively dependent. This is useful, however, only if we can avoid a combinatorial explosion of calculations for increasing numbers of variables. The proposed dependence measure for a subset of variables τ, the differential interaction information Δ(τ), has the property that some of the factors of Δ(τ) for subsets of τ are significantly nonzero when the full dependence includes more variables. We use this property to suppress the combinatorial explosion by following the 'shadows' of multivariable dependency on smaller subsets. Rather than calculating the marginal entropies of all subsets at each degree level, we need to consider only calculations for subsets of variables with appropriate 'shadows.' The number of calculations for n variables at a degree level of d therefore grows at a much smaller rate than the binomial coefficient (n, d), but depends on the parameters of the 'shadows' calculation. This approach, avoiding a combinatorial explosion, enables the use of our multivariable measures on very large data sets. We demonstrate this method on simulated data sets, and characterize the effects of noise and sample numbers. In addition, we analyze a data set of a few thousand mutant yeast strains interacting with a few thousand chemical compounds. [ABSTRACT FROM AUTHOR]
- Published
  - 2015
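A much-simplified sketch of the 'shadows' pruning idea follows. Total correlation stands in for the paper's Δ(τ) measure, and the threshold, simulated data, and apriori-style candidate rule are illustrative assumptions, not the authors' algorithm:

```python
from itertools import combinations
from collections import Counter
from math import log2
import random

def entropy(rows):
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def total_correlation(data, subset):
    """Sum of single-variable entropies minus the joint entropy over
    `subset`: zero iff the variables in the subset are fully independent."""
    joint = [tuple(row[i] for i in subset) for row in data]
    singles = sum(entropy([row[i] for row in data]) for i in subset)
    return singles - entropy(joint)

def shadow_search(data, n_vars, max_degree, threshold=0.05):
    """Score a subset at degree d only if at least one of its
    (d-1)-variable 'shadows' already showed dependence."""
    scores, frontier = {}, []
    for pair in combinations(range(n_vars), 2):
        scores[pair] = total_correlation(data, pair)
        if scores[pair] > threshold:
            frontier.append(pair)
    for d in range(3, max_degree + 1):
        passed, frontier = set(frontier), []
        for combo in combinations(range(n_vars), d):
            if any(sub in passed for sub in combinations(combo, d - 1)):
                scores[combo] = total_correlation(data, combo)
                if scores[combo] > threshold:
                    frontier.append(combo)
    return scores

# Simulated data: variable 2 is the AND of variables 0 and 1; variable 3 is noise.
random.seed(0)
data = [(x, y, x & y, random.randint(0, 1))
        for x, y in ((random.randint(0, 1), random.randint(0, 1))
                     for _ in range(2000))]
scores = shadow_search(data, n_vars=4, max_degree=3)
top = max(scores, key=scores.get)
print(top)  # the truly collective triple (0, 1, 2) scores highest
```

The point of the pruning is visible in the candidate loop: at degree d only subsets with a significant (d-1)-shadow are scored, so the number of entropy calculations grows far more slowly than the binomial coefficient (n, d).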
4. Describing the Complexity of Systems: Multivariable 'Set Complexity' and the Information Basis of Systems Biology.
- Author
  - Galas, David J., Sakhanenko, Nikita A., Skupin, Alexander, and Ignac, Tomasz
- Subjects
  - SYSTEMS biology; GENE regulatory networks; SET theory; ENTROPY (Information theory); BIOLOGICAL systems; HYPERGRAPHS
- Abstract
Context dependence is central to the description of complexity. Keying on the pairwise definition of 'set complexity,' we use an information theory approach to formulate general measures of systems complexity. We examine the properties of multivariable dependency starting with the concept of interaction information. We then present a new measure for unbiased detection of multivariable dependency, 'differential interaction information.' This quantity for two variables reduces to the pairwise 'set complexity' previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the 'differential interaction information' are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of 'differential interaction information' also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multivariable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system. [ABSTRACT FROM AUTHOR]
- Published
  - 2014
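The inclusion-exclusion structure behind interaction information can be written down directly. The sketch below implements the co-information form (sign conventions for three or more variables differ across the literature) and shows the classic negative value for a purely synergistic XOR dependence; it is a generic illustration, not the paper's differential interaction information:

```python
from itertools import combinations
from collections import Counter
from math import log2

def entropy(rows):
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def co_information(data, variables):
    """Alternating inclusion-exclusion sum of joint entropies over all
    non-empty subsets of `variables`. For two variables it equals the
    mutual information I(X;Y); for more, sign conventions vary."""
    total = 0.0
    for r in range(1, len(variables) + 1):
        sign = (-1) ** (r + 1)  # + singles, - pairs, + triples, ...
        for sub in combinations(variables, r):
            total += sign * entropy([tuple(row[i] for i in sub) for row in data])
    return total

# Exhaustive truth table of z = x XOR y: no pairwise information at all,
# yet the three variables are fully collectively dependent.
xor_table = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(co_information(xor_table, [0, 1, 2]))  # -1.0, the synergy signature
```

Triples with strongly nonzero values of such a measure are natural hyperedges in the hypergraph representation of collective dependence described in the abstract.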
5. Probabilistic Logic Methods and Some Applications to Biology and Medicine.
- Author
  - Sakhanenko, Nikita A. and Galas, David J.
- Subjects
  - COMPUTATIONAL biology; HIDDEN Markov models; BIOINFORMATICS; PROBABILISTIC inference; MACHINE learning; PROBABILITY theory
- Abstract
For the computational analysis of biological problems (analyzing data, inferring networks and complex models, and estimating model parameters), it is common to use a range of methods based on probabilistic logic constructions, sometimes collectively called machine learning methods. Probabilistic modeling methods such as Bayesian Networks (BN) fall into this class, as do Hierarchical Bayesian Networks (HBN), Probabilistic Boolean Networks (PBN), Hidden Markov Models (HMM), and Markov Logic Networks (MLN). In this review, we describe the most general of these (MLN) and show how the above-mentioned methods are related to MLN and to one another by the imposition of constraints and restrictions. This approach allows us to illustrate a broad landscape of constructions and methods, and to describe some of the attendant strengths, weaknesses, and constraints of many of these methods. We then provide some examples of their applications to problems in biology and medicine, with an emphasis on genetics. The key concepts needed to picture this landscape of methods are the ideas of probabilistic graphical models, the structures of the graphs, and the scope of the logical language repertoire used (from First-Order Logic [FOL] to Boolean logic). These concepts are interlinked and together define the nature of each of the probabilistic logic methods. Finally, we discuss the initial applications of MLN to genetics, show the relationship to less general methods like BN, and then mention several examples where such methods could be effective in new applications to specific biological and medical problems. [ABSTRACT FROM AUTHOR]
- Published
  - 2012
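The 'constraints and restrictions' relationship in the abstract can be hinted at with a tiny sketch: a Bayesian network is a joint distribution constrained to factor as a product of per-node conditionals over a directed graph, and an HMM is the further special case of a chain of hidden states with emissions. The network, names, and probabilities below are hypothetical illustrations:

```python
def joint_probability(assignment, cpts, parents):
    """Joint probability of a full assignment under a Bayesian network.

    `parents[v]` lists v's parent variables; `cpts[v]` maps a pair
    (tuple of parent values, v's own value) to a conditional probability.
    Structure and names are illustrative, not from the paper.
    """
    p = 1.0
    for v, cpt in cpts.items():
        key = (tuple(assignment[u] for u in parents[v]), assignment[v])
        p *= cpt[key]
    return p

# Hypothetical two-node network: Rain -> WetGrass.
parents = {"rain": (), "wet": ("rain",)}
cpts = {
    "rain": {((), 0): 0.8, ((), 1): 0.2},
    "wet":  {((0,), 0): 0.9, ((0,), 1): 0.1,
             ((1,), 0): 0.1, ((1,), 1): 0.9},
}
print(joint_probability({"rain": 1, "wet": 1}, cpts, parents))
# P(rain=1) * P(wet=1 | rain=1) = 0.2 * 0.9 = 0.18
```

Relaxing the factorization constraints (allowing first-order rules with weights rather than a fixed graph over propositional variables) moves one from BN toward the more general MLN end of the landscape the review describes.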
6. Multivariate Analysis of Data Sets with Missing Values: An Information Theory-Based Reliability Function.
- Author
  - Uechi, Lisa, Galas, David J., and Sakhanenko, Nikita A.
- Subjects
  - BIOLOGICAL systems; INFORMATION theory; MISSING data (Statistics); NUMERICAL analysis; METAPHOR
- Abstract
Missing values in complex biological data sets have significant impacts on our ability to correctly detect and quantify interactions in biological systems and to infer relationships accurately. In this article, we propose a useful metaphor to show that information theory measures, such as mutual information and interaction information, can be employed directly for evaluating multivariable dependencies even if data contain some missing values. The metaphor is that of thinking of variable dependencies as information channels between and among variables. In this view, missing data can be thought of as noise that reduces the channel capacity in predictable ways. We extract the available information in the data even if there are missing values and use the notion of channel capacity to assess the reliability of the result. This avoids the common practice, in the absence of prior knowledge or of random imputation, of eliminating samples entirely, thus losing the information they can provide. We show how this reliability function can be implemented for pairs of variables, and generalize it for an arbitrary number of variables. Illustrations of the reliability functions for several cases are provided using simulated data. [ABSTRACT FROM AUTHOR]
- Published
  - 2019
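The channel metaphor can be illustrated with a minimal sketch. The paper's reliability function is not reproduced here; instead, pairwise mutual information is computed over the complete cases, and the fraction of usable sample pairs is reported as a crude stand-in for the reduced channel capacity:

```python
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def mi_with_missing(xs, ys):
    """Pairwise mutual information over complete cases only, plus the
    fraction of usable pairs as a crude reliability proxy (a stand-in
    for a channel-capacity-based reliability function)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    coverage = len(pairs) / len(xs)
    xs_c, ys_c = zip(*pairs)
    mi = entropy(xs_c) + entropy(ys_c) - entropy(pairs)
    return mi, coverage

# None marks a missing observation; no sample is discarded up front.
x = [0, 0, 1, 1, None, 0, 1, None]
y = [0, 0, 1, 1, 1,    0, None, 0]
mi, cov = mi_with_missing(x, y)
print(round(mi, 3), cov)  # 0.971 0.625: strong dependence, 5 of 8 pairs usable
```

Only the pairs where a specific variable combination is incomplete are set aside, so higher-order measures over other variable subsets can still use those samples, which is the information the complete-case-deletion practice throws away.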
7. Toward an Information Theory of Quantitative Genetics.
- Author
  - Galas, David J., Kunert-Graf, James, Uechi, Lisa, and Sakhanenko, Nikita A.
- Subjects
  - QUANTITATIVE genetics; BASE pairs; PHENOTYPES; INFORMATION measurement; INFORMATION theory; HERITABILITY; EUGENICS
- Abstract
Quantitative genetics has evolved dramatically in the past century, and the proliferation of genetic data, in quantity as well as type, enables the characterization of complex interactions and mechanisms beyond the scope of its theoretical foundations. In this article, we argue that revisiting the framework for analysis is important and we begin to lay the foundations of an alternative formulation of quantitative genetics based on information theory. Information theory can provide sensitive and unbiased measures of statistical dependencies among variables, and it provides a natural mathematical language for an alternative view of quantitative genetics. In previous work, we examined the information content of discrete functions and applied this approach and methods to the analysis of genetic data. In this article, we present a framework built around a set of relationships that both unifies the information measures for the discrete functions and uses them to express key quantitative genetic relationships. Information theory measures of variable interdependency are used to identify significant interactions, and a general approach is described for inferring functional relationships in genotype and phenotype data. We present information-based measures of the genetic quantities: penetrance, heritability, and degrees of statistical epistasis. Our scope here includes the consideration of both two- and three-variable dependencies and independently segregating variants, which captures additive effects, genetic interactions, and two-phenotype pleiotropy. This formalism and the theoretical approach naturally apply to higher multivariable interactions and complex dependencies, and can be adapted to account for population structure, linkage, and nonrandomly segregating markers. This article thus focuses on presenting the initial groundwork for a full formulation of quantitative genetics based on information theory. [ABSTRACT FROM AUTHOR]
- Published
  - 2021
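As a hint of what an information-based penetrance or heritability can look like (an illustrative analogue, not the paper's definitions), normalized mutual information I(G;P)/H(P) measures the fraction of phenotype uncertainty removed by knowing the genotype:

```python
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def info_penetrance(genotypes, phenotypes):
    """Fraction of phenotype entropy explained by genotype, I(G;P)/H(P):
    0 for no genetic effect, 1 when genotype fully determines phenotype.
    An information-based analogue of penetrance for discrete data."""
    mi = (entropy(genotypes) + entropy(phenotypes)
          - entropy(list(zip(genotypes, phenotypes))))
    return mi / entropy(phenotypes)

# Hypothetical fully penetrant dominant locus: genotypes count copies of the
# allele (0, 1, 2); the phenotype is present whenever at least one copy is.
g = [0, 0, 1, 1, 2, 2, 0, 1]
p = [0, 0, 1, 1, 1, 1, 0, 1]
print(round(info_penetrance(g, p), 6))  # 1.0: genotype fully determines phenotype
```

Incomplete penetrance or environmental noise would drive the ratio below 1, and the same entropy machinery extends to three-variable dependencies (two loci and a phenotype) for epistasis, as the abstract outlines.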