2,997 results on '"Compositional Data"'
Search Results
2. Spatial lag quantile regression for compositional data.
- Author
-
Zhao, Yizhen, Ma, Xuejun, and Chao, Yue
- Subjects
*LINEAR programming, *REGRESSION analysis, *COMPUTER simulation, *DATA modeling, *QUANTILE regression
- Abstract
While research on spatial lag quantile regression models exists, the extension to incorporating compositional data within this framework remains unexplored. The unique characteristics of compositional data present significant issues in constructing such a model. In this paper, we investigate the estimation problems for the spatial lag quantile regression model in compositional data. We propose the constrained two-stage quantile regression (CTS-QR) and constrained instrumental variable quantile regression (CIV-QR) methods based on linear programming. Numerical simulations show that our proposed methods are more accurate than traditional unconstrained estimation methods. A real data application of our methods is also provided. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Debiased high-dimensional regression calibration for errors-in-variables log-contrast models.
- Author
-
Zhao, Huali and Wang, Tianying
- Subjects
*ERRORS-in-variables models, *ASYMPTOTIC normality, *MEASUREMENT errors, *REGRESSION analysis, *INFERENTIAL statistics
- Abstract
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Ordinal compositional data and time series.
- Author
-
Weiß, Christian H.
- Subjects
*INFERENTIAL statistics, *TIME series analysis, *REGRESSION analysis
- Abstract
There are several real applications where the categories behind compositional data (CoDa) exhibit a natural order, which, however, is not accounted for by existing CoDa methods. For various application areas, it is demonstrated that appropriately developed methods for ordinal CoDa provide valuable additional insights and are, thus, recommended to complement existing CoDa methods. The potential benefits are demonstrated for the (visual) descriptive analysis of ordinal CoDa, for statistical inference based on CoDa samples, for the monitoring of CoDa processes by means of control charts, and for the analysis and modelling of compositional time series. The novel methods are illustrated by a couple of real-world data examples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. The application of adaptive group LASSO imputation method with missing values in personal income compositional data.
- Author
-
Tian, Ying, Ali, Majid Khan Majahar, and Wu, Lili
- Subjects
MISSING data (Statistics), INCOME, ALGORITHMS
- Abstract
From social and economic perspectives, compositional data represent the proportions of various components within a whole, carrying non-negative values and providing only relative information. However, in many circumstances, there are often a significant number of missing values in datasets. Due to the complexity caused by these missing values, traditional estimation methods are ineffective. In this paper, an adaptive group LASSO-based imputation method is proposed for compositional data, consolidating the advantages of group LASSO and adaptive LASSO analysis techniques. Considering the impact of outliers on the accuracy of estimation, both simulation and case analysis are conducted to compare the proposed algorithm against four existing methods. The experimental results demonstrate that the proposed adaptive group LASSO method produces a better imputation performance at comparable missing rates. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Area-Level Model-Based Small Area Estimation of Divergence Indexes in the Spanish Labour Force Survey.
- Author
-
Cabello, Esteban, Morales, Domingo, and Pérez, Agustín
- Subjects
*LABOR supply, *WOMEN'S employment, *SAMPLE size (Statistics), *ENTROPY, *PROVINCES
- Abstract
This article develops model-based predictors for area-level proportions of employed men and women by occupation sectors and for entropies and divergence indexes (DIs) within and between sex groups. Since the direct estimators of the proportions add up to one in the occupational sections, they are compositions that can be imprecise if the sample sizes are small. We fit a multivariate Fay–Herriot model to logratio transformations of the direct estimators of the proportions. Small area estimators of the proportions, entropies, and DIs are derived from the fitted model and the corresponding mean squared errors are estimated by parametric bootstrap. Several simulation experiments designed to analyze the behavior of the introduced model-based predictors are carried out. We give an application to Spanish Labour Force Survey data from 2022. The target is to investigate the state of sex occupational entropies and divergences in Spanish provinces. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. The link between multiplicative competitive interaction models and compositional data regression with a total.
- Author
-
Dargel, Lukas and Thomas-Agnan, Christine
- Subjects
*MARKETING research, *REGRESSION analysis, *MARKETING models, *DATA modeling, *LITERATURE
- Abstract
This article sheds light on the relationship between compositional data (CoDa) regression models and multiplicative competitive interaction (MCI) models, which are two approaches for modeling shares. We demonstrate that MCI models are particular cases of CoDa models with a total and that a reparameterization links both. Recognizing this relation offers mutual benefits for the CoDa and MCI literature, each with its own rich tradition. The CoDa tradition, with its rigorous mathematical foundation, provides additional theoretical guarantees and mathematical tools that we apply to improve the estimation of MCI models. Simultaneously, the MCI model emerged from almost a century-long tradition in marketing research that may enrich the CoDa literature. One aspect is the grounding of the MCI specification in assumptions on the behavior of individuals. From this basis, the MCI tradition also provides credible justifications for heteroskedastic error structures – an idea we develop further and that is relevant to many CoDa models beyond the marketing context. Additionally, MCI models have always been interpreted in terms of elasticities, a method that has only recently emerged in CoDa. Regarding this interpretation, the CoDa perspective leads to a decomposition of the influence of the explanatory variables into contributions from relative and absolute information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Comparing Two Geostatistical Simulation Algorithms for Modelling the Spatial Uncertainty of Texture in Forest Soils.
- Author
-
Buttafuoco, Gabriele
- Subjects
SIMULATED annealing, FOREST soils, VARIOGRAMS, STANDARD deviations, SOIL texture
- Abstract
Uncertainty assessment is an essential part of modeling and mapping the spatial variability of key soil properties, such as texture. The study aimed to compare sequential Gaussian simulation (SGS) and turning bands simulation (TBS) for assessing the uncertainty in unknown values of the textural fractions accounting for their compositional nature. The study area was a forest catchment (1.39 km²) with soils classified as Typic Xerumbrepts and Ultic Haploxeralf. Samples were collected at 135 locations (0.20 m depth) according to a design developed using a spatial simulated annealing algorithm. Isometric log-ratio (ilr) was used to transform the three textural fractions into a two-dimensional real vector of coordinates ilr.1 and ilr.2, then 100 realizations were simulated using SGS and TBS. The realizations obtained by SGS and TBS showed a strong similarity in reproducing the distribution of ilr.1 and ilr.2 with minimal differences in average conditional variances of all grid nodes. The variograms of ilr.1 and ilr.2 coordinates were better reproduced by the realizations obtained by TBS. Similar results in reproducing the texture data statistics by both algorithms of simulation were obtained. The maps of expected values and standard deviations of the three soil textural fractions obtained by SGS and TBS showed no notable visual differences or visual artifacts. The realizations obtained by SGS and TBS showed a strong similarity in reproducing the distribution of isometric log-ratio coordinates (ilr.1 and ilr.2). Overall, their variograms and data were better reproduced by the realizations obtained by TBS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Longitudinal latent overall toxicity (LOTox) profiles in osteosarcoma: a new taxonomy based on latent Markov models.
- Author
-
Spreafico, Marta, Ieva, Francesca, and Fiocco, Marta
- Subjects
MARKOV processes, PATIENTS' attitudes, OSTEOSARCOMA, CANCER research, PROBABILITY theory
- Abstract
Due to the presence of multiple types of adverse events (AEs) with different levels of severity, the analysis of longitudinal toxicity data is a difficult task in cancer research. The current literature primarily relies on descriptive-based methods and lacks models that can effectively quantify the overall toxic burden experienced by patients over treatment without losing details of the impact of each AE. In this work, a novel taxonomy based on latent Markov models and compositional data techniques is proposed to model the Latent Overall Toxicity (LOTox) condition of each patient over cycles of treatment. Starting from observed categories of severity of multiple toxicities, the goal is to delineate distinct LOTox conditions and retrieve patients' probabilities of being in a specific condition at a given cycle, as well as their risk of experiencing "worse" overall toxicity statuses compared to a reference "good" toxic condition. The proposed approach is applied to longitudinal toxicity data from the MRC BO06/EORTC 80931 randomized controlled trial for patients with osteosarcoma. The population of interest includes 377 patients who had successfully completed the six-cycle treatment. Personal characteristics and observed information on six toxicities are used to infer the unobserved LOTox status over the six cycles of chemotherapy. Provided that longitudinal toxicity data are available, the developed procedure is a flexible approach that can be adapted and applied to other cancer studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. The application of adaptive group LASSO imputation method with missing values in personal income compositional data
- Author
-
Ying Tian, Majid Khan Majahar Ali, and Lili Wu
- Subjects
Missing values, Compositional data, Imputation methods, Adaptive group LASSO, Personal income, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
From social and economic perspectives, compositional data represent the proportions of various components within a whole, carrying non-negative values and providing only relative information. However, in many circumstances, there are often a significant number of missing values in datasets. Due to the complexity caused by these missing values, traditional estimation methods are ineffective. In this paper, an adaptive group LASSO-based imputation method is proposed for compositional data, consolidating the advantages of group LASSO and adaptive LASSO analysis techniques. Considering the impact of outliers on the accuracy of estimation, both simulation and case analysis are conducted to compare the proposed algorithm against four existing methods. The experimental results demonstrate that the proposed adaptive group LASSO method produces a better imputation performance at comparable missing rates.
- Published
- 2024
- Full Text
- View/download PDF
11. Constrained least squares simplicial-simplicial regression.
- Author
-
Tsagris, Michail
- Abstract
Simplicial-simplicial regression refers to the regression setting where both the responses and predictor variables lie within the simplex space, i.e. they are compositional. For this setting, constrained least squares, where the regression coefficients themselves lie within the simplex, is proposed. The model is transformation-free but the adoption of a power transformation is straightforward, it can treat more than one compositional datasets as predictors and offers the possibility of weights among the simplicial predictors. Among the model’s advantages are its ability to treat zeros in a natural way and a highly computationally efficient algorithm to estimate its coefficients. Resampling based hypothesis testing procedures are employed regarding inference, such as linear independence, and equality of the regression coefficients to some pre-specified values. The strategy behind the formulation of the new model implemented is related to an existing methodology, that is of the same spirit, showcasing how other similar models can be employed as well. Finally, the performance of the proposed technique and its comparison to the existing methodology takes place using simulation studies and real data examples. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
12. Air pollution and children’s mental health in rural areas: compositional spatio-temporal model
- Author
-
Anna Mota-Bertran, Germà Coenders, Pere Plaja, Marc Saez, and Maria Antònia Barceló
- Subjects
Air pollution, Mental health, Children, Compositional data, Bayesian Inference, Spatio-temporal models, Medicine, Science
- Abstract
Air pollution stands as an environmental risk to child mental health, with proven relationships hitherto observed only in urban areas. Understanding the impact of pollution in rural settings is equally crucial. The novelty of this article lies in the study of the relationship between air pollution and behavioural and developmental disorders, attention deficit hyperactivity disorder (ADHD), anxiety, and eating disorders in children below 15 living in a rural area. The methodology combines spatio-temporal models, Bayesian inference and Compositional Data (CoDa), that make it possible to study areas with few pollution monitoring stations. Exposure to nitrogen dioxide (NO2), ozone (O3), and sulphur dioxide (SO2) is related to behavioural and development disorders, anxiety is related to particulate matter (PM10), O3 and SO2, and overall pollution is associated to ADHD and eating disorders. To sum up, like their urban counterparts, rural children are also subject to mental health risks related to air pollution, and the combination of spatio-temporal models, Bayesian inference and CoDa make it possible to relate mental health problems to pollutant concentrations in rural settings with few monitoring stations. Certain limitations persist related to misclassification of exposure to air pollutants and to the covariables available in the data sources used.
- Published
- 2024
- Full Text
- View/download PDF
13. Estimating Rock Composition from Replicate Geochemical Analyses: Theory and Application to Magmatic Rocks of the GeoPT Database.
- Author
-
De Greef, Maxime Keutgen, Weltje, Gert Jan, and Gijbels, Irène
- Subjects
*ANALYTICAL geochemistry, *ANALYTICAL chemistry, *MISSING data (Statistics), *ARITHMETIC mean, *IGNEOUS rocks
- Abstract
Chemical analyses of powdered rocks by different laboratories often yield varying results, requiring estimation of the rock's true composition and associated uncertainty. Challenges arise from the peculiar nature of geochemical data. Traditionally, major and trace elements have been measured using different methods, resulting in chemical analyses where the sum of the parts fluctuates around 1 rather than precisely totaling 1. Additionally, all chemical analyses contain an undisclosed mass fraction representing undetected chemical elements. Because of this undisclosed and unknown mass fraction, geochemical data represent a particular kind of compositional data in which closure to unity is not guaranteed. We argue that chemical analyses exist in the hypercube while being sampled from a true composition residing in the simplex. Therefore, we propose an algorithm that generates random chemical analyses by simulating the data acquisition protocol in geochemistry. Using the algorithm's output, we measure the bias and mean squared error (MSE) of various estimators of the true mean composition. Additionally, we explore the impact of missing values on estimator performance. Our findings reveal that the optimized binary log-ratio mean, a new estimator, exhibits the lowest MSE and bias. It performs well even with up to 70% missing values, in contrast to other classical estimators such as the arithmetic mean or the geometric mean. Applying our approach to the GeoPT database, which contains replicate analyses of igneous rocks from numerous geochemical laboratories, we introduce an outlier detection technique based on the Mahalanobis distance between a laboratory's logit coordinates and the optimized mean estimate. This enables a probabilistic ranking of laboratories based on the atypicality of their performance. 
Finally, we offer an accessible R implementation of our findings through the GitHub repository linked to this paper [subject classification numbers: 10 (compositions) 85 (statistics)]. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. A comprehensive workflow for compositional data analysis in archaeometry, with code in R.
- Author
-
Greenacre, Michael and Wood, Jonathan R.
- Abstract
Compositional data, which have relative rather than absolute meaning, are common in quantitative archaeological research. Such multivariate data are usually expressed as proportions, summing to 1, or equivalently as percentages. We present a comprehensive and defensible workflow for processing compositional data in archaeometry, using both the original compositional values and their transformation to logratios. The most useful logratio transformations are illustrated and how they affect the interpretation of the final results in the context of both unsupervised and supervised learning. The workflow is demonstrated on compositional data from bronze ritual vessels to provide compositional fingerprints for the Shang and Zhou periods of the Chinese Bronze Age. Predictions, with caveats, of the fabrication age of the vessels are made from the compositional data – in effect, compositional rather than typological seriation of the bronzes. In the Supplementary Material, we further explore the effect of zeros in the dataset and compare logratio analyses with the chiPower approach, where we replace any value in the original data determined as being below the detection limit of the instruments for the element, with zeros. The data and R code for reproducing all the analyses are provided both in the Supplementary Information and online. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Analysis of Aromatic Fraction of Sparkling Wine Manufactured by Second Fermentation and Aging in Bottles Using Different Types of Closures.
- Author
-
Jové, Patricia, Mateu-Figueras, Glòria, Bustillos, Jessica, and Martín-Fernández, Josep Antoni
- Subjects
SPARKLING wines, GAS chromatography/Mass spectrometry (GC-MS), THERMAL desorption, CORK, MULTIVARIATE analysis
- Abstract
This study aimed to evaluate the impact of different closures used in second fermentation on the aromatic fraction of sparkling wine. Six types of closures (cork stoppers and screw caps) and 94 months of aging in a bottle were investigated. Headspace solid-phase microextraction (HS-SPME) and thermal desorption (TD) procedures coupled to gas chromatography-mass spectrometry (GCMSMS) analysis were applied. The vectors containing the relative abundance of the volatile compounds are compositional vectors. The statistical analysis of compositional data requires specific techniques that differ from standard techniques. Overall, 101 volatile compounds were identified. HS-SPME extracted the highest percentage of esters, ketones and other compounds, while TD was a useful tool for the obtention of alcohol, acid, ether and alkane compounds. Esters were the most abundant family of compounds. Compositional data analysis, which was applied to study the impact of different closures used in bottle aging after second fermentation on the volatile composition of sparkling wine, concluded that there are differences in the relative abundance of certain volatile compounds between cork stoppers and screw-cap closures. Overall, the most abundant part in screw-cap closures was ethyl hexanoate, and it was ethyl octanoate in cork stoppers. Also, the proportional amount of dimethylamine was higher in screw-cap closures than cork stoppers relative to the entire sample. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Are Women Commissioners More Compassionate Spenders? Evidence From Florida County Governments.
- Author
-
Estorcien, Vernise, Chen, Can, Deb, Apu, and Neshkova, Milena I.
- Subjects
LEGISLATIVE bodies, GOVERNMENT agencies, REPRESENTATIVE government, GOVERNMENT programs, DEVELOPMENTAL programs
- Abstract
While the number of women in government has increased, prior research on whether enhancing women's political representation alters policy choices has produced inconclusive findings. This study asks if higher women's participation in electoral institutions at the local level is associated with a different spending profile. Using Peterson's typology of developmental, redistributive, and allocational government programs, we argue that legislative bodies with more female members will spend more on redistributive programs than on developmental or allocational. Using data from Florida's 67 counties between 2005 and 2015, our analysis supports this theoretical expectation. In line with critical mass theory, women's representation in county commissions must reach a threshold of about 33% to sway budgetary decision-making toward more extensive redistribution. We also find that the traditional commission form of government intensifies the redistributive effect of women commissioners on county spending while having a home rule charter has no significant effect. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Phylogenetic association analysis with conditional rank correlation.
- Author
-
Wang, Shulei, Yuan, Bo, Cai, T Tony, and Li, Hongzhe
- Subjects
*STATISTICAL hypothesis testing, *TEST methods, *DATA analysis, *CALIBRATION
- Abstract
Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Hence, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This article introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. The proposed tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using weighted sum and maximum approaches to capture both dense and sparse signals. The significance level of the test statistics is determined by calibration through a nearest-neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when these are available. The practical advantages of the proposed framework are demonstrated through numerical experiments using both simulated and real microbiome datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Genetic parameters for different types of medullated fibre in Alpacas.
- Author
-
Cruz, Alan, Murillo, Yanin, Burgos, Alonso, Yucra, Alex, Morante, Renzo, Quispe, Max, Quispe, Christian, Quispe, Edgar, and Gutiérrez, Juan Pablo
- Subjects
*GENETIC correlations, *ABSOLUTE value, *ALPACA, *FIBERS, *ITCHING
- Abstract
The quality of alpaca textile fibre has great potential, especially if objectionable fibres (coarse and medullated fibres) that cause itching are reduced, considering that objectionable fibres can be identified by diameter and medullation types. The objective of this study was to estimate genetic parameters for medullar types and their respective diameters to evaluate the possibility of incorporating them as selection criteria in alpaca breeding programmes. The research used 3149 alpaca fibre samples collected from 2020 to 2022, from a population of 1626 Huacaya type alpacas. The heritability and correlations of the percentages of non‐medullated (NM), fragmented medulla (FM), uncontinuous medullated (UM), continuous medullated (CM), and strongly medullated (SM) fibres were analysed, as was the fibre diameter (FD) for each of the medullation types. The heritabilities estimated for medullation types were 0.25 ± 0.01, 0.18 ± 0.01, 0.10 ± 0.01, 0.20 ± 0.01 and 0.11 ± 0.01 for NM, FM, UM, CM and SM, respectively. The genetic correlations for medullation categories ranged from 0.15 ± 0.03 to 0.66 ± 0.02 (in absolute values). The heritabilities estimated for fibre diameter (FD) of each of the medullation types were 0.29 ± 0.03, 0.27 ± 0.02, 0.35 ± 0.02, 0.30 ± 0.02, 0.25 ± 0.02 and 0.10 ± 0.02 for FD, FD_NM, FD_FM, FD_UM, FD_CM and FD_SM, respectively. The genetic correlations for fibre diameter of the medullation types ranged from 0.04 ± 0.04 to 0.97 ± 0.01. FD, NM and FM are the main traits to be used as selection criteria under a genetic index, since they would reduce fibre diameter, increase NM and FM, and, in addition, indirectly reduce CM, SM, and SM_FD. Therefore, the quality of alpaca fibre could be improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Comparing gas composition from fast pyrolysis of live foliage measured in bench-scale and fire-scale experiments.
- Author
-
Weise, David R., Fletcher, Thomas H., Johnson, Timothy J., Hao, Wei Min, Dietenberger, Mark, Princevac, Marko, Butler, Bret W., McAllister, Sara S., O'Brien, Joseph J., Louise Loudermilk, E., Ottmar, Roger D., Hudak, Andrew T., Kato, Akira, Shotorban, Babak, Mahalingam, Shankar, Myers, Tanya L., Palarea-Albaladejo, Javier, and Baker, Stephen P.
- Subjects
HYDROCARBONS, MULTIVARIATE analysis, FLAME, PYROLYSIS, PRINCIPAL components analysis, WEATHER
- Abstract
Background: Fire models have used pyrolysis data from oxidising and non-oxidising environments for flaming combustion. In wildland fires pyrolysis, flaming and smouldering combustion typically occur in an oxidising environment (the atmosphere). Aims: Using compositional data analysis methods, determine if the composition of pyrolysis gases measured in non-oxidising and ambient (oxidising) atmospheric conditions were similar. Methods: Permanent gases and tars were measured in a fuel-rich (non-oxidising) environment in a flat flame burner (FFB). Permanent and light hydrocarbon gases were measured for the same fuels heated by a fire flame in ambient atmospheric conditions (oxidising environment). Log-ratio balances of the measured gases common to both environments (CO, CO2, CH4, H2, C6H6O (phenol), and other gases) were examined by principal components analysis (PCA), canonical discriminant analysis (CDA) and permutational multivariate analysis of variance (PERMANOVA). Key results: Mean composition changed between the non-oxidising and ambient atmosphere samples. PCA showed that flat flame burner (FFB) samples were tightly clustered and distinct from the ambient atmosphere samples. CDA found that the difference between environments was defined by the CO-CO2 log-ratio balance. PERMANOVA and pairwise comparisons found FFB samples differed from the ambient atmosphere samples which did not differ from each other. Conclusion: Relative composition of these pyrolysis gases differed between the oxidising and non-oxidising environments. This comparison was one of the first comparisons made between bench-scale and field scale pyrolysis measurements using compositional data analysis. Implications: These results indicate the need for more fundamental research on the early time-dependent pyrolysis of vegetation in the presence of oxygen. Composition of pyrolysis gases measured in non-oxidising and ambient atmospheric conditions has been compared using compositional data analysis. Mean compositions changed between the non-oxidising and ambient atmosphere samples. These results indicate the need for more fundamental research on the early time-dependent pyrolysis of vegetation in the presence of oxygen. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Testing for differences in survey‐based density expectations: A compositional data approach.
- Author
-
Dovern, Jonas, Glas, Alexander, and Kenny, Geoff
- Subjects
MONTE Carlo method, INDIVIDUAL differences, MULTIPLE comparisons (Statistics), CONSUMER surveys, CHANGE agents
- Abstract
Summary: We propose to treat survey‐based density expectations as compositional data when testing either for heterogeneity in density forecasts across different groups of agents or for changes over time. Monte Carlo simulations show that the proposed test has more power relative to both a bootstrap approach based on the KLIC and an approach that involves multiple testing for differences of individual parts of the density. In addition, the test is computationally much faster than the KLIC‐based one, which relies on simulations, and allows for comparisons across multiple groups. Using density expectations from the ECB Survey of Professional Forecasters and the US Survey of Consumer Expectations, we show the usefulness of the test in detecting possible changes in density expectations over time and across different types of forecasters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Air pollution and children's mental health in rural areas: compositional spatio-temporal model.
- Author
-
Mota-Bertran, Anna, Coenders, Germà, Plaja, Pere, Saez, Marc, and Barceló, Maria Antònia
- Subjects
AIR pollution, AIR pollutants, CHILDREN'S health, RURAL health, RURAL geography, MENTAL illness, ATTENTION-deficit hyperactivity disorder
- Abstract
Air pollution stands as an environmental risk to child mental health, with proven relationships hitherto observed only in urban areas. Understanding the impact of pollution in rural settings is equally crucial. The novelty of this article lies in the study of the relationship between air pollution and behavioural and developmental disorders, attention deficit hyperactivity disorder (ADHD), anxiety, and eating disorders in children below 15 living in a rural area. The methodology combines spatio-temporal models, Bayesian inference and Compositional Data (CoDa), that make it possible to study areas with few pollution monitoring stations. Exposure to nitrogen dioxide (NO2), ozone (O3), and sulphur dioxide (SO2) is related to behavioural and development disorders, anxiety is related to particulate matter (PM10), O3 and SO2, and overall pollution is associated to ADHD and eating disorders. To sum up, like their urban counterparts, rural children are also subject to mental health risks related to air pollution, and the combination of spatio-temporal models, Bayesian inference and CoDa make it possible to relate mental health problems to pollutant concentrations in rural settings with few monitoring stations. Certain limitations persist related to misclassification of exposure to air pollutants and to the covariables available in the data sources used. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Historical Routes for Diversification of Domesticated Chickpea Inferred from Landrace Genomics
- Author
-
Igolkina, Anna A, Noujdina, Nina V, Vishnyakova, Margarita, Longcore, Travis, von Wettberg, Eric, Nuzhdin, Sergey V, and Samsonova, Maria G
- Subjects
Cicer ,Polymorphism ,Single Nucleotide ,Bayes Theorem ,Gene Frequency ,Genomics ,chickpea ,admixture graph ,allele frequency ,compositional data ,domestication ,Biochemistry and Cell Biology ,Evolutionary Biology ,Genetics - Abstract
According to archaeological records, chickpea (Cicer arietinum) was first domesticated in the Fertile Crescent about 10,000 years BP. Its subsequent diversification in Middle East, South Asia, Ethiopia, and the Western Mediterranean, however, remains obscure and cannot be resolved using only archeological and historical evidence. Moreover, chickpea has two market types: "desi" and "kabuli," for which the geographic origin is a matter of debate. To decipher chickpea history, we took the genetic data from 421 chickpea landraces unaffected by the green revolution and tested complex historical hypotheses of chickpea migration and admixture on two hierarchical spatial levels: within and between major regions of cultivation. For chickpea migration within regions, we developed popdisp, a Bayesian model of population dispersal from a regional representative center toward the sampling sites that considers geographical proximities between sites. This method confirmed that chickpea spreads within each geographical region along optimal geographical routes rather than by simple diffusion and estimated representative allele frequencies for each region. For chickpea migration between regions, we developed another model, migadmi, that takes allele frequencies of populations and evaluates multiple and nested admixture events. Applying this model to desi populations, we found both Indian and Middle Eastern traces in Ethiopian chickpea, suggesting the presence of a seaway from South Asia to Ethiopia. As for the origin of kabuli chickpeas, we found significant evidence for its origin from Turkey rather than Central Asia.
- Published
- 2023
23. Trajectories of brain volumes in young children are associated with maternal education
- Author
-
Zhu, Changbo, Chen, Yaqing, Müller, Hans‐Georg, Wang, Jane‐Ling, O'Muircheartaigh, Jonathan, Bruchhage, Muriel, and Deoni, Sean
- Subjects
Biomedical and Clinical Sciences ,Biological Psychology ,Cognitive and Computational Psychology ,Neurosciences ,Psychology ,Clinical Research ,Pediatric ,Neurological ,Female ,Humans ,Child ,Child ,Preschool ,Infant ,Brain ,Gray Matter ,White Matter ,Educational Status ,Neuroimaging ,Magnetic Resonance Imaging ,Longitudinal Studies ,brain volumes ,cerebrospinal fluid ,compositional data ,functional principal component analysis ,grey matter ,longitudinal brain development ,white matter ,Cognitive Sciences ,Experimental Psychology ,Biological psychology ,Cognitive and computational psychology - Abstract
Brain growth in early childhood is reflected in the evolution of proportional cerebrospinal fluid volumes (pCSF), grey matter (pGM), and white matter (pWM). We study brain development as reflected in the relative fractions of these three tissues for a cohort of 388 children that were longitudinally followed between the ages of 18 and 96 months. We introduce statistical methodology (Riemannian Principal Analysis through Conditional Expectation, RPACE) that addresses major challenges that are of general interest for the analysis of longitudinal neuroimaging data, including the sparsity of the longitudinal observations over time and the compositional structure of the relative brain volumes. Applying the RPACE methodology, we find that longitudinal growth as reflected by tissue composition differs significantly for children of mothers with higher and lower maternal education levels.
- Published
- 2023
24. Regression analysis with independent variables in shares: a guide and an empirical example
- Author
-
Morawetz, Ulrich B. and Klaiber, H. Allen
- Published
- 2024
- Full Text
- View/download PDF
25. Two-Part Mixed Effects Mixture Model for Zero-Inflated Longitudinal Compositional Data
- Author
-
Rodriguez, Viviana A., Mahon, Rebecca N., Weiss, Elisabeth, and Mukhopadhyay, Nitai D.
- Published
- 2024
- Full Text
- View/download PDF
26. Bayesian Variable Shrinkage and Selection in Compositional Data Regression: Application to Oral Microbiome
- Author
-
Datta, Jyotishka and Bandyopadhyay, Dipankar
- Published
- 2024
- Full Text
- View/download PDF
27. Improving Likert scale big data analysis in psychometric health economics: reliability of the new compositional data approach
- Author
-
René Lehmann and Bodo Vogt
- Subjects
Bipolar Likert scale ,Compositional data ,Ilr transformation ,Big data ,Heavy-tailed distributions ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
Bipolar psychometric scales data are widely used in psychologic healthcare. Adequate psychological profiling benefits patients and saves time and costs. Grant funding depends on the quality of psychotherapeutic measures. Bipolar Likert scales yield compositional data because any order of magnitude of agreement towards an item assertion implies a complementary order of magnitude of disagreement. Using an isometric log-ratio (ilr) transformation the bivariate information can be transformed towards the real valued interval scale yielding unbiased statistical results increasing the statistical power of the Pearson correlation significance test if the Central Limit Theorem (CLT) of statistics is satisfied. In practice, however, the applicability of the CLT depends on the number of summands (i.e., the number of items) and the variance of the data generating process (DGP) of the ilr transformed data. Via simulation we provide evidence that the ilr approach also works satisfactory if the CLT is violated. That is, the ilr approach is robust towards extremely large or infinite variances of the underlying DGP increasing the statistical power of the correlation test. The study generalizes former results pointing out the universality and reliability of the ilr approach in psychometric big data analysis affecting psychometric health economics, patient welfare, grant funding, economic decision making and profits.
- Published
- 2024
- Full Text
- View/download PDF
28. How likely is it to beat the target at different investment horizons: an approach using compositional data in strategic portfolios
- Author
-
Fernando Vega-Gámez and Pablo J. Alonso-González
- Subjects
Compositional data ,Investment horizons ,Logit models ,Probability ,Strategic portfolios ,Public finance ,K4430-4675 ,Finance ,HG1-9999 - Abstract
Strategic portfolios are asset combinations designed to achieve investor objectives. A unique feature of these investments is that portfolios must be rebalanced periodically to maintain the initially established structure. This paper introduces a methodology to estimate the probability of not exceeding a specific profitability target with this type of portfolio to determine if this kind of build portfolio makes obtaining certain profitability targets easy. Portfolios with a specific distribution of fixed-income and equity securities were randomly replicated and their performance was studied over different time horizons. Daily data from 2004 to 2021 was used. Since the sum of all asset weights invariably equals the unit, the original data were transformed using the compositional data methodology. With these transformed data, the probabilities were estimated for each analyzed portfolio. The study also performed a sensitivity analysis of the estimated probabilities, modifying the weight of specific assets in the portfolio.
- Published
- 2024
- Full Text
- View/download PDF
29. Addressing Dependent Data in Constrained Optimization Problems: A WOA-based Algorithm
- Author
-
Asieh Ghanbarpour, Soheil Zaremotlagh, and Fahimeh Dabaghi-Zarandi
- Subjects
swarm-based optimization ,constraints ,compositional data ,penalty function ,Electronics ,TK7800-8360 ,Industry ,HD2321-4730.9 - Abstract
Optimization algorithms are widely used in various fields to find the best solution to a problem by minimizing or maximizing an objective function, subject to certain constraints. This paper introduces the development and application of an innovative optimization algorithm (WOADD) designed to address the challenges posed by constrained optimization problems with dependent data. Unlike traditional algorithms that struggle with data dependencies and valid range constraints, WOADD incorporates a novel normalization process and a dynamic updating mechanism that accurately considers the interdependencies among features. Specifically, it adjusts the search strategy by calculating a scaling parameter to maneuver within feasible regions, ensuring the preservation of data dependencies and adherence to constraints, thus leading to more efficient and precise optimization outcomes. Our extensive experimental analysis, comparing WOADD against other swarm-based optimization methods on a suite of benchmark functions, illustrates its superior performance in terms of faster convergence rates, improved solution quality, and enhanced determinism in outcomes.
- Published
- 2024
- Full Text
- View/download PDF
30. Improving Likert scale big data analysis in psychometric health economics: reliability of the new compositional data approach.
- Author
-
Lehmann, René and Vogt, Bodo
- Subjects
ECONOMIC decision making ,STATISTICAL correlation ,CENTRAL limit theorem ,PEARSON correlation (Statistics) ,LIKERT scale ,STATISTICAL power analysis - Abstract
Bipolar psychometric scales data are widely used in psychologic healthcare. Adequate psychological profiling benefits patients and saves time and costs. Grant funding depends on the quality of psychotherapeutic measures. Bipolar Likert scales yield compositional data because any order of magnitude of agreement towards an item assertion implies a complementary order of magnitude of disagreement. Using an isometric log-ratio (ilr) transformation the bivariate information can be transformed towards the real valued interval scale yielding unbiased statistical results increasing the statistical power of the Pearson correlation significance test if the Central Limit Theorem (CLT) of statistics is satisfied. In practice, however, the applicability of the CLT depends on the number of summands (i.e., the number of items) and the variance of the data generating process (DGP) of the ilr transformed data. Via simulation we provide evidence that the ilr approach also works satisfactory if the CLT is violated. That is, the ilr approach is robust towards extremely large or infinite variances of the underlying DGP increasing the statistical power of the correlation test. The study generalizes former results pointing out the universality and reliability of the ilr approach in psychometric big data analysis affecting psychometric health economics, patient welfare, grant funding, economic decision making and profits. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Compositional Data and Microbiota Analysis: Imagination and Reality.
- Author
-
Itagaki, Tatsuki, Kobayashi, Hirokazu, Sakata, Ken-ichiro, Miyamoto, Ikuya, Hasebe, Akira, and Kitagawa, Yoshimasa
- Subjects
PEARSON correlation (Statistics) ,PRINCIPAL components analysis ,RATIO analysis ,MULTIDIMENSIONAL scaling ,UNIVARIATE analysis - Abstract
The relationships among bacterial flora, diseases, and diet have been described by many authors. Operational taxonomic units (OTUs) are the result of clustering the 16S rRNA gene sequences at a certain cutoff value, and they are considered compositional data. As Pearson's correlation coefficient is difficult to interpret, Aitchison's ratio analysis was used to develop a method to handle compositional data. Multivariate analysis was developed because univariate analysis can be subject to large biases. Simulations regarding absolute abundance based on certain assumptions and some analyses, such as nonparametric multidimensional scaling (NMDS), principal component analysis (PCA), and ratio analysis, were conducted in this study. The same content as a 100% stacked bar graph could be expressed in low dimensions using PCA. However, the relative diversity was not reproducible with NMDS. Various assumptions were made regarding absolute abundance based on the relative abundance. However, which assumptions are true could not be determined. In summary, ratio analysis and PCA are useful for analyzing compositional data and the gut microbiota. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. How likely is it to beat the target at different investment horizons: an approach using compositional data in strategic portfolios.
- Author
-
Vega-Gámez, Fernando and Alonso-González, Pablo J.
- Subjects
STOCKS (Finance) ,TIME perspective ,SENSITIVITY analysis ,LOGISTIC regression analysis ,PERFORMANCE theory - Abstract
Strategic portfolios are asset combinations designed to achieve investor objectives. A unique feature of these investments is that portfolios must be rebalanced periodically to maintain the initially established structure. This paper introduces a methodology to estimate the probability of not exceeding a specific profitability target with this type of portfolio to determine if this kind of build portfolio makes obtaining certain profitability targets easy. Portfolios with a specific distribution of fixed-income and equity securities were randomly replicated and their performance was studied over different time horizons. Daily data from 2004 to 2021 was used. Since the sum of all asset weights invariably equals the unit, the original data were transformed using the compositional data methodology. With these transformed data, the probabilities were estimated for each analyzed portfolio. The study also performed a sensitivity analysis of the estimated probabilities, modifying the weight of specific assets in the portfolio. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Bayesian compositional models for ordinal response.
- Author
-
Zhang, Li, Zhang, Xinyan, Leach, Justin M, Rahman, AKM F, and Yi, Nengjun
- Subjects
- *
INFLAMMATORY bowel diseases , *PARAMETER estimation , *REGRESSION analysis , *LOGISTIC regression analysis - Abstract
Ordinal response is commonly found in medicine, biology, and other fields. In many situations, the predictors for this ordinal response are compositional, which means that the sum of predictors for each sample is fixed. Examples of compositional data include the relative abundance of species in microbiome data and the relative frequency of nutrition concentrations. Moreover, the predictors that are strongly correlated tend to have similar influence on the response outcome. Conventional cumulative logistic regression models for ordinal responses ignore the fixed-sum constraint on predictors and their associated interrelationships, and thus are not appropriate for analyzing compositional predictors. To solve this problem, we proposed Bayesian Compositional Models for Ordinal Response to analyze the relationship between compositional data and an ordinal response with a structured regularized horseshoe prior for the compositional coefficients and a soft sum-to-zero restriction on coefficients through the prior distribution. The method was implemented with R package rstan using efficient Hamiltonian Monte Carlo algorithm. We performed simulations to compare the proposed approach and existing methods for ordinal responses. Results revealed that our proposed method outperformed the existing methods in terms of parameter estimation and prediction. We also applied the proposed method to a microbiome study HMP2Data, to find microorganisms linked to ordinal inflammatory bowel disease levels. To make this work reproducible, the code and data used in this paper are available at https://github.com/Li-Zhang28/BCO. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Logratio analysis of components separated from grain‐size distributions and implications for sedimentary processes: An example of bottom surface sediments in a shallow lake.
- Author
-
Yamaguchi, Naofumi, Ando, Tsuyoshi, Enokida, Hirotaka, Nakada, Natsumi, Yamaki, Syota, and Ohta, Tohru
- Subjects
- *
SEDIMENTATION & deposition , *LAKE sediments , *LOGNORMAL distribution , *SEDIMENTS - Abstract
The grain‐size distributions of sediments can yield important information about sediment provenance and sedimentary processes; however, grain‐size distributions are frequently polymodal, rendering analyses difficult. To improve analyses of polymodal grain‐size data, the present study decomposed the grain‐size distributions of bottom surface sediments from Lake Kitaura, a shallow lake in Japan, into lognormal distributions and performed logratio analysis of their mixing proportions. The polymodal grain‐size distributions of the studied samples were separated into four common components at most sites. This logratio analysis revealed clear differences in the characteristics of the spatial distributions of the separated grain‐size components. The logratio values indicated that the three finer components were uniformly deposited within the lake, whereas the coarsest component was spatially diverse, reflecting differences in their sources and sedimentary processes. These results demonstrate the effectiveness of decomposition and logratio analysis of polymodal grain‐size distributions for estimating sedimentary processes. This method can be applied to modern sediments and for palaeoenvironmental reconstructions using sediment cores. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Enriched nonlinear grey compositional model for analyzing multi-trend mixed data and practical applications.
- Author
-
Li, Hui, Xie, Naiming, and Li, Kailing
- Subjects
- *
COMBINATORIAL optimization , *MONTE Carlo method , *MULTISCALE modeling , *POPULATION dynamics - Abstract
The compositional data are interrelated, and analyzing the evolution of each component is crucial for understanding population dynamics. However, the complex structure and tedious process of modeling pose challenges to the reasonable construction of grey compositional models for analyzing multi-trend mixed data. To address this, a novel enriched nonlinear grey compositional model with global multi-parameter combinatorial optimization is firstly proposed. Secondly, two types of Monte Carlo simulations are designed to validate the performances, modeling characteristics and noise levels of our model. Finally, using the bioenergy power generation structure of China as a case study, the practicability of our approach is verified. The results demonstrate that our model significantly outperforms traditional mainstream models in multi-trend mixed sequences, and the interrelationships among components are effectively verified. Our model not only enriches the methodological base but also broadens the application scope of grey compositional model.
• A novel non-linear dynamic GM-Markov compositional model is constructed.
• Our model is capable of achieving global multi-parameter combinatorial optimization.
• Our model can accurately fit compositional data exhibiting fluctuations and multi-trend.
• Our model might be useful in forecasting bioenergy power generation structures.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Association between musculoskeletal pain and exposures to awkward postures during work: a compositional analysis approach.
- Author
-
Lohne, Fredrik Klæboe, Xu, Kailiang, Fimland, Marius Steiro, Palarea-Albaladejo, Javier, and Redzovic, Skender
- Subjects
- *
ARM physiology , *SHOULDER pain , *RISK assessment , *STATISTICAL correlation , *HOME care services , *MUSCULOSKELETAL pain , *OCCUPATIONAL diseases , *HOME health aides , *BODY mass index , *RESEARCH funding , *NECK pain , *QUESTIONNAIRES , *ACCELEROMETRY , *DESCRIPTIVE statistics , *RESEARCH , *PAIN management , *OCCUPATIONAL exposure , *POSTURE , *LUMBAR pain , *PSYCHOSOCIAL factors - Abstract
Objectives: This study aimed to explore the association between arm elevation and neck/shoulder pain, and trunk forward bending and low back pain among home care workers. Methods: Home care workers (N = 116) from 11 home care units in Trondheim, Norway, filled in pain assessment and working hours questionnaire, and wore 3 accelerometers for up to 7 consecutive days. Work time was partitioned into upright awkward posture, nonawkward posture, and nonupright time, i.e. sitting. Within a compositional approach framework, posture time compositions were expressed in terms of log-ratio coordinates for statistical analysis and modeling. Poisson generalized linear mixed models were used to analyze the relationship between arm elevation in upright postures and neck/shoulder pain, and between trunk forward bending in upright postures and low back pain, respectively. Isotemporal substitution analysis was used to investigate the association of pain assessment with the reallocation of time spent in the different postures. Results: Time spent in awkward postures was modest, especially for the more extreme angles (60° and 90°). Adjusting for age, gender, and body mass index, our study suggested that the compositions of time spent by home care workers in awkward postures were significantly associated with pain assessment (P < 0.01). Isotemporal substitution analysis showed that reallocating 5 min from upright posture with arms elevated below to above 60° and 90° was associated with a 6.8% and 19.9% increase in the neck/shoulder pain score, respectively. Reallocating 5 min from a forward bending posture while upright below to above 30°, 60°, and 90° was associated with 1.8%, 3.5%, and 4.0% increase in low back pain, respectively. Conclusions: Although the exposure to awkward postures was modest, our results showed an association between increased time spent in awkward postures and an increase in neck/shoulder pain and low back pain in home care workers. As musculoskeletal pain is the leading cause of sickness absence, these findings suggest that home care units could benefit from re-organizing work to avoid excessive arm elevation and trunk forward bending in workers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Compositional-geochemical characterization of lead (Pb) anomalies and Pb-induced human health risk in urban topsoil.
- Author
-
Tepanosyan, Gevorg, Gevorgyan, Astghik, Albanese, Stefano, Baghdasaryan, Lusine, and Sahakyan, Lilit
- Abstract
Urban areas are characterized by a constant anthropogenic input, which is manifested in the chemical composition of the surface layer of urban soil. The consequence is the formation of intense anomalies of chemical elements, including lead (Pb), that are atypical for this landscape. Therefore, this study aims to explore the compositional-geochemical characteristics of soil Pb anomalies in the urban areas of Yerevan, Gyumri, and Vanadzor, and to identify the geochemical associations of Pb that emerge under prevalent anthropogenic influences in these urban areas. The results obtained through the combined use of compositional data analysis and geospatial mapping showed that the investigated Pb anomalies in different cities form source-specific geochemical associations influenced by historical and ongoing activities, as well as the natural geochemical behavior of chemical elements occurring in these areas. Specifically, in Yerevan, Pb was closely linked with Cu and Zn, forming a group of persistent anthropogenic tracers of urban areas. In contrast, in Gyumri and Vanadzor, Pb was linked with Ca, suggesting that over decades, complexation of Pb by Ca carbonates occurred. These patterns of compositional-geochemical characteristics of Pb anomalies are directly linked to the socio-economic development of cities and the various emission sources present in their environments during different periods. The human health risk assessment showed that children are under the Pb-induced non-carcinogenic risk by a certainty of 63.59% in Yerevan and 50% both in Gyumri and Vanadzor. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Comparison of log-ratio and log10 chemical elemental data analysis of Central Amazonian pottery and archaeological implications.
- Author
-
Hazenfratz, Roberto, Mongeló, Guilherme Z., Munita, Casimiro S., and Neves, Eduardo G.
- Abstract
The additive log-ratio (alr) transformation is recommended as one of the most robust data transformations for multivariate analysis of archaeometric compositional data. However, alr and other transformations are not mutually exclusive and can be combined to assess different aspects of an archaeometric data set, such as the addition of temper, post-depositional effects in pottery and associated archaeological implications. This study presents a comparative analysis of a multi-element data set of pottery from Lago Grande and Osvaldo archaeological sites in the Central Amazon, which are considered a microcosm of the region. The concentrations of nine chemical elements (La, Lu, Yb, Ce, Cr, Eu, Fe, Sc, and Th) measured by instrumental neutron activation analysis (INAA) were subjected to alr transformation, prior to chemical fingerprinting by cluster analysis (CA) and principal component analysis (PCA). The results were compared to a previous work using the log10 transformation. Multivariate analysis of variance (MANOVA) was employed to test for statistical differences between the chemical groups, and self-organizing maps (SOMs), a type of artificial neural network, were used for comparison due to their advantage of not depending on any specific data distribution assumption. In general, the results suggest the existence of socio-cultural interactions between Lago Grande and Osvaldo, which could have occurred through trade, exogamic marriage and territory sharing. In a broader perspective, the exchange networks corroborated by the results favor theories that minimize the role of ecological constraints in the emergence of social complexity and sedentary occupations in the Amazon region. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Lp-Norm for Compositional Data: Exploring the CoDa L1-Norm in Penalised Regression.
- Author
-
Saperas-Riera, Jordi, Mateu-Figueras, Glòria, and Martín-Fernández, Josep Antoni
- Subjects
- *
DEFINITIONS , *GEOMETRY - Abstract
The Least Absolute Shrinkage and Selection Operator (LASSO) regression technique has proven to be a valuable tool for fitting and reducing linear models. The trend of applying LASSO to compositional data is growing, thereby expanding its applicability to diverse scientific domains. This paper aims to contribute to this evolving landscape by undertaking a comprehensive exploration of the L1-norm for the penalty term of a LASSO regression in a compositional context. This implies first introducing a rigorous definition of the compositional Lp-norm, as the particular geometric structure of the compositional sample space needs to be taken into account. The focus is subsequently extended to a meticulous data-driven analysis of the dimension reduction effects on linear models, providing valuable insights into the interplay between penalty term norms and model performance. An analysis of a microbial dataset illustrates the proposed approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. Factor analysis in residual soils of the Iberian Pyrite Belt (Spain): comparison between raw data, log-transformation data and compositional data.
- Author
-
Martín-Méndez, Iván, Llamas-Borrajo, Juan, Llamas Lois, Alberto, and Locutura, Juan
- Subjects
SOIL testing ,PYRITES ,METALLOGENIC provinces ,METAL prices ,MULTIVARIATE analysis - Abstract
The Iberian Pyrite Belt (IPB) is a metallogenic province in SW Spain and Portugal hosting the largest concentration of massive sulfide deposits worldwide. Exploration campaigns in both the Spanish and Portuguese sectors of the IPB have increased recently due to the rise in metal prices. Within this framework, distinguishing geochemical features associated with natural phenomena and isolating geogenic anomalies from anthropogenic ones can pose a challenge. This contribution uses the residual soil geochemical database of the IPB (Spain) to examine numerous variables, encompassing major, minor and trace elements. Some of these variables commonly exhibit high correlations owing to consistent geochemical behaviour. However, the influence of anthropogenic factors tends to elevate data variability, occasionally masking the natural relationships that govern their distributions. We apply different treatments of data to develop factor analysis using log-transformed data, and centred log-ratio (clr) transformed data to compare and improve the geochemical interpretation of this important zone. Factor analysis has been developed with these results to compare with previously published research on factor analysis in raw data. Factor score interpolated maps were also generated using both lognormal and clr-transformed data to visualize better the distribution and the different geochemical associations in the IPB. This study shows the importance of the different data treatments and the improvement of the clr-transformed multivariate analysis to reduce the dilution or overestimation of the results of some elements that cause erroneous interpretations of the data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. State-Space Models for Clustering of Compositional Trajectories
- Author
-
Panarotto, Andrea, Cattelan, Manuela, Bellio, Ruggero, Einbeck, Jochen, editor, Maeng, Hyeyoung, editor, Ogundimu, Emmanuel, editor, and Perrakis, Konstantinos, editor
- Published
- 2024
- Full Text
- View/download PDF
42. An Underrated Prior Distribution for Proportions. The Logistic–Normal for Dynamical Football Predictions
- Author
-
Martins, Rui, Einbeck, Jochen, editor, Maeng, Hyeyoung, editor, Ogundimu, Emmanuel, editor, and Perrakis, Konstantinos, editor
- Published
- 2024
- Full Text
- View/download PDF
43. Relative abundance data can misrepresent heritability of the microbiome
- Author
-
Bruijning, Marjolein, Ayroles, Julien F, Henry, Lucas P, Koskella, Britt, Meyer, Kyle M, and Metcalf, C Jessica E
- Subjects
Biological Sciences ,Ecology ,Human Genome ,Microbiome ,Genetics ,Microbiota ,Absolute abundance ,Compositional data ,Genetic variance ,Host-microbe associations ,Phenotypic variance ,Microbiology ,Medical Microbiology ,Evolutionary biology - Abstract
Background: Host genetics can shape microbiome composition, but to what extent it does, remains unclear. Like any other complex trait, this important question can be addressed by estimating the heritability (h²) of the microbiome, the proportion of variance in the abundance in each taxon that is attributable to host genetic variation. However, unlike most complex traits, microbiome heritability is typically based on relative abundance data, where taxon-specific abundances are expressed as the proportion of the total microbial abundance in a sample. Results: We derived an analytical approximation for the heritability that one obtains when using such relative, and not absolute, abundances, based on an underlying quantitative genetic model for absolute abundances. Based on this, we uncovered three problems that can arise when using relative abundances to estimate microbiome heritability: (1) the interdependency between taxa can lead to imprecise heritability estimates. This problem is most apparent for dominant taxa. (2) Large sample size leads to high false discovery rates. With enough statistical power, the result is a strong overestimation of the number of heritable taxa in a community. (3) Microbial co-abundances lead to biased heritability estimates. Conclusions: We discuss several potential solutions for advancing the field, focusing on technical and statistical developments, and conclude that caution must be taken when interpreting heritability estimates and comparing values across studies.
- Published
- 2023
44. An innovative MGM–BPNN–ARIMA model for China’s energy consumption structure forecasting from the perspective of compositional data
- Author
-
Ruixia Suo, Qi Wang, Yuanyuan Tan, and Qiutong Han
- Subjects
Aitchison distance ,Compositional data ,Combined model ,Energy consumption structure ,Medicine ,Science - Abstract
Effective forecasting of energy consumption structure is vital for China to reach its “dual carbon” objective. However, existing studies have paid little attention to the holistic nature and internal properties of energy consumption structure. Therefore, this paper incorporates the theory of compositional data into the study of energy consumption structure, which not only takes into account the specificity of the internal features of the structure, but also digs deeper into the relative information. Meanwhile, based on the minimization theory of squares of the Aitchison distance in the compositional data, a combined model based on the three single models, namely the metabolism grey model (MGM), back-propagation neural network (BPNN) model, and autoregressive integrated moving average (ARIMA) model, is structured in this paper. The forecast results of the energy consumption structure in 2023–2040 indicate that the future energy consumption structure of China will evolve towards a more diversified pattern, but the proportion of natural gas and non-fossil energy has yet to meet the policy goals set by the government. This paper not only suggests that compositional data from joint prediction models have a high applicability value in the energy sector, but also has some theoretical significance for adapting and improving the energy consumption structure in China.
- Published
- 2024
- Full Text
- View/download PDF
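Entry 44 describes a combined model built on minimizing squared Aitchison distances. As a rough illustration only (not the authors' code), the Aitchison distance between two compositions can be computed as the Euclidean distance between their centered log-ratio (clr) vectors. The function names and the share values below are hypothetical and made up for the example.

```python
import math

def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of two compositions."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Made-up four-part shares, used only to exercise the function.
observed = [0.56, 0.18, 0.09, 0.17]
forecast = [0.50, 0.18, 0.11, 0.21]
print(aitchison_distance(observed, forecast))
```

Because the distance depends only on ratios between parts, rescaling a composition (for example from shares to percentages) leaves it unchanged.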
45. Proportion-based normalizations outperform compositional data transformations in machine learning applications
- Author
-
Aaron Yerke, Daisy Fry Brumit, and Anthony A. Fodor
- Subjects
Metagenomics ,Statistical data interpretation ,Compositional data ,Machine learning ,Random forest ,High-throughput nucleotide sequencing ,Microbial ecology ,QR100-130 - Abstract
Background: Normalization, as a pre-processing step, can significantly affect the resolution of machine learning analysis for microbiome studies. There are countless options for normalization scheme selection. In this study, we examined compositionally aware algorithms including the additive log ratio (alr), the centered log ratio (clr), and a recent evolution of the isometric log ratio (ilr) in the form of balance trees made with the PhILR R package. We also looked at compositionally naïve transformations such as raw counts tables and several transformations that are based on relative abundance, such as proportions, the Hellinger transformation, and a transformation based on the logarithm of proportions (which we call “lognorm”). Results: In our evaluation, we used 65 metadata variables culled from four publicly available datasets at the amplicon sequence variant (ASV) level with a random forest machine learning algorithm. We found that different common pre-processing steps in the creation of the balance trees made very little difference in overall performance. Overall, we found that the compositionally aware data transformations such as alr, clr, and ilr (PhILR) performed generally slightly worse or only as well as compositionally naïve transformations. However, relative abundance-based transformations outperformed most other transformations by a small but reliably statistically significant margin. Conclusions: Our results suggest that minimizing the complexity of transformations while correcting for read depth may be a generally preferable strategy in preparing data for machine learning compared to more sophisticated, but more complex, transformations that attempt to better correct for compositionality.
- Published
- 2024
- Full Text
- View/download PDF
46. A clustering procedure for three-way RNA sequencing data using data transformations and matrix-variate Gaussian mixture models
- Author
-
Theresa Scharl and Bettina Grün
- Subjects
Gaussian mixture ,Gene expression ,Genomics ,Compositional data ,Model-based clustering ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
RNA sequencing of time-course experiments results in three-way count data where the dimensions are the genes, the time points and the biological units. Clustering RNA-seq data makes it possible to extract groups of co-expressed genes over time. After standardisation, the normalised counts of individual genes across time points and biological units have similar properties as compositional data. We propose the following procedure to suitably cluster three-way RNA-seq data: (1) pre-process the RNA-seq data by calculating the normalised expression profiles, (2) transform the data using the additive log ratio transform to map the composition in the D-part Aitchison simplex to a (D-1)-dimensional Euclidean vector, (3) cluster the transformed RNA-seq data using matrix-variate Gaussian mixture models and (4) assess the quality of the overall cluster solution and of individual clusters based on cluster separation in the transformed space using density-based silhouette information and on compactness of the cluster in the original space using cluster maps as a suitable visualisation. The proposed procedure is illustrated on RNA-seq data from fission yeast and results are also compared to an analogous two-way approach after flattening out the biological units.
- Published
- 2024
- Full Text
- View/download PDF
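Step (2) of entry 46 maps a D-part composition to a (D-1)-dimensional vector with the additive log-ratio (alr) transform. The sketch below shows the general idea; it assumes the last part is used as the reference, which the abstract does not specify, and the function names are hypothetical.

```python
import math

def alr(x):
    """Additive log-ratio: log of each part relative to the last part (assumed reference)."""
    ref = x[-1]
    return [math.log(v / ref) for v in x[:-1]]

def alr_inverse(z):
    """Map an alr vector back to a composition that sums to 1."""
    expz = [math.exp(v) for v in z] + [1.0]
    total = sum(expz)
    return [v / total for v in expz]

profile = [0.2, 0.3, 0.5]   # illustrative 3-part composition
z = alr(profile)            # 2-dimensional Euclidean vector
print(z, alr_inverse(z))
```

The inverse shows that no information is lost: the original composition is recovered from the D-1 coordinates.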
47. Comparing Two Geostatistical Simulation Algorithms for Modelling the Spatial Uncertainty of Texture in Forest Soils
- Author
-
Gabriele Buttafuoco
- Subjects
Sequential Gaussian simulation ,turning bands simulation ,compositional data ,realizations ,uncertainty assessment ,Agriculture - Abstract
Uncertainty assessment is an essential part of modeling and mapping the spatial variability of key soil properties, such as texture. The study aimed to compare sequential Gaussian simulation (SGS) and turning bands simulation (TBS) for assessing the uncertainty in unknown values of the textural fractions accounting for their compositional nature. The study area was a forest catchment (1.39 km²) with soils classified as Typic Xerumbrepts and Ultic Haploxeralf. Samples were collected at 135 locations (0.20 m depth) according to a design developed using a spatial simulated annealing algorithm. Isometric log-ratio (ilr) was used to transform the three textural fractions into a two-dimensional real vector of coordinates ilr.1 and ilr.2, then 100 realizations were simulated using SGS and TBS. The realizations obtained by SGS and TBS showed a strong similarity in reproducing the distribution of ilr.1 and ilr.2 with minimal differences in average conditional variances of all grid nodes. The variograms of ilr.1 and ilr.2 coordinates were better reproduced by the realizations obtained by TBS. Similar results in reproducing the texture data statistics by both algorithms of simulation were obtained. The maps of expected values and standard deviations of the three soil textural fractions obtained by SGS and TBS showed no notable visual differences or visual artifacts. The realizations obtained by SGS and TBS showed a strong similarity in reproducing the distribution of isometric log-ratio coordinates (ilr.1 and ilr.2). Overall, their variograms and data were better reproduced by the realizations obtained by TBS.
- Published
- 2024
- Full Text
- View/download PDF
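Entry 47 transforms three textural fractions into two ilr coordinates. The abstract does not say which orthonormal basis was used, so the sketch below assumes one common pivot-style choice; the function name and sample values are hypothetical.

```python
import math

def ilr3(x1, x2, x3):
    """Two ilr coordinates for a 3-part composition (assumed pivot-style basis)."""
    ilr1 = math.sqrt(2.0 / 3.0) * math.log(x1 / math.sqrt(x2 * x3))
    ilr2 = math.sqrt(1.0 / 2.0) * math.log(x2 / x3)
    return ilr1, ilr2

# Made-up sand, silt and clay fractions, for illustration only.
sand, silt, clay = 0.60, 0.25, 0.15
print(ilr3(sand, silt, clay))
```

With an orthonormal basis like this one, the squared length of the ilr vector equals the sum of squared clr values, which is why Euclidean tools such as variograms can be applied to the coordinates.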
48. Identifying Important Pairwise Logratios in Compositional Data with Sparse Principal Component Analysis
- Author
-
Nesrstová, Viktorie, Wilms, Ines, Hron, Karel, and Filzmoser, Peter
- Published
- 2024
- Full Text
- View/download PDF
49. An innovative MGM–BPNN–ARIMA model for China's energy consumption structure forecasting from the perspective of compositional data.
- Author
-
Suo, Ruixia, Wang, Qi, Tan, Yuanyuan, and Han, Qiutong
- Subjects
ENERGY consumption forecasting ,ENERGY consumption ,POLICY discourse ,ENERGY industries ,MOVING average process ,METABOLIC models - Abstract
Effective forecasting of energy consumption structure is vital for China to reach its "dual carbon" objective. However, existing studies have paid little attention to the holistic nature and internal properties of energy consumption structure. Therefore, this paper incorporates the theory of compositional data into the study of energy consumption structure, which not only takes into account the specificity of the internal features of the structure, but also digs deeper into the relative information. Meanwhile, based on the minimization theory of squares of the Aitchison distance in the compositional data, a combined model based on the three single models, namely the metabolism grey model (MGM), back-propagation neural network (BPNN) model, and autoregressive integrated moving average (ARIMA) model, is structured in this paper. The forecast results of the energy consumption structure in 2023–2040 indicate that the future energy consumption structure of China will evolve towards a more diversified pattern, but the proportion of natural gas and non-fossil energy has yet to meet the policy goals set by the government. This paper not only suggests that compositional data from joint prediction models have a high applicability value in the energy sector, but also has some theoretical significance for adapting and improving the energy consumption structure in China. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. The mediating effect of 24-h time-use behaviors on the relationship between depression and mortality: A compositional mediation analysis for survival outcomes.
- Author
-
Wang, Juping, Zhao, Le, Guan, Hongwei, Wang, Juxia, Gao, Qian, Liang, Jie, Zhao, Liangyuan, He, Simin, and Wang, Tong
- Subjects
- *
SURVIVAL rate , *PROPORTIONAL hazards models , *HEALTH & Nutrition Examination Survey ,CARDIOVASCULAR disease related mortality - Abstract
A compositional mediation model of survival outcomes was established to explore whether 24-h time-use behaviors mediate the relationship between depression and mortality. 4137 adults from the National Health and Nutrition Examination Survey (NHANES 2005–2006) were followed up to 2019. Cox proportional hazards regression model was used to estimate the total effect of depression on mortality. Compositional data analysis was used to examine the relationship between 24-h time-use compositions and mortality. Furthermore, we constructed a compositional mediation model for survival outcomes to investigate the mediating effect of 24-h time-use behaviors on depression and mortality. Compared with participants without depression, depressive patients had a significantly higher risk of overall mortality (HR = 1.49, 95 % CI: 1.25, 1.79), cardiovascular disease-specific mortality (HR = 1.89, 95 % CI: 1.37, 2.63) and mortality from causes other than cardiovascular disease or cancer (HR = 1.62, 95 % CI: 1.25, 2.08). Physical activity, especially moderate-to-vigorous physical activity, significantly mediated the relationship between depression and all-cause and CVD-specific mortality. Although this was a cohort study, the exposure and mediators were both measured at baseline. Further research is needed to establish a temporal order between the exposure and mediating variables. Our findings indicate that 24-h time-use behaviors link depression to mortality. In particular, increasing the time spent on physical activity can reduce the risk of death in patients with depression. This finding provides potential interventions for reducing the risk of death in patients with depression. • A compositional mediation model of survival outcomes was established. • Physical activity time significantly mediated the relationship between depression and mortality. • Increasing the time spent on physical activity can reduce the risk of death in patients with depression. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF