Author: "Warton, DI" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Warton, DI"' showing total 44 results

Start Over Author "Warton, DI"

44 results on '"Warton, DI"'

1. Selecting the model for multiple imputation of missing data: Just use an IC!

Author: Noghrehchi, F, Stoklosa, J, Penev, S, Warton, DI, Noghrehchi, F, Stoklosa, J, Penev, S, and Warton, DI
Abstract: Multiple imputation and maximum likelihood estimation (via the expectation-maximization algorithm) are two well-known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation-maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood-based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. Poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies, and demonstrate their practicality on two classical missing data examples. An interesting result we saw in simulations was that not only can parameter estimates be biased by misspecifying the imputation model, but also by overfitting the imputation model. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity.
Published: 2021

2. Model averaging in ecology: a review of Bayesian, information-theoretic, and tactical approaches for predictive inference

Author: Dormann, CF, Calabrese, JM, Guillera-Arroita, G, Matechou, E, Bahn, V, Barton, K, Beale, CM, Ciuti, S, Elith, J, Gerstner, K, Guelat, J, Keil, P, Lahoz-Monfort, JJ, Pollock, LJ, Reineking, B, Roberts, DR, Schroeder, B, Thuiller, W, Warton, DI, Wintle, BA, Wood, SN, Wuest, RO, Hartig, F, Dormann, CF, Calabrese, JM, Guillera-Arroita, G, Matechou, E, Bahn, V, Barton, K, Beale, CM, Ciuti, S, Elith, J, Gerstner, K, Guelat, J, Keil, P, Lahoz-Monfort, JJ, Pollock, LJ, Reineking, B, Roberts, DR, Schroeder, B, Thuiller, W, Warton, DI, Wintle, BA, Wood, SN, Wuest, RO, and Hartig, F
Abstract: In ecology, the true causal structure for a given problem is often not known, and several plausible models and thus model predictions exist. It has been claimed that using weighted averages of these models can reduce prediction error, as well as better reflect model selection uncertainty. These claims, however, are often demonstrated by isolated examples. Analysts must better understand under which conditions model averaging can improve predictions and their uncertainty estimates. Moreover, a large range of different model averaging methods exists, raising the question of how they differ in their behaviour and performance. Here, we review the mathematical foundations of model averaging along with the diversity of approaches available. We explain that the error in model‐averaged predictions depends on each model's predictive bias and variance, as well as the covariance in predictions between models, and uncertainty about model weights. We show that model averaging is particularly useful if the predictive error of contributing model predictions is dominated by variance, and if the covariance between models is low. For noisy data, which predominate in ecology, these conditions will often be met. Many different methods to derive averaging weights exist, from Bayesian over information‐theoretical to cross‐validation optimized and resampling approaches. A general recommendation is difficult, because the performance of methods is often context dependent. Importantly, estimating weights creates some additional uncertainty. As a result, estimated model weights may not always outperform arbitrary fixed weights, such as equal weights for all models. When averaging a set of models with many inadequate models, however, estimating model weights will typically be superior to equal weights. We also investigate the quality of the confidence intervals calculated for model‐averaged predictions, showing that they differ greatly in behaviour and seldom manage to achieve nominal
Published: 2018

3. The central role of mean-variance relationships in the analysis of multivariate abundance data: a response to Roberts (2017)

Author: Warton, DI, Hui, FKC, Warton, DI, and Hui, FKC
Abstract: The mean-variance relationship is a central property of multivariate abundances – it has been shown that when not accounted for, potentially serious artifacts can be introduced to analyses. One such effect is the confounding of location and dispersion. Roberts (in press) recently argued that mean-variance relationships are not important in understanding properties of distance-based analyses, and that row standardisation fixes the problem of apparent location-dispersion confounding. We use simulation to disprove both statements. In situations where there is a shift in total abundance from one location to the next, the effects of location-dispersion confounding can be considerable. But we also show that even if there is no systematic difference in total abundance between two communities, and no change in dispersion, distance-based analysis may falsely claim that there is a change. We agree that multivariate abundance data are hierarchical, and that it is helpful to study effects at both the species-level and the community level. However, disentangling species-level from community-level effects is not possible without a hierarchical method of analysis. Recently proposed model-based approaches to ordination offer a way forward to resolve the issues discussed here.
Published: 2017

4. Graphical diagnostics for occupancy models with imperfect detection

Author: O'Hara, RB, Warton, DI, Stoklosa, J, Guillera-Arroita, G, MacKenzie, DI, Welsh, AH, O'Hara, RB, Warton, DI, Stoklosa, J, Guillera-Arroita, G, MacKenzie, DI, and Welsh, AH
Abstract: Summary Occupancy‐detection models that account for imperfect detection have become widely used in many areas of ecology. As with any modelling exercise, it is important to assess whether the fitted model encapsulates the main sources of variation in the data, yet there have been few methods developed for occupancy‐detection models that would allow practitioners to do so. In this paper, a new type of residual for occupancy‐detection models is developed according to the method of Dunn & Smyth (Journal of Computational and Graphical Statistics, 5, 1996, 236–244). Residuals are separately constructed to diagnose the occupancy and detection components of the model. Because the residuals are quite noisy, we suggest fitting a smoother through plots of residuals against predictors of fitted values, with 95% confidence bands, to diagnose lack‐of‐fit. The method is illustrated using Swiss squirrel data, and evaluated using simulations based on that dataset. Plotting residuals against predictors or against fitted values performed reasonably well as methods for diagnosing violations of occupancy‐detection model assumptions, particularly plots of residuals against a missing predictor. Relatively high false positive rates were sometimes observed, but this seems to be controlled reasonably well by fitting smoothers to these plots and being guided in interpretation by 95% confidence bands around the smoothers.
Published: 2017

5. Plant traits of propagule banks and standing vegetation reveal flooding alleviates impacts of agriculture on wetland restoration

Author: Mori, A, Dawson, SK, Warton, DI, Kingsford, RT, Berney, P, Keith, DA, Catford, JA, Mori, A, Dawson, SK, Warton, DI, Kingsford, RT, Berney, P, Keith, DA, and Catford, JA
Abstract: Summary Restoration of degraded plant communities requires understanding of community assembly processes. Human land use can influence plant community assembly by altering environmental conditions and species’ dispersal patterns. Flooding, including from environmental flows, may counteract land use effects on wetland vegetation. We examined the influence of land use history and flood frequency on the functional composition of wetland plant communities along a regulated river. We applied fourth corner modelling to determine species’ trait‐based responses to flooding and land use by combining data on (i) the occupancy and abundance of species in propagule banks and standing vegetation, (ii) species traits, and (iii) environmental conditions of 22 standing vegetation and 108 soil propagule bank study sites. We used analysis of deviance to test how well each dataset characterised trait–environment interactions, and generalised linear models to identify traits related to species’ responses. The occupancy and abundance of native species in the propagule bank and standing vegetation increased with flood frequency and decreased with duration of agricultural land use. Species in standing vegetation with water‐borne propagule dispersal (hydrochory) showed similar trends. In contrast, species with higher specific leaf area were associated with longer land use duration. Identifying trait‐based differences in the propagule bank and standing vegetation can help disentangle effects of dispersal and environmental filters. The occupancy and abundance of hydrochorous species in standing vegetation were negatively related to land use duration, but hydrochorous species were positively related to land use duration based on their abundance in the propagule bank. This suggests that land use does not limit plant dispersal, but acts as an in situ abiotic filter, limiting species present in standing vegetation. Synthesis and applications. Land use duration and flood frequency have opposite e
Published: 2017

6. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure

Author: Roberts, DR, Bahn, V, Ciuti, S, Boyce, MS, Elith, J, Guillera-Arroita, G, Hauenstein, S, Lahoz-Monfort, JJ, Schroeder, B, Thuiller, W, Warton, DI, Wintle, BA, Hartig, F, Dormann, CF, Roberts, DR, Bahn, V, Ciuti, S, Boyce, MS, Elith, J, Guillera-Arroita, G, Hauenstein, S, Lahoz-Monfort, JJ, Schroeder, B, Thuiller, W, Warton, DI, Wintle, BA, Hartig, F, and Dormann, CF
Abstract: Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross‐validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause for the poor performance of uncorrected (random) cross‐validation, noted often by modellers, are dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provides ample opportunity for overfitting with non‐causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross‐validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolations by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non‐random and blocked cross‐validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross‐validation is nearly universally more appropriate than random cross‐validation if the goal is predicting to new data or predictor space, or for selecting causal predictors. We recommend that block cross‐validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, o
Published: 2017

7. Frequent inundation helps counteract land use impacts on wetland propagule banks

Author: Hölzel, N, Dawson, SK, Kingsford, RT, Berney, P, Keith, DA, Hemmings, FA, Warton, DI, Waters, C, Catford, JA, Hölzel, N, Dawson, SK, Kingsford, RT, Berney, P, Keith, DA, Hemmings, FA, Warton, DI, Waters, C, and Catford, JA
Abstract: QUESTION: How do contrasting influences of inundation and historical land uses affect restoration of soil propagule bank composition in floodplain wetlands? LOCATION: Northern Nature Reserve (large ephemeral floodplain), Macquarie Marshes, New South Wales, Australia. METHODS: We conducted germination assays on soil samples collected from fields with different land‐use histories, stratified along an inundation gradient. We used GLM to determine whether native and exotic species richness and abundance varied along gradients of inundation and land use. RESULTS: Species richness and plant abundance in soil propagule banks were positively related to inundation and negatively related to intense historic land use. The abundance of native species was significantly higher in more frequently inundated areas. Abundances of exotic and ruderal species were higher in areas of intense prior land use. Overall species richness was generally similar across land‐use histories. CONCLUSIONS: Land‐use legacies compromised the ability of propagule banks to rejuvenate native vegetation in this floodplain wetland, especially in less frequently flooded parts of the floodplain, which harboured more ruderal and exotic species. Negative effects of prior land use may be alleviated by increased inundation. Native soil propagule banks were remarkably intact, providing a reservoir for restoration of wetland vegetation, even in soils highly disturbed by up to 20 yr of agricultural cropping. With appropriate inundation, soil propagule banks in less degraded areas of the Macquarie Marshes can provide diverse mixtures of desired species in high abundance but, in highly degraded areas, full restoration may be delayed.
Published: 2017

8. Model-based assessment of ecological community classifications

Author: Lyons, MB, Keith, DA, Warton, DI, Somerville, M, Kingsford, RT, Lyons, MB, Keith, DA, Warton, DI, Somerville, M, and Kingsford, RT
Abstract: Aim: A ‘good’ classification should provide information about the composition and abundance of the species within communities, if it serves as an informative surrogate for biodiversity. A natural way to formalize this is with a predictive model, where group membership (clusters) is the predictor, and multivariate species data (site by species matrix) is the response. In this study, we aimed to develop a predictive model-based framework for evaluating the predictive performance of alternative classifications of vegetation communities, and apply it to make objective and automated decisions about classification structure. Methods: We used GLMs fit to multivariate species data to predict occurrence of individual species with site groupings. We used AIC to estimate predictive performance of alternative models to: (1) identify optimal partitioning of sites among multiple competing flexible-β clustering solutions; (2) identify species that contribute most to compositional differences between clusters (i.e. characteristic species); and (3) automatically merge clusters to maximize expected predictive performance using an iterative pruning approach. Using field data from southeastern Australia, and simulated data, we demonstrate our approach for common ecological data types (presence/absence, counts, cover–abundance scores, percentage cover). We supply all code and data required for these analyses. Results: AIC was a useful metric for assessing competing classification solutions. Our method produced outputs that were simple to interpret and required few subjective choices to be made by the user, while performing similarly to the popular OptimClass assessment methodology. Characteristic species defined by predictive performance were consistent between data types, and had good general agreement with existing methods for defining characteristic species. Using model performance to iteratively refine clustering produced classifications with better than expected predictive performa
Published: 2016

9. Topoclimate versus macroclimate: How does climate mapping methodology affect species distribution models and climate change projections?

Author: Slavich, E, Warton, DI, Ashcroft, MB, Gollan, JR, and Ramp, D
Subjects: Ecology
Abstract: Aim: We analyse how and why 'topoclimate' mapping methodologies improve on macroclimatic variables in modelling the distribution of biodiversity. Further, we consider the implications for climate change projections. Location: Greater Hunter Valley region (c. 60,000 km2), New South Wales, Australia. Methods: We fitted generalised linear models to 295 species of grasses and ferns at fine resolutions (< 50 m2) using (a) macroclimatic variables, interpolated from weather station data using altitude and location only, (b) topoclimatic variables, interpolated from field measurements using additional climate-forcing factors such as topography and canopy cover, and (c) both topoclimatic and macroclimatic variables. We conducted community-level analyses and examined the reasons for differences through single-species analyses. We projected species distributions under 0-3° warming, comparing biodiversity loss predicted by topoclimate and macroclimate variables. Results: At the community level, the topoclimatic variables explained significant variation (p < 0.002) in the distribution of both ferns and grasses not explained by macroclimatic variables, resulting in increases of 0.036-0.061 in the pseudo R-squared. Topoclimate performed better (as determined by AIC) than macroclimate for grass species living in cold extremes under topoclimate and most fern species. Models using topoclimatic temperature variables projected different locations of biodiversity loss/retention and in general projected substantially fewer species becoming critically endangered in the study region than models using macroclimatic temperature variables - in one scenario, topoclimate projected 10% of species becoming critically endangered where macroclimate projected 28%. Main Conclusions: How climate variables are constructed has a significant effect on species distribution models and any subsequent climate change predictions. Misleading conclusions may result from models based on fine-resolution climate data if climate-forcing factors such as cold air drainage, topography and habitat have not been addressed in the climate mapping methodology. © 2014 John Wiley & Sons Ltd.
Published: 2014

10. A novel approach to quantify and locate potential microrefugia using topoclimate, climate stability, and isolation from the matrix

Author: Ashcroft, MB, Gollan, JR, Warton, DI, and Ramp, D
Subjects: Ecology
Abstract: Ecologists are increasingly recognizing the conservation significance of microrefugia, but it is inherently difficult to locate these small patches with unusual climates, and hence they are also referred to as cryptic refugia. Here we introduce a new methodology to quantify and locate potential microrefugia using fine-scale topoclimatic grids that capture extreme conditions, stable climates, and distinct differences from the surrounding matrix. We collected hourly temperature data from 150 sites in a large (200 km by 300 km) and diverse region of New South Wales, Australia, for a total of 671 days over 2 years. Sites spanned a range of habitats including coastal dune shrublands, eucalypt forests, exposed woodland ridges, sheltered rainforest gullies, upland swamps, and lowland pastures. Climate grids were interpolated using a regional regression approach based on elevation, distance to coast, canopy cover, latitude, cold-air drainage, and topographical exposure to winds and radiation. We identified extreme temperatures on two separate climatic gradients: the 5th percentile of minimum temperatures and the 95th percentile of maximum temperatures. For each gradient, climatic stability was assessed on three different time scales (intra-seasonal, intra-annual and inter-annual). Differences from the matrix were assessed using a moving window with a 5 km radius. We averaged the Z-scores for these extreme, stable and isolated climates to identify potential locations of microrefugia. We found that our method successfully predicted the location of communities that were considered to occupy refugia, such as rainforests that have progressively contracted in distribution over the last 2.5 million years, and alpine grasslands that have contracted over the last 15 thousand years. However, the method was inherently sensitive to the gradient selected and other aspects of the modelling process. These uncertainties could be dealt with in a conservation planning context by repeating the methodology with various parameterizations and identifying areas that were consistently identified as microrefugia. © 2012 Blackwell Publishing Ltd.
Published: 2012

11. Distance-based multivariate analyses confound location and dispersion effects

Author: Warton, DI, Wright, ST, and Wang, Y
Abstract: A critical property of count data is its mean-variance relationship, yet this is rarely considered in multivariate analysis in ecology. This study considers what is being implicitly assumed about the mean-variance relationship in distance-based analyses - multivariate analyses based on a matrix of pairwise distances - and what the effect is of any misspecification of the mean-variance relationship. It is shown that distance-based analyses make implicit assumptions that are typically out-of-step with what is observed in real data, which has major consequences. Potential consequences of this mean-variance misspecification are: Confounding location and dispersion effects in ordinations; misleading results when trying to identify taxa in which an effect is expressed; failure to detect a multivariate effect unless it is expressed in high-variance taxa. Data transformation does not solve the problem. 6.A solution is to use generalised linear models and their recent multivariate generalisations, which is shown here to have desirable properties. © 2011 The Authors. Methods in Ecology and Evolution © 2011 British Ecological Society.
Published: 2012

12. Are Introduced Species Better Dispersers Than Native Species? A Global Comparative Study of Seed Dispersal Distance

Author: Flores-Moreno, H, Thomson, FJ, Warton, DI, Moles, AT, Flores-Moreno, H, Thomson, FJ, Warton, DI, and Moles, AT
Abstract: We provide the first global test of the idea that introduced species have greater seed dispersal distances than do native species, using data for 51 introduced and 360 native species from the global literature. Counter to our expectations, there was no significant difference in mean or maximum dispersal distance between introduced and native species. Next, we asked whether differences in dispersal distance might have been obscured by differences in seed mass, plant height and dispersal syndrome, all traits that affect dispersal distance and which can differ between native and introduced species. When we included all three variables in the model, there was no clear difference in dispersal distance between introduced and native species. These results remained consistent when we performed analyses including a random effect for site. Analyses also showed that the lack of a significant difference in dispersal distance was not due to differences in biome, taxonomic composition, growth form, nitrogen fixation, our inclusion of non-invasive introduced species, or our exclusion of species with human-assisted dispersal. Thus, if introduced species do have higher spread rates, it seems likely that these are driven by differences in post-dispersal processes such as germination, seedling survival, and survival to reproduction. © 2013 Flores-Moreno et al.
Published: 2013

13. Model-based control of observer bias for the analysis of presence-only data in ecology

Author: Warton, DI, Renner, IW, Ramp, D, Warton, DI, Renner, IW, and Ramp, D
Abstract: Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter "observer bias"). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly - by modelling presence locations as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, then conditioning on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, that implicitly addresses the "pseudo-absence problem" of where to locate pseudo-absences (and how many). The proposed method of bias-correction is evaluated using systematically collected presence/absence data for 62 plant species endemic to the Blue Mountains near Sydney, Australia. It is shown that modelling and controlling for observer bias significantly improves the accuracy of predictions made using presence-only data, and usually improves predictions as compared to pseudo-absence or "inventory" methods of bias correction based on absences from non-target species. Future research will consider the potential for improving the proposed bias-correction approach by estimating the observer bias simultaneously across multiple species. © 2013 Warton et al.
Published: 2013

14. Mvabund- an R package for model-based analysis of multivariate abundance data

Author: Wang, Y, Naumann, U, Wright, ST, Warton, DI, Wang, Y, Naumann, U, Wright, ST, and Warton, DI
Abstract: 1. The mvabund package for R provides tools for model-based analysis of multivariate abundance data in ecology. 2.This includes methods for visualising data, fitting predictive models, checking model assumptions, as well as testing hypotheses about the community-environment association. 3.This paper briefly introduces the package and demonstrates its functionality by example. Video © 2012 The Authors. Methods in Ecology and Evolution © 2012 British Ecological Society.
Published: 2012

15. Human lower leg muscles grow asynchronously.

Author: Chow BVY, Morgan C, Rae C, Warton DI, Novak I, Davies S, Lancaster A, Popovic GC, Rizzo RRN, Rizzo CY, Kyriagis M, Herbert RD, and Bolsterlee B
Subjects: Male, Child, Female, Humans, Child, Preschool, Quality of Life, Muscle, Skeletal pathology, Lower Extremity, Magnetic Resonance Imaging methods, Leg, Artificial Intelligence
Abstract: Muscle volume must increase substantially during childhood growth to generate the power required to propel the growing body. One unresolved but fundamental question about childhood muscle growth is whether muscles grow at equal rates; that is, if muscles grow in synchrony with each other. In this study, we used magnetic resonance imaging (MRI) and advances in artificial intelligence methods (deep learning) for medical image segmentation to investigate whether human lower leg muscles grow in synchrony. Muscle volumes were measured in 10 lower leg muscles in 208 typically developing children (eight infants aged less than 3 months and 200 children aged 5 to 15 years). We tested the hypothesis that human lower leg muscles grow synchronously by investigating whether the volume of individual lower leg muscles, expressed as a proportion of total lower leg muscle volume, remains constant with age. There were substantial age-related changes in the relative volume of most muscles in both boys and girls (p < 0.001). This was most evident between birth and five years of age but was still evident after five years. The medial gastrocnemius and soleus muscles, the largest muscles in infancy, grew faster than other muscles in the first five years. The findings demonstrate that muscles in the human lower leg grow asynchronously. This finding may assist early detection of atypical growth and allow targeted muscle-specific interventions to improve the quality of life, particularly for children with neuromotor conditions such as cerebral palsy., (© 2023 The Authors. Journal of Anatomy published by John Wiley & Sons Ltd on behalf of Anatomical Society.)
Published: 2024
Full Text: View/download PDF

16. Novel community data in ecology-properties and prospects.

Author: Hartig F, Abrego N, Bush A, Chase JM, Guillera-Arroita G, Leibold MA, Ovaskainen O, Pellissier L, Pichler M, Poggiato G, Pollock L, Si-Moussi S, Thuiller W, Viana DS, Warton DI, Zurell D, and Yu DW
Subjects: DNA, Policy, Ecosystem, Biodiversity
Abstract: New technologies for monitoring biodiversity such as environmental (e)DNA, passive acoustic monitoring, and optical sensors promise to generate automated spatiotemporal community observations at unprecedented scales and resolutions. Here, we introduce 'novel community data' as an umbrella term for these data. We review the emerging field around novel community data, focusing on new ecological questions that could be addressed; the analytical tools available or needed to make best use of these data; and the potential implications of these developments for policy and conservation. We conclude that novel community data offer many opportunities to advance our understanding of fundamental ecological processes, including community assembly, biotic interactions, micro- and macroevolution, and overall ecosystem functioning., Competing Interests: Declaration of interests D.W.Y. is a cofounder of NatureMetrics, which provides commercial eDNA services and is a minor shareholder in the NatureSpace Partnership. No further conflicts of interest to declare., (Copyright © 2023 Elsevier Ltd. All rights reserved.)
Published: 2024
Full Text: View/download PDF

17. A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization.

Author: Stoklosa J, Hwang WH, and Warton DI
Subjects: Likelihood Functions, Linear Models, Computer Simulation, Monte Carlo Method, Models, Statistical, Motivation, Algorithms
Abstract: In regression modelling, measurement error models are often needed to correct for uncertainty arising from measurements of covariates/predictor variables. The literature on measurement error (or errors-in-variables) modelling is plentiful, however, general algorithms and software for maximum likelihood estimation of models with measurement error are not as readily available, in a form that they can be used by applied researchers without relatively advanced statistical expertise. In this study, we develop a novel algorithm for measurement error modelling, which could in principle take any regression model fitted by maximum likelihood, or penalised likelihood, and extend it to account for uncertainty in covariates. This is achieved by exploiting an interesting property of the Monte Carlo Expectation-Maximization (MCEM) algorithm, namely that it can be expressed as an iteratively reweighted maximisation of complete data likelihoods (formed by imputing the missing values). Thus we can take any regression model for which we have an algorithm for (penalised) likelihood estimation when covariates are error-free, nest it within our proposed iteratively reweighted MCEM algorithm, and thus account for uncertainty in covariates. The approach is demonstrated on examples involving generalized linear models, point process models, generalized additive models and capture-recapture models. Because the proposed method uses maximum (penalised) likelihood, it inherits advantageous optimality and inferential properties, as illustrated by simulation. We also study the model robustness of some violations in predictor distributional assumptions. Software is provided as the refitME package on R, whose key function behaves like a refit() function, taking a fitted regression model object and re-fitting with a pre-specified amount of measurement error., Competing Interests: The authors have declared that no competing interests exist., (Copyright: © 2023 Stoklosa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
Published: 2023
Full Text: View/download PDF

18. Leaf economics fundamentals explained by optimality principles.

Author: Wang H, Prentice IC, Wright IJ, Warton DI, Qiao S, Xu X, Zhou J, Kikuzawa K, and Stenseth NC
Abstract: The life span of leaves increases with their mass per unit area (LMA). It is unclear why. Here, we show that this empirical generalization (the foundation of the worldwide leaf economics spectrum) is a consequence of natural selection, maximizing average net carbon gain over the leaf life cycle. Analyzing two large leaf trait datasets, we show that evergreen and deciduous species with diverse construction costs (assumed proportional to LMA) are selected by light, temperature, and growing-season length in different, but predictable, ways. We quantitatively explain the observed divergent latitudinal trends in evergreen and deciduous LMA and show how local distributions of LMA arise by selection under different environmental conditions acting on the species pool. These results illustrate how optimality principles can underpin a new theory for plant geography and terrestrial carbon dynamics.
Published: 2023
Full Text: View/download PDF

19. Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays.

Author: Kidziński Ł, Hui FKC, Warton DI, and Hastie T
Abstract: Unmeasured or latent variables are often the cause of correlations between multivariate measurements, which are studied in a variety of fields such as psychology, ecology, and medicine. For Gaussian measurements, there are classical tools such as factor analysis or principal component analysis with a well-established theory and fast algorithms. Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses. However, current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets with thousands of observational units or responses. In this article, we propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood and then using a Newton method and Fisher scoring to learn the model parameters. Computationally, our method is noticeably faster and more stable, enabling GLLVM fits to much larger matrices than previously possible. We apply our method on a dataset of 48,000 observational units with over 2,000 observed species in each unit and find that most of the variability can be explained with a handful of factors. We publish an easy-to-use implementation of our proposed fitting algorithm.
Published: 2022

20. Selecting the model for multiple imputation of missing data: Just use an IC!

Author: Noghrehchi F, Stoklosa J, Penev S, and Warton DI
Subjects: Bayes Theorem, Bias, Computer Simulation, Humans, Likelihood Functions, Algorithms
Abstract: Multiple imputation and maximum likelihood estimation (via the expectation-maximization algorithm) are two well-known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation-maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood-based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. Poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies, and demonstrate their practicality on two classical missing data examples. An interesting result we saw in simulations was that not only can parameter estimates be biased by misspecifying the imputation model, but also by overfitting the imputation model. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity., (© 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.)
Published: 2021
Full Text: View/download PDF

21. Modeling recreational fishing intensity in a complex urbanised estuary.

Author: Griffin KJ, Hedge LH, Warton DI, Astles KL, and Johnston EL
Subjects: Biodiversity, Fisheries, Humans, Recreation, Conservation of Natural Resources, Estuaries
Abstract: Urbanised estuaries, ports and harbours are often utilised for recreational purposes, notably recreational angling. Yet there has been little quantitative assessment of the footprint and intensity of these activities at scales suitable for spatial management. Urban and industrialised estuaries have previously been considered as having low conservation value, perhaps due to issues with contamination and disturbance. Studies in recent decades have demonstrated that many of these systems are still highly biodiverse and of high value to local residents. As a response, urbanised estuaries are now being considered by coastal spatial management initiatives, where assessments of recreational use in these areas can help avoid 'user-environmental' and 'user-user' conflict. The models of these activities need to be developed at a scale relevant to governments and regulatory authorities, but the few human-use models that do exist integrate fishing intensity to a regional or even continental scale; too large to capture the fine scale variation inherent in complex urban fisheries. Species Distribution Modeling (SDM) is a tool commonly used to assess drivers of species range, but can be applied to models of recreational fishing in complex environments, at a scale relevant to regulatory bodies. Using point-data from 573 visual surveys with recently developed Poisson point process models, we examine the recreational fishery in Australia's busiest estuarine port, Sydney Harbour. We demonstrate the utility of these models for understanding the distribution of boat and shore-based fishers, and the effects of a range of temporally static (geographical) and dynamic (weather) predictors on these distributions., (Copyright © 2020. Published by Elsevier Ltd.)
Published: 2021
Full Text: View/download PDF

22. Efficient estimation of generalized linear latent variable models.

Author: Niku J, Brooks W, Herliansyah R, Hui FKC, Taskinen S, and Warton DI
Subjects: Algorithms, Computer Simulation, Data Interpretation, Statistical, Likelihood Functions, Software, Linear Models, Multivariate Analysis
Abstract: Generalized linear latent variable models (GLLVM) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods. For likelihood based estimation, several closed form approximations for the marginal likelihood of GLLVMs have been proposed, but their efficient implementations have been lacking in the literature. To fill this gap, we show in this paper how to obtain computationally convenient estimation algorithms based on a combination of either the Laplace approximation method or variational approximation method, and automatic optimization techniques implemented in R software. An extensive set of simulation studies is used to assess the performances of different methods, from which it is shown that the variational approximation method used in conjunction with automatic optimization offers a powerful tool for estimation., Competing Interests: The authors have declared that no competing interests exist.
Published: 2019
Full Text: View/download PDF

23. Order selection and sparsity in latent variable models via the ordered factor LASSO.

Author: Hui FKC, Tanaka E, and Warton DI
Subjects: Animals, Aquatic Organisms, Computer Simulation statistics & numerical data, Likelihood Functions, Oceans and Seas, Biometry methods, Factor Analysis, Statistical
Abstract: Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs. The OFAL penalty is the first penalty developed specifically for order selection in latent variable models, and achieves this by using a hierarchically structured group LASSO type penalty to shrink entire columns of the loading matrix to zero, while ensuring that non-zero loadings are concentrated on the lower-order factors. Simultaneously, individual element sparsity is achieved through the use of an adaptive LASSO. In conjunction with using an information criterion which promotes aggressive shrinkage, simulation shows that the OFAL penalty performs strongly compared with standard methods and penalties for order selection, achieving sparsity, and prediction in GLLVMs. Applying the OFAL penalty to the Southern Ocean marine species dataset suggests the available environmental predictors explain roughly half of the total covariation between species, thus leading to a smaller number of latent variables and increased sparsity in the loading matrix compared to a model without any covariates., (© 2018, The International Biometric Society.)
Published: 2018
Full Text: View/download PDF

24. Why you cannot transform your way out of trouble for small counts.

Author: Warton DI
Subjects: Ecology, Linear Models, Multivariate Analysis, Data Interpretation, Statistical, Models, Statistical
Abstract: While data transformation is a common strategy to satisfy linear modeling assumptions, a theoretical result is used to show that transformation cannot reasonably be expected to stabilize variances for small counts. Under broad assumptions, as counts get smaller, it is shown that the variance becomes proportional to the mean under monotonic transformations g(·) that satisfy g(0)=0, excepting a few pathological cases. A suggested rule-of-thumb is that if many predicted counts are less than one then data transformation cannot reasonably be expected to stabilize variances, even for a well-chosen transformation. This result has clear implications for the analysis of counts as often implemented in the applied sciences, but particularly for multivariate analysis in ecology. Multivariate discrete data are often collected in ecology, typically with a large proportion of zeros, and it is currently widespread to use methods of analysis that do not account for differences in variance across observations nor across responses. Simulations demonstrate that failure to account for the mean-variance relationship can have particularly severe consequences in this context, and also in the univariate context if the sampling design is unbalanced., (© 2017 The Authors. Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.)
Published: 2018
Full Text: View/download PDF

25. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

Author: Warton DI, Thibaut L, and Wang YA
Subjects: Computer Simulation, Ecology methods, Probability, Research Design, Models, Statistical, Multivariate Analysis, Statistical Distributions
Abstract: Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
Published: 2017
Full Text: View/download PDF

26. Extending Joint Models in Community Ecology: A Response to Beissinger et al.

Author: Warton DI, Blanchet FG, O'Hara R, Ovaskainen O, Taskinen S, Walker SC, and Hui FKC
Subjects: Ecology
Published: 2016
Full Text: View/download PDF

27. So Many Variables: Joint Modeling in Community Ecology.

Author: Warton DI, Blanchet FG, O'Hara RB, Ovaskainen O, Taskinen S, Walker SC, and Hui FKC
Subjects: Ecosystem, Linear Models, Biota, Models, Statistical
Abstract: Technological advances have enabled a new class of multivariate models for ecology, with the potential now to specify a statistical model for abundances jointly across many taxa, to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where one can leverage knowledge of some species to predict others. We demonstrate this by example and discuss recent computation tools and future directions., (Copyright © 2015 Elsevier Ltd. All rights reserved.)
Published: 2015
Full Text: View/download PDF

28. Does morphology predict trophic position and habitat use of ant species and assemblages?

Author: Gibb H, Stoklosa J, Warton DI, Brown AM, Andrew NR, and Cunningham SA
Subjects: Animals, Ants anatomy & histology, Arthropods, Australia, Carbon analysis, Carbon chemistry, Carbon metabolism, Nitrogen analysis, Nitrogen chemistry, Nitrogen metabolism, Phylogeny, Ants physiology, Ecosystem, Food Chain
Abstract: A functional traits-based theory of organismal communities is critical for understanding the principles underlying community assembly, and predicting responses to environmental change. This is particularly true for terrestrial arthropods, of which only 20% are described. Using epigaeic ant assemblages, we asked: (1) can we use morphological variation among species to predict trophic position or preferred microhabitat; (2) does the strength of morphological associations suggest recent trait divergence; (3) do environmental variables at site scale predict trait sets for whole assemblages? We pitfall-trapped ants from a revegetation chronosequence and measured their morphology, trophic position [using C:N stoichiometry and stable isotope ratios (δ)] and characteristics of microhabitat and macrohabitat. We found strong associations between high trophic position (low C:N and high δ(15)N) in body tissue and morphological traits: predators were larger, had more laterally positioned eyes, more physical protection and tended to be monomorphic. In addition, morphological traits were associated with certain microhabitat features, e.g. smaller heads were associated with the bare ground microhabitat. Trait-microhabitat relationships were more pronounced when phylogenetic adjustments were used, indicating a strong influence of recent trait divergences. At the assemblage level, our fourth corner analysis revealed associations between the prevalence of traits and macrohabitat, although these associations were not the same as those based on microhabitat associations. This study shows direct links between species-level traits and both diet and habitat preference. Trait-based prediction of ecological roles and community structure is thus achievable when integrating stoichiometry, morphology and phylogeny, but scale is an important consideration in such predictions.
Published: 2015
Full Text: View/download PDF

29. Fast forward selection for generalized estimating equations with a large number of predictor variables.

Author: Stoklosa J, Gibb H, and Warton DI
Subjects: Animals, Arthropods growth & development, Australia, Computer Simulation, Ecosystem, Algorithms, Data Interpretation, Statistical, Likelihood Functions, Longitudinal Studies methods, Models, Statistical
Abstract: We propose a new variable selection criterion designed for use with forward selection algorithms; the score information criterion (SIC). The proposed criterion is based on score statistics which incorporate correlated response data. The main advantage of the SIC is that it is much faster to compute than existing model selection criteria when the number of predictor variables added to a model is large, this is because SIC can be computed for all candidate models without actually fitting them. A second advantage is that it incorporates the correlation between variables into its quasi-likelihood, leading to more desirable properties than competing selection criteria. Consistency and prediction properties are shown for the SIC. We conduct simulation studies to evaluate the selection and prediction performances, and compare these, as well as computational times, with some well-known variable selection criteria. We apply the SIC on a real data set collected on arthropods by considering variable selection on a large number of interactions terms consisting of species traits and environmental covariates., (© 2013, The International Biometric Society.)
Published: 2014
Full Text: View/download PDF

30. Model-based control of observer bias for the analysis of presence-only data in ecology.

Author: Warton DI, Renner IW, and Ramp D
Subjects: Australia, Observer Variation, Ecosystem, Models, Biological, Plant Physiological Phenomena, Plants
Abstract: Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter "observer bias"). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly--by modelling presence locations as a function of known observer bias variables (such as accessibility variables) in addition to environmental variables, then conditioning on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling, that implicitly addresses the "pseudo-absence problem" of where to locate pseudo-absences (and how many). The proposed method of bias-correction is evaluated using systematically collected presence/absence data for 62 plant species endemic to the Blue Mountains near Sydney, Australia. It is shown that modelling and controlling for observer bias significantly improves the accuracy of predictions made using presence-only data, and usually improves predictions as compared to pseudo-absence or "inventory" methods of bias correction based on absences from non-target species. Future research will consider the potential for improving the proposed bias-correction approach by estimating the observer bias simultaneously across multiple species.
Published: 2013
Full Text: View/download PDF

31. Robust tests for one or more allometric lines.

Author: Taskinen S and Warton DI
Subjects: Models, Theoretical
Abstract: In allometry, the study of how size variables scale against each other, it is often of interest to fit lines to bivariate data and test hypotheses about slope and elevation about one or several lines. The nature of the problem suggests that bivariate techniques related to principal component analysis are more appropriate than linear regression. Inference methods have been developed for this problem and are in widespread use, however, we demonstrate that such methods are not robust to bivariate contamination, and propose alternative approaches which are. The new approaches use Huber's M-estimator via a plug-in approach, where robust test procedures have the same form as classical ones, but where we plug in robust estimators of parameters and standard errors in place of classical estimators. Simulations demonstrate that these new procedures are robust against bivariate contamination, and can make accurate inferences even from small samples., (Copyright © 2013 Elsevier Ltd. All rights reserved.)
Published: 2013
Full Text: View/download PDF

32. To mix or not to mix: comparing the predictive performance of mixture models vs. separate species distribution models.

Author: Hui FK, Warton DI, Foster SD, and Dunstan PK
Subjects: Animals, Computer Simulation, Demography, Species Specificity, Temperature, Models, Biological
Abstract: Species distribution models (SDMs) are an important tool for studying the patterns of species across environmental and geographic space. For community data, a common approach involves fitting an SDM to each species separately, although the large number of models makes interpretation difficult and fails to exploit any similarities between individual species responses. A recently proposed alternative that can potentially overcome these difficulties is species archetype models (SAMs), a model-based approach that clusters species based on their environmental response. In this paper, we compare the predictive performance of SAMs against separate SDMs using a number of multi-species data sets. Results show that SAMs improve model accuracy and discriminatory capacity compared to separate SDMs. This is achieved by borrowing strength from common species having higher information content. Moreover, the improvement increases as the species become rarer.
Published: 2013
Full Text: View/download PDF

33. Are introduced species better dispersers than native species? A global comparative study of seed dispersal distance.

Author: Flores-Moreno H, Thomson FJ, Warton DI, and Moles AT
Subjects: Animals, Ecosystem, Germination physiology, Magnoliopsida classification, Magnoliopsida growth & development, Magnoliopsida physiology, Models, Biological, Population Dynamics, Seedlings growth & development, Seeds growth & development, Introduced Species, Seed Dispersal physiology, Seedlings physiology, Seeds physiology
Abstract: We provide the first global test of the idea that introduced species have greater seed dispersal distances than do native species, using data for 51 introduced and 360 native species from the global literature. Counter to our expectations, there was no significant difference in mean or maximum dispersal distance between introduced and native species. Next, we asked whether differences in dispersal distance might have been obscured by differences in seed mass, plant height and dispersal syndrome, all traits that affect dispersal distance and which can differ between native and introduced species. When we included all three variables in the model, there was no clear difference in dispersal distance between introduced and native species. These results remained consistent when we performed analyses including a random effect for site. Analyses also showed that the lack of a significant difference in dispersal distance was not due to differences in biome, taxonomic composition, growth form, nitrogen fixation, our inclusion of non-invasive introduced species, or our exclusion of species with human-assisted dispersal. Thus, if introduced species do have higher spread rates, it seems likely that these are driven by differences in post-dispersal processes such as germination, seedling survival, and survival to reproduction.
Published: 2013
Full Text: View/download PDF

34. Equivalence of MAXENT and Poisson point process models for species distribution modeling in ecology.

Author: Renner IW and Warton DI
Subjects: Animals, Eucalyptus growth & development, New South Wales, Software, Ecology methods, Ecosystem, Models, Biological, Models, Statistical
Abstract: Modeling the spatial distribution of a species is a fundamental problem in ecology. A number of modeling methods have been developed, an extremely popular one being MAXENT, a maximum entropy modeling approach. In this article, we show that MAXENT is equivalent to a Poisson regression model and hence is related to a Poisson point process model, differing only in the intercept term, which is scale-dependent in MAXENT. We illustrate a number of improvements to MAXENT that follow from these relations. In particular, a point process model approach facilitates methods for choosing the appropriate spatial resolution, assessing model adequacy, and choosing the LASSO penalty parameter, all currently unavailable to MAXENT. The equivalence result represents a significant step in the unification of the species distribution modeling literature., (Copyright © 2013, The International Biometric Society.)
Published: 2013
Full Text: View/download PDF

35. Putting plant resistance traits on the map: a test of the idea that plants are better defended at lower latitudes.

Author: Moles AT, Wallis IR, Foley WJ, Warton DI, Stegen JC, Bisigato AJ, Cella-Pizarro L, Clark CJ, Cohen PS, Cornwell WK, Edwards W, Ejrnaes R, Gonzales-Ojeda T, Graae BJ, Hay G, Lumbwe FC, Magaña-Rodríguez B, Moore BD, Peri PL, Poulsen JR, Veldtman R, von Zeipel H, Andrew NR, Boulter SL, Borer ET, Campón FF, Coll M, Farji-Brener AG, De Gabriel J, Jurado E, Kyhn LA, Low B, Mulder CPH, Reardon-Smith K, Rodríguez-Velázquez J, Seabloom EW, Vesk PA, van Cauter A, Waldram MS, Zheng Z, Blendinger PG, Enquist BJ, Facelli JM, Knight T, Majer JD, Martínez-Ramos M, McQuillan P, and Prior LD
Subjects: Animals, Cyanides analysis, Environment, Geography, Lipids analysis, Phenotype, Plant Immunity, Plant Leaves anatomy & histology, Plant Leaves chemistry, Plants anatomy & histology, Plants chemistry, Species Specificity, Tannins analysis, Plant Diseases immunology, Plant Leaves immunology, Plants immunology
Abstract: • It has long been believed that plant species from the tropics have higher levels of traits associated with resistance to herbivores than do species from higher latitudes. A meta-analysis recently showed that the published literature does not support this theory. However, the idea has never been tested using data gathered with consistent methods from a wide range of latitudes. • We quantified the relationship between latitude and a broad range of chemical and physical traits across 301 species from 75 sites world-wide. • Six putative resistance traits, including tannins, the concentration of lipids (an indicator of oils, waxes and resins), and leaf toughness were greater in high-latitude species. Six traits, including cyanide production and the presence of spines, were unrelated to latitude. Only ash content (an indicator of inorganic substances such as calcium oxalates and phytoliths) and the properties of species with delayed greening were higher in the tropics. • Our results do not support the hypothesis that tropical plants have higher levels of resistance traits than do plants from higher latitudes. If anything, plants have higher resistance toward the poles. The greater resistance traits of high-latitude species might be explained by the greater cost of losing a given amount of leaf tissue in low-productivity environments., (© 2011 The Authors. New Phytologist © 2011 New Phytologist Trust.)
Published: 2011
Full Text: View/download PDF

36. Robust estimation and inference for bivariate line-fitting in allometry.

Author: Taskinen S and Warton DI
Subjects: Analysis of Variance, Body Size, Probability, Biostatistics methods
Abstract: In allometry, bivariate techniques related to principal component analysis are often used in place of linear regression, and primary interest is in making inferences about the slope. We demonstrate that the current inferential methods are not robust to bivariate contamination, and consider four robust alternatives to the current methods -- a novel sandwich estimator approach, using robust covariance matrices derived via an influence function approach, Huber's M-estimator and the fast-and-robust bootstrap. Simulations demonstrate that Huber's M-estimators are highly efficient and robust against bivariate contamination, and when combined with the fast-and-robust bootstrap, we can make accurate inferences even from small samples., (Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
Published: 2011
Full Text: View/download PDF

37. Global patterns of leaf mechanical properties.

Author: Onoda Y, Westoby M, Adler PB, Choong AM, Clissold FJ, Cornelissen JH, Díaz S, Dominy NJ, Elgart A, Enrico L, Fine PV, Howard JJ, Jalili A, Kitajima K, Kurokawa H, McArthur C, Lucas PW, Markesteijn L, Pérez-Harguindeguy N, Poorter L, Richards L, Santiago LS, Sosinski EE Jr, Van Bael SA, Warton DI, Wright IJ, Wright SJ, and Yamashita N
Subjects: Light, Plant Leaves physiology, Plant Physiological Phenomena, Plants anatomy & histology, Rain, Tropical Climate, Biomechanical Phenomena, Plant Leaves anatomy & histology, Stress, Mechanical
Abstract: Leaf mechanical properties strongly influence leaf lifespan, plant-herbivore interactions, litter decomposition and nutrient cycling, but global patterns in their interspecific variation and underlying mechanisms remain poorly understood. We synthesize data across the three major measurement methods, permitting the first global analyses of leaf mechanics and associated traits, for 2819 species from 90 sites worldwide. Key measures of leaf mechanical resistance varied c. 500-800-fold among species. Contrary to a long-standing hypothesis, tropical leaves were not mechanically more resistant than temperate leaves. Leaf mechanical resistance was modestly related to rainfall and local light environment. By partitioning leaf mechanical resistance into three different components we discovered that toughness per density contributed a surprisingly large fraction to variation in mechanical resistance, larger than the fractions contributed by lamina thickness and tissue density. Higher toughness per density was associated with long leaf lifespan especially in forest understory. Seldom appreciated in the past, toughness per density is a key factor in leaf mechanical resistance, which itself influences plant-animal interactions and ecosystem functions across the globe., (© 2011 Blackwell Publishing Ltd/CNRS.)
Published: 2011
Full Text: View/download PDF

38. Regularized sandwich estimators for analysis of high-dimensional data using generalized estimating equations.

Author: Warton DI
Subjects: Animals, Computer Simulation, Algorithms, Biometry methods, Data Interpretation, Statistical, Ecosystem, Models, Statistical, Nematoda growth & development, Population Growth
Abstract: A modification of generalized estimating equations (GEEs) methodology is proposed for hypothesis testing of high-dimensional data, with particular interest in multivariate abundance data in ecology, an important application of interest in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds number of clusters, n>K) and over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting primarily because sandwich estimators become numerically unstable as n increases. We propose instead using a regularized sandwich estimator that assumes a common correlation matrix R, and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary (), and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification., (© 2010, The International Biometric Society.)
Published: 2011
Full Text: View/download PDF

39. The arcsine is asinine: the analysis of proportions in ecology.

Author: Warton DI and Hui FK
Subjects: Computer Simulation, Ecology methods, Ecosystem, Models, Biological, Statistics as Topic
Abstract: The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
Published: 2011
Full Text: View/download PDF

40. Robustness to failure of assumptions of tests for a common slope amongst several allometric lines--a simulation study.

Author: Warton DI
Subjects: Computer Simulation, Monte Carlo Method, Nonlinear Dynamics, Plant Leaves anatomy & histology, Biometry methods, Regression Analysis
Abstract: In allometry, researchers are commonly interested in estimating the slope of the major axis or standardized major axis (methods of bivariate line fitting related to principal components analysis). This study considers the robustness of two tests for a common slope amongst several axes. It is of particular interest to measure the robustness of these tests to slight violations of assumptions that may not be readily detected in sample datasets. Type I error is estimated in simulations of data generated with varying levels of nonnormality, heteroscedasticity and nonlinearity. The assumption failures introduced in simulations were difficult to detect in a moderately sized dataset, with an expert panel only able to correct detect assumption violations 34-45% of the time. While the common slope tests were robust to nonnormal and heteroscedastic errors from the line, Type I error was inflated if the two variables were related in a slightly nonlinear fashion. Similar results were also observed for the linear regression case. The common slope tests were more liberal when the simulated data had greater nonlinearity, and this effect was more evident when the underlying distribution had longer tails than the normal. This result raises concerns for common slopes testing, as slight nonlinearities such as those in simulations are often undetectable in moderately sized datasets. Consequently, practitioners should take care in checking for nonlinearity and interpreting the results of a test for common slope. This work has implications for the robustness of inference in linear models in general.
Published: 2007
Full Text: View/download PDF

41. Global meta-analysis shows that relationships of leaf mass per area with species shade tolerance depend on leaf habit and ontogeny.

Author: Lusk CH and Warton DI
Subjects: Darkness, Plant Leaves classification, Plant Leaves radiation effects, Light, Plant Leaves anatomy & histology, Plant Leaves physiology
Abstract: It was predicted that relationships of leaf mass per area (LMA) with juvenile shade tolerance will depend on leaf habit, and on whether species are compared at a common age as young seedlings, or at a common size as saplings. A meta-analysis of 47 comparative studies (372 species) was used to test predictions, and the effect of light environment on this relationship. The LMA of evergreens was positively correlated with shade tolerance, irrespective of ontogeny or light environment. The LMA of young seedlings (
Published: 2007
Full Text: View/download PDF

42. Evidence at hand: Diversity, functional implications, and locomotor prediction in intrinsic hand proportions of diprotodontian marsupials.

Author: Weisbecker V and Warton DI
Subjects: Animals, Biomechanical Phenomena, Forelimb physiology, Locomotion, Marsupialia physiology, Adaptation, Physiological, Forelimb anatomy & histology, Marsupialia anatomy & histology
Abstract: Knowledge about the diversity, locomotor adaptations, and evolution of the marsupial forelimb is limited, resulting in an underrepresentation of marsupials in comparative anatomical literature on mammalian forelimb anatomy. This study investigated hand proportions in the diverse marsupial order Diprotodontia. Fifty-two measurements of 95 specimens representing 47 species, as well as 6 non-diprotodontian specimens, were explored using principal components analysis (PCA). Bootstrapping was used to assess the reliability of the loadings. Phylogenetically independent contrasts and phylogenetic ANOVA were used to test for correlation with size and functional adaptation of forelimbs for locomotor habit, scored as arboreal vs. terrestrial. Analysis of first principal component (PC1) scores revealed significant differences between arboreal and terrestrial species, and was related to relative slenderness of their phalangeal elements. Both locomotor groups displayed allometry along PC1 scores, but with different intercepts such that PC1 discriminated between the two locomotor habits almost completely. PC2 separated some higher-level clades and burrowing species. Analysis of locomotor predictors commonly applied by palaeontologists indicates that ratios between proximal and intermediate phalanges were unsuitable as predictors of arboreality/terrestriality, but the phalangeal index was more effective. From PCA results, a phalangeal slenderness ratio was developed which proved to be a useful discriminator, suggesting that a single unallocated phalanx can be used for an impression of locomotor mode in fossils. Most Diprotodontia are laterally paraxonic or ectaxonic, with the exception of digging species whose hands are medially paraxonic. Our results complement those of studies on placental mammals, suggesting that the demands of arboreality, terrestriality, or frequent digging on intrinsic hand proportions are met with similar anatomical adaptations in marsupials., (Copyright 2006 Wiley-Liss, Inc.)
Published: 2006
Full Text: View/download PDF

43. Bivariate line-fitting methods for allometry.

Author: Warton DI, Wright IJ, Falster DS, and Westoby M
Subjects: Bias, Computer Simulation, Humans, Regression Analysis, Algorithms, Data Interpretation, Statistical, Models, Statistical
Abstract: Fitting a line to a bivariate dataset can be a deceptively complex problem, and there has been much debate on this issue in the literature. In this review, we describe for the practitioner the essential features of line-fitting methods for estimating the relationship between two variables: what methods are commonly used, which method should be used when, and how to make inferences from these lines to answer common research questions. A particularly important point for line-fitting in allometry is that usually, two sources of error are present (which we call measurement and equation error), and these have quite different implications for choice of line-fitting method. As a consequence, the approach in this review and the methods presented have subtle but important differences from previous reviews in the biology literature. Linear regression, major axis and standardised major axis are alternative methods that can be appropriate when there is no measurement error. When there is measurement error, this often needs to be estimated and used to adjust the variance terms in formulae for line-fitting. We also review line-fitting methods for phylogenetic analyses. Methods of inference are described for the line-fitting techniques discussed in this paper. The types of inference considered here are testing if the slope or elevation equals a given value, constructing confidence intervals for the slope or elevation, comparing several slopes or elevations, and testing for shift along the axis amongst several groups. In some cases several methods have been proposed in the literature. These are discussed and compared. In other cases there is little or no previous guidance available in the literature. Simulations were conducted to check whether the methods of inference proposed have the intended coverage probability or Type I error. We identified the methods of inference that perform well and recommend the techniques that should be adopted in future work.
Published: 2006
Full Text: View/download PDF

44. Assessing the generality of global leaf trait relationships.

Author: Wright IJ, Reich PB, Cornelissen JH, Falster DS, Garnier E, Hikosaka K, Lamont BB, Lee W, Oleksyn J, Osada N, Poorter H, Villar R, Warton DI, and Westoby M
Subjects: Biological Evolution, Ecosystem, Nitrogen physiology, Phosphorus physiology, Photosynthesis physiology, Plant Leaves growth & development, Plant Leaves metabolism, Potassium physiology, Plant Leaves physiology
Abstract: Global-scale quantification of relationships between plant traits gives insight into the evolution of the world's vegetation, and is crucial for parameterizing vegetation-climate models. A database was compiled, comprising data for hundreds to thousands of species for the core 'leaf economics' traits leaf lifespan, leaf mass per area, photosynthetic capacity, dark respiration, and leaf nitrogen and phosphorus concentrations, as well as leaf potassium, photosynthetic N-use efficiency (PNUE), and leaf N : P ratio. While mean trait values differed between plant functional types, the range found within groups was often larger than differences among them. Future vegetation-climate models could incorporate this knowledge. The core leaf traits were intercorrelated, both globally and within plant functional types, forming a 'leaf economics spectrum'. While these relationships are very general, they are not universal, as significant heterogeneity exists between relationships fitted to individual sites. Much, but not all, heterogeneity can be explained by variation in sample size alone. PNUE can also be considered as part of this trait spectrum, whereas leaf K and N : P ratios are only loosely related.
Published: 2005
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

44 results on '"Warton, DI"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources