Author: "Valavi, Roozbeh" / Topic: random forest algorithms - Searchworks@Jio Institute Digital Library Search Results


1. Flexible species distribution modelling methods perform well on spatially separated testing data.

Author: Valavi, Roozbeh, Elith, Jane, Lahoz‐Monfort, José J., and Guillera‐Arroita, Gurutzeta
Subjects: *SPECIES distribution, *RECEIVER operating characteristic curves, *SUPPORT vector machines, *RANDOM forest algorithms, *REGRESSION trees
Abstract: Aim: To assess whether flexible species distribution models that perform well at nearby testing locations still perform strongly when evaluated on spatially separated testing data. Location: Australian Wet Tropics (AWT), Ontario, Canada (CAN), north‐east New South Wales, Australia (NSW), New Zealand (NZ), five countries of South America (SA), and Switzerland (SWI). Time period: Most species data were collected between 1950 and 2000. Major taxa studied: Birds, mammals, plants and reptiles. Methods: We compared 10 species distribution modelling methods with varying flexibility in terms of the allowed complexity of their fitted functions [boosted regression trees (BRT), generalized additive model (GAM), multivariate adaptive regression splines (MARS), maximum entropy (MaxEnt), support vector machine (SVM), variants of generalized linear model (GLM) and random forest (RF), and an Ensemble model]. We used established practices for model selection to avoid overfitting, including parameter tuning in learning methods. Models were trained on presence–background data for 171 species and tested on presence–absence data. Training and testing data were separated using both random and spatial partitioning, the latter based on 75‐km blocks. We calculated the average performance and mean rank of the methods (focussing on the area under the receiver operating characteristic and precision‐recall gain curves, and correlation) and assessed the statistical significance of the differences between them. Results: The ranking of methods did not change when evaluated on spatially separated testing data. Methods with the strongest predictive performance were nonparametric methods known to be flexible. An ensemble formed by averaging predictions of five pre‐selected modelling methods was the best model in both random and spatial partitioning, followed by MaxEnt and a variant of random forest. 
Main conclusions: Whilst some modellers expect methods limited to simple smooth functions to predict better spatially separated data, we found no evidence of that using blocks of 75 km. We conclude that flexible models that are tuned well enough to avoid overfitting are effective at predicting to spatially distinct areas. [ABSTRACT FROM AUTHOR]
Published: 2023
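
The spatial partitioning described in this abstract (75-km blocks separating training from testing data) can be illustrated with a minimal sketch. It is not the authors' code. It assumes point coordinates are already projected in metres, and the function names, the number of folds, and the random assignment of blocks to folds are all illustrative assumptions.

```python
import math
import random

BLOCK_SIZE_M = 75_000  # 75 km blocks, the size quoted in the abstract


def block_id(x, y, size=BLOCK_SIZE_M):
    """Return the (column, row) index of the square block containing (x, y)."""
    return (math.floor(x / size), math.floor(y / size))


def spatial_folds(points, n_folds=5, size=BLOCK_SIZE_M, seed=0):
    """Assign each point a fold so that all points in one block share a fold.

    Blocks (not individual points) are shuffled and dealt out to folds,
    which keeps nearby records out of opposite sides of a train/test split.
    """
    blocks = sorted({block_id(x, y, size) for x, y in points})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(x, y, size)] for x, y in points]


# Hypothetical projected coordinates in metres.
points = [(10_000, 20_000), (70_000, 5_000), (80_000, 5_000), (160_000, 300_000)]
folds = spatial_folds(points, n_folds=2)
```

The first two points fall in the same block and therefore always land in the same fold, whereas a purely random split could place them on opposite sides.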

2. On the spatiotemporal generalization of machine learning and ensemble models for simulating built‐up land expansion.

Author: Shafizadeh‐Moghadam, Hossein, Valavi, Roozbeh, Asghari, Ali, Minaei, Masoud, and Murayama, Yuji
Subjects: *ARTIFICIAL neural networks, *GENERALIZATION, *SUPPORT vector machines, *REGRESSION trees, *RANDOM forest algorithms, *MACHINE learning
Abstract: This study evaluates the spatiotemporal generalization of statistical and machine learning models to simulate built‐up land expansion and compare it to ensemble approaches. Integrated with cellular automata, six individual models—artificial neural networks, support vector machines (SVM), random forest (RF), boosted regression trees, the generalized additive model, the lasso, and two ensemble approaches called ensemble median and ensemble weighted area under the curve—were implemented. Each model was calibrated based on data from 1975–1990, and their extrapolation power was evaluated for 1990–1996, 1996–2000, 2000–2011, and 2011–2017. Total operating characteristics revealed that the RF model achieved the highest calibration accuracy and the highest performance loss during the validation period. The lowest calibration accuracy was related to the SVM model, yet its performance during the validation period increased. In the third time interval (1996–2002), the highest accuracy was again related to the SVM model. A sharp drop in simulation accuracy was seen in all models during the fourth (2002–2011) and fifth intervals (2011–2017). None of the ensemble models appeared to be superior to the individual models. Further, the accuracy of built‐up land expansion models drops noticeably for long‐term simulations. [ABSTRACT FROM AUTHOR]
Published: 2022
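
The two ensemble approaches named in this abstract (ensemble median and an AUC-weighted ensemble) can be sketched as follows. This is a hedged illustration only: the abstract does not give the exact weighting formula, so a weighted mean with weights proportional to each model's AUC is assumed here, and the prediction values and AUCs are made up.

```python
from statistics import median


def ensemble_median(predictions):
    """Cell-wise median across models.

    predictions is a list with one equal-length list of scores per model.
    """
    return [median(cell) for cell in zip(*predictions)]


def ensemble_weighted(predictions, aucs):
    """Cell-wise mean across models, weighted by each model's AUC (assumed scheme)."""
    total = sum(aucs)
    return [
        sum(w * p for w, p in zip(aucs, cell)) / total
        for cell in zip(*predictions)
    ]


# Hypothetical suitability scores for two cells from three models.
preds = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]
aucs = [0.9, 0.8, 0.7]  # hypothetical calibration AUCs

med = ensemble_median(preds)
wavg = ensemble_weighted(preds, aucs)
```

The median is robust to a single outlying model, while the weighted mean lets better-calibrated models contribute more to each cell.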

3. Predictive performance of presence‐only species distribution models: a benchmark study with reproducible code.

Author: Valavi, Roozbeh, Guillera‐Arroita, Gurutzeta, Lahoz‐Monfort, José J., and Elith, Jane
Subjects: *SPECIES distribution, *RANDOM forest algorithms, *REGRESSION trees, *SUPPORT vector machines
Abstract: Species distribution modeling (SDM) is widely used in ecology and conservation. Currently, the most available data for SDM are species presence‐only records (available through digital databases). There have been many studies comparing the performance of alternative algorithms for modeling presence‐only data. Among these, a 2006 paper from Elith and colleagues has been particularly influential in the field, partly because they used several novel methods (at the time) on a global data set that included independent presence–absence records for model evaluation. Since its publication, some of the algorithms have been further developed and new ones have emerged. In this paper, we explore patterns in predictive performance across methods, by reanalyzing the same data set (225 species from six different regions) using updated modeling knowledge and practices. We apply well‐established methods such as generalized additive models and MaxEnt, alongside others that have received attention more recently, including regularized regressions, point‐process weighted regressions, random forests, XGBoost, support vector machines, and the ensemble modeling framework biomod. All the methods we use include background samples (a sample of environments in the landscape) for model fitting. We explore impacts of using weights on the presence and background points in model fitting. We introduce new ways of evaluating models fitted to these data, using the area under the precision‐recall gain curve, and focusing on the rank of results. We find that the way models are fitted matters. The top method was an ensemble of tuned individual models. In contrast, ensembles built using the biomod framework with default parameters performed no better than single moderate performing models. 
Similarly, the second top performing method was a random forest parameterized to deal with many background samples (contrasted to relatively few presence records), which substantially outperformed other random forest implementations. We find that, in general, nonparametric techniques with the capability of controlling for model complexity outperformed traditional regression methods, with MaxEnt and boosted regression trees still among the top performing models. All the data and code with working examples are provided to make this study fully reproducible. [ABSTRACT FROM AUTHOR]
Published: 2022
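
The rank-based comparison of methods mentioned in this abstract can be sketched as below. This is an assumption-laden illustration rather than the study's code: the method names and scores are hypothetical, higher scores are taken to be better, and tied methods share the average of the ranks they span.

```python
def ranks_desc(scores):
    """Rank scores so the highest gets rank 1; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mean_rank(score_table):
    """score_table maps a method name to its list of per-species scores."""
    methods = list(score_table)
    n_species = len(next(iter(score_table.values())))
    totals = dict.fromkeys(methods, 0.0)
    for s in range(n_species):
        per_species = ranks_desc([score_table[m][s] for m in methods])
        for m, rank in zip(methods, per_species):
            totals[m] += rank
    return {m: totals[m] / n_species for m in methods}


# Hypothetical evaluation scores for three methods on three species.
scores = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.8, 0.9], "C": [0.5, 0.6, 0.6]}
result = mean_rank(scores)
```

Averaging ranks rather than raw scores keeps a few easy or hard species from dominating the comparison.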

4. Modelling species presence‐only data with random forests.

Author: Valavi, Roozbeh, Elith, Jane, Lahoz‐Monfort, José J., and Guillera‐Arroita, Gurutzeta
Subjects: *RANDOM forest algorithms, *SPECIES distribution, *REGRESSION trees, *SPECIES, *ALGORITHMS
Abstract: The random forest (RF) algorithm is an ensemble of classification or regression trees and is widely used, including for species distribution modelling (SDM). Many researchers use implementations of RF in the R programming language with default parameters to analyse species presence‐only data together with 'background' samples. However, there is good evidence that RF with default parameters does not perform well for such 'presence‐background' modelling. This is often attributed to the disparity between the number of presence and background samples, also known as 'class imbalance', and several solutions have been proposed. Here, we first set the context: the background sample should be large enough to represent all environments in the region. We then aim to understand the drivers of poor performance of RF when models are fitted to presence‐only species data alongside background samples. We show that 'class overlap' (where both classes occur in the same environment) is an important driver of poor performance, alongside class imbalance. Class overlap can even degrade performance for presence–absence data. We explain, test and evaluate suggested solutions. Using simulated and real presence‐background data, we compare performance of default RF with other weighting and sampling approaches. Our results demonstrate clear evidence of improvement in the performance of RFs when techniques that explicitly manage imbalance are used. We show that these either limit or enforce tree depth. Without compromising the environmental representativeness of the sampled background, we identify approaches to fitting RF that ameliorate the effects of imbalance and overlap and allow excellent predictive performance. Understanding the problems of RF in presence‐background modelling allows new insights into how best to fit models, and should guide future efforts to best deal with such data. [ABSTRACT FROM AUTHOR]
Published: 2021
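
One common way of explicitly managing class imbalance when fitting a random forest is to draw a balanced bootstrap sample for each tree. The sketch below illustrates that general idea only. The abstract does not spell out which weighting and sampling approaches were tested, so treating down-sampling as representative is an assumption, and the function name and sample sizes are hypothetical.

```python
import random


def balanced_bootstrap(labels, seed=0):
    """Draw one bootstrap sample with equal numbers of presences and background.

    labels uses 1 for presence and 0 for background. Both classes are sampled
    with replacement, each to the size of the smaller class, so every tree
    sees a balanced set while the full background remains available overall.
    """
    rng = random.Random(seed)
    presence = [i for i, y in enumerate(labels) if y == 1]
    background = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(presence), len(background))
    sample = [rng.choice(presence) for _ in range(n)]
    sample += [rng.choice(background) for _ in range(n)]
    return sample


# Hypothetical data: 20 presences against 1000 background points.
labels = [1] * 20 + [0] * 1000
idx = balanced_bootstrap(labels)
n_presence = sum(labels[i] for i in idx)
```

Using a different seed per tree varies which background points each tree sees, so the forest as a whole can still draw on the full background sample.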
