Random forests with spatial proxies for environmental modelling: opportunities and pitfalls

dc.contributor.author
Milà, Carles
dc.contributor.author
Ludwig, Marvin
dc.contributor.author
Pebesma, Edzer
dc.contributor.author
Tonne, Cathryn
dc.contributor.author
Meyer, Hannah V.
dc.date.issued
2024-11-21T07:41:39Z
dc.date.issued
2024
dc.identifier
Milà C, Ludwig M, Pebesma E, Tonne C, Meyer H. Random forests with spatial proxies for environmental modelling: opportunities and pitfalls. Geoscientific Model Development. 2024;17:6007-33. DOI: 10.5194/gmd-17-6007-2024
dc.identifier
1991-959X
dc.identifier
http://hdl.handle.net/10230/68762
dc.identifier
http://dx.doi.org/10.5194/gmd-17-6007-2024
dc.description.abstract
Spatial proxies such as coordinates and Euclidean distance fields are often added as predictors in random forest models; however, their suitability in different predictive conditions has not yet been thoroughly assessed. We investigated (1) the conditions under which spatial proxies are suitable, (2) the reasons for their suitability, and (3) how proxy suitability can be assessed using cross-validation. In a simulation and two case studies, we found that adding spatial proxies improved model performance when residual spatial autocorrelation was present and training samples were regularly or randomly distributed. Otherwise, including proxies was neutral or counterproductive and resulted in feature extrapolation for clustered samples. Random k-fold cross-validation systematically favoured models with spatial proxies even when they were not appropriate. As the benefits of spatial proxies are not universal, we recommend using spatial exploratory and validation analyses to determine their suitability, and considering alternative, inherently spatial RF-GLS models.
dc.description.abstract
Carles Milà was supported by a PhD fellowship funded by the Spanish Ministerio de Ciencia e Innovación (grant no. PRE2020-092303). We also acknowledge support from grant no. CEX2018-000806-S, funded by MCIN/AEI/10.13039/501100011033, and from the Generalitat de Catalunya through the CERCA programme.
dc.format
application/pdf
dc.language
eng
dc.publisher
European Geosciences Union (EGU)
dc.relation
Geoscientific Model Development. 2024;17:6007-33
dc.relation
info:eu-repo/grantAgreement/ES/2PE/PRE2020-092303
dc.rights
© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
dc.rights
http://creativecommons.org/licenses/by/4.0/
dc.rights
info:eu-repo/semantics/openAccess
dc.title
Random forests with spatial proxies for environmental modelling: opportunities and pitfalls
dc.type
info:eu-repo/semantics/article
dc.type
info:eu-repo/semantics/publishedVersion

