class: center, middle, inverse, title-slide

.title[
# Spatial Prediction Models in R
]
.subtitle[
## Workshop
]
.author[
### Gabriel Carrasco-Escobar, MSc, PhD(c)
]
.institute[
### UC San Diego
]
.date[
### 2022-01-10 (updated: 2022-11-29)
]

---

<style type="text/css">
# .remark-slide-content {
#   font-size: 28px;
#   padding: 20px 80px 20px 80px;
# }
# .remark-code, .remark-inline-code {
#   background: #f0f0f0;
# }
# .remark-code {
#   font-size: 24px;
# }
.huge .remark-code { /*Change made here*/
  font-size: 200% !important;
}
.small1 .remark-code { /*Change made here*/
  font-size: 80% !important;
}
.small2 .remark-code { /*Change made here*/
  font-size: 65% !important;
}
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
.tiny2 .remark-code { /*Change made here*/
  font-size: 55% !important;
}
.tinyq .remark-code { /*Change made here*/
  font-size: 5% !important;
}
.big1_t { font-size: 120% }
.big2_t { font-size: 150% }
.big3_t { font-size: 200% }
.small0_t { font-size: 90% }
.small1_t { font-size: 80% }
.small2_t { font-size: 60% }
.tiny1_t { font-size: 40% }
.tiny2_t { font-size: 20% }
.text_right {
  vertical-align: bottom !important;
  text-align: right !important;
}
.text_left {
  vertical-align: bottom !important;
  text-align: left !important;
}
.pull-left2 {
  float: left;
  width: 35%;
}
.pull-right2 {
  float: right;
  width: 60%;
}
</style>

# Packages and Data

- [Data processing]()

```r
library(tidyverse)
```

- [Spatial data processing]()

```r
library(sf)
library(terra)
library(raster)
library(mapview)
```

- [Model construction]()

```r
library(tidymodels)
library(spatialsample)
library(randomForest)
```

---

# Packages and Data

These data were downloaded from the [Malaria Atlas Project](https://malariaatlas.org/) data repository and were originally collected as part of [a study](https://link.springer.com/article/10.1186/1475-2875-10-25) conducted in 2009. The data can be downloaded from this [link](https://raw.githubusercontent.com/HughSt/HughSt.github.io/master/course_materials/week1/Lab_files/Data/mal_data_eth_2009_no_dups.csv) (Sturrock, UCSF).

```r
malaria_eth <- read.csv("/mal_data_eth_2009_no_dups.csv", header = TRUE)
```

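---

# Packages and Data

```r
cases <- malaria_eth %>%
  # Build point geometries from the coordinate columns (WGS84), keeping the columns
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE) %>%
  # Recode the P. falciparum count to a binary factor (0 = none, 1 = at least one)
  mutate(pf_pos = ifelse(pf_pos > 0, 1, 0),
         pf_pos = factor(pf_pos)) %>%
  dplyr::select(pf_pos, longitude, latitude)
```
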
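The conversion and recoding step above can be tried on a small table first. This is a minimal sketch: the `toy` data frame and its values are invented for illustration, and only the column names follow the workshop data.

```r
library(sf)
library(dplyr)

# Hypothetical survey table (values invented, not taken from the workshop data)
toy <- data.frame(
  longitude = c(38.5, 39.1, 40.2),
  latitude  = c(8.1, 7.6, 9.0),
  pf_pos    = c(0, 3, 1)
)

toy_sf <- toy %>%
  # remove = FALSE keeps longitude/latitude as ordinary columns next to the geometry
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE) %>%
  # Any positive count becomes 1; fixing the levels keeps "0" as the first level
  mutate(pf_pos = factor(ifelse(pf_pos > 0, 1, 0), levels = c(0, 1)))

levels(toy_sf$pf_pos)
table(toy_sf$pf_pos)
```

Keeping the coordinate columns matters later on, because the spatial cross-validation step refers to `longitude` and `latitude` by name.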
---

# Packages and Data

```r
Oromia <- getData("GADM", country = "ETH", level = 1) %>%
  st_as_sf() %>%
  filter(NAME_1 == "Oromia")
```

---
class: inverse, center, middle

# [_**Data Exploration and Processing**_]()

---

# Spatial Distribution

```r
ggplot() +
  geom_sf(data = Oromia, fill = NA) +
  geom_sf(data = cases, aes(col = pf_pos)) +
  scale_color_manual(values = c("gray", "red")) +
  theme_bw()
```

<img src="index_files/figure-html/unnamed-chunk-11-1.png" width="50%" style="display: block; margin: auto;" />

---

# EO Covariates

.pull-left2[
[Bioclimatic variables](https://www.worldclim.org/data/bioclim.html)

.tiny2[
```r
bioclim <- getData('worldclim', var = 'bio', res = 0.5,
                   lon = 38.7578, lat = 8.9806) %>%
  crop(Oromia) %>%
  mask(Oromia)
```
]
]

.pull-right2[
]

---

# Data Extraction

```r
# terra::extract is called explicitly because tidyr and raster also export extract()
data <- bind_cols(cases, terra::extract(rast(bioclim), vect(cases))) %>%
  st_drop_geometry() %>%
  drop_na()
```

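To see what the extraction step returns, here is a minimal sketch on a toy raster. The extent, the cell values, and the layer name `bio_toy` are all invented for illustration; only `rast()`, `vect()`, and `extract()` come from terra.

```r
library(terra)

# Toy 10 x 10 raster over an invented extent, one layer named like a covariate
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:4326")
values(r) <- 1:100
names(r) <- "bio_toy"

# Two hypothetical survey points: top-left cell and bottom-right cell
pts <- vect(data.frame(x = c(0.5, 9.5), y = c(9.5, 0.5)),
            geom = c("x", "y"), crs = "EPSG:4326")

# One row per point: an ID column followed by one column per raster layer
vals <- terra::extract(r, pts)
vals
```

The extra `ID` column is added by the extraction itself, which is why the random forest fit later drops `ID` along with the coordinates.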
---
class: inverse, center, middle

# [_**Predictive Model**_]()

---

# Split Data

- [Traditional CV]()

```r
bad_folds <- data %>%
  vfold_cv(v = 5)
```

---

# Split Data

- [Traditional CV]()

```r
bad_folds <- data %>%
  vfold_cv(v = 5)
```

![](index_files/figure-html/unnamed-chunk-19-1.png)<!-- -->

---

# Split Data

- [Spatial CV]()

```r
good_folds <- data %>%
  spatial_clustering_cv(coords = c("longitude", "latitude"), v = 5)
```

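The difference between the two splitting strategies can be sketched in base R. This is an illustration under stated assumptions, not the internals of `spatialsample`: the coordinates are simulated, `spread()` is a hypothetical helper, and k-means on the coordinates stands in for the clustering that produces spatial folds.

```r
set.seed(1)

# Simulated site coordinates (invented, roughly the size of a regional survey)
toy <- data.frame(longitude = runif(200, 36, 42),
                  latitude  = runif(200, 4, 10))

# Spatial folds: group sites that are close together
spatial_fold <- kmeans(toy[, c("longitude", "latitude")], centers = 5)$cluster

# Traditional folds: assign sites to folds at random
random_fold <- sample(rep(1:5, length.out = nrow(toy)))

# Hypothetical helper: average pairwise distance between sites of the same fold
spread <- function(fold) {
  mean(tapply(seq_len(nrow(toy)), fold, function(i) mean(dist(toy[i, ]))))
}

spread(spatial_fold)  # compact folds: held-out sites sit far from most training sites
spread(random_fold)   # scattered folds: held-out sites have close training neighbours
```

With random folds, every held-out site tends to have near neighbours in the training set, so performance estimates can look better than they would for a genuinely new area.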
---

# Split Data

- [Spatial CV]()

```r
good_folds <- data %>%
  spatial_clustering_cv(coords = c("longitude", "latitude"), v = 5)
```

![](index_files/figure-html/unnamed-chunk-23-1.png)<!-- -->

---

# Parametrization (Recipe)

- [Logistic Model]()

```r
lr_recipe <- workflow() %>%
  add_variables(outcomes = pf_pos, predictors = starts_with("bio")) %>%
  add_model(logistic_reg(mode = "classification", engine = "glm"))
```

<br>

- [Random Forest Model]()

```r
# rand_forest() matches the slide title and the randomForest fit used for the final model
rf_recipe <- workflow() %>%
  add_variables(outcomes = pf_pos, predictors = starts_with("bio")) %>%
  add_model(rand_forest(mode = "classification", engine = "randomForest"))
```

---

# Training

<br>

- [Define performance metrics]()

```r
metrics <- metric_set(roc_auc, accuracy, recall, precision)
```

<br>

- [Train models]()

```r
regular_lr <- fit_resamples(lr_recipe, bad_folds, metrics = metrics)
spatial_lr <- fit_resamples(lr_recipe, good_folds, metrics = metrics)
spatial_rf <- fit_resamples(rf_recipe, good_folds, metrics = metrics)
```

---

# Performance

- [Logistic Model (Traditional CV)]()

```r
collect_metrics(regular_lr)
```

---

# Performance

- [Logistic Model (Spatial CV)]()

```r
collect_metrics(spatial_lr)
```

---

# Performance

- [Random Forest Model (Spatial CV)]()

```r
collect_metrics(spatial_rf)
```

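As a reading aid for the metric tables, the class-based metrics can be computed by hand from a confusion table. This is a minimal base R sketch; the `truth` and `pred` vectors are invented and do not come from the fitted models.

```r
# Invented labels for eight survey sites (0 = negative, 1 = positive)
truth <- factor(c(0, 0, 0, 1, 1, 0, 1, 0), levels = c(0, 1))
pred  <- factor(c(0, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))

# Rows are the observed class, columns the predicted class
tab <- table(truth = truth, pred = pred)

# Here "1" (positive) is treated as the event of interest
tp <- tab["1", "1"]; fn <- tab["1", "0"]
fp <- tab["0", "1"]; tn <- tab["0", "0"]

accuracy  <- (tp + tn) / sum(tab)   # share of sites classified correctly
recall    <- tp / (tp + fn)         # share of true positives that were found
precision <- tp / (tp + fp)         # share of predicted positives that were right

c(accuracy = accuracy, recall = recall, precision = precision)
```

By default, yardstick treats the first factor level as the event. With levels `0` and `1`, the `recall` and `precision` reported by `collect_metrics()` would then describe class `0` unless `event_level = "second"` is set, so it is worth checking which class the numbers refer to.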
---

# Final model

- [Logistic Model]()

```r
fit_lr <- data %>%
  dplyr::select(pf_pos, starts_with("bio")) %>%
  glm(pf_pos ~ ., family = binomial(), data = .)
```

- [Random Forest Model]()

```r
fit_rf <- data %>%
  dplyr::select(-c(longitude, latitude, ID)) %>%
  randomForest(pf_pos ~ ., data = .)
```

---

# Spatial Prediction

.pull-left2[
- [Logistic Model]()

.tiny2[
```r
pred_lr <- predict(model = fit_lr, object = bioclim, type = "response")
mapview(pred_lr)
```
]
]

.pull-right2[
]

---

# Spatial Prediction

.pull-left2[
- [Random Forest Model]()

.tiny2[
```r
pred_rf <- 1 - predict(model = fit_rf, object = bioclim, type = "prob")
mapview(pred_rf)
```
]
]

.pull-right2[
]

---
class: inverse, center, middle

##### Gabriel Carrasco Escobar, MS, PhD(c)

###### [gabriel.carrasco@upch.pe]()

###### [@Gabc91]()