We still need to set up a target that selects the best parameters after cross-validation. This should be doable through `tidymodels`. Then we need to fit the final version of the model.
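A minimal sketch of the tune → select → finalize → fit sequence, assuming a simulated stand-in dataset. The columns `area`, `ndvi`, `temp`, and `outbreak`, the tuned parameters, and the ROC AUC metric are all hypothetical, not the project's actual schema or choices. In `_targets.R`, each of the last three steps could plausibly become its own `tar_target()`.

```r
library(tidymodels)

# Simulated stand-in for the outbreak data (hypothetical columns)
set.seed(42)
n <- 400
dat <- tibble(
  area = runif(n, 1, 100),
  ndvi = rnorm(n),
  temp = rnorm(n)
)
p <- plogis(-2 + 0.02 * dat$area + dat$ndvi)
dat$outbreak <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("yes", "no"))

# Boosted-tree spec with two tuned hyperparameters
spec <- boost_tree(trees = 50, tree_depth = tune(), learn_rate = tune()) |>
  set_engine("xgboost") |>
  set_mode("classification")

wf <- workflow() |>
  add_formula(outbreak ~ area + ndvi + temp) |>
  add_model(spec)

folds <- vfold_cv(dat, v = 5, strata = outbreak)

# Cross-validated tuning over a small space-filling grid of 4 candidates
tuned <- tune_grid(
  wf,
  resamples = folds,
  grid = 4,
  metrics = metric_set(roc_auc)
)

# 1. Select the best parameter combination by cross-validated ROC AUC
best_params <- select_best(tuned, metric = "roc_auc")

# 2. Replace the tune() placeholders in the workflow with those values
final_wf <- finalize_workflow(wf, best_params)

# 3. Fit the final model on the full training data
final_fit <- fit(final_wf, data = dat)
```

`select_best()` returns a one-row tibble of parameter values, so it is a natural, cheap-to-store target sitting between the tuning target and the final fit.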
Something to look into: it's unclear whether `tidymodels` feeds the interaction constraints into the `xgboost` call (https://github.com/ecohealthalliance/open-rvfcast/blob/feature/outbreak-layer/R/model_specs.R#L25). You can potentially check this by extracting the model object from `tidymodels` and inspecting it. Otherwise, you can look at the ceteris paribus plots: the lines for `area`, the variable with the constraint on it, should be fully parallel on the log-odds scale (on the probability scale the logistic link bends them, so they won't be exactly parallel there). If the constraint is not working as expected, you may need to lift the workflow out of `tidymodels`.
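A sketch of both checks, assuming simulated stand-in data (hypothetical `area`, `ndvi`, `temp` columns) and that the constraint is passed as an engine argument through `set_engine()`. Whether parsnip forwards it to `xgb.train()` is exactly what is being tested, so treat this as a diagnostic, not confirmation. Further assumptions: the xgboost 1.x R API (where the booster stores its settings in `$params`), 0-based feature indices in `interaction_constraints` (so `0` is the first predictor column, `area`), and predictor columns in formula order. The second check builds log-odds ceteris paribus profiles for `area` and measures how far they are from parallel.

```r
library(tidymodels)

set.seed(42)
n <- 400
dat <- tibble(area = runif(n, 1, 100), ndvi = rnorm(n), temp = rnorm(n))
# Simulated outcome with a deliberate area x ndvi interaction
p <- plogis(-2 + 0.02 * dat$area + dat$ndvi + 0.03 * dat$area * dat$ndvi)
dat$outbreak <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("yes", "no"))

# area (index 0) in its own group; ndvi and temp (indices 1, 2) may interact
spec <- boost_tree(trees = 100, tree_depth = 4) |>
  set_engine("xgboost", interaction_constraints = list(0, c(1, 2))) |>
  set_mode("classification")

wf_fit <- workflow() |>
  add_formula(outbreak ~ area + ndvi + temp) |>
  add_model(spec) |>
  fit(data = dat)

# Check 1: inspect the underlying xgboost booster (xgboost 1.x API)
booster <- extract_fit_engine(wf_fit)
print(booster$params$interaction_constraints)  # NULL suggests it was dropped

# Check 2: ceteris paribus profiles for area on the margin (log-odds) scale
area_grid <- seq(1, 100, length.out = 25)
X <- as.matrix(dat[1:20, c("area", "ndvi", "temp")])
profiles <- sapply(seq_len(nrow(X)), function(i) {
  Xi <- X[rep(i, length(area_grid)), , drop = FALSE]
  Xi[, "area"] <- area_grid                  # vary area, hold the rest fixed
  predict(booster, Xi, outputmargin = TRUE)  # log-odds, not probabilities
})
# Align each profile (column) at its first grid point;
# parallel profiles then coincide row by row
centered <- sweep(profiles, 2, profiles[1, ])
max_spread <- max(apply(centered, 1, function(r) diff(range(r))))
max_spread  # close to 0 if the constraint is enforced
```

Dropping the `interaction_constraints` argument and re-running should give a clearly nonzero `max_spread` on this simulated data, which makes for a useful contrast.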
As a conceptual note, we're including the interaction constraint to prevent area from interacting with other variables, as a way to normalize results to polygon area size. TBH, I'm struggling with the logic behind this. To me, it seems like splitting on area still enforces the relationship that greater area -> greater outbreak probability? Or perhaps the idea is that, because the area splits are independent of the other variables, the model basically generates predictions for every "level" (as defined by the splits) of area?
Below are some notes on addressing the rarity of first outbreaks. WAHIS includes both the first outbreak point and the subsequent outbreaks that are part of the same event. We have discussed ways to handle this, but I don't think it's an immediate priority.
- Flag the first outbreak in each thread (event) versus subsequent outbreaks.
- Stratify the train/test split and the blocking by first vs. subsequent events, and evaluate model performance for each.
- To tune performance for first events, upweight new events in the data and/or write a custom evaluation function (e.g., weighted logistic error).
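A minimal sketch of the three notes above, under hypothetical names: an `event_id` column grouping outbreaks into threads, a `date` column, and an up-weighting factor of 5 for first events (an arbitrary illustrative value). The plain stratified `initial_split()` is a simplification that ignores the project's spatial/temporal blocking. The weighted logistic error is the standard weighted binary cross-entropy; in tidymodels, the weight column would presumably be wrapped with `hardhat::importance_weights()` and registered via `workflows::add_case_weights()`, which is not shown here.

```r
library(dplyr)
library(rsample)

# Hypothetical WAHIS-style records: one or more outbreaks per event thread
set.seed(1)
n_events <- 40
sizes <- sample(1:5, n_events, replace = TRUE)
outbreaks <- tibble(
  event_id = rep(seq_len(n_events), times = sizes),
  date = as.Date("2020-01-01") + sample(0:1000, sum(sizes), replace = TRUE)
)

# 1. Flag the earliest outbreak in each thread (ties broken by row order)
outbreaks <- outbreaks |>
  group_by(event_id) |>
  mutate(first_event = row_number(date) == 1L) |>
  ungroup() |>
  mutate(
    event_type = factor(if_else(first_event, "first", "subsequent")),
    wt = if_else(first_event, 5, 1)  # hypothetical up-weight for first events
  )

# 2. Stratify the train/test split by first vs. subsequent events
split <- initial_split(outbreaks, prop = 0.8, strata = event_type)
train <- training(split)
test <- testing(split)

# 3. Weighted logistic error (weighted binary cross-entropy)
#    truth: 0/1 outcome, prob: predicted P(outbreak), weight: per-row weight
weighted_log_loss <- function(truth, prob, weight) {
  eps <- 1e-15
  prob <- pmin(pmax(prob, eps), 1 - eps)  # keep log() finite
  -sum(weight * (truth * log(prob) + (1 - truth) * log(1 - prob))) / sum(weight)
}
```

Evaluating separately for first and subsequent events then amounts to computing the metric on `filter(test, first_event)` and `filter(test, !first_event)`.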
So by specifying `area` in the interaction constraints, we are forcing each xgboost tree to split either on `area` alone or only on a mix of the other explanatory variables. That then means the influence of all the other variables is independent of `area`, right? That seems kind of cool.
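Written out as an equation, assuming a `binary:logistic` objective and `area` in its own constraint group: every tree is a function of either `area` only or the other variables $\mathbf{x}$ only (trees with no splits are constants, absorbed into $\beta_0$), so the ensemble is additive on the log-odds scale:

$$
\operatorname{logit} \hat{p}(\text{area}, \mathbf{x}) = \beta_0 + \underbrace{\sum_{t \in \mathcal{T}_{\text{area}}} f_t(\text{area})}_{g(\text{area})} + \underbrace{\sum_{t \in \mathcal{T}_{\mathbf{x}}} f_t(\mathbf{x})}_{h(\mathbf{x})}
$$

Moving `area` from $a$ to $a'$ shifts the log-odds by $g(a') - g(a)$ for every observation, which is why the profiles are parallel on that scale. On the probability scale, $\hat{p} = 1 / \left(1 + e^{-(\beta_0 + g(\text{area}) + h(\mathbf{x}))}\right)$, so the same shift changes $\hat{p}$ by an amount that depends on $h(\mathbf{x})$. Note also that $g$ is a learned step function with no imposed direction; the interaction constraint alone does not force "greater area → greater outbreak probability" (xgboost's separate `monotone_constraints` parameter is what would do that).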
Current status (2024-06-28): we have a workflow for model splitting and fitting using `tidymodels`. There is some commented-out code to create Ceteris Paribus profiles (https://github.com/ecohealthalliance/open-rvfcast/blob/feature/outbreak-layer/_targets.R#L575-L623). I think this code is working.
Relevant papers on spatial models.