Model training notes #25

Closed
emmamendelsohn opened this issue Mar 24, 2023 · 2 comments

Comments

@emmamendelsohn
Collaborator

emmamendelsohn commented Mar 24, 2023

Cross validation:

  • Spatial blocking, or clustering (k-means) in predictor space, to define folds
  • Likely to drop a certain proportion of negative pixels (those that have never had events)
  • Likely to merge blocks/clusters to ensure positive events are represented in training splits
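
A rough sketch of the blocking and merging idea, in plain Python rather than the R/tidymodels stack linked in this issue. It uses square blocks in geographic space (not k-means in predictor space), and everything here is an assumption for illustration: the pixel dicts with `x`, `y`, `event` keys, the function names, and the greedy merge rule.

```python
import random
from collections import defaultdict


def block_id(x, y, block_size):
    """Map a pixel coordinate to the ID of the square block containing it."""
    return (int(x // block_size), int(y // block_size))


def downsample_negatives(pixels, keep_fraction, seed=0):
    """Keep every positive pixel and a random fraction of negative pixels."""
    rng = random.Random(seed)
    return [p for p in pixels if p["event"] == 1 or rng.random() < keep_fraction]


def merge_blocks_into_folds(pixels, block_size, n_folds):
    """Merge whole blocks into folds while spreading positives across folds.

    Blocks are visited from most to fewest positives, and each block goes to
    the fold that currently holds the fewest positives (ties broken by the
    number of blocks already in the fold).
    """
    positives = defaultdict(int)
    for p in pixels:
        positives[block_id(p["x"], p["y"], block_size)] += p["event"]

    fold_positives = [0] * n_folds
    fold_blocks = [0] * n_folds
    block_to_fold = {}
    ordered = sorted(positives.items(), key=lambda kv: (-kv[1], kv[0]))
    for block, n_pos in ordered:
        fold = min(range(n_folds), key=lambda f: (fold_positives[f], fold_blocks[f]))
        block_to_fold[block] = fold
        fold_positives[fold] += n_pos
        fold_blocks[fold] += 1
    return block_to_fold, fold_positives
```

Holding out one fold at a time then leaves the merged remaining folds as the training split. The greedy rule only guarantees positives in every fold when at least `n_folds` blocks contain events, so the returned per-fold counts are worth checking.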

Addressing rarity of first outbreaks:

  1. Need to code the first outbreak in a thread separately from subsequent outbreaks
  2. Stratify train/test splits and blocking based on first and subsequent events. Evaluate model performance for each.
  3. To tune performance for first events: upweight new events in the data and/or write a custom evaluation function (weighted logistic error)
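
A minimal sketch of items 1 and 3, again in plain Python for illustration only. The record keys `thread_id` and `date`, the function names, and the choice of weights are all hypothetical. The metric is a standard weighted log loss, which is one reading of "weighted logistic error".

```python
import math


def flag_first_events(records):
    """Mark the earliest record in each thread as a first event.

    `records` is a list of dicts with hypothetical keys `thread_id` and
    `date` (ISO date strings, which sort chronologically).
    """
    seen = set()
    flagged = []
    for rec in sorted(records, key=lambda r: (r["thread_id"], r["date"])):
        flagged.append({**rec, "is_first": rec["thread_id"] not in seen})
        seen.add(rec["thread_id"])
    return flagged


def weighted_log_loss(y_true, y_prob, weights, eps=1e-15):
    """Weighted mean of the per-observation logistic (log) loss."""
    total = 0.0
    for y, p, w in zip(y_true, y_prob, weights):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)
```

Giving first events a weight above 1 makes a poorly predicted first outbreak cost more than a poorly predicted subsequent one. Computing the same loss separately on the `is_first` and non-first subsets gives the per-stratum evaluation from item 2.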
@emmamendelsohn
Collaborator Author

Relevant papers:

spatialsample package for tidymodels-friendly spatial CV: https://arxiv.org/pdf/2303.07334.pdf
waywiser package for measuring spatial error: https://github.com/ropensci/waywiser
methods of evaluating spatial transferability of model predictions: https://onlinelibrary.wiley.com/doi/10.1111/geb.13635

@emmamendelsohn
Collaborator Author

Addressed now in #95
