Balance graph datasets #290

Open
CarlinLiao opened this issue Jan 26, 2024 · 0 comments

@CarlinLiao
Collaborator

Because we're adapting pathology datasets for use in machine learning, our datasets often end up imbalanced, e.g., the treatment non-responder set can have 2x as many graphs as the responder set. This can lead the model to overfit to the majority class, since it sees 2x or more examples from one category than from the other.

I'd like to add an option to the train/validation/test set split so that, if one or more classes have more examples than another class, the excess examples are shunted into what is currently the "unlabeled" class, which would be renamed to the "not used for training, validation, or testing" class. This function could also be propagated to spt-plugin so that forked plugins have access to it too.
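
A minimal sketch of what that option might do, assuming simple random downsampling of every class to the size of the smallest one. The function name `split_off_excess`, its parameters, and the label strings are hypothetical and are not taken from the existing codebase or from spt-plugin:

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def split_off_excess(
    labels: Sequence[Hashable],
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (kept, excess) index lists so that every class keeps the same count.

    Hypothetical helper: `kept` would go on to the train/validation/test split,
    while `excess` would be moved to the "not used" class.
    """
    # Group example indices by their class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    if not by_class:
        return [], []

    # Every class is cut down to the size of the smallest class.
    target = min(len(indices) for indices in by_class.values())

    rng = random.Random(seed)  # seeded so the selection is reproducible
    kept: List[int] = []
    excess: List[int] = []
    for indices in by_class.values():
        shuffled = list(indices)
        rng.shuffle(shuffled)
        kept.extend(shuffled[:target])
        excess.extend(shuffled[target:])
    return sorted(kept), sorted(excess)


# Example: four non-responder graphs and two responder graphs.
labels = ["non-responder"] * 4 + ["responder"] * 2
kept, excess = split_off_excess(labels)
# `kept` holds two indices per class; `excess` holds the two leftover
# non-responder indices that would be set aside.
```

Downsampling to the smallest class is only one possible policy. A cap on the allowed ratio between classes would fit the same interface if discarding that many examples turns out to be too wasteful.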

@CarlinLiao CarlinLiao added the enhancement New feature or request label Jan 26, 2024
@CarlinLiao CarlinLiao self-assigned this Jan 26, 2024