Balance graph datasets #290

CarlinLiao · 2024-01-26T22:23:10Z

Since we're adapting pathology datasets for use in machine learning, our datasets often end up imbalanced. For example, the set that did not respond to treatment can end up with twice as many graphs as the set that did respond. This can bias the model toward the majority class, since during training it sees twice as many examples (or worse) from one class as from the other.

I'd like to add an option to the train/validation/test split so that, if one or more classes have more examples than the smallest class, the excess examples are shunted into what is currently the "unlabeled" set, which would be renamed to the "not used for training, validation, or testing" set. This functionality could also be propagated to spt-plugin so that forked plugins have access to it too.
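A minimal sketch of the balancing step, assuming the split code has one class label per graph and balances by random undersampling down to the size of the smallest class. The function name `balance_by_undersampling` and its signature are hypothetical and not part of spt or spt-plugin. The kept indices would go on to the usual train/validation/test split, and the excess indices would go to the "not used" set.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence


def balance_by_undersampling(
    labels: Sequence[Hashable],
    seed: int = 0,
) -> tuple[list[int], list[int]]:
    """Hypothetical helper (not part of spt): undersample every class to the
    size of the smallest class.

    Returns (kept_indices, excess_indices). Kept indices can then be split into
    train/validation/test; excess indices would be marked as not used.
    """
    # Group graph indices by their class label.
    by_class: dict[Hashable, list[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    if not by_class:
        return [], []

    # Every class keeps as many examples as the smallest class has.
    target = min(len(indices) for indices in by_class.values())

    # Seeded RNG so the choice of which graphs become "excess" is reproducible.
    rng = random.Random(seed)
    kept: list[int] = []
    excess: list[int] = []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        kept.extend(shuffled[:target])
        excess.extend(shuffled[target:])
    return sorted(kept), sorted(excess)


if __name__ == "__main__":
    labels = ["non-responder"] * 4 + ["responder"] * 2
    kept, excess = balance_by_undersampling(labels, seed=1)
    print("kept:", kept, "excess:", excess)
```

With four non-responder graphs and two responder graphs, two graphs of each class are kept and the two surplus non-responder graphs land in the excess list. Undersampling throws data away. Weighting the loss by class or oversampling the minority class would be alternatives, but the sketch follows the approach proposed here of setting the excess aside.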

CarlinLiao added the enhancement New feature or request label Jan 26, 2024

CarlinLiao self-assigned this Jan 26, 2024
