This repository has been archived by the owner on Nov 12, 2024. It is now read-only.

Validate numerical output produced by individual data sources #453

Open
geening opened this issue Apr 16, 2021 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

Contributor

geening commented Apr 16, 2021

As far as data source processing goes, we currently test each component of a DataPipeline object.

And we have a dry run in https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/src/test/test_source_run.py to make sure, for each individual data source, that there is at least one output whose location key matches a defined regex.

But we do not validate that individual subclasses of DataSource (stored in src/pipelines//.py) actually produce the correct numerical output for particular inputs.

I propose unit testing the parse_dataframes method in each data source. To make this easier, perhaps we could have a framework that accepts the input dataframes and the expected output dataframes as CSV files.
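A minimal sketch of what such a CSV-driven check might look like, using pandas. Everything named here is hypothetical and only for illustration: `ExampleDataSource` is a stand-in rather than a real pipeline, the `parse_dataframes(dataframes, aux, **parse_opts)` signature is an assumption about the DataSource interface, and the column names and `XX_` key prefix are made up. The CSVs are inlined as strings so the example is self-contained; a real framework would presumably read them from fixture files.

```python
import io
from typing import Dict

import pandas as pd
from pandas.testing import assert_frame_equal


class ExampleDataSource:
    """Hypothetical stand-in for a DataSource subclass (not a real pipeline)."""

    def parse_dataframes(
        self, dataframes: Dict[str, pd.DataFrame], aux: Dict[str, pd.DataFrame], **parse_opts
    ) -> pd.DataFrame:
        # Assumed behavior for illustration: rename the count column and
        # derive a location key from the region code.
        data = dataframes["raw"].rename(columns={"cases": "new_confirmed"})
        data["key"] = "XX_" + data["region"]
        return data[["date", "key", "new_confirmed"]]


def check_parse_output(source, input_csv: str, expected_csv: str) -> None:
    """Parse the input CSV with `source` and compare against the expected CSV."""
    raw = pd.read_csv(io.StringIO(input_csv))
    expected = pd.read_csv(io.StringIO(expected_csv))
    actual = source.parse_dataframes({"raw": raw}, aux={})
    # Compare values and column order; ignore index labels and exact dtypes.
    assert_frame_equal(
        actual.reset_index(drop=True),
        expected.reset_index(drop=True),
        check_dtype=False,
    )


# Made-up fixture data, not taken from any real source.
INPUT_CSV = """date,region,cases
2021-04-15,AA,12
2021-04-15,BB,7
"""

EXPECTED_CSV = """date,key,new_confirmed
2021-04-15,XX_AA,12
2021-04-15,XX_BB,7
"""

if __name__ == "__main__":
    check_parse_output(ExampleDataSource(), INPUT_CSV, EXPECTED_CSV)
```

With this shape, adding a test case for a data source would mostly mean adding a pair of CSV fixtures, and `assert_frame_equal` reports which column and row differ when the numbers do not match.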

geening added the enhancement (New feature or request) label Apr 16, 2021