Validate numerical output produced by individual data sources #453

geening · 2021-04-16T17:48:38Z

As far as data source processing goes, we currently test each component of a DataPipeline object.

We also have a dry run in https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/src/test/test_source_run.py that checks, for each individual data source, that at least one output has a location key matching a defined regex.

However, we do not validate that individual subclasses of DataSource (stored in src/pipelines//.py) actually produce the correct numerical output for a given input.

I propose unit testing the parse_dataframes method in each data source. To keep the test cases easy to specify, perhaps we could have a framework that accepts the input and expected output dataframes as CSV files.
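As a rough illustration of what such a framework might look like, here is a minimal sketch. The class `ToyCumulativeSource`, the helper `check_parse_output`, the column names, and the assumed `parse_dataframes(dataframes, aux, **parse_opts)` signature are all hypothetical stand-ins for illustration, not the repository's actual code. Inline CSV strings are used in place of fixture files so the example is self-contained.

```python
import io
from typing import Any, Dict

import pandas as pd
from pandas.testing import assert_frame_equal


class ToyCumulativeSource:
    """Hypothetical stand-in for a DataSource subclass (not from the repo)."""

    def parse_dataframes(
        self, dataframes: Dict[Any, pd.DataFrame], aux: Dict[str, pd.DataFrame], **parse_opts
    ) -> pd.DataFrame:
        # Rename raw columns, then derive a running total per location key.
        data = dataframes[0].rename(columns={"region": "key", "cases": "new_confirmed"})
        data = data.sort_values(["key", "date"]).reset_index(drop=True)
        data["total_confirmed"] = data.groupby("key")["new_confirmed"].cumsum()
        return data[["date", "key", "new_confirmed", "total_confirmed"]]


def check_parse_output(source, input_csv: str, expected_csv: str) -> pd.DataFrame:
    """Run parse_dataframes on a CSV fixture and compare with the expected CSV."""
    input_df = pd.read_csv(io.StringIO(input_csv))
    expected = pd.read_csv(io.StringIO(expected_csv))
    actual = source.parse_dataframes({0: input_df}, aux={})

    # Sort both frames identically so row order cannot cause spurious failures.
    sort_cols = list(expected.columns)
    actual = actual[sort_cols].sort_values(sort_cols).reset_index(drop=True)
    expected = expected.sort_values(sort_cols).reset_index(drop=True)

    # Raises AssertionError with a readable diff if any value differs.
    assert_frame_equal(actual, expected, check_dtype=False)
    return actual


# In a real test these would live in files next to the test module.
INPUT_CSV = """date,region,cases
2021-01-01,US_CA,10
2021-01-02,US_CA,5
2021-01-01,US_NY,7
"""

EXPECTED_CSV = """date,key,new_confirmed,total_confirmed
2021-01-01,US_CA,10,10
2021-01-02,US_CA,5,15
2021-01-01,US_NY,7,7
"""

result = check_parse_output(ToyCumulativeSource(), INPUT_CSV, EXPECTED_CSV)
```

With this shape, adding a test case for a data source would mostly mean dropping in a pair of CSV files, and a wrong number in the output would surface as an `AssertionError` from `assert_frame_equal`.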

geening added the enhancement New feature or request label Apr 16, 2021

owahltinez assigned geening Apr 16, 2021
