This repository has been archived by the owner on Nov 12, 2024. It is now read-only.
Validate numerical output produced by individual data sources #453
Labels: enhancement (New feature or request)
As far as data source processing goes, we currently test each component of a DataPipeline object.
And we have a dry run in https://github.com/GoogleCloudPlatform/covid-19-open-data/blob/main/src/test/test_source_run.py to make sure, for each individual data source, that there is at least one output whose location key matches a defined regex.
But we do not validate that individual extensions of DataSource (stored in src/pipelines//.py) actually produce the proper numerical output for particular inputs.
I propose unit testing the parse_dataframes method in each data source. To simplify writing these tests, perhaps we could have a framework that accepts the input and expected output dataframes as CSV files.
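
A minimal sketch of what such a CSV-driven check might look like. Everything here is illustrative: `ExampleDataSource` is a hypothetical stand-in rather than a real data source from this repository, the `parse_dataframes(dataframes, aux, **parse_opts)` signature is an assumption, and `check_parse_output` is a made-up helper name. The CSV contents are invented sample rows, inlined as strings so the snippet runs on its own; in practice they would be files stored next to the test.

```python
import io
from typing import Any, Dict

import pandas as pd
from pandas.testing import assert_frame_equal


class ExampleDataSource:
    """Hypothetical stand-in for a DataSource subclass (not from the repo)."""

    def parse_dataframes(
        self, dataframes: Dict[Any, pd.DataFrame], aux: Dict[str, pd.DataFrame], **parse_opts
    ) -> pd.DataFrame:
        # Assumed signature: raw inputs keyed by name or index, plus auxiliary tables.
        data = dataframes[0].rename(columns={"cases": "new_confirmed"})
        # Derive a running total per location key.
        data["total_confirmed"] = data.groupby("key")["new_confirmed"].cumsum()
        return data


def check_parse_output(source, input_csv, expected_csv, aux=None, **parse_opts) -> None:
    """Run parse_dataframes on a CSV input and compare against an expected CSV.

    `input_csv` and `expected_csv` may be file paths or file-like objects.
    """
    input_df = pd.read_csv(input_csv)
    expected = pd.read_csv(expected_csv)
    actual = source.parse_dataframes({0: input_df}, aux or {}, **parse_opts)
    # Compare only the columns listed in the expected file, in that order.
    actual = actual[list(expected.columns)].reset_index(drop=True)
    # Raises AssertionError with a diff-like message on any mismatch.
    assert_frame_equal(actual, expected, check_dtype=False)


# Invented sample data, inlined for the sake of a self-contained example.
INPUT_CSV = """date,key,cases
2020-01-01,US,1
2020-01-02,US,2
2020-01-01,ES,5
"""

EXPECTED_CSV = """date,key,new_confirmed,total_confirmed
2020-01-01,US,1,1
2020-01-02,US,2,3
2020-01-01,ES,5,5
"""

check_parse_output(ExampleDataSource(), io.StringIO(INPUT_CSV), io.StringIO(EXPECTED_CSV))
```

With a helper along these lines, adding a test for a data source would mostly mean dropping an input CSV and an expected-output CSV into a directory, and a numerical regression would surface as an `AssertionError` pointing at the differing values.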