This repository has been archived by the owner on Nov 12, 2024. It is now read-only.

"Snapshot ID" Field Implementation (dependency mitigation) #450

Open
a27cheung opened this issue Apr 8, 2021 · 0 comments

Comments

@a27cheung
Contributor

As data sources wind down (& perhaps shut down?)...

"I haven't seen any winding down yet, but yes, this will be an issue in the future. In our data, we have a hack to work around data sources that stop updating: we flag them with "skip", which prevents unit tests from failing and forces the pipeline to fetch the latest valid data. However, if we make any other configuration changes, that data would be lost.

A good solution to this would be to add a "snapshot ID" field: if it is populated, we wouldn't even try to go to the original data source and would instead fetch the intermediate file from the last successful processing of that data source (which we have saved). Unfortunately, the intermediate files are not externally available, so the fetch step of the pipelines would not be reproducible. I don't think there's a way around that: the original data source is gone, so of course you can't reproduce our work."
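The "snapshot ID" proposal can be sketched roughly as follows. This is a minimal Python illustration, not the project's actual pipeline code. The issue does not show the real configuration format, so the following are all hypothetical: `DataSourceConfig`, its `key`/`url`/`snapshot_id` fields, the `fetch` and `snapshot_path` helpers, the `download` callback, and the `<store>/<key>/<snapshot_id>.csv` layout of the private intermediate-file store.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class DataSourceConfig:
    """Hypothetical per-source configuration entry (not the real schema)."""
    key: str
    url: str
    # If populated, the pipeline never contacts `url` and instead reuses the
    # saved intermediate file from a previous successful run.
    snapshot_id: Optional[str] = None


def snapshot_path(store: Path, config: DataSourceConfig) -> Path:
    # Hypothetical store layout: <store>/<source key>/<snapshot id>.csv
    return store / config.key / f"{config.snapshot_id}.csv"


def fetch(config: DataSourceConfig, store: Path,
          download: Callable[[str], bytes]) -> bytes:
    """Return the raw intermediate data for one data source."""
    if config.snapshot_id:
        path = snapshot_path(store, config)
        if not path.exists():
            # The store is private, and the original source may be gone,
            # so a missing snapshot cannot be rebuilt; fail loudly.
            raise FileNotFoundError(
                f"snapshot {config.snapshot_id!r} not found for {config.key}"
            )
        return path.read_bytes()
    # No snapshot pinned: go to the original data source as usual.
    return download(config.url)
```

In this sketch, a pinned snapshot that is missing from the store raises an error instead of silently falling back to the original URL, since the motivating scenario is a source that may no longer exist. It also makes the reproducibility limitation concrete: anyone without access to `store` cannot run the pinned branch.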
