Large netCDF files and compile outputs in git history #76

Open
nbren12 opened this issue Apr 7, 2021 · 0 comments

Describe the bug

This repo is nearly 1000 times larger than it was two years ago. A couple of years ago, a clone of this repo downloaded perhaps 1-2 MB of data. Now it downloads nearly 1 GB. This seems to be because of

  1. netCDF files intentionally checked into version control for tests (e.g. 1982a88)
  2. compiled artifacts accidentally checked into version control in prior commits (a58453f). A CI check can ensure that this doesn't happen again.
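
A CI job could catch item 2 before it lands with a small guard over the checkout. The sketch below is hypothetical: the function name `check_tree`, the size limit, and the extensions treated as compiled artifacts (`.o`, `.a`, `.so`) are assumptions for illustration, not something this repo currently defines.

```shell
#!/bin/sh
# check_tree DIR MAX_KB   (hypothetical helper, not part of this repo)
# Prints every file under DIR that is larger than MAX_KB KiB or that looks
# like a compiled artifact, one path per line. Skips anything inside .git.
# Returns 1 if at least one offending file was found, 0 otherwise.
check_tree() {
    dir=$1
    max_bytes=$(( $2 * 1024 ))
    offenders=$(
        find "$dir" -type f ! -path '*/.git/*' \
            \( -size +"${max_bytes}c" \
               -o -name '*.o' -o -name '*.a' -o -name '*.so' \) |
            sort
    )
    if [ -n "$offenders" ]; then
        printf '%s\n' "$offenders"
        return 1
    fi
    return 0
}
```

In a pipeline this might run as `check_tree . 1024`, failing the job when anything over 1 MiB or any object file shows up in the tree.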

This large checkout size makes it difficult to download this code, especially in an automated pipeline, and adds friction to working with this code base.

Unfortunately, adding data to a git repository is irreversible without rewriting the history (e.g. using git filter-branch). The files remain in the .git folder even if a subsequent commit deletes them from the working tree.
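
To see which files are taking up space in the history, including ones already deleted from the working tree, the blobs reachable from any ref can be listed by size. This is a sketch using standard git plumbing; the function name `largest_blobs` is made up for illustration.

```shell
#!/bin/sh
# largest_blobs [N]   (hypothetical helper name)
# Prints the N largest blobs (default 10) reachable from any ref as
# "SIZE_IN_BYTES PATH", largest first. Run from inside the repository.
largest_blobs() {
    # rev-list emits "OBJECT_ID PATH"; cat-file looks up each object's type
    # and size and passes the path through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        cut -d ' ' -f 2- |
        sort -n -r |
        head -n "${1:-10}"
}
```

A file that was committed once and later removed still appears in this listing, which is why deleting it in a new commit does not shrink the clone.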

Would you be open to either

  1. moving these data to another location (e.g. a git submodule, FTP, etc.) and rewriting the history to remove them, or
  2. using .gitattributes to exclude these data files from the tarballs built by GitHub, so that users can download the source quickly without the test data?

(2) is a lightweight solution that I have found works well with other repos with large test data checked into version control. This is an example of a .gitattributes that removes a directory of test files:

/path/to/directory/of/test/data export-ignore
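
To check what such a rule removes, the archive that git itself builds can be listed locally. This is a sketch that assumes GitHub's tarballs honor the attribute the same way `git archive` does, which is what the suggestion above relies on; the function name `list_archive_paths` is hypothetical.

```shell
#!/bin/sh
# list_archive_paths [TREE-ISH]   (hypothetical helper name)
# Lists the paths that `git archive` would export for TREE-ISH (default HEAD).
# By default git archive reads export-ignore from the .gitattributes in the
# archived tree, so the rule has to be committed before it takes effect.
list_archive_paths() {
    git archive --format=tar "${1:-HEAD}" | tar -tf -
}
```

Comparing this listing before and after committing the .gitattributes change shows whether the test data directory has dropped out of the export.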

To Reproduce

git clone https://github.com/NOAA-GFDL/FRE-NCtools