Backup and cleanup wp1 selections #355

Open
benoit74 opened this issue Jan 21, 2025 · 0 comments
Labels
maint Maintenance tasks question Further information is requested

Comments

@benoit74
Collaborator

A recent incident on the storage node showed that we should not take for granted that we will be able to regenerate wp1 selections should they get lost.

Should we back up these selections?

These are the volumes of download.openzim.org:

2.5G	./archive
31G	./nightly
41G	./wp1
8.5G	./release

archive, nightly and release are already backed up.

Note that the wp1 volume is probably expected to grow: I did not find anything that cleans it up. But maybe this is just another missing piece, and pretty easy to fix as well.

That being said, if we look at the content of a given folder, enwiki_2025-01, we notice that we may not really need all of this:

5.1G	pagelinks.tsv.zip
386M	./projects
367M	langlinks.tsv.zip
247M	pages.tsv.zip
222M	all.tsv.zip
210M	ratings.tsv.zip
147M	projects.zip
103M	pageviews.tsv.zip
 80M	redirects.tsv.zip
 64M	scores.tsv.zip
 32M	./tops
 24M	./customs
 15M	tops.zip
9.5M	customs.zip
 66K	vital.tsv.zip
 625	README

Do we really need these zips, which account for the bulk of the volume, especially pagelinks.tsv.zip? AFAIK, we do not use them on the Zimfarm, and nobody raised an issue about them being missing during the downtime.
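To put a number on that, here is a rough sketch that recomputes the share of the zips from the listing above. It assumes the sizes are `du -h` style figures (binary units, a bare number meaning bytes); `parse_size` and `summarize` are hypothetical helper names, and since the inputs are already rounded, the result is only an estimate.

```python
# Sizes copied from the enwiki_2025-01 listing in this issue.
# Assumption: du -h style units (K/M/G are powers of 1024, bare number = bytes).
LISTING = """\
5.1G pagelinks.tsv.zip
386M ./projects
367M langlinks.tsv.zip
247M pages.tsv.zip
222M all.tsv.zip
210M ratings.tsv.zip
147M projects.zip
103M pageviews.tsv.zip
80M redirects.tsv.zip
64M scores.tsv.zip
32M ./tops
24M ./customs
15M tops.zip
9.5M customs.zip
66K vital.tsv.zip
625 README
"""

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size such as '5.1G' or '625' to a number of bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def summarize(listing):
    """Return (total bytes, bytes held by *.zip entries, per-entry sizes)."""
    sizes = {}
    for line in listing.splitlines():
        size, name = line.split(None, 1)
        sizes[name] = parse_size(size)
    total = sum(sizes.values())
    zips = sum(v for k, v in sizes.items() if k.endswith(".zip"))
    return total, zips, sizes


total, zips, sizes = summarize(LISTING)
print(f"zip archives: {zips / total:.0%} of the folder")
print(f"pagelinks.tsv.zip alone: {sizes['pagelinks.tsv.zip'] / total:.0%}")
```

With these rounded figures, the zips come out at well over 90% of the folder, and pagelinks.tsv.zip alone at roughly three quarters.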

@kelson42 this is for you to provide background / decisions

@benoit74 benoit74 added maint Maintenance tasks question Further information is requested labels Jan 21, 2025