Implement garbage collecting of old issues #98

Open
rashadg1030 opened this issue Jul 7, 2019 · 7 comments · May be fixed by #147
Assignees
Labels
github Synchronization with GitHub, parsing content from GitHub
Milestone

Comments

@rashadg1030
Collaborator

This will require fetching all closed issues and deleting them from our database.

@rashadg1030 rashadg1030 self-assigned this Jul 7, 2019
@rashadg1030 rashadg1030 added the github Synchronization with GitHub, parsing content from GitHub label Jul 7, 2019
@chshersh chshersh added this to the Sync milestone Jul 10, 2019
@rashadg1030
Collaborator Author

I'm gonna use the updated_at field to determine whether an issue should be deleted. What should the time limit between updates be? Delete the issue if it hasn't been updated in 14 days or so, or should the limit be shorter?

@chshersh
Contributor

@rashadg1030 This issue is a bit more complicated. Open issues should never be deleted, no matter how old they are. "Old" here means "no longer open". So you need to fetch all closed issues and delete all of them from our DB.

@rashadg1030
Collaborator Author

@chshersh Got it, makes sense. Now how do I determine whether an issue in the DB has been closed or not? Do I just fetch all closed Haskell issues and delete the ones that match in the DB?

@chshersh
Contributor

Do I just fetch all closed Haskell issues and delete the ones that match in the DB?

Yep, exactly. This should work for now. But the architecture behind this implementation is not that straightforward. If you repeatedly call deleteClosedIssues and upsertIssues in a forever loop, the DB won't like it...
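
As an illustration of the "fetch closed issues and delete the ones that match" idea above, here is a minimal sketch. It assumes an in-memory Map as a stand-in for the real database. The IssueId, Issue and IssueDb types and the signature given to deleteClosedIssues are hypothetical, since the project's actual schema and API are not shown in this thread. Note that the updated_at-style field is deliberately ignored: only the closed status decides deletion.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical types: the real project's schema is not shown in this thread.
newtype IssueId = IssueId Int
    deriving (Eq, Ord, Show)

data Issue = Issue
    { issueTitle     :: String
    , issueUpdatedAt :: Int  -- e.g. a day counter; never consulted for deletion
    } deriving (Eq, Show)

-- In-memory stand-in for the issues table.
type IssueDb = Map.Map IssueId Issue

-- | Remove every stored issue whose ID is among the closed IDs fetched
-- from GitHub. Closed IDs that are not in the store are simply ignored.
deleteClosedIssues :: [IssueId] -> IssueDb -> IssueDb
deleteClosedIssues closedIds db = Map.withoutKeys db (Set.fromList closedIds)

main :: IO ()
main = do
    let db = Map.fromList
            [ (IssueId 1, Issue "Old but still open" 0)
            , (IssueId 2, Issue "Recently updated, now closed" 100)
            ]
    -- Issue 2 is closed upstream; issue 3 was never stored locally.
    print (Map.keys (deleteClosedIssues [IssueId 2, IssueId 3] db))
```

In this sketch the old open issue survives while the recently updated closed issue is removed, which matches the rule that open issues are never deleted no matter how old they are.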

@rashadg1030
Collaborator Author

@chshersh Can you explain why forever is bad in this situation? I'm curious.

@chshersh
Contributor

Well, in general it's bad for performance to run batch updates of different types concurrently while also performing SELECT queries on a single table at a high rate. I'm not sure, but there's a chance that our sync will run very often (every 5-20 minutes or even more frequently). Updating a table that frequently might put a high load on the DB. But now I think that we don't need concurrent garbage collection (that was my implicit assumption at first). The algorithm can be:

  1. Upsert repos
  2. Delete closed issues
  3. Upsert open issues
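
The three steps above can be sketched as a single sequential sync pass. This is only a sketch under stated assumptions: the Db and Fetched records, the pure in-memory Map storage, syncOnce, and the signatures of upsertRepos, deleteClosedIssues and upsertIssues are all hypothetical, because the project's real database layer is not shown in this thread.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical in-memory stand-in for the database.
data Db = Db
    { dbRepos  :: Map.Map String String  -- repo name -> description
    , dbIssues :: Map.Map Int String     -- issue number -> title
    } deriving (Eq, Show)

-- Hypothetical snapshot of what one fetch from GitHub returned.
data Fetched = Fetched
    { fetchedRepos        :: [(String, String)]
    , fetchedClosedIssues :: [Int]
    , fetchedOpenIssues   :: [(Int, String)]
    }

-- Map.union is left-biased, so freshly fetched values overwrite stored ones.
upsertRepos :: [(String, String)] -> Db -> Db
upsertRepos repos db = db { dbRepos = Map.union (Map.fromList repos) (dbRepos db) }

deleteClosedIssues :: [Int] -> Db -> Db
deleteClosedIssues closed db =
    db { dbIssues = Map.withoutKeys (dbIssues db) (Set.fromList closed) }

upsertIssues :: [(Int, String)] -> Db -> Db
upsertIssues issues db = db { dbIssues = Map.union (Map.fromList issues) (dbIssues db) }

-- | One sync pass. Composition runs right to left, so the order is:
-- 1. upsert repos, 2. delete closed issues, 3. upsert open issues.
syncOnce :: Fetched -> Db -> Db
syncOnce fetched =
    upsertIssues (fetchedOpenIssues fetched)
        . deleteClosedIssues (fetchedClosedIssues fetched)
        . upsertRepos (fetchedRepos fetched)

main :: IO ()
main = do
    let db0 = Db (Map.fromList [("repo-a", "old description")])
                 (Map.fromList [(1, "open issue"), (2, "issue closed upstream")])
        fetched = Fetched
            [("repo-a", "new description")]
            [2]
            [(1, "open issue, retitled"), (3, "new open issue")]
    print (syncOnce fetched db0)
```

Because each pass does its work in a fixed order, deletion and upserting never run against the table at the same time, which is the point of dropping the concurrent garbage collection idea.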

@rashadg1030
Collaborator Author

@chshersh Sounds good


2 participants