-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache should not be racy #49
Comments
@Boshen had asked this over there:
I think both can work, it just depends. Well rather, I would never use a condvar unless you're absolutely sure you need a condvar ;) The basic question is "Mutex or message passing?" and in my opinion, mutexes tend to just be easier more often. Really just depends, if you want to do an actor-style "this task owns this data and responds to messages about it" that works fine too. |
Speaking with @anonrig in private messages, this won't actually work, because the caller can assume that after this returns true, the package is downloaded, and so that could cause issues if that work began before the task that's currently downloading it is finished doing so. So this won't work. So really what we're looking for is a sort of synchronization point, where for example things block on "everything is downloaded" rather than a bunch of individual "this is downloaded" responses. I feel like that's slightly poorly worded but I hope it makes sense. |
What about semaphores over individual packages? Or has been already discarded? 😅 |
I think this might be the best approach. |
So in #42, one issue that comes up multiple times is cache misses.
Basically,
Registrymanager::get_package
is racy:This works but has the property of a "race condition." (but not a data race!!!) If we spin up two threads, and both are looking for the same package, and thread 1 goes "1, don't have it, go to step 2 and fetch" and then thread 2 starts and does the same, we've fetched the package twice. This manifests as a cache miss.
The alternative is to cause 1 to block in some fashion. We want it to hold some sort of lock, so that if someone else wants something we're working on, we ask them to wait until it's ready. But there's not just one way to do this!
The first way, and the way that's most obvious, is that you put a mutex around the whole thing. That way, if someone is downloading a new package, you wait until they're done. The issue there is that you'd end up blocking everyone when a package is being downloaded, which kind of defeats the purpose of downloading them in paralell, haha.
So my next thought is that we need to do is change the data modelling from "package exists or not" to "package doesn't exist, package is being downloaded, package is complete in the cache." This more properly represents what's going on in the real world. But it's not actually the best way forward. Why? Well, what happens when we have a cache hit? We want to simply return the package. If we know that some other task is already downloading the package, what we actually want is to return. We have no more work to do! Sure it's not ready yet, but the other task will make sure it is.
So, I think that's the task to fix this issue. I don't think it should be a super complex diff, but I'm not going to try and write a patch right this second.
The text was updated successfully, but these errors were encountered: