Thread local prngs #4331
base: master
Conversation
Overall LGTM, but I have a question about multiple gRandomEngine instances all getting the same seed.
 #include <numeric>
 #include <set>

 namespace stellar
 {

-stellar_default_random_engine gRandomEngine;
+thread_local stellar_default_random_engine
+    gRandomEngine(getLastGlobalStateSeed());
If I understand correctly, it looks like it's very likely that non-main threads will all get seeded with the same initial seed (unless reinitializeAllGlobalStateWithSeedInternal is called in between the threads spinning up). Is this an issue? I can imagine two potential problems:
- Attacker observes a PRNG value on thread 0 and now has advance knowledge of the next PRNG value on thread 1. (seems highly unlikely)
- Some work is being done. Part 1 of the work occurs on thread 0 and calls randomGenerate for some value. Part 2 of the work occurs on thread 1 and calls randomGenerate again, but receives the same value as before. Could this break an assumption that the work receives two different PRNG values?
Or am I like super overthinking this since it's PRNG and not RNG anyway...
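To make the second scenario concrete, here is a minimal stand-alone sketch (using std::mt19937 and a fixed literal seed as stand-ins for stellar_default_random_engine and getLastGlobalStateSeed()) of two threads whose thread-local engines start from the same seed and therefore observe the same first value:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>

// Stand-in for the thread-local engine in the diff above.
thread_local std::mt19937 tlEngine(12345);

int
main()
{
    std::uint32_t fromThread0 = 0;
    std::uint32_t fromThread1 = 0;
    // "Part 1" of some work runs on one thread, "part 2" on another.
    std::thread([&] { fromThread0 = tlEngine(); }).join();
    std::thread([&] { fromThread1 = tlEngine(); }).join();
    // Both threads seeded their engine identically, so both parts of the
    // work see the same "random" value.
    assert(fromThread0 == fromThread1);
}
```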
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too worried about this in production: because we don't spin up threads on demand, the threads' sequences will diverge fairly quickly (assuming we generate a reasonable number of random values).
For tests, it's a different story: tests tend to spin up some "app" that in turn spins up threads, so those threads will all be seeded the same way and will behave essentially identically in the context of that test. This has the potential to hide a large class of bugs.
A way to mitigate all of these issues may simply be to give each thread a different (but deterministic) seed: when spinning up a new thread, generate a fresh seed on the main thread and use it to initialize the new thread's engine.
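A rough sketch of that approach, with std::mt19937 and a hypothetical spawnWorker helper standing in for the real engine type and thread-startup path: the spawning thread draws a fresh seed from its own engine, and the new thread uses it to seed its thread-local engine before doing any work.

```cpp
#include <cstdint>
#include <random>
#include <thread>

thread_local std::mt19937 tlEngine; // each thread's private engine

// Hypothetical helper: not a stellar-core API, just the shape of the idea.
std::thread
spawnWorker(std::mt19937& parentEngine)
{
    // Draw the child's seed on the spawning thread, so seeds stay
    // deterministic with respect to the parent engine but differ per thread.
    std::uint32_t childSeed = parentEngine();
    return std::thread([childSeed] {
        tlEngine.seed(childSeed); // seed before any use on this thread
        // ... worker body draws from tlEngine from here on ...
    });
}

int
main()
{
    std::mt19937 mainEngine(42); // deterministic top-level seed
    spawnWorker(mainEngine).join();
    spawnWorker(mainEngine).join();
}
```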
This may be a new consideration: until recently, PRNGs were not really affected by thread scheduling; now I am not sure what the story is (and thread-local storage may be the way to remove the dependency on the OS scheduler).
This moves the global PRNG to thread-local, removes the separate PRNG inside the QIC, and also moves the global signature-verification cache to be thread-local (it probably doesn't have to be, but less inter-thread pollution and contention is probably better).
This is motivated by some work going on in #4258: where that PR gave the global signature cache yet another sub-PRNG, I decided to sketch out what it might look like to go the other way and make the global PRNGs (and the signature cache) all thread-local. It seems to me a nicer approach, and more likely to avoid having to plumb PRNG-seeding paths through future code that happens to take random decisions. But it might also strike some as distasteful. Happy to discuss!