Thread local prngs #4331
base: master
Conversation
Overall LGTM, but I have a question about multiple gRandomEngine instances all getting the same seed.
 #include <numeric>
 #include <set>

 namespace stellar
 {

-stellar_default_random_engine gRandomEngine;
+thread_local stellar_default_random_engine
+    gRandomEngine(getLastGlobalStateSeed());
If I understand correctly, it looks like it's very likely that non-main threads will all get seeded with the same initial seed (unless reinitializeAllGlobalStateWithSeedInternal is called in between the threads spinning up). Is this an issue? I can imagine two potential problems:
- Attacker observes a PRNG value on thread 0 and now has advance knowledge of the next PRNG value on thread 1. (seems highly unlikely)
- Some work is being done. Part 1 of the work occurs on thread 0 and calls randomGenerate for some value. Part 2 of the work occurs on thread 1 and calls randomGenerate again, but receives the same value as before. Could this break an assumption that the work receives two different PRNG values?
Or am I like super overthinking this since it's PRNG and not RNG anyway...
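To make the second scenario concrete, here is a minimal stand-alone sketch (using std::mt19937 and a fixed literal seed as stand-ins for stellar_default_random_engine and getLastGlobalStateSeed()) of two threads whose thread-local engines start from the same seed and therefore observe the same first value:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <thread>

// Stand-in for the thread-local engine in the diff above.
thread_local std::mt19937 tlEngine(12345);

int
main()
{
    std::uint32_t fromThread0 = 0;
    std::uint32_t fromThread1 = 0;
    // "Part 1" of some work runs on one thread, "part 2" on another.
    std::thread([&] { fromThread0 = tlEngine(); }).join();
    std::thread([&] { fromThread1 = tlEngine(); }).join();
    // Both threads seeded their engine identically, so both parts of the
    // work see the same "random" value.
    assert(fromThread0 == fromThread1);
}
```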
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too worried about this in production: because we don't spin up threads on demand, the threads' sequences will diverge fairly quickly (assuming we generate a reasonable number of random values).
For tests, it's a different story: tests tend to spin up some "app" that in turn spins up threads, so those threads will all be seeded the same way and will behave essentially identically in the context of that test. This has the potential to hide a large class of bugs.
A way to mitigate all of these issues may simply be to give each thread a different (but deterministic) seed: when spinning up a new thread, generate a fresh seed on the main thread and use it to initialize the new thread's engine.
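A rough sketch of that approach, with std::mt19937 and a hypothetical spawnWorker helper standing in for the real engine type and thread-startup path: the spawning thread draws a fresh seed from its own engine, and the new thread uses it to seed its thread-local engine before doing any work.

```cpp
#include <cstdint>
#include <random>
#include <thread>

thread_local std::mt19937 tlEngine; // each thread's private engine

// Hypothetical helper: not a stellar-core API, just the shape of the idea.
std::thread
spawnWorker(std::mt19937& parentEngine)
{
    // Draw the child's seed on the spawning thread, so seeds stay
    // deterministic with respect to the parent engine but differ per thread.
    std::uint32_t childSeed = parentEngine();
    return std::thread([childSeed] {
        tlEngine.seed(childSeed); // seed before any use on this thread
        // ... worker body draws from tlEngine from here on ...
    });
}

int
main()
{
    std::mt19937 mainEngine(42); // deterministic top-level seed
    spawnWorker(mainEngine).join();
    spawnWorker(mainEngine).join();
}
```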
This may be a new consideration: until recently, PRNGs were not really affected by thread scheduling; now I am not sure what the story is (and thread-local storage may be the way to remove the dependency on the OS scheduler).
This moves the global PRNG to thread-local, removes the separate PRNG inside the QIC, and also moves the global signature-verification cache to be thread-local (it probably doesn't have to be, but less inter-thread pollution and contention is probably better).
This is motivated by some work going on in #4258: where that PR gave the global signature cache yet another sub-PRNG, I decided to sketch out what it might look like to go the other way and make the global PRNGs (and the signature cache) all thread-local. It seems to me a nicer approach, and more likely to avoid having to plumb PRNG-seeding paths through future code that happens to take random decisions. But it might also strike some as distasteful. Happy to discuss!