Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace PRNG with hardware RNG #4225

Merged
merged 4 commits into from
Dec 20, 2024
Merged

Conversation

DedeHai
Copy link
Collaborator

@DedeHai DedeHai commented Oct 25, 2024

Both ESP8266 and ESP32 have a hardware random register. This update makes use of that. It is slightly faster than the fastled variants but mostly it is truly random, even when the timing limitations stated in the datasheet are disregarded. Also saves a bit on code size.

  • Replaced all random8() and random16() calls with new hw_random() versions
  • Not replaced in FX where PRNG is required

I started this for the particle system, where good randomness is required and made a more elaborate version with this PR.

As a test I filled an array of 128 with random numbers from both fastled random16() and hw_random() and found that the hardware variant gives a perfectly linear distribution while fastled PRNG does not: generate lots of numbers, sort them ascending in excel, plot and add a trend line to compare, see if the trendline goes (approximately) through (0,0). hw_random() results does, fastled random does not.

Please Review the changes to make sure I did not replace where not adequate. Also: if there is a better place than fcn_declare.h to add the #include and #define I can move it.

Both ESP8266 and ESP32 have a hardware random register. This update makes use of that. It is slightly faster than the fastled variants but mostly it is truly random, even when the timing limitations stated in the datasheet are disregarded. Also saves a bit on code size.

- Replaced all random8() and random16() calls with new hw_random() versions
- Not replaced in FX where PRNG is required
@DedeHai DedeHai requested a review from blazoncek October 25, 2024 06:20
@softhack007
Copy link
Collaborator

softhack007 commented Oct 25, 2024

Hi @DedeHai,

I have two points we maybe need to discuss:

A) on esp32, is there a benefit compared to esp_random() ?

B) there was a design decision (time before you joined) to use fastled random functions in code related to effects. The reason behind was that some users wanted to have effects in sync on different devices (aka 'super-sync') which means all effects have the same timebase (strip.now) , plus they use a deterministic random function with the same seed. I see the benefit of a good random distribution , though. The previous decision for effects to use fastled random is not forever so let's discuss.

Besides much better random distribution, did you measure if performance of effects improves?

@softhack007 softhack007 added enhancement optimization re-working an existing feature to be faster, or use less memory labels Oct 25, 2024
@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 25, 2024

thanks for the feedback

  • esp_random() uses a delay loop to ensure entropy, that is what I first used and found it is not necessary for WLED but probably required for encryption purposes. also that function is not available on ESP8266.
  • I did read that PR discussion, these changes would not affect that if I am not mistaken: the random seed of the PRNG at bootup makes all calls to random8() and random16() different on different devices. The FX that require sync I left untouched (the ones that call get seed and set seed) but please correct me if I am mistaken.
  • I did not test performance and I would not expect it to be any different: in my test, calling both random16() and hw_random16() gave a maybe 5% speed boost (a few microseconds difference at 512 calls IIRC) so not noticeable if you do not call it thousands of times. But replacing all calls with the hw version makes code smaller by 300 bytes and gives true randomness.

as mentioned, I made this primarily for the particle system, where I found that random16() is just no random enough, it biases some of the effects (waterfall for example) and I have to use random() which is slower than this version. Still need to port this over and see if it makes any noticeable difference.

Copy link
Collaborator

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing hundreds of lines, where preprocessor can do it for you, is prone to errors.

wled00/fcn_declare.h Show resolved Hide resolved
@softhack007
Copy link
Collaborator

softhack007 commented Oct 25, 2024

the fastled PRNG is just a few lines of code. could put that in util.cpp as prandom16() or something similar and then explicitly distinct between PRNG and RNG. I can easily do that if someone can point me towards the ideas behind super-sync.

@DedeHai the topic was actually split over different "threads" of discussion. I think these are most important parts

  • Seed FastLED's RNG #3552
    • --> Introduced the idea to always use strip.now (instead of millis()) and replaced all remaining random() calls with fastled random functions
    • This means AC WLED only implemented a "lightweight" version of super sync. If I remember correctly, the main objective was to have segments in sync as much as we can, plus allow effect to do "almost the same thing" when running on different devices that are "synced"
  • The full "super sync" idea originated in the MM fork. I have to say this activity was never 100% competed - we mainly tried to sync single effects to understand what's needed to make them "super sync".

@blazoncek blazoncek dismissed their stale review October 25, 2024 12:08

Ignore requirement to use preprocessor.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Oct 25, 2024

So the supersync idea using strip.now is still at a concept stage, is that correct?
IMHO the idea is flawed. Sync would need to be frame based, not time based between different controllers. My reasoning is this: the random16() functions are called by frame, if one controller runs at 60FPS and another doing more segments for example runs at 50FPS, they will not be in sync. Or what is the idea?
To me, that is a seperate topic and does not really have anything to do with this PR. The changes to FX are easy to revert back to a PRNG, the changes I made should not affect the current state of things.

@blazoncek
Copy link
Collaborator

My reasoning is this: the random16() functions are called by frame, if one controller runs at 60FPS and another doing more segments for example runs at 50FPS, they will not be in sync.

Thank you for putting it this way. I could not figure out what I disliked about "super-sync" but it looks this is one of the things that bothered me.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Dec 20, 2024

where do we stand on this PR? The way I see it the issues raised about random vs. pseudo random and supersync are something that needs to be addressed seperately and for the time being this PR improves true randomness of the current implementation.

@willmmiles
Copy link
Collaborator

I'm happy to accept a faster random implementation. I think "Super-sync" might be better implemented by the "copy FX output" PR - it could be used to target network strip types.

@DedeHai DedeHai merged commit 5f77478 into Aircoookie:main Dec 20, 2024
19 checks passed
@softhack007
Copy link
Collaborator

softhack007 commented Dec 20, 2024

where do we stand on this PR? The way I see it the issues raised about random vs. pseudo random and supersync are something that needs to be addressed seperately

Hi @DedeHai, I think the PR is ok. To come back to the "supersync" discussion, there are several possible use cases

  1. same board, several segments (with different sizes) should show the same effect "in sync"
    --> could not be done with the "copy segment" effect, however will not work as-is today anyway, because we would need to make sure that all effects start with the same random seed (effect already use a common "clock" from strip.now).
    --> Possible, but not fully implemented yet.
  2. as a variant of 1), but with several outputs or segments having the same dimensions. To keep outputs in sync, the best way is to mirror busses using "custom bus start index".
  3. the "super-sync cluster" - running the same effect on several boards, each board controlling a small patch of the fixture. For example 9 esp32, each running a section of "octopus" on its own 24x24, giving you a 72x72 octopus effect.
    --> this was the original usecase that we explored in the MoonModules fork. This use case requires synchronization of local clocks, framerates, and random seeds.
    --> However this was at a time where a local 72x72 effect was so slow that it would crawl along at a handfull of FPS.
    --> In fact I think with the speedups we have already achieved, and (maybe) using the new async udp output driver from @troyhacks, this use case can be implemented better with one esp32-S3 running the complete effect at 128x128 (30fps), and then pushing pixels to 9 sub-devices via DDP or art-net.

My summary is that only case #1 (and maybe #2) could still be relevant to users. Case 1 could still be implemented, if we implement a "switch" in hw_random() that falls back to fastled pseudo-random when the user explicitly wants this. Plus a bit of extra code to always set the same random seed before calling each effect function.

So yes, let's proceed to merge.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Dec 21, 2024

I thought about this and came up with a possible solution:

  • add an option to "enable segment sync"
  • add functions (or maybe a class) to generate random numbers:
    • if segment sync is disabled, all effects use HW RNG
    • if sync is enabled it uses a PRNG - the prng has a current seed and a "frame seed", in service() the PRNG is reset to frame seed before calling each FX, ensuring they use the same random numbers. before calling the first FX, it advances the "frame seed"
  • PRNG will be a bit slower, on ESP32 the hardware CRC unit can be used to "shuffle" random numbers for better randomness (I used this on a STM32 once and it gives good results) -> will be a bit slower but should still be pretty fast, CRC takes 16 clock cycles for a 32bit number. ESP8266 does not have a CRC unit though so if sync between controller is implemented, sync between ESP32 and ESP8266 would not be possible
  • doing it this way may even allow particle FX to be in sync, given that the segment size is identical
  • I am not sure which FX also require the segment to be the same size for this to work

edit:
gave this some more thought and the proposed solution could be done by having two functions to get a 32bit random number: one returns the hardware register, one returns a xorshift random number which should have very good randomness at the cost of a few extra instructions compared to fastled random.
using a function pointer and a struct or class allows to choose one or the other (globally) with a "set_rng_mode()" function that assigns the correct function pointer.
all non-FX random numbers can still use the hw_randomx() functions from this PR that always use the hardware RNG, FX use rng_random(). the disadvantage is that using a function pointer, inlining is of course no longer possible, resulting in function call overhead, nothing to be too worried about though.

Questions:

  • if a "per frame" PRNG as proposed is used would that solve the sync for all or at least most FX?
  • this approach would NOT allow to sync segments in groups, for example have 4 segments with the same FX, sync 2 and 2 -> the copy segment FX could be used in such a case
  • any other requirements?

I could add this PRNG approach easily if requirements are clear, so additional comments and inputs are welcome.

edit2: for the math enthusiasts: original paper about xorshift PRNG https://www.jstatsoft.org/article/download/v008i14/916

@willmmiles
Copy link
Collaborator

  1. same board, several segments (with different sizes) should show the same effect "in sync"
    --> could not be done with the "copy segment" effect, however will not work as-is today anyway, because we would need to make sure that all effects start with the same random seed (effect already use a common "clock" from strip.now).
    --> Possible, but not fully implemented yet.

I'm curious - is there an example of an FX that uses randomness but will be "in sync" with different segment sizes? Even if using the same PRNG/seed for each frame, I'd expect that most FX will render distinctly differently if the segment sizes aren't identical, eg. particles will be placed at different locations because they modulo over the segment size, or the like.

I definitely agree that "super-sync" cases 2 and 3 are best solved with mapping (either bus tools, copy FX, DDP, or the like).

Re PRNG: My $0.02 is that -- if there is a use case for #1, above -- then we should probably always use the PRNG for FX (seeded by the hw RNG or sync'd value) rather than make dynamic function calls. Keep it simple.

@softhack007
Copy link
Collaborator

softhack007 commented Dec 21, 2024

is there an example of an FX that uses randomness but will be "in sync" with different segment sizes

Yes there are some - like multi comet (random "launch" of comets), or drip (random "drop-off"), or fireworks starburst (random "bursts"). Plus some trivial ones that are not random, but strictly time/frame dependant - like twoDots, Akemi, DNA, GEQ, ....

@willmmiles
Copy link
Collaborator

is there an example of an FX that uses randomness but will be "in sync" with different segment sizes

Yes there are some - like multi comet (random "launch" of comets), or drip (random "drop-off"), or fireworks starburst (random "bursts"). Plus some trivial ones that are not random, but strictly time/frame dependant - like twoDots, Akemi, DNA, GEQ, ....

All of those have size-dependent RNG usage:

  • Multi comet uses the strip length to decide when to launch - different lengths will diverge quickly.
  • Drip will diverge if the vertical size differs (drops will hit bottom and reset at different times). For the same vertical dimension, partial "copy fx" would be equivalent
  • Fireworks starburst count and locations depend on segment length, so launch times will diverge

Synchronized rendering (ie. keeping frame counts in sync) is another ball of wax entirely. We'd either have to decouple the model calculation from the frame output timing (ie. separate physics from rendering - would change a lot of current FX); arrange bidirectional communication so no segment or node moves on to the next frame until everyone's up to date; or loop in the FX function until it's "caught up" if we notice we've fallen behind. All of which are doable but nontrivial. :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement optimization re-working an existing feature to be faster, or use less memory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants