TTGO doesn't come up when flashed using Web Installer #367
Comments
Welcome. What you're seeing isn't a controlled "hey, let's reboot now".
That's just a plain ole crash.
(E) (TerminateHandler)(C1) - NightDriverStrip Guru Meditation Unhandled Exception -
Unhandled exceptions are, well, bad.
How confident are you that A) this firmware is appropriate for your board
and B) there aren't power supply issues?
For (B), on most SBCs (well, those that don't have 8,000 light bulbs
attached to them) the two most power hungry things are writing to flash and
starting up the WiFi. If those happen and your power supply is too wimpy
(an old phone charger, wires too small/long, etc.) the board will usually
just crash and what you're describing is about the first case of both of
those being ignited at the same time. So check your power. Attach a scope
to VCC and trigger for < 4.8V or so.
For (A), it can just be a bit of frustrating hit and miss.
https://web.esphome.io/ has a bunch of ESP32 binaries that'll boot a lot of
boards, but it can still be frustrating to find what a random $4 asian
board _really_ corresponds to. It's a bit frustrating that that page leans
harder on older hardware but is pretty scant for, say, ESP32-S3 boards.
I think the 'nightdriver' target is pretty specific to Dave's boards as it
relies on an exact combination of the mic being on these pins and the flash
being on that pin and the remote on this pin and so on. I've not had great
experiences booting it on a random board, but I've not really rolled up my
sleeves to tackle why.
FWIW, if it halves your testing matrix, you can just whack the
"USE_NETWORK" in the configuration when building firmware to see if that's
a key variable. I have it turned off in my development work just because it
takes up size and speed and I'm focused on quick testing. The info I need
comes to the serial console anyway, so the web interface doesn't help me.
Welcome and good luck!
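The build-time toggle suggested above could look roughly like the following. This is a minimal sketch, not code from the NightDriverStrip tree: the `USE_NETWORK` name is taken from the comment, while `startNetwork()` and `bootSequence()` are hypothetical helpers invented for illustration.

```cpp
#include <string>
#include <vector>

// Assumed build flag, named after the comment above. A real build would
// normally set it on the compiler command line (-DUSE_NETWORK=0 or 1).
#ifndef USE_NETWORK
#define USE_NETWORK 0
#endif

// Hypothetical stand-in for bringing up WiFi and the web interface.
void startNetwork(std::vector<std::string>& steps) {
    steps.push_back("network");
}

// Returns the startup steps this build performs, in order.
std::vector<std::string> bootSequence() {
    std::vector<std::string> steps;
    steps.push_back("serial console");
#if USE_NETWORK
    startNetwork(steps);  // compiled in only when the flag is non-zero
#endif
    steps.push_back("effects");
    return steps;
}
```

If the crash disappears with the flag at 0 and returns with it at 1, the network bring-up (or the power it draws) is the variable worth chasing.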
|
Just so you have a reference for A/B testing, here's a successful boot on a
mesmerizer build on official Dave hardware:
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13192
load:0x40080400,len:3028
entry 0x400805e4
E (927) esp_core_dump_flash: No core dump partition found!
E (927) esp_core_dump_flash��r���ɕ�"յ�����ѥѥ���2�չ��jR�Replacing Idle Tasks
with TaskManager...
(I) (PrintOutputHeader)(C1) NightDriverStrip
(I)
(I) (PrintOutputHeader)(C1) ------------------------------------------------------------------------------------------------------------
(I) (PrintOutputHeader)(C1) M5STICKC: 0, USE_M5DISPLAY: 0, USE_OLED: 0, USE_TFTSPI: 0, USE_LCD: 0, USE_AUDIO: 1, ENABLE_REMOTE: 1
(I) (PrintOutputHeader)(C1) ESP32 PSRAM Init: OK
(I) (PrintOutputHeader)(C1) Version 37: Wifi SSID: "ElderOfTheInternet" - ESP32 Free Memory: 293528, PSRAM:4192059, PSRAM Free: 4187691
(I) (PrintOutputHeader)(C1) ESP32 Clock Freq : 240 MHz
(I) (setup)(C1) Startup!
(I) (setup)(C1) Starting DebugLoopTaskEntry
> Launching JSON Writer Thread. Mem: 293492, LargestBlk: 110580, PSRAM Free: 4187691/4192059, (W) (DeviceConfig)(C1) DeviceConfig could not be loaded from JSON, using defaults
(W) (NotifyJSONWriterThread)(C1) >> Notifying JSON Writer Thread
Starting SmartMatrix Mallocs
Heap/32-bit Memory Available: 290068 bytes total, 110580 bytes largest free block
8-bit/DMA Memory Available : 241292 bytes total, 110580 bytes largest free block
Total PSRAM used: 4368 bytes total, 4187691 PSRAM bytes free
SmartMatrix Layers Allocated from Heap:
Heap/32-bit Memory Available: 288592 bytes total, 110580 bytes largest free block
The "esp_core_dump_flash" thing looks scary. I'm pretty sure I know what
the issue is and would fix it, but I can't find the source. :-)
My PrintOutputHeader is different because I tweaked the Mesmerizer build as
I described.
The PSRAM might be a clue. If you're running a build (like Mesmerizer) on a
board that assumes less RAM and/or doesn't have external PSRAM, that's
probably not good, though I don't know the precise symptoms.
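For A/B testing two serial logs, the PSRAM figures in the boot header can be pulled out mechanically. The sketch below is a host-side helper, not firmware code. `valueAfter()` and `reportsPsram()` are hypothetical names, and it assumes a board without PSRAM prints `PSRAM:0` in the same header line, which I haven't verified.

```cpp
#include <cstdlib>
#include <string>

// Returns the number that follows `key` in a log line, or -1 if the key is absent.
// Hypothetical helper for comparing serial logs; it is not part of the firmware.
long valueAfter(const std::string& line, const std::string& key) {
    const std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) return -1;
    // strtol skips leading whitespace and stops at the first non-digit.
    return std::strtol(line.c_str() + pos + key.size(), nullptr, 10);
}

// True when the boot header reports a non-zero amount of PSRAM.
// Assumption: a board without PSRAM reports 0 here rather than omitting the field.
bool reportsPsram(const std::string& headerLine) {
    return valueAfter(headerLine, "PSRAM:") > 0;
}
```

Feeding it the "Version 37" line from the log above and the matching line from the failing board shows at a glance whether the two boards agree on external RAM.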
…On Wed, Jul 19, 2023 at 10:03 AM Robert Lipe ***@***.***> wrote:
|
There may be a "cleaner" way to trigger reboots on ESP32s, but the approach taken in this project is indeed to throw an
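This reboot happening makes sense if the build has been configured to require WiFi. That is the case if both ENABLE_WIFI and WAIT_FOR_WIFI are defined as non-zero. In the "regular" project configurations as they stand, this is only the case for the LEDSTRIP project. That is obviously not the same as the TTGO project. So: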
|
> There may be a "cleaner" way to trigger reboots on ESP32s, but the approach
ESP.restart();
is used elsewhere in the project. Is it not available here?
My work tree is unhappy enough I can't currently submit a CL in good faith,
but just plopping that into src/main.cpp compiles. Citation? Gladly.
https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/misc_system_api.html
While it may seem uncomfortable to rely on ESP-specific magic, there are
references to the IDF system libraries all over the tree - including in
main.cpp
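A deliberate restart along those lines might be wrapped like this. It is only a sketch: `rebootBoard()` and `g_restartRequests` are hypothetical names, and the host-side branch exists purely so the path can be exercised off-device. The `ESP.restart()` call is the one mentioned above and is only compiled on Arduino builds.

```cpp
#include <cstdio>
#ifdef ARDUINO
#include <Arduino.h>  // provides the ESP object on Arduino-ESP32 builds
#endif

// Counts restart requests on host builds, where there is no chip to reset.
int g_restartRequests = 0;

// Hypothetical helper: log the reason, then restart on purpose instead of
// relying on an unhandled exception to bring the board down.
void rebootBoard(const char* reason) {
    std::printf("Rebooting: %s\n", reason);
#ifdef ARDUINO
    ESP.restart();        // the call mentioned above
#else
    ++g_restartRequests;  // host stand-in so the path can be tested
#endif
}
```

The point of the wrapper is that the log line and the action always match: if it says "Rebooting", a reboot is what was requested.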
|
Yes, it had to be something as straightforward as that, didn't it? 🙂 I think what I mentioned as "current project MO" is a case of semi-consistently applied legacy code. Which I'll instantly grant can be replaced by more modern/now recommended approaches. However, personally I'm not going to implement that change in the context of this issue. If someone else wants to open a PR to do so, I'm very happy to review it. |
Is there a scenario in which it doesn’t restart? If the default behavior on an unhandled exception is to restart, then I think what we’re doing now is pretty clean.
We never actually restart - we just have a TerminateHandler to display a cute “Guru Meditation” and as a handy spot to set a breakpoint. We then rethrow that which was thrown. I don’t see an incentive to manually call restart.
Sheer elegance. Convince me otherwise :-)
- Dave
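For readers unfamiliar with the pattern described here, a terminate handler in standard C++ can be sketched as below. This is not the project's actual TerminateHandler. `describeException()`, `terminateHandler()` and `installTerminateHandler()` are illustrative names, and this version ends with `std::abort()` after printing, which is an assumption about how one might finish such a handler rather than a description of what NightDriverStrip does.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>
#include <string>

// Turns an exception_ptr into a printable message by rethrowing it locally.
std::string describeException(std::exception_ptr ep) {
    if (!ep) return "no active exception";
    try {
        std::rethrow_exception(ep);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception type";
    }
}

// Called by the runtime when an exception goes unhandled. A terminate
// handler must not return, so this one prints and then aborts.
void terminateHandler() {
    std::fprintf(stderr, "Guru Meditation: %s\n",
                 describeException(std::current_exception()).c_str());
    std::abort();
}

// Registers the handler; typically called once, early in startup.
void installTerminateHandler() {
    std::set_terminate(terminateHandler);
}
```

The handler body is also a convenient place for a breakpoint, which matches the use described above.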
… On Jul 19, 2023, at 11:35 AM, Rutger van Bergen ***@***.***> wrote:
|
It's pretty clear that's a sucker's bet. No thanx, I'll pass.
I'll find a more exciting hill to die upon than answering a musing posted
by another developer. If not actually rebooting after printing
"rebooting..." is elegant, our zens just won't align on this.
Back to OP.
On Wed, Jul 19, 2023 at 1:39 PM David W Plummer ***@***.***>
wrote:
|
> What you're seeing isn't a controlled "hey, let's reboot now". That's just a plain ole crash.
It's a crash only insofar as it's the result of an unhandled exception, but the exception in question is caused by the lack of wifi credentials, which there's no opportunity to provide. See the last line of the log excerpt:
(E) (TerminateHandler)(C1) Terminated due to exception: Unable to connect to WiFi, but must have it, so rebooting
> FWIW, if it halves your testing matrix, you can just whack the "USE_NETWORK" in the configuration when building firmware to see if that's a key variable.
This issue is specific to the web installer; I'm not building from source. If I were, this wouldn't be a problem, because the correct credentials would have been set in secrets.h. But this bug report is explicitly about the web installer and proposes a way to fix it so that other people don't run into the same problem when they're trying to get their boards up and running. The vast majority of people who will end up using NDL won't be building from source; as with WLED, it'll be people who just want to click a button on a web installer to load a pre-compiled binary onto their ESP32 that they can immediately start using.
> This reboot happening makes sense if the build has been configured to require WiFi. That is the case if both ENABLE_WIFI and WAIT_FOR_WIFI are defined as non-zero. In the "regular" project configurations as they stand, this is only the case for the LEDSTRIP project. That is obviously not the same as the TTGO project.
I gave the TTGO project just as an example; this happens for every board I've tested with and every version of the firmware available via the web installer that uses wifi. The issue isn't that the reboot is happening per se, that's intentional behaviour due to the unhandled exception being thrown. The issue is that if the board immediately reboots when wifi credentials aren't available, then logically there will never be an opportunity to actually provide those credentials, making web installations effectively useless for anything other than projects that don't use wifi at all.
> Is there a scenario in which it doesn’t restart?
No, all combinations I've tried enter this reboot loop as soon as the firmware notices there are no wifi credentials available, making it effectively impossible to send it the credentials since the Improv service isn't alive long enough to communicate with the web installer. This seems to be intended behaviour: NDL intentionally throws an exception when it can't connect to wifi, and because that exception is unhandled the correct behaviour is to restart.
> If the default behavior on an unhandled exception is to restart, then I think what we’re doing now is pretty clean.
Absolutely, rebooting on an unhandled exception is perfectly fine. The issue here, though, is that I don't believe this should be an unhandled exception in the first place. After all, how can Improv connect to the board to deliver the wifi credentials it needs if the firmware reboots the moment it discovers it has no credentials?
A better way to handle it might be to simply wait for credentials to be delivered, just sit in an idle loop until Improv receives the correct RPC via serial. Otherwise I can't see how it would ever be possible to use the web installer to successfully load NDL onto a board; the only possible way to do it would be to build NDL from source with credentials preloaded in secrets.h.
I should note that WLED also uses Improv in the same way you're using it, and their solution is to just continue booting up like normal and display whatever the default strip pattern is, while also exposing an access point that the user can connect to in order to use the web interface and continuing to listen for Improv RPCs over the serial port. |
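The "wait for credentials instead of rebooting" idea proposed above can be sketched in isolation as follows. Everything here is hypothetical scaffolding rather than NightDriverStrip or Improv library code: `WifiCredentials`, `CredentialPoll` and `waitForCredentials()` are made-up names, and the poll callback stands in for whatever actually services the Improv serial RPCs.

```cpp
#include <functional>
#include <string>

struct WifiCredentials {
    std::string ssid;
    std::string password;
};

// Hypothetical poll callback: fills `out` and returns true once a provisioning
// channel (for example Improv over serial) has delivered credentials.
using CredentialPoll = std::function<bool(WifiCredentials& out)>;

// Polls until credentials arrive or `maxPolls` attempts have been made.
// Returns true on success. It never reboots, so the caller decides what
// happens when nothing arrives.
bool waitForCredentials(const CredentialPoll& poll, int maxPolls, WifiCredentials& out) {
    for (int attempt = 0; attempt < maxPolls; ++attempt) {
        if (poll(out)) return true;
        // On hardware, a short delay and normal effect rendering would go here.
    }
    return false;
}
```

The design choice that matters is that a missing credential is treated as a state to wait in, not as an error, so the provisioning channel stays alive long enough to be used.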
@sdmtr What I'm saying is two things:
@davepl About
As I'm not sure why the "reboot if WiFi connectivity fails" behavior was originally implemented, I was wondering if you could give some input on this? |
@rbergen RE your second point, you're absolutely right. My original comment mixes observations and logs from many different tests across different boards rather than focusing on one specific test, which has created some confusion and led to some inaccuracies on my part, sorry about that. I just did another handful of quick tests using just the TTGO board, and the ledstrip firmware is indeed the one that reboots when wifi credentials aren't present (as you correctly pointed out). The TTGO firmware, on the other hand, seems to be experiencing a different problem, although I'm not sure what exactly. Here's the full log:
For the sake of clarity, this is an authentic Lilygo TTGO ESP32-DOWDQ6 board, and I selected "ESP32" as the device type and "TTGO" as the project in the web installer interface. |
Thanks @sdmtr for clearing this up. It does help focus the analysis of the problems (now plural) we are investigating. Putting the no-WiFi reboot aside for now - I think we now know what we're looking at there - I'd say the logging on the TTGO crash doesn't provide too many insights as to what's going on. The "abort()" mention at the bottom of the log doesn't help much either, as the C++ code in the project doesn't call any function by that name. I don't own any TTGO boards myself, so I can't compare what you're seeing to anything useful at my end - maybe @davepl can. A question I do have is if you've tried flashing the board using the PlatformIO route? I know the issue relates to the web installer specifically, but trying to flash the board the other way may well narrow the area that needs to be covered while investigating this. |
Thing is, I’m not aware of any instances in which we reboot that we COULD continue. About the only case of “intentional” reboot is when wifi can’t be acquired, and that’s exceptional, so it’s an exception.
As far as I know, our Improv codepath works the same way, doesn’t it?
Let me know which specific scenario you’re thinking of that should improve.
- Dave
… On Jul 19, 2023, at 11:46 PM, sdmtr ***@***.***> wrote:
What you're seeing isn't a controlled "hey, let's reboot now". That's just a plain ole crash.
It's a crash only insofar as it's the result of an unhandled exception, but the exception in question is caused by the lack of wifi credentials, which there's no opportunity to provide. See the last line of the log excerpt:
(E) (TerminateHandler)(C1) Terminated due to exception: Unable to connect to WiFi, but must have it, so rebooting
FWIW, if it halves your testing matrix, you can just whack the "USE_NETWORK" in the configuration when building firmware to see if that's a key variable.
This issue is specific to the web installer, I'm not building from source. If I were then this wouldn't be a problem, the correct credentials would have been set in secrets.h, but this bug report is explicitly about the web installer and proposing a way to fix it so that other people don't run into this same problem when they're trying to get their boards up and running. The vast majority of people who will end up using NDL won't be building from source; as with WLED it'll be people who just want to click a button on a web installer to load a pre-compiled binary onto their ESP32 that they can immediately start using.
This reboot happening makes sense if the build has been configured to require WiFi. That is the case if both ENABLE_WIFI and WAIT_FOR_WIFI are defined as non-zero. In the "regular" project configurations as they stand, this is only the case for the LEDSTRIP project. That is obviously not the same as the TTGO project.
I gave the TTGO project just as an example, this happens for every board I've tested with and every version of the firmware available via the web installer that uses wifi. The issue isn't that the reboot is happening per se, that's intentional behaviour due to the unhandled exception being thrown; the issue is that if the board immediately reboots when wifi credentials aren't available then logically there will never be an opportunity to actually provide those credentials, making web installations effectively useless for anything other than projects that don't use wifi at all.
Is there a scenario in which it doesn’t restart?
No, all combinations I've tried enter this reboot loop as soon as it notices there are no wifi credentials available, making it effectively impossible to send it the credentials since the Improv service isn't alive long enough to communicate with the web installer. This seems to be intended behaviour, NDL intentionally throws an exception when it can't connect to wifi, and because that exception is unhandled the correct behaviour is to restart.
If the default behavior on an unhandled exception is to restart, then I think what we’re doing now is pretty clean.
Absolutely, rebooting on an unhandled exception is perfectly fine, the issue here though is that I don't believe this should be an unhandled exception in the first place - after all, how can Improv connect to the board to deliver the wifi credentials it needs if the firmware reboots the moment it discovers it has no credentials?
A better way to handle it might be to simply wait for credentials to be delivered, just sit in an idle loop until Improv receives the correct RPC via serial. Otherwise I can't see how it would ever be possible to use the web installer to successfully load NDL onto a board, the only possible way to do it would be to build NDL from source with credentials preloaded in secrets.h.
I should note that WLED also uses Improv in the same way you're using it, and their solution is to just continue booting up like normal and display whatever the default strip pattern is, while also exposing an access point that the user can connect to in order to use the web interface and continuing to listen for Improv RPCs over the serial port.
|
My question comes from the fact that we only treat WiFi not connecting as an exception worthy of rebooting in the LEDSTRIP project, not any of the others. In all other projects, we continue trying to connect in the main.cpp loop() every so many seconds. Rebooting immediately after establishing that no credentials are present, as LEDSTRIP does, keeps the user from providing credentials via Improv. That means that for LEDSTRIP, the correct credentials have to be embedded into the image (i.e. secrets.h) for the image to work. |
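The "keep trying every so many seconds" behaviour described for the other projects boils down to a small timer check inside the main loop. The class below is a hedged illustration, not the code in main.cpp: `RetryTimer` is a hypothetical name, and on hardware the `nowMs` argument would presumably come from something like Arduino's `millis()`.

```cpp
#include <cstdint>

// Decides when the next connection attempt is due. Hypothetical helper for
// illustration; the caller passes in the current time in milliseconds.
class RetryTimer {
public:
    explicit RetryTimer(std::uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // Returns true when at least one interval has elapsed since the last
    // attempt, and records the new attempt time. The first call is always due.
    // Unsigned subtraction keeps the comparison correct across counter wraparound.
    bool due(std::uint32_t nowMs) {
        if (started_ && static_cast<std::uint32_t>(nowMs - lastMs_) < intervalMs_) {
            return false;
        }
        started_ = true;
        lastMs_ = nowMs;
        return true;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastMs_ = 0;
    bool started_ = false;
};
```

A loop built on this keeps running, and keeps listening for Improv, between attempts, which is exactly what an immediate reboot prevents.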
Oh, ok, that’d be a bug. Please raise an issue specifically for that, and I’ll take it.
The reason LEDSTRIP reboots is that it’s remote-only, so no wifi, it’s dead in the water. But it needs to survive long enough to at least be able to set credentials!
- DAVE
… On Jul 20, 2023, at 8:27 AM, Rutger van Bergen ***@***.***> wrote:
|
I've opened #371 for the LEDSTRIP reboot issue, and renamed this one to focus on TTGO failing to come up. |
Bug report
Problem
Howdy! I've spent the last few hours trying to flash all manner of ESP32 devices with any and all versions of NDL available via the web installer, and each time I'm unable to set the wifi credentials. As an example, if I install the TTGO firmware to a Lilygo TTGO board then once the install is complete, I'm simply dumped back at the initial screen where my only two options are to install NDL or view the console. I did manage to get the wifi setup process to begin once by carefully timing when I clicked on the "connect" button (more about that below), but the process stalled when it reached the part where it searches for available wifi networks.
Steps
Notes
When I look at the console output, I can see that NDL attempts to retrieve wifi credentials from flash memory, notices that none are set, and therefore reboots. I believe (although I could be totally wrong) that this is where the problem lies, because the Improv module simply doesn't have enough time to connect to the board, retrieve the list of available SSIDs, and receive the credentials from the user, before the device reboots and the connection is dropped. Here's an excerpt of the console logs:
As mentioned above, the one time I was able to see an option to set the wifi credentials didn't come at the end of an installation, it came when I timed clicking the "connect" button such that the web installer connected to the board during a moment of time when the Improv serial module was up and responding. As far as I can tell, the installer makes an attempt to connect via Improv at the beginning of the session, and if it succeeds then it'll read the board settings (firmware version and hardware type) and display the option to set wifi credentials. This is why I think it's a timing issue and that the forced reboot is what's causing the problem in the first place.
Proposed Solution
Don't force a reboot when wifi isn't available. As per the final line in the log above, it seems that NDL restarts the board as soon as it's unable to connect to wifi, either because credentials aren't set or the network isn't available. If that happens then there's no opportunity for the web installer to connect to the board via Improv and set the credentials, so the board is rendered useless.
(Also, I just want to say how excited I am about this project and how utterly cool it is. I've used WLED for a LOT of stuff over the last few years but it has a few idiosyncrasies that I don't love, and NDL looks like it's shaping up to be an incredible replacement going forward. I can't wait to get my hands on a Mesmerizer board and really see what it can do. Thank you so much for making this project available to us mere mortals, Dave.)