Webgui almost empty after update, instable #828

Closed
emmrichd opened this issue Mar 29, 2023 · 47 comments
Labels: cant_repro (can not reproduce), stale (closed due to no response / progress)


@emmrichd

Hardware

  • [x] ESP8266
  • ESP32
  • Raspberry Pi

Modelname: ______
Retailer URL: ______

nRF24L01+ Module

  • [x] nRF24L01+: you verified this is a Plus model capable of the required 250 kbit/s mode
  • square dot indicates original Nordic Semicon chip
  • round dot indicates copy-cat / counterfeit SI labs chip

Antenna:

  • circuit board
  • [x] external antenna
    [Image: E77D3F67-8D3E-4553-BC81-E26AC25597DC]

Power Stabilization:

  • [x] 100uF Electrolytic Capacitor
    connected between +3.3V and GND (Pin 1 & 2) of the NRF Module
  • Voltage stabilizing motherboard

After an OTA update from 0.5.6, the GUI is almost empty. MQTT seems to run. Rebooting does not help. Should I start from scratch?

@rmayergfx

Which browser are you using? Did you force a reload of the webpage? Do you have any ad blockers installed? If so, please be sure to whitelist the IP of your AhoyDTU.

@Argafal
Contributor

Argafal commented Mar 29, 2023

I believe that might be a known bug making a reappearance. See issues #660 and #765.

@Argafal
Contributor

Argafal commented Mar 29, 2023

@emmrichd To rule out other reasons, maybe you could try the steps that @rmayergfx has suggested. Also starting from scratch (erase flash) will be a good idea. Please report back :)

@tastendruecker123
Contributor

tastendruecker123 commented Mar 29, 2023

I have looked into this a bit and I think the ESP8266 is running out of memory during concurrent requests. Here's what's happening on mine when I reload /setup:

[Screenshot]

The response for api.js looks like this:

[Screenshot: api.js response]

To me this looks like it's outputting random garbage from RAM. Sometimes api.js will load fine, but then style.css may fail in a similar fashion and the page looks like this:

[Screenshot: the resulting page]

And here's the response payload for style.css:

[Screenshot: style.css response payload]

During all of this the free heap hovers around 10-11 kB. /setup is 7.4 kB, api.js is 3kB, style.css is 2.5 kB, so overall that's 12.9 kB.
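A quick check of those numbers, as a sketch only: it assumes (a simplification, not something measured here) that each in-flight response keeps roughly its own size allocated on the heap until it has been sent. Under that assumption, parallel requests need the sum of the sizes, while serving them one after another needs only the largest one. `peakConcurrent`, `peakSequential` and `fitsInHeap` are hypothetical helpers, not Ahoy code.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Simplifying assumption (not measured): each in-flight response keeps about
// its own size allocated on the heap until it has been sent completely.

// Peak heap use if all responses are in flight at the same time.
long peakConcurrent(const std::vector<long>& sizes) {
    return std::accumulate(sizes.begin(), sizes.end(), 0L);
}

// Peak heap use if the responses are served strictly one after another.
long peakSequential(const std::vector<long>& sizes) {
    if (sizes.empty()) return 0L;
    return *std::max_element(sizes.begin(), sizes.end());
}

// True if a given peak still fits into the currently free heap.
bool fitsInHeap(long peak, long freeHeap) {
    return peak <= freeHeap;
}
```

With the sizes from this comment (treating 1 kB as 1000 bytes), `peakConcurrent({7400, 3000, 2500})` returns 12900, above the 10 to 11 kB of free heap, while `peakSequential` on the same list returns 7400, which would fit. It is a rough model of why concurrency matters, not a measurement of Ahoy's actual allocations.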

Edit: One additional quirk I found is that this problem is much more likely to happen (3 in 5 reloads) and easy to reproduce if the browser is sending a cookie together with the request. In my case I'm accessing Ahoy via an external URL that used to point to a Grafana instance, so the browser was sending the Grafana session cookie to Ahoy. If I delete the cookie, it works reasonably well. If I add a cookie to the request, it fails to load properly most of the time. So in order to reproduce the problem I would suggest using the browser's dev tools to add one or two random 50-60 byte cookies for the Ahoy URL.

@lukask005

lukask005 commented Mar 29, 2023

I have the same problem (Safari); on Chrome it's a bit better (ESP8266).

@emmrichd
Author

emmrichd commented Mar 29, 2023

I used an iPhone with Safari.
Refresh/reload did not help.
More tests later.
I used the same phone with the old version without problems.
If it is a RAM issue, why does the problem persist right after an ESP8266 reboot?

@pschlan

pschlan commented Mar 29, 2023

Same issue here on an ESP8266, newly flashed with current release, MacBook client (Chrome/Firefox).

@tastendruecker123
Contributor

Can you guys try the URL in a private or incognito window? Trying to check whether it's related to cookies, or whether it's happening because Apple devices may be more aggressive about making several HTTP requests at the same time.

@pschlan

pschlan commented Mar 29, 2023

Same issue in incognito
[Screenshot]
[Screenshot]

@pschlan

pschlan commented Mar 29, 2023

Can't reproduce this on ESP32, by the way

@emmrichd
Author

emmrichd commented Mar 29, 2023

Hello,
I am at home now. Windows 10 - Chrome:
One time I get an "ok view", then reload - empty again.
Uptime was only 13 min, so it seems to restart regularly. This was not observed with the previous release.
MQTT data was delivered all day, though.
I would guess there is no relation to the "LED-config-bug". I did a config export; the LEDs are set to 255.
Of course, I could reflash it now.
However, if it runs fine afterwards, I cannot provide any further bug observations.
So what now?

{"wifi":{"ssid":"LB30","pwd":"","dev":"AHOY-DTU2","adm":"","prot_mask":61,"dark":false,"ip":"","mask":"","dns1":"","dns2":"","gtwy":""},"nrf":{"intvl":30,"maxRetry":5,"cs":15,"ce":2,"irq":0,"sclk":0,"mosi":0,"miso":0,"pwr":2},"ntp":{"addr":"pool.ntp.org","port":123},"sun":{"lat":xx,"lon":xx,"dis":true,"offs":900},"serial":{"intvl":5,"show":false,"debug":false},"mqtt":{"broker":"192.168.180.209","port":1883,"user":"","pwd":"","topic":"inverter","intvl":0},"led":{"0":255,"1":255},"plugin":{"disp":{"type":0,"pwrSafe":false,"pxShift":false,"rotation":0,"contrast":60,"data":255,"clock":255,"cs":255,"reset":255,"busy":255,"dc":255}},"inst":{"en":false,"rstMidNight":false,"rstNotAvail":false,"rstComStop":false,"iv":[{"en":true,"name":"HM-1500-Dach","sn":xx,"yield":[0,0,0,0],"pwr":[420,420,420,420],"chName":["1","2","3","4"]},{"en":true,"name":"HM-1500-Garage","sn":xx,"yield":[0,0,0,0],"pwr":[360,360,360,360],"chName":["1","2","3","4"]},{"en":true,"name":"HM-800-Schuppen","sn":xx,"yield":[0,0,0,0],"pwr":[400,400,0,0],"chName":["1","2","",""]}]}}

@fila612
Contributor

fila612 commented Mar 30, 2023

Same here, but this also happened in previous (dev) versions, maybe starting around 0.5.8x.
[Screenshot 2023-03-30 at 07.47.09]

Clicking on API returns only "null".

When entering the settings via /setup, all the sections are empty: no WLAN, no inverter, no MQTT is shown, yet Ahoy is receiving data from the inverter and also sending it via MQTT.
It seems that it's "only" a visual issue...

Maybe a short clip shows the behaviour:

[Video: ahoy_bug.mov]

@emmrichd
Author

Hello,

I have flashed my esp8266 from scratch, including "wipe all data" via USB.
However, the odd behaviour remains, and the connection to the inverters can no longer be established.
With the last stable version, the system was working for about 4 months or so.

Dieter

@tastendruecker123
Contributor

@emmrichd

Can you check the pin configuration? The LED pins should be set to off, not 0. I assume the problem also occurs in private/incognito mode of the browser?

@Mogdar-M

Mogdar-M commented Mar 31, 2023

Same issue here.

ESP8266 V0.6.0

Accessing the DTU via smartphone shows the same issue as accessing it via PC,
so it cannot be the browser cache.
Opening the setup page takes quite some time.

After several refreshes the connection is back.

For a moment, the board entry in the footer shows ESP8266ESP8266ESP8266ESP8266ESP8266ESP8266

@Mogdar-M

Mogdar-M commented Mar 31, 2023

[Screenshot]
I don't know if this is related, but since the update to 0.6.0 the total is sometimes shown even though I don't have more than one inverter.

@sumerland

sumerland commented Mar 31, 2023

Same symptoms here with 0.6.0 on ESP8266 (flashed with full wipe). Problems occur on several browsers and operating systems. I am seeing the same errors in Chrome's developer tools.

I also noticed that sometimes clicking in the GUI while it is laggy can lead to a reboot of the esp8266.

EDIT: it just happened again. Reboot reason is "Software/ System restart"

@christian-karsie

Good afternoon.

I've had the same problems described above since updating from 0.5.66 to 0.6.0. At first I thought it was a problem with my hardware (Wemos board), but I then switched to an ESP8266 NodeMCU and have the same problems.

When the GUI is not working correctly and I run a ping loop against the AhoyDTU board (ESP8266 NodeMCU), I see some lost pings, and a few seconds after I opened the WebGUI the ESP8266 rebooted on its own. So something must be buggy.

@cyrax303

cyrax303 commented Apr 2, 2023

I also have the problem with a D1 Mini and 0.6.0.
Sometimes it helps to use another browser; most of the time I have to reboot the processor…

@sumerland

Not sure if it helps debugging... The errors mentioned above also appear in 0.5.96:
[Screenshot]
The important difference to 0.6.0 is that with 0.5.96 the DTU does not reboot.

@tastendruecker123
Contributor

I have looked into this a bit more. I'm writing down what I have learned so far because there doesn't seem to be an obvious quick fix, and the issue of available heap space may also be relevant in the future, so this information might continue to be useful.

I added a bit of code to output the available heap memory at different stages while a request is being processed (at the beginning, before sending the response and after sending the response). On a dummy test system without an NRF module connected, the output when accessing /setup looks like this:

```
W: onSetup start: 13032
W: onSetup send: 12800
W: onSetup finish: 11536
W: onColor start: 11720
W: onColor send: 11528
W: onColor finish: 10936
W: onCss start: 10272
W: onCss send: 10080
W: onCss finish: 8816
W: onApiJs start: 8256
W: onApiJs send: 8040
W: onApiJs finish: 6776
W: onApi start: 13320
W: onApi send: 6912
W: onApi finish: 5752
W: onApi start: 13272
W: onApi send: 6840
W: onApi finish: 6352
```

On a real system with one or more inverters configured, these numbers would obviously be lower. While serving static files, the server seems to run out of heap memory because multiple requests are being processed at the same time, so the style.css or api.js requests typically fail while /setup and colors.css are still being processed. The API requests take quite a lot of heap memory as well, but they happen at a later stage, so they're not as problematic.
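To pull the low-water mark out of debug output like the log above, a minimal sketch follows. `minFreeHeap` is a hypothetical helper, not part of Ahoy; it only assumes the `W: <handler> <stage>: <bytes>` line format shown in the log and skips any line that does not end in a number.

```cpp
#include <algorithm>
#include <climits>
#include <sstream>
#include <stdexcept>
#include <string>

// Returns the lowest free-heap value found in debug output with one
// "W: <handler> <stage>: <bytes>" entry per line, or -1 if none is found.
long minFreeHeap(const std::string& log) {
    std::istringstream in(log);
    std::string line;
    long lowest = LONG_MAX;
    while (std::getline(in, line)) {
        std::size_t colon = line.rfind(':');  // the value follows the last colon
        if (colon == std::string::npos) continue;
        try {
            lowest = std::min(lowest, std::stol(line.substr(colon + 1)));
        } catch (const std::exception&) {
            // not a heap line (no number after the last colon): ignore it
        }
    }
    return lowest == LONG_MAX ? -1 : lowest;
}
```

Fed the full log from this comment, it returns 5752 (the first `onApi finish` line); restricted to the static-file phase, the lowest value is 6776 (`onApiJs finish`).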

Possible fixes:

  • Increase the available heap memory. The ESP8266 Arduino core has an option to reduce the cache by 16kb and provide a secondary 16kb heap instead via build flags. Haven't looked at this in detail, but may be worth exploring.
  • Use browser caching for static files. Adding cache headers to the response would reduce the number of concurrent requests because the browser would only need to load these files once. However, it's important that we don't serve bad response data with a cache header so the browser doesn't cache the corrupted response.
  • Delay the response to wait for additional heap memory.
  • Combine the CSS files into one file to reduce the number of requests.
  • Use inline CSS or JS to reduce the number of requests.
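The "delay the response" option could start from an admission check like the minimal sketch below. Everything here is hypothetical: the `admit` function, the idea of a fixed safety reserve for the network stack, and the per-response size estimate are assumptions, and how a deferred request is eventually answered is left open.

```cpp
#include <cstddef>

// Hypothetical admission check for the "delay the response" idea.
enum class Admission { Serve, Defer };

// freeHeap:      currently free heap in bytes
// estimatedNeed: assumed heap the response will need while it is built and sent
// reserve:       assumed safety margin left for the Wi-Fi/TCP stack and other tasks
Admission admit(std::size_t freeHeap, std::size_t estimatedNeed, std::size_t reserve) {
    // Compare the sum instead of subtracting, so a low heap cannot underflow.
    if (estimatedNeed + reserve <= freeHeap) {
        return Admission::Serve;
    }
    return Admission::Defer;  // caller decides how and when to retry
}
```

On the ESP8266 the `freeHeap` argument would come from `ESP.getFreeHeap()`. For example, with an assumed 2048-byte reserve, a 7400-byte response is served at 11000 bytes free but deferred at 8000.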

@tastendruecker123
Contributor

Some more information:

Random failures or crashes due to low heap problems seem to be pretty common with ESPAsyncWebserver if it needs to deal with several requests at once. As far as Ahoy goes, the following possible fixes seem like they'd be viable:

  • Change the order of the CSS/JS includes in the HTML files so api.js is loaded first and colors.css is loaded last. This will make the website a tiny bit slower to render, but at least if a request fails, it will just affect the color scheme, not the functionality. It's usually the last request that fails, and right now that's api.js.
  • Add cache headers to the response so the browser only has to download the CSS/JS files once. This should be safe because broken responses don't have any headers, so there's no risk of caching a bad response.
  • Add this PR to our version of ESPAsyncWebserver that has a number of improvements to prevent problems with low heap memory. This should fix the problem permanently and improve the overall stability of Ahoy.

@ziermmar

ziermmar commented Apr 3, 2023

Similar problem here. After flashing the 0.6.0_prometheus version, the web-ui appears to become unstable after a while. Requests to the /api endpoint result in a "null" answer.

@tastendruecker123
Copy link
Contributor

> Similar problem here. After flashing the 0.6.0_prometheus version, the web-ui appears to become unstable after a while. Requests to the /api endpoint result in a "null" answer.

How many inverters are associated with Ahoy, and on the 'System' page, what does it say after 'heap_free'?

@AsZork

AsZork commented Apr 4, 2023

In web.h I added the line
`response->addHeader(F("Cache-Control"), "max-age=3600"); // only 1 hour`
to three responses: onFavicon, onCss and onColor. With that, the WebGUI works again on my ESP8266 systems.

@tastendruecker123
Contributor

Which browser are you using? I did some testing with Cache-Control in web.h and found that Firefox needed additional headers to actually cache the requests (Last-Modified).
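A minimal sketch of producing the `Last-Modified` value mentioned above, in the HTTP-date format (IMF-fixdate). `httpDate` is a hypothetical helper, and which timestamp to use (for example, the firmware build time) is an assumption. Assuming the same `addHeader` call as the Cache-Control line, the result could be passed as `response->addHeader(F("Last-Modified"), httpDate(t).c_str())`.

```cpp
#include <ctime>
#include <string>

// Formats a Unix timestamp as an HTTP-date, the format used in a
// Last-Modified header, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
// Day and month names come from the "C" locale, the default at program start.
std::string httpDate(std::time_t t) {
    std::tm* utc = std::gmtime(&t);  // broken-down time in UTC
    if (utc == nullptr) return "";
    char buf[32];  // "Www, DD Mmm YYYY HH:MM:SS GMT" is 29 characters
    std::size_t n = std::strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", utc);
    return std::string(buf, n);
}
```

Using a fixed timestamp per firmware build keeps the value stable across requests, which is what lets the browser validate its cached copy.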

@AsZork

AsZork commented Apr 4, 2023

I tried edge(Version 111.0.1661.62 (Offizielles Build) (64-Bit)), firefox(111.0.1 64-Bit) and Chrome(Version 111.0.5563.147 (Offizieller Build) (32-Bit)).
All three work with my HM and MI inverters. And yes, you have to open the pages twice before the cached data is used.

@ziermmar
Copy link

ziermmar commented Apr 4, 2023

> How many inverters are associated with Ahoy, and on the 'System' page, what does it say after 'heap_free'?

That's one inverter only. Last time I checked, heap_free was at 10264. I was able to view 2-3 pages before the DTU stopped responding to anything at all, so I had to reset it. I haven't noticed anything like it on 0.5.66. This issue definitely doesn't seem to be related to the browser cache.

Edit: At most, caching reduces the number of requests the webserver receives at a time. The underlying problem, however, seems to be that the webserver is struggling with too many incoming web requests.

@lumapu
Owner

lumapu commented Apr 4, 2023

Do you use the JSON API in parallel while browsing the Ahoy WebUI? This could be the reason why the AsyncWebServer could not answer all requests.

@tastendruecker123
Contributor

> Do you use the JSON API in parallel while browsing the Ahoy WebUI? This could be the reason why the AsyncWebServer could not answer all requests.

I found that api requests aren't too critical. In my testing they start out with a free heap of 13kb or so, which dips down to about 6kb as the request is being processed. Makes sense because the code is allocating a 6kb JSON buffer.

The four simultaneous requests for the static resources are more problematic because they run in parallel:

image

The first one (to /setup) is still showing a free heap of 11.5kb (I added the heap header for debugging):

image

And this is the second request (colors.css), already down to about 2.5kb of free heap:

image

This is on a freshly booted ESP with the inverter not running. During the day it's worse. It has a single HM-1500 configured along with MQTT, nothing else.

@ziermmar

ziermmar commented Apr 5, 2023

> Do you use the JSON API in parallel while browsing the Ahoy WebUI? This could be the reason why the AsyncWebServer could not answer all requests.

At least I don't. I'm using prometheus (scraping every 30 seconds) and mqtt. Trouble only starts, when I also try to access the web ui.

@emmrichd
Author

emmrichd commented Apr 5, 2023

I don't use the Api, but I have three inverters.

@gitty-jsu

Same issue for me after updating to 0.6.0, but as long as I only use the Firefox browser on my iPhone, it works fine for days.
The same goes for the PC (Edge). Only when I start using Safari on the iPhone/iPad does Ahoy reboot.

@cyrax303

cyrax303 commented Apr 7, 2023

Short info: I have installed 0.6.4 Beta and it looks very good. I can't reproduce the error anymore. I have tried it with Safari and Firefox on my Mac and Safari on iPhone... If the beta communicates fine with the HM, I will install it on my production system as well.

@fila612
Contributor

fila612 commented Apr 8, 2023

Similar results here in 0.6.4, but as soon as I want to retrieve data via REST, Ahoy always restarts; it seems to be a crash.
As soon as I deactivate the REST query, the system seems to be stable (no reboots).

@tastendruecker123
Contributor

> Similar results here, but as soon as I want to retrieve data via REST, Ahoy always restarts; it seems to be a crash. As soon as I deactivate the REST query, the system seems to be stable (no reboots).

I have yet to understand why some systems are so unstable and others aren't. API requests do need about 7 kB of RAM, but on most systems that doesn't seem to be a problem.

What does your setup look like? How many inverters and what kind? Display, MQTT, Prometheus, Sunrise or any other 'options' configured?

@emmrichd
Author

emmrichd commented Apr 8, 2023

Hello,
thank you for looking after this. I posted my config already somewhere above.
Here the summary:

  • 3 inverters (2x HM1500, 1x HM800)
  • No display, no API, no Prometheus
  • Using MQTT and Sunrise

I have a friend with an identical hardware setup but only one inverter, and he has no issues. I tried using only one inverter: better, but not solved.
Browser: I use Chrome on Windows and Safari on iOS, both unstable.
I built a new system with an ESP32; no issues so far, as already mentioned by someone else.
Regards, Dieter

@fila612
Contributor

fila612 commented Apr 8, 2023

> > Similar results here, but as soon as I want to retrieve data via REST, Ahoy always restarts; it seems to be a crash. As soon as I deactivate the REST query, the system seems to be stable (no reboots).
>
> I have yet to understand why some systems are so unstable and others aren't. API requests do need about 7 kB of RAM, but on most systems that doesn't seem to be a problem.
>
> What does your setup look like? How many inverters and what kind? Display, MQTT, Prometheus, Sunrise or any other 'options' configured?

Hi @tastendruecker123:
System is a D1 Mini Pro ESP8266 connected with only one Inverter (HM-700). No Display and not prometheus configured. Only mqtt and the sunrise option is used.

@tastendruecker123
Contributor

Interesting. I assume both of you are using 0.6.3 or 0.6.4?

@gitty-jsu

After updating to 0.6.4 it works fine for me on iOS.

@emmrichd
Copy link
Author

emmrichd commented Apr 8, 2023

I was still on 0.6.0; trying to update right now.

@tastendruecker123
Contributor

Ah, that's not surprising then. 0.6.0 definitely has a low memory issue due to too many concurrent requests that was fixed in the newer versions. I was just wondering if there's still something else going on.

@sumerland

I am running 0.6.4 for almost 2 days and I still have occasional unnoticed reboots. Sometimes I can trigger a reboot by cycling through the top menu item, Live and System. At some point the menu tree is incomplete (only visible items are AhoyDTU, Rest API, Documentation and About) and a few seconds later the device reboots (reason Software/System restart). Heap frag is low (3) and does not increase prior to a reboot. This happens with a single inverter (HM800) and mqtt, ntp and sunrise/sunset active. Chrome/Mac but happens with Chrome/Android, too.
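For reading the "Heap frag" value: my understanding (an assumption worth checking against the ESP8266 Arduino core sources) is that `ESP.getHeapFragmentation()` reports roughly 100 - 100 * sqrt(sum of squared free-block sizes) / (total free size). Under that definition, a value of 3 means the free memory is still concentrated in one large block. The sketch below computes the metric from a hypothetical list of free-block sizes; `fragmentationPercent` is not a core API.

```cpp
#include <cmath>
#include <vector>

// Fragmentation percentage in the style described above (assumed formula):
// 0 when all free memory is one contiguous block, approaching 100 as the
// free memory is split into many small blocks.
int fragmentationPercent(const std::vector<unsigned long>& freeBlocks) {
    double total = 0.0;
    double sumSquares = 0.0;
    for (unsigned long b : freeBlocks) {
        const double size = static_cast<double>(b);
        total += size;
        sumSquares += size * size;
    }
    if (total == 0.0) return 0;  // no free memory, nothing to fragment
    return 100 - static_cast<int>(std::sqrt(sumSquares) * 100.0 / total);
}
```

For n equally sized free blocks this gives about 100 - 100 / sqrt(n): one block yields 0, two yield 30, four yield 50.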

@fila612
Contributor

fila612 commented Apr 8, 2023

> Ah, that's not surprising then. 0.6.0 definitely has a low memory issue due to too many concurrent requests that was fixed in the newer versions. I was just wondering if there's still something else going on.

The behaviour I described was with 0.6.4: if I use the REST query in parallel with MQTT, Ahoy crashes.

@mr-p666

mr-p666 commented Apr 21, 2023

Issue back with 0.6.9?
I had and have no problems with 0.6.7 on my 8266 but as soon as I update to the release version the UI becomes unstable again.

@benbecke

benbecke commented Jun 13, 2023

Same issue here with 0.6.9 running on an ESP8266 after enabling MQTT.

@lumapu
Owner

lumapu commented Jul 3, 2023

For me it helped to reboot Ahoy after the OTA upgrade. Check the heap on the System page after rebooting; it should be around or below 10%.

lumapu added the labels cant_repro (can not reproduce) and stale (closed due to no response / progress) on Jul 23, 2023
lumapu self-assigned this on Jul 23, 2023
lumapu closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 23, 2023