Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illumina hangs on startup with large config #5

Open
NiekD opened this issue Aug 2, 2016 · 17 comments
Open

Illumina hangs on startup with large config #5

NiekD opened this issue Aug 2, 2016 · 17 comments

Comments

@NiekD
Copy link

NiekD commented Aug 2, 2016

Illumina stopped working on my Android devices (phone and tablet). When I try to start it, the progress bar at the bottom just keeps going forever.
I have a rather big config with over 100 rules, over 90 devices, of which almost 60 in the GUI (in 8 tabs).

I assumed that the issue is related to some recent change(s) in my config. In order to be able to remove devices that I added/changed recently, I started by removing all rules from the config.
Then before removing any device I restarted Illumina and it worked! So apparently the problem is not related to my devices. To eliminate the possibility that frequent changes caused by the rules were causing the issue, I put the rules back in, but disabled them all. Again Illumina failed to start.

I don't think Illumina will do anything with the rules part of the config, so it seems to me that Illumina cannot handle my config just because of its total size (almost 52K).

By trial and error I found that Illumina stops working when my config file size is over 30.759 bytes.
Via the API the size of the config is quite a bit smaller then (20.300 bytes) because it has no whitespace and I guess the real issue is not with the file size, but with a config bigger than 20.300 via the API.

Another possible cause could be a time out. The bigger the config the longer it takes to load. Instead of some sort of overflow, could there be an unhandled time out when a very large config is loaded by Illumina?

@CurlyMoo
Copy link

CurlyMoo commented Aug 2, 2016

@joachimeichborn do you have time again to take a look?

@joachimeichborn
Copy link

Could you please attach the configuration that made Illumina crash? I will then try to find some time to look into it.

@NiekD
Copy link
Author

NiekD commented Aug 6, 2016

I created a "crippled" version of my config without sensitive data and without rules, beacause these heavily rely on protocols, functions and actions I created myself.

To get a config size at which Illumina stops working I added stuff to the registry section as CurlyMo suggested.
Pilight runs fine with the attached config, but Illumina won't start. If I take out one "filler" from the registry section, Illumina runs ok.

I hope this is helpful.

Attached zip:
config_min.zip

@joachimeichborn
Copy link

After debugging it for a while, I found that actually the server sends an incomplete configuration message. The app than gets stuck because it keeps waiting for a well-formatted configuration information forever.

I have attached a wireshark protocol of the traffic when I try to connect to pilight with the app. You can see that packets 7 to 31 contain most of the configuration (only those with target ip x.x.x.45 are relevant) However, packet 31 ends somewhere in the configuration and there is no packet that completes the config data. The next package directed to the app is an update message (number 37). So the app receives a message that consists of a a huge configuration string that stop at some point concatenated with the first update message. This is of course not valid JSON, thus it is ignored and the app gets stuck.

@CurlyMoo Could you file a pilight bug report with that information? Do you need anything else?

protocol.pcapng.zip

@CurlyMoo
Copy link

@joachimeichborn Thank you for the work!

@NiekD Can you retrieve the full config through the webgui?

@NiekD
Copy link
Author

NiekD commented Aug 18, 2016

The webgui works perfectly with this config and http://<pilight_ip>:5001/config returns the full config. Just to make sure it is not corrupted somewhere in the middle, I copied the output to a temporary config file and pilight runs fine with that file. So it is definitely not corrupted or incomplete.

@CurlyMoo
Copy link

And pilight-control?

@NiekD
Copy link
Author

NiekD commented Aug 18, 2016

pilight-control works fine too. I tried a couple of devices and I can switch them on and of normally.

@CurlyMoo
Copy link

They all use the same socket logic under the hood.

@NiekD
Copy link
Author

NiekD commented Aug 18, 2016

I am still thinking that it may be a timing issue. Could it be that it takes too long for Android to swallow the whole config and that the transfer then stops somewhere in the middle?

@CurlyMoo
Copy link

I don't have enough android / java knowledge to know....

@NiekD
Copy link
Author

NiekD commented Aug 18, 2016

I also don't have that knowledge unfortunately....

@joachimeichborn
Copy link

I added the following line in socket.c right after the new value for x is computed in the while loop (line 381): logprintf(LOG_INFO, "ptr: %i n: %i x: %i BUFFER_SIZE: %i", ptr, n, x, BUFFER_SIZE);

What I got in the log is:
[Aug 19 23:06:11:421175] pilight-daemon: DEBUG: socket recv: {"action":"request config"}
[Aug 19 23:06:11:424597] pilight-daemon: INFO: ptr: 0 n: 32098 x: 1025 BUFFER_SIZE: 1025
[Aug 19 23:06:11:424682] pilight-daemon: INFO: ptr: 1025 n: 32098 x: 1025 BUFFER_SIZE: 1025
...
[Aug 19 23:06:11:426077] pilight-daemon: INFO: ptr: 26650 n: 32098 x: 1025 BUFFER_SIZE: 1025
[Aug 19 23:06:11:426121] pilight-daemon: INFO: ptr: 27675 n: 32098 x: 1025 BUFFER_SIZE: 1025
[Aug 19 23:06:11:426154] pilight-daemon: INFO: ptr: 28537 n: 32098 x: 1025 BUFFER_SIZE: 1025
[Aug 19 23:06:11:426184] pilight-daemon: DEBUG: socket write failed: {"message":"config","config":{"devices"....

You can see that the difference between the last two ptr values is only about 800 instead of the expected 1025. I also retrieved the errno that was set by send and it was 11. Google told me that this happens if the send buffer is full and the socket is set to non-blocking mode.

I think the Java code currently tries to read the whole config as once what is clearly not possible when it is larger than the send buffer. So I guess on the C side, send puts in the buffer whatever fits in. So the last successful call returns already less bytes that were send and the next one fails. I will see if I can fix the Java code to process any data as soon as it arrives, but I think it would be a good idea to set the socket to non-blocking mode on the C side anyway - what do you think @CurlyMoo ?

@CurlyMoo
Copy link

but I think it would be a good idea to set the socket to non-blocking mode on the C side anyway

No, because that would mean we need many many threads for the application to keep responsive.

@joachimeichborn
Copy link

I have improved the Java code but the problem remains the same. That's not astonishing as the send buffer can be filled faster than the data is actually send. So for large enough messages we have to expect that problem.

I can see two ways how to deal with it:

  1. React to the "would block" errno in a proper way such that the whole message is send eventually
  2. Always send in blocking mode. I am not aware of the pilight server architecture but I can't see why this would mean to have a really large number of threads. Shouldn't it be enough to have one thread per client that queries an associated queue of outgoing messages and processes them one after another in blocking mode?

@CurlyMoo
Copy link

Shouldn't it be enough to have one thread per client that queries an associated queue of outgoing messages and processes them one after another in blocking mode?

Yes, ofc. But the app is not the only socket operation pilight does. So it would mean a thread for all blocking socket operations.

@NiekD
Copy link
Author

NiekD commented Sep 15, 2016

I found out that the pilight app made by Zatta works with my config. So apparently that app is able to handle large configs. Unfortunately it is not useful for me because it does not support labels, but maybe looking at the way that app loads the config can help to find a solution for Illumina.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants