
Service instable #243

Open · emcodemall opened this issue Aug 29, 2018 · 9 comments

@emcodemall

Hey,

in my application, there are 2 editors and 3 viewers connected to the same document simultaneously. It looks like after the 2 editors have done their work for about 2 hours, entering about 500 lines, the server application stops responding a few hours later.
I have been working on this for some months now and got a really powerful server: 8 cores, 12 GB RAM and an SSD. With the new server, stability improved from about 1-2 hours of concurrent usage to 3 hours of concurrent usage plus 6 hours of running idle after the usage is over.

What I wonder is: how do you run this for jetpad.net? Do you periodically restart the server or use a special version? ...I just checked out the latest version that was tagged with "version bump" (as there is no "release", afaik).

Cheers!
Harald

@pablojan
Contributor

Hi Harald, thanks again for persevering with this matter.

After reading your old comments weeks ago, I was only able to test the stability of the server while it is idle. I didn't see any deterioration of heap or CPU usage in the JVM.

Jetpad is rebooted frequently by the operators, so I can't observe the issue you are pointing out, although of course it may happen from time to time.

I really would like to help you tackle this issue. First, I would monitor the server's JVM using jconsole. Could you do that? I suspect that there is a memory leak and the heap gets exhausted.

In addition, it is very important to tune the server's thread configuration. Are you using the default values? See the "threads" section of the config/reference.conf file.
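
In case it helps, overrides normally go into a local config file rather than into reference.conf itself, assuming the server follows the usual Typesafe Config convention where reference.conf holds the defaults. A minimal sketch in HOCON syntax; the key names below are placeholders only, copy the real ones from the "threads" section of config/reference.conf:

```hocon
# Local override file (e.g. config/application.conf, or whatever file your deployment uses).
# Placeholder key names -- take the actual ones from the "threads" section of config/reference.conf.
threads {
  # Pool handling incoming client operations (placeholder name)
  request-thread-count = 16
  # Pool persisting deltas to the store (placeholder name)
  persistence-thread-count = 8
}
```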

The latest commit on master has a lot of small fixes and improvements; it should be more stable, but it doesn't include any specific performance changes. The tag is: https://github.com/P2Pvalue/swellrt/releases/tag/2.0.0-beta

We could discuss and work on all of this together by chat/conference if you like. If we find the cause, I will be happy to patch the server quickly.

  1. For remote jconsole monitoring, run the JVM with the following options:
    -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5000 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
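
Then, to attach from your workstation, point jconsole at the server's host and the port configured above (5000 here):

```
jconsole <server-host>:5000
```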

@emcodemall
Author

emcodemall commented Aug 29, 2018

Thanks for your prompt response and your willingness to help; you've got me enthusiastic again too. Sure, we can get in more direct contact, just send me some details at [email protected].

I am currently checking out the suggested version and have added the JMX options to the gradlew.bat "DEFAULT_JVM_OPTS". I'm compiling the dev version right now and changing my code to support the newer version...
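
A small caveat in case it matters: DEFAULT_JVM_OPTS in gradlew.bat applies to the JVM that runs Gradle itself, so depending on how the server process is forked, the JMX flags may not reach the server JVM. If the build uses the standard Gradle application plugin (an assumption on my side), a sketch of passing them to the application JVM instead:

```groovy
// build.gradle -- only applies if the standard 'application' plugin is used by this build
applicationDefaultJvmArgs = [
    '-Dcom.sun.management.jmxremote',
    '-Dcom.sun.management.jmxremote.port=5000',
    '-Dcom.sun.management.jmxremote.authenticate=false',
    '-Dcom.sun.management.jmxremote.ssl=false'
]
```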

@pablojan
Contributor

pablojan commented Aug 29, 2018 via email

@pablojan
Contributor

Results from a test with 4 users, 2 of them writing continuously with an automated script (see the swellrt-selenium project):

  • Initial server boot, new wave loaded => heap ~90 MB
  • Test run (~45 min) => heap grows up to ~400 MB (see picture 1)
  • Server reboot, no waves loaded => heap ~90 MB
  • Wave reloaded => heap 300 MB, then GC adjusts down to 200 MB (the wave's in-memory size is ~100 MB)

Picture 1: heap memory during the test (image)

Picture 2: heap memory after the test ended, the server rebooted and the wave loaded again (image)
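
If attaching jconsole is not convenient, here is a minimal sketch of logging heap usage from inside any JVM with the standard java.lang.management API (generic JVM code, not part of SwellRT):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeapLogger {

  public static void main(String[] args) {
    MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Print used/committed/max heap every 30 seconds, roughly what the jconsole heap graph shows.
    scheduler.scheduleAtFixedRate(() -> {
      MemoryUsage heap = memory.getHeapMemoryUsage();
      System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
          heap.getUsed() / (1024 * 1024),
          heap.getCommitted() / (1024 * 1024),
          heap.getMax() / (1024 * 1024));
    }, 0, 30, TimeUnit.SECONDS);
  }
}
```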

@pablojan
Contributor

pablojan commented Sep 2, 2018

I have created a new tag with some improvements regarding memory consumption:
https://github.com/SwellRT/swellrt/releases/tag/2.0.1-beta

  • Fix a bug in the in-memory deltas collection
  • Disable user presence tracking by default (the user presence feature can make heavy use of the transient wavelet)
  • Configurable user presence event rate
  • Safer rate control of caret update events (caret update events were making heavy use of the transient wavelet)
  • Properly clean the deltas cache (cached deltas in memory were not being released after being persisted; see the sketch after this list)
  • Store transient data in the DB to reduce memory use
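
To illustrate the deltas cache point in general terms, here is a hedged sketch of the pattern of releasing cached entries once persistence completes; all class and method names are invented for illustration, and this is not SwellRT's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names, for illustration only.
public class DeltaCache<K, D> {

  public interface DeltaStore<I, E> {
    void append(I waveletId, List<E> deltas);
  }

  private final Map<K, List<D>> cached = new HashMap<>();
  private final DeltaStore<K, D> store;

  public DeltaCache(DeltaStore<K, D> store) {
    this.store = store;
  }

  public synchronized void add(K waveletId, D delta) {
    cached.computeIfAbsent(waveletId, k -> new ArrayList<>()).add(delta);
  }

  // The key point behind the fix: once deltas are persisted, drop the in-memory
  // copy; otherwise the cache keeps growing for as long as the wavelet is in use.
  public synchronized void persist(K waveletId) {
    List<D> deltas = cached.remove(waveletId);
    if (deltas != null && !deltas.isEmpty()) {
      store.append(waveletId, deltas);
    }
  }
}
```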

Repeating the previous test, I got the following results (I shortened the test time because the results were already clear...)

(screenshot of the test results, 2018-09-02 13:12)

@emcodemall
Author

emcodemall commented Sep 2, 2018 via email

@pablojan
Contributor

pablojan commented Sep 2, 2018 via email

@emcodemall
Author

emcodemall commented Oct 24, 2018

Hi!

using the dev build that you configured about a week ago not to send any annotations, I noticed the latest service malfunctions (clients disconnected, Java with high memory and CPU usage) with this process:

(screenshot: process list)

2 Swell instances were configured with JMX, 2 were not.
The bad news is that one Swell instance that did not have JMX enabled also showed high memory and CPU usage and disconnected clients after 1 hour of usage. Unfortunately, I needed to restart the processes immediately for production, so I could not collect any more evidence.

This was with 2 editors and 2 viewers concurrently online on the same document.

At the time the clients disconnect, it looks like only the Java process with JMX enabled has high memory and CPU usage.

Anyway, I'll keep trying to collect evidence.
I'll also try disabling JMX so there is no extra Java instance running. It would be cool to save even more memory and disable the Gradle instance as well.
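
In case it helps with the extra processes: if the server is launched through gradlew, the additional Gradle process is usually the Gradle daemon, which can be disabled with standard Gradle options (these are Gradle flags, nothing SwellRT-specific):

```
# one-off, on the command line (use whatever task normally starts the server)
./gradlew <run-task> --no-daemon

# or permanently, in gradle.properties
org.gradle.daemon=false
```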

@pablojan
Contributor

pablojan commented Nov 11, 2018 via email
