Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--cleanuprss option to clear caches when memory reaches a certain limit #257

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kmike
Copy link
Member

@kmike kmike commented Jul 30, 2015

See #216 (comment).


I tried it only on simple examples (render several websites with --slots=1). Some findings:

  • sometimes clearing of WebKit caches helps, but usually not. Maybe it will be more helpful for higher workloads.
  • There are 2 new dangling DelayedCall objects after each HTTP request. It seems they never get collected. Alive DelayedCall objects look like this:
<DelayedCall 0x120363fc8 [-14.0942928791s] called=1 cancelled=0 LoopingCall<10>(check_memusage, *(), **{})()>
<DelayedCall 0x11ed455f0 [-15.7240140438s] called=1 cancelled=0 Site._updateLogDateTime()>
<DelayedCall 0x120363dd0 [-15.723662138s] called=1 cancelled=0 SplashProxyServerFactory._updateLogDateTime()>
<DelayedCall 0x118d17098 [-4.09461593628s] called=0 cancelled=0 LoopingCall<10>(check_memusage, *(), **{})()>
<DelayedCall 0x118d17f80 [-3.78609609604s] called=0 cancelled=0 Site._updateLogDateTime()>
<DelayedCall 0x1203633b0 [-3.78572893143s] called=0 cancelled=0 SplashProxyServerFactory._updateLogDateTime()>
<DelayedCall 0x11f002248 [43047.961432s] called=0 cancelled=1>
<DelayedCall 0x12016a878 [43047.968272s] called=0 cancelled=1>
<DelayedCall 0x11b9e0908 [43054.045301s] called=0 cancelled=1>
<DelayedCall 0x120181200 [43078.4654751s] called=0 cancelled=1>
<DelayedCall 0x12033d320 [43080.4918439s] called=0 cancelled=0 HTTPChannel.__timedOut()>
<DelayedCall 0x120347200 [43080.5012071s] called=0 cancelled=1>

after some time the example above becomes

<DelayedCall 0x1203633b0 [5.11707305908s] called=0 cancelled=0 LoopingCall<10>(check_memusage, *(), **{})()>
<DelayedCall 0x120363d40 [-1.79515600204s] called=0 cancelled=0 Site._updateLogDateTime()>
<DelayedCall 0x11e78ae18 [-1.79356193542s] called=0 cancelled=0 SplashProxyServerFactory._updateLogDateTime()>
<DelayedCall 0x11f002248 [42987.172821s] called=0 cancelled=1>
<DelayedCall 0x12016a878 [42987.179662s] called=0 cancelled=1>
<DelayedCall 0x11b9e0908 [42993.25669s] called=0 cancelled=1>
<DelayedCall 0x120181200 [43017.6768649s] called=0 cancelled=1>
<DelayedCall 0x12033d320 [43019.7032371s] called=0 cancelled=1>
<DelayedCall 0x120347200 [43019.7126009s] called=0 cancelled=1>
  • It seems that other Python-level object are not leaking. It looks like it is not a Python-level issue which causes memory to increase.
  • It seems that memory usage increase depends on screenshot rendering. When I try to render small screenshots memory increases (and stays increased) by a small amount. But if render_all is used (and large screenshots are requested), memory is increased more and not goes back.
  • Once I observed that memory decreased from ~150MB to ~100MB when I made a request after sending Splash nothing for a long period of time (a couple minutes?).

For me it looks like increasing memory usage is caused by some WebKit or qt or pyqt internals, or maybe by fragmentation of the Python heap. I don't want to dive deeper now because we're going to replace (py)qt4 with (py)qt5 soon. After the switch it worths checking again.

@AlexIzydorczyk
Copy link

@kmike

Perhaps unrelated but my experience with this is that creating a new BrowserTab after each request seems to increase memory usage drastically. If I'm not mistaken, a new BrowserTab is created at every request.

Hacky workaround that might be useful for others include

  • Using a lua script to continuously poll a Redis queue that contains URLs to visit. This keeps the single BrowserTab object open.
  • Running a cluster of Splash docker containers behind a load balancer. A cron job then periodically just restarts these containers and scrapy handles failure (just resubmits the request the Redis queue if the docker container was killed in the process).

@kmike
Copy link
Member Author

kmike commented Jul 31, 2015

@AlexIzydorczyk I'm not sure creating BrowserTab objects per se is a problem. There is a basic stress test script which sends 1000 render requests to Splash; very simple HTML pages are rendered (served by a mock server on a localhost). If configured to send only requests which are known to be successful (i had to change MockArgs code to do that), it shows the following resuts:

Total requests: 1000
Concurrency   : 50
Elapsed time  : 9.545s
Avg time p/req: 0.010s

So creating BrowserTab, rendering a basic web page and destroying BrowserTab can be done at 100 requests per second. Memory usage is increasing, but not that much (several MBs after 1000 requests).

@kmike
Copy link
Member Author

kmike commented Jul 31, 2015

But you may be right - memory usage is still increasing, even if not much.

@pawelmhm
Copy link
Member

pawelmhm commented Aug 3, 2015

It seems that memory usage increase depends on screenshot rendering. When I try to render small screenshots memory increases (and stays increased) by a small amount. But if render_all is used (and large screenshots are requested), memory is increased more and not goes back

Hey @kmike can this be related to png rendering and handling of png by PILLOW? In our project we get images as png from splash but than convert to jpeg. We noticed really high memory usage when converting image from png to jpeg, seems like Pillow had some trouble and leaking lots of memory when opening png files. This bug was driving us crazy with @chekunkov :) I described it in following question on SO: http://stackoverflow.com/questions/30411411/why-is-pil-not-releasing-memory-after-opening-certain-png-files

It is worth checking how newly added (on branch) render.jpeg endpoint will perform in terms of memory, I wonder if it will improve performance and reduce leaks.

@pawelmhm
Copy link
Member

pawelmhm commented Aug 3, 2015

yep can confirm this looks like same bug calling render.png repeatedly caused memory to increase, you can try adding debugging code line 689 of splash/browser tab.py:

            import resource
            mem = lambda:resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            print(mem())
            image = render_qwebpage(self.web_page, self.logger,
                                    width=width, height=height,
                                    scale_method=scale_method)
            self.store_har_timing("_onScreenshotPrepared")
            print(mem())

            result = image.to_png()
            if b64:
                result = base64.b64encode(result)
            print(mem())

this prints following output:

2015-08-03 09:35:28.645714 [-] Starting factory <splash.proxy_server.SplashProxyServerFactory instance at 0x7f6d65f9a638>
2015-08-03 09:35:31.971187 [-] 162204
2015-08-03 09:35:32.081476 [-] 194532
2015-08-03 09:35:32.237071 [-] 198484
2015-08-03 09:35:32.238780 [stats] {"maxrss": 198484, "load": [0.39, 0.22, 0.24], "fds": 76, "qsize": 0, "rendertime": 2.7946600914001465, "active": 0, "path": "/render.png", "args": {"url": ["http://www.bankier.pl/"], "render_all": ["1"], "wait": ["1"]}, "_id": 140107839068928}
2015-08-03 09:35:32.239030 [-] "127.0.0.1" - - [03/Aug/2015:07:35:31 +0000] "GET /render.png?url=http://www.bankier.pl/&render_all=1&wait=1 HTTP/1.1" 200 1494739 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36"
2015-08-03 09:35:37.512875 [-] 198484
2015-08-03 09:35:37.607046 [-] 214784
2015-08-03 09:35:37.762596 [-] 214784

tested with http://localhost:8050/render.png?url=http://www.bankier.pl/&render_all=1&wait=1

@chekunkov
Copy link
Contributor

@pawelmhm can you try the same code but with psutil for memory check?

def print_memory():
    current_process = psutil.Process(os.getpid())
    print '{} Mb'.format(current_process.get_memory_info().rss / float(2 ** 20))

@pawelmhm
Copy link
Member

pawelmhm commented Aug 3, 2015

sure @chekunkov I get similar output:

2015-08-03 10:04:41.209928 [-] Starting factory <splash.proxy_server.SplashProxyServerFactory instance at 0x7feb029c0638>
2015-08-03 10:04:43.760722 [-] 157.40625 Mb
2015-08-03 10:04:43.867976 [-] 189.57421875 Mb
2015-08-03 10:04:44.021847 [stats] {"maxrss": 198604, "load": [0.35, 0.29, 0.24], "fds": 76, "qsize": 0, "rendertime": 2.7969260215759277, "active": 0, "path": "/render.png", "args": {"url": ["http://www.bankier.pl/"], "render_all": ["1"], "wait": ["1"]}, "_id": 140647337868032}
2015-08-03 10:04:44.022119 [-] "127.0.0.1" - - [03/Aug/2015:08:04:43 +0000] "GET /render.png?url=http://www.bankier.pl/&render_all=1&wait=1 HTTP/1.1" 200 1480272 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36"
2015-08-03 10:04:45.820468 [-] 199.06640625 Mb
2015-08-03 10:04:45.922019 [-] 220.29296875 Mb
2015-08-03 10:04:46.074872 [stats] {"maxrss": 225580, "load": [0.35, 0.29, 0.24], "fds": 77, "qsize": 0, "rendertime": 1.9062600135803223, "active": 0, "path": "/render.png", "args": {"url": ["http://www.bankier.pl/"], "render_all": ["1"], "wait": ["1"]}, "_id": 140645624206688}
2015-08-03 10:04:46.075174 [-] "127.0.0.1" - - [03/Aug/2015:08:04:45 +0000] "GET /render.png?url=http://www.bankier.pl/&render_all=1&wait=1 HTTP/1.1" 200 1480272 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36"
2015-08-03 10:04:50.860760 [-] 198.41015625 Mb
2015-08-03 10:04:50.947887 [-] 214.2265625 Mb
2015-08-03 10:04:51.101028 [stats] {"maxrss": 225580, "load": [0.56, 0.33, 0.26], "fds": 77, "qsize": 0, "rendertime": 2.055345058441162, "active": 0, "path": "/render.png", "args": {"url": ["http://www.bankier.pl/"], "render_all": ["1"], "wait": ["1"]}, "_id": 140645624208416}
2015-08-03 10:04:51.101300 [-] "127.0.0.1" - - [03/Aug/2015:08:04:50 +0000] "GET /render.png?url=http://www.bankier.pl/&render_all=1&wait=1 HTTP/1.1" 200 1481625 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36"
2015-08-03 10:04:54.584678 [-] 213.6796875 Mb
2015-08-03 10:04:54.668789 [-] 217.359375 Mb

@pawelmhm
Copy link
Member

pawelmhm commented Aug 3, 2015

after further research it seems like this happens in _render_qwebpage_full

2015-08-03 10:18:10.783906 [-] Starting factory <splash.proxy_server.SplashProxyServerFactory instance at 0x7f2ea3ba4368>
2015-08-03 10:18:15.473694 [-] begin _render_qwebpage_full() 164724
2015-08-03 10:18:15.476957 [-] QPainter() created 176360
2015-08-03 10:18:15.578693 [-] about to exit _render_qwebpage_full 196764
2015-08-03 10:18:15.731776 [stats] {"maxrss": 200140, "load": [0.09, 0.31, 0.31], "fds": 76, "qsize": 0, "rendertime": 2.76542592048645, "active": 0, "path": "/render.png", "args": {"url": ["http://www.bankier.pl/"], "render_all": ["1"], "wait": ["1"]}, "_id": 139838283016240}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants