Slowdown server throughput #101

Open · pakit84 opened this issue Oct 2, 2020 · 16 comments

pakit84 commented Oct 2, 2020

Hello,
we are doing some stress tests of our web server in order to fine-tune our system.
We've noticed that putting nuster in front of our Tomcat server reduces throughput to about one fifth.
For the test we have a simple endpoint, (GET) /test, that does not return any content, just an HTTP 200, and it is not cached.
We're performing the test on a machine with 2 CPU cores and 8 GB RAM.

We've used the Apache ab tool (from a remote machine) to stress test the server as follows:
ab -k -n 100000 -c 1000 xxxxx/test
so 100K requests with 1000 concurrent clients.

Without nuster we have the following results:

Concurrency Level: 1000
Time taken for tests: 19.463 seconds
Complete requests: 100000
Failed requests: 0
Non-2xx responses: 100000
Keep-Alive requests: 99453
Total transferred: 31184137 bytes
HTML transferred: 12500000 bytes
Requests per second: 5138.08 [#/sec] (mean)
Time per request: 194.625 [ms] (mean)
Time per request: 0.195 [ms] (mean, across all concurrent requests)
Transfer rate: 1564.71 [Kbytes/sec] received

With nuster in front of our server:
Concurrency Level: 1000
Time taken for tests: 98.257 seconds
Complete requests: 100000
Failed requests: 0
Non-2xx responses: 100000
Keep-Alive requests: 100000
Total transferred: 28800000 bytes
HTML transferred: 12500000 bytes
Requests per second: 1017.74 [#/sec] (mean)
Time per request: 982.572 [ms] (mean)
Time per request: 0.983 [ms] (mean, across all concurrent requests)
Transfer rate: 286.24 [Kbytes/sec] received

We're using the image nuster/nuster:5.2 with the following configuration:

global
        maxconn 4000
        user root
        group root
        nbproc          2
        cpu-map         1 0
        cpu-map         2 1
        daemon
        log 127.0.0.1 local0 debug
        tune.ssl.default-dh-param 2048
        nuster cache on data-size 5m dir /tmp
        #tune.ssl.cachesize 1000000
        # nuster cache on data-size 5m uri /nuster

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        option http-server-close
        option forwardfor
        maxconn 4000
        timeout connect 5s
        timeout client  15min
        timeout server  15min
        stats enable
        stats hide-version
        stats refresh 30s
        stats show-node

frontend http
       bind *:80
       mode http
       acl http       ssl_fc,not
       acl https      ssl_fc
       #ACL NUSTER FOR PURGING
       acl network_allowed src 127.0.0.1/8 172.17.0.1/8
       acl purge_method method PURGE
       http-request deny if purge_method !network_allowed


       bind *:443 ssl crt /etc/nuster/certs/xxxxx.pem


       http-request redirect scheme https if http
   

#ACL letsencrypt
       acl letsencrypt-acl path_beg /.well-known/acme-challenge/
       use_backend letsencrypt-backend if letsencrypt-acl



#ACLS APIs
       acl xxxx_dev      hdr_beg(host) -i xxxxx.com
       use_backend srvs_xxxx_dev if xxxx_dev





#BACKENDS APIs
backend srvs_xxxx_dev
        nuster cache on
        nuster rule xxxx ttl 60 if { path_beg /xxxxx }
        nuster rule xxxx ttl 10 if { path_beg /xxxxxx }
        nuster rule xxxx ttl 10 if { path_beg /xxxxxx }
        

        http-request replace-path ^([^\ :]*)\ /xxx/xxx(.*)$    \1\ \2
        redirect scheme https if !{ ssl_fc }
        server srvs_xxxx_dev_1 xx.xx.xx.xx:1011


#BACKEND letsencrypt
backend letsencrypt-backend
        server letsencrypt xx.xx.xx.xx:54321
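
(As an aside, the PURGE ACL in the frontend above can be exercised with a plain HTTP client. A sketch, with placeholder address and path; from a source inside network_allowed the request goes through to nuster, while from anywhere else the http-request deny should answer 403:)

        curl -i -X PURGE http://127.0.0.1/some/cached/path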

Is there something wrong with this configuration?

Thank you for your help.

Regards,
Pasquale

packeteer commented Oct 3, 2020 via email

jiangwenyuan (Owner) commented:

@pakit84 Can you remove option http-server-close and try again?
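
(For context: option http-server-close keeps the client-side connection alive but closes the server-side connection after every response, so each request pays a new TCP connect to Tomcat. A minimal sketch of the defaults lines with explicit keep-alive instead, assuming nothing else changes:)

        defaults
                mode    http
                option  http-keep-alive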

pakit84 commented Oct 3, 2020

Hello @jiangwenyuan, it doesn't make much difference, but I understand it's better to remove this option anyway. What does make a big difference (10x faster) is running the test with the IP address directly in the URL:

ab -k -n 100000 -c 1000 <ip_address>/users/test — I can get up to 10K req/s
ab -k -n 100000 -c 1000 https://foo.com/users/test — I barely get 1K req/s

jiangwenyuan (Owner) commented:

@pakit84 It seems there's no default backend when you use the IP address. Does curl <ip_address>/users/test return a normal response?
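
(A sketch of that check, keeping the placeholder address; a response from Tomcat means routing works, while an HAProxy-generated 503 would mean no backend matched:)

        curl -i http://<ip_address>/users/test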

pakit84 commented Oct 3, 2020

@jiangwenyuan you're right, there's no default backend to handle this request, so please ignore it.
To recap: with option http-server-close removed I get 1200 req/s, while without nuster I get 5100 req/s.

jiangwenyuan (Owner) commented:

@pakit84 You mentioned that /test only returns 200 without any content, but ab shows:

Non-2xx responses: 100000
Keep-Alive requests: 99453
Total transferred: 31184137 bytes
HTML transferred: 12500000 bytes

which is not 200. Can you double-check that?
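
(A quick way to verify the actual status code, as a sketch; the host is the same placeholder used in the ab commands:)

        curl -s -o /dev/null -w "%{http_code}\n" http://xxxxx/test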

dario30186 commented Oct 3, 2020

Hi @jiangwenyuan,
I work with @pakit84. We have a load balancer on DigitalOcean and two servers running nuster with maxconn 2000.
Everything works fine except when we send a push notification to all our users, which generates roughly 300 req/s (according to the DigitalOcean charts). So we never get anywhere near the limit of 2000 set in nuster.

Unfortunately many requests time out after 20 seconds on the front side.

On the server side we use Apache Tomcat and monitor every incoming request; no request takes more than 200ms to serve. So the requests that time out are apparently never forwarded to our API by nuster.

We have checked CPU and memory usage on the server and database, and the values stay very low; they never exceed 10-15%.
So the only clue we have is nuster.

Do you have any idea what it might be?

Here is our configuration:

global
        maxconn 2000
        nbthread 4
        user root
        group root
        daemon
        log stdout format raw local0 info
        tune.ssl.default-dh-param 2048
        nuster cache on data-size 200M dir /tmp
        nuster manager on uri /nuster
        master-worker

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        option forwardfor
        maxconn 2000
        timeout connect 1min
        timeout client  1min
        timeout server  1min
        stats enable
        stats hide-version
        stats refresh 30s
        stats show-node
        stats uri /stats
        stats auth "********************"

frontend http
       bind *:80
       mode http
       acl network_allowed src 127.0.0.1/8 172.17.0.1/8
       acl purge_method method PURGE
       http-request deny if purge_method !network_allowed

      #ACL API
       acl api          hdr_beg(host) -i xxxxxxyyyyy
       use_backend srvs_bm-api if api

#BACKENDS APIs
backend srvs_bm-api
        nuster cache on
        nuster rule live ttl 10 if { path_beg xxxxxx/fixtures/live }
        nuster rule fixtures ttl 20 if { path_beg /xxxxxxx/v1/fixtures }
        server srvs_bm-api_dev_1 172.17.0.1:8080

jiangwenyuan (Owner) commented:

@dario30186 So how about the /test throughput? I suspect the setup is incorrect (non-200), so nuster terminates requests each time, which means keep-alive does not work here.

And for this timeout issue, can you enable the HTTP log?

dario30186 commented Oct 4, 2020

> @dario30186 So how about the /test throughput? I suspect the setup is incorrect (non-200), so nuster terminates requests each time, which means keep-alive does not work here.
>
> And for this timeout issue, can you enable the HTTP log?

The /test throughput is a special case where the API returns 204 No Content.
Maybe nuster terminates requests each time in that case?
In any case, when we send push notifications and a user taps one, the application starts and makes several GET requests that return 200, but also a POST request that returns 204 No Content.
Do you think this POST request, made in parallel by hundreds of users, could cause the problem in nuster?

We already have the HTTP log enabled, I think.
We have the following conf:

        log     global
        mode    http
        option  httplog
        option  dontlognull

but we don't see any nuster log when we execute:
docker logs NUSTER_CONTAINER

We only see HAProxy logs.
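
(For reference, a sketch of following the container's log stream, using the container name from above. Note that nuster is HAProxy-based, so its cache activity appears in the regular HAProxy HTTP log rather than in a separate "nuster log", and log stdout only reaches docker logs while the process runs in the foreground:)

        docker logs -f NUSTER_CONTAINER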

jiangwenyuan (Owner) commented:

> The /test throughput is a special case where the API returns 204 No Content.
> Maybe nuster terminates requests each time in that case?

No, that happens when the setup is incorrect, such as no backend server, etc.

> Do you think this POST request, made in parallel by hundreds of users, could cause the problem in nuster?

I don't think so.

Are there any logs like:

  haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in \
  static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} \
  {} "GET /index.html HTTP/1.1"

pakit84 commented Oct 4, 2020

Hello @jiangwenyuan,

> Are there any logs like:
>
>   haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in \
>   static/srv1 10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} \
>   {} "GET /index.html HTTP/1.1"

No, we do not see any logs like this one.
All our logs have the following format:

[03/Oct/2020:18:47:55.577] http srvs_bm-api/srvs_bm-api_dev_1 0/0/0/13/14 200 20643 - - ---- 2/2/0/0/0 0/0 "GET /xxxxx/xxx/xxx HTTP/1.1"
[03/Oct/2020:18:47:55.607] http srvs_bm-api/<NUSTER.CACHE.ENGINE> 0/0/0/0/0 200 39317 - - ---- 2/2/0/0/0 0/0 "GET /xxxxx/xxx/xxx HTTP/1.1"

About /test, sorry for the confusion. It was just a test that has nothing to do with the production case Dario described, and yes, the endpoint was replying 400 because of an issue on the server side, so you can ignore that point.

jiangwenyuan (Owner) commented:

@pakit84

> [03/Oct/2020:18:47:55.577] http srvs_bm-api/srvs_bm-api_dev_1 0/0/0/13/14 200 20643 - - ---- 2/2/0/0/0 0/0 "GET /xxxxx/xxx/xxx HTTP/1.1"
> [03/Oct/2020:18:47:55.607] http srvs_bm-api/<NUSTER.CACHE.ENGINE> 0/0/0/0/0 200 39317 - - ---- 2/2/0/0/0 0/0 "GET /xxxxx/xxx/xxx HTTP/1.1"

Exactly these logs: the first one is served by the backend server and the second one by nuster. So I would suggest looking into the bold fields; you can find their definitions here.
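
(For reference, a hedged reading of the slash-separated timer field in those two lines, based on HAProxy's HTTP log format; exact field names vary between HAProxy versions:)

        # 0/0/0/13/14 in the backend-served line, roughly:
        #   0  ms receiving the client request
        #   0  ms waiting in queues
        #   0  ms connecting to the backend server
        #   13 ms for the server to send the response
        #   14 ms total
        # The cache-served line shows 0/0/0/0/0: answered directly by nuster.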

pakit84 commented Oct 22, 2020

Hello,
sorry for the late reply.
Unfortunately we've found nothing suspicious in the logs.
We were wondering if there is an option to tell nuster, when multiple concurrent requests hit the same endpoint, not to send all of them to the API server but only one, and to let all the others wait for the response of that single request.

Is there a configuration that allows this behavior?

jiangwenyuan (Owner) commented:
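
(The body of this comment is not preserved in this copy; judging from the reply below, it evidently pointed at nuster's rule-level wait option, the syntax pakit84 then tries:)

        nuster rule <name> wait on ttl 10 if { path_beg /some/path }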

pakit84 commented Oct 23, 2020

Thank you, this could be really useful for us.
I'm trying to test it but it doesn't seem to work as expected.
Is my conf OK?

I'm using this nuster image: nuster/nuster:5.2
with the following rule:

        nuster rule fixtures wait on ttl 10 if { path_beg /fixtures }

I'm running a stress test over one minute, calling this endpoint with 1000 clients/s.
According to the documentation, with wait on I should see only one call going to the backend every 10 seconds, but there are way more. Actually I see no difference whether 'wait on' is there or not.
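
(For reference, a sketch of a one-minute run with the same ab tool used earlier; the URL is a placeholder, and ab's -t option caps the run at 50000 requests unless -n is raised explicitly:)

        ab -k -t 60 -n 10000000 -c 1000 https://foo.com/fixtures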

jiangwenyuan (Owner) commented:

@pakit84 Explained here: #91 (comment)
