
Running into GatewayResultCode.GW_RES_SINK_OUT_OF_MEMORY when sending a lot of requests #255

Open
vespakoen opened this issue Jun 23, 2023 · 4 comments


vespakoen commented Jun 23, 2023

Describe the bug

I am trying to "spam" MQTT with a lot of messages to send into the network, to see what the limits are.
I am running mosquitto, wp-sink-service, wp-transport-service and wp-dbus-service on a Raspberry Pi 400, with a NINA B3 USB Dongle.

I am using the latest wirepas_mqtt_library for Python with something like:

# client is the wirepas_mqtt_library network interface (see below);
# parameters comes from the FastAPI request body
res = client.send_message(
    gw_id=parameters.gw_id,
    sink_id=parameters.sink_id,
    dest=destination,
    src_ep=parameters.src_ep,
    dst_ep=parameters.dst_ep,
    payload=bytes.fromhex("8A")
)
if res != GatewayResultCode.GW_RES_OK:
    print("Sending data synchronously failed: res=%s" % res)
else:
    print("Sending data synchronously succeeded")

I have a script that spams a FastAPI endpoint which calls the code above:

#!/usr/bin/env python

import requests
from concurrent import futures
from itertools import repeat

# 500 identical downlink requests
wpparameters_list = repeat({
    "gw_id": "my_wp_gateway",
    "sink_id": "sink0",
    "dest": 1,
    "src_ep": 1,
    "dst_ep": 1
}, 500)

def make_request(wpparameters):
    # note: requests' timeout is in seconds
    r = requests.post('http://localhost:3000/execute', timeout=4000, verify=False, json=wpparameters)
    print(r.status_code)

# 10 concurrent workers hammering the endpoint; consuming the iterator
# makes sure any exception raised in make_request surfaces here
with futures.ThreadPoolExecutor(max_workers=10) as executor:
    list(executor.map(make_request, wpparameters_list))

The res printed by the backend is:

Sending data synchronously failed: res=GatewayResultCode.GW_RES_SINK_OUT_OF_MEMORY

On the Raspberry Pi logs I find a bunch of:

wp-sink-service-1       | 2023-06-23 17:30:25,100 | [ERROR] wpc:Cannot send data. Dualmcu error code: 4
wp-sink-service-1       | 2023-06-23 17:30:25,100 | [ERROR] Data:Cannot send data: 21
wp-transport-service-1  | 2023-06-23 17:30:25,102 | [ERROR] wirepas_gateway@sink_manager.py:86:Cannot send message err=21

I have been looking through the c-mesh-api code but cannot find where the out-of-memory error is returned from.
I am also wondering if I should put a queue between my backend and MQTT so that messages do not come in faster than the sink can handle them (a rough sketch of what I mean follows below)?
I am basically looking for a way to make the sink send out as many messages as possible ;)
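
Roughly what I have in mind, as a minimal sketch (send_one just wraps the client.send_message call from the snippet above):

import queue
import threading

# assuming GatewayResultCode comes from the wirepas-mesh-messaging
# package, as in the library examples
from wirepas_mesh_messaging import GatewayResultCode

# Bounded queue: producers block when it is full instead of overrunning the sink.
downlink_queue = queue.Queue(maxsize=100)

def send_one(params):
    # client is the WirepasNetworkInterface from the snippet above;
    # params carries gw_id, sink_id, dest, src_ep and dst_ep.
    return client.send_message(payload=bytes.fromhex("8A"), **params)

def downlink_worker():
    while True:
        params = downlink_queue.get()
        try:
            res = send_one(params)
            if res != GatewayResultCode.GW_RES_OK:
                print("send failed: %s" % res)
        finally:
            downlink_queue.task_done()

# The FastAPI endpoint would then just call downlink_queue.put(params) and return.
threading.Thread(target=downlink_worker, daemon=True).start()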

To Reproduce
Steps to reproduce the behavior:

  1. Start gateway services
  2. Send a whole lot of messages
  3. See error

Expected behavior
No errors? ;)

Platform details:

  • OS: Docker on Raspberry Pi 400, Docker on macOS 13.0
  • Gateway host: rpi
  • Python version: 3.9?
  • Docker engine: 23.0.1
  • MQTT Broker vendor: mosquitto
  • MQTT Broker version: 2.0.15
@vespakoen added the bug label Jun 23, 2023
@vespakoen (Author)

After some more testing I figured out that it runs fine for a while, and then at some point it starts returning GW_RES_SINK_OUT_OF_MEMORY even when I send requests slowly. So it seems that when I send a lot of requests at the same time, the USB stick / sink service(?) gets into a weird state and starts returning those errors.

Looks more like a memory leak now?

@vespakoen (Author)

I recompiled the sink service with DEBUG output enabled in data.c, which didn't bring many new insights (log below).

wp-sink-service-1       | 2023-06-23 21:24:25,814 | [DEBUG] Data:Message to send on EP 1 from EP 1 to 0xffffffff size = 1
wp-sink-service-1       | 2023-06-23 21:24:25,857 | [ERROR] wpc:Cannot send data. Dualmcu error code: 4
wp-sink-service-1       | 2023-06-23 21:24:25,857 | [ERROR] Data:Cannot send data: 21
wp-transport-service-1  | 2023-06-23 21:24:25,859 | [ERROR] wirepas_gateway@sink_manager.py:85:Cannot send message err=21
wp-transport-service-1  | 2023-06-23 21:24:25,942 | [DEBUG] wirepas_gateway@transport_service.py:509:Downlink traffic: sink0 | 16371271045505284465
wp-sink-service-1       | 2023-06-23 21:24:25,945 | [DEBUG] Data:Message to send on EP 1 from EP 1 to 0xffffffff size = 1
wp-sink-service-1       | 2023-06-23 21:24:25,958 | [ERROR] wpc:Cannot send data. Dualmcu error code: 4
wp-sink-service-1       | 2023-06-23 21:24:25,958 | [ERROR] Data:Cannot send data: 21

So I tried running the c-mesh-api example with some more logging to rule out D-Bus, and got something more interesting:

2023-06-24 01:04:35,128 | [DEBUG] SERIAL:Custom bitrate set: 115200
2023-06-24 01:04:35,131 | [INFO] wpc_int:WPC initialized
2023-06-24 01:04:36,120 | [ERROR] wpc_int:Didn't receive answer to the request 0x06 error is: -2
2023-06-24 01:04:36,120 | [INFO] wpc_int:========= confirm_size: -2
2023-06-24 01:04:36,899 | [WARNING] SLIP:Too small packet received (size=3)
2023-06-24 01:04:37,883 | [ERROR] SLIP:Wrong CRC 0x2e3e (computed) vs 0x0d51 (received)
2023-06-24 01:04:37,883 | [ERROR] wpc_int:CRC error in confirm for request 0x04
2023-06-24 01:04:37,883 | [INFO] wpc_int:========= confirm_size: -5
2023-06-24 01:04:37,883 | [ERROR] wpc_int:No poll answer
2023-06-24 01:04:38,623 | [INFO] Main:Stack is stopped
2023-06-24 01:04:38,890 | [ERROR] wpc_int:Didn't receive answer to the request 0x04 error is: -2
2023-06-24 01:04:38,890 | [INFO] wpc_int:========= confirm_size: -2
2023-06-24 01:04:38,890 | [ERROR] wpc_int:No poll answer
2023-06-24 01:04:39,880 | [ERROR] wpc_int:Didn't receive answer to the request 0x0d error is: -2
2023-06-24 01:04:39,880 | [INFO] wpc_int:========= confirm_size: -2
2023-06-24 01:04:39,880 | [ERROR] Main:Cannot set node address
2023-06-24 01:04:39,880 | [WARNING] linux_plat:Exiting dispatch thread
2023-06-24 01:04:40,870 | [ERROR] wpc_int:Didn't receive answer to the request 0x04 error is: -2
2023-06-24 01:04:40,870 | [INFO] wpc_int:========= confirm_size: -2
2023-06-24 01:04:40,870 | [ERROR] wpc_int:No poll answer
2023-06-24 01:04:40,870 | [WARNING] linux_plat:Exiting polling thread

I guess the answer is not being received, so it keeps retrying to send the request?

@GwendalRaoul (Contributor)

Hello @vespakoen,

Why do you think it is a bug?
A Wirepas network has limitations in terms of downlink messages.

Downlink packets are queued in the sink before being sent to the network, and once those buffers are full, the sink will answer that its memory is full (GW_RES_SINK_OUT_OF_MEMORY).
Depending on the network topology you have, it may take time to flush them.
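
On the client side you can treat that result code as backpressure: wait and retry instead of treating it as a failure. A rough sketch (send_with_backoff is only an illustration, not part of the library):

import time

# assuming GatewayResultCode comes from the wirepas-mesh-messaging package
from wirepas_mesh_messaging import GatewayResultCode

def send_with_backoff(client, max_retries=10, **send_args):
    # send_args are the usual send_message arguments:
    # gw_id, sink_id, dest, src_ep, dst_ep, payload, ...
    delay = 0.5  # seconds; tune to your network's downlink rate
    for _ in range(max_retries):
        res = client.send_message(**send_args)
        if res != GatewayResultCode.GW_RES_SINK_OUT_OF_MEMORY:
            return res  # GW_RES_OK or some other error
        time.sleep(delay)           # give the sink time to flush its buffers
        delay = min(delay * 2, 30)  # exponential backoff, capped at 30 s
    return res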

Best regards,
Gwendal

@vespakoen (Author)

Hi Gwendal,

Thanks for looking at this.

Sorry for marking this as a bug, it probably is just a misunderstanding on my part ;)

I am trying to figure out:

  1. where this queue is, and if its size can be increased
  2. if there is a way to check how full the queue is, so I can slow down the "spamming & losing messages"

Maybe what I am looking for is a blocking way to use the c-mesh-api, so I can make it process as many messages as possible without buffering anything at all?
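
In other words, something like this hypothetical blocking endpoint, using the kind of send_with_backoff helper you sketched above (the model and route names are just placeholders for my backend):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class WpParameters(BaseModel):
    # Hypothetical request model matching the JSON my spam script sends.
    gw_id: str
    sink_id: str
    dest: int
    src_ep: int
    dst_ep: int

@app.post("/execute")
def execute(parameters: WpParameters):
    # Block until the sink actually accepted the message, so nothing
    # piles up between my backend and the gateway. client is the
    # WirepasNetworkInterface created at startup.
    res = send_with_backoff(
        client,
        gw_id=parameters.gw_id,
        sink_id=parameters.sink_id,
        dest=parameters.dest,
        src_ep=parameters.src_ep,
        dst_ep=parameters.dst_ep,
        payload=bytes.fromhex("8A"),
    )
    return {"result": str(res)}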

All the best,

Koen

@vespakoen changed the title from "[BUG] Running into GatewayResultCode.GW_RES_SINK_OUT_OF_MEMORY when sending a lot of requests" to "Running into GatewayResultCode.GW_RES_SINK_OUT_OF_MEMORY when sending a lot of requests" Jun 27, 2023