Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis support (Logging of lost connections + random webhook triggers) #286

Open
michivi opened this issue Aug 16, 2022 · 1 comment
Open
Labels
bug Something isn't working

Comments

@michivi
Copy link
Contributor

michivi commented Aug 16, 2022

Describe the bug
This is basically to report some weird behaviors when using OPAL with Redis:

  • Regular reported Redis lost connections. Though it doesn't seem to prevent OPAL from actually using Redis, it still produces several lines of logs regularly.
  • The GitHub webhook is randomly dropped when using more than one worker (though I saw someone else reported the same in the OPAL Slack channel)

To Reproduce
We had this issue on a testing cloud environment, but we can also reproduce it with Docker compose. The following is an abbreviated version of the official example that we use to test OPAL locally. Every environment variables are provided by the running script. I'm not disclosing their actual values as they're sensitive, but I'll happily discuss them if required (nothing really extraordinary though 🙂).

version: "3.8"
services:
  broadcast_channel:
    image: redis:6.2.7-alpine
    ports:
      - "6379:6379"
  opal_server:
    image: permitio/opal-server:0.1.21
    environment:
      - OPAL_BROADCAST_URI=redis://broadcast_channel:6379
      - UVICORN_NUM_WORKERS=4
      - OPAL_POLICY_REPO_URL=${OPAL_POLICY_REPO_URL}
      - OPAL_POLICY_REPO_SSH_KEY=${OPAL_POLICY_REPO_SSH_KEY}
      - OPAL_POLICY_REPO_MAIN_BRANCH=${OPAL_POLICY_REPO_MAIN_BRANCH}
      - OPAL_POLICY_REPO_POLLING_INTERVAL=0
      - OPAL_AUTH_PUBLIC_KEY=${OPAL_AUTH_PUBLIC_KEY}
      - OPAL_AUTH_PRIVATE_KEY=${OPAL_AUTH_PRIVATE_KEY}
      - OPAL_AUTH_MASTER_TOKEN=${OPAL_AUTH_MASTER_TOKEN}
      - OPAL_DATA_CONFIG_SOURCES=$OPAL_DATA_CONFIG_SOURCES{}
      - OPAL_LOG_FORMAT_INCLUDE_PID=true
      - OPAL_AUTH_JWT_AUDIENCE=https://api.opal.ac/v1/
      - OPAL_AUTH_JWT_ISSUER=https://opal.ac/
    ports:
      - "7002:7002"
    depends_on:
      - broadcast_channel
  opal_client:
    image: permitio/opal-client:0.1.21
    environment:
      - OPAL_SERVER_URL=http://opal_server:7002
      - OPAL_CLIENT_TOKEN=${OPAL_CLIENT_TOKEN}
      - OPAL_POLICY_SUBSCRIPTION_DIRS=${OPAL_POLICY_SUBSCRIPTION_DIRS}
      - OPAL_LOG_FORMAT_INCLUDE_PID=true
      - OPAL_INLINE_OPA_LOG_FORMAT=http
      - OPAL_AUTH_JWT_AUDIENCE=https://api.opal.ac/v1/
      - OPAL_AUTH_JWT_ISSUER=https://opal.ac/
    ports:
      - "7000:7000"
      - "8181:8181"
    depends_on:
      - opal_server
    command: sh -c "./wait-for.sh opal_server:7002 --timeout=20 -- ./start.sh"

Note here that UVICORN_NUM_WORKERS is set to 4. Running this, we'll have the following logs (I'll remove uninteresting lines but can provide them if required):

opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [1] [INFO] Starting gunicorn 20.1.0
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [1] [INFO] Listening at: http://0.0.0.0:7002 (1)
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [8] [INFO] Booting worker with pid: 8
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [9] [INFO] Booting worker with pid: 9
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [10] [INFO] Booting worker with pid: 10
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [11] [INFO] Booting worker with pid: 11
...
opal-opal_server-1        | 2022-08-16T17:37:20.346738+0000 | 10 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:37:20.401177+0000 | 10 | fastapi_websocket_pubsub.event_notifier | INFO  | New subscription {'id': '154d3e84d04944478a70dd9ed83a03fb', 'subscriber_id': 'cbfd39a4958b4dcb82bf0845134a10af', 'topic': '__EventNotifier_ALL_TOPICS__', 'callback': <bound method EventBroadcaster.__broadcast_notifications__ of <fastapi_websocket_pubsub.event_broadcaster.EventBroadcaster object at 0x7f56e1b33370>>, 'notifier_id': None}
opal-opal_server-1        | 2022-08-16T17:37:20.401529+0000 | 10 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=154d3e84d04944478a70dd9ed83a03fb, subscriber_id=cbfd39a4958b4dcb82bf0845134a10af
opal-opal_server-1        | 2022-08-16T17:37:20.402571+0000 | 10 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': 'cbfd39a4958b4dcb82bf0845134a10af'}
opal-opal_server-1        | 2022-08-16T17:37:20.402995+0000 | 10 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:20.406235+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:20.406913+0000 | 10 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:20.411733+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:20.413964+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:20.415721+0000 | 9 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': 'cbfd39a4958b4dcb82bf0845134a10af'}
opal-opal_server-1        | 2022-08-16T17:37:20.416836+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': 'cbfd39a4958b4dcb82bf0845134a10af'}
opal-opal_server-1        | 2022-08-16T17:37:20.416765+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:20.419045+0000 | 10 | fastapi_websocket_pubsub.event_notifier | INFO  | Removing Subscription of topic='__EventNotifier_ALL_TOPICS__' for subscriber=cbfd39a4958b4dcb82bf0845134a10af
opal-opal_server-1        | 2022-08-16T17:37:20.419894+0000 | 10 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56092 - "POST /webhook HTTP/1.1" 200
...
opal-opal_server-1        | 2022-08-16T17:37:32.989993+0000 | 11 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:37:32.990901+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | New subscription {'id': 'cf9ec256f97746f6938b04692bb0c0b0', 'subscriber_id': '7fa214dba1424f2ca691fecf8f6d13d2', 'topic': '__EventNotifier_ALL_TOPICS__', 'callback': <bound method EventBroadcaster.__broadcast_notifications__ of <fastapi_websocket_pubsub.event_broadcaster.EventBroadcaster object at 0x7f56e1b27460>>, 'notifier_id': None}
opal-opal_server-1        | 2022-08-16T17:37:32.991249+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook, subscription_id=351f02710fca494b9e83b101c37f301e, subscriber_id=56313d9105614800aa54690a28b8436d
opal-opal_server-1        | 2022-08-16T17:37:32.991507+0000 | 11 | opal_server.policy.watcher.factory      | INFO  | webhook listener triggered
opal-opal_server-1        | 2022-08-16T17:37:32.991952+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=cf9ec256f97746f6938b04692bb0c0b0, subscriber_id=7fa214dba1424f2ca691fecf8f6d13d2
opal-opal_server-1        | 2022-08-16T17:37:32.992189+0000 | 11 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': '7fa214dba1424f2ca691fecf8f6d13d2'}
opal-opal_server-1        | 2022-08-16T17:37:32.992703+0000 | 11 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:32.994039+0000 | 11 | opal_common.sources.git_policy_source   | INFO  | Pulling changes from remote: 'origin'
opal-opal_server-1        | 2022-08-16T17:37:34.175476+0000 | 11 | opal_common.sources.git_policy_source   | INFO  | No new commits: HEAD is at 'XXX'
opal-opal_server-1        | 2022-08-16T17:37:34.177091+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:34.177797+0000 | 11 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:34.179344+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:34.180968+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:34.181405+0000 | 9 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': '7fa214dba1424f2ca691fecf8f6d13d2'}
opal-opal_server-1        | 2022-08-16T17:37:34.181499+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': '7fa214dba1424f2ca691fecf8f6d13d2'}
opal-opal_server-1        | 2022-08-16T17:37:34.182022+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:34.183110+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | Removing Subscription of topic='__EventNotifier_ALL_TOPICS__' for subscriber=7fa214dba1424f2ca691fecf8f6d13d2
opal-opal_server-1        | 2022-08-16T17:37:34.184010+0000 | 11 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56164 - "POST /webhook HTTP/1.1" 200
...

We can see several things here:

  • The Redis connection is reported lost and made several times.
  • The webhook listener is not always triggered (webhook listener triggered only appears randomly)

As suggested in the Slack thread, setting UVICORN_NUM_WORKERS to 1 seems to resolve the second point (the webhook listener is always triggered afterward). But the Redis connection is still reported as lost.

Just as an experiment, I tried publishing some messages in the Redis instance (hello, still with a single worker) and ended up with the following logs:

opal-opal_server-1        | 2022-08-16T17:41:33.355978+0000 | 8 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:41:33.357930+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook, subscription_id=238cb9d922044d7bb61894811f428f54, subscriber_id=a9a7c881851342a3b6d3ca84a4a8a86d
opal-opal_server-1        | 2022-08-16T17:41:33.358152+0000 | 8 | opal_server.policy.watcher.factory      | INFO  | webhook listener triggered
opal-opal_server-1        | 2022-08-16T17:41:33.358666+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=0eca665b7b41433ba953c4cfc4eff9f6, subscriber_id=63db1ff04fcd4dc2a1e1e3a7292a73e7
opal-opal_server-1        | 2022-08-16T17:41:33.359060+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': '63db1ff04fcd4dc2a1e1e3a7292a73e7'}
opal-opal_server-1        | 2022-08-16T17:41:33.359783+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:41:33.361329+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | Pulling changes from remote: 'origin'
opal-opal_server-1        | 2022-08-16T17:41:34.543173+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | No new commits: HEAD is at 'XXXX'
opal-opal_server-1        | 2022-08-16T17:41:34.545514+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:41:34.546808+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:41:34.548838+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:41:34.551641+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:41:34.552786+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:41:34.554434+0000 | 8 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56330 - "POST /webhook HTTP/1.1" 200
opal-opal_server-1        | 2022-08-16T17:42:51.285310+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['hello'], 'src': '5bc185a3afdd4bd3a1caedc203f414ce'}
opal-opal_server-1        | 2022-08-16T17:42:52.746649+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['hello'], 'src': 'ce5dcb29523b42528c89031c09078e94'}
opal-opal_server-1        | 2022-08-16T17:43:00.656252+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['hello'], 'src': 'f2035dc737f24563ad01af799bc07a53'}
opal-opal_server-1        | 2022-08-16T17:43:02.443805+0000 | 8 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:43:02.444774+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook, subscription_id=238cb9d922044d7bb61894811f428f54, subscriber_id=a9a7c881851342a3b6d3ca84a4a8a86d
opal-opal_server-1        | 2022-08-16T17:43:02.444923+0000 | 8 | opal_server.policy.watcher.factory      | INFO  | webhook listener triggered
opal-opal_server-1        | 2022-08-16T17:43:02.445497+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=0eca665b7b41433ba953c4cfc4eff9f6, subscriber_id=63db1ff04fcd4dc2a1e1e3a7292a73e7
opal-opal_server-1        | 2022-08-16T17:43:02.446015+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': '63db1ff04fcd4dc2a1e1e3a7292a73e7'}
opal-opal_server-1        | 2022-08-16T17:43:02.446760+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:43:02.447727+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | Pulling changes from remote: 'origin'
opal-opal_server-1        | 2022-08-16T17:43:03.727878+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | No new commits: HEAD is at 'XXXX'
opal-opal_server-1        | 2022-08-16T17:43:03.729302+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:43:03.729719+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:43:03.730882+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:43:03.732462+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:43:03.732879+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:43:03.734165+0000 | 8 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56350 - "POST /webhook HTTP/1.1" 200

So it seems that the Redis connection is actually still alive, and OPAL is still listening to it. It only considers the connection lost when it receives a webhook notification. Though it does so after reading the incoming notification so at the very least the webhook notification is taken into account.

Expected behavior
Is Redis officially supported by OPAL? I've seen few references to it. It is supported by the broadcaster package and the example docker-compose.yaml file suggests that any backend supported by broadcaster is supported by OPAL.

If Redis is officially supported, then I guess the expected behavior would be:

  • No Redis connection lost message
  • Support for more than 1 worker without missing any webhook calls (or maybe a warning when starting OPAL with such a configuration, or failing fast when detecting it)

OPAL version

  • Version: 0.1.21

Have I missed something that might explain all of this? When testing the fastapi_websocket_pubsub, I didn't have Redis connection issue / message.

Thanks in advance for your help 🙂

@michivi michivi added the bug Something isn't working label Aug 16, 2022
@asafc
Copy link
Contributor

asafc commented Aug 17, 2022

Hey @michivi thanks for reporting this bug.

OPAL uses and supports Redis via the python broadcaster package.
We are looking to move away from or fork broadcaster since it's no longer actively maintained (that we can tell).

@roekatz can you please take a look?
we can fix that in our own fork of broadcaster.

@asafc asafc assigned asafc and unassigned asafc Aug 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants