Push Messaging

Zuul 2.0 supports push messaging, i.e. sending messages from the server to the client. It supports two protocols for delivering push messages: WebSockets and Server-Sent Events (SSE).

Our sample app demonstrates how to set up both WebSockets and SSE to enable push messaging for Zuul.

Authentication

The Zuul Push server must authenticate each incoming push connection. You can plug your own custom authentication into the Zuul Push server by extending the abstract PushAuthHandler class and implementing its doAuth() method. Please refer to SamplePushAuthHandler for an example of how to do this.
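As a rough illustration, a custom handler could take the shape below. This is only a sketch: the doAuth() parameter type, the superclass constructor arguments, and the MyPushUserAuth helper are assumptions, so mirror SamplePushAuthHandler in the sample app for the real contract.

```java
// Rough sketch only — the doAuth() parameter type, the PushAuthHandler constructor
// arguments, and the auth-check helpers are assumptions; mirror SamplePushAuthHandler
// in the sample app for the real contract.
import io.netty.handler.codec.http.FullHttpRequest;

public class MyPushAuthHandler extends PushAuthHandler {

    public MyPushAuthHandler() {
        // SamplePushAuthHandler passes push-endpoint settings to the superclass;
        // do the same here with values appropriate for your deployment.
        super("/push", "myapp.example.com");
    }

    @Override
    protected PushUserAuth doAuth(FullHttpRequest request) {
        // Hypothetical check: authenticate the connection from a header or cookie
        // using whatever auth system you already have.
        final String token = request.headers().get("X-Auth-Token");
        if (token != null && isValid(token)) {
            return new MyPushUserAuth(userIdFor(token));  // authenticated
        }
        return MyPushUserAuth.failed();                   // rejected
    }

    private boolean isValid(String token) { return false; /* call your auth system */ }

    private String userIdFor(String token) { return "user-id"; /* extract the identity */ }
}
```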

Client Registration and Lookup

After successful authentication, Zuul Push registers each authenticated connection against the client or user identity so that it can be looked up later to send a push message to that particular client or user. You decide what goes into this identity by implementing the PushUserAuth interface and returning an instance of it from doAuth() after successful authentication. Please refer to SamplePushUserAuth as an example.
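A minimal identity class might look like the sketch below. The interface methods shown (isSuccess, getClientIdentity) are assumptions based on the description above (a success flag plus a client identity); check SamplePushUserAuth for the actual methods to implement.

```java
// Sketch of a PushUserAuth implementation. The interface methods shown here
// (isSuccess, getClientIdentity) are assumptions — SamplePushUserAuth shows
// the real ones to implement.
public class MyPushUserAuth implements PushUserAuth {

    private final String userId;
    private final boolean success;

    public MyPushUserAuth(String userId) {
        this.userId = userId;
        this.success = true;
    }

    private MyPushUserAuth() {
        this.userId = null;
        this.success = false;
    }

    public static MyPushUserAuth failed() {
        return new MyPushUserAuth();
    }

    // Did authentication succeed?
    @Override
    public boolean isSuccess() {
        return success;
    }

    // The identity this connection is registered under, and later looked up by,
    // when a push message is sent to this client.
    @Override
    public String getClientIdentity() {
        return userId;
    }
}
```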

Each Zuul Push server maintains a local, in-memory registry of all the clients connected to it, using PushConnectionRegistry. For a single-node push cluster this in-memory local registry is sufficient. In the case of a multi-node push cluster, a second-level, off-box global datastore is needed to extend the push registry beyond a single machine. In that case, looking up a particular client is a two-step process. First, you look up, in the off-box global push registry, the push server to which the specified client is connected. That lookup returns the push server, which can then look up the actual client connection in its local, in-memory push registry.
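In code, the two-step lookup roughly follows the shape below. This is only an illustration of the flow: GlobalPushRegistry, lookupServerFor(), forwardToServer(), and sendPushMessage() are hypothetical placeholders for your own registry client and delivery transport, and the PushConnectionRegistry accessor may differ from the real class.

```java
// Illustration of the two-step lookup, not actual Zuul API. GlobalPushRegistry,
// lookupServerFor(), forwardToServer() and sendPushMessage() are hypothetical
// placeholders; PushConnectionRegistry is Zuul's local, in-memory registry.
public class PushMessageRouter {

    // Hypothetical client for the off-box global registry (e.g. Redis, DynamoDB).
    interface GlobalPushRegistry {
        String lookupServerFor(String clientId);
    }

    private final GlobalPushRegistry globalRegistry;
    private final PushConnectionRegistry localRegistry;   // Zuul's local registry

    public PushMessageRouter(GlobalPushRegistry globalRegistry, PushConnectionRegistry localRegistry) {
        this.globalRegistry = globalRegistry;
        this.localRegistry = localRegistry;
    }

    // Runs on whichever node receives the "send push" request.
    public void push(String clientId, String payload) {
        // Step 1: ask the global registry which push server holds this client's
        // connection, then forward the request to that server.
        String serverAddress = globalRegistry.lookupServerFor(clientId);
        forwardToServer(serverAddress, clientId, payload);
    }

    // Runs on the push server that actually holds the connection.
    public void deliverLocally(String clientId, String payload) {
        // Step 2: resolve the live connection from the local, in-memory registry.
        PushConnection conn = localRegistry.get(clientId);
        if (conn != null) {
            conn.sendPushMessage(payload);   // hypothetical delivery call
        }
    }

    private void forwardToServer(String serverAddress, String clientId, String payload) {
        // e.g. an internal HTTP call to the target push server's delivery endpoint
    }
}
```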

You can integrate an off-box, global push registry with Zuul Push by extending PushRegistrationHandler and overriding its registerClient() method. Zuul Push allows you to plug in any datastore of your choice as the global push registry, but for best results the chosen datastore should support the following features:

  • Low read latency
  • TTL or automatic record expiry of some sort.
  • Sharding
  • Replication

Having these features means your push cluster can scale horizontally to millions of push connections if needed. Redis, Cassandra, and Amazon DynamoDB are just a few of the many good choices for the global push registry datastore.
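For instance, a registration handler backed by an external key-value store might look roughly like the sketch below. The registerClient() parameters, the superclass constructor arguments, and GlobalRegistryClient are all assumptions; copy the real method signature from SampleWebSocketPushRegistrationHandler or SampleSSEPushRegistrationHandler.

```java
// Sketch of plugging a global datastore into registration. The registerClient()
// signature and constructor arguments below are assumptions — copy them from the
// sample registration handlers — and GlobalRegistryClient is a hypothetical
// stand-in for your Redis/Cassandra/DynamoDB client.
public class MyPushRegistrationHandler extends PushRegistrationHandler {

    private static final int REGISTRATION_TTL_SECONDS = 1800;

    private final GlobalRegistryClient globalRegistry;   // hypothetical datastore client
    private final String thisServerAddress;              // address other nodes use to reach this server

    public MyPushRegistrationHandler(GlobalRegistryClient globalRegistry, String thisServerAddress) {
        // Superclass constructor arguments vary; mirror the sample handlers.
        super("/push");
        this.globalRegistry = globalRegistry;
        this.thisServerAddress = thisServerAddress;
    }

    @Override
    protected void registerClient(PushUserAuth authEvent, PushConnection conn) {
        // Keep the default behavior: record the connection in the local, in-memory registry.
        super.registerClient(authEvent, conn);

        // Additionally record "client identity -> this push server" in the global
        // registry, with a TTL so stale records expire on their own.
        globalRegistry.put(authEvent.getClientIdentity(), thisServerAddress, REGISTRATION_TTL_SECONDS);
    }
}
```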

Please take a look at the SampleWebSocketPushRegistrationHandler and SampleSSEPushRegistrationHandler classes to see how to integrate WebSocket and SSE connections with the push registry, respectively.

Accepting new push connections

SampleWebSocketPushChannelInitializer and SampleSSEPushChannelInitializer demonstrate how to set up the Netty channel pipeline for accepting incoming WebSocket and SSE connections, respectively. These classes install the authentication and registration handlers for every new incoming connection, based on the protocol being used.
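The general shape of such an initializer, written with plain Netty types, is sketched below. The push-specific handlers are the ones described in the previous sections; the sample initializers may extend a Zuul base class and order their handlers differently, so treat this only as an outline of the idea.

```java
// General shape of a push channel initializer using standard Netty types.
// The push-specific handlers (auth, registration) are the classes sketched above;
// the actual sample initializers may differ in base class and handler ordering.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;

public class MyWebSocketPushChannelInitializer extends ChannelInitializer<SocketChannel> {

    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline()
          .addLast(new HttpServerCodec())             // decode the initial HTTP upgrade request
          .addLast(new HttpObjectAggregator(8192))    // aggregate it into a FullHttpRequest
          .addLast(new MyPushAuthHandler())           // authenticate the connection (see above)
          .addLast(new MyPushRegistrationHandler());  // register it in the push registry
        // ...followed by the WebSocket handshake/frame handlers (or SSE handlers)
        // for the chosen push protocol.
    }
}
```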

Load balancers vs WebSockets and SSE

Push connections are different from normal request/response style HTTP connections in that they are persistent and long-lived. Once the connection is made, it is kept open by both client and server even when there are no requests pending. This throws off many popular load balancers, which cut the connection after some period of inactivity. Amazon Elastic Load Balancers (ELB) and older versions of HAProxy and Nginx all have this issue. You basically have two choices to make your push cluster work with your load balancer:

  1. Use a load balancer version that supports WebSocket proxying, such as a recent version of HAProxy or Nginx, or an Application Load Balancer (ALB) instead of an ELB on Amazon's cloud, or
  2. Run your existing load balancer as a TCP load balancer at layer 4 instead of as an HTTP load balancer doing layer 7 load balancing. Most load balancers, including ELBs, support a mode in which they act as a TCP load balancer. In this mode they simply proxy TCP packets back and forth without trying to parse or interpret any application protocol, which generally fixes the issue.

You will probably also need to increase your load balancer's idle timeout, as the default value is usually measured in seconds and is almost always insufficient for the typical long-lived, persistent, and mostly idle push connections.

Configuration options

| Name | Description | Default value |
|------|-------------|---------------|
| zuul.push.registry.ttl.seconds | Record expiry (TTL) of a client registration record in the global registry | 1800 seconds |
| zuul.push.reconnect.dither.seconds | Randomization window for each client's maximum connection lifetime; helps spread subsequent client reconnects over time | 180 seconds |
| zuul.push.client.close.grace.period | Number of seconds the server waits for the client to close the connection before closing it forcefully from its side | 4 seconds |

If you use the Netflix OSS Archaius module, you can change all of the above configuration options at runtime, and they will take effect without restarting the server.
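For example, Archaius exposes such settings as dynamic properties that are re-read on every access, which is what makes runtime overrides possible. The sketch below only illustrates that Archaius mechanism with one of the property names from the table above; Zuul Push wires up these properties internally, so you would not normally read them yourself.

```java
// Illustrates the Archaius pattern behind runtime-changeable properties. Zuul Push
// reads these properties internally; this snippet only demonstrates the mechanism.
import com.netflix.config.DynamicIntProperty;
import com.netflix.config.DynamicPropertyFactory;

public class PushConfigExample {

    // get() returns the current value each time it is called, so an override pushed
    // through your configuration source is picked up without a server restart.
    private static final DynamicIntProperty REGISTRY_TTL_SECONDS =
            DynamicPropertyFactory.getInstance()
                    .getIntProperty("zuul.push.registry.ttl.seconds", 1800);

    public static void main(String[] args) {
        System.out.println("Current registry TTL: " + REGISTRY_TTL_SECONDS.get() + "s");
    }
}
```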