With active-active + cluster mode enabled often messages get processed twice #151
Comments
Yeah, it's not built to handle such scenarios. One thing you can do here is turn off the Rqueue workers/listeners in the second region. Do you know what the Redis replication lag is between these two setups? Generally, if data is replicated immediately, the two regions should not process duplicate messages; each should process unique messages. Also, a more detailed architecture would help me understand what's happening.
There seems to be a bit of lag in replication. The same job gets dispatched to two instances in separate regions, and both are then able to update the request (we use Redis for persistence as well) without a CAS exception, maybe even 30 ms apart. I'm currently trying to figure out what the expected lag is. I think we will just try to handle it ourselves, because our holy grail is to be able to process in multiple regions so that we are pretty fault tolerant. Just wanted to know if this had ever come up. We have been considering whether we could fork your code and try to solve it ourselves, but I imagine that would be harder for us than implementing it in our own service :/.
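One way to "handle it ourselves" is to make the job handler idempotent: before running a job, each instance tries to claim the job id, and only the winner executes. The sketch below is a minimal illustration under stated assumptions, not part of Rqueue. `ClaimStore`, `InMemoryClaimStore`, and `DedupingHandler` are hypothetical names, and the in-memory map stands in for a store that would have to be strongly consistent across both regions. An asynchronously replicated Redis would reopen the same race.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical claim store: returns true only for the first caller that claims a job id. */
interface ClaimStore {
    boolean tryClaim(String jobId, String owner);
}

/** In-memory stand-in for a store that is strongly consistent across regions (assumption). */
class InMemoryClaimStore implements ClaimStore {
    private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

    @Override
    public boolean tryClaim(String jobId, String owner) {
        // putIfAbsent is atomic: exactly one caller sees null for a given job id.
        return owners.putIfAbsent(jobId, owner) == null;
    }
}

/** Wraps a job so that a duplicate delivery of the same job id is dropped. */
class DedupingHandler {
    private final ClaimStore claims;
    private final String instanceId;

    DedupingHandler(ClaimStore claims, String instanceId) {
        this.claims = claims;
        this.instanceId = instanceId;
    }

    /** Runs the job only if this instance wins the claim; returns whether it ran. */
    boolean handle(String jobId, Runnable job) {
        if (!claims.tryClaim(jobId, instanceId)) {
            return false; // another instance already owns this job: skip the duplicate
        }
        job.run();
        return true;
    }
}
```

Note that in this sketch a job that fails after being claimed is not retried by anyone; a real implementation would need claim expiry or an explicit release on failure.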
@sonus21 I've started to think about the problem in a different way, but I'm not sure the library allows for it.
But I can't really see whether there is an exposed way for an instance to start/stop consuming from a queue, e.g. not using the annotation and polling manually instead. Does this exist?
We can stop/start a listener at runtime, but it does not support adding a new queue at runtime. Why do you need an active-active setup for consumers? Shouldn't you stop the consumers in the other region?
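The active/passive idea (consumers off in the second region, switched on at runtime only when needed) could be sketched roughly as follows. This is an illustration under assumptions, not Rqueue's API: `ListenerControl` is a hypothetical wrapper around whatever start/stop mechanism the application uses, and the health probe and failure threshold are made-up parameters.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical abstraction over "start/stop the queue listeners"; not Rqueue's API. */
interface ListenerControl {
    void start();
    void stop();
}

/** Keeps consumers in the standby region off unless the primary region looks down. */
class StandbyFailover {
    private final ListenerControl listeners;
    private final BooleanSupplier primaryHealthy; // hypothetical health probe
    private final int failuresBeforeTakeover;     // consecutive failed probes required
    private int consecutiveFailures = 0;
    private boolean consuming = false;

    StandbyFailover(ListenerControl listeners, BooleanSupplier primaryHealthy,
                    int failuresBeforeTakeover) {
        this.listeners = listeners;
        this.primaryHealthy = primaryHealthy;
        this.failuresBeforeTakeover = failuresBeforeTakeover;
    }

    /** Call periodically; returns whether this region is consuming after the check. */
    boolean tick() {
        if (primaryHealthy.getAsBoolean()) {
            consecutiveFailures = 0;
            if (consuming) {
                listeners.stop(); // primary is back: hand consumption over again
                consuming = false;
            }
        } else {
            consecutiveFailures++;
            if (!consuming && consecutiveFailures >= failuresBeforeTakeover) {
                listeners.start(); // primary looks down: take over consumption
                consuming = true;
            }
        }
        return consuming;
    }
}
```

Requiring several consecutive failed probes lowers the chance that both regions consume at once during a brief network blip, at the cost of a delay before the standby region starts processing.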
Basically we want to be resistant to losing an entire region.
I think it's not correct to use a multi-region active-active setup for consumers. Even SQS is single-region; it's not replicated across regions. https://stackoverflow.com/questions/66249605/does-aws-sqs-replicate-messages-across-regions Can you refer me to some articles that suggest this solution for high availability? Proposed solution:
Expected delay in message processing (5-15 minutes)
I think your proposed solution sounds like what we are going to attempt. That is the direction I was angling toward now. I was trying to find something like task count, so thank you for sharing! I honestly was just dreaming a bit that this would be possible, but it sounds like my head was a bit in the clouds.
We had originally implemented a scheduler using your library with Redis ElastiCache in AWS with cluster mode disabled. Everything seemed to work out great. For context, our service in production that implements the scheduler runs 3 instances. The scheduled task was always dispatched to a single instance, as expected.
We have now tried to switch to an enterprise version of Redis with cluster mode enabled and an active-active database. By that I mean that we are running our service in two regions (3 instances per region), each pointing to a different Redis database (i.e. Redis in region A and Redis in region B), with replication between them.
What are the application dependencies?
How to Reproduce (optional)?
We aren't running under any load, as we currently only really run smoke tests. We schedule a job with a wait of x seconds. x seconds later, the job is consumed and executed by two instances. The instances are always in different regions. I'm guessing the library was just never built to handle this case?