ERR Protocol error: too big inline request (Redis::CommandError) #1185
So this error is thrown by the server here: https://github.com/redis/redis/blob/2dd5c3a180a9892c339b66f76cc203f8b0f6fe5a/src/networking.c#L2112-L2115 and that constant is defined as 64kB.
So this seems to suggest that somehow the client sends more than 64kB of data without a single newline, which is very fishy. But somehow that error rings a bell; it feels like I researched this before. |
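For context (not from the thread): an "inline" request is the plain-text, newline-terminated form of a command that Redis accepts alongside RESP, and the 64kB limit applies to that parser. A minimal sketch of the difference, assuming a local redis-server on the default port:

```ruby
require "socket"

sock = TCPSocket.new("127.0.0.1", 6379) # assumes a local redis-server

# "Inline" request: the raw, newline-terminated text form (what typing over
# telnet/nc produces). The 64kB-without-a-newline limit applies to this parser.
sock.write("PING\r\n")
puts sock.gets # => "+PONG"

# The same command framed as RESP (an array of bulk strings), which is what
# redis-rb / redis-client normally send, so the inline limit shouldn't apply.
sock.write("*1\r\n$4\r\nPING\r\n")
puts sock.gets # => "+PONG"
```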
What would be sending inline, though? Looking at the ActionCable source, it's calling publish in |
Yeah, I don't know, it could be an overly long channel name or something like that. It will be hard to debug without a repro. I'm not sure what your setup looks like, but if you have that flexibility I'd try to add some logging in there: redis-rb/lib/redis/subscribe.rb, line 49 (at e4f5299).
|
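A hypothetical sketch of that kind of logging, without editing the gem file itself: wrap Redis#subscribe and log whatever channel names ActionCable passes in (the module name and log format are made up):

```ruby
# Hypothetical logging shim, e.g. in a Rails initializer: record the channel
# names (and their byte sizes) handed to Redis#subscribe, to see whether an
# oversized channel name could ever reach the server.
module SubscribeLogging
  def subscribe(*channels, &block)
    channels.each do |channel|
      Rails.logger.info("redis subscribe: #{channel.inspect} (#{channel.to_s.bytesize} bytes)")
    end
    super
  end
end

Redis.prepend(SubscribeLogging)
```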
So looking at the backtrace, the line in actioncable that it ends up at is explicitly sending a hardcoded channel name. I can try to add some logging there, but if that's the case, I don't see how it could be anything other than the hardcoded channel, let alone a 64k-long channel name |
Hum, you're right. |
actioncable sets the message callback of the subscription to:

```ruby
on.message do |chan, message|
  broadcast(chan, message)
end
```

If I'm reading it right, this block

```ruby
while event = @client.next_event(timeout)
  if event.is_a?(::RedisClient::CommandError)
    raise Client::ERROR_MAPPING.fetch(event.class), event.message
  end

  type, *rest = event
  if callback = sub.callbacks[type]
    callback.call(*rest)
  end
  break if type == stop && rest.last == 0
end
```

is a blocking while loop running in its own thread? So it's likely that it subscribes to the channel, then waits for incoming messages. If that's the case, though, we've attempted sending large messages through to trigger the exception with no luck |
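For reference, the blocking subscribe API in the redis gem looks roughly like this (a minimal sketch, not the ActionCable adapter itself; the channel name is made up):

```ruby
require "redis"

redis = Redis.new

# Redis#subscribe blocks the calling thread and dispatches each incoming
# event to the registered callbacks until unsubscribe is called, which is
# why ActionCable runs it in a dedicated listener thread.
Thread.new do
  redis.subscribe("some_channel") do |on|
    on.subscribe   { |channel, count| puts "subscribed to #{channel} (#{count} total)" }
    on.message     { |channel, message| puts "#{channel}: #{message}" }
    on.unsubscribe { |channel, count| puts "unsubscribed from #{channel}" }
  end
end
```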
Note that it could be a total red herring. When googling for that error, you can see lots of these being resolved in various ways on the server side. Given that you say the issue started when you upgraded, it makes sense to assume the client does something wrong, but that wrong thing may not be sending a packet that is too big.
Have you tried the latest 4.x? Mostly to try to pinpoint when the issue might have been introduced. Another good thing to try would be to enable the |
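That bisection is just a matter of pinning the gems back in the Gemfile, for example (versions here are illustrative):

```ruby
# Gemfile: pin back to the last 4.x series to see whether the error (or a
# swallowed equivalent) shows up there too. Versions are illustrative.
gem "redis", "~> 4.8"
gem "hiredis", "~> 0.6" # the pre-5.x C driver, if you also want to bisect the driver
```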
That is a concern: since the exception being raised is new in this version, it could be something that has been happening all along and was either being swallowed or surfaced as another error that we haven't noticed is no longer happening.

```ruby
[1] pry(main)> Redis::Connection.drivers
=> [RedisClient::HiredisConnection]
```

We're already using the hiredis-client gem. Is there a way to see if the AC redis instance is using inline? I thought RESP was what was always used? |
Then the opposite, try to disable it. It's really just a basic bisection to try to narrow down where the bug could be.
I'm not sure what you are referring to by |
Oh wait, hum, another thing: is there SSL in the mix? I can't remember nor find where I saw that. Just a wild guess though. |
It does look like the gem uses RESP2:

```ruby
def config(**kwargs)
  super(protocol: 2, **kwargs)
end
```

The error itself says "inline request", and the comments on the method in the redis source say [...]; however, it seems like every client in existence uses RESP. I'll have to check on the SSL issue, but if that was the case, I'd expect it to happen on every message, no? |
Oh wait, this is an intermittent issue? |
Yeah, I have no idea what "inline protocol" is supposed to be, never heard of it, and I can confirm that. Is there a proxy or something between the client and Redis? Or anything else that isn't just a raw standalone redis-server? |
Oh yes, intermittent and extremely elusive. We've had zero luck in recreating it on purpose. We see the error 6-8 times a day among thousands and thousands of requests. No other issues that we can tell of on straight redis calls for caching and resque use, just something with action cable. We use consul to manage the url, but ATM we only have a single redis server in production. |
Hum, that is extremely intriguing. I'll try to find some time to review if it's possible that on 4.x that error was swallowed or retried. That could explain it, e.g. a pre-existing issue with your server, and now we're just reporting it. |
It looks like in the previous version the errors were being raised in the connection:

```ruby
# Raised by the connection when a protocol error occurs.
class ProtocolError < BaseError
  def initialize(reply_type)
    super(<<-EOS.gsub(/(?:^|\n)\s*/, " "))
      Got '#{reply_type}' as initial reply byte.
      If you're in a forking environment, such as Unicorn, you need to
      connect to Redis after forking.
    EOS
  end
end
```
I'm also seeing this intermittently after upgrading |
Spent the time after my last attempt slow-rolling the upgrade: Redis gem up to 4.8.1 and resque up to 2.5.0, with no issues. We have since moved from our old redis servers to GCP Memorystore. In addition, there was a rails upgrade that fixed an ActionCable redis config with sentinel. The final upgrade was swapping out hiredis for hiredis-client and upgrading the redis gem to 5.0.6. Within an hour I had received the error from a very, very small action cable call |
hmm, in fact, I can't find us using ActionCable in the call that Airbrake captures as the source at all. Could be some sneaky metaprogramming somewhere, but I'm not finding it. |
@Matt-Yorkley, did you ever find any cause? I haven't been able to try again in a while, but I'm working on some more logging in the action cable process to see if I can track it down. |
No, Matt didn't solve the issue for us. I just ran CI again on our latest code and still saw this error popping up. Since Sam claimed that hiredis doesn't bring any performance improvement I decided to remove the gem from our code and just use the default redis gem. Let me know if you come to another conclusion. |
I added more logging around what gets passed to the redis gem from action cable, but no smoking guns. Still very intermittent and haven't been able to consistently reproduce it. Are you saying that removing hiredis eliminated the issue? |
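A hypothetical sketch of that kind of logging on the publish side (the module name and size threshold are invented): flag any unusually large channel or payload handed to Redis#publish before it goes over the wire.

```ruby
# Hypothetical shim: warn about large arguments to Redis#publish, since 64kB
# of data without a newline is what the server is complaining about.
module PublishSizeLogging
  THRESHOLD = 16_384 # bytes; arbitrary, just low enough to catch outliers

  def publish(channel, message)
    if channel.to_s.bytesize > THRESHOLD || message.to_s.bytesize > THRESHOLD
      Rails.logger.warn(
        "large redis publish: channel=#{channel.to_s.bytesize}B message=#{message.to_s.bytesize}B"
      )
    end
    super
  end
end

Redis.prepend(PublishSizeLogging)
```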
Redis bulk load commands need to be under 64 KB per line (more or less): redis/redis-rb#1185, https://github.com/redis/redis/blob/07b292af5e86459e435da5986f87bc4f7415d437/src/server.h#L178

If too many IDs are included in the batch file, the lines are too long, causing an error: ERR Protocol error: too big inline request. Fix by chunking the IDs written to lines.

What chunk size to use? 1000 IDs works, and leads to lines of approximately 23 KB. But Register 2 generates statement IDs of around 20 characters in length, whereas BODS 0.2 and BODS 0.3 say they must be between 32 and 64 chars (the fact that Register 2 doesn't comply with the minimum is a separate issue). Multiplying the 64 KB limit by 20/64 gives a target of around 20 KB, so reducing to 750 IDs should be comfortable, leading to lines of around 17 KB (as measured with `wc -c`).
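The chunking itself is straightforward. A minimal sketch, assuming `ids` is the array of statement IDs and each output line becomes one command in the bulk-load file (the command, key, and file name are placeholders):

```ruby
CHUNK_SIZE = 750 # keeps each line comfortably under the 64 KB inline limit

File.open("batch.txt", "w") do |file|
  ids.each_slice(CHUNK_SIZE) do |chunk|
    # One command per line; placeholder command/key, the point is the slicing.
    file.puts(["SADD", "statement_ids", *chunk].join(" "))
  end
end
```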
Yes, at least in our CI environment. We haven't tested production yet. |
Redis 5.3.3
Ruby 3.1.3
Rails 7.0.4.2
redis gem 5.0.6
resque gem 2.4.0
Just upgraded the redis/resque gems and their dependencies and am getting this intermittent error:

ERROR -- : ERR Protocol error: too big inline request (Redis::CommandError)

The backtrace is not super helpful; it seems to indicate that action_cable is involved, but we use it a lot and I'm having trouble tracking it down. I can't reproduce it, and it appears on calls that don't trigger any actioncable actions.
The error is generated in redis, which has not changed, but looking at the gem difference, the error being raised in subscribe.rb is new. Am I looking at something that was happening all along, but was being swallowed by earlier versions of the redis gem?