Add upper limit on initial buffer size in MessagePack::Unpacker #368
Conversation
Currently, the initial buffer size is specified in rb_ary_new2() or rb_hash_new_capa(). If a huge size is specified, a large amount of memory is allocated and system memory might be depleted. We want to unpack data received over the network, but the service may stop due to a large memory allocation triggered by crafted data. So this patch adds an upper limit on the initial buffer size. If the buffer runs out, Ruby will reallocate it automatically.

## Test code

```ruby
require "msgpack"

puts "msgpack version: #{MessagePack::VERSION}"

unpacker = MessagePack::Unpacker.new
unpacker.feed_each("\xDF\x20\x00\x00\x00") {}

puts "Memory Usage: #{`ps -o rss= -p #{Process.pid}`.strip} KB"
```

## Before

Before applying this patch, it allocates 8 GB of memory on my environment.

```
$ ruby -v test.rb
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
msgpack version: 1.7.2
Memory Usage: 8403320 KB
```

## After

```
$ ruby -v test.rb
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
msgpack version: 1.7.2
Memory Usage: 14480 KB
```
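To make the failure mode concrete, here is a small sketch (not part of the patch) that decodes the crafted payload's MessagePack header by hand to show where the huge count comes from; the ~16-bytes-per-entry figure is an assumption about the hash table's per-entry overhead, not a number from the patch.

```ruby
# Decode the crafted payload's header manually to see why
# rb_hash_new_capa() is asked for so much memory up front.
payload = "\xDF\x20\x00\x00\x00".b

type  = payload.getbyte(0)          # 0xDF is the MessagePack map32 type tag
count = payload[1, 4].unpack1("N")  # big-endian 32-bit declared entry count

puts "type tag: 0x%X" % type        # => 0xDF
puts "declared entries: #{count}"   # => 536870912 (0x20000000)

# Pre-allocating space for ~537 million entries (at an assumed ~16 bytes
# per entry) is roughly 8 GiB, before a single entry has been parsed.
```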
So, I'm OK with this change, because you can't get a partially parsed array or hash, so unless this payload is malicious, you will end up needing such a large array/hash to parse in the end. Or am I missing something?
Yes, you're right!
So is this really a problem? Ruby will have to allocate this much eventually, might as well do it upfront, no?
We would like to surface exceptions such as parse errors from invalid data, but currently a huge amount of memory is suddenly allocated up front before the error is raised.
I'm concerned about this.
That's fair, that's all I wanted to understand.
@Watson1978 Thanks for this enhancement!
Thanks
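The idea behind the patch can be sketched roughly as follows; the constant name and cap value here are illustrative placeholders, not the identifiers or limit actually used in the C code:

```ruby
# Illustrative sketch of clamping the initial capacity (the real patch
# does this in C before calling rb_ary_new2 / rb_hash_new_capa).
MAX_INITIAL_CAPACITY = 1 << 16  # assumed cap; the patch's value may differ

def initial_capacity(declared_size)
  # Trust small declared sizes; clamp huge ones. Ruby grows the
  # array/hash automatically if the real data turns out to be larger.
  [declared_size, MAX_INITIAL_CAPACITY].min
end

puts initial_capacity(10)          # => 10
puts initial_capacity(0x20000000)  # => 65536
```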
We have been investigating a segfault in our application for the last few days, and we believe this PR has coincidentally fixed the problem. It was only happening in our CentOS 8 VM with Red Hat's Ruby 3.0.4, and we couldn't reproduce it on any other system, so it's been difficult to track down. After upgrading to msgpack 1.7.3 the problem goes away, so I'm posting this here for others who happen to hit the same thing.

This script was causing the segfault:

```ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "bootsnap", "=1.17.1"
  gem "msgpack", "=1.7.2"
  gem "psych", "=3.3.4"
end

yaml = "am: ''"
File.write("test.yml", yaml)

require "yaml"
require "bootsnap"
ENV["BOOTSNAP_CACHE_DIR"] = Dir.pwd + "/tmp/cache"
require "bootsnap/setup"

puts YAML.method(:unsafe_load_file)
puts

begin
  YAML.unsafe_load_file("test.yml", symbolize_names: true, freeze: true)
rescue => err
  puts "Failed loading"
  raise
end

puts "Successfully loaded"
```

and the error given was
Only on your Red Hat system, right? I can't reproduce it on my machine, it seems. Or maybe it needs Ruby 3.0.4 exactly?
This is WTF. Your issue being fixed by this PR would suggest that msgpack was over-allocating so much that it made a difference, but given your repro script, I don't see how that's possible unless there is another bug somewhere.
I'd be very interested in the full crash report if you have it (mostly the
I got an even smaller repro with only msgpack, stealing only the relevant code from Bootsnap::CompileCache::YAML:

```ruby
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "msgpack", "=1.7.2"
end

factory = MessagePack::Factory.new
factory.register_type(
  0x00,
  Symbol,
  packer: :to_msgpack_ext,
  unpacker: ->(payload) { (+payload).force_encoding(Encoding::UTF_8).to_sym },
)

input = {"x" => ""}
puts "input: #{input}"

packer = factory.packer
packer.pack(input)
storage = packer.to_s
puts "storage: #{storage.inspect}"

unpacker = factory.unpacker(freeze: true)
unpacker.feed(storage)
output = unpacker.unpack
puts "output: #{output}"
```

Full crash report:
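As an aside on the repro above, the `(+payload)` idiom in the unpacker proc matters because with `freeze: true` the proc can be handed a frozen string; unary plus yields an unfrozen copy that `force_encoding` can mutate safely:

```ruby
# Unary plus (String#+@) returns the receiver if it is already unfrozen,
# or an unfrozen duplicate if it is frozen, so in-place mutation is safe.
s = "ext-data".freeze
copy = +s

puts copy.frozen?     # => false
puts copy.equal?(s)   # => false (a duplicate was made)

copy.force_encoding(Encoding::UTF_8)
puts copy.encoding    # => UTF-8
```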
I tried on Mac with 3.0.4 with no luck - we're trying to see if we can get a standalone container with it.
@Fryguy thank you very much for continuing the investigation; that is very valuable. Feel free to open a new issue for further reporting.
That's exactly why I started this convo - I can't understand why this PR would fix it, and it feels like another bug is lurking either in here or in combination with how bootsnap is using it. If you'd like a separate issue somewhere, I'd be happy to open one.
I opened #369 with an even smaller reproducer - still going to plug away at this but moving the convo there.
Just a followup for future readers - this PR did not fix the segfault; it was caused by a different reason - see #369