Segfault on 1.7.2 #369
So, as mentioned on the other issue, I can't repro, even with 3.0.3 in Docker. But that's a Debian base on ARM64, not Red Hat, so it's hard to tell what kind of patches may have been applied to Ruby, whether the bug is x86-only, or whether it depends on some specific system tuning. The |
Looks like overcommit is enabled
|
Yeah, probably isn't that. I'm trying to get some reproducer in Docker, using |
Oh yeah, CentOS 8 is a pain because they just went into the vault. Unfortunately, we have to cut a new version of an older release, and that's what that release is on... it wasn't until cutting the release that we found this segfault. I'm also trying to get a Docker image. Here's what I've got so far, but it passes just fine (using ubi8):
FROM registry.access.redhat.com/ubi8/ubi
RUN dnf -y module enable ruby:3.0 && \
    dnf -y install ruby ruby-devel gcc make
RUN gem install msgpack -v 1.7.2
ADD segfault.rb /
ENTRYPOINT ["ruby", "segfault.rb"] |
Ok, so I managed to get Ruby 3.0.4 installed from CentOS 8 after quite a lot of pain, but it still doesn't reproduce for me (still on ARM64 though). |
Not sure if this helps, but looking at the generated Makefiles for the failing 1.7.2 and the working 1.7.3, I noticed this:
diff --git a/opt/manageiq/manageiq-gemset/gems/msgpack-1.7.2/ext/msgpack/Makefile b/opt/manageiq/manageiq-gemset/gems/msgpack-1.7.3/ext/msgpack/Makefile
index 69a2a0c..9106c87 100644
--- a/opt/manageiq/manageiq-gemset/gems/msgpack-1.7.2/ext/msgpack/Makefile
+++ b/opt/manageiq/manageiq-gemset/gems/msgpack-1.7.3/ext/msgpack/Makefile
@@ -32,8 +32,8 @@ rubygemsdir = $(DESTDIR)/usr/share/rubygems
vendorarchdir = $(DESTDIR)/usr/lib64/ruby/vendor_ruby
vendorlibdir = $(vendordir)
vendordir = $(DESTDIR)/usr/share/ruby/vendor_ruby
-sitearchdir = $(DESTDIR)./.gem.20241007-2975-tj4odq
-sitelibdir = $(DESTDIR)./.gem.20241007-2975-tj4odq
+sitearchdir = $(DESTDIR)./.gem.20241015-14952-yysfs2
+sitelibdir = $(DESTDIR)./.gem.20241015-14952-yysfs2
sitedir = $(DESTDIR)/usr/local/share/ruby/site_ruby
rubyarchdir = $(rubyarchprefix)
rubylibdir = $(rubylibprefix)
@@ -84,7 +84,7 @@ debugflags = -ggdb3
warnflags = -Wall -Wextra -Wdeprecated-declarations -Wduplicated-cond -Wimplicit-function-declaration -Wimplicit-int -Wmisleading-indentation -Wpointer-arith -Wwrite-strings -Wimplicit-fallthrough=0 -Wmissing-noreturn -Wno-cast-function-type -Wno-constant-logical-operand -Wno-long-long -Wno-missing-field-initializers -Wno-overlength-strings -Wno-packed-bitfield-compat -Wno-parentheses-equality -Wno-self-assign -Wno-tautological-compare -Wno-unused-parameter -Wno-unused-value -Wsuggest-attribute=format -Wsuggest-attribute=noreturn -Wunused-variable
cppflags =
CCDLFLAGS = -fPIC
-CFLAGS = $(CCDLFLAGS) -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -fvisibility=hidden -I.. -Wall -O3 -std=gnu99 -ggdb3 -DHASH_ASET_DEDUPE=1 -DSTR_UMINUS_DEDUPE_FROZEN=1 $(ARCH_FLAG)
+CFLAGS = $(CCDLFLAGS) -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fPIC -fvisibility=hidden -I.. -Wall -O3 -std=gnu99 -ggdb3 -DRB_ENC_INTERNED_STR_NULL_CHECK=1 -DHASH_ASET_DEDUPE=1 -DSTR_UMINUS_DEDUPE_FROZEN=1 $(ARCH_FLAG)
INCFLAGS = -I. -I$(arch_hdrdir) -I$(hdrdir)/ruby/backward -I$(hdrdir) -I$(srcdir)
DEFS =
CPPFLAGS = -DHAVE_RB_ENC_INTERNED_STR -DHAVE_RB_PROC_CALL_WITH_BLOCK $(DEFS) $(cppflags)
@@ -110,7 +110,6 @@ sitearch = $(arch)
ruby_version = 3.0.0
ruby = $(bindir)/$(RUBY_BASE_NAME)
RUBY = $(ruby)
-BUILTRUBY = $(bindir)/$(RUBY_BASE_NAME)
ruby_headers = $(hdrdir)/ruby.h $(hdrdir)/ruby/backward.h $(hdrdir)/ruby/ruby.h $(hdrdir)/ruby/defines.h $(hdrdir)/ruby/missing.h $(hdrdir)/ruby/intern.h $(hdrdir)/ruby/st.h $(hdrdir)/ruby/subst.h $(arch_hdrdir)/ruby/config.h
RM = rm -f
That |
Oh yeah, I was suspecting that, and that will do it. This flag is for a known Ruby 3.0 bug. We check the Ruby version for it; it was fixed in Ruby 3.0.5:
msgpack-ruby/ext/msgpack/extconf.rb Line 18 in 6bbaa97
If somehow your generated Makefile didn't set this flag, then an empty string combined with the |
Ref: https://bugs.ruby-lang.org/issues/18772 |
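For readers following along, the version gate described above can be sketched roughly as follows. This is a minimal illustration, not the gem's actual `extconf.rb` code: the helper name `needs_interned_str_null_check?` and the exact version bounds (3.0.0 up to, but not including, 3.0.5) are assumptions based only on the statement that the bug was fixed in Ruby 3.0.5.

```ruby
require "rubygems" # for Gem::Version; loaded by default in most Ruby setups

# Assumed bounds: the thread only says the bug was fixed in Ruby 3.0.5.
FIRST_AFFECTED = Gem::Version.new("3.0.0")
FIXED_IN       = Gem::Version.new("3.0.5")

# Hypothetical helper (not part of msgpack-ruby): true when the Ruby that
# runs extconf.rb is old enough to need the NULL-check workaround.
def needs_interned_str_null_check?(ruby_version = RUBY_VERSION)
  version = Gem::Version.new(ruby_version)
  version >= FIRST_AFFECTED && version < FIXED_IN
end

# In an extconf.rb, the result could then drive the compiler flag seen in
# the Makefile diff, for example:
#   append_cflags("-DRB_ENC_INTERNED_STR_NULL_CHECK=1") if needs_interned_str_null_check?
```

The key point is that the decision is made once, at build time, by whichever Ruby runs `extconf.rb`. An extension configured under 3.0.7 would skip the flag, while one configured under 3.0.4 would get it.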
Awesome, this helps a lot. It's very likely this, though I'm not sure why the gem we built is missing the flag for one but not the other... I can dig into this, but I suspect this is the reason. |
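One way to check whether a given build picked up the flag is to inspect the `CFLAGS` line of the generated Makefile. The sketch below is only an illustration: `cflags_define?` is a hypothetical helper, and it assumes the Makefile has a single `CFLAGS = ...` line like the one in the diff above.

```ruby
# Hypothetical helper: returns true if the CFLAGS line of a generated
# Makefile contains the given preprocessor define (with or without "=value").
def cflags_define?(makefile_text, define)
  # Anchor at the start of the line so CPPFLAGS and CCDLFLAGS are not matched.
  cflags = makefile_text.each_line.find { |line| line.match?(/\ACFLAGS\s*=/) }
  return false if cflags.nil?

  cflags.split.any? do |token|
    token == "-D#{define}" || token.start_with?("-D#{define}=")
  end
end
```

Usage might look like `cflags_define?(File.read("ext/msgpack/Makefile"), "RB_ENC_INTERNED_STR_NULL_CHECK")`, run from inside the installed gem directory (the relative path is an assumption taken from the diff headers).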
@byroot Thank you so much for digging into this with me... we understand what's happening now, so I'm going to close this.
Our application has 2 deployable form factors: containerized and a virtual machine appliance. Our build process first builds our application and its dependencies into RPMs from within a container environment (which is ubi8), and those RPMs are pushed to a yum repository. Later we install those RPMs into the final deployable container image (which is ubi8) as well as a virtual machine appliance image (which is centos8-stream).
So, the weird part here is that centos8-stream is EOL, but Red Hat has continued to keep ubi8 up to date, and they are now out of sync. centos8-stream ships with Ruby 3.0.4 and ubi8 ships with Ruby 3.0.7. Thus, when we built msgpack it was built against Ruby 3.0.7, but when we deployed it in the appliance it was running with Ruby 3.0.4.
When I manually built 1.7.3 on the appliance during this investigation, what I actually did was unknowingly align the Ruby versions, which is why it worked. 1.7.3 wasn't the fix at all. In fact, I uninstalled the 1.7.2 gem and rebuilt it on the appliance, and suddenly 1.7.2 worked because, again, I had just aligned the Ruby versions. It really was a perfect storm where the Ruby versions happened to straddle this exact bug 😭.
Again, thank you for helping us dig in! |
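The failure condition described in this comment can be stated compactly: a build-time workaround is skipped when the build Ruby already has the fix, and that only hurts if the runtime Ruby does not. The following is a minimal sketch of that check. `straddles_fix?` is a hypothetical helper, and how the build-time Ruby version gets recorded (for example in RPM metadata) is left as an assumption.

```ruby
require "rubygems" # for Gem::Version

# Hypothetical helper: true when the extension was configured on a Ruby that
# already contains the fix (so the workaround flag was skipped) but is being
# loaded by an older Ruby that still has the bug.
def straddles_fix?(build_ruby, runtime_ruby, fixed_in)
  build   = Gem::Version.new(build_ruby)
  runtime = Gem::Version.new(runtime_ruby)
  fix     = Gem::Version.new(fixed_in)
  build >= fix && runtime < fix
end

# The combination from this thread: built on ubi8, run on centos8-stream.
puts straddles_fix?("3.0.7", "3.0.4", "3.0.5")
# Rebuilding on the appliance aligns the versions, so the check is false.
puts straddles_fix?("3.0.4", "3.0.4", "3.0.5")
```

A check along these lines could run at package install time or at application boot to flag a mismatch before it shows up as a segfault.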
Ouch. Yeah, that hurt. |
I came here from #368 only to know how the episode on the odd segfault ended :D Kudos to both of you for dedication on a single bug and for sharing this! |
Extracting the conversation from #368 (comment): we are getting a segfault that seems to have been fixed by #368 (v1.7.3). However, I'm concerned that the PR didn't actually fix it, or that something else is going on, so I'm opening this issue to continue the investigation.
This was only happening in our CentOS 8 VM with Red Hat's Ruby 3.0.4, and we couldn't get it to happen on any other system, so it's been difficult to track down. After upgrading to msgpack 1.7.3, the problem goes away.
The smallest reproducer I have so far is
Full stack trace:
Some system info: