
OCI8.__execute hangs after > 1 hour #159

Closed
mrship opened this issue Feb 8, 2017 · 21 comments
@mrship

mrship commented Feb 8, 2017

I am using OCI8 with Sequel and am seeing connections hang in __execute after an idle period of roughly one hour.

I'm running on Heroku, which has a 30-second limit on requests, so the stack trace below comes from using rack-timeout to force an error and stack trace after 15s. However, this also happens on AWS, where there is no timeout; the connection just hangs indefinitely.

From what I can see on the Oracle side, we do get a reconnect (we see a login from Heroku after the hour of idleness), but the new connection then just times out. The only recourse is to restart the Heroku dynos (or nginx on AWS) to get a "fresh" connection that doesn't time out. However, after another idle period of roughly an hour this fresh connection suffers the same problem.

I have tried setting the various connection timeouts, but none of them made a difference.

  OCI8.properties[:tcp_connect_timeout] = 3
  OCI8.properties[:connect_timeout] = 3
  OCI8.properties[:send_timeout] = 10
  OCI8.properties[:recv_timeout] = 10

Can you advise on how I can further diagnose/fix? Thanks.

Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/cursor.rb:126:in `__execute' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/cursor.rb:126:in `exec' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/oci8.rb:276:in `exec_internal' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/oci8.rb:267:in `exec' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/adapters/oracle.rb:91:in `block (2 levels) in _execute' 
Feb 06 11:51:31 (eval):11:in `call' 
Feb 06 11:51:31 (eval):11:in `block (2 levels) in log_connection_yield' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.7.1/lib/active_support/notifications.rb:164:in `block in instrument' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.7.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.7.1/lib/active_support/notifications.rb:164:in `instrument' 
Feb 06 11:51:31 (eval):5:in `block in log_connection_yield' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/database/logging.rb:44:in `log_connection_yield' 
Feb 06 11:51:31 (eval):4:in `log_connection_yield' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/adapters/oracle.rb:91:in `block in _execute' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/database/connecting.rb:280:in `block in synchronize' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/connection_pool/threaded.rb:107:in `hold' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/database/connecting.rb:280:in `synchronize' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/adapters/oracle.rb:82:in `_execute' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/adapters/oracle.rb:67:in `execute' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/dataset/actions.rb:1064:in `execute' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/adapters/oracle.rb:364:in `fetch_rows' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/dataset/actions.rb:149:in `each' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/dataset/actions.rb:84:in `columns!' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/sequel-4.43.0/lib/sequel/dataset/actions.rb:74:in `columns' 
Feb 06 11:51:31 /app/vendor/bundle/ruby/2.2.0/gems/rom-sql-0.9.1/lib/rom/plugins/relation/sql/base_view.rb:16:in `block (2 levels) in inherited' 
@mrship
Author

mrship commented Feb 8, 2017

To add to this: before trusting that a connection is valid, I reuse the connection object from the pre-idle queries and test it with a simple query. However, that simple test query also hangs.

FYI, this is using the Sequel connection validator extension.

Feb 07 19:55:50 app/web.1:  I, [2017-02-07T19:55:49.690814 #25]  INFO -- : (0.015746s) (conn: 70034341617740) SELECT "ID", "TYPE", "LABEL", "MANDATORY", "HIDDEN", "ORDER", "STATUS", "DEFAULT", "SAGE_KLASS", "CREATED_AT", "UPDATED_AT", "COLUMN" FROM "FILTERS" ORDER BY "FILTERS"."ID" 
Feb 07 21:54:45 app/web.1:  I, [2017-02-07T21:54:44.581208 #25]  INFO -- : (943.562764s) (conn: 70034341617740) SELECT NULL FROM DUAL

@kubo
Owner

kubo commented Feb 8, 2017

Could you get a C-level backtrace of the hung process?

If you are on Linux and can use gdb on the machine:

$ ps -ef | grep ruby         # get the process ID of the hung process
$ sudo gdb /proc/PID/exe PID # replace PID with the process ID of the hung process. This may work without `sudo` depending on the machine settings.
gdb> info threads            # get the list of native threads
gdb> thread apply all bt     # dump the backtraces of all threads at once, or one by one:
gdb> thread 1                # switch to thread 1
gdb> where                   # get its backtrace
gdb> thread 2                # switch to thread 2
gdb> where                   # get its backtrace
   ...

@mrship
Author

mrship commented Feb 8, 2017

Not easily, as I'm either on Heroku with no direct access to the dyno, or on AWS running Passenger under nginx. I'll see if I can replicate a test case locally instead and let you know.

@mrship
Author

mrship commented Feb 8, 2017

3:18:17 PM web.1 |  I, [2017-02-08T15:18:17.596725 #50592]  INFO -- : (0.015757s) (conn: 70140625562900) SELECT "ID", "FILTER_ID", "YAMMER_UID", "VALUE" FROM "USER_FILTERS" WHERE ("YAMMER_UID" = '123456789') ORDER BY "USER_FILTERS"."ID"
4:51:20 PM web.1 |  I, [2017-02-08T16:51:20.295415 #50592]  INFO -- : (20.172824s) (conn: 70140625562900) SELECT NULL FROM DUAL
4:51:20 PM web.1 |  2017-02-08 16:51:20 - Rack::Timeout::RequestTimeoutException - Request ran for longer than 15000ms:
4:51:20 PM web.1 |  	/Users/shipmana/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/cursor.rb:126:in `__execute'
4:51:20 PM web.1 |  	/Users/shipmana/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/cursor.rb:126:in `exec'
4:51:20 PM web.1 |  	/Users/shipmana/.rbenv/versions/2.2.3/lib/ruby/gems/2.2.0/gems/ruby-oci8-2.2.3/lib/oci8/oci8.rb:276:in `exec_internal'

With the above failure locally, I see this from gdb. Does that help?

➜  lighthouse git:(development) ps aux | grep puma
shipmana         50592   0.0  0.7  2779188 118988 s003  S+    3:18pm   0:02.41 puma: cluster worker 1: 50577 [lighthouse-testbed]
shipmana         50591   0.0  0.5  2763000  92172 s003  S+    3:18pm   0:01.30 puma: cluster worker 0: 50577 [lighthouse-testbed]
shipmana         50577   0.0  0.6  2737268 104948 s003  S+    3:18pm   0:02.85 puma 3.7.0 (tcp://0.0.0.0:5000) [lighthouse-testbed]
shipmana         56334   0.0  0.0  2433828   1764 s006  R+    4:51pm   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn puma
➜  lighthouse git:(development) sudo gdb /proc/50592/exe 50592
Password:
GNU gdb (GDB) 7.12.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin16.3.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
/proc/50592/exe: No such file or directory.
Attaching to process 50592
[New Thread 0x1903 of process 50592]
[New Thread 0x1a03 of process 50592]
[New Thread 0x1b03 of process 50592]
[New Thread 0x1c03 of process 50592]
[New Thread 0x1d03 of process 50592]
[New Thread 0x1e03 of process 50592]
[New Thread 0x1f03 of process 50592]
[New Thread 0x2003 of process 50592]
[New Thread 0x2103 of process 50592]
[New Thread 0x2203 of process 50592]
[New Thread 0x2303 of process 50592]
[New Thread 0x2403 of process 50592]
Reading symbols from /Users/shipmana/.rbenv/versions/2.2.3/bin/ruby...(no debugging symbols found)...done.

warning: unhandled dyld version (15)
0x00007fffa9f72c86 in ?? ()

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x1803 of process 50592 0x00007fffa9f72c86 in ?? ()
  2    Thread 0x1903 of process 50592 0x00007fffa9f74232 in ?? ()
  3    Thread 0x1a03 of process 50592 0x00007fffa9f72f4a in ?? ()
  4    Thread 0x1b03 of process 50592 0x00007fffa9f72c86 in ?? ()
  5    Thread 0x1c03 of process 50592 0x00007fffa9f72c86 in ?? ()
  6    Thread 0x1d03 of process 50592 0x00007fffa9f72c86 in ?? ()
  7    Thread 0x1e03 of process 50592 0x00007fffa9f72c86 in ?? ()
  8    Thread 0x1f03 of process 50592 0x00007fffa9f72c86 in ?? ()
  9    Thread 0x2003 of process 50592 0x00007fffa9f72c86 in ?? ()
  10   Thread 0x2103 of process 50592 0x00007fffa9f72f4a in ?? ()
  11   Thread 0x2203 of process 50592 0x00007fffa9f72c86 in ?? ()
  12   Thread 0x2303 of process 50592 0x00007fffa9f72c86 in ?? ()
  13   Thread 0x2403 of process 50592 0x00007fffa9f72f4a in ?? ()


(gdb) thread 1
[Switching to thread 1 (Thread 0x1803 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
#1  0x00007fffaa05c96a in ?? ()
#2  0x0000000000000000 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 0x1903 of process 50592)]
#0  0x00007fffa9f74232 in ?? ()
(gdb) where
#0  0x00007fffa9f74232 in ?? ()
Backtrace stopped: Cannot access memory at address 0x700005a88ea8

(gdb) thread 3
[Switching to thread 3 (Thread 0x1a03 of process 50592)]
#0  0x00007fffa9f72f4a in ?? ()
(gdb) where
#0  0x00007fffa9f72f4a in ?? ()
Backtrace stopped: Cannot access memory at address 0x700005b8b408

(gdb) thread 4
[Switching to thread 4 (Thread 0x1b03 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x700005c8e6b8

(gdb) thread 5
[Switching to thread 5 (Thread 0x1c03 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x700005d90d38

(gdb) thread 6
[Switching to thread 6 (Thread 0x1d03 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x700005e93d38

(gdb) thread 7
[Switching to thread 7 (Thread 0x1e03 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x700005f96d38

(gdb) thread 8
[Switching to thread 8 (Thread 0x1f03 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x700006099d38

(gdb) thread 9
[Switching to thread 9 (Thread 0x2003 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x70000619cd38

(gdb) thread 10
[Switching to thread 10 (Thread 0x2103 of process 50592)]
#0  0x00007fffa9f72f4a in ?? ()
(gdb) where
#0  0x00007fffa9f72f4a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7000062a04e8

(gdb) thread 11
[Switching to thread 11 (Thread 0x2203 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7000063a36b8

(gdb) thread 12
[Switching to thread 12 (Thread 0x2303 of process 50592)]
#0  0x00007fffa9f72c86 in ?? ()
(gdb) where
#0  0x00007fffa9f72c86 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7000064a66b8

@kubo
Owner

kubo commented Feb 9, 2017

Thanks, and sorry — that didn't provide usable information.
I had hoped it would print function names and valid backtraces, as in this comment. But all function names were '??' and backtrace traversal failed for every thread except thread 1. Moreover, the number of stack frames in thread 1 is too small; it is generally more than 10.

By the way, is there a firewall between the Oracle server and the client? If so, the firewall may be dropping inactive connections. That may be solved by setting (ENABLE = BROKEN) and configuring TCP keepalive in the OS: if keepalive packets are sent within the firewall's inactive-connection timeout, the firewall won't drop the connection.
I don't know offhand how to configure this, though.
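For reference, (ENABLE = BROKEN) goes inside the DESCRIPTION section of a connect descriptor (e.g. in tnsnames.ora) — a sketch with placeholder host and service names:

```
MYDB =
  (DESCRIPTION =
    (ENABLE = BROKEN)
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = MYSERVICE))
  )
```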

@mrship
Author

mrship commented Feb 9, 2017

OK, I'll try to see if I can get a more useful stack trace.

I'll also review the firewall setup, as that is another possible solution.

Thanks for the support.

@kubo
Owner

kubo commented Feb 9, 2017

As far as I checked, when (ENABLE = BROKEN) is set, the Oracle client library issues the following system call just after the TCP/IP socket is created:

setsockopt(10, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0

According to this Linux manual page, keepalive packets are sent only when SO_KEEPALIVE is enabled. The default TCP keepalive time is 2 hours, and it is configurable via procfs or sysctl.
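The same socket options can be exercised from Ruby's standard socket library — a sketch (TCP_KEEPIDLE is Linux-specific; the 480-second idle time is just an example value):

```ruby
require 'socket'

sock = Socket.new(:INET, :STREAM)

# This is what (ENABLE = BROKEN) triggers: SO_KEEPALIVE on the socket.
sock.setsockopt(:SOCKET, :KEEPALIVE, true)

# On Linux, the per-socket idle time can override the 2-hour system default.
if Socket.const_defined?(:TCP_KEEPIDLE)
  sock.setsockopt(:TCP, :KEEPIDLE, 480)
end

p sock.getsockopt(:SOCKET, :KEEPALIVE).bool  # => true
```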

@mrship
Author

mrship commented Feb 13, 2017

I tried (ENABLE=BROKEN) on Heroku, but in the end the only way to get this to work was to (eventually) remove the firewall's restriction on the connection timeout. With that removed, I no longer get the hang after an idle period. I was not able to set tcp_keepalive on Heroku (since it is a PaaS and we don't control the OS).

On AWS, we are now running the (ENABLE=BROKEN) version of OCI8 (via our fork) along with a reduced tcp_keepalive (down from 2 hours to 8 minutes). This appears to be working, but I'll need to let it run for a few days to be more certain of a fix. If it holds up, I'll look to PR a change that allows setting (ENABLE=BROKEN) via a config option, if that's considered valuable?

@kubo
Owner

kubo commented Feb 14, 2017

Edited: The first line of the commands was fixed as @yahonda pointed out.

Could you use the latest git revision and set the new OCI8.properties parameters?

$ git clone https://github.com/kubo/ruby-oci8.git
$ cd ruby-oci8
$ gem build ruby-oci8.gemspec
$ gem install ./ruby-oci8-2.2.3.gem
OCI8.properties[:tcp_keepalive] = true
OCI8.properties[:tcp_keepalive_time] = 480
# OCI8.properties[:tcp_keepalive_intvl] = ...
# OCI8.properties[:tcp_keepalive_probes] = ...

When :tcp_keepalive is true, ruby-oci8 sets (ENABLE=BROKEN) if you use easy connect naming. This is what you proposed to submit as a PR. I would usually wait for your PR, but I hit on the idea of the three other parameters and thought they would be useful for you, so I implemented it myself to get the idea out quickly.

The other three parameters :tcp_keepalive_time, :tcp_keepalive_intvl and :tcp_keepalive_probes correspond to the files with the same names in /proc/sys/net/ipv4. If set, they override the system-wide keepalive settings. They should be available on Heroku as well.

Note: the three parameters are available only on Linux. :tcp_keepalive_time and :tcp_keepalive_intvl may become available on Windows later.
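To see the system-wide defaults these parameters override, you can read the /proc files directly — a small illustrative script (Linux only; the helper name is my own, not part of ruby-oci8):

```ruby
# Read the current system-wide TCP keepalive settings on Linux.
# Returns an empty hash on platforms without /proc/sys/net/ipv4.
def tcp_keepalive_defaults
  %w[tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes]
    .each_with_object({}) do |name, h|
      path = "/proc/sys/net/ipv4/#{name}"
      h[name] = File.read(path).strip.to_i if File.exist?(path)
    end
end

p tcp_keepalive_defaults  # e.g. {"tcp_keepalive_time"=>7200, ...} on stock Linux
```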

@yahonda
Contributor

yahonda commented Feb 14, 2017

I'm interested in supporting (ENABLE=BROKEN). I think the first line was meant to be:

$ git clone https://github.com/kubo/ruby-oci8.git

@kubo
Owner

kubo commented Feb 14, 2017

@yahonda Thanks! I fixed it.

@mrship
Author

mrship commented Feb 15, 2017

@kubo That looks very interesting; I'll give it a try ASAP. Would you consider supporting macOS as well?

@kubo
Owner

kubo commented Feb 15, 2017

macOS after Mountain Lion could be supported, according to this archive. I'll try to add support this weekend.

Note that this feature is implemented in a hackish way. When :tcp_keepalive_time, :tcp_keepalive_intvl or :tcp_keepalive_probes is set in OCI8.properties, the Oracle client library's procedure linkage table is modified to intercept setsockopt calls. When the Oracle client library issues the call that enables keepalive, ruby-oci8 sets the per-connection keepalive parameters immediately after that call.

@mrship
Author

mrship commented Feb 15, 2017

Well, the good news is that this completely solves my firewall issues on Heroku! I've yet to test it on the AWS setup, but I'll let you know how it runs there too.

Thanks for all your help in getting this far.

macOS support would be appreciated so we can run this in development mode too.

@mrship
Author

mrship commented Feb 16, 2017

And, having soak-tested this overnight on AWS, it appears to be working really well there too.

Marvellous stuff!

@kubo
Owner

kubo commented Feb 17, 2017

@mrship
Good news!
Which parameters did you use? If you didn't use :tcp_keepalive_intvl and :tcp_keepalive_probes, I'll drop those two parameters, because some platforms cannot customize them.

Edited: Linux and macOS can customize them.

@mrship
Author

mrship commented Feb 17, 2017

We went with this setup:

  if defined?(OCI8) && !(/darwin/ =~ RUBY_PLATFORM)
    OCI8.properties[:tcp_keepalive] = true
    OCI8.properties[:tcp_keepalive_time] = 480
  end

kubo added a commit that referenced this issue Feb 19, 2017
…ies.

Support :tcp_keepalive_time on macOS, Windows x86/x64 and Solaris x86_64 in addition to Linux i386/x86_64.
#159
@kubo
Owner

kubo commented Feb 19, 2017

I pushed 0ffd0c8.
:tcp_keepalive_intvl and :tcp_keepalive_probes were deleted by the commit.
:tcp_keepalive_time is supported on the following platforms

  • Linux i386 and x86_64
  • macOS
  • Windows x86 and x64
  • Solaris x86_64.

IMO, it can be supported on the following platforms once plthook supports them. They seem to have the TCP_KEEPIDLE socket option, as Linux does.

  • HP-UX IA64
  • HP-UX PA-RISC 64
  • Solaris SPARC

The next ruby-oci8 version including this feature will be released after rubyinstaller.org releases ruby 2.4.0 for Windows.

@yahonda
Contributor

yahonda commented Feb 27, 2017

I'm interested in this feature and am considering setting a default value of :tcp_keepalive_time in Oracle enhanced adapter 1.8 to support Rails 5.1.

It does not look like RubyInstaller 2.4.0 will be released soon (oneclick/rubyinstaller#350). Would you consider releasing a newer version of ruby-oci8 without waiting for RubyInstaller 2.4.0, or could you tell me the reason for waiting?

Thanks.

@kubo
Owner

kubo commented Feb 28, 2017

Extension libraries built for ruby 2.x don't work with ruby 2.y, so I need to ship pre-compiled binaries for each ruby version in a binary gem. For example, ruby-oci8-2.2.3-x64-mingw32.gem includes oci8lib_200.so, oci8lib_210.so, oci8lib_220.so, oci8lib_230.so and oci8lib_240.so for ruby 2.0, 2.1, 2.2, 2.3 and 2.4.
The oci8lib_240.so was built against a ruby 2.4.0 that I compiled myself. It will probably work with the ruby 2.4.0 distributed by rubyinstaller.org, but I'd like to rebuild it once rubyinstaller 2.4.0 is out, just to be sure.
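The naming scheme above encodes the ruby ABI version (major and minor, since the teeny version doesn't change the ABI) into the shared-library name. A sketch of that mapping — the helper name is hypothetical, not ruby-oci8's actual code:

```ruby
# Hypothetical helper: map a ruby version string to the binary's file name,
# e.g. "2.4.1" -> "oci8lib_240.so". Only major.minor determine the ABI.
def oci8lib_name(ruby_version)
  major, minor = ruby_version.split('.')
  "oci8lib_#{major}#{minor}0.so"
end

puts oci8lib_name(RUBY_VERSION)
```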

@yahonda
Contributor

yahonda commented Feb 28, 2017

Thanks for the explanation. I hope RubyInstaller 2.4.0 will be available soon.

@kubo kubo closed this as completed Dec 30, 2020