Allow Major GC's to be disabled #10598

Open
wants to merge 3 commits into
base: master

Conversation

eightbitraptor
Contributor

Background

Ruby's GC running during Rails requests can have negative impacts on currently running requests, causing applications to have high tail-latency.

A technique to mitigate this high tail-latency is Out-of-band GC (OOBGC). With OOBGC, the application runs with GC disabled, and GC is started explicitly after each request or when no requests are in progress.
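As a rough illustration of the classic OOBGC pattern described above, here is a minimal sketch that uses only long-standing core methods (`GC.disable`, `GC.enable`, `GC.start`). The `handle_request` and `serve_with_oobgc` methods are hypothetical stand-ins, not part of any server's API.

```ruby
# Hypothetical stand-in for real request handling: allocates some garbage.
def handle_request
  Array.new(1_000) { |i| "object #{i}" }.size
end

# Classic OOBGC: no automatic GC while the request runs, then an explicit
# GC once the response has been produced.
def serve_with_oobgc
  GC.disable       # prevent automatic GC during the request
  handle_request   # the method returns this value, not the ensure result
ensure
  GC.enable        # allow automatic GC again
  GC.start         # explicit GC outside of the request
end
```

Real servers trigger the out-of-band step from their own hooks, and threaded servers must first wait for all in-flight requests to finish.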

This can reduce the tail latency, but it also introduces problems of its own. Long GC pauses after each request reduce throughput. This is more pronounced on threaded servers like Puma, because all the threads have to finish processing user requests and be "paused" before OOBGC can be triggered.

This throughput decrease happens for a couple of reasons:

  1. Users have few heuristics for deciding when GC should run, so in OOBGC scenarios major GC's may be run more often than necessary.
  2. With no GC at all during a request, garbage objects accumulate and the process uses more memory than it should. The major GC's run as part of OOBGC then have more work to do and take more time.

This ticket attempts to address these issues by:

  1. Providing GC.disable_major and its antonym GC.enable_major to disable and enable only major GC.
  2. Providing GC.needs_major? as a basic heuristic that lets users tell when Ruby should run a major GC.

These ideas were originally proposed by @ko1 and @byroot in this Rails issue.

Disabling major GC's still allows minor GC's to run during the request. This avoids the ballooning memory usage caused by not running GC at all. It also reduces the time a major takes when we do run it: the nursery objects have already been cleaned up during the request, so there is less work for the major GC to do.

This can be used in combination with GC.needs_major? to selectively run an OOBGC only when necessary.
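The combination described above might look roughly like the following sketch. `GC.disable_major` and `GC.needs_major?` are the methods proposed in this PR and do not exist on an unpatched Ruby, so every call is guarded with `respond_to?`. `handle_request`, `out_of_band_gc`, and `serve` are hypothetical names used only for illustration.

```ruby
# True only on a Ruby built with this patch.
MAJOR_GC_CONTROL = GC.respond_to?(:disable_major) && GC.respond_to?(:needs_major?)

# Hypothetical stand-in for real request handling.
def handle_request
  Array.new(1_000) { |i| "object #{i}" }.size
end

# Run between requests. With the patch, a major GC is only started when the
# GC reports that one is needed. Without it, fall back to classic OOBGC.
def out_of_band_gc
  if MAJOR_GC_CONTROL
    GC.start if GC.needs_major?
  else
    GC.start
  end
end

# Serves the given number of requests and returns that number.
def serve(requests)
  GC.disable_major if MAJOR_GC_CONTROL # minor GC's still run during requests
  requests.times do
    handle_request
    out_of_band_gc
  end
end
```

On an unpatched Ruby this degrades to a full `GC.start` after every request, which is the behaviour the PR is trying to improve on.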

Implementation

This PR adds 3 new methods to the GC module:

  • GC.disable_major
    This prevents major GC's from running automatically. It does not restrict minors. When objspace->rgengc.need_major_gc is set and a GC is run, new heap pages are allocated and a minor is run instead of a major. objspace->rgengc.need_major_gc will remain set until a major is manually run. If a major is never manually run, the process will eventually run out of memory.

    When major GC's are disabled, object promotion is disabled. That is, no objects will increment their ages during a minor GC. This is to attempt to minimise heap growth during the period between major GC's, by restricting the number of old-gen objects that will remain unconsidered by the GC until the next major.

    When GC.start is called, major GC's are temporarily enabled, a GC is triggered with the options passed to GC.start, and disable_major is then restored to the state it was in before GC.start was called.

  • GC.enable_major
    This simply unsets the bit preventing major GC's. This will revert the GC to normal generational behaviour. Everything behaves as default again.

  • GC.needs_major?
    This exposes the value of objspace->rgengc.need_major_gc to the user level API. This is already exposed in GC.latest_gc_info[:need_major_by], but I felt that a simpler interface would make this easier to use and result in more readable code. For example:

out_of_band do
  GC.start if GC.needs_major?
end
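For comparison, a similar check can be approximated on a Ruby without this patch by reading `GC.latest_gc_info[:need_major_by]`, which the description above says already exposes the same information. This is a sketch under that assumption; `major_gc_pending?` is a hypothetical helper name, and the key may be absent on older Ruby versions, in which case the helper returns false.

```ruby
# Returns true when the GC has recorded a reason to run a major GC.
# The hash form is used so that a missing key yields nil instead of raising.
def major_gc_pending?
  !GC.latest_gc_info[:need_major_by].nil?
end

# Hypothetical out-of-band hook: only pay for a major GC when one is pending.
def run_out_of_band_gc
  GC.start if major_gc_pending?
end
```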

Because object aging is disabled when majors are disabled, it is recommended to use this in conjunction with Process.warmup, which will prepare the heap by running a major GC, compacting the heap, and promoting every remaining object to old-gen. This ensures that minor GC's are running over the smallest possible set of young objects when major GC's are disabled.
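A boot-time hook following this recommendation could be sketched as below. `Process.warmup` is only available on Ruby 3.3 and later, and `GC.disable_major` only exists with this patch, so both are guarded. The fallback of several `GC.start` calls mirrors the commit message example in this PR. `prepare_heap_for_minor_only_gc` is a hypothetical name.

```ruby
# Warm the old generation, then disable major GC's if the patch is present.
# Returns a symbol naming the warm-up strategy that was used.
def prepare_heap_for_minor_only_gc
  strategy =
    if Process.respond_to?(:warmup)
      Process.warmup        # Ruby 3.3+: major GC, compaction, promotion
      :warmup
    else
      4.times { GC.start }  # fallback: repeated GC's so survivors age
      :gc_start
    end

  # Proposed API from this PR. Skipped on an unpatched Ruby.
  GC.disable_major if GC.respond_to?(:disable_major)
  strategy
end
```

In a preforking server this would typically run once in the parent process, after the application has loaded and before workers are forked.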

Benchmarks

We ran some tests in production on Shopify's core monolith over a weekend and found that:

Mean time spent in GC, as well as p99.9 and p99.99 GC times are all improved.
Screenshot 2024-04-22 at 16 41 49

p99 GC time is slightly higher.
Screenshot 2024-04-22 at 16 44 55

We're running far fewer OOBGC major GC's now that we have GC.needs_major? than we were before, and we believe that this is contributing to a slightly increased number of minor GC's, raising the p99 slightly.
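One way to observe the balance between minor and major GC's in your own application is to compare the `:minor_gc_count` and `:major_gc_count` counters from `GC.stat` around a unit of work. This is a minimal sketch, not the instrumentation used for the measurements above. `gc_counts_during` is a hypothetical helper.

```ruby
# Returns how many minor and major GC's ran while the block executed.
def gc_counts_during
  minor_before = GC.stat(:minor_gc_count)
  major_before = GC.stat(:major_gc_count)
  yield
  {
    minor: GC.stat(:minor_gc_count) - minor_before,
    major: GC.stat(:major_gc_count) - major_before
  }
end

counts = gc_counts_during do
  GC.start(full_mark: false) # requests a minor GC (may be escalated to a major)
  GC.start(full_mark: true)  # requests a major GC
end
puts "minor: #{counts[:minor]}, major: #{counts[:major]}"
```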

App response times are all improved

We see a 9% reduction in average and p99 response times when compared against standard GC (4% p99.9 and p99.99).

Screenshot 2024-04-22 at 16 55 53

This drops slightly to an 8% reduction in average and p99 response times when compared against standard OOBGC (3.59% p99.9 and 4% p99.99).

Screenshot 2024-04-22 at 16 56 10


@eightbitraptor eightbitraptor force-pushed the mvh-disable-major branch 5 times, most recently from 4b8994d to 3c4808d Compare April 24, 2024 13:31
This feature configures Ruby's GC to only run minor GC's. It's designed to give
users relying on Out of Band GC complete control over when a major GC is
run. Running with `disable_major` does two main things.

* Never runs a Major GC. When the heap runs out of space during a minor and when
  a major would traditionally be run, instead we allocate more heap pages, and
  mark objspace as needing a major GC.
* Don't increment object ages. We don't promote objects during GC, so every
  object that has not already been promoted will be scanned on every minor.
  This is an intentional trade-off between minor GC's doing more work every
  time, and potentially promoting objects that will then never be GC'd.

The intention behind not aging objects is that users of this feature should use
a preforking web server, or some other method of pre-warming the oldgen (like
Nakayoshi fork) before disabling majors. That way most objects that are going to
be old will have already been promoted.
This patch exposes objspace->rgengc.need_major_gc to the Ruby GC module. It is
intended to be used in conjunction with GC.disable_major to provide a heuristic
about when a major GC should be run.

eg.

```
before_fork do
  # Warm the oldgen and disable Major GC's
  4.times { GC.start }
  GC.disable_major
end
```

```
GC.start(full_mark: true) if GC.needs_major?
```
Previously we would have `GC.enable` also re-enable major, but this is
problematic because of the macro `DURING_GC_COULD_MALLOC_REGION_START`,
which disables and re-enables GC during regions that could use malloc.

This has the side effect of re-enabling majors if they had been
previously disabled, and then not disabling them again afterwards.

Instead of changing the behaviour of these macros to detect the state of
`dont_major_val` I have opted to introduce an `enable_major` method,
alongside `disable_major` to mirror `GC.enable` and `GC.disable`.
@eightbitraptor eightbitraptor marked this pull request as draft April 24, 2024 14:28
@eightbitraptor eightbitraptor marked this pull request as ready for review April 24, 2024 14:28
Labels
None yet
1 participant