Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EDIT: This branch is the iteration target. Implementing a per-host circuit breaker state using shared memory and semaphores #54

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

kyewei
Copy link
Contributor

@kyewei kyewei commented Sep 28, 2015

@sirupsen @csfrancis @byroot for review, suggest refactoring if needed!

Implementation details:
Shared memory stores the space of int+int+double[], corresponding to @successes, @errors.count, and the @errors array itself (stored as doubles). The array has a max size of @errors.count since the Ruby code ensures the array never grows above that size.

  • Most of semaphore initialization, permissions, etc were borrowed from semian.c.
  • A regular array (as opposed to a linked list) was chosen because of the small array length and simplicity.
  • Shared memory is not explicitly marked for deletion from the C side since (1) it's on the order of ten's of bytes, (2) it won't increase if the same key/id is used to create the memory. This is the same(?) rationale as why in semian.c, semian_resource_free() doesn't clear the semaphores, instead leaving them to be cleared on reboot.

PR v2 (10/06/2015, up to hash 1f0960b...)

I did quite a lot of refactoring, so here's a detailed overview.

  • The final structure looks like this: Semian::SharedMemoryObject has two subclasses, Semian::AtomicInteger and Semian::SlidingWindow. I took suggestions to abstract the memory management further so that Semian::CircuitBreaker would only be using instances of the specific subclasses. I also added header files so code can be shared.
  • Semian::AtomicEnum subclasses Semian::AtomicInteger to provide shared memory for CircuitBreaker's @state. It runs like a traditional C-style enum type thats backed by integers. It's used in @state's :closed, :open, :half_open
  • SharedMemoryObject provides and manages the semaphore, shared memory, and the subclasses implement the use cases for the memory. Constructor calls look like this: SharedMemoryObject.new("name", [:int, :int, :long, :long], 0660) and specify the memory structure. The associated C structure has a reference to function of type void (*object_init_fn)(size_t, void *, void *, size_t, int) so a custom data initializer can be used for each type
  • I thought about using extend Forwardable for delegating and decorating but that would introduce some complications with binding C functions to objects (things like AtomicInteger#value would need an extra argument to access SharedMemoryObject functions)
  • I exposed a SharedMemoryObject#execute_atomically to allow custom actions such as those found in CircuitBreaker#push_time without providing #lock and #unlock explicitly
  • Dynamic resizing and initialization for SlidingWindow now works properly. The following scenarios resize:
    • Worker 1 uses memory of size=n; worker 2 is created, requesting memory of size!=n; both workers will use new size, with data copied over and reinitialized using object_init_fn(). This occurs when you restart workers with new configurations
    • #resize_to is called with new size on a worker, every worker will resize to new size
  • When initializing, memory is 0-initialized only if the worker is the only worker attached to it. Otherwise, memory resizing/copying is done. Use case: if one worker crashes, it can be restarted without losing data. However if all workers are restarted, it makes sense to reset the configuration to prevent some edge case from occurring.
  • Added tests for the shared memory. Looking for suggestions for more tests / or if there are better ways to do things than what I've done.
  • Fixed bugs/error locations

I'm looking for suggestions for how to structure the fallback ruby implementation so that both the fallback and the shared memory versions of AtomicInteger, SlidingWindow, etc can coexist. Right now C code hooks and replaces them if semaphores are enabled, i.e. AtomicInteger is either semaphore-supported or it isn't, and theres no way to toggle between them.

Also, if I missed anything, comment below.

.gitignore Outdated
/lib/semian/*.so
/lib/semian/*.bundle
/lib/semian*/*.so
/lib/semian*/*.bundle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't /lib/**/*.so work here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary if you move to a single .so.

@sirupsen sirupsen mentioned this pull request Sep 29, 2015
shared_cb_data *shm_address;
} semian_cb_data;

static int system_max_semaphore_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never used.

@sirupsen
Copy link
Contributor

This needs some tests around (at least):

  • Two workers in two different processes sharing the CB (and the different scenarios that follow, e.g. circuit break trips, etc)
  • Forcefully killing a worker that might have the semaphore releases it (fuzzy)
  • Resizing the queue (e.g. two workers starting with different window sizes)
  • Starting a worker when the circuit breaker has failures in the sliding window should not reset it

Other than that I'm assuming you're borrowing from the existing test suite?

@sirupsen
Copy link
Contributor

Shared memory is not explicitly marked for deletion from the C side

Agree. We also don't need to use IPC_RMID for the same reason.

if (TYPE(id) != T_SYMBOL && TYPE(id) != T_STRING)
rb_raise(rb_eTypeError, "id must be a symbol or string");
if (TYPE(size) != T_FIXNUM /*|| TYPE(size) != T_BIGNUM*/)
rb_raise(rb_eTypeError, "expected integer for arr_max_size");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name arr_max_size is the internal name, it won't make much sense for the Ruby caller.

…ared memory and semaphores

Shared memory stores the space of int+int+double[], corresponding to
successes, array length, and the errors array itself
Remove second Ruby extension
Provide fallback Ruby implementation of Sliding Window
Sliding window resizes if new size is given and Semian restarts
New worker added does not reset shared memory
Use integers in ms for time, not floats
… Rubocop, added correctness testing with FakeSysV classes, introduce Semian.extension_loaded
…ation (#destroy, #shared), use #run_test_with_x_classes
…ule, remove "Atomic" naming, change execute_atomically to synchronize
@kyewei kyewei force-pushed the circuit-breaker-per-host branch from 9957a65 to 4896c3c Compare November 12, 2015 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants