Simple: Thread local caches to avoid atomic overhead of Chunk in small object arenas #11

Open
Techcable opened this issue Jun 3, 2020 · 3 comments
Labels
enhancement impl-simple The simple mark/sweep collector (our first one :D) performance Performance issues

Comments

@Techcable
Member

Right now allocation requires atomic operations. We should use a thread-local buffer so this isn't required in the common case.

This would be somewhat difficult to do since we can have multiple running instances. Would we have to use thread_local? How does the performance overhead of that compare to using atomics?

Maybe we could make SmallObjectCache a static variable shared between instances. However, if we do that we'd have to differentiate between allocations from different collectors (in for_each and the linked list). This could also result in worse performance if all the different collectors end up messing with each other's state.
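
To make the trade-off above concrete, here is a minimal sketch of a thread-local cache keyed by collector, so that multiple running instances never hand out each other's slots. It is not the project's code: `CollectorId`, `SharedArena`, `alloc_slot`, and the batch size of 64 are all hypothetical names and values chosen for illustration, and slots are modeled as plain indices rather than real memory.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical identifier that tells collector instances apart.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct CollectorId(pub usize);

/// Hypothetical shared arena: hands out slot indices from an atomic counter.
pub struct SharedArena {
    next_slot: AtomicUsize,
}

impl SharedArena {
    pub const fn new() -> Self {
        SharedArena { next_slot: AtomicUsize::new(0) }
    }

    /// Slow path: reserve `count` consecutive slots with a single atomic op.
    fn reserve(&self, count: usize) -> Range<usize> {
        let start = self.next_slot.fetch_add(count, Ordering::Relaxed);
        start..start + count
    }
}

/// Assumed batch size; one atomic operation is paid per BATCH allocations.
const BATCH: usize = 64;

thread_local! {
    /// One cache per (thread, collector) pair, so instances never mix slots.
    static CACHES: RefCell<HashMap<CollectorId, Range<usize>>> =
        RefCell::new(HashMap::new());
}

/// Fast path touches only thread-local state; it refills from the shared
/// arena when the local range for this collector runs dry.
pub fn alloc_slot(id: CollectorId, arena: &SharedArena) -> usize {
    CACHES.with(|caches| {
        let mut caches = caches.borrow_mut();
        let range = caches.entry(id).or_insert(0..0);
        if let Some(slot) = range.next() {
            return slot;
        }
        *range = arena.reserve(BATCH);
        range.next().expect("BATCH is nonzero")
    })
}
```

Note that the fast path here still pays for a `thread_local!` access plus a hash lookup, which is exactly the overhead that would need to be measured against a single uncontended atomic before committing to this design.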

This was much easier when it was just a comment in a config file.......

@Techcable Techcable added enhancement performance Performance issues impl-simple The simple mark/sweep collector (our first one :D) labels Jun 3, 2020
@Techcable Techcable changed the title Simple: Thread local caches for small object arenas Simple: Thread local caches to avoid atomic overhead of Chunk in small object arenas Jun 3, 2020
@Techcable
Member Author

Maybe we could just make the cache the cold path. I think that would happen naturally if we implement Lazy Sweeping (#7). Is it acceptable if we still use atomics when allocating from the free list? We already use a loop there.....
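
For reference, allocating from a free list with atomics and a retry loop could look roughly like the sketch below. This is an assumption about the general shape, not the collector's actual free list: `FreeList`, its index-based layout, and the `NONE` sentinel are hypothetical. It also assumes slots are only returned to the list during a sweep while mutators are paused, so concurrent pops never race with pushes (which sidesteps the ABA problem).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sentinel meaning "no slot" (hypothetical encoding).
const NONE: usize = usize::MAX;

/// Hypothetical index-based free list over a fixed set of slots.
pub struct FreeList {
    head: AtomicUsize,
    /// next[i] is the slot that follows slot i on the list.
    next: Vec<AtomicUsize>,
}

impl FreeList {
    /// Build a list containing every slot in 0..len, with slot 0 at the head.
    pub fn full(len: usize) -> Self {
        let next = (0..len)
            .map(|i| AtomicUsize::new(if i + 1 < len { i + 1 } else { NONE }))
            .collect();
        let head = AtomicUsize::new(if len == 0 { NONE } else { 0 });
        FreeList { head, next }
    }

    /// Pop one free slot using a compare-exchange retry loop.
    /// Assumes no concurrent pushes (see the note above the code).
    pub fn pop(&self) -> Option<usize> {
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            if head == NONE {
                return None;
            }
            let next = self.next[head].load(Ordering::Relaxed);
            match self.head.compare_exchange_weak(
                head,
                next,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                // We won the race: `head` is now ours.
                Ok(_) => return Some(head),
                // Another thread popped first (or a spurious failure): retry.
                Err(current) => head = current,
            }
        }
    }
}
```

If this path only runs when the thread-local cache is empty, the loop and its atomics stay off the common-case allocation path.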

@playXE

playXE commented Jun 8, 2020

Maybe do TLAB (thread-local allocation buffer) allocation? Some of Dora's GCs and some JVM GCs use it. Dora allocates 32KB of memory for a TLAB, and if an object fits into the TLAB (<8KB) the runtime allocates it in TLAB memory. This memory is not recycled, but it can be traced just fine.
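
A rough sketch of the TLAB scheme described above, using the 32KB buffer and 8KB object limit from this comment. Everything else is an assumption made for illustration: the `Tlab` and `Alloc` types, the 8-byte alignment, and the choice to model objects as offsets into a byte buffer. This is not Dora's implementation.

```rust
/// Sizes taken from the comment above; all other names are hypothetical.
const TLAB_SIZE: usize = 32 * 1024;
const MAX_TLAB_OBJECT: usize = 8 * 1024;
/// Assumed object alignment.
const ALIGN: usize = 8;

#[derive(Debug, PartialEq, Eq)]
pub enum Alloc {
    /// Offset of the new object inside the current TLAB.
    InTlab(usize),
    /// Object is too big for a TLAB; the caller should use a shared path.
    TooLarge,
}

/// One buffer per thread: allocation is a bounds check plus a pointer bump,
/// with no atomics involved.
pub struct Tlab {
    current: Box<[u8]>,
    top: usize,
    /// Full buffers are kept (not recycled) so they could still be traced.
    retired: Vec<Box<[u8]>>,
}

fn fresh_buffer() -> Box<[u8]> {
    vec![0u8; TLAB_SIZE].into_boxed_slice()
}

impl Tlab {
    pub fn new() -> Self {
        Tlab { current: fresh_buffer(), top: 0, retired: Vec::new() }
    }

    pub fn retired_count(&self) -> usize {
        self.retired.len()
    }

    pub fn alloc(&mut self, size: usize) -> Alloc {
        if size >= MAX_TLAB_OBJECT {
            return Alloc::TooLarge;
        }
        // Round the request up to the assumed alignment.
        let size = (size + ALIGN - 1) & !(ALIGN - 1);
        if self.top + size > self.current.len() {
            // Slow path: retire the full buffer and start a new one.
            let full = std::mem::replace(&mut self.current, fresh_buffer());
            self.retired.push(full);
            self.top = 0;
        }
        let offset = self.top;
        self.top += size;
        Alloc::InTlab(offset)
    }
}
```

In a real collector the slow path would request the new buffer from the shared heap, which is the only place synchronization would be needed.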

@Techcable
Member Author

TLAB is definitely an option I'm considering. I'm planning to look into this more deeply after I get multithreading support working.
