Add an option to reduce pagefault when available #430

Open
wants to merge 1 commit into
base: dev

hankluo6
Contributor

@hankluo6 hankluo6 commented Jun 30, 2021

This PR adds the MAP_POPULATE flag to mi_unix_mmap and lets users control it through a new option (MIMALLOC_PREFAULT).

MAP_POPULATE prefaults the page tables for a mapping and can therefore reduce the number of page faults at runtime, which may improve performance.
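As a rough illustration of the idea (not the actual mi_unix_mmap change in this PR), the sketch below adds MAP_POPULATE to an anonymous mapping only when a flag is set. The helpers `map_region` and `env_flag_enabled` are hypothetical names, and the rule that the value "1" enables the option is an assumption, not mimalloc's real option parsing. MAP_POPULATE is Linux-specific, so it is guarded with `#ifdef`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: treats the value "1" as enabled and anything
   else (or an unset variable) as disabled. This is an assumed rule,
   not how mimalloc parses its options. */
static bool env_flag_enabled(const char *name) {
    const char *v = getenv(name);
    return v != NULL && strcmp(v, "1") == 0;
}

/* Hypothetical helper: maps `len` bytes of anonymous memory and, when
   `prefault` is true and the platform defines MAP_POPULATE, asks the
   kernel to fault the pages in up front. Returns NULL on failure. */
static void *map_region(size_t len, bool prefault) {
    assert(len > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    if (prefault) flags |= MAP_POPULATE;
#else
    (void)prefault; /* no-op where MAP_POPULATE is unavailable */
#endif
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller could then write something like `map_region(size, env_flag_enabled("MIMALLOC_PREFAULT"))`, so that the default behavior stays lazy and prefaulting is opt-in.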

On my system (Linux x86_64, i5-8300H CPU, 1 NUMA node), I tested with ./mimalloc-test-stress 32 100 50 in a debug build.

Results:

  • without MAP_POPULATE: around 2,943,560 page faults, 205 seconds
  • with MAP_POPULATE: around 94,554 page faults, 203 seconds

I also tested with mimalloc-bench. With eager_commit = 0, page faults in most benchmarks (cfrac, espresso, larsonN) increase by about 65,000, and the larger tests (leanN, mstressN) increase even more. With eager_commit = 1, setting MAP_POPULATE has no significant effect, and the page-fault count is the same as in the original version.

I think this is because, with eager_commit set to 0, MAP_POPULATE ends up prefaulting some memory that is never used. The option therefore gives users the ability to tune this themselves.

Btw, the page faults above include both major and minor page faults.
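For anyone who wants to reproduce this kind of count outside the stress test, here is a minimal sketch that sums minor and major faults from `getrusage` around a loop that touches every page of a mapping. The helper names `total_page_faults` and `faults_while_touching` are hypothetical, and this is not how the numbers above were collected. The exact counts depend on the kernel and on settings such as transparent huge pages.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor plus major page faults taken by this process so far,
   or -1 if getrusage fails. */
static long total_page_faults(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothetical helper: maps `npages` OS pages, writes one byte to each,
   and returns how many faults occurred during the touch loop only.
   With `populate` nonzero the mapping is created with MAP_POPULATE
   (where available), so the faults are expected to be paid at mmap
   time instead. Returns -1 on error. */
static long faults_while_touching(size_t npages, int populate) {
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0) return -1;
    size_t len = npages * (size_t)psz;
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    if (populate) flags |= MAP_POPULATE;
#else
    (void)populate;
#endif
    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (raw == MAP_FAILED) return -1;
    volatile char *p = raw;

    long before = total_page_faults();
    for (size_t i = 0; i < npages; i++) p[i * (size_t)psz] = 1;
    long after = total_page_faults();

    munmap(raw, len);
    if (before < 0 || after < 0) return -1;
    return after - before;
}
```

Comparing `faults_while_touching(n, 0)` with `faults_while_touching(n, 1)` for the same `n` gives a quick feel for how many faults move from the access path to the mmap call.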

@jserv
Contributor

jserv commented Jul 1, 2021

In my system, Linux x86_64 with 6 CPU cores and 1 numa node, test with ./mimalloc-test-stress 32 100 50,

If you would like to share the benchmark results, you should describe the hardware configuration. "6 CPU cores" is too rough. Instead, mention at least the microarchitecture.

@jserv
Contributor

jserv commented Jul 1, 2021

Besides the reduction in page-fault counts, the elapsed time should be listed for each run.

Contributor

@jserv jserv left a comment

The git commit message is less informative than the change made to "readme.md". Please improve the message.

@hankluo6 hankluo6 changed the base branch from master to dev July 1, 2021 05:29
Contributor

@jserv jserv left a comment

Wrap the body of git commit messages at 72 characters, as the article How to Write a Git Commit Message suggests.
You should also mention that this option is specific to Linux.

@jserv
Contributor

jserv commented Jul 1, 2021

I also test in mimalloc-bench but I found that the page faults would become larger compared to initial version.

It would be great if comprehensive experimental results from mimalloc-bench under different configurations could be shown along with the analysis.

This option instructs the kernel to synchronously load the entire mapped
region into active memory by specifying `MAP_POPULATE` in `mmap`. It will
cause read-ahead on that memory, and then the subsequent accesses to the
memory can proceed without page faults, improving some performance.
@daanx
Collaborator

daanx commented Oct 19, 2021

Hi @hankluo6, thanks for your PR. At this moment I am hesitant though; as jserv remarks, we need to measure more. It sounds great of course to reduce page faults by pre-populating, and it seems indeed that the (smallish) benchmarks get faster. But generally with large (real-world) workloads we usually try to avoid touching any pages that may not be needed after all. For example, for each block size mimalloc reserves mimalloc "pages" that are usually about 64k, but at first it only touches the initial OS page (of 4k) -- in case just a few objects are needed, all the other OS pages (60k) remain just virtual address space without needing real physical memory. That can be a big fraction due to memory fragmentation. So, generally, I would say it is not a good idea to do this.

This is also why the eager_commit setting is there: in general you don't really want to do this; but indeed, for benchmarks it is always better to enable it (but I try to avoid optimizing for benchmarks, as for real-world workloads like browsers or long-running servers, being good about memory fragmentation is much more important).

Anyways, I need some more thinking on this and better understand the impact. Best, Daan
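The point above about a 64k mimalloc page touching only its first OS page can be observed with `mincore`, which reports which pages of a mapping are resident. The sketch below is only an illustration under assumptions: `resident_after_touch` is a hypothetical helper, the region is 16 OS pages (64 KiB with 4 KiB pages), and it uses a plain anonymous mapping rather than anything from mimalloc. On a typical Linux system a lazily mapped region with one touched page should report roughly one resident page, but transparent huge pages or sandboxed kernels may report more.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_PAGES 16 /* 16 x 4 KiB = 64 KiB on typical x86_64 */

/* Hypothetical helper: maps REGION_PAGES OS pages without MAP_POPULATE,
   writes to the first `touched` pages, and returns how many pages
   mincore reports as resident. Returns -1 on error. */
static int resident_after_touch(size_t touched) {
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0) return -1;
    size_t len = REGION_PAGES * (size_t)psz;

    void *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return -1;
    unsigned char *p = raw;

    /* Touch only the first few pages; the rest stay untouched. */
    for (size_t i = 0; i < touched && i < REGION_PAGES; i++) {
        p[i * (size_t)psz] = 1;
    }

    unsigned char vec[REGION_PAGES];
    int resident = -1;
    if (mincore(raw, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < REGION_PAGES; i++) {
            resident += vec[i] & 1; /* low bit set means resident */
        }
    }
    munmap(raw, len);
    return resident;
}
```

Calling `resident_after_touch(1)` and `resident_after_touch(REGION_PAGES)` shows the gap between what a lazy mapping actually costs and what prefaulting the whole region would cost in physical memory.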

3 participants