Evaluate using Profile-Guided Optimization (PGO) and Post Link Optimization (PLO) for ast-grep #738
Replies: 2 comments 1 reply
-
O!M!G! Thanks @zamazan4ik for your heroic adventure exploring PGO! I'm amazed by your detailed work 🥇 I'm not available now, but I will definitely look at the post later! Thanks and Best Wishes, Herrington
-
Sorry for the late reply! I am not too familiar with advanced techniques like PGO/PLO. My intuition is that PGO uses some code examples to build a better machine-code layout? If so, the training input code should be as typical as possible to benefit the most use cases. Currently, the cargo bench code is some randomly chosen code: it is too arbitrary to represent common use cases. Regardless of my poor benchmark cases, your analysis is really deep and insightful! Let me learn more about PGO!
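(For readers landing here: the intuition above is essentially right — PGO compiles the program twice, using a profile recorded from a representative workload to guide inlining, branch prediction, and code layout. A minimal sketch of the manual flow with plain `rustc` flags, which is what cargo-pgo automates; the binary name and paths are placeholders:)

```shell
# Sketch of manual Rust PGO (cargo-pgo automates these steps); paths are placeholders.
# 1. Build with instrumentation that records execution counts.
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release

# 2. Run a *representative* workload -- the quality of the training input
#    determines how well the final binary is optimized.
./target/release/my-app typical-input/

# 3. Merge the raw .profraw files into a single .profdata file.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 4. Rebuild, letting the compiler use the measured hot/cold information.
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release
```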
-
Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are available here. According to the tests, PGO helps achieve better performance in many cases across many applications. Because of this, I think trying to optimize ast-grep with PGO is a good idea.
I already did some benchmarks and want to share my results here.
Test environment

- `tree-sitter` (C dependency build); `CFLAGS` are `-O3`
- ast-grep `main` branch on commit `76d845162185bed4a5b9a22de456f26e976af0a6`
Benchmark
For benchmark purposes, I used two scenarios: the built-in `cargo bench` benchmarks and a handcrafted `sg scan` scenario (both described below). All PGO and PLO optimizations are done with cargo-pgo.
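For readers unfamiliar with the tool, the overall cargo-pgo flow (commands as I understand them from the cargo-pgo README; the instrumented-binary path and training input below are placeholders) is roughly:

```shell
# Hedged sketch of the cargo-pgo workflow; paths and the training input are placeholders.
cargo install cargo-pgo              # one-time setup (also needs the llvm-tools-preview rustup component)
cargo pgo build                      # build an instrumented binary
./target/<target-triple>/release/sg scan -r python.yml training-project/  # gather profiles on a workload
cargo pgo optimize build             # rebuild using the gathered profiles
cargo pgo bolt build --with-pgo      # optional: apply LLVM BOLT (PLO) on top of PGO
```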
For the Release build, the built-in benchmarks were run with `cargo bench -p benches`. The PGO instrumentation phase is done with `cargo pgo bench -- -p benches`, and the PGO-optimized benches with `cargo pgo optimize bench -- -p benches`.

The handcrafted scenario is scanning for a simple Python rule in some Python project. The command under test is `taskset -c 0 sg scan -r python.yml feast/` (`taskset -c 0` is used to reduce OS scheduler noise). The `feast` directory contains the https://github.com/feast-dev/feast repo (the `master` branch at commit `052182bcca046e35456674fc7d524825882f4b35`). The PGO training phase is done on another project, PyPy (https://github.com/mozillazg/pypy, `master` branch, commit `5306d9822d91412b224f529ae1aec485bf93dc86`), with the same Python rule. The Release build is done with `cargo build --release`; the PGO-instrumented build with `cargo pgo build` + `CFLAGS="-O3 -fprofile-generate=tr_%m_%p.profraw"`; the PGO-optimized build with `cargo pgo optimize build` + `CFLAGS="-O3 -fprofile-use=tr.profdata"` (the `profdata` file is generated by `llvm-profdata merge` from the `profraw` files produced during the instrumentation phase). I used this trick because I wanted to optimize the C dependency with PGO too, and `cargo-pgo`
does not support this scenario out of the box.

The Python rule in `python.yml` (just a copy-paste from the official website):

All tests are done on the same machine, multiple times (with `hyperfine`), with the same background "noise" (as much as I can guarantee, of course).

Results
Let's begin with the built-in benchmarks:
Results for the handcrafted scenario of running ast-grep scan on the Feast project (in `hyperfine` format), where:

- `sg_release_clang` - Release build
- `sg_optimized_with_tree` - Release build + PGO optimization
- `sg_optimized_with_tree_bolt_optimized` - Release build + PGO optimization + PLO optimization (via LLVM BOLT)

For reference, I also post performance results from the instrumentation phases.
Release build:
PGO instrumented run:
LLVM BOLT instrumented run:
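(For reproducibility: a hyperfine comparison like the ones above can be produced roughly like this — a sketch, with binary names matching the build variants described in this post:)

```shell
# Sketch of the hyperfine comparison; binary names are the build variants from this post.
taskset -c 0 hyperfine --warmup 3 \
  'sg_release_clang scan -r python.yml feast/' \
  'sg_optimized_with_tree scan -r python.yml feast/' \
  'sg_optimized_with_tree_bolt_optimized scan -r python.yml feast/'
```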
According to the tests above, I see measurable improvements from PGO.
Further steps
I can suggest the following action points:
Here are some examples of how PGO optimization is integrated in other projects:
- `configure` script

I have some examples of how PGO information looks in the documentation:
Regarding LLVM BOLT integration, I have the following examples: