-
Notifications
You must be signed in to change notification settings - Fork 473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-add a unify mode using -P #815
base: master
Are you sure you want to change the base?
Conversation
Thanks for the idea. I would suggest changing the title to something like "Re-add a unify mode using -P" so that it's easier to understand the intent. I'm OK with the idea in principle if the implementation can be made correct without too much added complexity. The devil's in the details as can be witnessed by many tests failing with the current POC.
That's not quite true. The preprocessed file should still be scanned for
Right. Note though that ccache supports compilers without 100% GCC-like behavior for dependency files, most notably EDG-based compilers (such as the Green Hills compiler, GHS). I think that the mode has to be opt in.
As noted in #812 (comment), I currently don't think it's terribly important to support |
I changed the title to you suggestion but I think it is misleading because without
Incbin is non-standard and I don't think a lot of projects are using it. The more compatible options would be something like However, for safety and distcc-pump I agree we may still need to parse the file, although I would add a sloppiness option to disable that scanning for projects that know they don't use incbin. I could try implementing detection of
OK, I don't know these compilers. What if gcc/clang was detected? |
What should the option enabling this be called? |
Even if it's not functionally equivalent to the old unify mode, I guess it still has to be called something (perhaps not in the PR but at least in the documentation) and I thought that "a unify mode" would be fluffy enough to hint that it's not "the unify mode". Better name suggestions are of course welcome.
I think that the Linux kernel project motivates the detection code just fine even though it's only one project (see #136).
If you convince all ccache users that they should use something else then we can for sure drop the detection. 😉
That would be perfectly fine.
Not sure I follow here. If you suggest depending on a potential future implementation in Clang I think I would need more sales pitches.
Right, and then the user deserves to get a false cache hit.
That's a good point. I would consider that a bug in the depend mode. For the record, I'm not convinced that I made the right choice to accept the depend mode into ccache since it adds complexity and is a bit hard to reason about, but as long as it's opt-in it doesn't do too much harm.
Of course.
Doesn't it need to be opt in anyway because of the issues with line numbers in compiler diagnostics, assert macros, etc being potentially off?
I don't have any better suggestion at the moment. |
66cdd92
to
5cd544b
Compare
The patch is ready for review. If preferred, I could extract parts of it into separate pull requests. To speed-up the review, please feel free to directly edit this PR. This assumes that support for the |
Thanks. I just wanted to say that I haven't forgotten about this PR, but I'm currently focusing on storage backends so I'll have to come back to this at some later time. |
OK, usefulness is limited before we get |
This patch adds the -fminimize-whitespace with the following effects: * If combined with -E, remove as much non-line-breaking whitespace as possible. * If combined with -E -P, removes as much whitespace as possible, including line-breaks. The motivation is to reduce the amount of insignificant changes in the preprocessed output with source files where only whitespace has been changed (add/remove comments, clang-format, etc.) which is in particular useful with ccache. A patch for ccache for using this flag has been proposed to ccache as well: ccache/ccache#815, which will use -fnormalize-whitespace when clang-13 has been detected, and additionally uses -P in "unify_mode". ccache already had a unify_mode in an older version which was removed because of problems that using the preprocessor itself does not have (such that the custom tokenizer did not recognize C++11 raw strings). This patch slightly reorganizes which part is responsible for adding newlines that are required for semantics. It is now either startNewLineIfNeeded() or MoveToLine() but never both; this avoids the ShouldUpdateCurrentLine workaround and avoids redundant lines being inserted in some cases. It also fixes a mandatory newline not inserted after a _Pragma("...") that is expanded into a #pragma. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104601
The feature was accepted in clang from clang 14.0.0 (NOT the next release) as Note I rebase only to the commit before accd21e. That commit makes unittests read the user's configuration in |
Unify mode uses -P to suppress line directives in the preprocessed output. The dependency file generated by '-MD' is additionally used to track included files to be hashed for direct mode. Additionally, -fnormalize-whitespace is used if supported by the compiler (Clang 13 and later). The following configuration options are added: * unify_mode to enable/disabled unify mode (disabled by default) * preprocessor, preprocessor_check, preprocessor_type to use a different preprocessor to determine whether a recompilation is necessary * incbin sloppiness to skip scanning for incbin directives * unify_with_debug sloppiness to allow matching unified hashes when generating debug info * unify_with_output sloppiness to allow matching unified hashes even f the compiler emitted diagnostic messages
I rebased everything to the current master branch. I would appreciate a review. |
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This patch adds the -fminimize-whitespace with the following effects: * If combined with -E, remove as much non-line-breaking whitespace as possible. * If combined with -E -P, removes as much whitespace as possible, including line-breaks. The motivation is to reduce the amount of insignificant changes in the preprocessed output with source files where only whitespace has been changed (add/remove comments, clang-format, etc.) which is in particular useful with ccache. A patch for ccache for using this flag has been proposed to ccache as well: ccache/ccache#815, which will use -fnormalize-whitespace when clang-13 has been detected, and additionally uses -P in "unify_mode". ccache already had a unify_mode in an older version which was removed because of problems that using the preprocessor itself does not have (such that the custom tokenizer did not recognize C++11 raw strings). This patch slightly reorganizes which part is responsible for adding newlines that are required for semantics. It is now either startNewLineIfNeeded() or MoveToLine() but never both; this avoids the ShouldUpdateCurrentLine workaround and avoids redundant lines being inserted in some cases. It also fixes a mandatory newline not inserted after a _Pragma("...") that is expanded into a #pragma. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104601
Clang starting at version 14 supports the *-fminimize-whitespace* | ||
option. The switch removes as much whitespace between tokens as | ||
possible which reduces the chance of hashes of preprocessed | ||
output being different only because of formatting changes. It is most | ||
useful in <<the_unify_mode,unify mode>> where the *-P* switch | ||
also removes newlines as much as possible, otherwise it only tries | ||
to remove spaces between tokens on the same line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether the compiler supports -fminimize-whitespace
feels way too specific to encode in this way. Let's say that GCC thinks that -fminimize-whitespace
is awesome and adds support for it in version 13. Should we then add gcc_minimize_whitespace
as well? And what happens if we want some other feature that is available in Clang 15 and GCC 12?
Another issue is that guessing -fminimize-whitespace
support based on the compiler name won't work when the user uses clang
instead of clang-14
.
Through the years I have thought several times about doing compiler probing (either feature-based or based on $COMPILER --version
) instead of guessing from the compiler name, but kept not doing so to keep things simple. Maybe this feature is the straw that breaks the camel's back and we should implement compiler probing instead. I created #958 for this.
When we first discussed this PR, I thought that the implementation for -fminimize-whitespace
would be something like "add an option for enabling -fminimize-whitespace" and "if the option is enabled, add -fminimize-whitespace unconditionally". Adding it by default based on a compiler guess and adding support for using a custom preprocessor, etc., to me feels like too large a maintenance burden for little gain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compiler_type
seemed to be ccache's mechanism to identify a the set of flags supported by a compiler, -fminimize-whitespace
would be one flavor of clang and therefore the most straightforward way to add another supported command line set. If gcc/others start supporting the flag, it could be made a bitflag, but at this point we cannot assume that gcc would also call there command line flag -fminimize-whitespace
, but maybe -fminify
in which case we'd indeed need a gcc_minify
. What gcc will do is speculation at this point. There could be a more general mechanism to handle different versions of a compiler, but that's a lot more than needed for -fminimize-whitespace
to work.
sccache compiles a small program to determine how the compiler identifies itself, including its version. The ccache way to do this seemed to be to follow the symlink. Clang's bin/lang
and bin/clang++
are symlinks to bin/clang-${version}
, i.e. sufficient for our purpose. Unfortunately I only recently discovered that at least Ubuntu changed this in their distribution and does this the other way around: clang-14
is a symlink to /usr/lib/llvm-14/bin/clang
, so this patch at least needs to be extended to detect that pattern as well.
The best solution that also works with any compiler derived from clang would be a test invocation of $cc -P -fminimize-whitespace
to see whether specifically this flag is support, but I assumed we'd not want to pay the cost of an additional program execution every time and I am not knowledgeable enough to add caching for the result.
When using unify mode there is no reason to no use -fminimize-whitespace
, it is what the original unify mode did. Adding a configuration for enable it would make the ccache configuration more complex and most users would miss out on its advantages if had to be enabled explicitly.
doc/MANUAL.adoc
Outdated
Neither <<_the_direct_mode,direct mode>> nor | ||
<<the_depend_mode,depend mode>> take *.incbin* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true. The direct mode does not have problems with .incbin
since the preprocessor is run on a cache miss and .incbin
usage is detected then. It's true for the depend mode, though. However, I think that it would be better to document that limitation in the The depend mode chapter instead of here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the file gets into the cache via depend mode, the manifest will not include the .incbin
file and therefore get a direct hit in direct mode (whether depend mode is still enabled is irrelevant at this point) even though the .incbin
file may have changed. Therefore I think wording is correct, direct mode does not look for .incbin
instruction, it just hashes the files in the manifest.
This might be an issue with calling them "modes".
doc/MANUAL.adoc
Outdated
debug info rather useless. | ||
|
||
[#config_sloppy_unify_with_output] | ||
*unify_with_output*:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would unify_with_diagnostics
be an improved name?
To me, the output of the compiler is the object file, dependency file, etc., plus warning/error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. With the caveat that not all output is necessarily diagnostics.
if (ctx.config.unify_mode() && ctx.args_info.seen_MMD | ||
&& !ctx.config.sloppiness().is_enabled(core::Sloppy::system_headers)) { | ||
LOG_RAW( | ||
"Disabling unify mode: -MMD output requested, but -MD output needed to " | ||
"track system headers. Either enable the system_header sloppiness option " | ||
"or use run_second_cpp"); | ||
ctx.config.set_unify_mode(false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should consider starting to parse the output of -H
instead of retrieving header files from the preprocessor output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-H
intermixes diagnostics with included files, or we need another compiler invocation. The logic -MD
/-MMD
for handling depfiles already exists, adding support for -H
would be new feature unrelated to unify mode.
src/third_party/nonstd/optional.hpp
Outdated
@@ -1198,6 +1198,7 @@ class optional | |||
) | |||
operator=( optional && other ) noexcept | |||
{ | |||
reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I found that the previous state of an optional was leaking after assignment. I don't remember the details, I will remove it until I rediscover the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is the following waring by gcc:
/home/runner/work/ccache/ccache/src/third_party/nonstd/optional.hpp: In function ‘nonstd::expected_lite::expected<Digest, {anonymous}::Failure> to_cache(Context&, Args&, nonstd::optional_lite::optional<Digest>, const Args&, Hash*)’:
/home/runner/work/ccache/ccache/src/third_party/nonstd/optional.hpp:1203:78: error: ‘*((void*)& __tmp +28)’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
else if ( (has_value() == true ) && (other.has_value() == true ) ) { contained.value() = std::move( *other ); }
if (ctx.config.compiler_type() == CompilerType::clang_minimize_whitespace | ||
&& language_supports_unify(ctx.args_info.actual_language)) { | ||
args.push_back("-fminimize-whitespace"); | ||
args_added += 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that unconditionally (when available) enabling -fminimize-whitespace
never has any negative side effects?
I haven't checked how -fminimize-whitespace
works, but based on your description I guess it normalizes e.g.
int f()
{
}
to
int f()
{
}
? If so and run_second_cpp
is false, can the compiler diagnostics become inaccurate since they may contain column information? E.g. like this:
% clang++ -c test.cpp
test.cpp:3:3: warning: non-void function does not return a value [-Wreturn-type]
}
^
1 warning generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Column information is already inaccurate even without -fminimize-whitespace
as detailed in the docs and https://github.com/ccache/ccache/pull/815/files#r741491621. What changes with -fminimize-whitespace
:
- Clang tries to preserve indentation of the first token. That is, your example would indeed show a different column, but only because it is the first token on line 3. Note that tab only counts as one column.
- Clang emits a single space between two tokens on the same line if there was a space before the second token. That is, it will emit only a single space even if there were multiple spaces (or tabs) before. With
-fminimize-whitespace
, a space is only added if required to not merge two tokens into one.
The status is still that I haven't forgotten about this PR, but I consider it low priority (sorry) and I don't have time to engage with the topic. |
This patch adds the -fminimize-whitespace with the following effects: * If combined with -E, remove as much non-line-breaking whitespace as possible. * If combined with -E -P, removes as much whitespace as possible, including line-breaks. The motivation is to reduce the amount of insignificant changes in the preprocessed output with source files where only whitespace has been changed (add/remove comments, clang-format, etc.) which is in particular useful with ccache. A patch for ccache for using this flag has been proposed to ccache as well: ccache/ccache#815, which will use -fnormalize-whitespace when clang-13 has been detected, and additionally uses -P in "unify_mode". ccache already had a unify_mode in an older version which was removed because of problems that using the preprocessor itself does not have (such that the custom tokenizer did not recognize C++11 raw strings). This patch slightly reorganizes which part is responsible for adding newlines that are required for semantics. It is now either startNewLineIfNeeded() or MoveToLine() but never both; this avoids the ShouldUpdateCurrentLine workaround and avoids redundant lines being inserted in some cases. It also fixes a mandatory newline not inserted after a _Pragma("...") that is expanded into a #pragma. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104601
This patch adds the -fminimize-whitespace with the following effects: * If combined with -E, remove as much non-line-breaking whitespace as possible. * If combined with -E -P, removes as much whitespace as possible, including line-breaks. The motivation is to reduce the amount of insignificant changes in the preprocessed output with source files where only whitespace has been changed (add/remove comments, clang-format, etc.) which is in particular useful with ccache. A patch for ccache for using this flag has been proposed to ccache as well: ccache/ccache#815, which will use -fnormalize-whitespace when clang-13 has been detected, and additionally uses -P in "unify_mode". ccache already had a unify_mode in an older version which was removed because of problems that using the preprocessor itself does not have (such that the custom tokenizer did not recognize C++11 raw strings). This patch slightly reorganizes which part is responsible for adding newlines that are required for semantics. It is now either startNewLineIfNeeded() or MoveToLine() but never both; this avoids the ShouldUpdateCurrentLine workaround and avoids redundant lines being inserted in some cases. It also fixes a mandatory newline not inserted after a _Pragma("...") that is expanded into a #pragma. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D104601 llvm-monorepo: ae6b40000238e5faaaa319ffcfc713a15e459be8
This is a proof-of-concept of using the
-P
preprocessor flag in cpp2 mode (i.e. only use preprocessor output for has lookup) as outlined in #812 (comment). Since ccache uses line directives inserted by the preprocessor to determine which files are to be hashed in direct mode, we have to get this information another way. It is done by instructing the preprocessor to emit a.d
file and which is parsed the same way as depend mode does.The motivation of this patch is to increase the chances of a cache hit after only whitespace changes. In particular, removing the line directives may ignore added/removed empty or comment comment lines from the source input. It was possible to ignore even more whitespace changes with
CCACHE_UNIFY
option, that unfortunately was removed in ccache 3.7.7 (0774bfe).What kinds of whitespace changes can be ignored depends on the compiler: gcc's
-E -P
mode removes all empty lines, but keeps indention and only normalizes horizontal whitespace to at most a single space. clang's-E -P
mode even only discards empty lines if there are more than 8 of them. However, I implemented a-fnormalize-whitespace
mode in clang (https://github.com/meinersbur/llvm-project/tree/normalize-whitespace) that basically does what ccache's unify did, ignoring almost all formatting changes. If it finds use I would upstream this to llvm project.This approach holds several advantages:
.i
/.ii
temporary file, less to hash-P
being passed as an argument (Improve statistics for "-P -E" #812)Most problems of
CCACHE_UNIFY
do not apply, only the following disadvantages remain:__LINE__
macros (e.g. in anassert
) will cause a miss if the number of lines before changed.The last is probably what you want, but may reduce the usefulness of this mode. One might want to use an
assert
that does not print the line. The first two can be mitigated by not using this mode when-g
is pass and not caching if there is any diagnostic output. Direct mode hits are still possible. A sloppiness option could still enable-P
-mode nonetheless. Even though-fnormalize-whitespace
is only supported with clang, the mode could still by uses by pipeing throughclang -fpreprocessed -fnormalize-whitespace
or by linking libclang into ccache.This patch is a draft, I only tested it with my workflow. Missing changes in the documentation, statistics, configuration and different compilers and modes (such as invoking only the preprocessor without compilation) may not work. If the project is interested in merging such a patch, I'd appreciate hints on what still needs to be done or if the maintainers take over from here. It helps me being less hesitant to clang-format or edit comments in central header files.