-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] Statically known branches #15019
Conversation
44bbbb8
to
0e49dc7
Compare
CodSpeed Performance ReportMerging #15019 will degrade performances by 17.15%Comparing Summary
Benchmarks breakdown
|
c0e8ef7
to
b2a0fba
Compare
This comment was marked as resolved.
This comment was marked as resolved.
5bad462
to
79a1b6b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really superb!! Very happy with how it turned out. (Modulo the arbitrary recursion limit, but I'm hopeful we'll be able to bump that up.)
Probably more review than you were looking for at this stage, sorry. Just not sure where to draw the line, once I got started :)
crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md
Outdated
Show resolved
Hide resolved
crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md
Show resolved
Hide resolved
aa60174
to
1f54069
Compare
dc3330b
to
35c5e10
Compare
let inference = infer_expression_types(db, test_expr); | ||
let scope = test_expr.scope(db); | ||
let ty = | ||
inference.expression_ty(test_expr.node_ref(db).scoped_expression_id(db, scope)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referencing a previous review comment by @MichaReiser: #14759 (comment)
This is probably still relevant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichaReiser I discussed this with @carljm: Using node_ref
here should be fine, since the Expression
s always come from the same file from which we call VisibilityConstraints::evaluate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be, but the problem is that evaluate
isn't a salsa query, nor is the code calling into evaluate.
The problem is, to some extent, pre-existing because symbol_id
already depends on the UseDefMap
, that's why I think we can tackle it as a follow-up (but we definitely should):
The flow I'm concerned about is:
Type::member
callstypes::symbol
which is not a salsa query (nor isType::member
)symbol
callstypes::symbol_by_id
which resolves the use def map (so it's a pre-existing issue!)symbol_by_id
calls intodeclarations_ty
which callsevaluate
and depends on the AST
There are other examples for the same flow which, I think are all cross-module:
global_symbol
Class:own_member
We should add a comment if, for some reason, the constraint holds that symbol_by_id
is only called for the same file and never cross files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#15080 could address this and may also help with performance if it reduce (caches) the constraints that need solving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like #15080 has been rolled directly into this PR. That seems like a robust and general fix. I guess when we have a tracked query that takes >1 argument (and one of them is not a Salsa ID), then effectively under the hood Salsa will end up creating an ingredient key for this (ScopeId, ScopedSymbolId)
pair?
crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work. I think @carljm will be by far the better reviewer on this overall, but I noticed a few things
crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md
Outdated
Show resolved
Hide resolved
Not a reviewer, just wanted to say that's the most beautiful tests I've ever seen 😍 |
crates/red_knot_python_semantic/src/semantic_index/use_def/symbol_state.rs
Show resolved
Hide resolved
let inference = infer_expression_types(db, test_expr); | ||
let scope = test_expr.scope(db); | ||
let ty = | ||
inference.expression_ty(test_expr.node_ref(db).scoped_expression_id(db, scope)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be, but the problem is that evaluate
isn't a salsa query, nor is the code calling into evaluate.
The problem is, to some extent, pre-existing because symbol_id
already depends on the UseDefMap
, that's why I think we can tackle it as a follow-up (but we definitely should):
The flow I'm concerned about is:
Type::member
callstypes::symbol
which is not a salsa query (nor isType::member
)symbol
callstypes::symbol_by_id
which resolves the use def map (so it's a pre-existing issue!)symbol_by_id
calls intodeclarations_ty
which callsevaluate
and depends on the AST
There are other examples for the same flow which, I think are all cross-module:
global_symbol
Class:own_member
We should add a comment if, for some reason, the constraint holds that symbol_by_id
is only called for the same file and never cross files.
55658f2
to
f0e9cce
Compare
I did the following benchmark to see how to set the recursion limit (where hyperfine \
--warmup=20 \
--shell=none \
--ignore-failure \
--export-json results.json \
-L limit 0,8,16,24,32,48,64,no_limit \
"./red_knot_{limit} --project path/to/black" See the results below. There are two things to notice: there is no significant change in performance when going from zero (i.e. no static visibility analysis at all!) all the way up to a limit of ~32. After that, there is a jump of the runtime by a factor of five. I'll set the limit to 24 for now. Zoomed in on the runtimes up to a limit of 32: |
crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md
Outdated
Show resolved
Hide resolved
//! use(x) | ||
//! ``` | ||
//! | ||
//! ### Explicit ambiguity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to look at this section again. I'm not sure if that makes sense. Or if there is another way to solve this without explicit ambiguity constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense? On a statically-analyzable branch, we have to add test
visibility to the branch path and ~test
visibility to the not-branch path. VisibilityConstraint::Ambiguous
is just a shortcut to do the same thing in the case where there is no test
expression we plan to analyze for truthiness later. It's "the same thing" because ambiguous
and ~ambiguous
are indistinguishable; we can just add VisibilityConstraint::Ambiguous
to both paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good to me! Just a few nits, feel free to land after addressing.
//! infer `test` to be of type `Literal[True]` or `Literal[False]`, we can rule out one of the | ||
//! possible paths. To support this feature, we record a visibility constraint of `test` to all | ||
//! live bindings and declarations *after* visiting the body of the `if` statement. And we record | ||
//! a negative visibility constraint `~test` to all live bindings/declarations in the (implicit) | ||
//! `else` branch. For the example above, we would record the following visibility constraints | ||
//! (adding the implicit "unbound" definitions for clarity): | ||
//! ```py | ||
//! x = <unbound> # not live, shadowed by `x = 1` | ||
//! y = <unbound> # visibility constraint: ~test | ||
//! | ||
//! x = 1 # visibility constraint: ~test | ||
//! if test: | ||
//! x = 2 # visibility constraint: test | ||
//! y = "y" # visibility constraint: test | ||
//! ``` | ||
//! When we encounter a use of `x` after this `if` statement, we would record two live bindings: `x | ||
//! = 1` with a constraint of `~test`, and `x = 2` with a constraint of `test`. In type inference, | ||
//! when we iterate over all live bindings, we can evaluate these constraints to determine if a | ||
//! particular binding is actually visible. For example, if `test` is `Literal[True]`, we only see | ||
//! the `x = 2` binding. If `test` is `Literal[False]`, we only see the `x = 1` binding. And if the | ||
//! type of `test` is `bool`, we can see both bindings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good! One nit on the wording: unlike Rust, Python automatically converts an expression of any type to bool in a boolean context. So test
itself need not be of type Literal[False]
or Literal[True]
or bool
; it could be of any type. What we actually look at is the return type of the __bool__
method of test
: is it of type Literal[False]
or Literal[True]
or bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python automatically converts an expression of any type to bool in a boolean context
Right. I was wondering whether or not I should bring it up and decided to keep it simple, because I didn't want to talk too much about types in the use-def-map module. But it's better to be precise, rather than concise. Changed now.
pub(super) fn add_constraint(&mut self, constraint: Constraint<'db>) -> ScopedConstraintId { | ||
self.all_constraints.push(constraint) | ||
} | ||
|
||
pub(super) fn record_constraint_id(&mut self, constraint: ScopedConstraintId) { | ||
for state in &mut self.symbol_states { | ||
state.record_constraint(constraint); | ||
} | ||
} | ||
|
||
pub(super) fn record_constraint(&mut self, constraint: Constraint<'db>) -> ScopedConstraintId { | ||
let new_constraint_id = self.add_constraint(constraint); | ||
self.record_constraint_id(new_constraint_id); | ||
new_constraint_id | ||
} | ||
|
||
pub(super) fn add_visibility_constraint( | ||
&mut self, | ||
constraint: VisibilityConstraint<'db>, | ||
) -> ScopedVisibilityConstraintId { | ||
self.visibility_constraints.add(constraint) | ||
} | ||
|
||
pub(super) fn record_visibility_constraint_id( | ||
&mut self, | ||
constraint: ScopedVisibilityConstraintId, | ||
) { | ||
for state in &mut self.symbol_states { | ||
state.record_constraint(constraint_id); | ||
state.record_visibility_constraint(&mut self.visibility_constraints, constraint); | ||
} | ||
|
||
self.scope_start_visibility = self | ||
.visibility_constraints | ||
.add_and_constraint(self.scope_start_visibility, constraint); | ||
} | ||
|
||
pub(super) fn record_visibility_constraint( | ||
&mut self, | ||
constraint: VisibilityConstraint<'db>, | ||
) -> ScopedVisibilityConstraintId { | ||
let new_constraint_id = self.add_visibility_constraint(constraint); | ||
self.record_visibility_constraint_id(new_constraint_id); | ||
new_constraint_id | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, it would be nice if we could simplify these APIs and avoid exposing the use-def map's internal IndexVec IDs to the semantic index builder. But I realize that you didn't add these for no good reason, it's because you needed them to get the semantics right (it looks like in particular for the BoolOp case.) I don't think we should spend more time right now trying to see if we can simplify this, just maybe something to keep in mind in future if/when we revisit this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agreed. I will note it down as a TODO item for myself.
crates/red_knot_python_semantic/src/semantic_index/use_def/symbol_state.rs
Show resolved
Hide resolved
//! use(x) | ||
//! ``` | ||
//! | ||
//! ### Explicit ambiguity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense? On a statically-analyzable branch, we have to add test
visibility to the branch path and ~test
visibility to the not-branch path. VisibilityConstraint::Ambiguous
is just a shortcut to do the same thing in the case where there is no test
expression we plan to analyze for truthiness later. It's "the same thing" because ambiguous
and ~ambiguous
are indistinguishable; we can just add VisibilityConstraint::Ambiguous
to both paths.
b3434b2
to
95d079c
Compare
Congrats on landing this massive improvement! Enjoy your time off |
Summary
Rendered version of the test suite including a proper introduction to the topic / motivation: click.
This changeset adds support for precise type-inference and boundness-handling of definitions inside control-flow branches with statically-known conditions, i.e. test-expressions whose truthiness we can unambiguously infer as always false or always true. In code:
and:
Implementation
Visibility constraints, truthiness, negation
One option to implement this would have been to add special handling for a limited set of test-expressions in semantic index-building. We would then analyze expressions like
sys.version_info >= (x, y)
,typing.TYPE_CHECKING
,True
andFalse
without any type inference and consequently close down (or unconditionally open) branches whose truthiness we can analyze in this way. This would simplify the implementation, but is much less general than the approach taken here.Instead, we collect all necessary information during semantic index building, and then re-analyze the control flow during type-checking. The way this works is by recording so-called visibility constraints for each binding and declaration. Note that these constraints are in some sense similar to narrowing constraints, but also work quite differently and are applied at different points in control flow. Consider the following example first. Note how visibility constraints can apply to bindings outside of the if-statement:
The static truthiness of the
test
condition can either be always-false, ambiguous, or always-true. Similarly, if the visibility constraint of a binding evaluates to always-true/always-false, it will be either always visible or never visible. If the truthiness of the constraint is ambiguous, we need to consider both options (the binding could be visible or not). For the example above, this would result in the following type inference / boundness results for the uses ofx
andy
:test
truthiness~test
truthinessx
y
Literal[1]
Literal[1, 2]
Literal[2]
Sequential constraints
Next, let's consider a sequence of multiple control flow elements:
The binding
x = 2
is easy to analyze. Its visibility correponds to the truthiness oftest2
. For thex = 1
binding, things are a bit more interesting. It is always visible iftest1
is always-true ANDtest2
is always-false. It is never visible iftest1
is always-false ORtest2
is always-true. Note the asymmetry in the logical operators here. We introduce a new constraintKleeneAnd(a, b)
, which is always-true if botha
andb
are always-true, always-false if eithera
orb
are always-false, and ambiguous otherwise. Then we can formulate the constraint for thex = 1
binding asKleeneAnd(test1, ~test2)
.The
x = 0
binding can be handled similarly, with the difference that bothtest1
andtest2
are negated:Merged (or parallel) constraints
Finally, let's consider this example of "parallel" control flow (where we have omitted the test condition for the outer control flow element, as it would only complicate the discussion; instead of
if <…>
you can also consider any other control-flow splitting element where we have no static analysis of which branch is taken)At the usage of
x
, i.e. after the control flow has been merged again, the visibility of thex = 0
binding behaves as follows: the binding is always visible iftest1
is always-false ORtest2
is always-false; and it is never visible iftest1
is always-true ANDtest2
is always-true. Again, note the asymmetry. We introduce a new constraintKleeneOr(a, b)
which is always-true ifa
is always-true ORb
is always true; and always-false ifa
is always-false ANDb
is always-false. This allows us to annotate the bindings with the following constraints:Properties
We note that
KleeneAnd
andKleeneOr
have the property thatKleeneOr(~a, ~b) = ~KleeneAnd(a, b)
. This means we can, in principle, get rid of either of these two to simplify the representation.However, we already apply negative constraints
~test1
and~test2
to the "branches not taken" in the example above. This means that the tree-representationKleeneOr(~test1, ~test2)
is much cheaper/shallower than basically creating~KleeneAnd(~(~test1), ~(~test2))
. Similarly, if we wanted to get rid ofKleeneAnd
, we would also have to create additional nodes. So for performance reasons, there is a certain "duplication" in the code between those two constraint types.Other
This branch also includes:
sys.platform
supporttarget-version
requirements in some Markdown tests which were now required due to the understanding ofsys.version_info
branches.closes #12700
closes #15034
Performance
tomllib
, -7%, needs to resolve one additional module (sys)./red_knot_main --project /home/shark/tomllib
./red_knot_feature --project /home/shark/tomllib
black
, -6%./red_knot_main --project /home/shark/black
./red_knot_feature --project /home/shark/black
Test Plan
statically-known-branches.md
sys.platform
EllipsisType
,Never
, etc