[red-knot] Statically known branches #15019

sharkdp · 2024-12-16T12:33:12Z

Summary

Rendered version of the test suite including a proper introduction to the topic / motivation: click.

This changeset adds support for precise type-inference and boundness-handling of definitions inside control-flow branches with statically-known conditions, i.e. test-expressions whose truthiness we can unambiguously infer as always false or always true. In code:

x = 1

if "z" in "haystack":  # Literal[False]
    x = 2

reveal_type(x)  # revealed: Literal[1]

and:

x = 1

if "y" in "haystack":  # Literal[True]
    x = 2

reveal_type(x)  # revealed: Literal[2]

Implementation

Visibility constraints, truthiness, negation

One option to implement this would have been to add special handling for a limited set of test-expressions in semantic index-building. We would then analyze expressions like sys.version_info >= (x, y), typing.TYPE_CHECKING, True and False without any type inference and consequently close down (or unconditionally open) branches whose truthiness we can analyze in this way. This would simplify the implementation, but is much less general than the approach taken here.

Instead, we collect all necessary information during semantic index building, and then re-analyze the control flow during type-checking. The way this works is by recording so-called visibility constraints for each binding and declaration. Note that these constraints are in some sense similar to narrowing constraints, but also work quite differently and are applied at different points in control flow. Consider the following example first. Note how visibility constraints can apply to bindings outside of the if-statement:

x = 1  # visibility constraint: ~test
if test:
    x = 2  # visibility constraint: test

    y = 2  # visibility constraint: test

use(x)
use(y)

The static truthiness of the test condition can either be always-false, ambiguous, or always-true. Similarly, if the visibility constraint of a binding evaluates to always-true/always-false, it will be either always visible or never visible. If the truthiness of the constraint is ambiguous, we need to consider both options (the binding could be visible or not). For the example above, this would result in the following type inference / boundness results for the uses of x and y:

`test` truthiness	`~test` truthiness	type of `x`	boundness of `y`
always false	always true	`Literal[1]`	unbound
ambiguous	ambiguous	`Literal[1, 2]`	possibly unbound
always true	always false	`Literal[2]`	bound
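The table can be reproduced with a small sketch. This is only an illustration of the idea: `Truthiness`, `negate` and `analyze` are hypothetical names, not red-knot's actual API.

```python
from enum import Enum


class Truthiness(Enum):
    ALWAYS_FALSE = "always false"
    AMBIGUOUS = "ambiguous"
    ALWAYS_TRUE = "always true"


def negate(t: Truthiness) -> Truthiness:
    """`~test`: swap always-true and always-false; ambiguous stays ambiguous."""
    if t is Truthiness.ALWAYS_TRUE:
        return Truthiness.ALWAYS_FALSE
    if t is Truthiness.ALWAYS_FALSE:
        return Truthiness.ALWAYS_TRUE
    return Truthiness.AMBIGUOUS


def analyze(test: Truthiness) -> tuple[str, str]:
    """Return (type of x, boundness of y) for the example above."""
    # Bindings of x that are live at `use(x)`, with their visibility constraints.
    x_bindings = [(1, negate(test)), (2, test)]
    # A binding contributes to the type unless its constraint is always-false.
    values = [value for value, vis in x_bindings if vis is not Truthiness.ALWAYS_FALSE]
    x_type = "Literal[" + ", ".join(str(v) for v in values) + "]"

    # The only binding of y (`y = 2`) carries the constraint `test`.
    y_visibility = test
    if y_visibility is Truthiness.ALWAYS_TRUE:
        y_boundness = "bound"
    elif y_visibility is Truthiness.ALWAYS_FALSE:
        y_boundness = "unbound"
    else:
        y_boundness = "possibly unbound"
    return x_type, y_boundness


for t in Truthiness:
    print(t.value, analyze(t))
```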

Sequential constraints

Next, let's consider a sequence of multiple control flow elements:

x = 0

if test1:  
    x = 1

if test2:
    x = 2

The binding x = 2 is easy to analyze. Its visibility corresponds to the truthiness of test2. For the x = 1 binding, things are a bit more interesting. It is always visible if test1 is always-true AND test2 is always-false. It is never visible if test1 is always-false OR test2 is always-true. Note the asymmetry in the logical operators here. We introduce a new constraint KleeneAnd(a, b), which is always-true if both a and b are always-true, always-false if either a or b is always-false, and ambiguous otherwise. Then we can formulate the constraint for the x = 1 binding as KleeneAnd(test1, ~test2).

The x = 0 binding can be handled similarly, with the difference that both test1 and test2 are negated:

x = 0  # KleeneAnd(~test1, ~test2)

if test1:  
    x = 1  # KleeneAnd(test1, ~test2)

if test2:
    x = 2  # test2
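As a minimal sketch of these annotations, three-valued truthiness can be encoded as ordered integers so that KleeneAnd becomes `min`. The encoding and the names `Truthiness`, `negate`, `kleene_and` and `visibilities` are assumptions made for illustration, not the representation used in this PR.

```python
from enum import IntEnum


class Truthiness(IntEnum):
    # Hypothetical encoding: ordered so that Kleene AND is `min`.
    ALWAYS_FALSE = -1
    AMBIGUOUS = 0
    ALWAYS_TRUE = 1


def negate(a: Truthiness) -> Truthiness:
    """`~a`: flips always-true/always-false, leaves ambiguous unchanged."""
    return Truthiness(-a)


def kleene_and(a: Truthiness, b: Truthiness) -> Truthiness:
    """Always-true if both are; always-false if either is; else ambiguous."""
    return min(a, b)


def visibilities(test1: Truthiness, test2: Truthiness) -> dict[str, Truthiness]:
    """Visibility of each binding of x after both `if` statements."""
    return {
        "x = 0": kleene_and(negate(test1), negate(test2)),
        "x = 1": kleene_and(test1, negate(test2)),
        "x = 2": test2,
    }


for name, vis in visibilities(Truthiness.ALWAYS_TRUE, Truthiness.ALWAYS_FALSE).items():
    print(name, vis.name)
```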

Merged (or parallel) constraints

Finally, let's consider this example of "parallel" control flow (where we have omitted the test condition for the outer control flow element, as it would only complicate the discussion; instead of if <…> you can also consider any other control-flow splitting element where we have no static analysis of which branch is taken):

x = 0

if <…>:
    if test1:
        x = 1
else:
    if test2:
        x = 2

use(x)

At the usage of x, i.e. after the control flow has been merged again, the visibility of the x = 0 binding behaves as follows: the binding is always visible if test1 is always-false OR test2 is always-false; and it is never visible if test1 is always-true AND test2 is always-true. Again, note the asymmetry. We introduce a new constraint KleeneOr(a, b), which is always-true if a is always-true OR b is always-true, always-false if a is always-false AND b is always-false, and ambiguous otherwise. This allows us to annotate the bindings with the following constraints:

x = 0  # KleeneOr(~test1, ~test2)

if <…>:
    if test1:
        x = 1  # test1
else:
    if test2:
        x = 2  # test2

use(x)
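A minimal sketch of the merged case, using the same hypothetical integer encoding of truthiness (an assumption for illustration only), where KleeneOr becomes `max`:

```python
from enum import IntEnum


class Truthiness(IntEnum):
    # Hypothetical encoding: ordered so that Kleene OR is `max`.
    ALWAYS_FALSE = -1
    AMBIGUOUS = 0
    ALWAYS_TRUE = 1


def negate(a: Truthiness) -> Truthiness:
    return Truthiness(-a)


def kleene_or(a: Truthiness, b: Truthiness) -> Truthiness:
    """Always-true if either is; always-false if both are; else ambiguous."""
    return max(a, b)


def visibility_of_x0(test1: Truthiness, test2: Truthiness) -> Truthiness:
    """Visibility of `x = 0` at `use(x)`, after the two paths have merged."""
    return kleene_or(negate(test1), negate(test2))


# `x = 0` survives if either inner branch is statically never taken.
print(visibility_of_x0(Truthiness.ALWAYS_FALSE, Truthiness.ALWAYS_TRUE).name)
# `x = 0` is shadowed on both paths only if both tests are always true.
print(visibility_of_x0(Truthiness.ALWAYS_TRUE, Truthiness.ALWAYS_TRUE).name)
```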

Properties

We note that KleeneAnd and KleeneOr have the property that KleeneOr(~a, ~b) = ~KleeneAnd(a, b). This means we can, in principle, get rid of either of these two to simplify the representation.

However, we already apply negative constraints ~test1 and ~test2 to the "branches not taken" in the example above. This means that the tree-representation KleeneOr(~test1, ~test2) is much cheaper/shallower than basically creating ~KleeneAnd(~(~test1), ~(~test2)). Similarly, if we wanted to get rid of KleeneAnd, we would also have to create additional nodes. So for performance reasons, there is a certain "duplication" in the code between those two constraint types.
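Both claims can be checked with a short sketch: the De Morgan-style identity holds for all nine input combinations, and the rewritten tree has more nodes. The integer encoding and the tuple-based tree format are assumptions made for this illustration; they are not the data structures used in the PR.

```python
from itertools import product

# Hypothetical encoding of three-valued truthiness.
ALWAYS_FALSE, AMBIGUOUS, ALWAYS_TRUE = -1, 0, 1
VALUES = (ALWAYS_FALSE, AMBIGUOUS, ALWAYS_TRUE)


def negate(a: int) -> int:
    return -a


def kleene_and(a: int, b: int) -> int:
    return min(a, b)


def kleene_or(a: int, b: int) -> int:
    return max(a, b)


def de_morgan_holds() -> bool:
    """Check KleeneOr(~a, ~b) == ~KleeneAnd(a, b) exhaustively."""
    return all(
        kleene_or(negate(a), negate(b)) == negate(kleene_and(a, b))
        for a, b in product(VALUES, repeat=2)
    )


def size(tree) -> int:
    """Number of nodes in a constraint tree; leaves are plain strings."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


# The two representations discussed above, as nested tuples.
with_or = ("or", ("not", "test1"), ("not", "test2"))
without_or = ("not", ("and", ("not", ("not", "test1")), ("not", ("not", "test2"))))

print(de_morgan_holds(), size(with_or), size(without_or))
```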

Other

This branch also includes:

sys.platform support
statically-known branches handling for Boolean expressions and while loops
new target-version requirements in some Markdown tests which were now required due to the understanding of sys.version_info branches.
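For context, branches of this kind commonly look like the following in user code. This is an illustrative sketch only; the variable names are made up, and `tomli` is named merely as an example of a backport.

```python
import sys
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Always true for a type checker, always false at runtime.
    from collections.abc import Sequence

if sys.version_info >= (3, 11):
    TOML_MODULE = "tomllib"  # part of the standard library since Python 3.11
else:
    TOML_MODULE = "tomli"  # third-party backport, used here only as an example

if sys.platform == "win32":
    PATH_SEPARATOR = ";"
else:
    PATH_SEPARATOR = ":"

print(TOML_MODULE, repr(PATH_SEPARATOR))
```

For a fixed target version and platform, each of these conditions is statically known, so only one of the two bindings of `TOML_MODULE` and `PATH_SEPARATOR` needs to be considered.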

closes #12700
closes #15034

Performance

`tomllib`, -7%, needs to resolve one additional module (sys)

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./red_knot_main --project /home/shark/tomllib`	22.2 ± 1.3	19.1	25.6	1.00
`./red_knot_feature --project /home/shark/tomllib`	23.8 ± 1.6	20.8	28.6	1.07 ± 0.09

`black`, -6%

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./red_knot_main --project /home/shark/black`	129.3 ± 5.1	119.0	137.8	1.00
`./red_knot_feature --project /home/shark/black`	136.5 ± 6.8	123.8	147.5	1.06 ± 0.07

Test Plan

New Markdown tests for the main feature in statically-known-branches.md
New Markdown tests for sys.platform
Adapted tests for EllipsisType, Never, etc

codspeed-hq · 2024-12-17T09:39:17Z

CodSpeed Performance Report

Merging #15019 will degrade performances by 17.15%

Comparing david/statically-known-branches-2 (95d079c) with main (d3f51cf)

Summary

❌ 1 (👁 1) regressions
✅ 31 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`david/statically-known-branches-2`	Change
👁	`red_knot_check_file[cold]`	69.5 ms	83.9 ms	-17.15%

carljm

This is really superb!! Very happy with how it turned out. (Modulo the arbitrary recursion limit, but I'm hopeful we'll be able to bump that up.)

Probably more review than you were looking for at this stage, sorry. Just not sure where to draw the line, once I got started :)

crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md

crates/red_knot_python_semantic/src/semantic_index/builder.rs

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/src/semantic_index/builder.rs

sharkdp · 2024-12-19T16:27:50Z

crates/red_knot_python_semantic/src/visibility_constraints.rs

+                let inference = infer_expression_types(db, test_expr);
+                let scope = test_expr.scope(db);
+                let ty =
+                    inference.expression_ty(test_expr.node_ref(db).scoped_expression_id(db, scope));


Referencing a previous review comment by @MichaReiser: #14759 (comment)

This is probably still relevant here.

@MichaReiser I discussed this with @carljm: Using node_ref here should be fine, since the Expressions always come from the same file from which we call VisibilityConstraints::evaluate.

That might be, but the problem is that evaluate isn't a salsa query, nor is the code calling into evaluate. The problem is, to some extent, pre-existing because symbol_id already depends on the UseDefMap, that's why I think we can tackle it as a follow-up (but we definitely should):

The flow I'm concerned about is:

Type::member calls types::symbol which is not a salsa query (nor is Type::member)

symbol calls types::symbol_by_id which resolves the use def map (so it's a pre-existing issue!)

symbol_by_id calls into declarations_ty which calls evaluate and depends on the AST

There are other examples for the same flow which, I think are all cross-module:

global_symbol

Class:own_member

We should add a comment if, for some reason, the constraint holds that symbol_by_id is only called for the same file and never cross files.

#15080 could address this and may also help with performance if it reduces (caches) the constraints that need solving?

Looks like #15080 has been rolled directly into this PR. That seems like a robust and general fix. I guess when we have a tracked query that takes >1 argument (and one of them is not a Salsa ID), then effectively under the hood Salsa will end up creating an ingredient key for this (ScopeId, ScopedSymbolId) pair?

crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md

AlexWaygood

Fantastic work. I think @carljm will be by far the better reviewer on this overall, but I noticed a few things

crates/red_knot_python_semantic/src/python_platform.rs

crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md

crates/red_knot_python_semantic/src/semantic_index/use_def.rs

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/src/types/infer.rs

crates/red_knot_python_semantic/src/visibility_constraints.rs

TomerBin · 2024-12-19T18:23:47Z

Not a reviewer, just wanted to say that these are the most beautiful tests I've ever seen 😍

crates/red_knot_python_semantic/src/semantic_index.rs

crates/red_knot_python_semantic/src/semantic_index/builder.rs

crates/red_knot_python_semantic/src/semantic_index/use_def.rs

crates/red_knot_python_semantic/src/semantic_index/use_def/symbol_state.rs

MichaReiser · 2024-12-20T08:42:18Z

crates/red_knot_python_semantic/src/visibility_constraints.rs

+                let inference = infer_expression_types(db, test_expr);
+                let scope = test_expr.scope(db);
+                let ty =
+                    inference.expression_ty(test_expr.node_ref(db).scoped_expression_id(db, scope));


That might be, but the problem is that evaluate isn't a salsa query, nor is the code calling into evaluate. The problem is, to some extent, pre-existing because symbol_id already depends on the UseDefMap, that's why I think we can tackle it as a follow-up (but we definitely should):

The flow I'm concerned about is:

Type::member calls types::symbol which is not a salsa query (nor is Type::member)

symbol calls types::symbol_by_id which resolves the use def map (so it's a pre-existing issue!)

symbol_by_id calls into declarations_ty which calls evaluate and depends on the AST

There are other examples for the same flow which, I think are all cross-module:

global_symbol

Class:own_member

We should add a comment if, for some reason, the constraint holds that symbol_by_id is only called for the same file and never cross files.

sharkdp · 2024-12-20T15:24:35Z

I did the following benchmark to see how to set the recursion limit (where red_knot_xy has a MAX_RECURSION_DEPTH of xy):

hyperfine \
  --warmup=20 \
  --shell=none \
  --ignore-failure \
  --export-json results.json \
  -L limit 0,8,16,24,32,48,64,no_limit \
  "./red_knot_{limit} --project path/to/black"

See the results below. There are two things to notice: there is no significant change in performance when going from zero (i.e. no static visibility analysis at all!) all the way up to a limit of ~32. After that, there is a jump of the runtime by a factor of five. I'll set the limit to 24 for now.

Zoomed in on the runtimes up to a limit of 32:
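Independently of the plots, the effect of a depth limit on constraint evaluation can be sketched as follows. Falling back to "ambiguous" once the limit is reached is an assumption of this sketch; the comment above only states that a limit exists and is set to 24. The tuple-based constraint format and the names `evaluate` and `chain` are hypothetical as well.

```python
# Hypothetical encoding of three-valued truthiness.
ALWAYS_FALSE, AMBIGUOUS, ALWAYS_TRUE = -1, 0, 1

MAX_RECURSION_DEPTH = 24  # the value chosen in the comment above


def evaluate(constraint, env, max_depth=MAX_RECURSION_DEPTH):
    """Evaluate ('atom', name) | ('not', c) | ('and', a, b) | ('or', a, b)."""
    if max_depth == 0:
        # Assumption: give up conservatively instead of recursing further.
        return AMBIGUOUS
    kind = constraint[0]
    if kind == "atom":
        return env[constraint[1]]
    if kind == "not":
        return -evaluate(constraint[1], env, max_depth - 1)
    left = evaluate(constraint[1], env, max_depth - 1)
    right = evaluate(constraint[2], env, max_depth - 1)
    return min(left, right) if kind == "and" else max(left, right)


def chain(n):
    """Build a left-nested chain of `n` 'and' nodes over the atom `t`."""
    constraint = ("atom", "t")
    for _ in range(n):
        constraint = ("and", constraint, ("atom", "t"))
    return constraint


env = {"t": ALWAYS_TRUE}
print(evaluate(chain(10), env))  # shallow enough to be fully analyzed
print(evaluate(chain(30), env))  # deeper than the limit
```

In this sketch, a constraint tree that is nested more deeply than the limit loses precision (the result becomes ambiguous) rather than producing a wrong answer.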

crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md

sharkdp · 2024-12-20T16:05:04Z

crates/red_knot_python_semantic/src/visibility_constraints.rs

+//! use(x)
+//! ```
+//!
+//! ### Explicit ambiguity


I need to look at this section again. I'm not sure if that makes sense. Or if there is another way to solve this without explicit ambiguity constraints.

I think it makes sense? On a statically-analyzable branch, we have to add test visibility to the branch path and ~test visibility to the not-branch path. VisibilityConstraint::Ambiguous is just a shortcut to do the same thing in the case where there is no test expression we plan to analyze for truthiness later. It's "the same thing" because ambiguous and ~ambiguous are indistinguishable; we can just add VisibilityConstraint::Ambiguous to both paths.

carljm

This is looking good to me! Just a few nits, feel free to land after addressing.

carljm · 2024-12-20T17:57:59Z

crates/red_knot_python_semantic/src/semantic_index/use_def.rs

+//! infer `test` to be of type `Literal[True]` or `Literal[False]`, we can rule out one of the
+//! possible paths. To support this feature, we record a visibility constraint of `test` to all
+//! live bindings and declarations *after* visiting the body of the `if` statement. And we record
+//! a negative visibility constraint `~test` to all live bindings/declarations in the (implicit)
+//! `else` branch. For the example above, we would record the following visibility constraints
+//! (adding the implicit "unbound" definitions for clarity):
+//! ```py
+//! x = <unbound>  # not live, shadowed by `x = 1`
+//! y = <unbound>  # visibility constraint: ~test
+//!
+//! x = 1  # visibility constraint: ~test
+//! if test:
+//!     x = 2  # visibility constraint: test
+//!     y = "y"  # visibility constraint: test
+//! ```
+//! When we encounter a use of `x` after this `if` statement, we would record two live bindings: `x
+//! = 1` with a constraint of `~test`, and `x = 2` with a constraint of `test`. In type inference,
+//! when we iterate over all live bindings, we can evaluate these constraints to determine if a
+//! particular binding is actually visible. For example, if `test` is `Literal[True]`, we only see
+//! the `x = 2` binding. If `test` is `Literal[False]`, we only see the `x = 1` binding. And if the
+//! type of `test` is `bool`, we can see both bindings.


This is really good! One nit on the wording: unlike Rust, Python automatically converts an expression of any type to bool in a boolean context. So test itself need not be of type Literal[False] or Literal[True] or bool; it could be of any type. What we actually look at is the return type of the __bool__ method of test: is it of type Literal[False] or Literal[True] or bool?

Python automatically converts an expression of any type to bool in a boolean context

Right. I was wondering whether or not I should bring it up and decided to keep it simple, because I didn't want to talk too much about types in the use-def-map module. But it's better to be precise, rather than concise. Changed now.
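The point about boolean contexts can be seen in plain Python. In the sketch below, the classes and the `branch_taken` helper are hypothetical names; the behavior itself (an `if` consults `__bool__`, or falls back to `__len__` when `__bool__` is not defined) is standard Python.

```python
from typing import Literal


class AlwaysTruthy:
    # Not a bool, but `__bool__` is annotated as returning Literal[True].
    def __bool__(self) -> Literal[True]:
        return True


class AlwaysFalsy:
    def __bool__(self) -> Literal[False]:
        return False


def branch_taken(test: object) -> bool:
    """Report whether `if test:` takes the branch."""
    if test:  # Python implicitly converts `test` to bool here
        return True
    return False


print(branch_taken(AlwaysTruthy()))
print(branch_taken(AlwaysFalsy()))
print(branch_taken(""))  # str defines no __bool__; __len__ is used instead
```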

crates/red_knot_python_semantic/src/semantic_index/use_def.rs

carljm · 2024-12-20T18:23:18Z

crates/red_knot_python_semantic/src/semantic_index/use_def.rs

+    pub(super) fn add_constraint(&mut self, constraint: Constraint<'db>) -> ScopedConstraintId {
+        self.all_constraints.push(constraint)
+    }
+
+    pub(super) fn record_constraint_id(&mut self, constraint: ScopedConstraintId) {
+        for state in &mut self.symbol_states {
+            state.record_constraint(constraint);
+        }
+    }
+
+    pub(super) fn record_constraint(&mut self, constraint: Constraint<'db>) -> ScopedConstraintId {
+        let new_constraint_id = self.add_constraint(constraint);
+        self.record_constraint_id(new_constraint_id);
+        new_constraint_id
+    }
+
+    pub(super) fn add_visibility_constraint(
+        &mut self,
+        constraint: VisibilityConstraint<'db>,
+    ) -> ScopedVisibilityConstraintId {
+        self.visibility_constraints.add(constraint)
+    }
+
+    pub(super) fn record_visibility_constraint_id(
+        &mut self,
+        constraint: ScopedVisibilityConstraintId,
+    ) {
        for state in &mut self.symbol_states {
-            state.record_constraint(constraint_id);
+            state.record_visibility_constraint(&mut self.visibility_constraints, constraint);
+        }
+
+        self.scope_start_visibility = self
+            .visibility_constraints
+            .add_and_constraint(self.scope_start_visibility, constraint);
+    }
+
+    pub(super) fn record_visibility_constraint(
+        &mut self,
+        constraint: VisibilityConstraint<'db>,
+    ) -> ScopedVisibilityConstraintId {
+        let new_constraint_id = self.add_visibility_constraint(constraint);
+        self.record_visibility_constraint_id(new_constraint_id);
+        new_constraint_id
+    }


At a high level, it would be nice if we could simplify these APIs and avoid exposing the use-def map's internal IndexVec IDs to the semantic index builder. But I realize that you didn't add these for no good reason; you needed them to get the semantics right (in particular for the BoolOp case, it looks like). I don't think we should spend more time right now trying to see if we can simplify this; it's just something to keep in mind if/when we revisit this code.

Fully agreed. I will note it down as a TODO item for myself.

crates/red_knot_python_semantic/src/semantic_index/use_def/symbol_state.rs

carljm · 2024-12-20T18:38:41Z

crates/red_knot_python_semantic/src/visibility_constraints.rs

+//! use(x)
+//! ```
+//!
+//! ### Explicit ambiguity


I think it makes sense? On a statically-analyzable branch, we have to add test visibility to the branch path and ~test visibility to the not-branch path. VisibilityConstraint::Ambiguous is just a shortcut to do the same thing in the case where there is no test expression we plan to analyze for truthiness later. It's "the same thing" because ambiguous and ~ambiguous are indistinguishable; we can just add VisibilityConstraint::Ambiguous to both paths.

crates/red_knot_python_semantic/src/visibility_constraints.rs

MichaReiser · 2024-12-21T13:36:37Z

Congrats on landing this massive improvement! Enjoy your time off

sharkdp added the red-knot Multi-file analysis & type inference label Dec 16, 2024

sharkdp force-pushed the david/statically-known-branches-2 branch from 44bbbb8 to 0e49dc7 Compare December 17, 2024 09:32

sharkdp force-pushed the david/statically-known-branches-2 branch 2 times, most recently from c0e8ef7 to b2a0fba Compare December 17, 2024 19:21

sharkdp force-pushed the david/statically-known-branches-2 branch from 5bad462 to 79a1b6b Compare December 18, 2024 11:46

AlexWaygood added the great writeup A wonderful example of a quality contribution label Dec 18, 2024

carljm reviewed Dec 19, 2024

sharkdp force-pushed the david/statically-known-branches-2 branch from aa60174 to 1f54069 Compare December 19, 2024 11:15

sharkdp commented Dec 19, 2024

crates/red_knot_python_semantic/src/semantic_index/builder.rs

sharkdp force-pushed the david/statically-known-branches-2 branch from dc3330b to 35c5e10 Compare December 19, 2024 14:55

sharkdp marked this pull request as ready for review December 19, 2024 16:00

sharkdp requested review from MichaReiser and AlexWaygood as code owners December 19, 2024 16:00

sharkdp commented Dec 19, 2024

crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md Outdated

AlexWaygood reviewed Dec 19, 2024

AlexWaygood mentioned this pull request Dec 19, 2024

[red-knot] Rename and rework the CoreStdlibModule enum #15071

Merged

MichaReiser reviewed Dec 20, 2024

crates/red_knot_python_semantic/src/semantic_index.rs Outdated

MichaReiser reviewed Dec 20, 2024

sharkdp force-pushed the david/statically-known-branches-2 branch from 55658f2 to f0e9cce Compare December 20, 2024 08:45

sharkdp commented Dec 20, 2024

crates/red_knot_python_semantic/resources/mdtest/statically_known_branches.md Outdated

sharkdp commented Dec 20, 2024

carljm approved these changes Dec 20, 2024

carljm reviewed Dec 20, 2024

crates/red_knot_python_semantic/src/visibility_constraints.rs

sharkdp added 2 commits December 21, 2024 11:23

[red-knot] Statically known branches

7daaba8

Temporarily patch typeshed to avoid cycles

2c38c42

sharkdp and others added 25 commits December 21, 2024 11:23

Simplify code in simplify_visibility_constraints

564ea98

Use actual return values in dummy functions

ad3b81e

Use struct instead of three-tuple

52129a6

::default() instead of ::new()

7c55b67

Minor review comments

2a0672a

Use KnownModule

4ce2d2a

Minor iterator changes

d3216fe

More minor changes

f0b7f47

Revert possibly-unbound changes

db0e190

Documentation

a84614c

Separate struct for declaration ids with constraint

40525db

More documentation

a448b3a

Adapt use-def documentation

96d8faa

Add section on static visibility

2dc3276

Add module-level documentation

87b5da8

Set recursion limit to 24, document the limitation

bf0002e

Make symbol_by_id a salsa query

b133549

Add section regarding ambiguity

4f4c142

fmt skip

c158577

Truthy/Falsy instead of Literal[True/False]

4ee8fe4

Only first entry should be None

feb9f1b

Fix visibility

dcd00c5

Remove visibility_constraints reference from iterator

9d57adf

Add comment regarding performance

b455093

txt-sections to prevent doctest failures

95d079c

sharkdp force-pushed the david/statically-known-branches-2 branch from b3434b2 to 95d079c Compare December 21, 2024 10:26

sharkdp merged commit 000948a into main Dec 21, 2024
21 checks passed

sharkdp deleted the david/statically-known-branches-2 branch December 21, 2024 10:33

sharkdp mentioned this pull request Dec 21, 2024

Make symbol_by_id a salsa query #15080

Closed
