Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Support assert_type #15194

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

InSyncWithFoo
Copy link
Contributor

Summary

See #15103.

Test Plan

Markdown tests and unit tests.

Copy link
Contributor

github-actions bot commented Dec 30, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@InSyncWithFoo
Copy link
Contributor Author

This is largely a proof-of-concept. is_equals_to() is not a thing in the specification, and I'm quite sure its name is not the best.

@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Dec 30, 2024
Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this. I added a few initial review comments.

Comment on lines 1415 to 1435
/// Returns true if this type and `other` are "exactly the same".
///
/// This powers the `assert_type()` directive.
pub(crate) fn is_equals_to(self, db: &'db dyn Db, other: Type<'db>) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you go into a bit more detail in how this is different from is_equivalent_to? The main difference seems to be that this can also handle non-fully-static types and return true for something like Any ~ Any? If so, could it handle those gradual types (+ intersections/unions/tuples) and then fall back to is_equivalent_to for fully static types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's quite possible that a function like this could help fix #14899, FWIW, so it doesn't seem like an unreasonable addition to me. I'm also not sure what name would clearly distinguish it from Type::is_equivalent_to(), though 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I was not suggesting that this function is not useful. I just think there's some overlap with is_equivalent_to, and I'd like to understand it and possibly make use of it.

Copy link
Contributor Author

@InSyncWithFoo InSyncWithFoo Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another notable exception is type[Any] != type != type[object]. I'm not too sure about the former, but I think it is better to be strict.

Also, as it currently is, int | Unknown | Any != int | Any. This is supposedly due to Unknown not being foldable into Any, but int | Any is just Any; could such union types even be emitted in the first place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int | Any is not just Any; it is a gradual type with a known lower bound. It is an unknown type that must be at least as large as int, whereas plain Any is an unknown type that could be a smaller type than int. It is thus intentional that we don't simplify int | Any, and int | Any can be treated differently in the type system than plain Any. For instance, given an object of type int | Any, it is an error to assign that object to a destination expecting e.g. a str, whereas Any is assignable to str. Using terms from the typing spec, Any can materialize to str, but int | Any cannot; at best it can materialize to int | str.

The thing that causes more difficulty here is our choice to not simplify Any | Unknown. Because both of these simply represent the dynamic/unknown type, just with different origin. From a type system perspective, they should simplify, but we choose to preserve the information that the type may have come from an explicit Any annotation or may have come from a lack of annotation. But we should still ideally treat Any | Unknown as equivalent to Any and equivalent to Unknown.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type[Any] != type != type[object]

I'm not sure if this is describing current behavior or correct behavior. Correct behavior would be that type == type[object], but both are not equivalent to type[Any]. (In our internal representation, type would be Type::Instance(<builtins.type>) and type[object] would be Type::SubclassOf(<builtins.object>), but these are equivalent types and we should treat them as such.)

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thank you for working on it! I think if we rebase on top of #15103 we can address some of the limitations and get it ready to land.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
let equal = |(first, second): (&Type<'db>, &Type<'db>)| first.is_equals_to(db, *second);

match (self, other) {
(Type::Todo(_), Type::Todo(_)) => false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo is supposed to behave like Any or Unknown in the typing system, so I think our default approach should be that all three are gradually-equivalent to each other. Is there a concrete reason you felt it necessary to make Todo != Todo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just thought that it would be rather nonsensical to do something like this:

def _(a: tuple[int, ...]):
	assert_type(a, tuple[str, ...])  # Pass because both are Todos

No strong feelings though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasoning, but I think it is OK for that assert_type to pass, until we actually are able to properly differentiate those types. It is more important that we consistently treat Todo in the type system as the dynamic type, just with more provenance information.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
Comment on lines 1486 to 1515
(Type::Union(first), Type::Union(second)) => {
let first_elements = first.elements(db);
let second_elements = second.elements(db);

if first_elements.len() != second_elements.len() {
return false;
}

let first_elements = first_elements.iter().collect::<OrderedTypeSet>();
let second_elements = second_elements.iter().collect::<OrderedTypeSet>();

iter::zip(first_elements, second_elements).all(equal)
}

(Type::Intersection(first), Type::Intersection(second)) => {
let first_positive = first.positive(db).iter().collect::<OrderedTypeSet>();
let first_negative = first.negative(db).iter().collect::<OrderedTypeSet>();

let second_positive = second.positive(db).iter().collect::<OrderedTypeSet>();
let second_negative = second.negative(db).iter().collect::<OrderedTypeSet>();

iter::zip(first_positive, second_positive).all(equal)
&& iter::zip(first_negative, second_negative).all(equal)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; this is a fix we need in is_equivalent_to as well. It would be nice if there were a way to do it without allocating so many hashsets...

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
@@ -553,6 +556,8 @@ pub enum Type<'db> {
// TODO protocols, callable types, overloads, generics, type vars
}

type OrderedTypeSet<'a, 'db> = HashSet<&'a Type<'db>, BuildHasherDefault<FxHasher>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes this set ordered? This is just a regular hash set, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I just thought its name should reflect the fact that the order of iteration is depended upon.

Copy link
Member

@MichaReiser MichaReiser Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the order isn't guaranteed. The element ordering may depend on insertion order. The IntersectionType positive and negative sets are ordered (although I don't remember the details of how they're ordered)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntersectionType uses OrderedSet, whose iteration order is the same as insertion order. The comparison in question needs a self-ordered set type.

I have so far been unsuccessful in finding such a type. Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could collect the types into a vec and then order them. I'm just not sure why the order is important. I suspect that we instead have to implement an O(n^2) algorithm that, for every type in a, searches a type in b to which it is equivalent, disregarding ordering entirely. But I'm not sure if doing so is too naive.

Copy link
Contributor Author

@InSyncWithFoo InSyncWithFoo Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this was done to avoid $$O(n^2)$$ time at the cost of a few $$O(n)$$ iterations and allocations. However, even for a $$O(n^2)$$ algorithm, I suspect $$O(n)$$ space is still needed to store paired indices.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking this over more, I agree with Micha, in that I don't think naive Type ordering (even if we were to find/create a self-ordering set implementation) will be able to get us the right results here. The problem is that we can have equivalent "atomic" types that are not represented as the same Type enum bit pattern. (Examples already discussed include Any and Unknown, or Type::Instance(<builtins.type>) and Type::SubclassOf(<builtins.object>). So our definition of equivalence for unions and intersections needs to recurse into a full understanding of atomic type equivalence, not be built solely on Type enum identity.

Given that we already have this as an unsolved problem with the existing is_equivalent_to implementation, I'm OK with punting on this for this PR. (In other words, I think we can, and probably should, separate the problems of "implement assert_type" and "fully correct gradual type equivalence" into separate PRs.) In which case, I would suggest just removing the set stuff from this PR, and implementing is_gradual_equivalent_to for now without any handling for differently-ordered unions and intersections (just like is_equivalent_to), and a TODO comment for improving that in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually briefly tried looking at the problem of equivalence for differently ordered unions over the Christmas break, and, yeah, it is Not Easy

}

(Type::Intersection(first), Type::Intersection(second)) => {
let first_positive = first.positive(db).iter().collect::<OrderedTypeSet>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to collect the positive elements into another set when they are already set in IntersectionType? Could we instead compute the difference between a and b and then compare if those elements are equivalent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int & Unknown & ~Any is supposed to be gradual equivalent to int & Any & ~Unknown; I did mention this in a previous comment. Since I'm not sure whether such types can exist, I took the safe route.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! A few nits here and there, but basically this looks just right.

from typing_extensions import assert_type

def _(a: str | int):
# TODO: Infer the second argument as a type expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO: Infer the second argument as a type expression
# TODO: order-independent union handling in type equivalence

if isinstance(a, B) and not isinstance(a, C) and not isinstance(a, D):
reveal_type(a) # revealed: A & B & ~C & ~D

# TODO: Infer the second argument as a type expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO: Infer the second argument as a type expression
# TODO: order-independent intersection handling in type equivalence


## Unions

Unions with the same elements are the same, regardless of order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test showing union equivalence when the elements are actually in the same order? (This one should actually pass now.)

## Intersections

Intersections are the same when their positive and negative parts are respectively the same,
regardless of order.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, let's also add a test for intersection equivalence when ordering is the same? It doesn't have to highlight ordering, just show two identical unions and that assert_type considers them equivalent.

match (self, other) {
(_, _) if self == other => true,

(Type::Any | Type::Unknown, Type::Any | Type::Unknown) => true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Todo?

(Type::SubclassOf(first), Type::SubclassOf(second)) => {
match (first.subclass_of(), second.subclass_of()) {
(first, second) if first == second => true,
(ClassBase::Any | ClassBase::Unknown, ClassBase::Any | ClassBase::Unknown) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Todo?

iter::zip(first_elements, second_elements).all(equivalent)
}

(Type::Intersection(first), Type::Intersection(second)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Type::Intersection(first), Type::Intersection(second)) => {
// TODO handle equivalent intersections with items in different order
(Type::Intersection(first), Type::Intersection(second)) => {

&& iter::zip(first.elements(db), second.elements(db)).all(equivalent)
}

(Type::Union(first), Type::Union(second)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(Type::Union(first), Type::Union(second)) => {
// TODO handle equivalent unions with items in different order
(Type::Union(first), Type::Union(second)) => {

#[test_case(Ty::Todo, Ty::Todo)]
#[test_case(Ty::Any, Ty::Any)]
#[test_case(Ty::Unknown, Ty::Unknown)]
#[test_case(Ty::Any, Ty::Unknown)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also test Todo vs Any and Todo vs Unknown, also all equivalent

.map(|arg_or_keyword| {
.enumerate()
.map(|(index, arg_or_keyword)| {
// TODO: Remove this once we have proper overload matching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think overload matching is the feature that will allow removing this; TypeForm and contextual type inference might. But since the current approach is not buggy, just hardcoded, I don't think we need a TODO for that. We could make a note of it, but I'd probably put that in the doc comment for takes_type_expression_arguments, not here.

Suggested change
// TODO: Remove this once we have proper overload matching

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants