Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Check field types in database emulation #4862

Merged
merged 33 commits into from
Jun 12, 2024
Merged

[FEAT] Check field types in database emulation #4862

merged 33 commits into from
Jun 12, 2024

Conversation

rix0rrr
Copy link
Collaborator

@rix0rrr rix0rrr commented Dec 6, 2023

There is currently a problem with a table that has an index on a certain field, but is trying to insert values of a type not matching that index. It fails in Dynamo, which requires that all values used as index keys are of a single declared type (either string, number or binary).

However, the local emulation layer doesn't catch that, because it doesn't know anything about types that need to be checked.

Add a required types argument to Table, which must declare at least the key fields (of the table itself and of the indexes), and can optionally declare the types of other fields. It will be an error to insert values of the wrong type into the database.

This may look like an easy feature, but it has some edge cases you need to be aware of:

  • First of all, it relies on users declaring the indexes in code; they can choose NOT to declare the indexes in code, and there may still be an error after deploying.
  • Second of all: just because you declare a type on a fields saying what may be inserted into a table going forward, this can not make any guarantees on data already in the table. Don't use this feature to make any assumptions on data you read out of the table, only use it to sanity-check new data going into it!

We may decide this change is not worth it... (for example, it would not have caught the current problem, as the index was there in Dynamo but not declared in code, so this feature wouldn't have caught it).

There is currently a problem with a table that has an index on a
certain field, but is trying to insert values of a type not matching
that index. It fails in Dynamo, which doesn't allow that.

However, the local emulation layer doesn't catch that, because it
doesn't know anything about types that need to be checked.

Add a required `types` argument to `Table`, which must declare at least
the key fields (of the table itself and of the indexes), and can
optionally declare the types of other fields. It will be an error to
insert values of the wrong type into the database.

This may look like an easy feature, but it has some edge cases you
need to be aware of:

* First of all, it relies on users declaring the indexes in code; they
  can choose NOT to declare the indexes in code, and there may still
  be an error after deploying.
* Second of all: just because you declare a type on a fields saying what
  may be inserted into a table going forward, this can not make any
  guarantees on data already in the table. Don't use this feature
  to make any assumptions on data you read out of the table, only
  use it to sanity-check new data going into it!

We may decide this change is not worth it... (for example, it would
not have caught the current problem, as the index was there in Dynamo
but not declared in code, so this feature wouldn't have caught it).
@rix0rrr rix0rrr marked this pull request as draft December 6, 2023 19:32
@ghost
Copy link

ghost commented Dec 6, 2023

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Dec 6, 2023

Let's see how badly this breaks on the Cypress tests... :)

@rix0rrr rix0rrr changed the title [FEAT] Check field types in emulation layer [FEAT] Check field types in database emulation Dec 6, 2023
@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jan 11, 2024

Not sure anybody really wants or needs this, so I'll be closing it.

@rix0rrr rix0rrr marked this pull request as ready for review April 25, 2024 05:32
'created': int,
'classes': Optional(SetOf(str)),
'keyword_language': Optional(str),
'password': str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!!! I think this makes a lot of sense. More typing annotations on our DB would be a huge win. I think the work now would be to adapt the definitions of all the tables. I can handle that!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 that would be great, thanks! You probably also have a better view into what the currently expected data contract is

@jpelay jpelay self-assigned this Apr 29, 2024
@Felienne
Copy link
Member

I am picking up #4678 so it would be nice if we can merge this 🙏

@@ -88,7 +88,7 @@ def times():
return int(round(time.time()))


DEBUG_MODE = False
DEBUG_MODE = not os.getenv('NO_DEBUG_MODE')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it for now, but maybe this is not the cleanest way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this always set it to True ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, it's what we are already doing:

hedy/app.py

Lines 2908 to 2912 in 6e7573d

# Start the server on a developer machine. Flask is initialized in DEBUG mode, so it
# hot-reloads files. We also flip our own internal "debug mode" flag to True, so our
# own file loading routines also hot-reload.
no_debug_mode_requested = os.getenv('NO_DEBUG_MODE')
utils.set_debug_mode(not no_debug_mode_requested)
. Just that this get executed before

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better solution is probably to control the initialization order of all these things better, by making the tables class members of Database and initializing it when the app is created

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense too.

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jun 6, 2024

Ohh I think I get what's happening: only_in_dev is being called before DEBUG_MODE is set to True

Oh shit, yes that makes sense.

Comment on lines 1775 to 1782
def is_valid(self, value):
if not isinstance(value, dict):
return False
first_type = list(self.inner.keys())[0]
if isinstance(first_type, InstanceOf):
return all(first_type.is_valid(k) and self.inner.get(first_type).is_valid(v) for k, v in value.items())
else:
return all(validator.is_valid(value.get(key)) for key, validator in self.inner.items())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rico, what do you think about this? I changed that you now can specify str in the left-hand side of the validator dictionary. This way we can be a little more generic in our type checks, like this one:

'sorted_adventures': RecordOf({
    str: ListOf(RecordOf({
        'name': str,
        'from_teacher': bool
}))

It's important to note that you can only use str and no other type, since that would go against Dynamo naming rules either way. This is just a way to relax the type checking a little bit without needing to use dict.

Copy link
Collaborator Author

@rix0rrr rix0rrr Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "record" (a set of known fields, with known and potentially different types for every value) is a different kind of type than a "map" (can be any field, but the types are the same).

Examples:

person = { 
  "name": "Bertus",
  "age": "67"
}
  
favorite_restaurant_in_city = {
  "Oegstgeest": "Suzie's & Van Aken",
   "Amsterdam": "Ron's Gastropub",
}

In Python, you might use a dict to represent both of these, but the type-of-type is very different!

They are:

person :: { "name" -> str, "age" -> int }

# but
favorite_restaurant_in_city :: { str - > str }

So I wouldn't overload RecordOf in this way, because it looks like it will just get confusing. Also, it would allow you to construct this... now what does that mean?

RecordOf({
  str: str,
  "age": int
})

I would make a MapOf(str, str) or DictOf(str, str) or something.

But you have to explain the differences very clearly, because I can see if even you are confused about it, there is a good chance other people will also not realize the difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make a MapOf(str, str) or DictOf(str, str) or something.

Hmm I think this would make sense! In essence what I'm trying to do is relax the checking a bit, so another name for that type of validation makes a ton of sense.

So I wouldn't overload RecordOf in this way, because it looks like it will just get confusing. Also, it would allow you to construct this... now what does that mean?

RecordOf({
 str: str,
 "age": int
})

I thought of that! If you use str in the left hand side, then you can't use a named field (and all other classes aren't allowed)

hedy/website/dynamo.py

Lines 1677 to 1680 in dd20d55

if str in validator_dict.keys() and len(validator_dict.keys()) > 1:
raise ValueError(f'If you specify str as part of the validation, you cant define more type checks.\
You dindt do that for {validator_dict}')
type_dict = {}

But you have to explain the differences very clearly, because I can see if even you are confused about it, there is a good chance other people will also not realize the difference

Yes, I need to add something to the wiki and to the comments.

@jpelay
Copy link
Member

jpelay commented Jun 6, 2024

Seems like adding an element to a set is giving errors, for example adding a student to a class gives this:

    CLASSES.update({"id": class_id}, {"students": dynamo.DynamoAddToStringSet(student_id)})
  File "/home/capybara/hedy/website/querylog.py", line 211, in wrapped
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/capybara/hedy/website/dynamo.py", line 547, in update
    self._validate_types(updates, full=False)
  File "/home/capybara/hedy/website/dynamo.py", line 695, in _validate_types
    raise ValueError(f'In {data}, value of {field} should be {validator} (got {value})')
ValueError: In {'students': Add(('user1',))}, value of students should be optional set of instance of <class 'str'> (got Add(('user1',)))

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jun 7, 2024

Seems like adding an element to a set is giving errors, for example adding a student to a class gives this:

You're right, I made a mistake. Fixed it and added a test.

@jpelay
Copy link
Member

jpelay commented Jun 11, 2024

@rix0rrr I think I'm finished tweaking the PR! Added types for all of the tables and also added two new type validators: Or and DictOf as you suggested. What do you think?

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jun 11, 2024

❤️ Love it! Instead of Or(), consider perhaps naming it Either() or OneOf(), but this is great stuff!

Still not sure about the initialization of only_in_dev()... but if you feel comfortable with it, I feel good about it too!

Feel free to ship this at any point (since I'm the author, you need to be the shipper 😉 )

@jpelay
Copy link
Member

jpelay commented Jun 11, 2024

❤️ Love it! Instead of Or(), consider perhaps naming it Either() or OneOf(), but this is great stuff!

Damn, you're good at coming out with names!

Still not sure about the initialization of only_in_dev()... but if you feel comfortable with it, I feel good about it too!

I think it's best to tackle that in a new PR, so ship this one and then take your suggestion of explicitly initializing the tables in a new PR. @Felienne wants this PR ready because it'll help her with the live documentation of the tables.

Feel free to ship this at any point (since I'm the author, you need to be the shipper 😉 )

Understood 🫡

@rix0rrr
Copy link
Collaborator Author

rix0rrr commented Jun 11, 2024

Thanks for taking this across the finish line, and fixing my gigantic fuckup of the code not actually doing anything 🫣🫠😆

@jpelay
Copy link
Member

jpelay commented Jun 12, 2024

Thanks for taking this across the finish line, and fixing my gigantic fuckup of the code not actually doing anything 🫣🫠😆

Hahaha the core idea was so cool already! Thanks a lot for that!!

Copy link
Member

@jpelay jpelay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go! And hope nothing breaks !

Copy link
Contributor

mergify bot commented Jun 12, 2024

Thank you for contributing! Your pull request is now going on the merge train (choo choo! Do not click update from main anymore, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 16f1246 into main Jun 12, 2024
12 checks passed
@mergify mergify bot deleted the dynamodb-types branch June 12, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants