Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: initial relay integration support #65

Merged
merged 10 commits into from
Dec 6, 2023

Conversation

bellini666
Copy link
Member

@bellini666 bellini666 commented Oct 12, 2023

TODO:

  • Support for connections
  • Support for node()

@bellini666 bellini666 changed the title feat: initial relay integration support WIP: feat: initial relay integration support Oct 12, 2023
@botberry
Copy link
Member

botberry commented Oct 12, 2023

Thanks for adding the RELEASE.md file!

Here's a preview of the changelog:


Initial relay connection/node support using Strawberry's relay integration.

@codecov-commenter
Copy link

codecov-commenter commented Oct 12, 2023

Codecov Report

Merging #65 (d86b98f) into main (bf75e07) will increase coverage by 6.26%.
Report is 1 commits behind head on main.
The diff coverage is 89.69%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   78.45%   84.71%   +6.26%     
==========================================
  Files          10       14       +4     
  Lines         789     1590     +801     
  Branches      118      260     +142     
==========================================
+ Hits          619     1347     +728     
- Misses        141      191      +50     
- Partials       29       52      +23     

@codspeed-hq
Copy link

codspeed-hq bot commented Oct 12, 2023

CodSpeed Performance Report

Merging #65 will not alter performance

Comparing bellini666:relay_integration (9ef9785) with main (af6a48c)

Summary

✅ 1 untouched benchmarks

@bellini666 bellini666 changed the title WIP: feat: initial relay integration support feat: initial relay integration support Oct 14, 2023
Copy link
Contributor

@mattalbr mattalbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reviewing, but this is definitely over my head so far. Trying to focus on the API since I'm still wrapping my head around the implementation. Going to relay on @erikwrede and @patrick91 heavily :)

src/strawberry_sqlalchemy_mapper/field.py Show resolved Hide resolved
src/strawberry_sqlalchemy_mapper/field.py Outdated Show resolved Hide resolved
mapper = StrawberrySQLAlchemyMapper()

@mapper.type(fruit_table)
class Fruit(relay.Node):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's any way to create this type by default in the library through primary key abstraction? Doesn't have to be in this PR, but I really would love us to get to zero boilerplate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably there is a way, indeed! It goes beyond the scope of this PR, but it can be done in the future

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There definitely is a way, but the zero boilerplate should be optional. Explicitly selecting what to include in the schema on a type level has advantages too, so I'm all for supporting both ways.

@erikwrede
Copy link
Member

hey everyone, I still need some time to fully review this. Will get back when I have done some more testing

@bellini666
Copy link
Member Author

Hey @erikwrede , @mattalbr

I just got back from djangocon and I'll get back to this PR during the weekend now.

One thing I was wondering here is regarding the fact that you need to have a session to query sqlalchemy objects.

Because of that I had to add a sessionmaker attribute to the fields, and I notice that the data loaders require one as well.

I was thinking about a way to avoid that and I have a proposal. Let me know what you think and I can implement that in this PR:

The idea is to use contextvars to provide a session that will be used during the execution of a query. Not only the relay fields I'm defining here can use it, but the dataloaders can also use it instead of requiring a bind argument.

Something like:

current_session = contextvars.ContextVar("current-session")

class SessionProviderExtension(SchemaExtension):
    def __init__(self, sessionmaker):
        self.sessionmaker = sessionmaker

    def on_execute(self):
        token = current_session.set(self.sessionmaker())
        try:
            yield
        finally:
            current_session.reset(token)

Of course we can also make the current_session be started lazely (create it when first accessing it)

What do you think?

Copy link
Contributor

@mattalbr mattalbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few API clarifications and concerns, but this is AWESOME and a huge step in the right direction for this library.

Lots of things we can improve eventually, but I love the direction this is going. Thank you so much Bellini!

Patrick and Erik, I'm relying on you two to review the implementation since it's way out of my expertise, and I'm strictly keeping a user-focused lens while reviewing.

src/strawberry_sqlalchemy_mapper/field.py Outdated Show resolved Hide resolved
assert result.errors is None
assert result.data == {
"fruit": {
"id": relay.to_base64("Fruit", f.id),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand how this relay API works. Having this returned in id seems like it would obfuscate id to the caller, which for a lot of situations is totally fine, but what if we wanted to do this without having an opaque id? Is there a way to put the relay ID in a different response field, but still tell the mapper that it should use id for mapping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the resolved query to the frontend, the frontend clients should only know the real relay ID. In your mutations you can use relay.from_base64 and then the id to node function to get the corresponding DB entity. Where else would you need the raw ID where it is difficult to access by this paradigm?

Not sure I got the question right, so please feel free to specify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more a matter of, what if someone already has a field named ID that the client is already using, but they want to add support using this new relay functionality. It's nice to think all ids should be opaque, but that's far too opinionated for a middleware library like this to make that decision IMO.

So my question is, what does someone do if they already have a field named "id" that the client is depending on? Can they put this relay ID in a different field so as not to clobber "id"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattalbr it is part of the Node interface and its spec to have a id attribute which is serialized something that can globally identify the object (i.e. without even having to tell the type).

Having said that, there's a proposal to make it easier to support custom implementations (in case you want to make it opaque in some other way, or even skip it at all) which might be useful once it gets merged: strawberry-graphql/strawberry#3057

Another possibility is to expose the original id in another field.

But I would say that anyone wanting to use this is not only aware of the whole GlobalID thing, but also expects that it works like that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I would say that anyone wanting to use this is not only aware of the whole GlobalID thing, but also expects that it works like that.

I personally disagree with this one :D I think the global id is mostly useful when using relay as a client, which might not be the case most of the times 😊

I think it might be worth trying to solve 3057 in Strawberry, but I haven't had the right amount of time to think about a good solution yet

if root is not None:
# root won't be None when resolving nested connections.
# TODO: Maybe we want to send this to a dataloader?
query = getattr(root, field.python_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to log a warning here because this will be so bad for performance. But, I think we'd only want to log it once per query, rather than per resolver, which maybe we don't have an easy way to do. So maybe no action, just thinking out loud.

src/strawberry_sqlalchemy_mapper/field.py Outdated Show resolved Hide resolved
src/strawberry_sqlalchemy_mapper/field.py Outdated Show resolved Hide resolved
... )
...
... @relay.connection(description="ABC")
... def get_some_nodes(self, age: int) -> Iterable[SomeType]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would someone do something like this were they extend the resolver to implement custom filtering without pulling all the data in from the database? Feels like either an abstraction is missing that would allow this to modify the query instead of returning results, or it would need to have access to the pagination params

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By returning a Query object. This is actually exactly what the default_resolver does when you don't define one, but you could do something like:

@relay.connection(relay.ListConnection[SomeType])
def some_connection(self) -> Iterable[SomeType]:
    return session.query(SomeModel)

In this case session.query(SomeModel) will only access the database when being iterated, and what you return in here will be modified by the connection later: for ListConnection it will slice it, producing a limit/offset pagination. For the KeysetConnection I created it will do its thing

An example from strawberry-django: https://github.com/bellini666/strawberry-django-demo/blob/main/demo/order/schema.py#L28

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thanks!

def __init__(
self,
sessionmaker: _SessionMaker | None = None,
keyset: Keyset | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is at field definition time, correct? Keyset I believe should come not at definition time but at query time. The Keyset gets passed to the caller as the pagination token (well technically the Marker gets passed, which includes the Keyset), which gets passed back when fetching the next page.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattalbr not sure if I understood whast you mean, and maybe the name of the variable here is not the best one =P

This variable defines the fields that will be used for the pagination, i.e., that will be used to order_by the query when you are relying on the default resolver.

You can see an example in this test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that link isn't working for me, can you point me to the test by name?
I just want to make sure that which fields will be used to order the query is defined per request. E.g. imagine we had a resolver:

def fruits(order: list[string], info: Info):
    return info.context.session.query(models.Fruit).order(*order).all()

The ordering, and thus the keyset, is not fixed for the resolver -- it's determined by the client.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattalbr oh I see. In the way I implemented it, the ordering is defined in the field

The old link is not working anymore, here is a new one: https://github.com/strawberry-graphql/strawberry-sqlalchemy/pull/65/files#diff-e466e4061d3ec646290e8d41bfb1ccfb9f3d7f36113d7c8c9196d2e1ae9de05aR509

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the link is not working because the file itself is collapsed.

Here is the screenshot of the implementation

Screenshot 2023-11-30 at 18 12 31

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that

Sorry that link isn't working for me, can you point me to the test by name? I just want to make sure that which fields will be used to order the query is defined per request. E.g. imagine we had a resolver:

def fruits(order: list[string], info: Info):
    return info.context.session.query(models.Fruit).order(*order).all()

The ordering, and thus the keyset, is not fixed for the resolver -- it's determined by the client.

Having said that, your example here would actually work. keyset is only used for the automatic resolver, not for manual ones. So if you define a resolver like you did it will work

Copy link
Member

@erikwrede erikwrede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Minor revisions attached, let's figure those points out and work on shaping the rest iteratively in future releases.

About the context vars for sessions: I'd prefer to leave things in context for now, this way we don't force async on everyone and we have a clear entry point for field extensions to modify the context and thus the session.

Comment on lines +100 to +116
while isinstance(field, StrawberryContainer):
field = field.of_type

edge_class = cast(Edge[NodeType], field)

return cls(
page_info=relay.PageInfo(
has_next_page=page.paging.has_next,
has_previous_page=page.paging.has_previous,
start_cursor=page.paging.get_bookmark_at(0) if page else None,
end_cursor=page.paging.get_bookmark_at(-1) if page else None,
),
edges=[
edge_class.resolve_edge(n, cursor=page.paging.get_bookmark_at(i))
for i, n in enumerate(page)
],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a later PR: we should optimize this to use dataloader if there is no limit or filter on the pagination here.

if isinstance(session, AsyncSession):

async def resolve_async(nodes=nodes):
return await session.run_sync(lambda s: resolve_nodes(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we falling back to sync here? If this is a limitation of sqlakeyset, we need to properly warn users about it and this is a major drawback for anyone using async sqla.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to use the sqlakeyset.get_page API instead of the sqlakeyset.select_page because of the way this is implemented.

sqlakeyset doesn't have a get_page async counterpart, only a select_page

TBH I don't know much of the sqlalchemy internals and neither of sqlakeyset, so open to try any suggestions here =P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_sync should be fine here, right, since sqlalchemy does all the magic of making sure this still hooks into the async dbapi?

If we really need to keep this async, we would have to basically clone sqlakeyset.paging.orm_get_page (very short, so not a huge deal) to get the query and then execute it and process results.

I've tried to augment sqlakeyset but got stuck on code reviews unfortunately.

src/strawberry_sqlalchemy_mapper/relay.py Outdated Show resolved Hide resolved
src/strawberry_sqlalchemy_mapper/relay.py Outdated Show resolved Hide resolved
)

try:
return session.get(model, filter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we should be clear about where async is supported. Not a blocker for first version, we can add async everywhere later.

raise


def resolve_model_id_attr(source: Type) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we handle this via str | tuple[str]? Parsing a string everytime seems slow and hard to understand without knowing the source. Also easier to not miss the standard for users providing custom implementations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not work very well as the relay implementation on strawberry will do a str() on anything that is not already a string, meaning (1, 2, 3) would become "(1, 2, 3)", which is harder to parse

But TBF, multiple primary keys are not very common, so anyone using them would probably know that there must be some special handling for this.

But I can/will add a comment here about this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Composite keys are very common in some of the codebases I am familiar with. Maybe we can adjust that behavior in strawberry to also allow tuples (which are later stringified)? cc @patrick91

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me 😊

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bellini666 @mattalbr are you fine with merging this in and later causing a soft breaking change in the internal api by switching to tuples after we added support in strawberrry? I'd like to avoid blocking the PR for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with that, yeah! I'm already way too slow on this PR and gotta get out of the way.

src/strawberry_sqlalchemy_mapper/relay.py Show resolved Hide resolved
mapper = StrawberrySQLAlchemyMapper()

@mapper.type(fruit_table)
class Fruit(relay.Node):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There definitely is a way, but the zero boilerplate should be optional. Explicitly selecting what to include in the schema on a type level has advantages too, so I'm all for supporting both ways.

assert result.errors is None
assert result.data == {
"fruit": {
"id": relay.to_base64("Fruit", f.id),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the resolved query to the frontend, the frontend clients should only know the real relay ID. In your mutations you can use relay.from_base64 and then the id to node function to get the corresponding DB entity. Where else would you need the raw ID where it is difficult to access by this paradigm?

Not sure I got the question right, so please feel free to specify.

@bellini666
Copy link
Member Author

@mattalbr @erikwrede think I addressed all of the comments in this PR

Let me know if there's anything else missing to be able to merge this

Copy link
Member

@erikwrede erikwrede left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm from my side, pending agreement on this from you:

@bellini666 @mattalbr are you fine with merging this in and later causing a soft breaking change in the internal api by switching to tuples after we added support in strawberrry? I'd like to avoid blocking the PR for that.

@bellini666
Copy link
Member Author

lgtm from my side, pending agreement on this from you:

Thank you! @mattalbr let me know if you want any extra changes in this

Copy link
Contributor

@mattalbr mattalbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last of the open questions, but looks good for a first version! Thanks so much for your help!

Comment on lines 344 to 351
order_by = []
if order := kwargs.get("order"):
order_by.extend([getattr(model, o) for o in order])
if field.keyset:
order_by.extend(field.keyset)

if order_by:
query = query.order_by(*order_by)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! No need to add this "order" column yet since we don't have a fully fleshed API yet. I think you can just do:

if after := kwargs.get("after"):
  order_by.extend(sqlakeyset.unserialize_bookmark(after).place)

And then specify where field.keyset is defined that it's just the default ordering, and later pages grab the order from the cursor.

@mattalbr mattalbr merged commit 1be8a65 into strawberry-graphql:main Dec 6, 2023
19 checks passed
@bellini666 bellini666 deleted the relay_integration branch December 6, 2023 18:37
@guilhermelou
Copy link

current_session

Hey @bellini666 , have you started working on something to avoid the duplicate session setup?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants