Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-RFC: Simpler inserts #6

Open
tailhook opened this issue Apr 14, 2020 · 6 comments
Open

Pre-RFC: Simpler inserts #6

tailhook opened this issue Apr 14, 2020 · 6 comments

Comments

@tailhook
Copy link
Contributor

Motivation

Insert statements are usually look like:

db.fetchrow("""
    INSERT User {
        first_name := <str>$first_name,
        last_name := <str>$last_name,
        birth_date := <cal::local_date>$birth_date,
    }
""", dict(
  first_name=form['first_name'],
  last_name=form['last_name'],
  birth_date=form['birth_date'],
))

This is repetitive and error-prone (adding a field must take place in multiple places).

If some fields are optional, it needs the whole query builder to deal with that correctly.

(spectrum thread)

New Syntax

So I propose to make a dedicated syntax for that:

db.fetchrow("INSERT User {..$user}", dict(first_name=....)

This is much easier to write and doesn't need query builder for optional fields.

It uses .. operator that comes from Rust. We can use ** from python, or ... from Javascript.

Protocol

This requires the new protocol message to make queries, let's call it SmartExecute which consists of:

  1. query: EdgeQL text
  2. params: Data serialized by some self-descriptive serialization format
  3. output_typedesc_id: Optional, plays same role as in OptimisticExecute

Notes:

  1. Generally server sends output type descriptor first, then data
  2. In case of output_typedesc_id is specified and it matches the value, this behaves as optimistic execute

Client

Client might switch to this new protocol message if any of the argument is a dictionary. Otherwise, keep old protocol for compatibility. Later we can benchmark new protocol and make some other heuristics if the new protocol is faster (i.e. at least when there are no arguments).

Internals

  1. Server extracts list of keys from params and sends it to the compiler alongside with query. Compiler returns same thing as normal query, except argmap is a bit more complex (contains not just indexes for top-level arguments but also for keys of dictionaries).
  2. Then server transcodes arguments from self-descriptive format to what is provided by input type descriptor returned by compiler.

Performance Notes

  1. This introduces re-encoding of params, which take a little bit more CPU but I doubt it's too much. Transcoding arguments in Rust should be quite fast.
  2. For small number of arguments it should be faster because it avoids Prepare/Execute and doesn't need client-side cache of descriptors (output typedesc may only be cached if it's large enough that it justifies traffic savings).
  3. If this is benchmarked as performance issue, we can recommend using expanded syntax for either (a) hot spots or (b) starting with certain amount of data (which should be at least tens of arguments or hundreds of kilobytes of data I think).

Open Questions and Alternatives

  1. We use unpack/spread operator in proposal. We can use <User>$x instead, but:

  2. We may want to make type inference first (INSERT { last_name :=$last_name }), but this is not something that is strictly prerequisite and is not huge ergonomic improvement by itself.

  3. If this is a spread operator, should it work in places other than inserts? Perhaps yes, as far as we can make type inference for fields. Error "can't infer types, use expanded syntax" is fine for me.

  4. Should this work for links? In the first implementation, no. User {..$fields, link := (SELECT ..)} is fine.

@1st1
Copy link
Member

1st1 commented Apr 14, 2020

Thanks Paul, this is a good proposal.

If some fields are optional, it needs the whole query builder to deal with that correctly.

Yeah, but we will have query builders, it's only a question of when. And even before query builders we can create a driver-level API to simplify inserts without the need to complicate EdgeQL/server/protocol. At least right now this seems the preferred option to me.

It uses .. operator that comes from Rust. We can use ** from python, or ... from Javascript.

We planned to use ... syntax for GraphQL-inspired "fragments" or "splats" post 1.0.

@1st1
Copy link
Member

1st1 commented Apr 14, 2020

Lastly, your opening example can be simplified:

db.fetchone("""
    INSERT User {
        first_name := <str>$first_name,
        last_name := <str>$last_name,
        birth_date := <cal::local_date>$birth_date,
    }
""", **form)

And it's debatable if

db.fetchone("""
    INSERT User {
        {...$form}
    }
""", **form)

is actually clearer and better.

@elprans
Copy link
Member

elprans commented Apr 14, 2020

Thanks for starting the discussion, Paul!

We can use <User>$x instead

No, the type cast operator semantically is a constructor and since objects can only be constructed by inserting, type casts with object types are not supported.

If this is a spread operator, should it work in places other than inserts?

Yes, we are contemplating adding a generic spread operator in shapes: edgedb/edgedb#180

Should this work for links?

A question "how do I insert data as a blob of JSON?" has popped more than once, and we also want to make it easy to save an entire form of data, so it's worth exploring. Perhaps if a sub-dict in input has a non-empty "id" it's a nested UPDATE, and if it doesn't it's a nested INSERT?

Another open question is how do you actually describe the argument data and is it allowed to be non-homogeneous?

@tailhook
Copy link
Contributor Author

And even before query builders we can create a driver-level API

Can you elaborate? Anything I can come up with, doesn't work with anything a little more complex than bare insert, like: WITH X := INSERT...SELECT. Also we can't reliably guess column type from the value type (and using edgeql introspection in driver for that sounds slow and unreliable).

Lastly, your opening example can be simplified:

Sure. That's not the point. The point is that you have that enumeration in the app code anyway (either specifically for insert, as I've shown in example or in say API or form validation code), there is no reason to duplicate it.

We can use <User>$x instead

No, the type cast operator semantically is a constructor and since objects can only be constructed by inserting, type casts with object types are not supported.

That's also not a point. We just abuse cast as variable type specification. Cast isn't something that's inherent in variable usage, so we could invent any other syntax, like User := $user or whatever. The idea there is to accept whole objects from the client. But for now I still prefer this less than "splat" syntax.

If this is a spread operator, should it work in places other than inserts?

Yes, we are contemplating adding a generic spread operator in shapes: edgedb/edgedb#180

Does it conflicts with this issue? For me it looks like it doesn't, but I'm not sure.

Should this work for links?

A question "how do I insert data as a blob of JSON?" has popped more than once, and we also want to make it easy to save an entire form of data, so it's worth exploring. Perhaps if a sub-dict in input has a non-empty "id" it's a nested UPDATE, and if it doesn't it's a nested INSERT?

We can probably explore that theoretically, so we don't design thing that is too limited, but I would skip this in the first version of implementation.

Another open question is how do you actually describe the argument data and is it allowed to be non-homogeneous?

This question is only important if we accept list of dicts as an argument. Or inserting the links.
Which this proposal doesn't. I don't think the former is even possible with the splat syntax.

Answering the question: I would disallow different shapes in the first implementation. And then re-evaluate whenever there are users complaining. I believe that it's much harder to do, if we want to avoid recompiling postgres query on every request (perhaps, not impossible, though)

@elprans
Copy link
Member

elprans commented Apr 14, 2020

Cast isn't something that's inherent in variable usage, so we could invent any other syntax

Sure, I was clarifying the semantics of the cast syntax and why it can't be used for objects.

Does it conflicts with this issue?

It doesn't, but it's worth keeping that perspective in mind.

@tailhook
Copy link
Contributor Author

tailhook commented Apr 14, 2020

Lastly, your opening example can be simplified:

db.fetchone("""
    INSERT User {
        first_name := <str>$first_name,
        last_name := <str>$last_name,
        birth_date := <cal::local_date>$birth_date,
    }
""", **form)

Also, in fact I have more complex examples in mind:

db.fetchone("""
    INSERT Session {...$session, user := (INSERT User {...$user})}
""", user=dict(...), session=dict(...))

This is much more tricky to do via **, especially if there are matching keys in $user and $session.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants