-
Notifications
You must be signed in to change notification settings - Fork 833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hacky support for Postgres schemas w/ runtime interpolation #3370
base: main
Are you sure you want to change the base?
Conversation
@bgentry Thanks for kicking off the conversation. I agree with you that this is a hacky solution, but I think we can figure out a solution. To confirm, when writing your queries, the SQL looks like the following: -- name: GetAuthor :one
SELECT * FROM sqlc_schema_placeholder.authors
WHERE id = $1 LIMIT 1; Instead of this, I was thinking we could add support for dynamic identifiers. The queries would look something like this -- name: GetAuthor :one
SELECT * FROM "{schema}.authors"
WHERE id = $1 LIMIT 1; This would add another parameter to the queries. The generated code would look like: func GetAuthor(ctx context.Context, schema string, id int) {
....
} |
@kyleconroy I think that would work perfectly well for what we would need, and it would be more flexible too (able to support fully qualified Do you have any thoughts on the query cache side of it? Obviously you wouldn't want to perform the same string substitution repeatedly hundreds of times per second. You could imagine a swappable implementation via an interface like: type QueryCache interface {
Substitute(query, placeholder, value) string
} A naiive implementation could just replace it every single time, whereas a more advanced one could cache the results with a tunable size or LRU setup. |
im not him but i do. i dont think a cache here is as good as you think
i dont think this is obvious either. a string replacement is simple O(n) copy operation that can be optimized in advance since the we know how many to replace and actually we already know where the things we need to replace are. on the other hand, a lookup in cache/LRU is not only a O(n) operation since it needs to validate equality, but it also requires hashing and extra time to shuffle buckets, and it won't necessarily save you any string copies either. A simple benchmark with naive strings.ReplaceAll shows this- I bet that some simple preprocessing and hits will make it faster and more memory efficient than the mapping. the speed is the same for small queries, and hash table is slower than replace for a large table, im assuming because of the hash + compound key. Making it a nested map might reduce the cost of the compound key but that would increase the runtime cost. code:
results:
edit: upon some more thinking, it could perhaps be possible to create an efficient cache if one was initialized per query? that would remove the large cost of hashing the query, as that would be compiled into the code, and instead you would only hash your substitutions. It would still be debatable performance based off the query size and substitution size though. the query for pgx has to be a string anyways - and so for large queries, that allocation will probably always be your bottleneck, while for small strings substitution is rather fast. |
Hey @kyleconroy, wanted to see if you've thought any more about how this might be implemented or where it fits on your roadmap. As of now it's the only thing blocking River from having full support for multiple Postgres schemas, and while we really don't want to abandon sqlc it's a pretty important feature that we'll need to support soon one way or another. |
@kyleconroy at this point freeing up these sessions (thousands) that river requires because of this limitation of sqlc would be big difference for us. i'm glad to do all the work to implement this feature, but I want to confirm that this is still something you are interested in and would be okay with merging. |
Hi @kyleconroy and @andrewmbenton 👋 @brandur and I have recently been looking more into making River fully support isolation at the Postgres schema level (that is, allowing different River clients to be configured to use different Postgres schemas). Our initial hope was to be able to rely on the
search_path
connection param as recommended by pgx and #2635 for this purpose.Unfortunately, one of our users quickly discovered that this approach is basically incompatible with pgbouncer. As far as I know the only ways to make this work with pgbouncer and sqlc would be:
SET search_path
prior to any query, within a transaction. This is error prone and has perf penalties in the form of added round trips.I decided to spend chunk of time last night seeing if I could hack a way to achieve option (3) in sqlc. It actually wasn't too difficult to achieve in a hacky way, though doing so did cement my dislike of Go's
text/template
😆 This draft PR shows one potential path for achieving this using a newuse_schema_placeholder
option, which when enabled causes all queries to be passed through a runtime string replacement on the known prefix placeholdersqlc_schema_placeholder.
. It's definitely ugly and probably shouldn't be shipped in any form that resembles this PR, but I did want to at least raise it as a possible proof-of-concept.Here's a corresponding River commit showing the queries that I updated to make use of the placeholder prefix. This sqlc PR does actually build and generate fully functional code for being able to use arbitrary schemas specified at runtime as shown in that commit.
I have lots of ideas for how this could be improved or done differently from the way I've done here (caching, maybe make this customizable per-query ala Ecto instead of being statically wired into
Queries
, etc.), but rather than spin my wheels on them I thought I'd use this draft PR to start a discussion around if and how this might be achieved in sqlc given the fact that the previously recommended approach is not actually viable for those using pgbouncer in the most common way.I also want to highlight that this is fairly distinct from allowing arbitrary runtime SQL substitution, because a schema prefix can be strictly validated to avoid i.e. injection concerns.
Related issues: