-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-948287: Best practice for doing cross database joins using CORE #456
Comments
Has anyone had a chance to review? |
Jim - Do you have access to this: This article provides alternatives for How To get around this using session variables and optionally identifier() Details SELECT set dbEntNm = 'ent_dev_edw.'; ... We do have a Synonym jira ticket out there and I will add your company to it. -Pete |
I hacked my way around it but because of how our org is set up, I'm unable to run the test suite to verify it doesn't break anything. If someone could help out, that'd be wonderful. All edits were made to Bottom line, there are two sections in this file around reflection. I added a bit of code to check to see if the schema has a |
I am literally copying this from the ORM question because it's virtually identical but the ORM approach doesn't work for CORE.
Question
Hi there, in Snowflake we often have multiple databases, with their own set of schemas, and tables inside those schemas. Ideally data shouldn't have to move between these databases nor schemas, but in practice they often do.
One use case is when there might be valuable data between two different databases. In my example, I'm going to reference database
A
and databaseB
which has schemasant
andbaboons
, which has tablesapples
andbananas
, respectively.Through a direct query using Snowflake SQL, you could implement a join like
How can you implement a cross database join like the above example using a single session through the CORE interface? My understanding is that a session can only point towards one database. I'm not creating an app and the tables already exist so ideally this would all happen through SQLAlchemy MetaData object reflection. Below are some example of what I've tried.
The ideal approach would look like the following:
I've tried all types of combinations of cases with the the database name and schema names too. None of them work.
What I find silly about the above is I often get StackTraces saying I have to provide a fully qualified name since I'm not declaring a database. But I AM providing a fully qualified name. And not explicitly choosing a database is the point. I need to pull data from tables in multiple databases so the reflected tables NEED TO BE FULLY QUALIFIED.
Can this be done through the CORE interface without dropping to SQL? If so, what's the best practice in doing this?
Please answer these questions before submitting your issue. Thanks!
What version of Python are you using?
Python 3.9.6
What operating system and processor architecture are you using?
macOS-14.0-arm64-arm-64bit
What are the component versions in the environment (
pip freeze
)?appnope==0.1.3 asn1crypto==1.5.1 asttokens==2.4.0 backcall==0.2.0 certifi==2023.7.22 cffi==1.16.0 charset-normalizer==3.3.0 comm==0.1.4 cryptography==41.0.4 debugpy==1.8.0 decorator==5.1.1 exceptiongroup==1.1.3 executing==2.0.0 filelock==3.12.4 idna==3.4 importlib-metadata==6.8.0 ipykernel==6.25.2 ipython==8.16.1 jaraco.classes==3.3.0 jedi==0.19.1 jupyter_client==8.4.0 jupyter_core==5.4.0 keyring==24.2.0 matplotlib-inline==0.1.6 more-itertools==10.1.0 nest-asyncio==1.5.8 numpy==1.26.1 oscrypto==1.3.0 packaging==23.2 pandas==2.0.3 parso==0.8.3 pexpect==4.8.0 pickleshare==0.7.5 platformdirs==3.11.0 prompt-toolkit==3.0.39 psutil==5.9.6 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==10.0.1 pycparser==2.21 pycryptodomex==3.19.0 Pygments==2.16.1 PyJWT==2.8.0 pyOpenSSL==23.2.0 python-dateutil==2.8.2 pytz==2023.3.post1 pyzmq==25.1.1 requests==2.31.0 six==1.16.0 snowflake-connector-python==3.3.1 snowflake-sqlalchemy==1.5.0 sortedcontainers==2.4.0 SQLAlchemy==1.4.49 sqlparse==0.4.4 stack-data==0.6.3 tomlkit==0.12.1 tornado==6.3.3 traitlets==5.11.2 typing_extensions==4.8.0 tzdata==2023.3 urllib3==1.26.18 wcwidth==0.2.8 zipp==3.17.0
What did you do?
The above example should be sufficient.
What did you expect to see?
Ideally, the fully qualified table objects, including the database, would have been instantiated as SQLAlchemy Table objects. In this way, I could then create a SQLAlchemy
select
statement such asselect(a).select_from(a.join(b, on=[a.c.col1 == b.c.col1])
and the corresponding SQL generated would beSELECT A.ant.apple.* FROM A.ant.apple JOIN B.baboons.bananas ON A.ant.apple.col1 = B.baboons.bananas.col1;
.Instead, the reflection step fails forcing me to choose a database which then prevents me from joining tables in different databases.
Can you set logging to DEBUG and collect the logs?
I'm happy to provide this info but in private. It's on a company computer with company backend stuff that's not for the public eye.
The text was updated successfully, but these errors were encountered: