Test #1

Open: wants to merge 134 commits into main

@evb123 commented Jan 26, 2024

Test

susodapop and others added 30 commits June 7, 2023 14:02
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
## Summary

Support OAuth flow for Databricks Azure

## Background

Some OAuth endpoints (e.g. the OpenID Configuration) and scopes differ between Databricks on Azure and Databricks on AWS. The current code supports the OAuth flow only on Databricks on AWS.

## What changes are proposed in this pull request?

- Change `OAuthManager` to decouple Databricks AWS specific configuration from OAuth flow
- Add `sql/auth/endpoint.py` that implements cloud specific OAuth endpoint configuration
- Change `DatabricksOAuthProvider` to work with the OAuth configurations of different Databricks clouds (AWS, Azure)
- Add the corresponding unit tests
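A minimal sketch of per-cloud endpoint selection, assuming the cloud can be inferred from the workspace hostname suffix. The names `CloudType`, `OAuthEndpoints`, `infer_cloud_from_host`, and `get_oauth_endpoints` are hypothetical, and the discovery URLs and scopes are illustrative placeholders rather than the connector's real configuration in `sql/auth/endpoint.py`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CloudType(Enum):
    AWS = "aws"
    AZURE = "azure"


@dataclass(frozen=True)
class OAuthEndpoints:
    # Where the client discovers the authorization and token endpoints.
    openid_config_url: str
    # Scopes requested during the authorization flow.
    scopes: List[str]


# Hypothetical hostname suffixes used to infer the cloud.
_SUFFIXES = {
    CloudType.AZURE: (".azuredatabricks.net",),
    CloudType.AWS: (".cloud.databricks.com",),
}


def infer_cloud_from_host(hostname: str) -> Optional[CloudType]:
    """Return the cloud a bare workspace hostname belongs to, or None."""
    host = hostname.lower().rstrip("/")
    for cloud, suffixes in _SUFFIXES.items():
        if host.endswith(suffixes):
            return cloud
    return None


def get_oauth_endpoints(hostname: str) -> OAuthEndpoints:
    """Pick cloud-specific OAuth settings; values below are placeholders."""
    cloud = infer_cloud_from_host(hostname)
    if cloud is CloudType.AZURE:
        return OAuthEndpoints(
            openid_config_url="https://<azure-oidc-discovery-url>",
            scopes=["<azure-resource-id>/user_impersonation", "offline_access"],
        )
    if cloud is CloudType.AWS:
        return OAuthEndpoints(
            openid_config_url=f"https://{hostname}/oidc/.well-known/oauth-authorization-server",
            scopes=["sql", "offline_access"],
        )
    raise ValueError(f"Unrecognized Databricks host: {hostname}")
```

Keeping the cloud-specific values in one lookup means the OAuth flow itself can stay cloud-agnostic, which is the decoupling the PR describes.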
---------

Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
* Cloud Fetch download handler

Signed-off-by: Matthew Kim <[email protected]>

* Issue fix: final result link compressed data has multiple LZ4 end-of-frame markers

Signed-off-by: Matthew Kim <[email protected]>

* Addressing PR comments
 - Linting
 - Type annotations
 - Use response.ok
 - Log exception
 - Remove semaphore and only use threading.Event
 - reset() flags method
 - Fix tests after removing semaphore
 - Link expiry logic should be in secs
 - Decompress data static function
 - link_expiry_buffer and static public methods
 - Docstrings and comments

Signed-off-by: Matthew Kim <[email protected]>

* Changing logger.debug to remove url

Signed-off-by: Matthew Kim <[email protected]>

* _reset() comment to docstring

Signed-off-by: Matthew Kim <[email protected]>

* link_expiry_buffer -> link_expiry_buffer_secs

Signed-off-by: Matthew Kim <[email protected]>

---------

Signed-off-by: Matthew Kim <[email protected]>
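The end-of-frame fix concerns a buffer made of several compressed frames laid back to back, where a single-frame decompression call can stop at the first end-of-frame marker. The loop below illustrates the general pattern of decompressing until every input byte is consumed. It uses the standard library's `zlib` streams as a stand-in for LZ4 frames so it runs without extra dependencies; `decompress_concatenated` is a hypothetical helper, not the connector's code.

```python
import zlib


def decompress_concatenated(data: bytes) -> bytes:
    """Decompress a buffer holding one or more back-to-back zlib streams."""
    out = bytearray()
    remaining = data
    while remaining:
        # A fresh decompressor per stream; it stops at that stream's end marker.
        d = zlib.decompressobj()
        out += d.decompress(remaining)
        out += d.flush()
        if not d.eof:
            raise ValueError("compressed data ends in the middle of a stream")
        # Bytes after the end marker belong to the next stream.
        remaining = d.unused_data
    return bytes(out)
```

With LZ4 the same idea applies: keep decompressing from the offset where the previous frame ended instead of returning after the first frame.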
* Cloud Fetch download manager

Signed-off-by: Matthew Kim <[email protected]>

* Bug fix: submit handler.run

Signed-off-by: Matthew Kim <[email protected]>

* Type annotations

Signed-off-by: Matthew Kim <[email protected]>

* Namedtuple -> dataclass

Signed-off-by: Matthew Kim <[email protected]>

* Shutdown thread pool and clear handlers

Signed-off-by: Matthew Kim <[email protected]>

* Docstrings and comments

Signed-off-by: Matthew Kim <[email protected]>

* handler.run is the correct call

Signed-off-by: Matthew Kim <[email protected]>

* Link expiry buffer in secs

Signed-off-by: Matthew Kim <[email protected]>

* Adding type annotations for download_handlers and downloadable_result_settings

Signed-off-by: Matthew Kim <[email protected]>

* Move DownloadableResultSettings to downloader.py to avoid circular import

Signed-off-by: Matthew Kim <[email protected]>

* Black linting

Signed-off-by: Matthew Kim <[email protected]>

* Timeout is never None

Signed-off-by: Matthew Kim <[email protected]>

---------

Signed-off-by: Matthew Kim <[email protected]>
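The link expiry buffer exists because a presigned link that is valid when checked can still expire mid-download. A minimal sketch, assuming each result link carries an absolute expiry time in Unix seconds; `ResultLink` and `link_is_expired` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultLink:
    # Hypothetical shape: a presigned URL plus its expiry as Unix seconds.
    url: str
    expiry_time_secs: int


def link_is_expired(
    link: ResultLink, expiry_buffer_secs: int, now: Optional[float] = None
) -> bool:
    """Treat a link as expired if it expires within `expiry_buffer_secs` of now."""
    current = time.time() if now is None else now
    return link.expiry_time_secs <= current + expiry_buffer_secs
```

Keeping both the expiry time and the buffer in seconds avoids unit mismatches in the comparison.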
* Cloud fetch queue and integration

Signed-off-by: Matthew Kim <[email protected]>

* Enable cloudfetch with direct results

Signed-off-by: Matthew Kim <[email protected]>

* Typing and style changes

Signed-off-by: Matthew Kim <[email protected]>

* Client-settable max_download_threads

Signed-off-by: Matthew Kim <[email protected]>

* Docstrings and comments

Signed-off-by: Matthew Kim <[email protected]>

* Increase default buffer size bytes to 104857600

Signed-off-by: Matthew Kim <[email protected]>

* Move max_download_threads to kwargs of ThriftBackend, fix unit tests

Signed-off-by: Matthew Kim <[email protected]>

* Fix tests: staticmethod make_arrow_table mock not callable

Signed-off-by: Matthew Kim <[email protected]>

* cancel_futures in shutdown() only available in Python >= 3.9

Signed-off-by: Matthew Kim <[email protected]>

* Black linting

Signed-off-by: Matthew Kim <[email protected]>

* Fix typing errors

Signed-off-by: Matthew Kim <[email protected]>

---------

Signed-off-by: Matthew Kim <[email protected]>
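Because `cancel_futures` was only added to `Executor.shutdown()` in Python 3.9, shutdown has to be version-gated. A minimal sketch; `shutdown_download_pool` is a hypothetical helper name.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def shutdown_download_pool(pool: ThreadPoolExecutor) -> None:
    """Stop a download pool without blocking on in-flight work."""
    if sys.version_info >= (3, 9):
        # Python 3.9+ can also cancel futures that have not started yet.
        pool.shutdown(wait=False, cancel_futures=True)
    else:
        # Older versions: already-queued futures still run to completion.
        pool.shutdown(wait=False)
```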
* Cloud Fetch e2e tests

Signed-off-by: Matthew Kim <[email protected]>

* Test case works for e2-dogfood shared unity catalog

Signed-off-by: Matthew Kim <[email protected]>

* Moving test to LargeQueriesSuite and setting catalog to hive_metastore

Signed-off-by: Matthew Kim <[email protected]>

* Align default value of buffer_size_bytes in driver tests

Signed-off-by: Matthew Kim <[email protected]>

* Adding comment to specify what's needed to run successfully

Signed-off-by: Matthew Kim <[email protected]>

---------

Signed-off-by: Matthew Kim <[email protected]>
Signed-off-by: Sebastian Eckweiler <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Co-authored-by: Sebastian Eckweiler <[email protected]>
Co-authored-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Daniel Segesdi <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Co-authored-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Co-authored-by: Jesse Whitehouse <[email protected]>
---------
Signed-off-by: Bogdan Kyryliuk <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Co-authored-by: Jesse Whitehouse <[email protected]>
Signed-off-by: William Gentry <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Co-authored-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Behaviour is gated behind the `enable_v3_retries` config. The flag will be removed, and this behaviour will become the default, in a subsequent release.

Signed-off-by: Jesse Whitehouse <[email protected]>
benc-db and others added 30 commits March 14, 2024 14:26
* Add a default for Retry-After
Signed-off-by: Ben Cassell <[email protected]>

* Applied black formatter
Signed-off-by: Ben Cassell <[email protected]>
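A default for Retry-After keeps the backoff well-defined when the server omits the header or sends a value the client cannot parse. A minimal sketch, assuming only the delay-in-seconds form is handled; the default of 1 second and the name `retry_after_seconds` are hypothetical.

```python
from typing import Mapping, Optional

DEFAULT_RETRY_AFTER_SECS = 1.0  # hypothetical default


def retry_after_seconds(
    headers: Mapping[str, str], default: float = DEFAULT_RETRY_AFTER_SECS
) -> float:
    """Return the server's Retry-After delay in seconds, or `default`."""
    raw: Optional[str] = None
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "retry-after":
            raw = value
            break
    if raw is None:
        return default
    try:
        seconds = float(raw.strip())
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch falls back instead.
        return default
    return seconds if seconds >= 0 else default
```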
Set supports_native_boolean to True

Signed-off-by: Alex Holyoke <[email protected]>
* Don't retry requests that fail with 404

Signed-off-by: Jesse Whitehouse <[email protected]>

* Fix lint error

Signed-off-by: Jesse Whitehouse <[email protected]>

---------

Signed-off-by: Jesse Whitehouse <[email protected]>
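A retry decision like this can be expressed as a small predicate over the response status. The commit only states that 404 responses should not be retried; the set of retryable statuses below (429 and 503) and the name `should_retry` are assumptions for illustration.

```python
# Hypothetical set of statuses worth retrying; everything else fails fast.
RETRYABLE_STATUSES = frozenset({429, 503})  # Too Many Requests, Service Unavailable


def should_retry(status: int) -> bool:
    """Return True if a failed request with this HTTP status should be retried."""
    if status == 404:
        # A missing resource will not appear on retry, so surface it immediately.
        return False
    return status in RETRYABLE_STATUSES
```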
* bump to 3.1.1

Signed-off-by: Ben Cassell <[email protected]>
* fix cookie setting

Signed-off-by: Ben Cassell <[email protected]>

* Removing cookie code

Signed-off-by: Ben Cassell <[email protected]>

---------

Signed-off-by: Ben Cassell <[email protected]>
* Create py.typed

Signed-off-by: wyattscarpenter <[email protected]>

* add -> Connection annotation

Signed-off-by: wyattscarpenter <[email protected]>

* massage the code to appease the particular version of the project's mypy deps

Signed-off-by: wyattscarpenter <[email protected]>

* fix circular import problem

Signed-off-by: wyattscarpenter <[email protected]>

---------

Signed-off-by: wyattscarpenter <[email protected]>
fix the return types of the classes' `__enter__` functions so that type information is preserved in context managers, e.g. `with ... as` blocks

Signed-off-by: wyattscarpenter <[email protected]>
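One common way to preserve the type in `with ... as` blocks is to annotate `__enter__` with a TypeVar bound to the base class, so a subclass's `__enter__` returns the subclass type. A minimal sketch with hypothetical classes `ClosableResource` and `Cursor` (on Python 3.11+, `typing.Self` expresses the same thing).

```python
from types import TracebackType
from typing import Optional, Type, TypeVar

# Bound TypeVar so subclasses keep their own type in `with ... as` blocks.
TResource = TypeVar("TResource", bound="ClosableResource")


class ClosableResource:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self: TResource) -> TResource:
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        self.close()


class Cursor(ClosableResource):
    def execute(self, query: str) -> str:
        return f"ran: {query}"
```

With this annotation, a type checker infers `cur` as `Cursor` in `with Cursor() as cur:` rather than the base class.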
Signed-off-by: Ben Cassell <[email protected]>
changed authentication for proxy
Signed-off-by: Levko Kravets <[email protected]>
* Relax `pyarrow` pin

Signed-off-by: Dave Hirschfeld <[email protected]>

* Allow `pyarrow` 16

Signed-off-by: Dave Hirschfeld <[email protected]>

* Update `poetry.lock`

Signed-off-by: Dave Hirschfeld <[email protected]>

---------

Signed-off-by: Dave Hirschfeld <[email protected]>
* Duplicate of applicable change from #93

Signed-off-by: Jesse Whitehouse <[email protected]>

* Update changelog

Signed-off-by: Jesse Whitehouse <[email protected]>

* Fix after merge

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Levko Kravets <[email protected]>
Co-authored-by: Levko Kravets <[email protected]>
* Enable `delta.feature.allowColumnDefaults` for all tables

* Code style

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
Co-authored-by: Levko Kravets <[email protected]>
Signed-off-by: Levko Kravets <[email protected]>
Signed-off-by: Milan Lukac <[email protected]>
* Prepare release 3.2.0

Signed-off-by: Levko Kravets <[email protected]>

* Update changelog

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
* move py.typed to correct places

https://peps.python.org/pep-0561/ says 'For namespace packages (see PEP 420), the py.typed file should be in the submodules of the namespace, to avoid conflicts and for clarity.' Previously, when I added the py.typed file to this project in #382, I was unaware that this was a namespace package (although, curiously, it seems I had done it right initially and then changed it to the wrong way). As PEP 561 warns, this does create conflicts: other libraries in the databricks namespace package (in my case, databricks-vectorsearch) are then treated as though they are typed, which they are not. This commit moves the py.typed file to the correct places, the submodule folders, fixing that problem.
Signed-off-by: wyattscarpenter <[email protected]>

* change target of mypy to src/databricks instead of src.

I think this might fix the CI code-quality checks failure, but unfortunately I can't replicate that failure locally and the error message is unhelpful

Signed-off-by: wyattscarpenter <[email protected]>

* Possible workaround for bad error message 'error: --install-types failed (no mypy cache directory)'; see python/mypy#10768 (comment)

Signed-off-by: wyattscarpenter <[email protected]>

* fix invalid yaml syntax

Signed-off-by: wyattscarpenter <[email protected]>

* Best fix (#3)

Fixes the problem by `cd`-ing and supplying a flag to mypy (mypy's need for this flag seems to have been fixed or changed in later versions of mypy, but that's another PR altogether). Also fixes a type error that was, surprisingly, in the arguments of the program (I guess this is because the project still relies on implicit optional).

---------

Signed-off-by: wyattscarpenter <[email protected]>

* return the old result_links default (#5)

Return the old result_links default and make the type optional. I'm pretty sure the original problem is that add_file_links can't take None, so these statements belong in the body of the if-statement that ensures it is not None.

Signed-off-by: wyattscarpenter <[email protected]>

* Update src/databricks/sql/utils.py

"self.download_manager is unconditionally used later, so must be created. Looks this part of code is totally not covered with tests 🤔"

Co-authored-by: Levko Kravets <[email protected]>
Signed-off-by: wyattscarpenter <[email protected]>

---------

Signed-off-by: wyattscarpenter <[email protected]>
Co-authored-by: Levko Kravets <[email protected]>
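The fix above amounts to two rules: always create the download manager, and only pass links to `add_file_links` when they are not None. A minimal sketch with hypothetical stand-in classes `DownloadManager` and `CloudFetchQueue` (not the connector's real implementations).

```python
from typing import List, Optional


class DownloadManager:
    """Hypothetical stand-in that just collects file links."""

    def __init__(self) -> None:
        self.links: List[str] = []

    def add_file_links(self, links: List[str]) -> None:
        self.links.extend(links)


class CloudFetchQueue:
    def __init__(self, result_links: Optional[List[str]] = None) -> None:
        # Always create the manager: later code uses it unconditionally.
        self.download_manager = DownloadManager()
        # add_file_links cannot take None, so only call it with real links.
        if result_links is not None:
            self.download_manager.add_file_links(result_links)
```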
* Upgrade mypy

This commit removes the flag (and `cd` step) from f53aa37 that we added to get mypy to treat namespaces correctly. This was apparently a bug in mypy, or behavior they decided to change; to get the new behavior, we must upgrade mypy. (This also allows us to remove a couple of `# type: ignore` comments that are no longer needed.)

This commit changes the version of mypy and runs `poetry lock`. It also conforms the whitespace of files in this project to the expectations of various tools and standards (namely: removing trailing whitespace, as expected by git, and enforcing exactly one newline at the end of each file, as expected by Unix and GitHub). It also uses https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade the codebase for a change in mypy behavior. For a similar reason, it also fixes a few new type (and other) errors:

* "Return type 'Retry' of 'new' incompatible with return type 'DatabricksRetryPolicy' in supertype 'Retry'"
* databricks/sql/auth/retry.py:225: error: object has no attribute update  [attr-defined]
* /test_param_escaper.py:31: DeprecationWarning: invalid escape sequence \) [as it happens, I think it was also wrong for the string not to be raw, because I'm pretty sure it wants all of its backslashed single-quotes to appear literally with the backslashes, which wasn't happening until now]
* ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject [this is like a numpy version thing, which I fixed by being stricter about numpy version]

---------

Signed-off-by: wyattscarpenter <[email protected]>

* Incorporate suggestion.

I decided the most expedient way of dealing with this type error was to add the type ignore comment back in, but with an `[attr-defined]` specifier this time. Otherwise I would have to restructure the code or work out the proper types for a TypedDict for the dict, and I don't think that's worth it at the moment.

Signed-off-by: wyattscarpenter <[email protected]>

---------

Signed-off-by: wyattscarpenter <[email protected]>
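Two of the mechanical fixes above, sketched on a hypothetical helper `escape_quotes` (not the project's actual escaper): making `Optional` explicit, since mypy no longer treats a `None` default as implying `Optional` by default, and using a raw string so `\)` is not parsed as an invalid escape sequence.

```python
import re
from typing import Optional


# Before: `def escape_quotes(value: str, quote: str = None)` relied on implicit
# Optional, which current mypy versions reject by default.
def escape_quotes(value: str, quote: Optional[str] = None) -> str:
    """Backslash-escape a quote character (single quote by default)."""
    q = "'" if quote is None else quote
    return value.replace(q, "\\" + q)


# A raw string keeps `\)` literal and avoids the invalid-escape DeprecationWarning.
CLOSE_PAREN = re.compile(r"\)")
```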
- Raises NonRecoverableNetworkError when request results in 401 status code

Signed-off-by: Tor Hødnebø <[email protected]>
Signed-off-by: Tor Hødnebø <[email protected]>
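A 401 means the credentials are bad or expired, so retrying will not help and the error should surface immediately. A minimal sketch; the exception class here is a local stand-in for the connector's `NonRecoverableNetworkError`, and `check_response_status` is a hypothetical name.

```python
class NonRecoverableNetworkError(Exception):
    """Local stand-in for the connector's exception of the same name."""


def check_response_status(status: int) -> None:
    """Fail fast on authentication errors instead of retrying them."""
    if status == 401:
        # Bad or expired credentials will not succeed on retry.
        raise NonRecoverableNetworkError(
            "Received 401 Unauthorized; check the credentials and reconnect"
        )
```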
…#405)

* [PECO-1751] Refactor CloudFetch downloader: handle files sequentially; utilize Futures

Signed-off-by: Levko Kravets <[email protected]>

* Retry failed CloudFetch downloads

Signed-off-by: Levko Kravets <[email protected]>

* Update tests

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
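The refactor pairs concurrent downloads with in-order consumption: each file is submitted as a Future, and results are read back in submission order, with failed downloads retried. A minimal sketch, assuming a caller-supplied `fetch(url) -> bytes` function; `download_with_retries`, `download_in_order`, and the retry count of 3 are hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterator, List


def download_with_retries(
    fetch: Callable[[str], bytes], url: str, max_attempts: int = 3
) -> bytes:
    """Call `fetch(url)`, retrying on exceptions up to `max_attempts` times."""
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except Exception as e:  # sketch only: real code would narrow this
            last_error = e
    raise last_error


def download_in_order(
    fetch: Callable[[str], bytes], urls: List[str], max_workers: int = 4
) -> Iterator[bytes]:
    """Download files concurrently but yield results in the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures: List["Future[bytes]"] = [
            pool.submit(download_with_retries, fetch, url) for url in urls
        ]
        for future in futures:
            # result() blocks until this file is ready and re-raises its error.
            yield future.result()
```

Waiting on futures in submission order keeps result rows ordered even though later files may finish first.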
* Disable SSL verification for CloudFetch links

Signed-off-by: Levko Kravets <[email protected]>

* Use existing `_tls_no_verify` option in CloudFetch downloader

Signed-off-by: Levko Kravets <[email protected]>

* Update tests

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
* Prepare release 3.3.0

Signed-off-by: Levko Kravets <[email protected]>

* Remove @arikfr from CODEOWNERS

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
* Support pandas 2.2.2

See the pandas 2.2.0 release notes:
https://pandas.pydata.org/docs/dev/whatsnew/v2.2.0.html#to-numpy-for-numpy-nullable-and-arrow-types-converts-to-suitable-numpy-dtype

* Allow pandas 2.2.2 in pyproject.toml

* Update poetry.lock, poetry lock --no-update

* Code style

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
Co-authored-by: Levko Kravets <[email protected]>
…ion setting is provided (#419)

* [PECO-1801] Make OAuth the default authenticator if no authentication setting is provided

Signed-off-by: Jacky Hu <[email protected]>
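The new default reduces to a fallback order: an explicit choice wins, then a supplied token, and OAuth only when nothing is configured. A minimal sketch; `choose_auth_type` and its returned labels are hypothetical, and the real connector considers more settings than these two.

```python
from typing import Optional


def choose_auth_type(
    access_token: Optional[str] = None,
    auth_type: Optional[str] = None,
) -> str:
    """Pick an authenticator label; fall back to OAuth when nothing is configured."""
    if auth_type is not None:
        return auth_type  # an explicit choice always wins
    if access_token:
        return "pat"  # personal access token supplied
    return "oauth"  # new default when no authentication setting is given
```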
* [PECO-1857] Use SSL options with HTTPS connection pool

Signed-off-by: Levko Kravets <[email protected]>

* Some cleanup

Signed-off-by: Levko Kravets <[email protected]>

* Resolve circular dependencies

Signed-off-by: Levko Kravets <[email protected]>

* Update existing tests

Signed-off-by: Levko Kravets <[email protected]>

* Fix MyPy issues

Signed-off-by: Levko Kravets <[email protected]>

* Fix `_tls_no_verify` handling

Signed-off-by: Levko Kravets <[email protected]>

* Add tests

Signed-off-by: Levko Kravets <[email protected]>

---------

Signed-off-by: Levko Kravets <[email protected]>
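Centralising TLS settings means building one `ssl.SSLContext` from the connection options and reusing it for every HTTPS connection. A minimal sketch: `tls_no_verify` mirrors the `_tls_no_verify` option mentioned above, while the other parameter names and `build_ssl_context` itself are hypothetical.

```python
import ssl
from typing import Optional


def build_ssl_context(
    tls_no_verify: bool = False,
    tls_trusted_ca_file: Optional[str] = None,
    tls_client_cert_file: Optional[str] = None,
    tls_client_cert_key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build one SSLContext from connection-level TLS options."""
    ctx = ssl.create_default_context(cafile=tls_trusted_ca_file)
    if tls_no_verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    if tls_client_cert_file:
        ctx.load_cert_chain(tls_client_cert_file, keyfile=tls_client_cert_key_file)
    return ctx
```

The same context can then be handed to whichever HTTPS client or connection pool the driver uses, so CloudFetch downloads and Thrift calls share one TLS configuration.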
Prepare release 3.4.0

Signed-off-by: Levko Kravets <[email protected]>