Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Python3.13, fix running test on localhost, and move regression tests to pytest #545

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

MonkeyCanCode
Copy link
Contributor

@MonkeyCanCode MonkeyCanCode commented Dec 13, 2024

Description

Fix #21

High level changes:

  1. Add support for Python 3.13:

While trying to run regtest on local with Python 3.13, it failed on installing pydantic-core:

➜ regtests/run.sh
...
Package operations: 8 installs, 1 update, 0 removals
...
  • Installing pydantic-core (2.18.4): Failed
...
  ChefBuildError
  Backend subprocess exited when trying to invoke build_wheel
  Running `maturin pep517 build-wheel -i /private/var/folders/fl/1gq302cn7dd_w8mx24s7zhq40000gn/T/tmponjgdrat/.venv/bin/python --compatibility off`
  💥 maturin failed
    Caused by: Cargo metadata failed. Do you have cargo in your PATH?
    Caused by: No such file or directory (os error 2)
  Error: command ['maturin', 'pep517', 'build-wheel', '-i', '/private/var/folders/fl/1gq302cn7dd_w8mx24s7zhq40000gn/T/tmponjgdrat/.venv/bin/python', '--compatibility', 'off'] returned non-zero exit status 1
...
Note: This error originates from the build backend, and is likely not a problem with poetry but with pydantic-core (2.18.4) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "pydantic-core (==2.18.4)"'.
...
Thu Dec 12 22:46:15 CST 2024: Starting pytest t_cli:test_cli.py
/Users/yong/polaris-venv/bin/python3: No module named pytest
...
Thu Dec 12 22:46:15 CST 2024: Test FAILED: t_cli:test_cli.py
...

This then prevents the installation of pytest which then fails all tests that are based on pytest.

  1. Fix running test on localhost
    Also, regtests/t_cli/src/test_cli.py set default polaris host to polaris which can be problematic (as we are not documenting extra set for setting hostname anywhere for running test on local. Thus, better to set this to localhost instead.

  2. Move all regression tests to pytests

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Detail changes summary:

  1. Update README.md for inform users Polaris server should be start before local test can be run
  2. Update README.md/docker-compose to remove AWS_TEST_BASE reference due to bad incomplete logic in the test cases (we should create a diff test case for this)
  3. Refactor dockerfile for regtests
  4. Add missing Azure variables in regtests/README.md
  5. Include spark 3.5.3 via pip instead of setting up local spark
  6. Bump the version of tox from >=3.9.0 to >=4.19.0 as that is the first version that support Python 3.13
  7. Locked the version of platformdirs to 4.3.6 as virtualenv has >=3.9.1,<5 and will try to downgrade version to 3.11.0 during rerun of local test
  8. Remove regtests/pyspark-setup.sh, regtests/run_spark_sql.sh and regtests/setup.sh as these two are no longer needed
  9. Refactor regtests/runsh to perform regression tests on pytests only
  10. Use localhost as default value for polaris host instead of polaris to support running regression tests not within docker (still do-able without this change if we want to add document to ask user to set POLARIS_HOST)
  11. Refactor conftest.py and iceberg_spark.py
  12. Reduce older ref/src regression tests with pytest

Things I didn't do

  1. As I don't have access to Azure/GCP, I am not able to test 3 of the refactored regression tests related to them
  2. I did some minor refactor on regtests/t_pyspark/src/test_spark_sql_s3_with_privileges.py. I can do another follow up PR to refactor all of them to more standard layout/format after current PR. Also, we should remove snowflake in this file in the next PR.

Test results:

# local test
➜ regtests/run.sh
...
Sat Dec 14 23:14:17 CST 2024: Running all tests
Sat Dec 14 23:14:17 CST 2024: Starting pytest t_cli:test_cli.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 9 passed in 35.10s ===========================================================================
Sat Dec 14 23:14:52 CST 2024: Test SUCCEEDED: t_cli:test_cli.py
Sat Dec 14 23:14:52 CST 2024: Starting test t_hello_world:hello_world.sh
Sat Dec 14 23:14:52 CST 2024: Test run concluded for t_hello_world:hello_world.sh
Sat Dec 14 23:14:52 CST 2024: Starting pytest t_pyspark:test_spark_sql_azure_blob.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 skipped in 0.14s ===========================================================================
Sat Dec 14 23:14:53 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_azure_blob.py
Sat Dec 14 23:14:53 CST 2024: Starting pytest t_pyspark:test_spark_sql_s3_with_privileges.py
========================================================================== test session starts ===========================================================================
...
============================================================== 14 passed, 27 warnings in 200.40s (0:03:20) ===============================================================
Sat Dec 14 23:18:14 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_s3_with_privileges.py
Sat Dec 14 23:18:14 CST 2024: Starting pytest t_pyspark:test_spark_sql_view.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 passed in 11.21s ===========================================================================
Sat Dec 14 23:18:26 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_view.py
Sat Dec 14 23:18:26 CST 2024: Starting pytest t_pyspark:test_spark_sql_gcp.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 skipped in 0.06s ===========================================================================
Sat Dec 14 23:18:27 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_gcp.py
Sat Dec 14 23:18:27 CST 2024: Starting pytest t_pyspark:test_spark_sql_basic.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 passed in 9.18s ============================================================================
Sat Dec 14 23:18:36 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_basic.py
Sat Dec 14 23:18:36 CST 2024: Starting pytest t_pyspark:test_spark_sql_s3.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 passed in 30.70s ===========================================================================
Sat Dec 14 23:19:08 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_s3.py
Sat Dec 14 23:19:08 CST 2024: Starting pytest t_pyspark:test_spark_sql_azure_dfs.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 skipped in 0.10s ===========================================================================
Sat Dec 14 23:19:09 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_azure_dfs.py
Sat Dec 14 23:19:09 CST 2024: Starting pytest t_pyspark:test_spark_sql_s3_cross_region.py
========================================================================== test session starts ===========================================================================
...
=========================================================================== 1 passed in 17.49s ===========================================================================
Sat Dec 14 23:19:27 CST 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_s3_cross_region.py
Sat Dec 14 23:19:27 CST 2024: Tests completed with 0 successes and 0 failures

# docker test
➜ docker compose up --build --exit-code-from regtest
...
regtest-1  | Sun Dec 15 05:13:05 UTC 2024: Test SUCCEEDED: t_pyspark:test_spark_sql_s3.py
regtest-1  | Sun Dec 15 05:13:05 UTC 2024: Tests completed with 0 successes and 0 failures
regtest-1 exited with code 0
Aborting on container exit...

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

@MonkeyCanCode MonkeyCanCode changed the title Add support for Python3.13 and fix running test on localhost Add support for Python3.13, fix running test on localhost, and move regression tests to pytests Dec 15, 2024
@MonkeyCanCode MonkeyCanCode changed the title Add support for Python3.13, fix running test on localhost, and move regression tests to pytests Add support for Python3.13, fix running test on localhost, and move regression tests to pytest Dec 15, 2024
@MonkeyCanCode
Copy link
Contributor Author

@adutra @collado-mike @eric-maynard may I get a review for this one?

@adutra
Copy link
Contributor

adutra commented Jan 13, 2025

@MonkeyCanCode from a first glance, I note that the original ask in #21 was to move the tests to pytests, but your PR has actually 3 (unrelated?) changes:

  1. Upgrade to python 3.13
  2. Fix running tests locally
  3. Move to pytests

Would it be possible to split this PR in 3? In particular I'm worried about the complete redesign of the dockerfile.

Also, the removal of regtests/run_spark_sql.sh seems problematic to me: it's used to run Spark locally for adhoc testing against a local running Polaris server:

polaris/regtests/README.md

Lines 105 to 111 in 9377aa7

## Run a spark-sql interactive shell
With in-memory standalong Polaris running:
```
./run_spark_sql.sh
```

@MonkeyCanCode
Copy link
Contributor Author

@MonkeyCanCode from a first glance, I note that the original ask in #21 was to move the tests to pytests, but your PR has actually 3 (unrelated?) changes:

  1. Upgrade to python 3.13
  2. Fix running tests locally
  3. Move to pytests

Would it be possible to split this PR in 3? In particular I'm worried about the complete redesign of the dockerfile.

Also, the removal of regtests/run_spark_sql.sh seems problematic to me: it's used to run Spark locally for adhoc testing against a local running Polaris server:

polaris/regtests/README.md

Lines 105 to 111 in 9377aa7

## Run a spark-sql interactive shell
With in-memory standalong Polaris running:
```
./run_spark_sql.sh
```

Thanks for the quick check. Yes, I can split them into 3 diff PRs later this week. So the run_spark_sql.sh won't be needed with this PR as it got refactor into running the same thing in pytest. Also, I removed the part for downloading spark distribution and installed it via pip poetry instead. But yeah, let me do them in diff PR. Also, one more thing is python 3.8 is EOL since 2024-10-07. Should we bump the minimal support version to 3.9 instead (EOL at 2025-10)?

@adutra
Copy link
Contributor

adutra commented Jan 13, 2025

Sounds good, looking forward to the new PRs!

So the run_spark_sql.sh won't be needed with this PR as it got refactor into running the same thing in pytest.

I still am concerned by this. What if users want to start a Spark SQL session just to play with Polaris locally? Is there another easy way to do this?

@flyrain
Copy link
Contributor

flyrain commented Jan 13, 2025

run_spark_sql.sh is not limited to regression testing; it’s pretty useful for local PoC or quick verifications. I believe it adds significant value in these scenarios and should not be removed. However, I’m open to relocating it to a different location if it no longer aligns with the primary purpose of regression testing.

I can split them into 3 diff PRs later this week.

+1 for splitting

@MonkeyCanCode
Copy link
Contributor Author

run_spark_sql.sh is not limited to regression testing; it’s pretty useful for local PoC or quick verifications. I believe it adds significant value in these scenarios and should not be removed. However, I’m open to relocating it to a different location if it no longer aligns with the primary purpose of regression testing.

I can split them into 3 diff PRs later this week.

+1 for splitting

Sounds good. Let me revisit this a bit later this week then update.

@MonkeyCanCode MonkeyCanCode marked this pull request as draft January 14, 2025 02:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move all regression tests to pytests
3 participants