Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix polaris CLI support for updating catalog properties and hitting real Polaris services #404

Conversation

dennishuo
Copy link
Contributor

@dennishuo dennishuo commented Oct 25, 2024

Add a --base-url argument that can be used instead of --host --port to support arbitrary Polaris URLs including https ones that have additional base prefixes before the /api/*/v1 subpaths.

Fix namespaces.py to parse such URLs as well.

Fix catalogs update to explicitly preserve default-base-location in the update request and use dict merge semantics for specified properties instead of just replacing the entire catalog contents with only what's specified in cli arguments, otherwise default-base-location or other properties get clobbered unexpectedly.

Make the test_cli.py abide by POLARIS_HOST so that it can be pointed at a basic ./gradlew runApp with POLARIS_HOST=localhost even without a docker container.

Add tests for updating catalog properties.

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Manual testing using misc Polaris services
  • ./run.sh t_cli/test_cli.py

Test Configuration:

  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

…eal Polaris services

Add a --base-url argument that can be used instead of --host --port to support
arbitrary Polaris URLs including https ones that have additional base prefixes
before the /api/*/v1 subpaths.

Fix namespaces.py to parse such URLs as well.

Fix `catalogs update` to explicitly preserve default-base-location in the update
request and use dict merge semantics for specified properties instead of just
replacing the entire catalog contents with only what's specified in cli arguments,
otherwise default-base-location or other properties get clobbered unexpectedly.

Make the test_cli.py abide by POLARIS_HOST so that it can be pointed at a basic
`./gradlew runApp` with `POLARIS_HOST=localhost` even without a docker container.

Add tests for updating catalog properties.
Add checks for not specifying both --base-url and --host/--port
Remove duplicate examples
Copy link
Contributor

@eric-maynard eric-maynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, left a few suggestions. Thanks for the fix!

@eric-maynard
Copy link
Contributor

eric-maynard commented Oct 25, 2024

I took another look at the namespaces command compared to the updated catalogs command, and I have one concern about this change.

Currently, the other commands support an update that overwrites properties. We probably shouldn't change catalog's update to more of a "merge" type operation without changing the other commands at the same time. There's also something to be said for breaking the existing APIs.

I am supportive of us changing all of the APIs to more of a "merge" style update (at the same time?), but in that case I would like to see us simultaneously add new APIs to actually remove properties / property-adjacent fields like default-base-location.

…nds instead of using --property.

This pattern matches Iceberg Spark's behavior for e.g. ALTER TABLE SET TBLPROPERTIES and ALTER TABLE UNSET TBLPROPERTIES
where the SET command has "merge" semantics and thus UNSET is analogous to the separate --remove-property. This
syntax is also consistent with other cloud providers' command-line tools such as `gcloud compute instances update --update-labels ... --remove-labels

Implement consistent merge/remove pattern for catalogs/principals/principal-roles/catalog-roles

Fix `namespaces create` to correctly merge --location with --property

Fix `namespaces get` to print the full response json after calling the right load_namespace_metadata API instead of only calling namespace_exists

Fix the handling of --base-url being exclusive to --host/--port
Copy link
Contributor Author

@dennishuo dennishuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to have consistent merge behavior, introduced --set-property and --remove-property which is how other CLIs that follow the REST style have handled this case, such as gcloud compute instances update --update-labels ... --remove-labels ...: https://cloud.google.com/sdk/gcloud/reference/compute/instances/update

We probably don't need to be too worried about this being a "breaking" change to the CLI because:

  1. catalogs update never even worked locally because of line 192 having catalog=catalog instead of properties=.... -- UpdateCatalogRequest doesn't actually have a catalog member so the update doesn't work
  2. principals.py on line 79 had current_entity_version=principal.current_entity_version, when there is no such member on the right-hand-side (it should be principal.entity_version) so it also errored out
  3. No versioned release of Polaris yet anyways

regtests/client/python/cli/polaris_cli.py Outdated Show resolved Hide resolved
regtests/t_cli/src/test_cli.py Show resolved Hide resolved
regtests/client/python/cli/command/catalogs.py Outdated Show resolved Hide resolved
@@ -46,6 +46,8 @@ class CatalogRolesCommand(Command):
catalog_role_name: str
principal_role_name: str
properties: Optional[Dict[str, StrictStr]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does properties do now? Can we remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. could you use set-property during create as well?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking for preserving --property for create is to emphasize the contrasting "declarative" semantics of CREATE vs the choice of having an "imperative" semantic for UPDATE. Looking at examples of other CLIs with a similar split, the same pattern is used:

gcloud compute instances create ... --labels=...
gcloud compute instances update ... --update-labels=... --remove-labels=...

And similarly --update-labels and --remove-labels aren't available in the create command, while --labels is not available in the update command.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. My initial hesitation was just having two properties that do the same thing.

Lets keep this, but I think we should reject the "invalid" inputs then:
catalogs update some_catalog --property k=v
catalogs update some_catalog --set-property k=v

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this test case in test_cli_parsing.py covers the rejection of --property:

https://github.com/apache/polaris/pull/404/files#diff-437c53a66ce1039f3ff70bef07bb0325924edf2b8f2c864762969e6ecdde3ecfR58

    with self.assertRaises(SystemExit) as cm:
        Parser.parse(['catalogs', 'update', 'catalog_name', '--property', 'foo=bar'])
    self.assertEqual(cm.exception.code, INVALID_ARGS)

…hon-cli-for-catalog-updates-and-managed-polaris-services
@eric-maynard eric-maynard merged commit ea7ffeb into apache:main Oct 30, 2024
5 checks passed
@dennishuo dennishuo deleted the dhuo-fix-python-cli-for-catalog-updates-and-managed-polaris-services branch October 31, 2024 01:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants