Rework PolarisAuthorizer to use self-contained manifest on input #499

dimas-b · 2024-11-29T03:09:00Z

Previously authZ checks were done based on a complex set of parameters provided individually. This PR consolidates authZ inputs into the PolarisResolutionManifest with declarative "selector" objects depending on operation context.

Naturally, Principal entity resolution is moved from the request authentication phase to authorization phase.

Request authentication merely validates the provided principal ID and/or name, but does not immediately access related entities from storage.

An anonymous authenticator and corresponding "allow all" authorizer are added as simple examples the separation of authN and authZ duties.

Also, application config is updated to allow different authorizer implementations to be chosen in runtime.

Has This Been Tested?

Unit / Integration tests

polaris-service/src/main/java/org/apache/polaris/service/config/PolarisApplicationConfig.java

polaris-core/src/main/java/org/apache/polaris/core/auth/PolarisAuthorizer.java

polaris-core/src/main/java/org/apache/polaris/core/auth/ResolvedPolarisPrincipal.java

adutra · 2024-11-29T08:56:56Z

polaris-core/src/main/java/org/apache/polaris/core/auth/AuthEntitySelector.java

+ * Defines which sub-set of a {@link PolarisResolutionManifest} should be used for various
+ * authorization check parameters.
+ */
+public interface AuthEntitySelector {


Suggested change

public interface AuthEntitySelector {

@FunctionalInterface

public interface AuthEntitySelector {

Actually, I moved "targets" and "secondaries" inside PolarisResolutionManifest. I think this improves encapsulation. Now callers prepare the manifest and add inputs to it, then objects are resolved, then authorizer gets relevant entries directly from the manifest.

polaris-core/src/main/java/org/apache/polaris/core/auth/ActivatedEntitySelector.java

polaris-core/src/main/java/org/apache/polaris/core/auth/AllowAllAuthorizer.java

polaris-core/src/main/java/org/apache/polaris/core/auth/CoreAuthorizer.java

adutra · 2024-11-29T09:40:43Z

polaris-service/src/test/resources/polaris-server-anonymous.yml

+# under the License.
+#
+
+server:


The only difference between this config file and the "default" one for integration tests is the authenticator class. But I think we need to keep it because of the tokenBroker field... Dropwizard is really crappy :-(

adutra · 2024-11-29T09:45:56Z

polaris-server.yml

@@ -105,6 +105,8 @@ authenticator:
 #    type: symmetric-key
 #    secret: polaris

+authorizer: org.apache.polaris.service.auth.CoreAuthorizer


This should be org.apache.polaris.core.auth.CoreAuthorizer – this is why the regtests are failing.

That said I do think that moving it to polaris-service would be better :-)

Previously authZ checks were done based on a complex set of parameters provided individually. This PR consolidates authZ inputs into the PolarisResolutionManifest with declarative "selector" objects depending on operation context. Naturally, Principal entity resolution is moved from the request authentication phase to authorization phase. Request authentication merely validates the provided principal ID and/or name, but does not immediately access related entities from storage. An anonymous authenticator and corresponding "allow all" authorizer are added as simple examples the separation of authN and authZ duties. Also, application config is updated to allow different authorizer implementations to be chosen in runtime.

adutra · 2024-11-30T10:34:36Z

polaris-core/src/main/java/org/apache/polaris/core/auth/AuthenticatedPolarisPrincipal.java

+  /**
+   * Principal entity ID obtained during request authentication (e.g. from the authorization token).
+   *
+   * <p>Negative values indicate that a principal ID was not provided in authenticated data,


Nit: would an OptionalLong would be better here?

adutra · 2024-11-30T10:39:36Z

polaris-core/src/main/java/org/apache/polaris/core/auth/PolarisAuthorizer.java

+   * (including principals, roles, and catalog objects) that are affected by the operation.
+   *
+   * <p>"activated" entities, "targets" and "secondaries" are contained within the provided
+   * manifest. The extra selector parameters merely define what sub-set of objects from the manifest


I think this sentence is obsolete now: "The extra selector parameters merely define what sub-set of objects from the manifest should be considered as "targets", etc."

adutra · 2024-11-30T10:42:04Z

...service/src/main/java/org/apache/polaris/service/auth/AuthenticatedPolarisPrincipalImpl.java

+import java.util.Set;
+import org.apache.polaris.core.auth.AuthenticatedPolarisPrincipal;
+
+public final class AuthenticatedPolarisPrincipalImpl implements AuthenticatedPolarisPrincipal {


Maybe use a record here?

adutra · 2024-11-30T10:43:47Z

...s-service/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java

+          if (subType == PolarisEntitySubType.TABLE) {
+            throw new NoSuchTableException("Table does not exist: %s", identifier);
+          } else {
+            throw new NoSuchViewException("View does not exist: %s", identifier);
+          }


Nitpicking, but below we are using the ternary operator for exactly the same logic:

Suggested change

if (subType == PolarisEntitySubType.TABLE) {

throw new NoSuchTableException("Table does not exist: %s", identifier);

} else {

throw new NoSuchViewException("View does not exist: %s", identifier);

}

throw subType == PolarisEntitySubType.TABLE

? new NoSuchTableException("Table does not exist: %s", identifier)

: new NoSuchViewException("View does not exist: %s", identifier);

this was a pure copy-paste of old code :) I did not try to rework it for the sake of correctness (aside from the secret rotation auth, which I had to change)

collado-mike

I like the additional authorizer factory approach. I'd really like to settle on #465 and #493 before we move forward here, though. There is a lot of refactoring going on and I think we should settle on a target goal before we step too much on each other's toes.

In my mind, the three areas of responsibility we're dealing with here are

Authenticator - responsible for validating the authentication token, validating the subject is a real principal, and validating the principal has the roles claimed
Authorizer - responsible for validating that the previously authenticated principal has the desired privileges on the target entities (this may be through grant records or some other policy-based approach)
Resolver - this is largely a performance-oriented component that manages the local entity cache so that we're always looking at the latest versions of the target entities while avoiding too many queries to the persistence layer

Ideally, the responsibilities of these three components should be entirely isolated. The presence of the Resolver should be optional, as it's really a performance optimization.

collado-mike · 2024-12-02T17:38:49Z

polaris-core/src/main/java/org/apache/polaris/core/auth/PolarisAuthorizer.java

-      @Nonnull PolarisAuthorizableOperation authzOp,
-      @Nullable List<PolarisResolvedPathWrapper> targets,
-      @Nullable List<PolarisResolvedPathWrapper> secondaries);
+      @Nonnull PolarisResolutionManifest manifest,


My intention in #465 is to decouple the PolarisAuthorizer logic from the Resolver code. IMO, the Authorizer should take in a list of entities (principal and authz targets) without regard for how those entities were retrieved. I kept the PolarisResolvedPathWrapper types in the signature just because we need a way to pass in full entity paths (e.g., a table's lineage, including parent namespaces and catalog) and I think it's more friendly that passing around Lists of Lists. But overall, I think we should avoid tying the Authorizer to the entity resolution.

I do believe that this change actually decouples the Authorizer implementation from entity resolution. That is because Authrorizer inputs in this PR are self-sufficient when the authrorization SPI is invoked. The implementation does not have to access the model. The default implementation certainly can function purely on the data from the authrorization SPI method parameters.

I suppose you meant it to be approached the other way, when the Authorizer SPI is decoupled from entity resolution, but the Authorizer implementation resolves what is needs to resolve against the storage data model inside the authorization calls.

The latter approach is certainly possible, but it will lose strong correlation with public API inputs. My reading of current code made me think, that the Authorizer was expected to act exactly on the same state of data that the API implementations use for producing API outputs. Is that so?

In that case, and if the Authorizer has to perform extra resolutions, it will complicate the contract of the Persistence layer, which will then have to ensure consistency across calls from multiple services. WDYT?

collado-mike · 2024-12-02T17:50:48Z

...service/src/main/java/org/apache/polaris/service/auth/AuthenticatedPolarisPrincipalImpl.java

+  private final Set<String> roles;
+
+  public AuthenticatedPolarisPrincipalImpl(long id, String name, Set<String> roles) {


I'm actually thinking we should move toward resolving the PrincipalRole entities prior to the authorization check. Right now, the Resolver code validates that the AuthenticatedPrincipal actually has the grant records to assume the roles specified in the authentication token. That means that we can't decouple the authenticator from the resolver, as someone needs to validate that the roles specified in the token can actually be assumed by the principal. I think that validation should happen at the Authenticator. This allows for third party identity providers to validate the roles in the token without needing grant records, while the default authenticator can continue to rely on the grant records stored by the grant manager.

We may have slightly different idea about what the Authenticator should do, I guess :)

From my POV, an Authenticator makes sure the user credentials in the request are valid in the sense that they are not fake or forged (i.e. authentic). I would not necessarily expect the Authenticator to validate that the principal is allowed to assume the roles it wants to assume. I think that is the role of the Authorizer.

In that regard, this PR proposes internal API changes that do decouple resolving principals and and roles against Polaris entities from the Authentication layer.

The main change from my POV is making the entity ID optional in AuthenticatedPolarisPrincipal, which is specifically to allow principals that are not mirrored in the Polaris database.

While I think this change is in line with future support for 3rd party identity providers, additional change will be required, I'm sure.

I think this is something that was brought up in the Federated Entities doc. Principal Roles are identity roles, not like Catalog Roles, which are groups of privileges. If Principal Roles are a form of identity (e.g., LDAP groups, Okta groups, or IAM group), then membership in those groups is the purview of the Identity Provider, not the Authorizer. I.e., if LDAP says I'm a member of the ENGINEER group, then that claim should be authoritative. The privileges assigned to the ENGINEER group are the purview of the Authorizer (e.g., are there grant records to show the ENGINEER role has access to read a table, is there a policy document that granting the ENGINEER group privileges to create a table in a given namespace...).

If the Authorizer is doubly responsible for validating group membership and for determining privileges, then either the Identity Provider must also be able to store privilege assignments or the identity/group membership must be duplicated in the Authorizer (e.g., an IAM policy must exist that asserts that I'm a member of the ENGINEER group and that the ENGINEER group has the create_table privilege).

The main reason I think this strengthens the coupling is that the AuthenticatedPrincipal passes in only the role names without resolving the entities. If we remove the check for the grant records, then a caller could pass in whatever scope they want and be able to assume that role without confirmation they actually have access to that role.

What I'm hoping to do is change this class to take in the list of validated PrincipalRole entities so that the Authenticator itself is responsible for resolving the roles and validating (by whatever means) that the user has access to those roles. At that point, we can change the Resolver and Authorizer to rely on the AuthenticatedPrincipal to directly return the list of PrincipalRole entities, rather than just the names, knowing that that list has already been validated by the Authenticator.

If we remove the check for the grant records, then a caller could pass in whatever scope they want and be able to assume that role without confirmation they actually have access to that role.

I do not think so. The caller (Polaris API client) must obtain the credentials first. If the credentials are Polaris-owned, then the authenticator validates that the principal ID is safe to use, but it does not have to load / resolve the roles because it is not know whether the roles are needed for servicing the request. Nonetheless, the authenticity of the caller is established and roles can be validated against Polaris data. If the credentials are 3rd-party, then the IdP must have assigned them (e.g. in JWT claims). Again, the authenticator can validate the claims but does not have to resolve role names to Polaris IDs.

We may need another component to deal specifically with mapping external roles / principals to Polaris entities. I would not want to overload the request authentication layer with that duty. This, I think, will complicate integration with external IdPs. For example, authenticating a JWT is in no way connected to Polaris entities. Where entities are synchronously pulled as roles from IdP or via SCIM or some other way, should not affect how credentials are verified to be trusted.

Again, the authenticator can validate the claims but does not have to resolve role names to Polaris IDs.

We may need another component to deal specifically with mapping external roles / principals to Polaris entities.

This is fine, but right now, that work is being done by the Authenticator + the Resolver and I think that's the wrong separation. The Authenticator today doesn't validate the content of the scopes at all - that's left to the resolver and is only supported by Grant records. This means that it's not enough to add a user to a group in the 3P IdP, but the user must also be added to the group via the Polaris API so that the requisite grant record exists.

Today, the Polaris /tokens endpoint doesn't validate the contents of the scopes claim when it generates the token. We can fix this and then the Authenticator doesn't need to do the work of validating the contents of the token. But we'd still need the Resolver to trust the contents of the AuthenticatedPolarisPrincipal's activePrincipalRoleNames list and resolve the PrincipalRoles without looking for associated grant records.

collado-mike · 2024-12-02T18:35:57Z

@dennishuo

dimas-b · 2024-12-05T15:23:43Z

Putting this to Draft state. Will rework after #465 and #493 are done.

dimas-b requested review from jbonofre, ashvina, RussellSpitzer, snazy, vvcephei, takidau, jackye1995, flyrain, eric-maynard, collado-mike, ebyhr and adutra as code owners November 29, 2024 03:09

adutra reviewed Nov 29, 2024

View reviewed changes

dimas-b added 2 commits November 29, 2024 17:19

review: Add authorizer factory

41ddc0a

dimas-b force-pushed the principal-in-api branch from 497c612 to 41ddc0a Compare November 29, 2024 22:19

dimas-b added 5 commits November 29, 2024 17:35

review: remove statics from AuthenticatedPolarisPrincipal

5776153

review: move AuthenticatedPolarisPrincipalImpl to polaris-service

f10c804

review: remove ResolvedPolarisPrincipal

3e2aaa1

review: clarify authZ on one's own secret rotation

75841e7

review: move authZ targets into PolarisResolutionManifest

c9b9545

adutra reviewed Nov 30, 2024

View reviewed changes

collado-mike reviewed Dec 2, 2024

View reviewed changes

dimas-b marked this pull request as draft December 5, 2024 15:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework PolarisAuthorizer to use self-contained manifest on input #499

Rework PolarisAuthorizer to use self-contained manifest on input #499

dimas-b commented Nov 29, 2024 •

edited

Loading

adutra Nov 29, 2024

dimas-b Nov 30, 2024

adutra Nov 29, 2024

dimas-b Nov 29, 2024

adutra Nov 29, 2024

adutra Nov 30, 2024

adutra Nov 30, 2024

adutra Nov 30, 2024

adutra Nov 30, 2024

dimas-b Dec 2, 2024

collado-mike left a comment

collado-mike Dec 2, 2024

dimas-b Dec 2, 2024

collado-mike Dec 2, 2024

dimas-b Dec 2, 2024

collado-mike Dec 2, 2024

dimas-b Dec 2, 2024 •

edited

Loading

collado-mike Dec 6, 2024

collado-mike commented Dec 2, 2024

dimas-b commented Dec 5, 2024 •

edited

Loading

	public interface AuthEntitySelector {
	@FunctionalInterface
	public interface AuthEntitySelector {

		private final Set<String> roles;

		public AuthenticatedPolarisPrincipalImpl(long id, String name, Set<String> roles) {

Rework PolarisAuthorizer to use self-contained manifest on input #499

Are you sure you want to change the base?

Rework PolarisAuthorizer to use self-contained manifest on input #499

Conversation

dimas-b commented Nov 29, 2024 • edited Loading

Has This Been Tested?

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

collado-mike left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dimas-b Dec 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

collado-mike commented Dec 2, 2024

dimas-b commented Dec 5, 2024 • edited Loading

dimas-b commented Nov 29, 2024 •

edited

Loading

dimas-b Dec 2, 2024 •

edited

Loading

dimas-b commented Dec 5, 2024 •

edited

Loading