Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework PolarisAuthorizer to use self-contained manifest on input #499

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

dimas-b
Copy link
Contributor

@dimas-b dimas-b commented Nov 29, 2024

Previously authZ checks were done based on a complex set of parameters provided individually. This PR consolidates authZ inputs into the PolarisResolutionManifest with declarative "selector" objects depending on operation context.

Naturally, Principal entity resolution is moved from the request authentication phase to authorization phase.

Request authentication merely validates the provided principal ID and/or name, but does not immediately access related entities from storage.

An anonymous authenticator and corresponding "allow all" authorizer are added as simple examples the separation of authN and authZ duties.

Also, application config is updated to allow different authorizer implementations to be chosen in runtime.

Has This Been Tested?

Unit / Integration tests

* Defines which sub-set of a {@link PolarisResolutionManifest} should be used for various
* authorization check parameters.
*/
public interface AuthEntitySelector {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public interface AuthEntitySelector {
@FunctionalInterface
public interface AuthEntitySelector {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I moved "targets" and "secondaries" inside PolarisResolutionManifest. I think this improves encapsulation. Now callers prepare the manifest and add inputs to it, then objects are resolved, then authorizer gets relevant entries directly from the manifest.

# under the License.
#

server:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference between this config file and the "default" one for integration tests is the authenticator class. But I think we need to keep it because of the tokenBroker field... Dropwizard is really crappy :-(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly

@@ -105,6 +105,8 @@ authenticator:
# type: symmetric-key
# secret: polaris

authorizer: org.apache.polaris.service.auth.CoreAuthorizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be org.apache.polaris.core.auth.CoreAuthorizer – this is why the regtests are failing.

That said I do think that moving it to polaris-service would be better :-)

Previously authZ checks were done based on a complex set of parameters
provided individually. This PR consolidates authZ inputs into the
PolarisResolutionManifest with declarative "selector" objects depending
on operation context.

Naturally, Principal entity resolution is moved from the request
authentication phase to authorization phase.

Request authentication merely validates the provided principal ID and/or
name, but does not immediately access related entities from storage.

An anonymous authenticator and corresponding "allow all" authorizer
are added as simple examples the separation of authN and authZ duties.

Also, application config is updated to allow different authorizer
implementations to be chosen in runtime.
/**
* Principal entity ID obtained during request authentication (e.g. from the authorization token).
*
* <p>Negative values indicate that a principal ID was not provided in authenticated data,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would an OptionalLong would be better here?

* (including principals, roles, and catalog objects) that are affected by the operation.
*
* <p>"activated" entities, "targets" and "secondaries" are contained within the provided
* manifest. The extra selector parameters merely define what sub-set of objects from the manifest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence is obsolete now: "The extra selector parameters merely define what sub-set of objects from the manifest should be considered as "targets", etc."

import java.util.Set;
import org.apache.polaris.core.auth.AuthenticatedPolarisPrincipal;

public final class AuthenticatedPolarisPrincipalImpl implements AuthenticatedPolarisPrincipal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a record here?

Comment on lines +299 to +303
if (subType == PolarisEntitySubType.TABLE) {
throw new NoSuchTableException("Table does not exist: %s", identifier);
} else {
throw new NoSuchViewException("View does not exist: %s", identifier);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, but below we are using the ternary operator for exactly the same logic:

Suggested change
if (subType == PolarisEntitySubType.TABLE) {
throw new NoSuchTableException("Table does not exist: %s", identifier);
} else {
throw new NoSuchViewException("View does not exist: %s", identifier);
}
throw subType == PolarisEntitySubType.TABLE
? new NoSuchTableException("Table does not exist: %s", identifier)
: new NoSuchViewException("View does not exist: %s", identifier);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a pure copy-paste of old code :) I did not try to rework it for the sake of correctness (aside from the secret rotation auth, which I had to change)

Copy link
Contributor

@collado-mike collado-mike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the additional authorizer factory approach. I'd really like to settle on #465 and #493 before we move forward here, though. There is a lot of refactoring going on and I think we should settle on a target goal before we step too much on each other's toes.

In my mind, the three areas of responsibility we're dealing with here are

  1. Authenticator - responsible for validating the authentication token, validating the subject is a real principal, and validating the principal has the roles claimed
  2. Authorizer - responsible for validating that the previously authenticated principal has the desired privileges on the target entities (this may be through grant records or some other policy-based approach)
  3. Resolver - this is largely a performance-oriented component that manages the local entity cache so that we're always looking at the latest versions of the target entities while avoiding too many queries to the persistence layer

Ideally, the responsibilities of these three components should be entirely isolated. The presence of the Resolver should be optional, as it's really a performance optimization.

@Nonnull PolarisAuthorizableOperation authzOp,
@Nullable List<PolarisResolvedPathWrapper> targets,
@Nullable List<PolarisResolvedPathWrapper> secondaries);
@Nonnull PolarisResolutionManifest manifest,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intention in #465 is to decouple the PolarisAuthorizer logic from the Resolver code. IMO, the Authorizer should take in a list of entities (principal and authz targets) without regard for how those entities were retrieved. I kept the PolarisResolvedPathWrapper types in the signature just because we need a way to pass in full entity paths (e.g., a table's lineage, including parent namespaces and catalog) and I think it's more friendly that passing around Lists of Lists. But overall, I think we should avoid tying the Authorizer to the entity resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do believe that this change actually decouples the Authorizer implementation from entity resolution. That is because Authrorizer inputs in this PR are self-sufficient when the authrorization SPI is invoked. The implementation does not have to access the model. The default implementation certainly can function purely on the data from the authrorization SPI method parameters.

I suppose you meant it to be approached the other way, when the Authorizer SPI is decoupled from entity resolution, but the Authorizer implementation resolves what is needs to resolve against the storage data model inside the authorization calls.

The latter approach is certainly possible, but it will lose strong correlation with public API inputs. My reading of current code made me think, that the Authorizer was expected to act exactly on the same state of data that the API implementations use for producing API outputs. Is that so?

In that case, and if the Authorizer has to perform extra resolutions, it will complicate the contract of the Persistence layer, which will then have to ensure consistency across calls from multiple services. WDYT?

Comment on lines +27 to +29
private final Set<String> roles;

public AuthenticatedPolarisPrincipalImpl(long id, String name, Set<String> roles) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually thinking we should move toward resolving the PrincipalRole entities prior to the authorization check. Right now, the Resolver code validates that the AuthenticatedPrincipal actually has the grant records to assume the roles specified in the authentication token. That means that we can't decouple the authenticator from the resolver, as someone needs to validate that the roles specified in the token can actually be assumed by the principal. I think that validation should happen at the Authenticator. This allows for third party identity providers to validate the roles in the token without needing grant records, while the default authenticator can continue to rely on the grant records stored by the grant manager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have slightly different idea about what the Authenticator should do, I guess :)

From my POV, an Authenticator makes sure the user credentials in the request are valid in the sense that they are not fake or forged (i.e. authentic). I would not necessarily expect the Authenticator to validate that the principal is allowed to assume the roles it wants to assume. I think that is the role of the Authorizer.

In that regard, this PR proposes internal API changes that do decouple resolving principals and and roles against Polaris entities from the Authentication layer.

The main change from my POV is making the entity ID optional in AuthenticatedPolarisPrincipal, which is specifically to allow principals that are not mirrored in the Polaris database.

While I think this change is in line with future support for 3rd party identity providers, additional change will be required, I'm sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something that was brought up in the Federated Entities doc. Principal Roles are identity roles, not like Catalog Roles, which are groups of privileges. If Principal Roles are a form of identity (e.g., LDAP groups, Okta groups, or IAM group), then membership in those groups is the purview of the Identity Provider, not the Authorizer. I.e., if LDAP says I'm a member of the ENGINEER group, then that claim should be authoritative. The privileges assigned to the ENGINEER group are the purview of the Authorizer (e.g., are there grant records to show the ENGINEER role has access to read a table, is there a policy document that granting the ENGINEER group privileges to create a table in a given namespace...).

If the Authorizer is doubly responsible for validating group membership and for determining privileges, then either the Identity Provider must also be able to store privilege assignments or the identity/group membership must be duplicated in the Authorizer (e.g., an IAM policy must exist that asserts that I'm a member of the ENGINEER group and that the ENGINEER group has the create_table privilege).

The main reason I think this strengthens the coupling is that the AuthenticatedPrincipal passes in only the role names without resolving the entities. If we remove the check for the grant records, then a caller could pass in whatever scope they want and be able to assume that role without confirmation they actually have access to that role.

What I'm hoping to do is change this class to take in the list of validated PrincipalRole entities so that the Authenticator itself is responsible for resolving the roles and validating (by whatever means) that the user has access to those roles. At that point, we can change the Resolver and Authorizer to rely on the AuthenticatedPrincipal to directly return the list of PrincipalRole entities, rather than just the names, knowing that that list has already been validated by the Authenticator.

Copy link
Contributor Author

@dimas-b dimas-b Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the check for the grant records, then a caller could pass in whatever scope they want and be able to assume that role without confirmation they actually have access to that role.

I do not think so. The caller (Polaris API client) must obtain the credentials first. If the credentials are Polaris-owned, then the authenticator validates that the principal ID is safe to use, but it does not have to load / resolve the roles because it is not know whether the roles are needed for servicing the request. Nonetheless, the authenticity of the caller is established and roles can be validated against Polaris data. If the credentials are 3rd-party, then the IdP must have assigned them (e.g. in JWT claims). Again, the authenticator can validate the claims but does not have to resolve role names to Polaris IDs.

We may need another component to deal specifically with mapping external roles / principals to Polaris entities. I would not want to overload the request authentication layer with that duty. This, I think, will complicate integration with external IdPs. For example, authenticating a JWT is in no way connected to Polaris entities. Where entities are synchronously pulled as roles from IdP or via SCIM or some other way, should not affect how credentials are verified to be trusted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, the authenticator can validate the claims but does not have to resolve role names to Polaris IDs.

We may need another component to deal specifically with mapping external roles / principals to Polaris entities.

This is fine, but right now, that work is being done by the Authenticator + the Resolver and I think that's the wrong separation. The Authenticator today doesn't validate the content of the scopes at all - that's left to the resolver and is only supported by Grant records. This means that it's not enough to add a user to a group in the 3P IdP, but the user must also be added to the group via the Polaris API so that the requisite grant record exists.

Today, the Polaris /tokens endpoint doesn't validate the contents of the scopes claim when it generates the token. We can fix this and then the Authenticator doesn't need to do the work of validating the contents of the token. But we'd still need the Resolver to trust the contents of the AuthenticatedPolarisPrincipal's activePrincipalRoleNames list and resolve the PrincipalRoles without looking for associated grant records.

@collado-mike
Copy link
Contributor

@dennishuo

@dimas-b
Copy link
Contributor Author

dimas-b commented Dec 5, 2024

Putting this to Draft state. Will rework after #465 and #493 are done.

@dimas-b dimas-b marked this pull request as draft December 5, 2024 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants