-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework PolarisAuthorizer to use self-contained manifest on input #499
base: main
Are you sure you want to change the base?
Conversation
polaris-service/src/main/java/org/apache/polaris/service/config/PolarisApplicationConfig.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/auth/PolarisAuthorizer.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/auth/ResolvedPolarisPrincipal.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/auth/ResolvedPolarisPrincipal.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/auth/ResolvedPolarisPrincipal.java
Outdated
Show resolved
Hide resolved
* Defines which sub-set of a {@link PolarisResolutionManifest} should be used for various | ||
* authorization check parameters. | ||
*/ | ||
public interface AuthEntitySelector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public interface AuthEntitySelector { | |
@FunctionalInterface | |
public interface AuthEntitySelector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I moved "targets" and "secondaries" inside PolarisResolutionManifest
. I think this improves encapsulation. Now callers prepare the manifest and add inputs to it, then objects are resolved, then authorizer gets relevant entries directly from the manifest.
polaris-core/src/main/java/org/apache/polaris/core/auth/ActivatedEntitySelector.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/auth/AllowAllAuthorizer.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/auth/CoreAuthorizer.java
Outdated
Show resolved
Hide resolved
# under the License. | ||
# | ||
|
||
server: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference between this config file and the "default" one for integration tests is the authenticator class. But I think we need to keep it because of the tokenBroker
field... Dropwizard is really crappy :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly
polaris-server.yml
Outdated
@@ -105,6 +105,8 @@ authenticator: | |||
# type: symmetric-key | |||
# secret: polaris | |||
|
|||
authorizer: org.apache.polaris.service.auth.CoreAuthorizer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be org.apache.polaris.core.auth.CoreAuthorizer
– this is why the regtests are failing.
That said I do think that moving it to polaris-service
would be better :-)
Previously authZ checks were done based on a complex set of parameters provided individually. This PR consolidates authZ inputs into the PolarisResolutionManifest with declarative "selector" objects depending on operation context. Naturally, Principal entity resolution is moved from the request authentication phase to authorization phase. Request authentication merely validates the provided principal ID and/or name, but does not immediately access related entities from storage. An anonymous authenticator and corresponding "allow all" authorizer are added as simple examples the separation of authN and authZ duties. Also, application config is updated to allow different authorizer implementations to be chosen in runtime.
497c612
to
41ddc0a
Compare
/** | ||
* Principal entity ID obtained during request authentication (e.g. from the authorization token). | ||
* | ||
* <p>Negative values indicate that a principal ID was not provided in authenticated data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: would an OptionalLong
would be better here?
* (including principals, roles, and catalog objects) that are affected by the operation. | ||
* | ||
* <p>"activated" entities, "targets" and "secondaries" are contained within the provided | ||
* manifest. The extra selector parameters merely define what sub-set of objects from the manifest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sentence is obsolete now: "The extra selector parameters merely define what sub-set of objects from the manifest should be considered as "targets", etc."
import java.util.Set; | ||
import org.apache.polaris.core.auth.AuthenticatedPolarisPrincipal; | ||
|
||
public final class AuthenticatedPolarisPrincipalImpl implements AuthenticatedPolarisPrincipal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use a record
here?
if (subType == PolarisEntitySubType.TABLE) { | ||
throw new NoSuchTableException("Table does not exist: %s", identifier); | ||
} else { | ||
throw new NoSuchViewException("View does not exist: %s", identifier); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking, but below we are using the ternary operator for exactly the same logic:
if (subType == PolarisEntitySubType.TABLE) { | |
throw new NoSuchTableException("Table does not exist: %s", identifier); | |
} else { | |
throw new NoSuchViewException("View does not exist: %s", identifier); | |
} | |
throw subType == PolarisEntitySubType.TABLE | |
? new NoSuchTableException("Table does not exist: %s", identifier) | |
: new NoSuchViewException("View does not exist: %s", identifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a pure copy-paste of old code :) I did not try to rework it for the sake of correctness (aside from the secret rotation auth, which I had to change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the additional authorizer factory approach. I'd really like to settle on #465 and #493 before we move forward here, though. There is a lot of refactoring going on and I think we should settle on a target goal before we step too much on each other's toes.
In my mind, the three areas of responsibility we're dealing with here are
- Authenticator - responsible for validating the authentication token, validating the subject is a real principal, and validating the principal has the roles claimed
- Authorizer - responsible for validating that the previously authenticated principal has the desired privileges on the target entities (this may be through grant records or some other policy-based approach)
- Resolver - this is largely a performance-oriented component that manages the local entity cache so that we're always looking at the latest versions of the target entities while avoiding too many queries to the persistence layer
Ideally, the responsibilities of these three components should be entirely isolated. The presence of the Resolver should be optional, as it's really a performance optimization.
@Nonnull PolarisAuthorizableOperation authzOp, | ||
@Nullable List<PolarisResolvedPathWrapper> targets, | ||
@Nullable List<PolarisResolvedPathWrapper> secondaries); | ||
@Nonnull PolarisResolutionManifest manifest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention in #465 is to decouple the PolarisAuthorizer logic from the Resolver code. IMO, the Authorizer should take in a list of entities (principal and authz targets) without regard for how those entities were retrieved. I kept the PolarisResolvedPathWrapper
types in the signature just because we need a way to pass in full entity paths (e.g., a table's lineage, including parent namespaces and catalog) and I think it's more friendly that passing around Lists of Lists. But overall, I think we should avoid tying the Authorizer to the entity resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do believe that this change actually decouples the Authorizer implementation from entity resolution. That is because Authrorizer inputs in this PR are self-sufficient when the authrorization SPI is invoked. The implementation does not have to access the model. The default implementation certainly can function purely on the data from the authrorization SPI method parameters.
I suppose you meant it to be approached the other way, when the Authorizer SPI is decoupled from entity resolution, but the Authorizer implementation resolves what is needs to resolve against the storage data model inside the authorization calls.
The latter approach is certainly possible, but it will lose strong correlation with public API inputs. My reading of current code made me think, that the Authorizer was expected to act exactly on the same state of data that the API implementations use for producing API outputs. Is that so?
In that case, and if the Authorizer has to perform extra resolutions, it will complicate the contract of the Persistence layer, which will then have to ensure consistency across calls from multiple services. WDYT?
private final Set<String> roles; | ||
|
||
public AuthenticatedPolarisPrincipalImpl(long id, String name, Set<String> roles) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually thinking we should move toward resolving the PrincipalRole
entities prior to the authorization check. Right now, the Resolver code validates that the AuthenticatedPrincipal
actually has the grant records to assume the roles specified in the authentication token. That means that we can't decouple the authenticator from the resolver, as someone needs to validate that the roles specified in the token can actually be assumed by the principal. I think that validation should happen at the Authenticator. This allows for third party identity providers to validate the roles in the token without needing grant records, while the default authenticator can continue to rely on the grant records stored by the grant manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have slightly different idea about what the Authenticator should do, I guess :)
From my POV, an Authenticator makes sure the user credentials in the request are valid in the sense that they are not fake or forged (i.e. authentic). I would not necessarily expect the Authenticator to validate that the principal is allowed to assume the roles it wants to assume. I think that is the role of the Authorizer.
In that regard, this PR proposes internal API changes that do decouple resolving principals and and roles against Polaris entities from the Authentication layer.
The main change from my POV is making the entity ID optional in AuthenticatedPolarisPrincipal
, which is specifically to allow principals that are not mirrored in the Polaris database.
While I think this change is in line with future support for 3rd party identity providers, additional change will be required, I'm sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is something that was brought up in the Federated Entities doc. Principal Roles are identity roles, not like Catalog Roles, which are groups of privileges. If Principal Roles are a form of identity (e.g., LDAP groups, Okta groups, or IAM group), then membership in those groups is the purview of the Identity Provider, not the Authorizer. I.e., if LDAP says I'm a member of the ENGINEER group, then that claim should be authoritative. The privileges assigned to the ENGINEER group are the purview of the Authorizer (e.g., are there grant records to show the ENGINEER role has access to read a table, is there a policy document that granting the ENGINEER group privileges to create a table in a given namespace...).
If the Authorizer is doubly responsible for validating group membership and for determining privileges, then either the Identity Provider must also be able to store privilege assignments or the identity/group membership must be duplicated in the Authorizer (e.g., an IAM policy must exist that asserts that I'm a member of the ENGINEER group and that the ENGINEER group has the create_table
privilege).
The main reason I think this strengthens the coupling is that the AuthenticatedPrincipal passes in only the role names without resolving the entities. If we remove the check for the grant records, then a caller could pass in whatever scope they want and be able to assume that role without confirmation they actually have access to that role.
What I'm hoping to do is change this class to take in the list of validated PrincipalRole
entities so that the Authenticator itself is responsible for resolving the roles and validating (by whatever means) that the user has access to those roles. At that point, we can change the Resolver and Authorizer to rely on the AuthenticatedPrincipal to directly return the list of PrincipalRole
entities, rather than just the names, knowing that that list has already been validated by the Authenticator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove the check for the grant records, then a caller could pass in whatever scope they want and be able to assume that role without confirmation they actually have access to that role.
I do not think so. The caller (Polaris API client) must obtain the credentials first. If the credentials are Polaris-owned, then the authenticator validates that the principal ID is safe to use, but it does not have to load / resolve the roles because it is not know whether the roles are needed for servicing the request. Nonetheless, the authenticity of the caller is established and roles can be validated against Polaris data. If the credentials are 3rd-party, then the IdP must have assigned them (e.g. in JWT claims). Again, the authenticator can validate the claims but does not have to resolve role names to Polaris IDs.
We may need another component to deal specifically with mapping external roles / principals to Polaris entities. I would not want to overload the request authentication layer with that duty. This, I think, will complicate integration with external IdPs. For example, authenticating a JWT is in no way connected to Polaris entities. Where entities are synchronously pulled as roles from IdP or via SCIM or some other way, should not affect how credentials are verified to be trusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the authenticator can validate the claims but does not have to resolve role names to Polaris IDs.
We may need another component to deal specifically with mapping external roles / principals to Polaris entities.
This is fine, but right now, that work is being done by the Authenticator + the Resolver and I think that's the wrong separation. The Authenticator today doesn't validate the content of the scopes at all - that's left to the resolver and is only supported by Grant records. This means that it's not enough to add a user to a group in the 3P IdP, but the user must also be added to the group via the Polaris API so that the requisite grant record exists.
Today, the Polaris /tokens
endpoint doesn't validate the contents of the scopes
claim when it generates the token. We can fix this and then the Authenticator doesn't need to do the work of validating the contents of the token. But we'd still need the Resolver to trust the contents of the AuthenticatedPolarisPrincipal
's activePrincipalRoleNames
list and resolve the PrincipalRoles without looking for associated grant records.
Previously authZ checks were done based on a complex set of parameters provided individually. This PR consolidates authZ inputs into the PolarisResolutionManifest with declarative "selector" objects depending on operation context.
Naturally, Principal entity resolution is moved from the request authentication phase to authorization phase.
Request authentication merely validates the provided principal ID and/or name, but does not immediately access related entities from storage.
An anonymous authenticator and corresponding "allow all" authorizer are added as simple examples the separation of authN and authZ duties.
Also, application config is updated to allow different authorizer implementations to be chosen in runtime.
Has This Been Tested?
Unit / Integration tests