Add a flag to control whether credentials are printed during bootstrapping #461
@@ -181,6 +196,19 @@ private PrincipalSecretsResult bootstrapServiceAndCreatePolarisPrincipalForRealm
    throw new IllegalArgumentException(overrideMessage);
  }

  // TODO rebase onto #422, call a method like PrincipalSecretsGenerator.hasEnvironmentVariables
idea: maybe pass a flag down to `PrincipalSecretsGenerator` to not use random secrets if `printCredentials` is false? Then the `PrincipalSecretsGenerator` can simply throw if the specific realm/user combination is missing env. vars. WDYT?
I like that idea. If there is a good pathway from the bootstrap command down to the `PrincipalSecretsGenerator` then I think that works as well. It should hopefully be clearer when #422 merges.
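The idea discussed here, a generator that refuses to fall back to random secrets when nobody would see them, could be sketched roughly as follows. This is an illustration only, not the PR's code: `StrictSecretsSource`, the `allowRandom` flag, and the key suffixes are hypothetical names, and the environment is modeled as a `Function<String, String>` so the snippet runs without real environment variables.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.function.Function;

/** Hypothetical sketch: a secrets source that can be told not to invent random secrets. */
class StrictSecretsSource {
  private final Function<String, String> env; // lookup function, e.g. System::getenv
  private final boolean allowRandom;          // hypothetical flag mirroring "print credentials"

  StrictSecretsSource(Function<String, String> env, boolean allowRandom) {
    this.env = env;
    this.allowRandom = allowRandom;
  }

  /** Returns {clientId, clientSecret} for the given (assumed) key prefix. */
  String[] secretsFor(String keyPrefix) {
    String id = env.apply(keyPrefix + "_CLIENT_ID");
    String secret = env.apply(keyPrefix + "_CLIENT_SECRET");
    if (id != null && secret != null) {
      return new String[] {id, secret};
    }
    if (!allowRandom) {
      // A random secret would never be shown to anyone, so fail instead of generating one.
      throw new IllegalStateException("No credentials provided for " + keyPrefix);
    }
    return new String[] {randomToken(8), randomToken(16)};
  }

  private static String randomToken(int numBytes) {
    byte[] bytes = new byte[numBytes];
    new SecureRandom().nextBytes(bytes);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
  }
}
```

With `allowRandom` set to `false`, a missing realm/user combination surfaces as an exception at generation time rather than as unrecoverable credentials later.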
...e/src/main/java/org/apache/polaris/core/persistence/LocalPolarisMetaStoreManagerFactory.java
Hey @dimas-b, do you mind taking a look now that #422 has merged? I think the integration is easy enough with some slight refactoring to I left the current behavior wrt. using env variables even when printing is enabled, since if that's what the user decides to explicitly configure we can respect it. In the worst case we are just echoing env variables.
polaris-core/src/main/java/org/apache/polaris/core/persistence/PrincipalSecretsGenerator.java
if (this.printCredentials(polarisContext)) {
  String msg =
      String.format(
          "realm: %1s root principal credentials: %2s:%3s",
          realmContext.getRealmIdentifier(),
          secretsResult.getPrincipalSecrets().getPrincipalClientId(),
          secretsResult.getPrincipalSecrets().getMainSecret());
  System.out.println(msg);
}
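A side note on the format string in this snippet: in `java.util.Formatter`, `%1s`, `%2s` and `%3s` are minimum field widths, not argument positions, so the arguments are still consumed in order but short values are left-padded with spaces. The explicit-index form is `%1$s`. The sketch below contrasts the two; `FormatSpecifierDemo` is a hypothetical name used only for illustration.

```java
/** Hypothetical demo contrasting width specifiers with explicit argument indexes. */
class FormatSpecifierDemo {
  static String widthStyle(String realm, String id, String secret) {
    // "%2s" means "a string at least 2 characters wide", taken from the next argument.
    return String.format("realm: %1s root principal credentials: %2s:%3s", realm, id, secret);
  }

  static String indexStyle(String realm, String id, String secret) {
    // "%2$s" means "the second argument, formatted as a string".
    return String.format("realm: %1$s root principal credentials: %2$s:%3$s", realm, id, secret);
  }
}
```

For realistic client IDs and secrets, which are longer than three characters, both forms produce the same text; they differ only for very short values.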
I think this logic belongs to the secrets generator. The MetaStoreManager doesn't need to know anything about whether the secrets generated are provided by the user or if they've been generated randomly. So why would it be concerned with printing the credentials? The secrets generator knows if the secrets were provided explicitly or if they were randomly generated.
I think the `bootstrap` command should take a `print-credentials` config flag and the constructed secrets generator can react accordingly.
It seems like the `PrincipalSecretsGenerator` is doing exactly what the name suggests: generating secrets.
Whatever is done with those secrets -- persisting them, using them, printing them -- is outside the purview of the generator itself.
You are right that the `MetaStoreManager` doesn't need to know anything about printing either (it doesn't in this PR) and clearly this should be outside the purview of the metastore itself.
And so we landed on the factory. I would be happy to take this bootstrapping logic and excise it to somewhere more idiomatic if that is a concern. But right now the bootstrapping logic lives here (e.g. the `purge` check) and this seems like the most appropriate place that doesn't change the responsibility of either the metastore or generator classes.
Looking again, is your objection specifically to the protected method `printCredentials`?
That only exists to support the legacy behavior of the in-memory metastore always printing credentials, and if possible I would very much be in favor of removing that.
However it feels like pushing that logic down into an existing method (whether `secretsGenerator`, `createMetaStoreSession`, or elsewhere) could be a bit hacky if it winds up somewhere it doesn't belong.
boolean environmentVariableCredentials =
    PrincipalSecretsGenerator.hasCredentialVariables(
        realmContext.getRealmIdentifier(), PolarisEntityConstants.getRootPrincipalName());
if (!this.printCredentials(polarisContext) && !environmentVariableCredentials) {
  String failureMessage =
      String.format(
          "It appears that environment variables were not provided for root credentials, and that printing "
              + "the root credentials is disabled via %s. If bootstrapping were to proceed, there would be no way "
              + "to recover the root credentials",
          PolarisConfiguration.BOOTSTRAP_PRINT_CREDENTIALS.key);
  LOGGER.error("\n\n {} \n\n", failureMessage);
Similar here - why is the metastore aware of whether the secrets were provided by environment variables? What if there are other impls of secrets generators that don't rely on env variables? E.g., we could have one that calls AWS SecretsManager to dynamically generate and store the secrets without any env variables. Should this code throw an exception?
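For reference, the guard under review reduces to a decision over two booleans, which the sketch below isolates. `BootstrapGuard` is a hypothetical stand-in and not a class in the PR; the real check reads the flag from `PolarisConfiguration` and the environment through `PrincipalSecretsGenerator.hasCredentialVariables`.

```java
/** Hypothetical stand-in for the pre-flight check shown in the diff. */
class BootstrapGuard {
  /**
   * Fails only when credentials would be neither printed nor supplied by the user,
   * i.e. when nobody could ever learn the generated root secret.
   */
  static void check(boolean printCredentials, boolean envCredentialsProvided) {
    if (!printCredentials && !envCredentialsProvided) {
      throw new IllegalStateException(
          "Root credentials would be unrecoverable: not printed and not supplied by the user");
    }
  }
}
```

Written this way, the objection is easy to see: a generator backed by an external secrets manager would pass `envCredentialsProvided = false` and be rejected even though its secrets are recoverable, because the check encodes knowledge of one specific generator implementation.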
String clientId = config.apply(propId.toUpperCase(Locale.ROOT));
Did we lose uppercasing in the new code?
Yeah, I actually think this is wrong, isn't it? It seems like you can have both a `dimas` and a `DIMAS` user, so how would you differentiate them in the env variables?
This is assuming we now allow the use of env variables for non-root users.
Mixed-case env. variables look odd, but technically we can support them. I do not mind ;)
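The collision raised in this exchange can be shown in a few lines. The key layout below (`POLARIS_BOOTSTRAP_<realm>_<principal>_CLIENT_ID`) is an assumption inferred from the keys used in the PR's test, and `EnvKeyDemo` is a hypothetical helper, not Polaris code.

```java
import java.util.Locale;

/** Hypothetical helper showing why upper-casing principal names can collide. */
class EnvKeyDemo {
  // Assumed naming scheme, inferred from the test keys in this PR.
  static String clientIdKey(String realm, String principal, boolean upperCase) {
    String key = String.format("POLARIS_BOOTSTRAP_%s_%s_CLIENT_ID", realm, principal);
    // Locale.ROOT avoids locale-specific case mappings (e.g. the Turkish dotless i).
    return upperCase ? key.toUpperCase(Locale.ROOT) : key;
  }
}
```

With upper-casing, `dimas` and `DIMAS` both map to `POLARIS_BOOTSTRAP_REALM_DIMAS_CLIENT_ID` for a realm named `realm`; without it, the two principals get distinct, if unusual-looking, variable names.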
 * @return A {@link PrincipalSecretsGenerator} that can generate secrets through `produceSecrets`
 */
public static PrincipalSecretsGenerator bootstrap(String realmName) {
  return new DefaultPrincipalSecretsGenerator(realmName);
nit: maybe rename `DefaultPrincipalSecretsGenerator` -> `BootstrapPrincipalSecretsGenerator`?.. it is not actually default in `LocalPolarisMetaStoreManagerFactory`
Good point. It's just a wrapper, so I was not sure what to call it. Since it's only used during bootstrap, let me change to `BootstrapPrincipalSecretsGenerator` for now
...is-core/src/test/java/org/apache/polaris/core/persistence/PrincipalSecretsGeneratorTest.java
String clientIdKey = "POLARIS_BOOTSTRAP_REALM_PRINCIPAL_CLIENT_ID";
String clientSecretKey = "POLARIS_BOOTSTRAP_REALM_PRINCIPAL_CLIENT_SECRET";

doReturn("test-id").when(psg).getEnvironmentVariable(clientIdKey);
I think it would be nicer to allow `EnvVariablePrincipalSecretsGenerator` to take an explicit `Map` (or a `Function<String, String>`) as a constructor argument. Then the "environment" part can be just a static factory method like `EnvVariablePrincipalSecretsGenerator.fromEnv()`.
Testing the static method would not really be necessary because it would be a one-line redirect to `System.getenv()`, but it would allow for a nicer design that is testable without `Mockito`.
Just my 2 cents :)
I like this idea, but my concern is that `EnvVariablePrincipalSecretsGenerator` essentially becomes `MapPrincipalSecretsGenerator` with that approach. I would rather not place the burden on its callers to do the `fromEnv` / `getEnv`; this seems like the exact responsibility we would like `EnvVariablePrincipalSecretsGenerator` to take on.
For example: if we later add command-line options to `bootstrap` that allow you to provide credentials, would we use the `EnvVariablePrincipalSecretsGenerator` for that? If it takes a map, we could.
Taking this further, we could even pass in a map with random secrets to `EnvVariablePrincipalSecretsGenerator`!
And so its role, as well as that of `RandomPrincipalSecretsGenerator`, can quickly become quite unclear
Fair enough. I'm fine with the current code in this PR.
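For readers following the design question in this sub-thread, the constructor-injection alternative that was proposed and then set aside might look roughly like this. It is only a sketch of the idea: `LookupSecretsGenerator` and its methods are hypothetical names, and this is not what the PR merged.

```java
import java.util.function.Function;

/** Hypothetical sketch: the variable lookup is injected, so tests need no mocking. */
class LookupSecretsGenerator {
  private final Function<String, String> lookup;

  LookupSecretsGenerator(Function<String, String> lookup) {
    this.lookup = lookup;
  }

  /** Production factory: a one-line binding to the real process environment. */
  static LookupSecretsGenerator fromEnv() {
    return new LookupSecretsGenerator(System::getenv);
  }

  /** Returns the value for the key, or null when the variable is not set. */
  String get(String key) {
    return lookup.apply(key);
  }

  boolean hasCredentials(String clientIdKey, String clientSecretKey) {
    return get(clientIdKey) != null && get(clientSecretKey) != null;
  }
}
```

A test can then pass `Map.of("KEY", "value")::get` instead of stubbing a method, which is the testability gain; the counter-argument in the reply is that such a class no longer has anything specific to do with the environment.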
}

@Override
protected PrincipalSecretsGenerator buildEnvVariablePrincipalSecretsGenerator(
same here: I think we could have a simple factory that binds to `System.getenv()` at runtime, but uses explicit constructor parameters in tests... This will remove reliance on `Mockito`.
In this case what I think we really want is DI 😔
...e/polaris/extension/persistence/impl/eclipselink/PolarisEclipseLinkMetaStoreSessionImpl.java
It seems odd that Polaris determines whether bootstrapping has failed based on a configuration controlling whether credentials are printed. IIUC, #438 removed plain text secrets from the metastore, meaning these secrets cannot be retrieved unless they are printed in the console. Would it be more reasonable to always print the credentials if they are generated by Polaris? This ensures the secrets remain accessible when needed without relying on an external configuration.
The issue at hand is that currently credentials are unrecoverable after bootstrapping, which needs to be fixed ASAP.
@collado-mike expressed concern about an approach like this some time ago. I think a configuration, or perhaps better a CLI argument to the This last point is also very important to consider: some metastore implementations could allow secrets to be retrieved, in which case it's okay to bootstrap without printing credentials. The problem is that after #438 EclipseLink does not allow this.
I think it is generally not a good idea to store retrievable secrets in the metastore. If we want that functionality it would probably be preferable to integrate with well-known secret managers (e.g. k8s secrets, cloud-specific secret managers, Vault, etc.).
Completely agree with this. I'd extend this even to not put any credentials into a server log at all, because that information is not just "ephemeral on a console window", but logs can easily go into 3rd party systems, which would then make those clear text credentials easily accessible.
Agreed as well. But for the time being we need to make the EclipseLink metastore work again. This is significantly better than both the current state and where we were before #438. I added a note to the relevant doc clarifying that we don't recommend using this in production, where users should provide secure credentials through environment variables.
Agreed to not log the secrets, but I also feel the urgency of fixing EclipseLink. How about writing the secrets into a separate file? Here are the benefits:
1. A file can potentially be integrated with third-party secret managers in the future.
2. It avoids putting secrets in logs.
3. No configuration item is needed: always persist the secrets file in case of auto-generation.
In reply to Yufei Gu's comment of Mon, Dec 9, 2024 at 10:22 AM:
I added a note in the most recent commit to clarify, but the best thing users can do is provide secrets through env variables. The credential printing is there as a workaround (the in-memory metastore always does this) for testing or development.
The important thing is we don't let users brick the metastore. If printing is super controversial then we can always just require the env variables for now.
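The separate-file idea raised in this part of the thread could look something like the sketch below. It is an illustration under stated assumptions, not a proposal for the actual implementation: `SecretsFileWriter`, the file name `root-credentials.txt`, and the `id:secret` line format are all hypothetical, and the permission handling assumes a POSIX file system.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

/** Hypothetical sketch: persist generated root credentials to an owner-only file. */
class SecretsFileWriter {
  static Path write(Path dir, String clientId, String clientSecret) throws IOException {
    Path file = dir.resolve("root-credentials.txt"); // assumed file name
    // createFile fails if the file already exists, so earlier secrets are never overwritten.
    // "rw-------" restricts access to the owning user (POSIX file systems only).
    Files.createFile(
        file, PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rw-------")));
    Files.write(file, (clientId + ":" + clientSecret + "\n").getBytes(StandardCharsets.UTF_8));
    return file;
  }
}
```

This keeps the secret out of stdout and logs, but it does not settle the broader concern voiced in the thread: the file is still a plaintext copy that the operator has to protect or delete.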
My thoughts here are complex. On the one hand, I agree that we should never print credentials to the service log. On the other hand, users need a way of bootstrapping their Polaris service and storing their secrets. This is one reason why we made the My hope for the I don't think the user should need a separate secrets store, like Vault or whatever, in order to be able to use Polaris at all. If the RDBMS persistence impl is enough for them to store hashed secrets and they can manage their principal secrets in some other way, we should support that. So for the user who has no separate secrets store and only stores hashes of the secret in Postgres, how do we get the user their secrets? Either we can require the user to pass in secrets as an argument to bootstrapping or we can randomly generate secrets and print them out for the user. It seems that we can support randomly generated credentials when we do have a separate secrets store, but I don't really like the idea of the secrets manager having to declare that its secrets are retrievable by the end user. So if we require the user to pass in the secrets as an argument, then I think we should always require the secrets as an argument. And if we don't always take the secrets as an argument, then we need to pass randomly generated secrets back to the user in some way - printing seems obvious.
So it looks like the approach proposed in this PR (while I keep my non-binding approval) appears to be not robust enough. I'd like to propose to move the printing of generated credentials to the For that matter, I think even the generation of random secrets should be delegated to the
I'm quite open (and probably brutal) here: logging or storing plain/clear text credentials is a severe security issue that justifies a CVE. The process of creating credentials must really be a command that only allows the user to grab the secrets - but only once - not stored anywhere - not explicitly or implicitly (or ephemerally), accessible by other tools (database, logging system/files, etc). If the user does not grab the generated secrets, bummer. If the bootstrap process cannot ensure this, then the bootstrap process has to be changed. Security is a very sensitive topic - and an absolute necessity for the production readiness of Apache Polaris! Generally, I do not think that Apache Polaris should get into the business of handling identities or secrets, but rather interface w/ systems that are purely there for these kinds of things. The currently built-in secrets handling should IMHO entirely go away.
+1 to this approach. We could introduce an additional step before starting the Polaris instance to handle secret generation and environment variable setup. This step could take one of the following forms:
Additionally, users are responsible for ensuring anything generated during this step, such as logs or extra files, is not leaked to unsecured locations to maintain security.
Conceptually, bootstrapping may not even be the right term here :) This is not about helping the server to start and run, but about defining some prerequisite persisted objects (the root principal, specifically). How about removing "bootstrap" methods from For the in-memory case, we could probably execute
This is all valuable discussion, but I am worried we are going in circles a bit. For now, does someone else want to take a crack at a minimal change to fix the experience of using EclipseLink?
I'd not be too worried about the name(
With respect to the apparent security issues, I can only echo @collado-mike's comment -- the stdout of the bootstrap command is not the Polaris log. As such, this is neither Moreover, I question whether doing such a thing iff the user sets a parameter called
I agree with this point. But we must give them an opportunity to "grab" them from somewhere. If not stdout, then a file, or env variables. Anything is fine. But the current behavior is untenable.
While I appreciate the security focus, I think the reality is that we are going to have to be able to manage secrets in Polaris as a standalone application for a long time (forever?). External secrets management systems like Vault or external identity providers like Okta are fantastic and I definitely promote their usage, but the reality is that some installations will simply not want/need the overhead of managing an external service. I think our approach should be that core features work out of the box with nothing more than a database (even that might be a local file), but that we allow extension points to add more and more layers of security and functionality. My proposal for Federated Principals and Roles is an example of this - basic identity management works, but if you want to be really secure, delegate identity and group membership to an external service that's tailor made for that. I 100% agree with the approach that we should expose secrets once and never again. This is a core tenet of the service and the reason why there are no secrets retrieval APIs. It's why @eric-maynard's changes to store only secret hashes work. But there has to be a once. That means that during bootstrapping, we need the ability to return the root credentials to the user who bootstrapped. Never again will those credentials be accessible, but we have to be able to return them somewhere. I see a few approaches for this:
Personally, I like 1. It's clean and simple and there's no opportunity to misuse it. The tests and the in-memory startup will need to change, but I think that's a cost worth paying.
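The "expose secrets once and never again" principle described in this comment can be illustrated with a small sketch: the plaintext is handed back to the caller a single time and only a hash is kept. Everything here is hypothetical (`OneTimeSecret` is not a Polaris class), and the unsalted SHA-256 is used only to keep the example short; it is not a statement about how Polaris hashes secrets.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical sketch: hand the plaintext out once, persist only its hash. */
class OneTimeSecret {
  final String plaintext;  // returned to the bootstrapping user exactly once
  final String storedHash; // the only value that would be persisted

  private OneTimeSecret(String plaintext, String storedHash) {
    this.plaintext = plaintext;
    this.storedHash = storedHash;
  }

  static OneTimeSecret generate() {
    byte[] raw = new byte[32];
    new SecureRandom().nextBytes(raw);
    String secret = Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    return new OneTimeSecret(secret, sha256(secret));
  }

  static String sha256(String value) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 is required on every Java platform
    }
  }

  /** Verifies a presented secret against the stored hash using a constant-time comparison. */
  static boolean matches(String candidate, String storedHash) {
    return MessageDigest.isEqual(
        sha256(candidate).getBytes(StandardCharsets.UTF_8),
        storedHash.getBytes(StandardCharsets.UTF_8));
  }
}
```

Whichever channel returns `plaintext` (stdout, a file, or a value the user supplied in the first place), the stored side can still verify logins without being able to reproduce the secret.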
Description
This adds a new flag, `BOOTSTRAP_PRINT_CREDENTIALS`, that controls whether the bootstrap command prints root credentials to stdout. If it's disabled, and environment variables were not provided to set the root credentials, bootstrapping will fail.
Fixes #450
Type of change
How Has This Been Tested?
Credentials are now printed during bootstrap when it's enabled: