
imagePullSecrets for GCR - gcp secrets engine #123

Open
TJM opened this issue Oct 25, 2021 · 16 comments · May be fixed by #130

Comments

@TJM commented Oct 25, 2021

Hi all,

I was looking at #54 to create imagePullSecrets, and that looks like it might work, but the secret I am trying to access is not a "kv" type; the credential comes from the gcp secrets engine. So, since my goal is to get a secret (imagePullSecrets) to access GCR, would it be better to hack at this code to use the GCP secrets engine in Vault, or to hack at something else, like the Vault agent, to create a Kubernetes secret?

@TJM (Author) commented Oct 28, 2021

So far, I am running Vault agent and dumping the gitlab-runner-sa.json (key.json) file out, then using the following command to "create" the secret. The .dockerconfigjson is a pain: embedding that entire mess of JSON into the "password" field seems really wrong to me.

template:

{{ with secret "path/to/gcp/key/gitlab-runner-sa" }}
{{ base64Decode .Data.private_key_data }}
{{ end }}

relevant part of agent config template:

  command = "bash -c 'EMAIL=$(jq -r .client_email gitlab-runner-sa.json); kubectl create secret docker-registry gcr-secret --docker-server gcr.io --docker-username _json_key --docker-email \"$EMAIL\" --docker-password \"$(cat gitlab-runner-sa.json)\" --dry-run -o yaml | kubectl apply --namespace gitlab-runner -f -'"
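The JSON-in-JSON nesting that makes this awkward can be sketched in Python. This is a minimal sketch of the payload structure that `kubectl create secret docker-registry` emits (the key file contents and email below are hypothetical, for illustration only):

```python
import base64
import json

def make_dockerconfigjson(key_json: str, email: str, server: str = "gcr.io") -> str:
    """Build a .dockerconfigjson payload in the shape kubectl generates:
    the entire GCP key file becomes the registry password (JSON inside JSON)."""
    username = "_json_key"
    # The "auth" field is base64("username:password").
    auth = base64.b64encode(f"{username}:{key_json}".encode()).decode()
    config = {
        "auths": {
            server: {
                "username": username,
                "password": key_json,  # the whole key.json, verbatim
                "email": email,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)

# Hypothetical key file, standing in for gitlab-runner-sa.json:
key = json.dumps({"type": "service_account",
                  "client_email": "runner@example.iam.gserviceaccount.com"})
payload = make_dockerconfigjson(key, "runner@example.iam.gserviceaccount.com")

# Round trip: the inner key.json survives as an escaped string in the outer JSON.
inner = json.loads(json.loads(payload)["auths"]["gcr.io"]["password"])
```

Serializing with `json.dumps` handles the quote escaping that is so painful to do by hand in shell.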

@ricoberger (Owner) commented

Hi @TJM, I think adding support for the Google Cloud Secrets Engine would be the best way to support your use case. Unfortunately I do not have that much experience with GCP and Vault.

If you or someone else wants to add support for the Google Cloud Secrets Engine, I think it could be done in a similar way to how it was done for Azure: #114

@TJM (Author) commented Nov 1, 2021

That PR looks like an authentication mechanism rather than a secrets engine. The GCP Secrets Engine would replace KV or KV v2, which hold manual/static secrets, with dynamic secrets (like the AWS engine, it creates the service account and returns the credentials). I have a Vault agent based solution (hack?) working, but I would consider it very alpha: https://github.com/TJM/vault-gcr-secrets

I think it would be better to tie it into VSO at some point. I just need to wrap my head around the secrets engine interfaces and see how hard it would be to add GCP. :)

~tommy

@ricoberger (Owner) commented

Ah ok, sorry for misunderstanding, hopefully I got it now:

  • For the authentication against Vault, the Operator should use one of the existing authentication methods (e.g. Kubernetes auth method, token auth method, ...)
  • Then the operator should support the command vault read gcp/roleset/my-key-roleset/key, so that the private_key_data can be used within a Kubernetes secret?

Did I get it right now?

If this is correct I think we can support this. We already have a secretEngine field in the CRD, which can be set to gcp.

If this is the case we can check for the secretEngine in the GetSecret function and try to get the secret in a similar way as for the KV engine.

Does this make sense?

@TJM (Author) commented Nov 1, 2021

Yes, sir! This sounds right.

To make things a little more complicated, we would have to do some trickery with the returned data because of the way Kubernetes formats "docker-registry" type secrets, or rather the fact that Google wants the entire key.json as the docker-password option (JSON in JSON). I think we can probably handle it, but escaping all the quotes in the JSON string had me stumped, so I just used kubectl create secret ... --dry-run=client -o yaml | kubectl apply -f - to get the secret added/updated.

~tommy

@ricoberger (Owner) commented Nov 2, 2021

Can you maybe share the output of the following command vault read -format=json gcp/roleset/my-key-roleset/key without any confidential data?

Edit: Maybe also the output of vault read -output-curl-string gcp/roleset/my-key-roleset/key can be helpful.

@TJM (Author) commented Nov 5, 2021

@ricoberger The output is pretty boring... I can tell you that the part that we care about is base64 encoded in the json key private_key_data.

We use the vault template:

{{ with secret "path/to/gcp/key/gitlab-runner-sa" }}
{{ base64Decode .Data.private_key_data }}
{{ end }}

Here is the censored version:

{
  "request_id": "2b0f5057-411d-aa4f-8df7-8b6b91b849de",
  "lease_id": "path/to/gcp/key/gitlab-runner-sa/(CENSORED-SOME-ID)",
  "lease_duration": 2592000,
  "renewable": true,
  "data": {
    "key_algorithm": "KEY_ALG_RSA_2048",
    "key_type": "TYPE_GOOGLE_CREDENTIALS_FILE",
    "private_key_data": "(CENSORED-BASE64-ENCODED-keyfile)="
  },
  "warnings": null
}

The last one is pretty straightforward too, as it doesn't really connect to Vault:

[tmcneely@den3l1cliqa74077 vault-agent]$ vault read -output-curl-string gcp/roleset/my-key-roleset/key
curl -H "X-Vault-Request: true" https://127.0.0.1:8200/v1/gcp/roleset/my-key-roleset/key
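Pulling the key file out of that response is just a base64 decode of the private_key_data field. A minimal Python sketch, equivalent to the base64Decode template above (the sample response data here is fabricated):

```python
import base64
import json

# Fabricated stand-in for: vault read -format=json gcp/roleset/my-key-roleset/key
response = {
    "data": {
        "key_algorithm": "KEY_ALG_RSA_2048",
        "key_type": "TYPE_GOOGLE_CREDENTIALS_FILE",
        "private_key_data": base64.b64encode(b'{"type": "service_account"}').decode(),
    }
}

# Equivalent of `base64Decode .Data.private_key_data` in the Vault template:
key_json = base64.b64decode(response["data"]["private_key_data"]).decode()
key = json.loads(key_json)  # the decoded bytes are a Google credentials file
```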

ricoberger linked a pull request Nov 10, 2021 that will close this issue
@ricoberger (Owner) commented

Hi @TJM, I tried to implement support for GCP.

I would need some help to verify that it is working. Can you maybe take a look at #130 and test the Docker image with the dev tag?

This would be very helpful.

@TJM (Author) commented Nov 12, 2021

@ricoberger THANKS! ... I saw this yesterday but was in meetings all day :( ... I need to get some project work completed today, but I will jump on testing this next week. Sorry for the delay!

@TJM (Author) commented Nov 18, 2021

Looks like this was released a couple of days ago? I will hopefully get a chance to try it out :) I was running into an issue finding an environment where I could put a development version of VSO... now that it is released, maybe I can talk them into it :-/

@ricoberger (Owner) commented

Hi, this is only available via the dev tag for now. I would like to verify that it is working as expected before an official release.

@TJM (Author) commented Dec 2, 2021

This is looking OK, so far...

NOTE: We already had the Kubernetes auth method set up...

Created the prerequisites: vault_gcp_secret_roleset, kubernetes_service_account, vault_kubernetes_auth_backend_role, vault_policy ...

Then the helm chart:

resource "helm_release" "a0000_vault_secrets_operator" {
  name       = "vault-secrets-operator"
  namespace  = kubernetes_namespace.a0000_anthos_namespace.metadata.0.name
  repository = "https://ricoberger.github.io/helm-charts"
  chart      = "vault-secrets-operator"
  values = [
    templatefile("${path.module}/files/vault_secrets_operator.yaml.tpl", {
      KUBERNETES_PATH = "auth/${vault_auth_backend.kubernetes.path}"
      KUBERNETES_ROLE = vault_kubernetes_auth_backend_role.a0000_vault_secrets_operator.role_name
      KUBERNETES_SA   = kubernetes_service_account.a0000_vault_secrets_operator.metadata.0.name
      RBAC_NAMESPACED = false # Limit VSO to its own namespace
      NAMESPACES      = [kubernetes_namespace.a0000_anthos_namespace.metadata.0.name]
    })
  ]

  depends_on = [
    vault_kubernetes_auth_backend_role.a0000_vault_secrets_operator,
    vault_policy.a0000,
    kubernetes_service_account.a0000_vault_secrets_operator,
  ]
  // FIXME: Create VaultSecrets for each of the three service account keys

}

with the following yaml tmpl mentioned above:

### WARNING! Changing this file changes *all* VaultSecretOperators
### Use an instance variable whenever possible
image:
  repository: docker-remote.artifactory.company.com/ricoberger/vault-secrets-operator
  tag: dev
resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
vault:
  address: https://vault.company.com
  authMethod: kubernetes
  kubernetesPath: ${KUBERNETES_PATH}
  kubernetesRole: ${KUBERNETES_ROLE}
  namespaces: ${join(",", NAMESPACES)}
rbac:
  namespaced: ${RBAC_NAMESPACED}
serviceAccount:
  create: false
  name: ${KUBERNETES_SA}

Then, the secret looks like:

apiVersion: ricoberger.de/v1alpha1
kind: VaultSecret
metadata:
  name: a0000-bq-secret
spec:
  isBinary: true
  keys:
    - private_key_data
  path: ai/np/gcp/key/a0000-bq
  secretEngine: gcp
  type: Opaque

... which I had to apply manually, because apparently raw YAML (CRDs) is not well liked by Terraform. I have created a helm chart in the past to get around this; still thinking about how to manage these in Terraform.

Logs:

{"level":"info","ts":1638403415.9145195,"logger":"vault","msg":"Reconciliation is enabled.","ReconciliationTime":0}
I1202 00:03:37.569527       1 request.go:665] Waited for 1.04742487s due to client-side throttling, not priority and fairness, request: GET:https://10.202.208.1:443/apis/node.k8s.io/v1beta1?timeout=32s
{"level":"info","ts":1638403419.1731763,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1638403419.1738043,"logger":"setup","msg":"starting manager"}
I1202 00:03:39.174152       1 leaderelection.go:248] attempting to acquire leader lease a0000-ro-feat-v-8430/vaultsecretsoperator.ricoberger.de...
{"level":"info","ts":1638403419.1742282,"msg":"starting metrics server","path":"/metrics"}
I1202 00:04:02.307290       1 leaderelection.go:258] successfully acquired lease a0000-ro-feat-v-8430/vaultsecretsoperator.ricoberger.de
{"level":"info","ts":1638403442.3074925,"logger":"controller.vaultsecret","msg":"Starting EventSource","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret","source":"kind source: /, Kind="}
{"level":"info","ts":1638403442.307584,"logger":"controller.vaultsecret","msg":"Starting EventSource","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret","source":"kind source: /, Kind="}
{"level":"info","ts":1638403442.3075905,"logger":"controller.vaultsecret","msg":"Starting Controller","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret"}
{"level":"info","ts":1638403442.4085374,"logger":"controller.vaultsecret","msg":"Starting workers","reconciler group":"ricoberger.de","reconciler kind":"VaultSecret","worker count":1}
{"level":"info","ts":1638403442.4087484,"logger":"controllers.VaultSecret","msg":"Use shared client to get secret from Vault","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret"}
{"level":"info","ts":1638403442.4087753,"logger":"vault","msg":"Read secret ai/np/gcp/key/a0000-bq"}
{"level":"info","ts":1638403442.798935,"logger":"controllers.VaultSecret","msg":"Updating a Secret","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret","Secret.Namespace":"a0000-ro-feat-v-8430","Secret.Name":"a0000-bq-secret"}
{"level":"info","ts":1638403442.8157718,"logger":"controllers.VaultSecret","msg":"Use shared client to get secret from Vault","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret"}
{"level":"info","ts":1638403442.8158085,"logger":"vault","msg":"Read secret ai/np/gcp/key/a0000-bq"}
{"level":"info","ts":1638403443.4009356,"logger":"controllers.VaultSecret","msg":"Updating a Secret","vaultsecret":"a0000-ro-feat-v-8430/a0000-bq-secret","Secret.Namespace":"a0000-ro-feat-v-8430","Secret.Name":"a0000-bq-secret"}

... though I am a little concerned that it appeared to update the secret twice. Every time the secret is "read", a new key is created (max 10), so the operator can't "read" the secret to reconcile; it needs to pay attention to the lease time. I have not had a chance to look at the code yet to see if it does that...

AND NOW, other than the secret having the "wrong" key name (they were expecting a0000-bq.json, but it's using private_key_data, as expected), the value of the secret is "good"!!! :)

@TJM (Author) commented Dec 2, 2021

One thing we found is that it is a little too aggressive on retries... I messed up the path, and it looked like it tried ~45 times in just a few minutes. I can send the logs if you want, but I think it needs to back off when it gets permission denied (403). I don't think this is GCP-specific?

@TJM (Author) commented Dec 3, 2021

FYI: We are using https://github.com/TJM/vault-secret-helmchart to create our VaultSecret objects through Terraform ;)

@ricoberger (Owner) commented

Hi @TJM, thanks for testing.

  1. Yes, the secret will be read twice when it doesn't already exist. That's the current behavior of the operator. I will check if we can somehow omit the second reconciliation after the secret was created.
  2. If I understand you correctly, the returned secret for GCP has a lease time? The operator currently only supports a global setting for this, not a per-secret one. Does this already help?
  3. For the expected key name, maybe this can be changed to the expected value by the user via https://github.com/ricoberger/vault-secrets-operator#using-templated-secrets
  4. After an error the operator tries to reconcile the secret immediately. I'm not sure if we can change this behavior. Maybe we can save the number of failed reconciliations in the status field and use it for exponential backoff logic.

@TJM (Author) commented Dec 6, 2021

Hi @TJM, thanks for testing.

  1. Yes, the secret will be read twice when it doesn't already exist. That's the current behavior of the operator. I will check if we can somehow omit the second reconciliation after the secret was created.

It would be best if it were only "read" once, since each time it is "read", a new key (lease) is created.

  2. If I understand you correctly, the returned secret for GCP has a lease time? The operator currently only supports a global setting for this, not a per-secret one. Does this already help?

Oh, Vault leases :)

  • The operator's Vault auth token itself has a lease time and must be kept renewed (usually at around 80% of its lifetime?); if it expires, the operator loses the leases on anything it had created, which invalidates the resulting service account credential.
  • Credential lease time - https://www.vaultproject.io/docs/secrets/gcp#service-account-keys -- first, there is a limit of 10 keys, so that will need to be another "test case" (still trying to figure out how to automate that). Long story short, those credentials also have a lease time, which in some cases is renewable, like any other Vault lease.

I think with both types of leases, you can "request" a certain length, but vault policies may limit that.
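As a rough illustration of the renewal timing discussed above (the 80% figure is a rule of thumb from the thread, not a Vault constant):

```python
def next_renewal(lease_duration_s: int, fraction: float = 0.8) -> int:
    """Schedule a lease renewal after a fraction of its lifetime, so the
    lease never lapses. `fraction` is a rule of thumb, not a Vault value."""
    return int(lease_duration_s * fraction)

# The 30-day key lease from the output above (2592000 s) would be
# renewed after roughly 24 days.
renew_after = next_renewal(2592000)
```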

  3. For the expected key name, maybe this can be changed to the expected value by the user via https://github.com/ricoberger/vault-secrets-operator#using-templated-secrets

Yep, I am keeping templating in my back pocket in case there is pushback on using the name that Vault/Google gives the secret. I think it should be OK, or even better, to use a "common" name like private_key_data. Time will tell :)

  4. After an error the operator tries to reconcile the secret immediately. I'm not sure if we can change this behavior. Maybe we can save the number of failed reconciliations in the status field and use it for exponential backoff logic.

Some sort of backoff does seem like a good idea, to reduce the load on Vault. Or I could stop making mistakes? Nope. That doesn't sound like it will happen. :)
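The backoff idea in point 4 could be sketched like this. This is a generic exponential backoff with jitter and a cap, not the operator's actual code; the base and cap values are made up for illustration:

```python
import random

def backoff_delay(failures: int, base_s: float = 5.0, cap_s: float = 300.0) -> float:
    """Exponential backoff with full jitter: after the n-th consecutive
    failure, wait a random time up to min(cap, base * 2**n)."""
    return random.uniform(0.0, min(cap_s, base_s * (2 ** failures)))

# After 0, 1, 2, ... failures the upper bound grows 5s, 10s, 20s, ...
# until it is clamped at the 300s cap.
bounds = [min(300.0, 5.0 * 2 ** n) for n in range(8)]
```

Storing the failure count in the CR's status field, as suggested above, would let the delay survive controller restarts.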
