Before I started this project, I had PlanetScale Scaler Pro, Vercel Teams, Fly.io, Railway... One night I realised the bill from Vercel alone was higher than my monthly grocery budget!!!
Say I am making a web app for my own convenience: I might buy a server in the UK and deploy both the app and the database on the same machine. It would load instantly for me, and the distance data travels between server and database would be short.
But what about my friends in the US? In Asia? In Australia? I soon bought a server in Singapore to test it.
Why so slow? After I clicked login, it took close to 38.2 seconds for data to come back. The speed test ran against a full-stack Golang project (the worst stack for frontend, an amazing tool for backend, I won't say it twice), all server-rendered, no hydration. This means the page stays blank until the data comes in, unless I build separate handlers for the specific components that need data.
If I find a service provider that gives me this level of latency, I am out (unless they are very pretty 👉🏻👈🏻).
I took inspiration from Jeff Geerling's Raspberry Pi Cluster Project - if your stack can run under extreme conditions like a Raspberry Pi (ARM with 1 GB of RAM), you are golden. And you will learn a lot from running things on bare metal.
I stopped thinking about distributed systems, servers, and their locations. Instead, I thought about the OS of our generation -- Kubernetes -- to handle the complexity and separation of servers: stripping away the distance and physicality of machines, merging them into one. The Unix of distributed operating systems excites me.
I know what I want. I can't afford EKS by myself. I want a deployment strategy that's optimised no matter how extreme the conditions are. I can't guarantee an edge-like experience with millisecond cold starts -- the best I can do is a server close enough to my friends and the shortest distance between server and database.
Let the fiber handle the rest. I pray.
- GeoDNS: route each user to the closest server based on their location.
- Application replication: I don't want only one database on one server. I want every server to hold a replica of the same database. Each server might run multiple databases for different applications, and each server should also run replicas of the same applications.
- Node affinity: each application should only talk to the database on the same node for the best speed. Communication stays within the node, never across nodes.
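The node-affinity idea above can be sketched as a pod-affinity rule on the app's Deployment, co-locating the app with its database pods on the same node. This is a minimal sketch; the names, labels, and image below are illustrative, not from the project:

```yaml
# Illustrative Deployment fragment: schedule the app onto the same
# node as its database pods (all labels here are assumptions).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app-db            # the database pods' label
              topologyKey: kubernetes.io/hostname  # "same node"
      containers:
        - name: app
          image: my-app:latest
```

The `topologyKey: kubernetes.io/hostname` line is what turns "near the database" into "on the exact same node".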
I bought 4 servers from a cloud provider around the world: London, Frankfurt, Seattle and Singapore. These aren't managed services like GKE or EKS that run your Kubernetes cluster for you -- the only thing that came with them was the fact that they were booted with Debian 12.
Currently:
- Prometheus & Grafana dashboard: installed via the community kube-prometheus-stack chart, which comes with every necessity baked in.
- Traefik: auto-installed by k3s
- Cert Manager
Todo:
- Longhorn
- BullMQ
- ...?
To provision all servers:

```shell
ansible-playbook playbook/site.yml
```

To reset all servers:

```shell
ansible-playbook playbook/reboot.yml
```
After k3s is installed on the master, run:

```shell
scp root@<master-ip>:~/.kube/config ~/.kube/config-ctb-london
```

Edit the server address in ~/.kube/config-ctb-london to point at the master node's address, then set it as an environment variable:

```shell
export KUBECONFIG=~/.kube/config-ctb-london
```
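The edit itself is a one-line substitution, since k3s writes the loopback address into its kubeconfig by default. A sketch, demonstrated on a throwaway copy so it is safe to run anywhere (203.0.113.10 is a documentation-only stand-in for your master's IP; in practice the target is ~/.kube/config-ctb-london):

```shell
# Demonstrate the substitution on a throwaway copy; in practice the
# target file is ~/.kube/config-ctb-london fetched via scp.
cfg="$(mktemp)"
printf 'server: https://127.0.0.1:6443\n' > "$cfg"

# k3s points the kubeconfig at loopback by default; rewrite it to the
# master's address (203.0.113.10 is a documentation-only example IP).
sed -i 's#https://127.0.0.1:6443#https://203.0.113.10:6443#' "$cfg"
grep 'server:' "$cfg"
```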
We can check all the nodes and their roles by running:

```shell
kubectl get nodes -o wide
```
I did something messed up on my first attempt: I forgot to make sure all pods from this stack land on the same node. It had the database in Frankfurt, Alertmanager in Seattle and Grafana in Singapore. So I walked the config and pinned everything to London the easiest way I could. I found the label I needed for the London node from OpenLens (OpenLens good).
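One way to pin the whole stack to a single node is through the chart's values. A hedged sketch, assuming the London node carries a hypothetical label `region: london` (the label key/value are assumptions; the value paths match the kube-prometheus-stack chart layout):

```yaml
# kube-prometheus-stack values fragment: keep Grafana, Prometheus and
# Alertmanager on the node labelled region=london (label is assumed).
grafana:
  nodeSelector:
    region: london
prometheus:
  prometheusSpec:
    nodeSelector:
      region: london
alertmanager:
  alertmanagerSpec:
    nodeSelector:
      region: london
```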
- Run the prerequisites playbook to ensure Helm is installed and up to date:

  ```shell
  ansible-playbook helm/prereq.yml
  ```
- Install the Prometheus and Grafana stack. The Prometheus install play uses Ansible's built-in Helm module to install the Helm chart.
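Such a play might look roughly like this, using the `kubernetes.core.helm` module. A sketch only -- the hosts group and file layout are assumptions, and it requires the `kubernetes.core` collection plus the pyyaml and kubernetes Python packages installed by the pip role:

```yaml
# helm/prometheus/install.yml sketch (structure is illustrative).
- hosts: master
  tasks:
    - name: Install kube-prometheus-stack via Helm
      kubernetes.core.helm:
        name: kube-prometheus-stack
        chart_ref: prometheus-community/kube-prometheus-stack
        chart_repo_url: https://prometheus-community.github.io/helm-charts
        release_namespace: monitoring
        create_namespace: true
```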
To learn more about the relationships between nodes and pods, look at taints and tolerations and node affinity -- "Assigning Pods to Nodes" in the Kubernetes documentation.
Run:

```shell
ansible-playbook helm/prometheus/install.yml -vvv
```

:::tip
Use `-vvv` for VERY VERBOSE DEBUG MODE.
:::

Check its health by running:

```shell
kubectl --namespace monitoring get pods
```
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
After installing Prometheus and Grafana, remember to install OpenLens on your local computer to monitor the cluster:

```shell
brew install openlens
```
After installing OpenLens, remember to add the plugin @alebcay/openlens-node-pod-menu. OpenLens lets you enter a pod's terminal (imagine `docker exec -it <mycontainer> bash`) and monitor its health without getting cancer. Be careful with what the plugin can do; it can terminate a pod.
For services deployed to Kubernetes without anything exposing them publicly, you can access their dashboards via port forwarding in OpenLens.
Make sure that within the pip role we are installing "pyyaml" and "kubernetes"; otherwise the Helm configuration and the ClusterIssuer configuration will fail.
In site.yml we have split the roles under different tags. To run a standalone role:

```shell
ansible-playbook playbook/site.yml -t cluster-pip # give it a tag
```
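The tagging might look like this inside site.yml. A sketch only -- the role list is illustrative, and only the `cluster-pip` tag comes from this project:

```yaml
# site.yml sketch: each role carries a tag so it can be run alone
# with `ansible-playbook playbook/site.yml -t <tag>`.
- hosts: cluster
  become: true
  roles:
    - { role: pip, tags: ['cluster-pip'] }  # installs pyyaml + kubernetes
    - { role: k3s, tags: ['cluster-k3s'] }  # hypothetical tag name
```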
After that, run the following to install cert-manager as well as configure the ClusterIssuer for Kubernetes:

```shell
ansible-playbook helm/cert-manager/install.yml
```
Currently available dashboards:
- Prometheus Alertmanager
- Grafana
Without the help of and discussions with my friends Anna and Martin, this project wouldn't have started. Thanks also to Rancher's k3s-ansible project and Jeff's pip role.