(feat) central cluster ADR #1

Open, wants to merge 5 commits into `main`

@@ -35,6 +35,10 @@ During developement the question got raised whatever it is a good decision/archi
> User story: Customer onboards a newly created cluster and requires an Ingress to expose applications. Via Greenhouse the Ingress Plugin can be configured which results in a deployment of the ingress controller within the customer cluster.
> The PluginConfig and the dashboard reflect the current status of the relevant underlying resources.

## Related Decision Records

Superseded by [Greenhouse-ADR-6-central_cluster.md](Greenhouse-ADR-6-central_cluster.md)

## Decision Drivers

* Should work with, and focus on, the applications in scope for the MVP
232 changes: 232 additions & 0 deletions architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
@@ -0,0 +1,232 @@
# ADR-6 Central cluster

## Decision Contributors

- Arno Uhlig
- Ivo Gosemann
- David Rochow
- Martin Vossen
- David Gogl
- Fabian Ruff
- Richard Tief
- Tommy Sauer
- Timo Johner

## Status

- Proposed

## Context and Problem Statement

The central cluster in Greenhouse hosts core components that are not organization-specific as well as organization-specific metadata and configuration.
Organizations are isolated by namespaces, and permissions (RBAC) are restricted to Greenhouse resources.
Granting more permissions would increase the attack surface and introduce additional risks.
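
The isolation model described above can be illustrated with a minimal sketch. This is not the Greenhouse implementation: the API group name `greenhouse.example`, the `Request` type, the `is_allowed` helper, and the one-namespace-per-organization naming are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical API group standing in for Greenhouse resources (assumption).
GREENHOUSE_API_GROUP = "greenhouse.example"


@dataclass(frozen=True)
class Request:
    """A simplified access request made by a member of an organization."""
    org: str        # organization the requester belongs to
    namespace: str  # namespace the request targets
    api_group: str  # API group of the requested resource
    verb: str       # e.g. "get", "list", "delete"


def is_allowed(req: Request) -> bool:
    """Allow access only to Greenhouse resources in the organization's own namespace."""
    # Assumption: each organization maps to a namespace of the same name.
    in_own_namespace = req.namespace == req.org
    is_greenhouse_resource = req.api_group == GREENHOUSE_API_GROUP
    return in_own_namespace and is_greenhouse_resource


own_plugin = Request("org-a", "org-a", GREENHOUSE_API_GROUP, "get")
foreign_plugin = Request("org-a", "org-b", GREENHOUSE_API_GROUP, "get")
core_pod = Request("org-a", "org-a", "", "delete")  # "" is the Kubernetes core API group

print(is_allowed(own_plugin), is_allowed(foreign_plugin), is_allowed(core_pod))
```

Under these assumptions, only the first request passes: the second crosses the namespace boundary, and the third targets a non-Greenhouse resource (a Pod), which is exactly the kind of permission the ADR argues against granting.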

Another aspect to consider is billing.
The shared nature of the central cluster and underlying infrastructure does not allow tenant-specific measurement and billing of consumed resources.
Thus, workloads in the central cluster are charged to the provider.

Moreover, workload within the central cluster is neither transparent nor accessible to the customer.
It cannot be configured; its metrics, logs, etc. are not exposed; and access (e.g. `kubectl exec`, `kubectl delete pod`) is restricted.
Thus, the provider is responsible for operating all workloads within the central cluster.

From a network perspective, and as documented in the security concept, communication is strictly unidirectional: from the central cluster to the remote clusters.

Currently, the central Prometheus Alertmanager (AM) runs within the central cluster for each organization as part of the alerts plugin.
Since Prometheus servers push alerts to the AM, it is exposed via an Ingress resource, including TLS certificates and DNS records.
While this contributes to simplicity and ease of use, it violates the security concept and introduces additional costs for the provider.
Moreover, it assumes that the network zone of the central Greenhouse cluster is a good fit for all organizations and cloud providers.

The relevant use cases are:

1) Prometheus Alertmanager for holistic alerting capabilities
2) Thanos query and ruler component for organization-wide access to decentralized metric stores
3) Grafana/Plutono for holistic dashboards
4) Heureka, with multiple agents running on multiple clusters and data being consumed centrally

## Related Decision Records

Supersedes [Greenhouse-ADR-3-location_of_plugins.md](Greenhouse-ADR-3-location_of_plugins.md)

## Decision Drivers

- **Network Compatibility**
The current setup assumes that the network zone of the central Greenhouse cluster is suitable for all organizations and cloud providers.

> Review comment (PR author): Explicit mention: This would enable use cases residing in different hyperscalers.


- **Security aspects**
Increased permissions and capabilities enlarge the attack surface, introducing risks.

- **Operational concerns**
User-configurable workloads in the central cluster are not transparent to customers and must be managed by the Greenhouse team.

- **Billing**
Tenant-specific resources must be charged to the respective tenant.

- **Ease of use**
Greenhouse should offer an easy way to manage operational aspects with a low entry barrier.

## Decision

Go with Option 1 - Central Admin Plugins:

- No user-configurable plugins should be allowed in the Greenhouse central cluster.
- Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
- Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
They cannot be configured by a user and are fully managed by Greenhouse.
- A customer has to onboard at least one cluster to instantiate plugins with a backend.
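
The rules of the decision can be expressed as a small validation routine. The following is a sketch under stated assumptions, not an existing Greenhouse API: `PluginRequest`, `validate`, and the convention of naming the central cluster `"central"` are hypothetical and only serve to make the rules concrete.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PluginRequest:
    """Hypothetical, simplified view of a request to instantiate a plugin."""
    name: str
    target_cluster: str      # "central" or the name of an onboarded cluster
    requested_by_user: bool  # False for AdminPlugins fully managed by Greenhouse
    has_backend: bool


def validate(req: PluginRequest, onboarded_clusters: set[str]) -> list[str]:
    """Return the violated rules; an empty list means the request is acceptable."""
    errors = []
    # Rule: no user-configurable plugins in the central cluster.
    if req.target_cluster == "central" and req.requested_by_user:
        errors.append("user-configurable plugins are not allowed in the central cluster")
    # Rule: a customer needs at least one onboarded cluster for plugins with a backend.
    if req.requested_by_user and req.has_backend and not onboarded_clusters:
        errors.append("at least one cluster must be onboarded for plugins with a backend")
    return errors


# An AdminPlugin managed by Greenhouse may run centrally.
print(validate(PluginRequest("alertmanager", "central", False, True), {"cluster-1"}))
# A user-configured plugin targeting the central cluster is rejected.
print(validate(PluginRequest("alerts", "central", True, True), {"cluster-1"}))
```

Returning a list of violations rather than raising on the first one is a design choice of this sketch; it lets a caller report every violated rule at once.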

---

## Evaluated options, technical details, etc.

### Option 1: Central Admin Plugins

```mermaid
flowchart LR
subgraph CentralCluster["Central Cluster"]
centralVPNPod["VPN Pod"]
centralAdminPlugin["Admin Plugin"]
centralAdminPlugin --> centralVPNPod
end
subgraph Cluster1["Cluster 1"]
direction LR
c1VPNPod["VPN Pod"]
c1API["Remote Plugin"]
c1VPNPod --> c1API
end
subgraph Cluster2["Cluster 2"]
direction LR
c2VPNPod["VPN Pod"]
c2API["Remote Plugin"]
c2VPNPod --> c2API
end
subgraph Cluster3["Cluster 3"]
direction LR
c3VPNPod["VPN Pod"]
c3API["Remote Plugin"]
c3VPNPod --> c3API
end
user["User"] -. Via Greenhouse .-> centralAdminPlugin
centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod
```

- No user-configurable plugins should be allowed in the Greenhouse central cluster.
- Maintain restrictive permissions within the central cluster limited to Greenhouse resources.
- Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities.
They cannot be configured by a user and are fully managed by Greenhouse.
- A customer has to onboard at least one cluster to instantiate plugins with a backend.

#### Pros

* Operationally well manageable from Greenhouse
* Limiting central workloads to admin plugins ensures that no misconfiguration by the consumer is possible
* Works with "store local, query global" scenarios

#### Contra

* Introduces a hard dependency on central cluster availability for all plugins with a backend
  * Open question: is that already the case?
* Limits decentralization of Greenhouse applications
* Additional effort required to make central data collection scenarios work

### Option 2: Per Org Central Communication cluster

```mermaid

flowchart LR
subgraph CentralCluster["Central Cluster"]
centralVPNPod["VPN Pod"]
end
subgraph Cluster1["Communication Cluster"]
direction LR
c1VPNPod["VPN Pod"]
c1API["Remote Plugin"]
c1VPNPod --> c1API
end
subgraph Cluster2["Cluster 2"]
direction LR
c2VPNPod["VPN Pod"]
c2API["Remote Plugin"]
c2VPNPod --> c2API
end
subgraph Cluster3["Cluster 3"]
direction LR
c3VPNPod["VPN Pod"]
c3API["Remote Plugin"]
c3VPNPod --> c3API
end
user["User"] -. Via Greenhouse .-> centralVPNPod
centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod
c1VPNPod <-. WireGuard Tunnel .-> c3VPNPod & c2VPNPod
```

* Each organization has its own communication cluster
* The communication cluster is responsible for enabling communication between clusters and establishes bi-directional connections
* The communication cluster is owned by the respective organization / consumer
* No plugins are allowed on the communication cluster or on the central cluster
* As before, no cluster is allowed to communicate with the central cluster

#### Pros

* allows consumers to establish full interconnectivity between clusters
* enables common use case for plugins where data is collected decentrally and stored centrally
* all plugins live in the consumer clusters

#### Contra

* additional operational complexity
* consumers may rely on the interconnectivity solution for applications other than Greenhouse, increasing the blast radius of potential misconfigurations
* additional security risk for consumers, as interconnected clusters potentially allow attackers to move between remote clusters
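
The lateral-movement concern can be made concrete with a small reachability check over the two diagrams. This is a sketch only: the cluster names are shorthand for the diagram nodes, and the worst-case assumption that a compromised cluster can pivot through every connection it is allowed to open (including through the communication cluster) is made for illustration, not taken from the security concept.

```python
from collections import deque

# Directed edges "who may open a connection to whom", read off the diagrams.
OPTION_1 = {
    "central": {"cluster-1", "cluster-2", "cluster-3"},
}
OPTION_2 = {
    "central": {"communication", "cluster-2", "cluster-3"},
    "communication": {"cluster-2", "cluster-3"},  # bi-directional tunnels ...
    "cluster-2": {"communication"},               # ... hence the reverse edges
    "cluster-3": {"communication"},
}


def reachable(edges, start):
    """Return every cluster reachable from `start` by following allowed connections."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# What can an attacker who controls "cluster-2" reach in each option?
print(sorted(reachable(OPTION_1, "cluster-2")))
print(sorted(reachable(OPTION_2, "cluster-2")))
```

Under these assumptions, a compromised remote cluster reaches nothing in Option 1, whereas in Option 2 it can reach the communication cluster and, through it, the other remote clusters. The central cluster stays unreachable in both cases.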

### Option 3: Greenhouse Cluster per Organization

```mermaid

flowchart LR
subgraph CentralClusterOrgA["Central Cluster Org A"]
adminPluginOrgA["Admin Plugin"]
end
subgraph CentralClusterOrgB["Central Cluster Org B"]
adminPluginOrgB["Admin Plugin"]
centralVPNPod["VPN Pod"]
adminPluginOrgB --> centralVPNPod
end
subgraph Cluster1["Cluster 1"]
direction LR
c1API["Remote Plugin"]
end
subgraph Cluster2["Cluster 2"]
direction LR
c2API["Remote Plugin"]
end
subgraph Cluster3["Cluster 3"]
direction LR
c3VPNPod["VPN Pod"]
c3API["Remote Plugin"]
c3VPNPod --> c3API
end
userOrgA["Org A User"] -. Via Greenhouse .-> adminPluginOrgA
userOrgB["Org B User"] -. Via Greenhouse .-> adminPluginOrgB
adminPluginOrgA -. Direct Access .-> c1API & c2API
centralVPNPod -. WireGuard Tunnel .-> c3VPNPod
```

- Each organization has their own central cluster, owned and operated by them
- Greenhouse (eventually) also provides a Managed Greenhouse Central Cluster
- Admin plugins are allowed in the central cluster, and may be configured by the organization
- Access to previously shared Greenhouse components becomes possible

#### Pros

- Organization can choose a suitable network zone for their central cluster, allowing for direct access to Clusters
- No dependencies between Orgs on AdminPlugin updates due to CRD changes
- Organization can configure AdminPlugins to their needs
- Access to Greenhouse Logs & Metrics available
- No shared costs that are not billable to the organization
- Can be run for a trial period to evaluate the concept, as it can be reverted to the current state if needed

#### Contra

- Harder to support Organizations without access to their Greenhouse Central Cluster
- No longer an out-of-the-box (OOB) solution for Organizations
- Increased operational complexity for Organizations