generated from cloudoperators/repository-template
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat) central cluster ADR #1
Open
auhlig
wants to merge
5
commits into
main
Choose a base branch
from
feat/adr-central-cluster
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
232 changes: 232 additions & 0 deletions
232
architecture-decision-records/Greenhouse-ADR-6-central_cluster.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
# ADR-6 Central cluster | ||
|
||
## Decision Contributors | ||
|
||
- Arno Uhlig | ||
- Ivo Gosemann | ||
- David Rochow | ||
- Martin Vossen | ||
- David Gogl | ||
- Fabian Ruff | ||
- Richard Tief | ||
- Tommy Sauer | ||
- Timo Johner | ||
|
||
## Status | ||
|
||
- Proposed | ||
|
||
## Context and Problem Statement | ||
|
||
The central cluster in Greenhouse hosts non-organization specific core components as well as organization-specific metadata and configuration. | ||
Organizations are isolated by namespaces and permissions (RBAC) are restricted to Greenhouse resources. | ||
Granting more permissions would increase the attack surface and introduce additional risks. | ||
|
||
Another aspect to consider is billing. | ||
The shared nature of the central cluster and underlying infrastructure does not allow tenant-specific measurement and billing of consumed resources. | ||
Thus workload in the central cluster is charged on the provider. | ||
|
||
Moreover, workload within the central cluster is neither transparent nor accessible to the customer. | ||
It cannot be configured, its metrics, logs, etc. are not exposed and access (kubectl exec/delete pod) is restricted. | ||
Thus operations of all workload within the central cluster is on the provider. | ||
|
||
From a network perspective and as documented in the security concept, communication is only uni-directional from the central to the remote clusters. | ||
|
||
Currently, the central Prometheus Alertmanager (AM) is being run within the central cluster for each organization as part of the alerts plugin. | ||
Since Prometheus servers push alerts to the AM, it is exposed via an ingress resource incl. TLS certificates and DNS records. | ||
While this contributes to simplicity and easiness of use, this violates the security concept and introduces additional costs for the provider. | ||
Moreover, it assumes the network zone of the central Greenhouse cluster is a good fit across all organizations and cloud providers. | ||
|
||
Use cases being: | ||
|
||
1) Prometheus Alertmanager for holistic alerting capabilities | ||
2) Thanos query and ruler component for organization-wide access to decentralized metric stores | ||
3) Grafana/Plutono for holistic dashboards | ||
4) Heureka having multiple agents running on multiple clusters and data beeing consumed centrally | ||
|
||
## Related Decision Records | ||
|
||
Supersedes [Greenhouse-ADR-3-location_of_plugins.md](Greenhouse-ADR-3-location_of_plugins.md) | ||
|
||
## Decision Drivers | ||
|
||
- **Network Compatibility** | ||
It assumes that the network zone of the central Greenhouse cluster is suitable for all organizations and cloud providers. | ||
|
||
- **Security aspects** | ||
Increased permissions and capabilities enlarge the attack surface, introducing risks. | ||
|
||
- **Operational concerns** | ||
User-configurable workloads in the central cluster are not transparent to customers and must be managed by the Greenhouse team. | ||
|
||
- **Billing** | ||
Tenant-specific resources must be charged to the respective tenant. | ||
|
||
- **Easiness of use** | ||
Greenhouse should offer an easy way to manage operational aspects with a low entry barrier. | ||
|
||
## Decision | ||
|
||
Go with Option 1 - Central Admin Plugins: | ||
|
||
- No user-configurable plugins should be allowed in the Greenhouse central cluster. | ||
- Maintain restrictive permissions within the central cluster limited to Greenhouse resources. | ||
- Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities. | ||
They cannot be configured by a user and are fully managed by Greenhouse. | ||
- A customer has to onboard at least one cluster to instantiate plugins with a backend. | ||
|
||
--- | ||
|
||
## Evaluated options, technical details, etc. | ||
|
||
### Option 1: Central Admin Plugins | ||
|
||
```mermaid | ||
flowchart LR | ||
subgraph CentralCluster["Central Cluster"] | ||
centralVPNPod["VPN Pod"] | ||
centralAdminPlugin["Admin Plugin"] | ||
centralAdminPlugin --> centralVPNPod | ||
end | ||
subgraph Cluster1["Cluster 1"] | ||
direction LR | ||
c1VPNPod["VPN Pod"] | ||
c1API["Remote Plugin"] | ||
c1VPNPod --> c1API | ||
end | ||
subgraph Cluster2["Cluster 2"] | ||
direction LR | ||
c2VPNPod["VPN Pod"] | ||
c2API["Remote Plugin"] | ||
c2VPNPod --> c2API | ||
end | ||
subgraph Cluster3["Cluster 3"] | ||
direction LR | ||
c3VPNPod["VPN Pod"] | ||
c3API["Remote Plugin"] | ||
c3VPNPod --> c3API | ||
end | ||
user["User"] -. Via Greenhouse .-> centralAdminPlugin | ||
centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod | ||
``` | ||
|
||
- No user-configurable plugins should be allowed in the Greenhouse central cluster. | ||
- Maintain restrictive permissions within the central cluster limited to Greenhouse resources. | ||
- Introduce `AdminPlugins` to utilize the plugin concept for handling core responsibilities. | ||
They cannot be configured by a user and are fully managed by Greenhouse. | ||
- A customer has to onboard at least one cluster to instantiate plugins with a backend. | ||
|
||
#### Pros | ||
|
||
* operational well manageable from Greenhouse | ||
* the limitations to admin plugins ensure that no misconfiguration by consumer is possible | ||
* works with "store local query global" scenarios | ||
|
||
#### Contra | ||
|
||
* Puts a hard dependency to the central cluster availability for all plugins with a backend | ||
* is that already the case?? | ||
* Limits decentralization of Greenhouse applications | ||
* Additional effort required to make central data collection scenarios work | ||
|
||
### Option 2: Per Org Central Communication cluster | ||
|
||
```mermaid | ||
|
||
flowchart LR | ||
subgraph CentralCluster["Central Cluster"] | ||
centralVPNPod["VPN Pod"] | ||
end | ||
subgraph Cluster1["Communication Cluster"] | ||
direction LR | ||
c1VPNPod["VPN Pod"] | ||
c1API["Remote Plugin"] | ||
c1VPNPod --> c1API | ||
end | ||
subgraph Cluster2["Cluster 2"] | ||
direction LR | ||
c2VPNPod["VPN Pod"] | ||
c2API["Remote Plugin"] | ||
c2VPNPod --> c2API | ||
end | ||
subgraph Cluster3["Cluster 3"] | ||
direction LR | ||
c3VPNPod["VPN Pod"] | ||
c3API["Remote Plugin"] | ||
c3VPNPod --> c3API | ||
end | ||
user["User"] -. Via Greenhouse .-> centralVPNPod | ||
centralVPNPod -. WireGuard Tunnel .-> c1VPNPod & c2VPNPod & c3VPNPod | ||
c1VPNPod <-. WireGuardTunnel .-> c3VPNPod & c2VPNPod | ||
``` | ||
|
||
* Each organization has its own communication cluster | ||
* Communication cluster is responsible for establishing communication capability between clusters and establishes bi-directional connections | ||
* Commnunication cluster is owned by respective organization / consumer | ||
* No Plugins are allowed on the Communication Cluster nor on the Central Cluster | ||
* As before no cluster is allowed to communicate with the central cluster | ||
|
||
#### Pros | ||
|
||
* allows consumers to establish full interconnectivity between clusters | ||
* enables common use case for plugins where data is collected decentrally and stored centrally | ||
* all plugins live in the consumer clusters | ||
|
||
#### Contra | ||
|
||
* additional operational complexity | ||
* consumers may rely on the interconnectivity solution for other applications then greenhouse increasing blast radius of potential misconfigurations | ||
* additional security risk for consumers as interconnected clusters potentially allow attackers to move between remote clusters | ||
|
||
### Option 3: Greenhouse Cluster per Organization | ||
|
||
```mermaid | ||
|
||
flowchart LR | ||
subgraph CentralClusterOrgA["Central Cluster Org A"] | ||
adminPluginOrgA["Admin Plugin"] | ||
end | ||
subgraph CentralClusterOrgB["Central Cluster Org B"] | ||
adminPluginOrgB["Admin Plugin"] | ||
centralVPNPod["VPN Pod"] | ||
adminPluginOrgB --> centralVPNPod | ||
end | ||
subgraph Cluster1["Cluster 1"] | ||
direction LR | ||
c1API["Remote Plugin"] | ||
end | ||
subgraph Cluster2["Cluster 2"] | ||
direction LR | ||
c2API["Remote Plugin"] | ||
end | ||
subgraph Cluster3["Cluster 3"] | ||
direction LR | ||
c3VPNPod["VPN Pod"] | ||
c3API["Remote Plugin"] | ||
c3VPNPod --> c3API | ||
end | ||
userOrgA["Org A User"] -. Via Greenhouse .-> adminPluginOrgA | ||
userOrgB["Org B User"] -. Via Greenhouse .-> adminPluginOrgB | ||
adminPluginOrgA -. Direct Access .-> c1API & c2API | ||
centralVPNPod -. Wireguard Tunnel .-> c3VPNPod | ||
``` | ||
|
||
- Each organization has their own central cluster, owned and operated by them | ||
- Greenhouse (eventually) also provides a Managed Greenhouse Central Cluster | ||
- Admin plugins are allowed in the central cluster, and may be configured by the organization | ||
- Access to previously shared Greenhouse components possible | ||
|
||
#### Pros | ||
|
||
- Organization can choose a suitable network zone for their central cluster, allowing for direct access to Clusters | ||
- No dependencies between Orgs on AdminPlugin updates due to CRD changes | ||
- Organization can configure AdminPlugins to their needs | ||
- Access to Greenhouse Logs & Metrics available | ||
- No shared costs that are not billable to the organization | ||
- Can be run on a trial period to evaluate the concept, as this can be reverted into the current state if needed | ||
|
||
#### Contra | ||
|
||
- Harder to support Organizations without access to their Greenhouse Central Cluster | ||
- No longer a OOB solution for Organizations | ||
- Increased operation complexity for Organizations |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicit mention: This would enable use cases residing in different hyperscalers.