KNI [Kubernetes Networking Interface] Initial Draft KEP #4477

Closed
wants to merge 36 commits

Changes from 25 commits

Commits (36)
2bede53
init of kni kep
MikeZappa87 Jan 11, 2024
14eeea2
update issue number
MikeZappa87 Jan 17, 2024
eae3341
WIP: KNI KEP
MikeZappa87 Jan 24, 2024
00209db
chore: remove shaneutt as approver
shaneutt Jan 25, 2024
9e9ef49
chore: add title to KEP
shaneutt Jan 25, 2024
eae3f0c
chore: first draft of a motivation section
shaneutt Jan 25, 2024
68738dd
Merge pull request #1 from shaneutt/kni-kep
MikeZappa87 Jan 26, 2024
9f215c6
Merge branch 'kubernetes:master' into KNI-KEP
MikeZappa87 Jan 26, 2024
217f1c3
change ordering of goals
MikeZappa87 Jan 26, 2024
64eca47
update goals and summary
MikeZappa87 Jan 27, 2024
664c2e0
update goals/non goals and notes
MikeZappa87 Jan 27, 2024
17a0fa6
Update keps/sig-network/4410-k8s-network-interface/README.md
MikeZappa87 Jan 30, 2024
d547f62
update with shane comments
MikeZappa87 Jan 30, 2024
f957158
Merge pull request #3 from MikeZappa87/zappa/v2
MikeZappa87 Jan 30, 2024
abc4210
add create network
MikeZappa87 Jan 30, 2024
cefc7c9
chore: cleanup template text and blank space
shaneutt Jan 30, 2024
a6e3c30
Merge pull request #4 from shaneutt/shaneutt/kni-cleanup-template
MikeZappa87 Jan 30, 2024
1f05981
support vm/kata
MikeZappa87 Jan 30, 2024
e770486
docs: another pass at the kni kep goals
shaneutt Jan 30, 2024
855d5e7
Merge pull request #5 from shaneutt/shaneutt/kni-goals-2
MikeZappa87 Jan 31, 2024
8a33b31
docs: add goal about Pod network ns APIs
shaneutt Feb 1, 2024
0c3fb89
docs: add a user story for network ns goals to KNI KEP
shaneutt Feb 1, 2024
325bbfc
Merge pull request #6 from shaneutt/patch-1
MikeZappa87 Feb 1, 2024
17baf99
update motivation
MikeZappa87 Feb 2, 2024
1bfd49b
Update keps/sig-network/4410-k8s-network-interface/kep.yaml
MikeZappa87 Feb 3, 2024
49e5614
update kep goals per discussions
MikeZappa87 Feb 7, 2024
34d21b7
update kep goals per discussions
MikeZappa87 Feb 7, 2024
d6d9a5c
update kep goals per discussions
MikeZappa87 Feb 7, 2024
82af8a0
Merge pull request #7 from MikeZappa87/update-kep
MikeZappa87 Feb 7, 2024
3177ee4
Update keps/sig-network/4410-k8s-network-interface/README.md
MikeZappa87 Feb 12, 2024
61281b5
Update keps/sig-network/4410-k8s-network-interface/README.md
MikeZappa87 Feb 14, 2024
1c3107b
update kep and temp remove user stories
MikeZappa87 Feb 15, 2024
2081e13
update goals
MikeZappa87 Feb 15, 2024
cd3f4b2
update goal
MikeZappa87 Feb 15, 2024
9d2ee29
docs: add options for KNI controllers to KNI KEP
shaneutt Feb 21, 2024
30d4804
Merge pull request #10 from shaneutt/shaneutt/kni-kep-alternatives-co…
MikeZappa87 Feb 22, 2024
102 changes: 102 additions & 0 deletions keps/sig-network/4410-k8s-network-interface/README.md
@@ -0,0 +1,102 @@
# KEP-4410: Kubernetes Networking reImagined

> **NOTE**: for the initial PR we've removed a lot of the templated text and
> aimed to keep this first iteration small and easier to consume. We are only
> focusing on the "What" and "Why" (e.g. motivation, goals, user stories) for
> this iteration so that we can build consensus on those first before we add
> any of the "How".

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories (Optional)](#user-stories-optional)
- [Story 1](#story-1)
- [Story 2](#story-2)
<!-- /toc -->

## Summary

This proposal is to design and implement the KNI [Kubernetes Networking Interface], also known as Kubernetes Networking reImagined. KNI will create a Network resource and provide an API that reports network status and availability and defines how to attach a pod to a network, detach the pod from the network, and update a pod's network.
Contributor:

I don't think it's "better" known as anything, given that it mostly does not exist yet.

The Summary and Motivation should not assume that the reader is already familiar with the idea.

Author:

It is funny you mention this, because reImagined is how it got recirculated back to me and I just ran with it. However, 100% the summary/motivation should be written in a way that a nontechnical reader can understand. That is a high bar to hit, as the current state is very difficult to describe, and through the years I have found that only a small number of people can actually articulate it accurately.

Member (@shaneutt, Feb 5, 2024):

I'm sure we can call the project "Kubernetes Networking Interface" and then colloquially we can refer to the effort as "reImagined" in less formal settings, recommend:

Suggested change:

- This proposal is to design and implement the KNI [Kubernetes Networking Interface] or better known as Kubernetes Networking reImagined. KNI will create a Network resource and provide an API that will provide network status, availability, how to attach a pod to a network, detach the pod from the network and update a pods network.
+ This proposal is to design and implement a KNI [Kubernetes Networking Interface]. KNI will create a Network resource and provide an API that will provide network status, availability, how to attach a pod to a network, detach the pod from the network and update a pods network.

Comment:

This is my first time trying to get familiar with the KNI project, so I started reading this. Apologies if this sounds like a total newbie question; perhaps I must read other material first before coming to this KEP, and if so please point me the right way:

> provide an API that will provide network status, availability, how to attach a pod to a network, detach the pod from the network and update a pods network

network status and availability means the cluster networking status? node networking status? pod networking status?

the second part refers to pod aspects:
how to attach a pod to a network, detach the pod from the network and update a pods network => this is like providing what the CNI spec does but through an API right?

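To make the Summary concrete, below is a minimal, purely illustrative sketch of how a Network object might look as a Kubernetes API type. The group/version and every field name are assumptions made for illustration only; this KEP has not yet proposed an API shape.

```go
// Hypothetical sketch only: this KEP has not defined an API shape yet.
// The group/version and every field name below are illustrative assumptions.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Network is a hypothetical resource describing a network that Pods can be
// attached to, detached from, and updated against.
type Network struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   NetworkSpec   `json:"spec,omitempty"`
	Status NetworkStatus `json:"status,omitempty"`
}

// NetworkSpec captures how the network is provided (assumed field).
type NetworkSpec struct {
	// Provider names the network runtime responsible for this network.
	Provider string `json:"provider,omitempty"`
}

// NetworkStatus surfaces network status and availability, e.g. via a "Ready"
// condition, matching the status/availability language in the Summary.
type NetworkStatus struct {
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```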

## Motivation

Kubernetes networking has traditionally been challenging for users interacting
with the Kubernetes API to understand, and there has been considerable flexibility
in how Container Network Interfaces (CNIs) set up networking within clusters.
As a result, pod networking (including pod-to-pod networking) is opaque to users,
with different implementations taking markedly different approaches. This
fragmentation has spread networking across all layers of the stack, including
Kubernetes components such as kube-proxy and network policy agents, the container
runtime with its CNI plugins, and low-level runtimes such as Kata, and issues
with the API have negatively impacted adoption in sectors such as telecommunications.
Our goal is to transform Kubernetes networking by making networks and their
components actual resources within the Kubernetes API. This will allow shared
functionality to be developed and integrated into the API. We anticipate that
this new approach will enhance support for areas that are currently struggling,
facilitate the development and promotion of common features, and better define
and accommodate advanced functionality and potential areas for expansion.

### Goals

1. Design a cool looking t-shirt
Author:

At least no one has disagreed with designing a cool looking tshirt

2. Provide Kubernetes APIs for the creation, configuration and management of networks (e.g. `Pod` networks)
3. Provide documentation, examples, troubleshooting and FAQs for KNI.
4. Establish feature parity with current CNI [ADD, DEL]
Contributor:

So as an example of what is wrong with this KEP, you say here that you're going to provide feature parity with CNI, but you never said that this is a replacement for CNI.

Author:

Should we say this is a replacement for the CNI? In the POC I leverage the CNI as a means to prove out a migration strategy. It becomes an implementation detail. Should we rephrase this as we need a means to attach and detach a network without referring to the existing CNI ADD/DEL?

Contributor (@danwinship, Feb 5, 2024):

You should say what it is. If it's not a replacement for CNI then I'm already misunderstanding it!

5. Handle support levels like Gateway API (e.g. "core" and "extended")
Contributor:

support levels of what?

6. Handle implementation-specific use cases through extension points
7. Decouple the Pod and Node Network setup
8. Simplify/enable triggering garbage collection to ensure no resources are left behind
Contributor:

garbage collection of what?

Author:

Should we include specific examples of eBPF programs and Linux network artifacts such as Linux bridges?

Contributor:

my point was more that you are giving goals like "we will garbage collect things" without having mentioned the existence of anything that will need to be garbage-collected.

9. Provide the ability to identify the IP address family without parsing the address value (e.g. via a dedicated field)
Contributor:

IP address family of what?

10. Provide as much backwards-compatibility with CNI as is feasible
11. Guarantee the network is set up and in a healthy state before containers are started (ephemeral, init, regular)
12. If feasible, provide API awareness of Pod network namespaces (e.g. interface names)
13. Provide support for Kata and other virtualized runtimes
14. Provide a reference implementation
Contributor:

A reference implementation of what?


Comment:

TODO: Add goal of having the pod object available at network runtime

Author:

@dougbtv I am drafting an update. So I might be able to get this. Do you have specific items you want off the Pod spec? Metadata (name, namespace, labels, annotations, ... )

Comment:

Metadata nails it, thanks. At least I'm most interested in getting all you listed. Potentially someone might want more?

Comment (@BlaineEXE, Feb 12, 2024):

I'd like to request CIDRs as an available piece of metadata if possible. That would be great for legacy applications (e.g., Ceph) that use CIDR configurations as config values.

Author:

@BlaineEXE this sounds reasonable. I notice you are in Colorado; I am in the Boulder area. Does the application need the pod CIDR? You might be able to infer this via the Pod IP.

Member:

> I can offer up our use case (attaching pods to raw devices, potentially with macvlan) as something that seems worthwhile for KNI to consider.

that is exactly what we need, these experiences, and this one is something I've identified in multiple places, attach netdevices to pods, so I feel this is a strong use case ... what I also see is that these interfaces are used as "external" networks that are only relevant to the app running on the specific pod, so I don't feel that these IPs from these interfaces should be represented on the kubernetes topology ...

Member:

@BlaineEXE I'm still trying to fully understand your use case. Based on your comments it seems you need some prior work of setting up the infrastructure and the VLANs:
who configures these VLANs?
are these VLANs configured on all these hosts?
...

Comment:

Certainly someone must configure the additional hardware. In practice, an admin must add a separate switch (or create a VLAN on an existing switch) that connects to a different interface on the host systems. So if eth0 underpins the k8s pod network, eth1 may be the additional interface, unused by k8s itself.

Our current deployment strategy leverages Multus and NetworkAttachmentDefinitions to connect storage (Ceph) pods to eth1. We recommend CNI=macvlan (lower latency than bridge) with IPAM=whereabouts (ease of use), but users can practically use whatever CNI/IPAM they like.

This does work, but because Multus is such a complex feature to understand, users often seem lost trying to configure an already-complex storage system with NADs. In addition, there is developer complexity, and there are friction points -- like not being able to get a Service with a static IP on a Multus network.

Member (@aojea, Feb 19, 2024):

> like not being able to get a Service with a static IP on a Multus network.

what do you mean by a Multus network? some entity that is connected to the additional interface?

i.e., if eth1 is connected to an external VLAN, some compute or host on that network?

Member:

@BlaineEXE @dougbtv I want to understand this use case well; I don't quite get what the expectations are here and what problem we are trying to solve. Definitely Kubernetes cannot manage the infra, connecting switches or creating VLANs; there are other projects that cover that area ...
but this part about a Service with a static IP on a Multus network is the one I want to understand. The docs are not clear https://github.com/rook/rook/blob/master/design/ceph/multus-network.md, does it mean to make services available outside of the cluster?

### Non-Goals

1. Any changes to the kube-scheduler
2. Any specific implementation other than the reference implementation. However, we should ensure the KNI-API is flexible enough to support other implementations.

## Proposal

The proposal of this KEP is to design and implement the KNI-API and make the necessary changes to the CRI-API and container runtimes. The scope should be kept to a minimum, and we should target feature parity with current CNI functionality.
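As a rough illustration of the surface area under discussion (not a proposed design), the KNI-API could expose attach, detach, update, and readiness operations along the following lines. The service, method, and field names are assumptions; the sketch simply ties together several goals from above (CNI ADD/DEL parity, an IP family field, readiness before containers start).

```go
// Illustrative sketch only; this KEP intentionally defers the "how".
// All names and fields below are assumptions, not a proposed design.
package kni

import "context"

// PodNetworkRequest identifies the Pod whose network is being managed.
// Pod metadata (name, namespace, labels, annotations) is included per the
// discussion in the Goals section above.
type PodNetworkRequest struct {
	Name        string
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
}

// Attachment describes one resulting interface. Exposing the IP family as a
// field, rather than requiring callers to parse the address, reflects goal 9.
type Attachment struct {
	InterfaceName string
	IP            string
	IPFamily      string // "IPv4" or "IPv6"
}

// Runtime is a hypothetical network runtime contract, roughly at parity with
// CNI ADD/DEL (goal 4), plus update and an explicit readiness query.
type Runtime interface {
	AttachNetwork(ctx context.Context, req PodNetworkRequest) ([]Attachment, error)
	DetachNetwork(ctx context.Context, req PodNetworkRequest) error
	UpdateNetwork(ctx context.Context, req PodNetworkRequest) ([]Attachment, error)
	// QueryNetworkReady reports whether the node's network(s) are ready
	// before any containers (ephemeral, init, regular) are started.
	QueryNetworkReady(ctx context.Context) (bool, error)
}
```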

### User Stories

We are constantly adding these user stories; please join the community sync to discuss.
Member:

> We are constantly adding these user stories [...]

Where?


#### Story 1

As a cluster operator, I need the ability to determine that my network(s) are ready so that my pods come up with a working network.
Member:

this has to have more details about what "network is ready" means; there is no such thing as a "global network state", the whole point of IP networks is to be distributed,

"Network is ready" today means I can provide a netdevice to the Pod (veth) and assign an IP address

Author:

Currently, 'network is ready' means a CNI network configuration exists in /etc/cni/net.d. At a minimum, we should clearly define what 'network is ready' means. I can ask several people about this and get various answers depending on their environment. We should allow the user to implement this, assuming it meets the criteria of K8s.

Member:

My view is that "network ready" today is expected to mean "I can create a Pod that will be able to communicate with all the other Pods in the cluster", and this is a paradox https://github.com/kubernetes/enhancements/pull/4477/files#r1489293806; however, it is implemented today as "there is a cni config file that we are expecting to do what is right when we create a pod", and this used to mean "I can create a Pod and will be able to communicate within the node"

@danwinship @thockin and @squeed on these philosophical questions :)
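For context on the behaviour described in this thread, today's "network is ready" signal effectively reduces to a CNI configuration file being present on disk. The sketch below only illustrates that heuristic; it is not the kubelet's actual code.

```go
// Rough illustration of today's "network is ready" heuristic: a CNI network
// configuration exists in /etc/cni/net.d. Not the kubelet's implementation.
package main

import (
	"fmt"
	"path/filepath"
)

// cniConfigPresent reports whether any CNI configuration file exists in confDir.
func cniConfigPresent(confDir string) (bool, error) {
	var files []string
	for _, pattern := range []string{"*.conf", "*.conflist", "*.json"} {
		matches, err := filepath.Glob(filepath.Join(confDir, pattern))
		if err != nil {
			return false, err
		}
		files = append(files, matches...)
	}
	return len(files) > 0, nil
}

func main() {
	ready, err := cniConfigPresent("/etc/cni/net.d")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("network ready (heuristic):", ready)
}
```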


#### Story 2

As a cluster operator, I need the ability to determine what networks are available on my node so that upstream components can ensure the pod is scheduled on the appropriate node.
Member:

I don't understand this user story; this is a scheduling problem that is already solved today https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/

Author:

I have been debating whether to focus more on interfaces rather than networks, taking a more CNI-like approach. However, if one were to implement multiple networks, most likely they would do it with different interfaces. Thanks for pointing out the topology spread constraints. In the end we don't want to be involved with scheduling; however, we want to provide the API to give other efforts visibility into what is currently on this node.


#### Story 3

As a Kubernetes developer, I need the ability to have extension points for pod network setup, teardown and update so that I can support future Kubernetes networking features while either reducing the changes to core Kubernetes or eliminating them.
Comment:

So, I'm reading this as not just setup and teardown: within a pod's lifecycle, maybe we can expand this to say that it's not just "update" but during its lifecycle. Can this be more explicit about taking actions while the pod is running?

> I need the ability to have extension points for pod network setup, teardown and updates while pods are running

I think of this as a current limitation of only having execution points on pod creation or deletion.

Also this could be split into its own thing, because there's a kind of double thing here with both extension points and reducing or eliminating changes to core.

Member:

" kubernetes developer that wants to do network things " is not an user story, you need to define the user story that require those advanced networking features , and then we figure out what is thebest solution

Comment:

I have a use-case for consideration here that I can break down into 2 parts:

Part 1: Currently the Rook project uses Multus to allow storage pods to attach to host network devices without exposing pods on the host network namespace. The goal is to keep as much network isolation for security as possible while also being able to give the Ceph storage platform access to host-local network speeds. It's not clear to me whether the high-level use case I am describing -- attaching pods to specific host devices -- has a place in KNI. I certainly hope so. (As a note here, Rook prefers to use macvlan to get subdevices of a host device rather than exposing the host device itself, but I can also imagine systems that want to just expose the host device itself.)

Part 2: Assuming part 1 fits into KNI's purpose, the next part relates to IP assignments. A big hangup the Rook project has with Multus is that we can't easily get static IP assignments on Multus networks. For the Pod network, it is easy to get a static IP via k8s Service, but we don't have the same ease when doing this for dedicated NICs (*). I would propose that KNI consider whether it can create API flexibility to allow Service functionality on any network, including host-device-attached networks.

(*): We are aware that MultusService exists, but it isn't at a stable enough place for us to require it as an add-on for users. Additionally, it requires Rook to

Member:

@MikeZappa87 this comment is proving my point on the user story I added as an example here #4477 (comment)

Author:

I think we need to rethink this user story.

Member:

> For the Pod network, it is easy to get a static IP via k8s Service, but we don't have the same ease when doing this for dedicated NICs (*). I would propose that KNI consider whether it can create API flexibility to allow Service functionality on any network, including host-device-attached networks.

if the pod is connected through an external NIC to an external network, it is the external network's responsibility to assign IPs. Are we trying to say that Kubernetes should manage these external networks' IPAM? What stops an admin on that network from assigning a static IP?


#### Story 4

As a tool which manages eBPF programs on a Kubernetes cluster (bpfman,
Author:

I think we should keep the persona to an actual user?

inspektorgadget), I would like to be able to see the network interfaces of a
`Pod` via the Kubernetes API so that I can attach TC/XDP network programs to
those interfaces based on knowing the Pod name.
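To illustrate Story 4, such a tool could look up the Pod by name and read its interface names from wherever KNI eventually surfaces them. In the sketch below, the annotation key kni.networking.k8s.io/interfaces and its comma-separated format are hypothetical placeholders, not part of this KEP.

```go
// Sketch for Story 4: a Pod-name-driven lookup of interface names via the
// Kubernetes API. The annotation key used here is a hypothetical placeholder.
package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// podInterfaces returns the interface names recorded for a Pod, assuming KNI
// published them under a (hypothetical) annotation.
func podInterfaces(ctx context.Context, client kubernetes.Interface, namespace, name string) ([]string, error) {
	pod, err := client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	raw, ok := pod.Annotations["kni.networking.k8s.io/interfaces"] // assumed key
	if !ok {
		return nil, fmt.Errorf("pod %s/%s has no recorded interfaces", namespace, name)
	}
	return strings.Split(raw, ","), nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ifaces, err := podInterfaces(context.Background(), client, "default", "example-pod")
	if err != nil {
		panic(err)
	}
	// An eBPF manager (bpfman, Inspektor Gadget, ...) would attach its
	// TC/XDP programs to each of these interfaces.
	fmt.Println("interfaces:", ifaces)
}
```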

### Notes/Constraints/Caveats

Additional Information/Diagrams: https://docs.google.com/document/d/1Gz7iNtJNMI-zKJhaOcI3aflPCx3etJ01JMxzbtvruKk/edit?usp=sharing

Changes to the pod specification will require hard evidence.

The specifics of "Network Readiness" are an implementation detail. We need to provide this RPC to the user.

We should consider the trade-offs of using a native K8s Network object versus CRDs.
Using a native object would allow passing a slice of Network types to AttachNetwork.

Since the network runtime can be run separately from the container runtime, you can package everything into a pod and not need to have binaries on disk. This allows the CNI plugins to be isolated in the pod, and the pod never needs to mount /opt/cni/bin or /etc/cni/net.d. This potentially offers more control over execution. Keep in mind that CNI remains the implementation; however, when it is used, chaining is still available.
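As a sketch of the decoupling described above, a kubelet or container runtime could reach an independently packaged network runtime over a local gRPC socket, much as CRI endpoints work today. The socket path below is an assumed placeholder.

```go
// Illustrative only: dialing a hypothetical KNI network runtime that is
// packaged and deployed separately (e.g. in a pod) over a local unix socket.
// The socket path /var/run/kni/kni.sock is an assumed placeholder.
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	conn, err := grpc.Dial(
		"unix:///var/run/kni/kni.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("connecting to network runtime: %v", err)
	}
	defer conn.Close()

	// A generated KNI client (see the interface sketch in the Proposal
	// section) would be created from conn and used for attach/detach calls.
	log.Println("connected to network runtime socket")
}
```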
48 changes: 48 additions & 0 deletions keps/sig-network/4410-k8s-network-interface/kep.yaml
@@ -0,0 +1,48 @@
title: k8s-network-interface
kep-number: 4410
authors:
- "@mikezappa87"
- "@shaneutt"
owning-sig: sig-network
participating-sigs:
- sig-network
Member:

Surely at least SIG Node should be participating (there's no way this doesn't affect kubelet, CRI)..?

I would also tag Cluster Lifecycle at least as FYI / advisory since cluster lifecycle folks will know about and have suggestions re: node readiness and cluster configuration.

status: provisional
creation-date: 2024-01-11
reviewers:
- @aojea
- @danwinship
- @thockin
approvers:

see-also:
- "/keps/sig-aaa/1234-we-heard-you-like-keps"
Member:

delete these, or update with relevant keps?

same below for replaces

- "/keps/sig-bbb/2345-everyone-gets-a-kep"
replaces:
- "/keps/sig-ccc/3456-replaced-kep"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.30"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.31"
beta: "v1.32"
stable: "v1.33"
Member:

This seems unlikely, without even defining an API yet?


# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: kni
components:
- kubelet
- cri-api
disable-supported: true

# The following PRR answers are required at beta release
metrics:
- my_feature_metric