Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topology: Added support for VPP running in docker container #1857

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

fgschwan
Copy link

This pull request extends docker topology by information from VPP running inside docker container:

  1. adding/removing VPP node into docker container namespace (if running VPP is detected inside docker container)
  2. adding/removing edge between VPP and veth interface in docker container (if it is usable/listed in interface list in VPP)
  3. adding/removing memif interface node in docker container (if created by VPP)
  4. adding/removing edge between memif intefaces (only one edge is supported by memif socket file)

Motivation for given topology extensions is this topology case:
host/node -> veth tunnel -> VPP (first docker container) -> memif tunnel -> VPP (second docker container) -> memif tunnel -> VPP (third docker container) -> veth tunnel -> host/node

Relation to existing VPP probe:
The existing VPP probe is communicating with VPP installed on host/node. My PR is detecting VPP inside docker containers. The existing VPP probe is using govpp, but i use for simplicity VPP CLI due to additional docker client layer.

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

21 similar comments
@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-bot
Copy link
Member

Can one of the admin review this patch ?

@skydive-project skydive-project deleted a comment from skydive-bot Jun 11, 2019
@skydive-project skydive-project deleted a comment from skydive-bot Jun 11, 2019
@skydive-project skydive-project deleted a comment from skydive-bot Jun 11, 2019
@skydive-project skydive-project deleted a comment from skydive-bot Jun 11, 2019
@lebauce
Copy link
Member

lebauce commented Jun 11, 2019

run tests

@lebauce
Copy link
Member

lebauce commented Jun 11, 2019

run skydive-functional-tests-backend-orientdb

@fgschwan
Copy link
Author

Code compilation is working for me (using "make") , "make fmt" is also success (i'm using VM with ubuntu 18.04, go 1.11) and elastic search failure is some test timeout problem (not sure about the cause).
So what can i do to resolve/help to resolve the failing tests?

@lebauce
Copy link
Member

lebauce commented Jun 11, 2019

@fgschwan First, thanks a lot for this pull request. Our CI is using Go 1.10.3 - which has different fmt policies than 1.11 that could explain the issues with skydive-go-fmt.

skydive-compile-tests fails on Windows compilation because of :

agent/probes.go:92:39: too many arguments in call to docker.NewProbe
	have (*netns.Probe, string, string, []subprobes.Subprobe)
	want (*netns.Probe, string, string)

@lebauce
Copy link
Member

lebauce commented Jun 24, 2019

@fgschwan Indeed. I just updated our Jenkins with your change.

Sorry for the delay, we were a bit busy those 2 last weeks. We will give a proper review of your PR this week for sure.

@lebauce
Copy link
Member

lebauce commented Jun 24, 2019

run skydive-functional-tests-backend-elasticsearch

@fgschwan
Copy link
Author

fgschwan commented Jul 1, 2019

run skydive-functional-tests-backend-orientdb

@fgschwan
Copy link
Author

fgschwan commented Jul 1, 2019

@lebauce thank you, tests are passing in skydive-functional-tests-backend-orientdb test suite (see log for tests TestRunningVPPInDocker, TestDockerVPPConnectingToVeth, TestTwoVPPsConnectedUsingMemifTunnel)

@lebauce
Copy link
Member

lebauce commented Jul 11, 2019

@fgschwan I took a pretty long to review your PR, sorry about that.

I actually spent quite a long time trying things :-)

My concern was that we would kinda have 2 different VPP probes : one that uses govpp, the other one using the CLI.
I therefore tried to run the existing VPP probe - that uses govpp - inside the container. The problem is that govpp uses /dev/shm to talk with VPP so we have to enter the PID and most importantly the mount namespace. I created a simple mode for Skydive (in addition to agent and analyzer) that is basically a very stripped down version of the agent but that runs only a single probe. Before starting the probe, it simply jumps to the right namespaces (so that we can use /dev/shm). I faced many issues (regarding the fact that the VPP libs don't support changing namespaces). Then I found that new versions of govpp allow talking to VPP using dead simple UNIX sockets. With this method, my trick worked nicely and it brings also a nice thing : the Skydive executable do not rely on VPP libs (libvppinfra.so, libvppapiclient.so, .....), so we don't care about the version of VPP that is installed on the host or inside the container.

I also tried to simplify the code that handles the links between the VPP node and the interfaces, and the links between the memifs (using Linker and MetadataIndexerLinker https://github.com/skydive-project/skydive/blob/master/graffiti/graph/linker.go and https://github.com/skydive-project/skydive/blob/master/graffiti/graph/indexer.go).

But I don't see the point in delaying the merging of your PR, I'll push a PR with the refactoring later and add you as a reviewer, if it's ok for you.

That being said, I have one remark with the PR. It's a bit odd to have a VPP folder in the docker probe, with a docker.go inside this VPP folder. So I suggest something :
In the NewProbe of the existing vpp probe, we can register a graph listener using g.AddEventListener(listener) so that we are notified when a container has been added. To do so, we can react on new edge in OnEdgeAdded. If the parent is a network namespace (Type: "netns") and the children a container ("Type": "container"), then we start the subprobe for this particular namespace.

With this change, all the related VPP code would be in topology/probes/vpp and not in topology/probes/docker. For the VPP probe to be able to use the Docker probe (for the docker exec thing), the docker probe could be passed to vpp.NewProbe (like how the Docker probe is passed the Netlink probe in https://github.com/skydive-project/skydive/blob/master/topology/probes/docker/docker.go#L289) and a public Exec function could be added to the Docker probe. This way, we don't have a instantiate a new Docker client and could use the existing one.

@fgschwan
Copy link
Author

@lebauce Thank you for your review.

the Skydive executable do not rely on VPP libs (libvppinfra.so, libvppapiclient.so, .....), so we don't care about the version of VPP that is installed on the host or inside the container.

I don't think that we can ignore VPP version, because govpp is using binary API of VPP and that can change from version to version. You probably can compile it, but actual communication with VPP for those binary API that changes may fail.

But I don't see the point in delaying the merging of your PR, I'll push a PR with the refactoring later and add you as a reviewer, if it's ok for you.

Yes, please add me as reviewer to your refactoring.

It's a bit odd to have a VPP folder in the docker probe, with a docker.go inside this VPP folder

The docker.go was meant as library for all subprobes, but there was only one subprobe (the VPP subprobe). So i left it inside the VPP subprobe. The only other place that i can put this file without getting cyclic dependency is topology/probes/docker and that seems odd to because it is for subprobes only.

So I suggest something :
In the NewProbe of the existing vpp probe, we can register a graph listener using g.AddEventListener(listener) so that we are notified when a container has been added. To do so, we can react on new edge in OnEdgeAdded. If the parent is a network namespace (Type: "netns") and the children a container ("Type": "container"), then we start the subprobe for this particular namespace.

This seems like a nice idea. Maybe easily breakable if someone changes docker probe code in some way and not knowing dependency in vpp probe, but that could be manageable by using tests. Also i checked the subprobe API if this is possible. All thing except https://github.com/skydive-project/skydive/pull/1857/files#diff-01d2cbebb5fe6a69afd9cd490e4791d6R39 are graph related and this one thing that is not is used to grab mounted volumes from docker to see whether memif interfaces are referring to the same physical socket file. This can be managed i.e. by docker probe exposing volume mount information in metadata (don't know if it does right know or not).

This way, we don't have a instantiate a new Docker client and could use the existing one.

I thought about using docker client provided from docker probe (to vpp subprobe), but there was no goroutine-safety (thread-safety) guarantee about the docker client and i was running docker "exec" concurrently in multiple goroutines due to performance. Ideally, the move to use govpp would probably solve performance issues that i tracked down to docker client (too many docker client calls for many docker containers). I would also allow us to get rid of docker client. I already discussed govpp usage in the end of #1857 (comment) . Using govpp could also unify vpp probe and docker vpp-subprobe code to use the same code for grabbing data from VPP. The only problem is that different version of VPP have different binary API (most thing doesn't change between version but some do)

@fgschwan
Copy link
Author

the Skydive executable do not rely on VPP libs (libvppinfra.so, libvppapiclient.so, .....), so we don't care about the version of VPP that is installed on the host or inside the container.

I don't think that we can ignore VPP version, because govpp is using binary API of VPP and that can change from version to version. You probably can compile it, but actual communication with VPP for those binary API that changes may fail.

I checked with govpp people what is the current state and future plans for handling differences in VPP versions. There are actually 2 binary API, one for handling configuration and on for statistics (2 different unix socket files). The important one (configuration binary api = get/set config values) currently needs to have generated structs from binary API json files from VPP. The statistic API needed in past generated structs as the configuration API, but now it works differently. It is more generic and without generated structs. To make binary api call you fill string name of method you want to call and values as parameters. Then govpp finds it in binary API definition files of VPP (json files) and tries to make the proper format for call to VPP. Of course if you try to call unsupported/dropped/modified API part of VPP, it will fail. But it will fail dynamically in runtime and not in compilation time. This gives more power to user of govpp, because with generation of struct any change in API invalidated usage of govpp for running VPP (version mismatch). With this approach, if API mismatch happens in part of API that is not used at all, everything works. As i mentioned earlier, this is approach is currently only in statistic binary API and some day it will be available for configuration binary API. So in the end, different VPP versions will still need to be handled, but it won't be that difficult as to have generated structs for all supported VPP versions and properly switch between them.

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

22 similar comments
@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

@skydive-bot
Copy link
Member

Can one of the admins verify this patch ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants