This repository presents complementary material to the paper "Fast Packet Processing with eBPF and XDP: Concepts, Code, Challenges and Applications" submitted to ACM Computing Surveys (CSUR).
The contents are divided as follows:
- ansible/: Ansible script used to install required dependencies during VM creation
- examples/: examples of eBPF programs
- headers/: header files needed to compile the examples
- images/: images used in this README
We created a virtual machine to be used in this tutorial. It contains all the code and tools required to complete the tutorial step-by-step.
- Download VirtualBox VM (user: ebpf, password: ebpf)
The virtual machine contains the following items:
- kernel v5.0.0
- iproute2-ss190319
- llvm 6.0.0
- bpftool
The directory /home/ebpf includes a copy of this repository and also local copies of the following projects:
The following steps have been tested with VirtualBox 5.2.18 on Ubuntu.
After downloading the VM image, extract the .rar archive. You should then see a file named ebpf-vm.vdi.
Open the VirtualBox app and then create a new VM by pressing the New button and picking a name for it:
Next, VirtualBox will allow modifications to the machine specification, such as the amount of RAM (this value can be modified later).
In the next step, VirtualBox will ask for the desired hard disk option. Here you must use an existing disk, which corresponds to the downloaded VM image:
Finally, it is necessary to configure the machine with two network interfaces: one in NAT mode (eth0, for Internet access) and another in HostOnly mode (eth1, for SSH access).
After the VM creation, right-click on the VM name and then select the Settings option.
In the Network tab, make sure these two interfaces are created:
Finished! The machine is now ready for the tutorial.
The kernel source code has several sample programs, available in the following directories:
samples/bpf
tools/testing/selftests/bpf
Here we present two examples from the samples/bpf folder. To compile them, run the following commands:
cd ~/net-next/
make headers_install
make samples/bpf/
The examples provided in this repository in the examples/ folder are accompanied by a Makefile. To compile them, run:
cd examples/
make
P.S.: The dependencies required for compilation are already installed on the virtual machine, so we recommend compiling the examples in that environment.
Below are the step-by-step instructions on how to compile and run each of the examples presented in the ACM CSUR paper, as well as some extra ones. In-depth explanations of each example are present in the paper.
File location: ./examples/dropworld.c
This example is one of the simplest programs possible. It just discards all received packets.
To compile it, run:
cd ./examples/
make
Next, the compiled program can be loaded using the ip tool:
sudo ip -force link set dev eth0 xdp obj dropworld.o sec .text
The .text argument refers to the ELF section in which the program is located. Check out the paper for more details.
It is possible to check the status of the program by using the following command:
ip link show eth0
Expected output:
ebpf@ebpf-vm:~/bpf-tutorial/examples$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:58:07:42 brd ff:ff:ff:ff:ff:ff
prog/xdp id 19
To remove the program, just run:
sudo ip link set dev eth0 xdp off
After the removal, the interface status will be as follows:
ebpf@ebpf-vm:~/bpf-tutorial/examples$ ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:58:07:42 brd ff:ff:ff:ff:ff:ff
Since in this case the eth0 interface is used for Internet access, discarding packets received by this interface will effectively cut off web access.
On another terminal, start a process to ping some domain on the internet:
ping google.com
Then load the dropworld.o program on the eth0 interface and notice that the ping responses stop arriving. This happens because every response message arriving at the eth0 interface is discarded by the loaded program.
Extra: Modify the dropworld.c file by changing the return value from XDP_DROP to XDP_PASS. Then compile and repeat the loading process. Observe that, in this case, the ping responses are still received. This new program is effectively a no-op: it merely receives packets and passes them up to the kernel stack.
File location: ./examples/tcpfilter.c
This example parses packets received on an interface and only accepts the ones with TCP segments. Filtering is done by parsing the IP header protocol field. Only packets with a protocol equal to 6, which corresponds to TCP, are accepted.
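The filtering decision described above can be modelled in plain user-space C. The following is only a sketch of the logic, not the code in tcpfilter.c: the helper name tcp_only_filter, the local verdict constants, and the choice to drop non-IPv4 frames are assumptions made for illustration, and the frame is assumed to be untagged Ethernet (no VLAN header).

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the XDP verdicts; the numeric values mirror
 * XDP_DROP and XDP_PASS from enum xdp_action in <linux/bpf.h>. */
enum verdict { VERDICT_DROP = 1, VERDICT_PASS = 2 };

#define ETH_HDR_LEN   14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define IPV4_MIN_LEN  20      /* IPv4 header without options */
#define ETHERTYPE_IP  0x0800  /* IPv4 */
#define IP_PROTO_TCP  6

/* Hypothetical helper: decide the fate of one untagged Ethernet frame. */
static int tcp_only_filter(const uint8_t *frame, size_t len)
{
    /* Check bounds before touching any header byte, which is also
     * what the eBPF verifier requires of a real XDP program. */
    if (len < ETH_HDR_LEN + IPV4_MIN_LEN)
        return VERDICT_DROP;

    /* The EtherType field is big-endian on the wire. */
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_IP)
        return VERDICT_DROP;  /* assumption: non-IPv4 frames are dropped */

    /* The protocol field is byte 9 of the IPv4 header. */
    uint8_t proto = frame[ETH_HDR_LEN + 9];
    return proto == IP_PROTO_TCP ? VERDICT_PASS : VERDICT_DROP;
}
```

In a real XDP program the same checks are made against the data and data_end pointers of the xdp_md context instead of an explicit length.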
Similar to the previous example, compile the program by running:
cd ./examples/
make
Before loading the program, try pinging a domain name and test the access to a web page:
ping google.com
curl http://www.google.com
The ping should succeed and the second command should print the HTML code of the requested page. Since ping uses ICMP packets and HTTP operates over TCP, once we load the program we should continue to receive responses to curl requests, while ping responses should stop arriving.
Load the program using the ip tool:
sudo ip -force link set dev eth0 xdp obj tcpfilter.o sec .text
Now, try to access the same page again and then try to ping the same domain:
curl http://www.google.com
ping google.com
Because of the tcpfilter.o program, non-TCP packets are discarded as soon as they reach the eth0 interface, preventing access to any service that does not operate over TCP.
Extra: Modify the program in tcpfilter.c so that it only accepts ICMP packets (used by the ping utility). Also check the program in portfilter.c, which drops packets based on the application layer protocol used.
File locations: xdp1_kern.c and xdp1_user.c in samples/bpf/ in the kernel source code (~/net-next/samples/bpf/ in the VM).
This example shows how to use maps in eBPF programs and how to interact with user space. The program in xdp1_kern.c extracts the layer 4 protocol number (TCP = 6, UDP = 17, ICMP = 1, etc.) from each received packet, updates a counter for that protocol and then discards the packet.
The counter values are stored in a map named rxcnt and later read by the program xdp1_user.c, which executes in user space. Through the map, the two programs (one in the kernel and another in user space) can exchange information.
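The division of work between the two programs can be sketched in plain user-space C. This is only a model under stated assumptions: the array rxcnt below stands in for the real map, which is per-CPU (BPF_MAP_TYPE_PERCPU_ARRAY, as the bpftool output in this example shows), and the names count_packet, read_total and the fixed NR_CPUS are hypothetical.

```c
#include <stdint.h>

#define NR_PROTOS 256   /* one slot per possible 8-bit protocol number */
#define NR_CPUS   2     /* assumed CPU count for this model only */

/* Stand-in for the rxcnt map: one counter array per CPU. */
static uint64_t rxcnt[NR_CPUS][NR_PROTOS];

/* Kernel-side role: bump the counter for this protocol on the CPU that
 * received the packet. No locking is needed because each CPU only
 * touches its own array. */
static void count_packet(unsigned int cpu, uint8_t proto)
{
    if (cpu >= NR_CPUS)
        return;
    rxcnt[cpu][proto]++;
}

/* User-space role: a lookup on a per-CPU map yields one value per CPU,
 * so the reader adds them up to get the total for a protocol. */
static uint64_t read_total(uint8_t proto)
{
    uint64_t sum = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += rxcnt[cpu][proto];
    return sum;
}
```

The per-CPU layout trades a slightly more expensive read in user space for contention-free updates on the packet path.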
To compile the programs, follow the instructions given earlier on how to compile sample programs from the Linux kernel.
Unlike the previous examples, here the eBPF program is loaded into the kernel by the user-space program xdp1_user.c, without requiring the ip tool.
After compilation, the samples/bpf/ directory will contain the executable file xdp1 (generated from xdp1_user.c).
ebpf@ebpf-vm:~/net-next/samples/bpf$ ./xdp1
usage: xdp1 [OPTS] IFACE
OPTS:
-S use skb-mode
-N enforce native mode
To load the program on the eth0 interface, just pass it as a parameter to xdp1:
sudo ./xdp1 eth0
The program will go into an infinite loop, printing the packet rate (pkt/s) observed for each protocol number.
ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp1 eth0
proto 17: 1 pkt/s
proto 17: 1 pkt/s
proto 17: 1 pkt/s
proto 0: 1 pkt/s
proto 17: 1 pkt/s
In another terminal, generate traffic with tools such as ping, curl, or wget so that packets pass through the interface.
It is possible to analyze the map content using bpftool, which is already compiled and installed on the provided VM. To do this, first list the eBPF programs loaded on the system:
sudo bpftool prog show
Expected output:
ebpf@ebpf-vm:~$ sudo bpftool prog show
2: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-04-23T12:24:29-0400 uid 0
xlated 296B jited 229B memlock 4096B map_ids 2,3
3: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-04-23T12:24:29-0400 uid 0
xlated 296B jited 229B memlock 4096B map_ids 2,3
4: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-04-23T12:24:29-0400 uid 0
xlated 296B jited 229B memlock 4096B map_ids 4,5
5: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-04-23T12:24:29-0400 uid 0
xlated 296B jited 229B memlock 4096B map_ids 4,5
6: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-04-23T12:24:35-0400 uid 0
xlated 296B jited 229B memlock 4096B map_ids 6,7
7: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-04-23T12:24:35-0400 uid 0
xlated 296B jited 229B memlock 4096B map_ids 6,7
28: xdp name xdp_prog1 tag 539ec6ce11b52f98 gpl
loaded_at 2019-04-23T14:34:06-0400 uid 0
xlated 488B jited 336B memlock 4096B map_ids 14
The last program listed corresponds to the XDP program loaded by xdp1. The output also indicates that it has a map with id 14. We can use this value to query the map content:
sudo bpftool map dump id 14
Expected output:
ebpf@ebpf-vm:~$ sudo bpftool map dump id 14
key:
00 00 00 00
value (CPU 00): 4b 00 00 00 00 00 00 00
key:
01 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00
key:
02 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00
...
(rest of output omitted)
The map used is of type BPF_MAP_TYPE_PERCPU_ARRAY. As the name implies, it holds one array per CPU. In the map declaration, the number of elements has been set to 256, so the output of bpftool shows the 256 entries corresponding to CPU 0, the only one on the VM.
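bpftool prints each value as raw bytes in memory order. Assuming a little-endian machine (which is the case for an x86 VM), the first byte is the least significant one. The hypothetical helper below, decode_le64, is a minimal sketch of how to turn such an 8-byte dump into a number.

```c
#include <stdint.h>

/* Decode an 8-byte value as printed by "bpftool map dump" on a
 * little-endian machine: b[0] is the least significant byte. */
static uint64_t decode_le64(const uint8_t b[8])
{
    uint64_t v = 0;

    /* Walk from the most significant byte down to the least. */
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}
```

Applied to the first entry shown above, the bytes 4b 00 00 00 00 00 00 00 decode to 75, that is, 75 packets counted under key 0.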
Extra: Change the program to let packets pass, rather than dropping them. Also, change the map type to BPF_MAP_TYPE_HASH and check its content using bpftool.
File location: ./examples/layercoop.c
This example uses two eBPF programs in different layers (XDP and TC) to collect joint statistics about the communication between every pair of IP addresses whose traffic crosses the corresponding interface.
As before, to compile the example just run:
cd ./examples/
make
Now, load and attach the program from section rx to the XDP layer on a chosen interface, say eth0:
sudo ip link set dev eth0 xdp obj layercoop.o sec rx
The extra flag -force after ip might be necessary if another XDP program is already attached to that interface.
Next, we need to load the program responsible for collecting statistics on TX (ELF section tx). Before that, we need to create the clsact qdisc on TC:
sudo tc qdisc add dev eth0 clsact
Now we can load the program on the TC egress hook, to run it on TX:
sudo tc filter add dev eth0 egress bpf da obj layercoop.o sec tx
From now on, every pair of communicating IP addresses will have an entry in the map shared by these programs, which can be inspected using bpftool as explained in the previous example.
Finally, to unload both programs:
sudo ip link set dev eth0 xdp off
sudo tc filter del dev eth0 egress
File location: ./examples/portfilter.c
This example parses packets received on an interface and discards the ones carrying HTTP traffic. Discarding is done by parsing the source and destination port fields of the TCP header: packets in which either value is 80 are discarded.
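The port test can be sketched in plain user-space C. This is a model of the check, not the code in portfilter.c: the helper names read_be16 and is_http are hypothetical. The point it illustrates is that TCP ports travel in network (big-endian) byte order, so they must be converted before being compared with 80.

```c
#include <stddef.h>
#include <stdint.h>

#define HTTP_PORT 80

/* Read a 16-bit field stored in network (big-endian) byte order. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: returns 1 when the TCP header starting at tcp
 * carries port 80 as source (bytes 0-1) or destination (bytes 2-3). */
static int is_http(const uint8_t *tcp, size_t len)
{
    if (len < 4)
        return 0;   /* too short to hold both port fields */

    return read_be16(tcp) == HTTP_PORT ||
           read_be16(tcp + 2) == HTTP_PORT;
}
```

In an eBPF program the same conversion is usually written with a byte-swapping helper applied to the constant or to the header field, rather than by reading individual bytes.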
Similar to the previous example, compile the program by running:
cd ./examples/
make
Before loading the program, test the access to a web page:
curl http://www.google.com
The output of this command should be a print of the requested page's HTML code.
Load the program using the ip tool:
sudo ip -force link set dev eth0 xdp obj portfilter.o sec filter
Now, try to access the same page again:
curl http://www.google.com
Because of the portfilter.o program, HTTP packets are discarded as soon as they reach the eth0 interface, preventing access to web pages served over port 80.
Extra: Modify the program in portfilter.c so that it discards all ICMP packets (used by the ping utility).
File location: linux/samples/bpf/, files xdp2skb_meta_kern.c and xdp2skb_meta.sh
This example demonstrates how the XDP and TC layers can interact through metadata associated with a packet. File xdp2skb_meta_kern.c contains two separate programs, one to be loaded on the XDP hook and one on the TC hook, both on reception. The XDP program attaches custom metadata to each received packet, which is then read at the TC layer. Script xdp2skb_meta.sh is used to load the programs on their respective hooks and configure the system.
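The hand-off between the two hooks can be modelled in plain user-space C. This is a sketch under assumptions, not the sample's code: struct frame, xdp_mark and tc_read_mark are hypothetical names, and struct meta_info is assumed to hold a single 32-bit mark. The idea shown is that metadata sits immediately in front of the packet data, and the reader must check that a whole structure fits in that area before using it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed metadata layout: a single 32-bit mark. */
struct meta_info {
    uint32_t mark;
};

/* Hypothetical model of a received frame: metadata, when present,
 * occupies buf[meta .. data), right before the packet bytes. */
struct frame {
    uint8_t buf[128];
    size_t  meta;   /* offset where metadata starts (== data when empty) */
    size_t  data;   /* offset where packet data starts */
};

/* XDP role: grow the metadata area backwards into the headroom,
 * then store the mark in it. */
static int xdp_mark(struct frame *f, uint32_t mark)
{
    struct meta_info m = { .mark = mark };

    if (f->meta < sizeof(m))
        return -1;                  /* not enough headroom */
    f->meta -= sizeof(m);
    memcpy(f->buf + f->meta, &m, sizeof(m));
    return 0;
}

/* TC role: use the metadata only if a whole struct fits before data. */
static int tc_read_mark(const struct frame *f, uint32_t *mark)
{
    struct meta_info m;

    if (f->meta + sizeof(m) > f->data)
        return -1;                  /* no (or truncated) metadata */
    memcpy(&m, f->buf + f->meta, sizeof(m));
    *mark = m.mark;
    return 0;
}
```

In the kernel, the XDP side reserves the area with the bpf_xdp_adjust_meta helper, and the TC side performs the bounds check using the data_meta and data fields of its context.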
To help analyze these programs as well as demonstrate an alternative way to debug eBPF programs, let's modify the xdp2skb_meta_kern.c file to print log messages after packet processing on each layer.
To do so, we will use the helper function bpf_trace_printk. For ease of use, we can add the following macro to the file:
// Nicer way to call bpf_trace_printk()
#define bpf_custom_printk(fmt, ...)                     \
        ({                                              \
            char ____fmt[] = fmt;                       \
            bpf_trace_printk(____fmt, sizeof(____fmt),  \
                             ##__VA_ARGS__);            \
        })
Through this macro, we can use function bpf_trace_printk indirectly, but with a syntax similar to function printf.
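To see what the macro expands to, the following user-space sketch replaces the kernel helper with a hypothetical stand-in, fake_trace_printk, that takes the same leading arguments (format pointer and format size). It assumes a compiler that supports GNU statement expressions and ##__VA_ARGS__, such as GCC or Clang. The local array is the key step: it gives the helper a writable copy of the format string together with its exact size.

```c
#include <stdarg.h>
#include <stdio.h>

static char trace_buf[128];     /* where the stand-in "logs" its output */
static int  last_fmt_size;      /* format size seen by the last call */

/* Hypothetical stand-in for bpf_trace_printk(fmt, fmt_size, ...). */
static int fake_trace_printk(const char *fmt, unsigned int fmt_size, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt_size);
    n = vsnprintf(trace_buf, sizeof(trace_buf), fmt, ap);
    va_end(ap);
    last_fmt_size = (int)fmt_size;
    return n;                   /* number of characters formatted */
}

/* Same shape as bpf_custom_printk, calling the stand-in instead. */
#define custom_printk(fmt, ...)                              \
        ({                                                   \
            char ____fmt[] = fmt;                            \
            fake_trace_printk(____fmt, sizeof(____fmt),      \
                              ##__VA_ARGS__);                \
        })
```

Calling custom_printk("[XDP] metadata = %d\n", 42) passes a format size of 21, which is the 20 visible characters plus the terminating NUL, because sizeof is applied to the array rather than to a pointer.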
Having added the macro, we can now use it to print metadata values on TC and XDP layers.
Add to the end of function _xdp_mark():
SEC("xdp_mark")
int _xdp_mark(struct xdp_md *ctx)
{
        struct meta_info *meta;
        void *data, *data_end;
        int ret;
        <...> // code omitted
        meta->mark = 42;
        bpf_custom_printk("[XDP] metadata = %d\n", meta->mark); // <-- Add this line
        return XDP_PASS;
}
Add to the end of function _tc_mark():
SEC("tc_mark")
int _tc_mark(struct __sk_buff *ctx)
{
        void *data = (void *)(unsigned long)ctx->data;
        void *data_end = (void *)(unsigned long)ctx->data_end;
        void *data_meta = (void *)(unsigned long)ctx->data_meta;
        struct meta_info *meta = data_meta;
        <...> // code omitted
        ctx->mark = meta->mark; /* Transfer XDP-mark to SKB-mark */
        bpf_custom_printk("[TC] metadata = %d\n", meta->mark); // <-- Add this line
        return TC_ACT_OK;
}
The bpf_trace_printk function requires programs that use it to be declared with a GPL-compatible license. Otherwise, the program will be rejected by the verifier when it is loaded into the kernel. The error message generated by the verifier is as follows:
ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp2skb_meta.sh --dev eth0
Prog section 'tc_mark' rejected: Invalid argument (22)!
- Type: 3
- Instructions: 25 (0 over limit)
- License:
Verifier analysis:
0: (61) r3 = *(u32 *)(r1 +76)
1: (61) r2 = *(u32 *)(r1 +140)
2: (bf) r4 = r2
3: (07) r4 += 4
4: (3d) if r3 >= r4 goto pc+3
R1=ctx(id=0,off=0,imm=0) R2=pkt_meta(id=0,off=0,r=0,imm=0) R3=pkt(id=0,off=0,r=0,imm=0) R4=pkt_meta(id=0,off=4,r=0,imm=0) R10=fp0,call_-1
5: (b7) r2 = 41
6: (63) *(u32 *)(r1 +8) = r2
7: (05) goto pc+15
23: (b7) r0 = 0
24: (95) exit
from 4 to 8: R1=ctx(id=0,off=0,imm=0) R2=pkt_meta(id=0,off=0,r=4,imm=0) R3=pkt(id=0,off=0,r=0,imm=0) R4=pkt_meta(id=0,off=4,r=4,imm=0) R10=fp0,call_-1
8: (61) r3 = *(u32 *)(r2 +0)
9: (63) *(u32 *)(r1 +8) = r3
10: (b7) r1 = 680997
11: (63) *(u32 *)(r10 -16) = r1
12: (18) r1 = 0x203d206f64616461
14: (7b) *(u64 *)(r10 -24) = r1
15: (18) r1 = 0x74656d205d43545b
17: (7b) *(u64 *)(r10 -32) = r1
18: (61) r3 = *(u32 *)(r2 +0)
19: (bf) r1 = r10
20: (07) r1 += -32
21: (b7) r2 = 20
22: (85) call bpf_trace_printk#6
cannot call GPL-restricted function from non-GPL compatible program
Error fetching program/map!
Unable to load program
ERROR: Exec error(1) occurred cmd: "tc filter add dev eth0 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark"
To overcome this limitation, it is necessary to declare a special global variable in the license ELF section with this information. This can be done by adding the following line at the end of the xdp2skb_meta_kern.c file:
char _license[] SEC("license") = "GPL";
Finally, recompile the example:
cd ~/net-next
make samples/bpf/
Next, execute the script xdp2skb_meta.sh to load the programs into the kernel:
ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo ./xdp2skb_meta.sh
Usage: ./xdp2skb_meta.sh [-vfh] --dev ethX
-d | --dev : Network device (required)
--flush : Cleanup flush TC and XDP progs
--list : ($LIST) List TC and XDP progs
-v | --verbose : ($VERBOSE) Verbose
--dry-run : ($DRYRUN) Dry-run only (echo commands)
ERROR: Please specify network device -- required option --dev
Load the programs on interface eth0:
sudo ./xdp2skb_meta.sh --dev eth0
We can also load the programs directly using tools such as ip for the XDP program (sudo ip -force link set dev eth0 xdp obj xdp2skb_meta_kern.o sec xdp_mark), just as before, and tc for the TC hook program. In the latter case, it is necessary to create a special qdisc in the Linux traffic controller, called clsact. The whole process can be done using the following commands:
sudo tc qdisc add dev eth0 clsact
sudo tc filter add dev eth0 ingress bpf da obj xdp2skb_meta_kern.o sec tc_mark
For more information about eBPF on the TC hook, check out the command man tc-bpf.
Once the programs have been loaded on their respective hooks, we can analyze the log messages generated by each one in the file /sys/kernel/debug/tracing/trace:
sudo cat /sys/kernel/debug/tracing/trace
For continuous reading, use the file trace_pipe:
sudo cat /sys/kernel/debug/tracing/trace_pipe
With the eBPF programs loaded in the kernel and some traffic flowing through the interface, we can observe the generated messages:
ebpf@ebpf-vm:~/net-next/samples/bpf$ sudo cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 40/40 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] ..s. 13699.213984: 0: [XDP] metadata = 42
<idle>-0 [000] ..s. 13699.214009: 0: [TC] metadata = 42
<idle>-0 [000] ..s. 13699.421529: 0: [XDP] metadata = 42
<idle>-0 [000] ..s. 13699.421542: 0: [TC] metadata = 42
<idle>-0 [000] ..s. 13704.450195: 0: [XDP] metadata = 42
<idle>-0 [000] ..s. 13704.450205: 0: [TC] metadata = 42
<idle>-0 [000] ..s. 13704.450216: 0: [XDP] metadata = 42
By looking at the messages, we can see that the metadata added on the XDP hook was successfully received by the program on the TC hook, effectively sharing information between the two kernel stack layers.