This is mostly not implemented yet; so far it's just some code around parsing packets. The code has comments explaining what it does.
- Run mininet: `sudo mn --topo tree,2,2`
- Set up the network namespace for h1: `make setup`
- Compile: `make`. NOTE: this requires a checkout of the Linux source at `../linux`. You can override this with `make LINUX_SOURCE=/path/to/linux`.
- Load the program: `sudo ./xdp-flowradar h1-eth0 /run/netns/h1`
- Monitor the output (from `bpf_debug()`, sketched below): `sudo cat /sys/kernel/debug/tracing/trace_pipe`
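`bpf_debug()` is presumably the usual wrapper around the `bpf_trace_printk()` helper, which is what makes the output land in `trace_pipe`. A sketch of the common pattern (the project's actual macro may differ):

```c
/* Likely shape of bpf_debug(): a wrapper around bpf_trace_printk().
 * The format string must live on the BPF stack, hence the local array. */
#define bpf_debug(fmt, ...)                                        \
    ({                                                             \
        char ____fmt[] = fmt;                                      \
        bpf_trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \
    })
```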
What this does: `make` builds the bpf program and the binary to load it. The load program (`./xdp-flowradar`) loads the bpf program, pins the maps in `/sys/fs/bpf/<mapname>`, switches to the network namespace of the interface we want to attach to (`/run/netns/h1`), and attaches the program to the interface (`h1-eth0`).
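A minimal sketch of those loader steps, assuming the 4.18-era libbpf API from the kernel tree (`tools/lib/bpf`); error handling is mostly elided and the actual loader may differ:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <sched.h>
#include <net/if.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

int load_and_attach(const char *obj_path, const char *netns_path,
                    const char *ifname)
{
    struct bpf_object *obj = bpf_object__open(obj_path);
    struct bpf_map *map;
    char pin_path[64];
    int ns_fd;

    if (libbpf_get_error(obj) || bpf_object__load(obj))
        return -1;

    /* Pin every map as /sys/fs/bpf/<mapname> so userspace can find it. */
    for (map = bpf_map__next(NULL, obj); map; map = bpf_map__next(map, obj)) {
        snprintf(pin_path, sizeof(pin_path), "/sys/fs/bpf/%s",
                 bpf_map__name(map));
        bpf_obj_pin(bpf_map__fd(map), pin_path);
    }

    /* Switch into the interface's network namespace, then attach. */
    ns_fd = open(netns_path, O_RDONLY);
    if (ns_fd < 0 || setns(ns_fd, CLONE_NEWNET))
        return -1;

    return bpf_set_link_xdp_fd(if_nametoindex(ifname),
                               bpf_program__fd(bpf_program__next(NULL, obj)),
                               0);
}
```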
You can also load the program with `make load` instead of running `./xdp-flowradar` by hand; this is where you might get verifier errors. The other steps (mininet, `make setup`, `make`, and monitoring `trace_pipe`) are the same as above.
kernel: You will need a modern kernel. I'm developing with 4.18.0-11-generic
apt:
sudo apt install -y build-essential libelf-dev binutils-dev make gcc \
libssl-dev bc libcap-dev clang gcc-multilib llvm libncurses5-dev \
git pkg-config libmnl0 bison flex graphviz mininet
kernel source: You will need the kernel source checked out at `../linux` (or override it, as mentioned in the testing section).
bpftool is nice:
- Get the kernel source
- `cd <source>/tools/bpf/bpftool/`
- `make`
- Copy `bpftool` to wherever you put binaries
Writing for the bpf target is not like normal programming. Some things to keep in mind:
- The verifier might yell at you. The error messages are often annoying, but they usually mean you need to add explicit bounds checks and make sure variables are initialized. Even if you know the code is correct, you need to "prove it".
- No loops. If you have a "loop" with a constant size, you can add `#pragma unroll` on the line before it. Keep in mind that this tells the compiler to unroll it, so it may take up a lot of space.
- No function calls. We get "function calls" by adding `static __always_inline` before every function; instead of a real call, the compiler copies the whole function body to the call site. Both patterns are sketched below.
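A tiny illustration of both patterns (the hash function and constants here are made up for the example):

```c
#include <linux/types.h>

#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

/* "Function calls": mark every helper __always_inline so the compiler
 * pastes the body at each call site instead of emitting a call. */
static __always_inline __u32 mix(__u32 h, __u32 v)
{
    return (h ^ v) * 0x01000193; /* FNV-style multiply, just for show */
}

static __always_inline __u32 hash_words(const __u32 *words)
{
    __u32 h = 0x811c9dc5;
    int i;

    /* "Loops": the bound is a compile-time constant, so ask clang to
     * unroll it; the verifier then never sees a back-edge. */
#pragma unroll
    for (i = 0; i < 4; i++)
        h = mix(h, words[i]);
    return h;
}
```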
You can get the compiled bytecode by running `make dump`. This calls `llvm-objdump` on the object file, and includes the source lines above the assembly lines (as long as you leave `-g` in the CFLAGS).
You can list the maps with `bpftool map`, and dump the contents of a map with `bpftool map dump id <id>`.
TODO: make a tool to pin maps and read them more easily
`test-hash.c` exists to test the hash function. `analyze.py` takes the output of that and produces a histogram.
Usage:
make test-hash
./test-hash saddr-start saddr-end daddr-start daddr-end sport-start sport-end dport-start dport-end proto-start proto-end hostnonce-start hostnonce-end k-start k-end
for example:
./test-hash 0x20000000 0x2000ffff 0x10000000 0x10000005 1230 1231 80 81 0 1 0 1 0 1 > ../hashes
python3 -m venv venv; source venv/bin/activate; pip install -r requirements.txt
./analyze.py ../hashes ../hashes.svg
- Figure out a better way to export maps to userspace
- Implement flowradar - how do we store the data we need to store?
- Investigate using `bpf_xdp_adjust_head` rather than l3/l4 offsets (see https://patchwork.ozlabs.org/patch/702198/)
- IPv6 support
Build the Python module for the hash with `make py-c-hash`. Then, in python:
- `from cHash import c_hash`
- `[hex(c_hash(0x1000a,0x100000a,0x1389,0xcb66,0x6,10101,i)) for i in range(6)]`
The paper: https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-li-yuliang.pdf
XDP BPF side:
- Bloom filter with m bits
- Each packet (5-tuple) is hashed to k (of the m) bits in the filter
- Count table: stores 5-tuple (13 bytes), flow count (2 bytes?), packet count (4 bytes?)
- Each packet: for each of the k hash results, increment the packet count in the count table
- If the Bloom filter indicates a new flow: store (table 5-tuple) xor (packet 5-tuple) in the table 5-tuple, increment the flow count for each of the k hash results, and set all k bits in the Bloom filter (see the sketch after this list)
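A rough sketch of what that per-cell state and per-packet update could look like. Everything here is illustrative rather than the project's actual layout; field widths follow the byte counts above, and `K`/`M` are taken from the sizing notes below:

```c
#include <linux/types.h>

#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

#define K 6       /* number of hash functions; sizing discussed below */
#define M 65536   /* bloom-filter bits == count-table cells, see below */

struct flow_tuple {                /* the 13-byte 5-tuple */
    __be32 saddr, daddr;           /* 4 + 4 bytes */
    __be16 sport, dport;           /* 2 + 2 bytes */
    __u8   proto;                  /* 1 byte */
} __attribute__((packed));

struct count_cell {
    struct flow_tuple xor_tuple;   /* XOR of the 5-tuples mapped here */
    __u16 flow_count;              /* 2 bytes */
    __u32 packet_count;            /* 4 bytes */
};

static __always_inline void
flowradar_update(struct count_cell *table, __u64 *bloom,
                 const struct flow_tuple *pkt, const __u32 idx[K])
{
    int is_new = 0, i;

    /* The flow is new iff any of its k bloom bits is still unset. */
#pragma unroll
    for (i = 0; i < K; i++)
        if (!(bloom[idx[i] / 64] & (1ULL << (idx[i] % 64))))
            is_new = 1;

#pragma unroll
    for (i = 0; i < K; i++) {
        struct count_cell *c = &table[idx[i]];

        c->packet_count++;
        if (is_new) {
            /* XOR-fold the packet's 5-tuple into the cell's 5-tuple. */
            c->xor_tuple.saddr ^= pkt->saddr;
            c->xor_tuple.daddr ^= pkt->daddr;
            c->xor_tuple.sport ^= pkt->sport;
            c->xor_tuple.dport ^= pkt->dport;
            c->xor_tuple.proto ^= pkt->proto;
            c->flow_count++;
            bloom[idx[i] / 64] |= 1ULL << (idx[i] % 64);
        }
    }
}
```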
Bloom filter size: m bits = (m/8) bytes
Count table size: ((13 + 2 + 4)*m) bytes
Hash result i = hash(packet, host-nonce, i) % m, for i in 0..k-1
host-nonce: a per-host magic number that's mixed into the hash to give different hashes per host (important for decoding flow counts)
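Continuing the sketch above, the k indexes could be derived like this. `stand_in_hash` is a placeholder, not the project's hash function; it just shows one routine acting as k different hashes by mixing in the host nonce and the index i (matching the `(..., hostnonce, i)` argument order of `c_hash` in the Python example earlier):

```c
/* Placeholder hash: FNV-1a over the tuple bytes, seeded with the
 * per-host nonce and the hash index i. NOT the project's hash. */
static __always_inline __u32
stand_in_hash(const struct flow_tuple *t, __u16 host_nonce, __u32 i)
{
    const __u8 *p = (const __u8 *)t;
    __u32 h = 0x811c9dc5 ^ host_nonce ^ i;
    __u32 j;

#pragma unroll
    for (j = 0; j < sizeof(*t); j++)
        h = (h ^ p[j]) * 0x01000193;
    return h;
}

/* Hash result i = hash(packet, host-nonce, i) % m, for i in 0..k-1. */
static __always_inline void
flow_indexes(const struct flow_tuple *t, __u16 host_nonce, __u32 idx[K])
{
    __u32 i;

#pragma unroll
    for (i = 0; i < K; i++)
        idx[i] = stand_in_hash(t, host_nonce, i) % M;
}
```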
From wikipedia (https://en.wikipedia.org/wiki/Bloom_filter):
Optimal number of hash functions: k = (m/n) ln(2)
m: filter bits
n: number of flows
ln(2) ~= 0.69; round down to 0.5 for a conservative back-of-envelope
minimum k: 1
given ^^, minimum m/n ratio: 2/1 (from 1 = (m/n)*0.5)
Reasonable memory maximum: 16MB
memory required ~= m*20B (B=bytes; 19 bytes per count-table cell plus one Bloom-filter bit, rounded up)
16777216 = m*20
m = 838912 (838860.8 rounded up to a multiple of 64)
Bloom filter: 838912 bits = 104864 bytes = 13108 uint64_t words :o
max flows: 8192?
k = (838912/8192)*0.69 ~= 71
^^ is that too much? (Note that the current code does compile and load w/ k=71)
Actual reasonable bloom filter size: 2**16 = UINT16_MAX+1 ==> ~1.25MB
m = 65536
Bloom filter: 65536 bits = 8192 bytes = 1024 uint64_t words
max flows: 8192?
k = (65536/8192)*ln(2) ~= 6
^^ Much better, but maybe too few?
Given a 10Gb interface and 64-byte minimum packets: max pps = 20971520 ~= 21,000,000
Time to process one packet: 1s/20971520 ~= 47ns :o
Modern CPU: 2GHz = 2,000,000,000 inst/sec => ~95 instructions / packet :o
Reasonable MTU: 1500 bytes ==> pps = 894785 ~= 900,000
Time to process one packet: 1s/894785 ~= 1.1 microseconds
2GHz CPU => ~2200 instructions / packet
TODO: figure out the mapping from BPF instructions to CPU instructions?