
1.2.0-dev: running setup.sh fails with the following error when compiling the BPF programs #679

Closed
ShawnLeung87 opened this issue Mar 6, 2024 · 22 comments

@ShawnLeung87

ShawnLeung87 commented Mar 6, 2024

clang -O2 -g -target bpf -I../include -Wall -Wextra -Wno-int-to-void-pointer-cast -o granted.bpf -c granted.c
In file included from granted.c:29:
In file included from /usr/local/include/rte_mbuf_core.h:22:
/usr/local/include/rte_byteorder.h:30:16: error: invalid output constraint '=Q' in asm
: [x1] "=Q" (x)
^
1 error generated.
make: *** [Makefile:17: granted.bpf] Error 1

Ubuntu 20.04, kernel 5.13.16-051316-generic

@ShawnLeung87
Author

ShawnLeung87 commented Mar 6, 2024

Has the bug from #672 not been fixed yet?
diff --git a/bpf/Makefile b/bpf/Makefile
index d98f52b..c426214 100644
--- a/bpf/Makefile
+++ b/bpf/Makefile
@@ -14,7 +14,7 @@ copy: all
 	$(INSTALL) -m660 $(TARGETS) $(DESTDIR)
 
 %.bpf: %.c
-	$(CC) $(CFLAGS) -o $@ -c $^
+	$(CC) $(CFLAGS) -o $@ -D RTE_FORCE_INTRINSICS -c $^
 
 PHONY: cscope clean

I modified the bpf makefile, but I still get the same error when recompiling.

@AltraMayor AltraMayor added this to the Version 1.2 milestone Mar 6, 2024
@AltraMayor
Owner

I'm in the middle of upgrading Gatekeeper v1.2 to DPDK v23.11. Once I finish this upgrade, the workaround you tried will work.

@ShawnLeung87
Author

ShawnLeung87 commented Mar 18, 2024

/path/dpdk/include/rte_byteorder.h
I moved the #ifndef RTE_FORCE_INTRINSICS guard so that it appears before the definition of "static inline uint16_t rte_arch_bswap16(uint16_t _x)", and the code now compiles.

      #ifndef RTE_FORCE_INTRINSICS
      static inline uint16_t rte_arch_bswap16(uint16_t _x)
      {
          uint16_t x = _x;
          asm volatile ("xchgb %b[x1],%h[x2]"
                        : [x1] "=Q" (x)
                        : [x2] "0" (x)
                        );
          return x;
      }

@ShawnLeung87
Author

ShawnLeung87 commented Mar 20, 2024

I have a problem now. Output packets sent through Gatekeeper cannot get through. I have not added any blocking policy; everything is allowed. Could this be caused by the KNI code changes? Packet captures show that input packets arrive, but the output return packets are dropped.

Using granted.bpf:
ERR Flow (src: 20.20.20.20, dst: 30.30.30.30) at index 0: [state: GK_BPF (3), flow_hash_value: 0x5238060, expire_at: 0xd4d88230d4fa, program_index=1, cookie=0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, 0000000000000000, grantor_ip: 10.0.2.11]

When the input packets pass through Gatekeeper but the output packets bypass it, traffic can reach the Internet and the BPF policy also filters normally. Version 1.2-dev needs to solve this output-packet problem.

@AltraMayor
Owner

Hi @ShawnLeung87,

program_index=1 means that declined.bpf is associated with that flow, so Gatekeeper must drop all those packets. You can find the program indexes in the file lua/gk.lua. Could you check your Lua policy?

@ShawnLeung87
Author

ShawnLeung87 commented Mar 21, 2024

Hi @ShawnLeung87,

program_index=1 means that declined.bpf is associated with that flow, so Gatekeeper must drop all those packets. You can find the program indexes in the file lua/gk.lua. Could you check your Lua policy?

I've checked everything you mentioned. There is no bpf index problem.
Now my successful test scenario is: the server's input packets pass through Gatekeeper, the output packets bypass Gatekeeper, and the server can access the Internet normally.
The failed test scenario is: both input and output packets pass through Gatekeeper, and the server is unreachable.

@ShawnLeung87
Author

The display status of the network interface differs from that of version 1.1.0: it shows as "unknown", whereas it would normally show as up.

@ShawnLeung87
Author

ShawnLeung87 commented Mar 21, 2024

In the failed test scenario, where both input and output packets pass through Gatekeeper, the following error log appears:

        Main/0 2024-03-21 01:27:19 NOTICE ipv4_flow_add(back, DstIP=10.0.2.1 UDP SrcPort=41120/0xffff DstPort=45232/0xffff): cannot validate IPv4 flow, errno=22 (Invalid argument), rte_flow_error_type=16: Not supported action.
        Main/0 2024-03-21 01:27:19 NOTICE Cannot register IPv4 flow on the back interface; falling back to software filters
        
        GK/6 2024-03-21 02:46:51 DEBUG acl: a packet failed to match any ACL rules, the whole packet is dumped below:
        dump mbuf at 0xcc3c60700, iova=0xcc3c60798, buf_len=2176
          pkt_len=98, ol_flags=0x182, nb_segs=1, port=1, ptype=0x10
          segment at 0xcc3c60700, data=0xcc3c60818, len=98, off=128, refcnt=1
          Dump data at [0xcc3c60818], len=98

The successful test scenario (the server's input packets pass through Gatekeeper, the output packets bypass it, and the server can access the Internet normally) does not produce this error log.

My preliminary suspicion is that the ACL policy of the back NIC rejected the packets. The rte_flow_action error when registering flows on the back NIC suggests that DPDK has changed and the rte_flow_action types are different. The DPDK version I am currently using is 21.05.

@AltraMayor
Owner

I need to finish my ongoing DPDK port before looking at this issue. I'll report back later.

@AltraMayor
Owner

The display status of the network interface differs from that of version 1.1.0: it shows as "unknown", whereas it would normally show as up.

Gatekeeper v1.2 uses a new kernel module to implement the KNI interfaces, and this unknown state is consistent with other kernel modules that implement virtual interfaces. For example, my work VPN interface shows the same unknown state. We may find small differences, but they should not interfere with the general working of Gatekeeper.
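For comparison, many purely virtual Linux interfaces report the same state, because their drivers never assert carrier; the loopback device is a handy example to check (assuming a standard Linux /sys layout):

```shell
# Drivers that never call netif_carrier_on()/netif_carrier_off() leave
# the interface operstate as "unknown"; loopback behaves this way too.
cat /sys/class/net/lo/operstate
```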

@AltraMayor
Owner

Hi @ShawnLeung87,
program_index=1 means that declined.bpf is associated with that flow, so Gatekeeper must drop all those packets. You can find the program indexes in the file lua/gk.lua. Could you check your Lua policy?

I've checked everything you mentioned. There is no BPF index problem. Now my successful test scenario is: the server's input packets pass through Gatekeeper, the output packets bypass Gatekeeper, and the server can access the Internet normally. The failed test scenario is: both input and output packets pass through Gatekeeper, and the server is unreachable.

If I inferred it correctly, you mean that the packets going from the protected server through the Gatekeeper server are not being forwarded from the back network to the front network. Have you checked whether there's a prefix to forward the packets in the routing table of the GK blocks?

@AltraMayor
Owner

In the failed test scenario, where both input and output packets pass through Gatekeeper, the following error log appears:

        Main/0 2024-03-21 01:27:19 NOTICE ipv4_flow_add(back, DstIP=10.0.2.1 UDP SrcPort=41120/0xffff DstPort=45232/0xffff): cannot validate IPv4 flow, errno=22 (Invalid argument), rte_flow_error_type=16: Not supported action.
        Main/0 2024-03-21 01:27:19 NOTICE Cannot register IPv4 flow on the back interface; falling back to software filters
        
        GK/6 2024-03-21 02:46:51 DEBUG acl: a packet failed to match any ACL rules, the whole packet is dumped below:
        dump mbuf at 0xcc3c60700, iova=0xcc3c60798, buf_len=2176
          pkt_len=98, ol_flags=0x182, nb_segs=1, port=1, ptype=0x10
          segment at 0xcc3c60700, data=0xcc3c60818, len=98, off=128, refcnt=1
          Dump data at [0xcc3c60818], len=98

The successful test scenario (the server's input packets pass through Gatekeeper, the output packets bypass it, and the server can access the Internet normally) does not produce this error log.

My preliminary suspicion is that the ACL policy of the back NIC rejected the packets. The rte_flow_action error when registering flows on the back NIC suggests that DPDK has changed and the rte_flow_action types are different. The DPDK version I am currently using is 21.05.

Weren't there log lines that showed the dropped packet in hexadecimal? That information would've allowed us to see which packet was dropped.

@AltraMayor
Owner

Branch v1.2.0-dev is now running with DPDK 23.11. So once you update your local Gatekeeper repository, make sure that you update your local copy of DPDK and compile it again.

@ShawnLeung87
Author

ShawnLeung87 commented Mar 26, 2024

1. In the DPDK 23.11 version of Gatekeeper, protected servers still cannot access the Internet through Gatekeeper's back NIC.

The log of blocked data packets is as follows:
GK/5 2024-03-26 14:53:02 DEBUG acl: a packet failed to match any ACL rules, the whole packet is dumped below:
dump mbuf at 0x11ffad4280, iova=0x11ffad4318, buf_len=2176
pkt_len=98, ol_flags=0x182, nb_segs=1, port=1, ptype=0x10
segment at 0x11ffad4280, data=0x11ffad4398, len=98, off=128, refcnt=1
Dump data at [0x11ffad4398], len=98
00000000: 90 E2 BA 8E C0 75 64 00 F1 6B 0A 01 08 00 45 00 | .....ud..k....E.
00000010: 00 54 0A 7E 40 00 3F 01 CC C7 1E 1E 1E 1E 14 14 | .T.~@.?.........
00000020: 14 14 08 00 38 97 18 2B 00 06 48 00 02 66 00 00 | ....8..+..H..f..
00000030: 00 00 94 FE 09 00 00 00 00 00 10 11 12 13 14 15 | ................
00000040: 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 | .......... !"#$%
00000050: 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 | &'()*+,-./012345
00000060: 36 37 | 67

BPF data for flow:
BPF_INDEX_GRANTED = 0

GK/7 2024-03-26 14:57:57 ERR Flow (src: 20.20.20.20, dst: 30.30.30.30) at index 0: [state: GK_REQUEST (0), flow_hash_value: 0xc68c807, expire_at: 0x2f4ed04f11f0, last_packet_seen_at: 0x2f497e859842, last_priority: 39, allowance: 7, grantor_ip: 10.0.2.11]
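Decoding the leading bytes of the hex dump above (assuming a plain Ethernet II + IPv4 frame) shows which packet was dropped: an ICMP echo request from 30.30.30.30 to 20.20.20.20, i.e. outbound traffic from the protected server. A small decoding sketch:

```python
import ipaddress

# Leading bytes of the dumped frame from the log above
# (Ethernet II header + IPv4 header + first ICMP bytes).
dump = bytes.fromhex(
    "90e2ba8ec0756400f16b0a010800"  # dst MAC, src MAC, EtherType 0x0800
    "45000054"                      # IPv4, IHL 5, total length 0x54
    "0a7e40003f01ccc7"              # id, flags, TTL 0x3f, proto 1 (ICMP)
    "1e1e1e1e"                      # src 30.30.30.30
    "14141414"                      # dst 20.20.20.20
    "0800"                          # ICMP type 8 (echo request), code 0
)
ip = dump[14:]                      # skip the 14-byte Ethernet header
proto = ip[9]
src = str(ipaddress.ip_address(ip[12:16]))
dst = str(ipaddress.ip_address(ip[16:20]))
icmp_type = ip[20]
print(proto, src, dst, icmp_type)   # -> 1 30.30.30.30 20.20.20.20 8
```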

2. This version cannot set flow_ht_size to 250000000. The DPDK 21.05 version of Gatekeeper does not have this problem; DPDK's memory allocation must have changed. The error is reported as follows:
EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
Main/0 2024-03-26 13:56:40 ERR setup_gk_instance(lcore=4): rte_calloc_socket() failed to allocate flow entry table
Main/0 2024-03-26 13:56:40 ERR Failed to setup gk instances for GK block at lcore 4

3. Forwarding route prefix: the default route prefix is missing. I have announced the default route to Gatekeeper, but querying the FIB shows no default route; the expected entry "FIB entry for IP prefix: 0.0.0.0/0 with action FWD_GATEWAY_FRONT_NET (1)" does not appear:
gkctl show_fib.lua
FIB entry for IP prefix: 30.30.30.0/24 with action FWD_GRANTOR (0)
Grantor IP address: 10.0.2.11
Ethernet cache entry: [state: fresh, nexthop ip: 10.0.2.3, d_addr: 64:00:F1:6B:0A:01]

@ShawnLeung87
Author

This problem (protected servers cannot access the Internet through Gatekeeper's back NIC) has been solved by adding a default route to the FIB.

@ShawnLeung87
Author

ShawnLeung87 commented Mar 26, 2024

This version cannot set flow_ht_size to 250000000. The DPDK 21.05 version of Gatekeeper does not have this problem; DPDK's memory allocation must have changed. The error is reported as follows:
EAL: eal_memalloc_alloc_seg_bulk(): couldn't find suitable memseg_list
Main/0 2024-03-26 13:56:40 ERR setup_gk_instance(lcore=4): rte_calloc_socket() failed to allocate flow entry table
Main/0 2024-03-26 13:56:40 ERR Failed to setup gk instances for GK block at lcore 4
The problem of not being able to add 250 million flows has not been solved yet. Gatekeeper's gk.lua is configured with 4 cores by default.

@AltraMayor
Owner

I recommend setting flow_ht_size to 200000000 while I work on the memory issue, so you can continue with your tests.

@ShawnLeung87
Author

ShawnLeung87 commented Mar 27, 2024

After modifying the following constants in rte_config.h, flow_ht_size can be set to 200 million. Before the modification, even with flow_ht_size set to 200 million, memory initialization failed:

           #define RTE_MAX_MEMSEG_PER_LIST (8192 << 1)
           #define RTE_MAX_MEM_MB_PER_LIST (32768 << 1)
           #define RTE_MAX_MEMSEG_PER_TYPE (32768 << 1)
           #define RTE_MAX_MEM_MB_PER_TYPE (65536 << 1)
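A back-of-envelope check on why doubling these limits matters. The per-entry size here is an assumption for illustration only, not Gatekeeper's actual flow-entry struct size:

```python
# Rough sizing of the GK flow entry table against DPDK's default
# per-memseg-list budget. ENTRY_BYTES is an assumed illustrative size.
ENTRY_BYTES = 160                 # assumed flow-entry size, illustration only
FLOWS = 250_000_000               # requested flow_ht_size

table_mb = FLOWS * ENTRY_BYTES / (1 << 20)
print(round(table_mb))            # -> 38147 (MB)
print(table_mb > 32768)           # True: exceeds stock RTE_MAX_MEM_MB_PER_LIST
print(table_mb < (32768 << 1))    # True: fits once the limit is doubled
```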

@ShawnLeung87
Author

ShawnLeung87 commented Mar 27, 2024

Can this version integrate the BPF parameter extension patch from #643? I have already added this patch to the Gatekeeper 1.2-dev version in my test environment.

@AltraMayor
Owner

Extending the cookie of the BPFs is outside our short-term roadmap. I recommend you make a case for why the BPF cookie should be extended. I'm not aware of which kinds of attacks are blocked or which features are enabled by extended BPF cookies. If you can come up with a convincing case to increase the memory requirement, other deployers will want the feature, and proper implementation will eventually happen. The patch in issue #643 was meant to allow you to experiment with an extended cookie.

@ShawnLeung87
Author

ShawnLeung87 commented Mar 28, 2024

Extending the cookie of the BPFs is outside our short-term roadmap. I recommend you make a case for why the BPF cookie should be extended. I'm not aware of which kinds of attacks are blocked or which features are enabled by extended BPF cookies. If you can come up with a convincing case to increase the memory requirement, other deployers will want the feature, and proper implementation will eventually happen. The patch in issue #643 was meant to allow you to experiment with an extended cookie.

We have now written our own BPF program for TCP reflection defense, which has a better defense effect. With the default number of BPF cookie parameters, such a BPF cannot be compiled. With the default cookie parameters, a BPF can only do simple packet rate limiting; it cannot make targeted judgments based on the counts of TCP flags.

After verification in our production environment, special DDoS attack types do require more BPF cookie parameters to implement the defense logic. The default number of parameters is no longer adequate for many types of DDoS attacks. We increased the number of BPF cookie parameters; compared with the default-parameter BPF, the defense success rates are as follows.
In the same TCP reflection scenario:
ours can reach 97%-99%;
the default-parameter BPF reaches 40%-50%.

@AltraMayor
Owner

Please create an issue specifically to discuss the extension of the cookie of the BPFs. In addition to copying the text you already have above, add a fully functional BPF showing off your solution. Add comments to the code where appropriate. If we move forward with the cookie extension, we'll need to have a BPF in the folder bpf/ to showcase the extension. Moreover, explain the parameters of your BPF, list the values you pass to those parameters, and provide detailed examples of the attacks that motivate the BPF. The increased memory requirement is not small, so we need support from other deployers.

I've pushed a commit to branch v1.2.0-dev that increases DPDK's memory allocation. This seems to be the last problem in this issue. If so, we can close this issue.
