Linux offers a rich set of debugging tools that allow the kernel to be instrumented, enabling a better understanding of its internals.
Today, we’d like to show you a way to understand packet drops without the need for any additional tools to be installed.
Suppose you need to understand why your machine is discarding packets, the first tools you might think about are tools like
However, what if none of these tools are installed, and you cannot install anything extra?
What if all we have is a shell?
In that case, it’s time to lol, living off the land.
Let’s consider a scenario where we need to understand where and at which point in the Linux network stack packets are being dropped. This is a task that senior system administrators sometimes have to tackle, and it’s also essential for pentesters when assessing the effectiveness of crafted packets.
Before we delve into the details, let’s clarify a few key aspects of packets within Linux’s network stack. In the kernel, a packet is referred to as a “socket buffer”3, commonly abbreviated as skbuff or just skb. A skb is a versatile data structure capable of holding various types of network packets. skbuffs are passed through the network stack both upwards and downwards. Different components in the stack can choose to pass the skb along, modify it, consume it, or release it.
Technically, both consuming and releasing it mean returning the buffer’s memory to the allocator and destroying the socket buffer data structure. Linux distinguishes between successful deliveries and error conditions where the network stack gives up on a particular packet. In this context, we’re concerned with the latter scenario.
A debugging gem hidden deep within the Linux kernel is the function tracer4, often referred to as “ftrace.”
Normally, it acts as the backbone for high-level tools like
However, even when you have virtually nothing installed, you can still leverage the function tracer through a filesystem-based interface.
This interface is provided by
tracefs, which is typically mounted at
This assumes that the function trace feature is enabled in the kernel configuration, which is usually the case.
One notable feature of ftrace is that it imposes zero runtime overhead when disabled.
While ftrace certainly merits its own extensive blog series due to its wealth of functionality, for now, let’s keep it minimal and focused.
For the purposes of this blog post, all file paths are considered relative to
ftrace records all events into a trace buffer, and you can access this buffer by reading from the file named
cat trace_pipe command will provide a continuous stream of events if any are present.
However, it’s important to note that if there are no events, the
cat command will block and wait for events to appear.
Based on our exploration of the kernel source code, we’ve already identified a function called
kfree_skb(), which is responsible for destroying a socket buffer (skbuff).
Our expectation is that this function is primarily used for non-consuming destruction, which means it’s applied every time the network stack gives up on a packet, resulting in the packet being discarded.
Now, let’s search for the function using ftrace:
$ grep -e kfree_skb available_filter_functions kfree_skbmem kfree_skb_reason kfree_skb_list_reason __kfree_skb kfree_skb_partial __kfree_skb_defer __dev_kfree_skb_irq __dev_kfree_skb_any rtnl_kfree_skbs __traceiter_kfree_skb trace_kfree_skb_hit net_dm_packet_trace_kfree_skb_hit
Nice! There is a
__kfree_skb, sounds good.
Let’s tell ftrace to trace every invocation of this function and arm the function tracer.
$ echo __kfree_skb > set_ftrace_filter $ echo function > current_tracer $ echo > trace # just in case, clear old contents in the buffer $ cat trace_pipe napi/-8411-1777  b..2. 713195.705113: __kfree_skb <-tcp_ack napi/-8411-1777  b..2. 713195.705120: __kfree_skb <-tcp_rcv_established [.. over and over ...] napi/-8411-1777  b..2. 713229.810263: __kfree_skb <-tcp_ack napi/-8411-1777  b..2. 713229.810271: __kfree_skb <-tcp_rcv_established
Dang! That’s a lot of output.
__kfree_skb() is always called for both consuming and non-consuming releases of an skb.
Let’s disable ftrace again.
$ echo nop > current_tracer $ echo > set_ftrace_filter $ echo > trace
Continuing our exploration of the network code, we’ve discovered the function
This function utilizes
__kfree_skb() but takes a crucial function parameter: the “reason”.
This appears to be precisely what we’re looking for.
The reason is of type
enum skb_drop_reason as defined in
These reasons encompass values such as
Our goal is to trace every invocation of
kfree_skb_reason() when the reason is not equal to
This will help us monitor situations where the network stack abandons a packet for reasons other than consumption.
In our quest to trace and analyze events deeper in the kernel, we encounter a limitation with ftrace’s basic function tracing.
It permits us to filter events based on the function name but doesn’t provide access to function arguments.
Moreover, we only observe function calls without their associated arguments, making a simple
grep -v on the
trace_pipe file insufficient.
To overcome this limitation, we can delve deeper into ftrace’s capabilities, particularly by leveraging custom probes known as “kprobes”6. A kprobe essentially serves as a probe point within the kernel and can be set at various locations, including function entry, function return, or anywhere within a function. By probing at the function entry, we gain access to the function’s arguments, allowing for more detailed analysis and tracing.
Manually installing a new kprobe is a bit cumbersome.
In our case, we want to probe the function
To set up the kprobe, we need to inform ftrace how to access the function’s arguments, which can be machine-specific.
In this example, we’ll use an x86_64 laptop.
On this architecture, the Linux kernel adheres to the System V AMD64 ABI, where the first function argument is stored in the
and the second one is stored in the
Setting up a kprobe requires specifying a name and a format string. We can use the register names in the format string to access the function’s arguments for analysis.
$ echo 'p:my_skb_probe kfree_skb_reason skb=%di:u64 reason=%si:u64' > kprobe_events
After this command a new directory named
my_skb_probe will show up in
Let’s arm it and read from the trace buffer.
$ echo 1 > events/kprobes/my_skb_probe/enable $ echo > trace $ cat trace_pipe ksoftirqd/1-23  ..s.. 714457.274916: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617083415795712 reason=2 <idle>-0  ..s1. 714457.290757: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617083415795200 reason=2 <idle>-0  ..s1. 714457.306989: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617083415797504 reason=2 <idle>-0  ..s1. 714457.323109: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617064304559104 reason=2 <idle>-0  .Ns1. 714457.342760: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617064304564736 reason=2 <idle>-0  .Ns1. 714457.358760: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617064304561664 reason=2 <idle>-0  ..s1. 714457.366230: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=0 reason=2 kworker/u32:33-28725  ..... 714457.366293: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617057572911104 reason=2 <idle>-0  .Ns1. 714457.374797: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617064304564992 reason=2
While we’ve made progress, there are still many events. However, we can now clearly see the reasons, with reason number
To filter the events more effectively using ftrace, we can instruct it to only emit events when the reason is larger than
2. We use “larger” because values
1 correspond to
This approach will allow us to focus on events where the network stack discards packets for reasons other than these two.
$ echo "reason > 2" > events/kprobes/my_skb_probe/filter $ echo > trace $ cat trace_pipe napi/-8412-1778  b.... 714699.253491: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617078699127552 reason=3
Excellent, by filtering for events with a reason larger than
2, we’ve successfully narrowed down the events to a more manageable and meaningful set.
Now we’re able to focus on events that truly matter.
3 corresponds to
SKB_DROP_REASON_NO_SOCKET, which, in your case, is associated with your test program attempting to connect to a closed port using telnet.
While it’s beneficial to see when Linux drops a packet, the events themselves provide limited information, primarily just the reason number. To gain deeper insights into the process and understand the path the skbuff took through the network stack, the next step is to print a stack trace. This will enable us to visualize the call chain and, hopefully, learn more about the skbuff’s journey within the network stack.
Every event in ftrace can be equipped with triggers, and one trigger type involves printing a stack trace.
$ echo 'stacktrace if reason > 2' > events/kprobes/my_skb_probe/trigger $ echo > trace $ cat trace_pipe napi/-8412-1778  b.... 715083.709832: my_skb_probe: (kfree_skb_reason+0x0/0x110) skb=18446617059302153472 reason=3 napi/-8412-1778  b..1. 715083.709868: <stack trace> => kprobe_trace_func => kprobe_dispatcher => kprobe_ftrace_handler => create_port => kfree_skb_reason => tcp_v4_rcv => ip_protocol_deliver_rcu => ip_local_deliver_finish => __netif_receive_skb_core.constprop.0 => __netif_receive_skb_list_core => netif_receive_skb_list_internal => napi_complete_done => ath11k_pcic_ext_grp_napi_poll => __napi_poll => napi_threaded_poll => kthread => ret_from_fork
Now we see that the dropped packet arrived through the ath11k WiFi driver, and the function
tcp_v4_rcv() decided to drop it on the floor!
To disable the custom probe run:
$ echo 0 > events/kprobes/my_skb_probe/enable $ echo '!stacktrace if reason > 2' > events/kprobes/my_skb_probe/trigger $ echo > kprobe_events
Delving into the intricacies of kernel internals and machine-specific function call ABI details can be quite challenging and may not be necessary for everyone. Fortunately, Linux provides a more accessible solution through predefined kprobes known as tracepoints. These tracepoints are strategically placed at crucial locations in the kernel where regular function tracing might not provide sufficient context. In the case of the Linux network stack, the maintainers have thoughtfully provided tracepoints for various skbuff operations. This makes it easier for users to access and gather meaningful information without the need to delve too deeply into the kernel’s internals.
$ ls -1 events/skb consume_skb enable filter kfree_skb skb_copy_datagram_iovec
kfree_skb look promising! In the event directory is also a file called
format which describes the event.
$ cat events/skb/kfree_skb/format name: kfree_skb [.. skipped output ..] field:void * skbaddr; offset:8; size:8; signed:0; field:void * location; offset:16; size:8; signed:0; field:unsigned short protocol; offset:24; size:2; signed:0; field:enum skb_drop_reason reason; offset:28; size:4; signed:0; [.. skipped output ..]
The event can provide us with the drop reason. Let’s enable it and filter for
reason > 2
$ echo 1 > events/skb/kfree_skb/enable $ echo "reason > 2" > events/skb/kfree_skb/filter $ echo > trace $ cat trace_pipe napi/-8411-1777  b.... 715880.566050: kfree_skb: skbaddr=000000002d74f5c5 protocol=2048 location=tcp_v4_rcv+0x7d/0xf20 reason: NO_SOCKET
The reason has even a symbolic name and we immediately see the caller! Having a stack trace is of course also possible.
$ echo 'stacktrace if reason > 2' > events/skb/kfree_skb/trigger $ cat trace_pipe napi/-8411-1777  b.... 715962.431448: kfree_skb: skbaddr=0000000096888eda protocol=2048 location=tcp_v4_rcv+0x7d/0xf20 reason: NO_SOCKET napi/-8411-1777  b..1. 715962.431472: <stack trace> => trace_event_raw_event_kfree_skb => kfree_skb_reason => tcp_v4_rcv => ip_protocol_deliver_rcu => ip_local_deliver_finish => __netif_receive_skb_core.constprop.0 => __netif_receive_skb_list_core => netif_receive_skb_list_internal => napi_complete_done => ath11k_pcic_ext_grp_napi_poll => __napi_poll => napi_threaded_poll => kthread => ret_from_fork
Why didn’t we immediately demonstrate the easy way using tracepoints? Because learning something the hard way helps in understanding the basics. Additionally, not all interesting areas in the kernel have matching tracepoints, so sooner or later, you may need to add your own kprobes.
Even with no tracing-specific tooling installed, Linux allows for advanced debugging using a file-based interface. Using the function trace, ftrace, it’s possible to inspect the kernel’s internals easily. In this example, we used ftrace to understand how and where network packets are dropped.