linux

mirror of https://github.com/torvalds/linux.git synced 2026-01-25 15:03:52 +08:00

Author	SHA1	Message	Date
Eric Biggers	4ef77dd584	libbpf: Replace AF_ALG with open coded SHA-256 Reimplement libbpf_sha256() using some basic SHA-256 C code. This eliminates the newly-added dependency on AF_ALG, which is a problematic UAPI that is not supported by all kernels. Make libbpf_sha256() return void, since it can no longer fail. This simplifies some callers. Also drop the unnecessary 'sha_out_sz' parameter. Finally, also fix the typo in "compute_sha_udpate_offsets". Fixes: `c297fe3e9f` ("libbpf: Implement SHA256 internal helper") Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/r/20250928003833.138407-1-ebiggers@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-28 04:25:31 -07:00
Kumar Kartikeya Dwivedi	15cf39221e	selftests/bpf: Add stress test for rqspinlock in NMI Introduce a kernel module that will exercise lock acquisition in the NMI path, and bias toward creating contention such that NMI waiters end up being non-head waiters. Prior to the rqspinlock fix made in the commit `0d80e7f951` ("rqspinlock: Choose trylock fallback for NMI waiters"), it was possible for the queueing path of non-head waiters to get stuck in NMI, which this stress test reproduces fairly easily with just 3 CPUs. Both AA and ABBA flavors are supported, and it will serve as a test case for future fixes that address this corner case. More information about the problem in question is available in the commit cited above. When the fix is reverted, this stress test will lock up the system. To enable this test automatically through the test_progs infrastructure, add a load_module_params API to exercise both AA and ABBA cases when running the test. Note that the test runs for at most 5 seconds, and becomes a noop after that, in order to allow the system to make forward progress. In addition, CPU 0 is always kept untouched by the created threads and NMIs. The test will automatically scale to the number of available online CPUs. Note that at least 3 CPUs are necessary to run this test, hence skip the selftest in case the environment has less than 3 CPUs available. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250927205304.199760-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-28 03:18:40 -07:00
Daniel Borkmann	0e8e60e86c	selftests/bpf: Add test case for different expected_attach_type Add a small test case which adds two programs - one calling the other through a tailcall - and check that BPF rejects them in case of different expected_attach_type values: # ./vmtest.sh -- ./test_progs -t xdp_devmap [...] #641/1 xdp_devmap_attach/DEVMAP with programs in entries:OK #641/2 xdp_devmap_attach/DEVMAP with frags programs in entries:OK #641/3 xdp_devmap_attach/Verifier check of DEVMAP programs:OK #641/4 xdp_devmap_attach/DEVMAP with programs in entries on veth:OK #641 xdp_devmap_attach:OK Summary: 2/4 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250926171201.188490-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-27 06:24:27 -07:00
Daniel Borkmann	4540aed51b	bpf: Enforce expected_attach_type for tailcall compatibility Yinhao et al. recently reported: Our fuzzer tool discovered an uninitialized pointer issue in the bpf_prog_test_run_xdp() function within the Linux kernel's BPF subsystem. This leads to a NULL pointer dereference when a BPF program attempts to deference the txq member of struct xdp_buff object. The test initializes two programs of BPF_PROG_TYPE_XDP: progA acts as the entry point for bpf_prog_test_run_xdp() and its expected_attach_type can neither be of be BPF_XDP_DEVMAP nor BPF_XDP_CPUMAP. progA calls into a slot of a tailcall map it owns. progB's expected_attach_type must be BPF_XDP_DEVMAP to pass xdp_is_valid_access() validation. The program returns struct xdp_md's egress_ifindex, and the latter is only allowed to be accessed under mentioned expected_attach_type. progB is then inserted into the tailcall which progA calls. The underlying issue goes beyond XDP though. Another example are programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. sock_addr_is_valid_access() as well as sock_addr_func_proto() have different logic depending on the programs' expected_attach_type. Similarly, a program attached to BPF_CGROUP_INET4_GETPEERNAME should not be allowed doing a tailcall into a program which calls bpf_bind() out of BPF which is only enabled for BPF_CGROUP_INET4_CONNECT. In short, specifying expected_attach_type allows to open up additional functionality or restrictions beyond what the basic bpf_prog_type enables. The use of tailcalls must not violate these constraints. Fix it by enforcing expected_attach_type in __bpf_prog_map_compatible(). Note that we only enforce this for tailcall maps, but not for BPF devmaps or cpumaps: There, the programs are invoked through dev_map_bpf_prog_run() and cpu_map_bpf_prog_run() which set up a new environment / context and therefore these situations are not prone to this issue. Fixes: `5e43f899b0` ("bpf: Check attach type at prog load time") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250926171201.188490-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-27 06:24:27 -07:00
Jiapeng Chong	4b2113413e	bpftool: Remove duplicate string.h header ./tools/bpf/bpftool/sign.c: string.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=25502 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20250926095240.3397539-2-jiapeng.chong@linux.alibaba.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-27 04:23:15 -07:00
Jiapeng Chong	87608c2a77	bpf: Remove duplicate crypto/sha2.h header ./include/linux/bpf.h: crypto/sha2.h is included more than once. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=25501 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Acked-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20250926095240.3397539-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-27 04:23:15 -07:00
D. Wythe	0cc114dc35	libbpf: Fix error when st-prefix_ops and ops from differ btf When a module registers a struct_ops, the struct_ops type and its corresponding map_value type ("bpf_struct_ops_") may reside in different btf objects, here are four possible case: +--------+---------------+-------------+---------------------------------+ \| \|bpf_struct_ops_\| xxx_ops \| \| +--------+---------------+-------------+---------------------------------+ \| case 0 \| btf_vmlinux \| btf_vmlinux \| be used and reg only in vmlinux \| +--------+---------------+-------------+---------------------------------+ \| case 1 \| btf_vmlinux \| mod_btf \| INVALID \| +--------+---------------+-------------+---------------------------------+ \| case 2 \| mod_btf \| btf_vmlinux \| reg in mod but be used both in \| \| \| \| \| vmlinux and mod. \| +--------+---------------+-------------+---------------------------------+ \| case 3 \| mod_btf \| mod_btf \| be used and reg only in mod \| +--------+---------------+-------------+---------------------------------+ Currently we figure out the mod_btf by searching with the struct_ops type, which makes it impossible to figure out the mod_btf when the struct_ops type is in btf_vmlinux while it's corresponding map_value type is in mod_btf (case 2). The fix is to use the corresponding map_value type ("bpf_struct_ops_") as the lookup anchor instead of the struct_ops type to figure out the `btf` and `mod_btf` via find_ksym_btf_id(), and then we can locate the kern_type_id via btf__find_by_name_kind() with the `btf` we just obtained from find_ksym_btf_id(). With this change the lookup obtains the correct btf and mod_btf for case 2, preserves correct behavior for other valid cases, and still fails as expected for the invalid scenario (case 1). Fixes: `590a008882` ("bpf: libbpf: Add STRUCT_OPS support") Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/bpf/20250926071751.108293-1-alibuda@linux.alibaba.com	2025-09-26 13:03:19 -07:00
Amery Hung	991e555eff	selftests/bpf: Test changing packet data from kfunc bpf_xdp_pull_data() is the first kfunc that changes packet data. Make sure the verifier clear all packet pointers after calling packet data changing kfunc. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250926164142.1850176-1-ameryhung@gmail.com	2025-09-26 10:44:51 -07:00
Tao Chen	d43029ff7d	selftests/bpf: Add stacktrace map lookup_and_delete_elem test case Add tests for stacktrace map lookup and delete: 1. use bpf_map_lookup_and_delete_elem to lookup and delete the target stack_id, 2. lookup the deleted stack_id again to double check. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250925175030.1615837-3-chen.dylane@linux.dev	2025-09-25 16:17:30 -07:00
Tao Chen	363b17e273	selftests/bpf: Refactor stacktrace_map case with skeleton The loading method of the stacktrace_map test case looks too outdated, refactor it with skeleton, and we can use global variable feature in the next patch. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250925175030.1615837-2-chen.dylane@linux.dev	2025-09-25 16:17:14 -07:00
Tao Chen	17f0d1f632	bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE The stacktrace map can be easily full, which will lead to failure in obtaining the stack. In addition to increasing the size of the map, another solution is to delete the stack_id after looking it up from the user, so extend the existing bpf_map_lookup_and_delete_elem() functionality to stacktrace map types. Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250925175030.1615837-1-chen.dylane@linux.dev	2025-09-25 16:12:14 -07:00
Mykyta Yatsenko	105eb5dc74	selftests/bpf: Fix flaky bpf_cookie selftest bpf_cookie can fail on perf_event_open(), when it runs after the task_work selftest. The task_work test causes perf to lower sysctl_perf_event_sample_rate, and bpf_cookie uses sample_freq, which is validated against that sysctl. As a result, perf_event_open() rejects the attr if the (now tighter) limit is exceeded. >From perf_event_open(): if (attr.freq) { if (attr.sample_freq > sysctl_perf_event_sample_rate) return -EINVAL; } else { if (attr.sample_period & (1ULL << 63)) return -EINVAL; } Switch bpf_cookie to use sample_period, which is not checked against sysctl_perf_event_sample_rate. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250925215230.265501-1-mykyta.yatsenko5@gmail.com	2025-09-25 15:55:44 -07:00
Amery Hung	1193c46c17	selftests/bpf: Test changing packet data from global functions with a kfunc The verifier should invalidate all packet pointers after a packet data changing kfunc is called. So, similar to commit `3f23ee5590` ("selftests/bpf: test for changing packet data from global functions"), test changing packet data from global functions to make sure packet pointers are indeed invalidated. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250925170013.1752561-2-ameryhung@gmail.com	2025-09-25 14:52:09 -07:00
Amery Hung	bc8712f2b5	bpf: Emit struct bpf_xdp_sock type in vmlinux BTF Similar to other BPF UAPI struct, force emit BTF of struct bpf_xdp_sock so that it is defined in vmlinux.h. In a later patch, a selftest will use vmlinux.h to get the definition of struct bpf_xdp_sock instead of bpf.h. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250925170013.1752561-1-ameryhung@gmail.com	2025-09-25 14:29:46 -07:00
Mykyta Yatsenko	5730dacb3f	selftests/bpf: Task_work selftest cleanup fixes task_work selftest does not properly handle cleanup during failures: * destroy bpf_link * perf event fd is passed to bpf_link, no need to close it if link was created successfully * goto cleanup if fork() failed, close pipe. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250924142954.129519-2-mykyta.yatsenko5@gmail.com	2025-09-25 11:00:01 -07:00
Magnus Karlsson	dd948aa63e	MAINTAINERS: Delete inactive maintainers from AF_XDP Delete Björn Töpel and Jonathan Lemon as maintainer and reviewer, respectively, as they have not been contributing towards AF_XDP for several years. I have spoken to Björn and he is ok with his removal. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20250915120148.2922-1-magnus.karlsson@gmail.com	2025-09-25 13:18:22 +02:00
Andrea Righi	d4680a11e1	bpf: Mark kfuncs as __noclone Some distributions (e.g., CachyOS) support building the kernel with -O3, but doing so may break kfuncs, resulting in their symbols not being properly exported. In fact, with gcc -O3, some kfuncs may be optimized away despite being annotated as noinline. This happens because gcc can still clone the function during IPA optimizations, e.g., by duplicating or inlining it into callers, and then dropping the standalone symbol. This breaks BTF ID resolution since resolve_btfids relies on the presence of a global symbol for each kfunc. Currently, this is not an issue for upstream, because we don't allow building the kernel with -O3, but it may be safer to address it anyway, to prevent potential issues in the future if compilers become more aggressive with optimizations. Therefore, add __noclone to __bpf_kfunc to ensure kfuncs are never cloned and remain distinct, globally visible symbols, regardless of the optimization level. Fixes: `57e7c169cd` ("bpf: Add __bpf_kfunc tag for marking kernel functions as kfuncs") Acked-by: David Vernet <void@manifault.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrea Righi <arighi@nvidia.com> Link: https://lore.kernel.org/r/20250924081426.156934-1-arighi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:48:05 -07:00
Alexei Starovoitov	ceeaa71357	Merge branch 'uprobe-bpf-allow-to-change-app-registers-from-uprobe-registers' Jiri Olsa says: ==================== we recently had several requests for tetragon to be able to change user application function return value or divert its execution through instruction pointer change. This patchset adds support for uprobe program to change app's registers including instruction pointer. v4 changes: - rebased on bpf-next/master, we will handle the future simple conflict with tip/perf/core - changed condition in kprobe_prog_is_valid_access [Andrii] - added acks ==================== Link: https://patch.msgid.link/20250916215301.664963-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:43:35 -07:00
Jiri Olsa	3d237467a4	selftests/bpf: Add kprobe multi write ctx attach test Adding test to check we can't attach kprobe multi program that writes to the context. It's x86_64 specific test. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-7-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:25:06 -07:00
Jiri Olsa	1b881ee294	selftests/bpf: Add kprobe write ctx attach test Adding test to check we can't attach standard kprobe program that writes to the context. It's x86_64 specific test. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:25:06 -07:00
Jiri Olsa	6a4ea0d1cb	selftests/bpf: Add uprobe context ip register change test Adding test to check we can change the application execution through instruction pointer change through uprobe program. It's x86_64 specific test. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:25:06 -07:00
Jiri Olsa	7f8a05c5d3	selftests/bpf: Add uprobe context registers changes test Adding test to check we can change common register values through uprobe program. It's x86_64 specific test. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:25:06 -07:00
Jiri Olsa	4363264111	uprobe: Do not emulate/sstep original instruction when ip is changed If uprobe handler changes instruction pointer we still execute single step) or emulate the original instruction and increment the (new) ip with its length. This makes the new instruction pointer bogus and application will likely crash on illegal instruction execution. If user decided to take execution elsewhere, it makes little sense to execute the original instruction, so let's skip it. Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:25:06 -07:00
Jiri Olsa	7384893d97	bpf: Allow uprobe program to change context registers Currently uprobe (BPF_PROG_TYPE_KPROBE) program can't write to the context registers data. While this makes sense for kprobe attachments, for uprobe attachment it might make sense to be able to change user space registers to alter application execution. Since uprobe and kprobe programs share the same type (BPF_PROG_TYPE_KPROBE), we can't deny write access to context during the program load. We need to check on it during program attachment to see if it's going to be kprobe or uprobe. Storing the program's write attempt to context and checking on it during the attachment. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20250916215301.664963-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-24 02:25:06 -07:00
Martin KaFai Lau	34f033a6c9	Merge branch 'bpf-next/xdp_pull_data' into 'bpf-next/master' Merge the xdp_pull_data stable branch into the master branch. No conflict. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-09-23 16:23:58 -07:00
Martin KaFai Lau	5000380e32	Merge branch 'add-kfunc-bpf_xdp_pull_data' Amery Hung says: ==================== Add kfunc bpf_xdp_pull_data v7 -> v6 patch 5 (new patch) - Rename variables in bpf_prog_test_run_xdp() patch 6 - Fix bugs (Martin) v6 -> v5 patch 6 - v5 selftest failed on S390 when changing how tailroom occupied by skb_shared_info is calculated. Revert selftest to v4, where we get SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) by running an XDP program Link: https://lore.kernel.org/bpf/20250919230952.3628709-1-ameryhung@gmail.com/ v5 -> v4 patch 1 - Add a new patch clearing pfmemalloc bit in xdp->frags when all frags are freed in bpf_xdp_adjust_tail() (Maciej) patch 2 - Refactor bpf_xdp_shrink_data() (Maciej) patch 3 - Clear pfmemalloc when all frags are freed in bpf_xdp_pull_data() (Maciej) patch 6 - Use BTF to get sizes of skb_shared_info and xdp_frame (Maciej) Link: https://lore.kernel.org/bpf/20250919182100.1925352-1-ameryhung@gmail.com/ v3 -> v4 patch 2 - Improve comments (Jakub) - Drop new_end and len_free to simplify code (Jakub) patch 4 - Instead of adding is_xdp to bpf_test_init, move lower-bound check of user_size to callers (Martin) - Simplify linear data size calculation (Martin) patch 5 - Add static function identifier (Martin) - Free calloc-ed buf (Martin) Link: https://lore.kernel.org/bpf/20250917225513.3388199-1-ameryhung@gmail.com/ v2 -> v3 Separate mlx5 fixes from the patchset patch 2 - Use headroom for pulling data by shifting metadata and data down (Jakub) - Drop the flags argument (Martin) patch 4 - Support empty linear xdp data for BPF_PROG_TEST_RUN Link: https://lore.kernel.org/bpf/20250915224801.2961360-1-ameryhung@gmail.com/ v1 -> v2 Rebase onto bpf-next Try to build on top of the mlx5 patchset that avoids copying payload to linear part by Christoph but got a kernel panic. Will rebase on that patchset if it got merged first, or separate the mlx5 fix from this set. patch 1 - Remove the unnecessary head frag search (Dragos) - Rewind the end frag pointer to simplify the change (Dragos) - Rewind the end frag pointer and recalculate truesize only when the number of frags changed (Dragos) patch 3 - Fix len == zero behavior. To mirror bpf_skb_pull_data() correctly, the kfunc should do nothing (Stanislav) - Fix a pointer wrap around bug (Jakub) - Use memmove() when moving sinfo->frags (Jakub) Link: https://lore.kernel.org/bpf/20250905173352.3759457-1-ameryhung@gmail.com/ ==================== Link: https://patch.msgid.link/20250922233356.3356453-1-ameryhung@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2025-09-23 15:21:41 -07:00
Amery Hung	efec2e55bd	selftests: drv-net: Pull data before parsing headers It is possible for drivers to generate xdp packets with data residing entirely in fragments. To keep parsing headers using direct packet access, call bpf_xdp_pull_data() to pull headers into the linear data area. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-9-ameryhung@gmail.com	2025-09-23 15:21:26 -07:00
Amery Hung	323302f54d	selftests/bpf: Test bpf_xdp_pull_data Test bpf_xdp_pull_data() with xdp packets with different layouts. The xdp bpf program first checks if the layout is as expected. Then, it calls bpf_xdp_pull_data(). Finally, it checks the 0xbb marker at offset 1024 using directly packet access. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-8-ameryhung@gmail.com	2025-09-23 13:35:12 -07:00
Amery Hung	fe9544ed1a	bpf: Support specifying linear xdp packet data size for BPF_PROG_TEST_RUN To test bpf_xdp_pull_data(), an xdp packet containing fragments as well as free linear data area after xdp->data_end needs to be created. However, bpf_prog_test_run_xdp() always fills the linear area with data_in before creating fragments, leaving no space to pull data. This patch will allow users to specify the linear data size through ctx->data_end. Currently, ctx_in->data_end must match data_size_in and will not be the final ctx->data_end seen by xdp programs. This is because ctx->data_end is populated according to the xdp_buff passed to test_run. The linear data area available in an xdp_buff, max_linear_sz, is alawys filled up before copying data_in into fragments. This patch will allow users to specify the size of data that goes into the linear area. When ctx_in->data_end is different from data_size_in, only ctx_in->data_end bytes of data will be put into the linear area when creating the xdp_buff. While ctx_in->data_end will be allowed to be different from data_size_in, it cannot be larger than the data_size_in as there will be no data to copy from user space. If it is larger than the maximum linear data area size, the layout suggested by the user will not be honored. Data beyond max_linear_sz bytes will still be copied into fragments. Finally, since it is possible for a NIC to produce a xdp_buff with empty linear data area, allow it when calling bpf_test_init() from bpf_prog_test_run_xdp() so that we can test XDP kfuncs with such xdp_buff. This is done by moving lower-bound check to callers as most of them already do except bpf_prog_test_run_skb(). The change also fixes a bug that allows passing an xdp_buff with data < ETH_HLEN. This can happen when ctx is used and metadata is at least ETH_HLEN. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-7-ameryhung@gmail.com	2025-09-23 13:35:12 -07:00
Amery Hung	7eb83bff02	bpf: Make variables in bpf_prog_test_run_xdp less confusing Change the variable naming in bpf_prog_test_run_xdp() to make the overall logic less confusing. As different modes were added to the function over the time, some variables got overloaded, making it hard to understand and changing the code becomes error-prone. Replace "size" with "linear_sz" where it refers to the size of metadata and data. If "size" refers to input data size, use test.data_size_in directly. Replace "max_data_sz" with "max_linear_sz" to better reflect the fact that it is the maximum size of metadata and data (i.e., linear_sz). Also, xdp_rxq.frags_size is always PAGE_SIZE, so just set it directly instead of subtracting headroom and tailroom and adding them back. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-6-ameryhung@gmail.com	2025-09-23 13:35:12 -07:00
Amery Hung	0e7a733ab3	bpf: Clear packet pointers after changing packet data in kfuncs bpf_xdp_pull_data() may change packet data and therefore packet pointers need to be invalidated. Add bpf_xdp_pull_data() to the special kfunc list instead of introducing a new KF_ flag until there are more kfuncs changing packet data. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-5-ameryhung@gmail.com	2025-09-23 13:35:12 -07:00
Amery Hung	4dce1a0d7c	bpf: Support pulling non-linear xdp data Add kfunc, bpf_xdp_pull_data(), to support pulling data from xdp fragments. Similar to bpf_skb_pull_data(), bpf_xdp_pull_data() makes the first len bytes of data directly readable and writable in bpf programs. If the "len" argument is larger than the linear data size, data in fragments will be copied to the linear data area when there is enough room. Specifically, the kfunc will try to use the tailroom first. When the tailroom is not enough, metadata and data will be shifted down to make room for pulling data. A use case of the kfunc is to decapsulate headers residing in xdp fragments. It is possible for a NIC driver to place headers in xdp fragments. To keep using direct packet access for parsing and decapsulating headers, users can pull headers into the linear data area by calling bpf_xdp_pull_data() and then pop the header with bpf_xdp_adjust_head(). Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250922233356.3356453-4-ameryhung@gmail.com	2025-09-23 13:35:12 -07:00
Amery Hung	dea1526fba	bpf: Allow bpf_xdp_shrink_data to shrink a frag from head and tail Move skb_frag_t adjustment into bpf_xdp_shrink_data() and extend its functionality to be able to shrink an xdp fragment from both head and tail. In a later patch, bpf_xdp_pull_data() will reuse it to shrink an xdp fragment from head. Additionally, in bpf_xdp_frags_shrink_tail(), breaking the loop when bpf_xdp_shrink_data() returns false (i.e., not releasing the current fragment) is not necessary as the loop condition, offset > 0, has the same effect. Remove the else branch to simplify the code. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250922233356.3356453-3-ameryhung@gmail.com	2025-09-23 13:35:12 -07:00
Amery Hung	8f12d1137c	bpf: Clear pfmemalloc flag when freeing all fragments It is possible for bpf_xdp_adjust_tail() to free all fragments. The kfunc currently clears the XDP_FLAGS_HAS_FRAGS bit, but not XDP_FLAGS_FRAGS_PF_MEMALLOC. So far, this has not caused a issue when building sk_buff from xdp_buff since all readers of xdp_buff->flags use the flag only when there are fragments. Clear the XDP_FLAGS_FRAGS_PF_MEMALLOC bit as well to make the flags correct. Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://patch.msgid.link/20250922233356.3356453-2-ameryhung@gmail.com	2025-09-23 13:35:11 -07:00
Alexei Starovoitov	8b52d09a1d	Merge branch 'riscv-bpf-fix-uninitialized-symbol-retval_off' Chenghao Duan says: ==================== riscv: bpf: Fix uninitialized symbol 'retval_off' v2: Adjust the commit log URL for version v1: https://lore.kernel.org/all/20250820062520.846720-1-duanchenghao@kylinos.cn/ ==================== Link: https://patch.msgid.link/20250922062244.822937-1-duanchenghao@kylinos.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:29:04 -07:00
Chenghao Duan	d0bf7cd5df	riscv: bpf: Fix uninitialized symbol 'retval_off' In the __arch_prepare_bpf_trampoline() function, retval_off is only meaningful when save_ret is true, so the current logic is correct. However, in the original logic, retval_off is only initialized under certain conditions; for example, in the fmod_ret logic, the compiler is not aware that the flags of the fmod_ret program (prog) have set BPF_TRAMP_F_CALL_ORIG, which results in an uninitialized symbol compilation warning. So initialize retval_off unconditionally to fix it. Signed-off-by: Chenghao Duan <duanchenghao@kylinos.cn> Reviewed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/r/20250922062244.822937-2-duanchenghao@kylinos.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:29:03 -07:00
Quentin Monnet	0d3bf643b4	bpftool: Add bash completion for program signing options Commit `40863f4d6e` ("bpftool: Add support for signing BPF programs") added new options for "bpftool prog load" and "bpftool gen skeleton". This commit brings the relevant update to the bash completion file. We rework slightly the processing of options to make completion more resilient for options that take an argument. Signed-off-by: Quentin Monnet <qmo@kernel.org> Link: https://lore.kernel.org/r/20250923103802.57695-1-qmo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:28:33 -07:00
Alexei Starovoitov	f0b5c1490a	Merge branch 'bpf-allow-union-argument-in-trampoline-based-programs' Leon Hwang says: ==================== bpf: Allow union argument in trampoline based programs While tracing 'release_pages' with bpfsnoop[0], the verifier reports: The function release_pages arg0 type UNION is unsupported. However, it should be acceptable to trace functions that have 'union' arguments. This patch set enables such support in the verifier by allowing 'union' as a valid argument type. Changes: v3 -> v4: * Address comments from Alexei: * Trim bpftrace output in patch #1 log. * Drop the referenced commit info and the test output in patch #2 log. v2 -> v3: * Address comments from Alexei: * Reuse the existing flag BTF_FMODEL_STRUCT_ARG. * Update the comment of the flag BTF_FMODEL_STRUCT_ARG. v1 -> v2: * Add 16B 'union' argument support in x86_64 trampoline. * Update selftests using bpf_testmod. * Add test case about 16-bytes 'union' argument. * Address comments from Alexei: * Study the patch set about 'struct' argument support. * Update selftests to cover more cases. v1: https://lore.kernel.org/bpf/20250905133226.84675-1-leon.hwang@linux.dev/ Links: [0] https://github.com/bpfsnoop/bpfsnoop ==================== Link: https://patch.msgid.link/20250919044110.23729-1-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:07:47 -07:00
Leon Hwang	1c6686bf7f	selftests/bpf: Add union argument tests using fexit programs Add test coverage for union argument support using fexit programs: * 8B union argument - verify that the verifier accepts it and that fexit programs can trace such functions. * 16B union argument - verify that the verifier accepts it and that fexit programs can access the argument, which is passed using two registers. Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20250919044110.23729-3-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:07:47 -07:00
Leon Hwang	ccb4f5d91e	bpf: Allow union argument in trampoline based programs Currently, functions with 'union' arguments cannot be traced with fentry/fexit: bpftrace -e 'fentry:release_pages { exit(); }' -v The function release_pages arg0 type UNION is unsupported. The type of the 'release_pages' arg0 is defined as: typedef union { struct page pages; struct folio folios; struct encoded_page **encoded_pages; } release_pages_arg __attribute__ ((__transparent_union__)); This patch relaxes the restriction by allowing function arguments of type 'union' to be traced in verifier. Reviewed-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20250919044110.23729-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:07:46 -07:00
Alexei Starovoitov	2383e45f1d	Merge branch 'signed-loads-from-arena' Puranjay Mohan says: ==================== Signed loads from Arena Changelog: v3 -> v4: v3: https://lore.kernel.org/all/20250915162848.54282-1-puranjay@kernel.org/ - Update bpf_jit_supports_insn() in riscv jit to reject signed arena loads (Eduard) - Fix coding style related to braces usage in an if statement in x86 jit (Eduard) v2 -> v3: v2: https://lore.kernel.org/bpf/20250514175415.2045783-1-memxor@gmail.com/ - Fix encoding for the generated instructions in x86 JIT (Eduard) The patch in v2 was generating instructions like: 42 63 44 20 f8 movslq -0x8(%rax,%r12), %eax This doesn't make sense because movslq outputs a 64-bit result, but the destination register here is set to eax (32-bit). The fix it to set the REX.W bit in the opcode, that means changing EMIT2(add_3mod(0x40, ...)) to EMIT2(add_3mod(0x48, ...)) - Add arm64 support - Add selftests signed laods from arena. v1 -> v2: v1: https://lore.kernel.org/bpf/20250509194956.1635207-1-memxor@gmail.com - Use bpf_jit_supports_insn. (Alexei) Currently, signed load instructions into arena memory are unsupported. The compiler is free to generate these, and on GCC-14 we see a corresponding error when it happens. The hurdle in supporting them is deciding which unused opcode to use to mark them for the JIT's own consumption. After much thinking, it appears 0xc0 / BPF_NOSPEC can be combined with load instructions to identify signed arena loads. Use this to recognize and JIT them appropriately, and remove the verifier side limitation on the program if the JIT supports them. ==================== Link: https://patch.msgid.link/20250923110157.18326-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:23 -07:00
Puranjay Mohan	f616549124	selftests: bpf: Add tests for signed loads from arena Add tests for loading 8, 16, and 32 bits with sign extension from arena, also verify that exception handling is working correctly and correct assembly is being generated by the x86 and arm64 JITs. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250923110157.18326-4-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:23 -07:00
Puranjay Mohan	eab2a71f3a	bpf, arm64: Add support for signed arena loads Add support for signed loads from arena which are internally converted to loads with mode set BPF_PROBE_MEM32SX by the verifier. The implementation is similar to BPF_PROBE_MEMSX and BPF_MEMSX but for BPF_PROBE_MEM32SX, arena_vm_base is added to the src register to form the address. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250923110157.18326-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:22 -07:00
Kumar Kartikeya Dwivedi	a91ae3c893	bpf, x86: Add support for signed arena loads Currently, signed load instructions into arena memory are unsupported. The compiler is free to generate these, and on GCC-14 we see a corresponding error when it happens. The hurdle in supporting them is deciding which unused opcode to use to mark them for the JIT's own consumption. After much thinking, it appears 0xc0 / BPF_NOSPEC can be combined with load instructions to identify signed arena loads. Use this to recognize and JIT them appropriately, and remove the verifier side limitation on the program if the JIT supports them. Co-developed-by: Puranjay Mohan <puranjay@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20250923110157.18326-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 12:00:22 -07:00
Alexei Starovoitov	348f6117c1	Merge branch 'bpf-introduce-deferred-task-context-execution' Mykyta Yatsenko says: ==================== bpf: Introduce deferred task context execution From: Mykyta Yatsenko <yatsenko@meta.com> This patch introduces a new mechanism for BPF programs to schedule deferred execution in the context of a specific task using the kernel’s task_work infrastructure. The new bpf_task_work interface enables BPF use cases that require sleepable subprogram execution within task context, for example, scheduling sleepable function from the context that does not allow sleepable, such as NMI. Introduced kfuncs bpf_task_work_schedule_signal() and bpf_task_work_schedule_resume() for scheduling BPF callbacks correspond to different modes used by task_work (TWA_SIGNAL or TWA_RESUME). The implementation manages scheduling state via metadata objects (struct bpf_task_work_context). Pointers to bpf_task_work_context are stored in BPF map values. State transitions are handled via an atomic state machine (bpf_task_work_state) to ensure correctness under concurrent usage and deletion, lifetime is guarded by refcounting and RCU Tasks Trace. Kfuncs call task_work_add() indirectly via irq_work to avoid locking in potentially NMI context. Changelog: --- v7 -> v8 v7: https://lore.kernel.org/bpf/20250922232611.614512-1-mykyta.yatsenko5@gmail.com/ * Fix unused variable warning in patch 1 * Decrease stress test time from 2 to 1 second * Went through CI warnings, other than unused variable, there are just 2 new in kernel/bpf/helpers.c related to newly introduced kfuncs, these look expected. v6 -> v7 v6: https://lore.kernel.org/bpf/20250918132615.193388-1-mykyta.yatsenko5@gmail.com/ * Added stress test * Extending refactoring in patch 1 * Changing comment and removing one check for map->usercnt in patch 7 v5 -> v6 v5: https://lore.kernel.org/bpf/20250916233651.258458-1-mykyta.yatsenko5@gmail.com/ * Fixing readability in verifier.c:check_map_field_pointer() * Removing BUG_ON from helpers.c v4 -> v5 v4: https://lore.kernel.org/all/20250915201820.248977-1-mykyta.yatsenko5@gmail.com/ * Fix invalid/null pointer dereference bug, reported by syzbot * Nits in selftests v3 -> v4 v3: https://lore.kernel.org/all/20250905164508.1489482-1-mykyta.yatsenko5@gmail.com/ * Modify async callback return value processing in verifier, to allow non-zero return values. * Change return type of the callback from void to int, as verifier expects scalar value. * Switched to void* for bpf_map API kfunc arguments to avoid casts. * Addressing numerous nits and small improvements. v2 -> v3 v2: https://lore.kernel.org/all/20250815192156.272445-1-mykyta.yatsenko5@gmail.com/ * Introduce ref counting * Add patches with minor verifier and btf.c refactorings to avoid code duplication * Rework initiation of the task work scheduling to handle race with map usercnt dropping to zero ==================== Link: https://patch.msgid.link/20250923112404.668720-1-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 07:34:40 -07:00
Mykyta Yatsenko	c6ae18e0af	selftests/bpf: add bpf task work stress tests Add stress tests for BPF task-work scheduling kfuncs. The tests spawn multiple threads that concurrently schedule task_work callbacks against the same and different map values to exercise the kfuncs under high contention. Verify callbacks are reliably enqueued and executed with no drops. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20250923112404.668720-10-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 07:34:39 -07:00
Mykyta Yatsenko	39fd74dfd5	selftests/bpf: BPF task work scheduling tests Introducing selftests that check BPF task work scheduling mechanism. Validate that verifier does not accepts incorrect calls to bpf_task_work_schedule kfunc. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250923112404.668720-9-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 07:34:39 -07:00
Mykyta Yatsenko	38aa7003e3	bpf: task work scheduling kfuncs Implementation of the new bpf_task_work_schedule kfuncs, that let a BPF program schedule task_work callbacks for a target task: * bpf_task_work_schedule_signal() - schedules with TWA_SIGNAL * bpf_task_work_schedule_resume() - schedules with TWA_RESUME Each map value should embed a struct bpf_task_work, which the kernel side pairs with struct bpf_task_work_kern, containing a pointer to struct bpf_task_work_ctx, that maintains metadata relevant for the concrete callback scheduling. A small state machine and refcounting scheme ensures safe reuse and teardown. State transitions: _______________________________ \| \| v \| [standby] ---> [pending] --> [scheduling] --> [scheduled] ^ \|________________\|_________ \| \| \| v \| [running] \|_______________________________________________________\| All states may transition into FREED state: [pending] [scheduling] [scheduled] [running] [standby] -> [freed] A FREED terminal state coordinates with map-value deletion (bpf_task_work_cancel_and_free()). Scheduling itself is deferred via irq_work to keep the kfunc callable from NMI context. Lifetime is guarded with refcount_t + RCU Tasks Trace. Main components: * struct bpf_task_work_context – Metadata and state management per task work. * enum bpf_task_work_state – A state machine to serialize work scheduling and execution. * bpf_task_work_schedule() – The central helper that initiates scheduling. * bpf_task_work_acquire_ctx() - Attempts to take ownership of the context, pointed by passed struct bpf_task_work, allocates new context if none exists yet. * bpf_task_work_callback() – Invoked when the actual task_work runs. * bpf_task_work_irq() – An intermediate step (runs in softirq context) to enqueue task work. * bpf_task_work_cancel_and_free() – Cleanup for deleted BPF map entries. Flow of successful task work scheduling 1) bpf_task_work_schedule_* is called from BPF code. 2) Transition state from STANDBY to PENDING, mark context as owned by this task work scheduler 3) irq_work_queue() schedules bpf_task_work_irq(). 4) Transition state from PENDING to SCHEDULING (noop if transition successful) 5) bpf_task_work_irq() attempts task_work_add(). If successful, state transitions to SCHEDULED. 6) Task work calls bpf_task_work_callback(), which transition state to RUNNING. 7) BPF callback is executed 8) Context is cleaned up, refcounts released, context state set back to STANDBY. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Reviewed-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250923112404.668720-8-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 07:34:39 -07:00
Mykyta Yatsenko	5e8134f50d	bpf: extract map key pointer calculation Calculation of the BPF map key, given the pointer to a value is duplicated in a couple of places in helpers already, in the next patch another use case is introduced as well. This patch extracts that functionality into a separate function. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250923112404.668720-7-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 07:34:38 -07:00
Mykyta Yatsenko	5c8fd7e2b5	bpf: bpf task work plumbing This patch adds necessary plumbing in verifier, syscall and maps to support handling new kfunc bpf_task_work_schedule and kernel structure bpf_task_work. The idea is similar to how we already handle bpf_wq and bpf_timer. verifier changes validate calls to bpf_task_work_schedule to make sure it is safe and expected invariants hold. btf part is required to detect bpf_task_work structure inside map value and store its offset, which will be used in the next patch to calculate key and value addresses. arraymap and hashtab changes are needed to handle freeing of the bpf_task_work: run code needed to deinitialize it, for example cancel task_work callback if possible. The use of bpf_task_work and proper implementation for kfuncs are introduced in the next patch. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250923112404.668720-6-mykyta.yatsenko5@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2025-09-23 07:34:38 -07:00

1 2 3 4 5 ...

1383316 Commits