ghsa-4grq-496q-c892
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
bpf: fix ktls panic with sockmap
[ 2172.936997] ------------[ cut here ]------------
[ 2172.936999] kernel BUG at lib/iov_iter.c:629!
......
[ 2172.944996] PKRU: 55555554
[ 2172.945155] Call Trace:
[ 2172.945299] <TASK>
[ 2172.945428] ? die+0x36/0x90
[ 2172.945601] ? do_trap+0xdd/0x100
[ 2172.945795] ? iov_iter_revert+0x178/0x180
[ 2172.946031] ? iov_iter_revert+0x178/0x180
[ 2172.946267] ? do_error_trap+0x7d/0x110
[ 2172.946499] ? iov_iter_revert+0x178/0x180
[ 2172.946736] ? exc_invalid_op+0x50/0x70
[ 2172.946961] ? iov_iter_revert+0x178/0x180
[ 2172.947197] ? asm_exc_invalid_op+0x1a/0x20
[ 2172.947446] ? iov_iter_revert+0x178/0x180
[ 2172.947683] ? iov_iter_revert+0x5c/0x180
[ 2172.947913] tls_sw_sendmsg_locked.isra.0+0x794/0x840
[ 2172.948206] tls_sw_sendmsg+0x52/0x80
[ 2172.948420] ? inet_sendmsg+0x1f/0x70
[ 2172.948634] __sys_sendto+0x1cd/0x200
[ 2172.948848] ? find_held_lock+0x2b/0x80
[ 2172.949072] ? syscall_trace_enter+0x140/0x270
[ 2172.949330] ? __lock_release.isra.0+0x5e/0x170
[ 2172.949595] ? find_held_lock+0x2b/0x80
[ 2172.949817] ? syscall_trace_enter+0x140/0x270
[ 2172.950211] ? lockdep_hardirqs_on_prepare+0xda/0x190
[ 2172.950632] ? ktime_get_coarse_real_ts64+0xc2/0xd0
[ 2172.951036] __x64_sys_sendto+0x24/0x30
[ 2172.951382] do_syscall_64+0x90/0x170
......
After calling bpf_exec_tx_verdict(), the size of msg_pl->sg may increase, e.g., when the BPF program executes bpf_msg_push_data().
If the BPF program sets cork_bytes and sg.size is smaller than cork_bytes, the call returns -ENOSPC and the code attempts to roll back to the non-zero copy logic. During that rollback msg->msg_iter is reverted, but because msg_pl->sg.size has grown in the meantime, the length passed to the revert below can exceed what was actually consumed from msg_iter.
'''
iov_iter_revert(&msg->msg_iter, msg_pl->sg.size - orig_size);
'''
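The arithmetic behind the failed revert can be sketched in user space. This is only a simplified model under stated assumptions: `struct model_iter`, `model_iter_advance`, `model_iter_revert`, and `model_rollback` are hypothetical names invented for illustration, not kernel APIs, and the real iov_iter_revert() performs different checks in detail. The sketch only shows why a revert length derived from a grown sg.size can point before the start of the caller's buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct iov_iter: it only tracks
 * how many bytes the caller supplied and how many are still unread. */
struct model_iter {
    size_t total;     /* bytes the caller passed to send() */
    size_t remaining; /* bytes not yet consumed */
};

/* Consume up to n bytes and return how many were actually consumed. */
static size_t model_iter_advance(struct model_iter *it, size_t n)
{
    if (n > it->remaining)
        n = it->remaining;
    it->remaining -= n;
    return n;
}

/* Undo `unroll` consumed bytes. Returns false in the situation where the
 * model assumes the kernel would hit its BUG: the revert would move the
 * iterator before the start of the data. */
static bool model_iter_revert(struct model_iter *it, size_t unroll)
{
    assert(it->remaining <= it->total);
    size_t consumed = it->total - it->remaining;

    if (unroll > consumed)
        return false;
    it->remaining += unroll;
    return true;
}

/* Mirrors `msg_pl->sg.size - orig_size` from the rollback: `pushed` is the
 * number of bytes the BPF program added, e.g. via bpf_msg_push_data(). */
static bool model_rollback(size_t user_len, size_t pushed)
{
    struct model_iter it = { user_len, user_len };
    size_t orig_size = 0;
    size_t sg_size = orig_size + model_iter_advance(&it, user_len);

    sg_size += pushed; /* growth that msg_iter knows nothing about */
    return model_iter_revert(&it, sg_size - orig_size);
}
```

In this model, `model_rollback(8, 0)` succeeds because the revert length equals the 8 bytes consumed, while `model_rollback(8, 3)` fails because it asks to revert 11 bytes from an iterator that only ever held 8.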
The changes in this commit are based on the following considerations:
- When cork_bytes is set, rolling back to the non-zero copy logic is pointless; the code can go directly to the zero-copy logic.
- We cannot calculate the correct number of bytes to revert msg_iter.
Assume the original data is "abcdefgh" (8 bytes), and after 3 pushes by the BPF program, it becomes 11-byte data: "abc?de?fgh?". Then, we set cork_bytes to 6, which means the first 6 bytes have been processed, and the remaining 5 bytes "?fgh?" will be cached until the length meets the cork_bytes requirement.
However, some data in "?fgh?" is not within 'sg->msg_iter' (but in msg_pl instead), especially the data "?" we pushed.
So it doesn't seem as simple as just reverting through an offset of msg_iter.
- For non-TLS sockets in tcp_bpf_sendmsg, when a "cork" situation occurs, the user-space send() doesn't return an error, and the returned length is the same as the input length parameter, even if some data is cached.
Additionally, I saw that the current non-zero-copy logic for handling corking is written as:
'''
line 1177
else if (ret != -EAGAIN) {
	if (ret == -ENOSPC)
		ret = 0;
	goto send_end;
'''
So it's ok to just return 'copied' without error when a "cork" situation occurs.
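The "abcdefgh" example and the return-value reasoning in the considerations above can be checked with a small user-space sketch. Everything here is an illustrative assumption rather than kernel code: `struct cork_split`, `split_after`, and `model_send_result` are hypothetical names, the `?` marker simply follows the notation used in the text, and the final `copied > 0 ? copied : ret` expression is an assumed model of how the send path reports its result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PUSHED '?' /* marker the text uses for bytes inserted by the BPF program */

/* Breakdown of the bytes held back once the processed prefix is gone. */
struct cork_split {
    size_t cached_total;  /* all bytes still cached */
    size_t cached_user;   /* cached bytes that came from the caller's buffer */
    size_t cached_pushed; /* cached bytes that were pushed by the BPF program */
};

/* Classify everything after the first `processed` bytes of `data`. */
static struct cork_split split_after(const char *data, size_t processed)
{
    struct cork_split s = { 0, 0, 0 };
    size_t len = strlen(data);

    assert(processed <= len);
    for (size_t i = processed; i < len; i++) {
        s.cached_total++;
        if (data[i] == PUSHED)
            s.cached_pushed++;
        else
            s.cached_user++;
    }
    return s;
}

/* Assumed model of the result reported to user space: a "cork" condition
 * (-ENOSPC) is not treated as an error, and the copied length is returned. */
static long model_send_result(long ret, long copied)
{
    if (ret == -ENOSPC)
        ret = 0;
    return copied > 0 ? copied : ret;
}
```

For `split_after("abc?de?fgh?", 6)` the cached tail is "?fgh?": 5 bytes in total, of which only 3 ("fgh") ever existed in the caller's buffer. A revert based on the cached size would therefore overshoot by the 2 pushed bytes, which is the reason a plain offset into msg_iter does not work.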
{
  "affected": [],
  "aliases": [
    "CVE-2025-38166"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-07-03T09:15:32Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nbpf: fix ktls panic with sockmap\n\n[ 2172.936997] ------------[ cut here ]------------\n[ 2172.936999] kernel BUG at lib/iov_iter.c:629!\n......\n[ 2172.944996] PKRU: 55555554\n[ 2172.945155] Call Trace:\n[ 2172.945299] \u003cTASK\u003e\n[ 2172.945428] ? die+0x36/0x90\n[ 2172.945601] ? do_trap+0xdd/0x100\n[ 2172.945795] ? iov_iter_revert+0x178/0x180\n[ 2172.946031] ? iov_iter_revert+0x178/0x180\n[ 2172.946267] ? do_error_trap+0x7d/0x110\n[ 2172.946499] ? iov_iter_revert+0x178/0x180\n[ 2172.946736] ? exc_invalid_op+0x50/0x70\n[ 2172.946961] ? iov_iter_revert+0x178/0x180\n[ 2172.947197] ? asm_exc_invalid_op+0x1a/0x20\n[ 2172.947446] ? iov_iter_revert+0x178/0x180\n[ 2172.947683] ? iov_iter_revert+0x5c/0x180\n[ 2172.947913] tls_sw_sendmsg_locked.isra.0+0x794/0x840\n[ 2172.948206] tls_sw_sendmsg+0x52/0x80\n[ 2172.948420] ? inet_sendmsg+0x1f/0x70\n[ 2172.948634] __sys_sendto+0x1cd/0x200\n[ 2172.948848] ? find_held_lock+0x2b/0x80\n[ 2172.949072] ? syscall_trace_enter+0x140/0x270\n[ 2172.949330] ? __lock_release.isra.0+0x5e/0x170\n[ 2172.949595] ? find_held_lock+0x2b/0x80\n[ 2172.949817] ? syscall_trace_enter+0x140/0x270\n[ 2172.950211] ? lockdep_hardirqs_on_prepare+0xda/0x190\n[ 2172.950632] ? ktime_get_coarse_real_ts64+0xc2/0xd0\n[ 2172.951036] __x64_sys_sendto+0x24/0x30\n[ 2172.951382] do_syscall_64+0x90/0x170\n......\n\nAfter calling bpf_exec_tx_verdict(), the size of msg_pl-\u003esg may increase,\ne.g., when the BPF program executes bpf_msg_push_data().\n\nIf the BPF program sets cork_bytes and sg.size is smaller than cork_bytes,\nit will return -ENOSPC and attempt to roll back to the non-zero copy\nlogic. However, during rollback, msg-\u003emsg_iter is reset, but since\nmsg_pl-\u003esg.size has been increased, subsequent executions will exceed the\nactual size of msg_iter.\n\u0027\u0027\u0027\niov_iter_revert(\u0026msg-\u003emsg_iter, msg_pl-\u003esg.size - orig_size);\n\u0027\u0027\u0027\n\nThe changes in this commit are based on the following considerations:\n\n1. When cork_bytes is set, rolling back to non-zero copy logic is\npointless and can directly go to zero-copy logic.\n\n2. We can not calculate the correct number of bytes to revert msg_iter.\n\nAssume the original data is \"abcdefgh\" (8 bytes), and after 3 pushes\nby the BPF program, it becomes 11-byte data: \"abc?de?fgh?\".\nThen, we set cork_bytes to 6, which means the first 6 bytes have been\nprocessed, and the remaining 5 bytes \"?fgh?\" will be cached until the\nlength meets the cork_bytes requirement.\n\nHowever, some data in \"?fgh?\" is not within \u0027sg-\u003emsg_iter\u0027\n(but in msg_pl instead), especially the data \"?\" we pushed.\n\nSo it doesn\u0027t seem as simple as just reverting through an offset of\nmsg_iter.\n\n3. For non-TLS sockets in tcp_bpf_sendmsg, when a \"cork\" situation occurs,\nthe user-space send() doesn\u0027t return an error, and the returned length is\nthe same as the input length parameter, even if some data is cached.\n\nAdditionally, I saw that the current non-zero-copy logic for handling\ncorking is written as:\n\u0027\u0027\u0027\nline 1177\nelse if (ret != -EAGAIN) {\n\tif (ret == -ENOSPC)\n\t\tret = 0;\n\tgoto send_end;\n\u0027\u0027\u0027\n\nSo it\u0027s ok to just return \u0027copied\u0027 without error when a \"cork\" situation\noccurs.",
  "id": "GHSA-4grq-496q-c892",
  "modified": "2025-07-03T09:30:35Z",
  "published": "2025-07-03T09:30:35Z",
  "references": [
    { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-38166" },
    { "type": "WEB", "url": "https://git.kernel.org/stable/c/2e36a81d388ec9c3f78b6223f7eda2088cd40adb" },
    { "type": "WEB", "url": "https://git.kernel.org/stable/c/328cac3f9f8ae394748485e769a527518a9137c8" },
    { "type": "WEB", "url": "https://git.kernel.org/stable/c/54a3ecaeeeae8176da8badbd7d72af1017032c39" },
    { "type": "WEB", "url": "https://git.kernel.org/stable/c/57fbbe29e86042bbaa31c1a30d2afa16c427e3f7" },
    { "type": "WEB", "url": "https://git.kernel.org/stable/c/603943f022a7fe5cc83ca7005faf34798fb7853f" }
  ],
  "schema_version": "1.4.0",
  "severity": []
}