ghsa-979q-6jjc-29jh
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_conntrack: fix crash due to removal of uninitialised entry
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list:

    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
    #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
    #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
    #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]
The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state:
- ct hlist pointer is garbage; looks like the ct hash value (hence crash).
- ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
- ct->timeout is 30000 (=30s), which is unexpected.
Everything else looks like a normal udp conntrack entry. If we ignore ct->status and pretend it's 0, the entry matches those that are newly allocated but not yet inserted into the hash:
- ct hlist pointers are overloaded and store/cache the raw tuple hash
- ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value.
If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry.
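To make the role of the two fields concrete, here is a toy Python model of the lookup-side decision described above. It is a sketch, not kernel code: the bit values, the helper names `is_expired` and `lookup_would_delete`, the "not dying" check and the sample `now` value are assumptions for illustration, and the signed wraparound of a real jiffies comparison is not modelled.

```python
# Toy model of the lookup-side checks; not kernel code.
IPS_CONFIRMED = 1 << 3  # assumed bit values, for illustration only
IPS_DYING = 1 << 9


def is_expired(timeout: int, now: int) -> bool:
    """Expired once the (absolute) timeout is no longer in the future."""
    return timeout - now <= 0


def lookup_would_delete(status: int, timeout: int, now: int) -> bool:
    """Simplified pre-patch decision: expired, confirmed and not already dying."""
    return (
        is_expired(timeout, now)
        and bool(status & IPS_CONFIRMED)
        and not (status & IPS_DYING)
    )


now = 4_300_000_000  # hypothetical current timestamp in jiffies-like units

# Freshly initialised entry: timeout still holds the relative value 30000,
# so it looks long expired, but status == 0 means the lookup skips it.
fresh_unconfirmed = lookup_would_delete(0, 30_000, now)

# Same entry with CONFIRMED set before insertion: it now qualifies for deletion.
fresh_confirmed = lookup_would_delete(IPS_CONFIRMED, 30_000, now)

# Properly inserted entry: timeout was converted to an absolute timestamp.
inserted_live = lookup_would_delete(IPS_CONFIRMED, now + 30_000, now)

print(fresh_unconfirmed, fresh_confirmed, inserted_live)
```

In this model only the middle case, a relative timeout combined with a premature CONFIRMED bit, is selected for deletion, which matches the state seen in the crash dump.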
The theory is that we hit the following race:

    cpu x                   cpu y                   cpu z
     found entry E          found entry E
     E is expired           <preemption>
     nf_ct_delete()
     return E to rcu slab
                                                    init_conntrack
                                                    E is re-inited,
                                                    ct->status set to 0
                                                    reply tuplehash hnnode.pprev
                                                    stores hash value.

cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit.

                                                    ->refcnt set to 1
                                                    E now owned by skb
                                                    ->timeout set to 30000

If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit.

                                                    nf_conntrack_confirm gets called
                                                    sets: ct->status |= CONFIRMED
                                                    This is wrong: E is not yet added
                                                    to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:

                            <resumes>
                            nf_ct_expired()
                             -> yes (ct->timeout is 30s)
                            confirmed bit set.

cpu y will try to delete E from the hashtable:

                            nf_ct_delete() -> set DYING bit
                            __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks:

                            wait for spinlock held by z
                                                    CONFIRMED is set but there is no
                                                    guarantee ct will be added to hash:
                                                    "chaintoolong" or "clash resolution"
                                                    logic both skip the insert step.
                                                    reply hnnode.pprev still stores the
                                                    hash value.
                                                    unlocks spinlock
                                                    return NF_DROP
                            <unblocks, then
                             crashes on hlist_nulls_del_rcu pprev>
In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs.
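The difference between the crashing and the harmless outcome can be illustrated with a toy linked-list model in Python. This is only a sketch under stated assumptions: `ListHead`, `Node`, `add_head` and `unlink` are hypothetical names, the hash values are made up, and a Python `AttributeError` stands in for the kernel writing through a bogus pprev pointer.

```python
# Toy model of an hlist-style chain; not kernel code.
class ListHead:
    """Bucket head; `next` refers to the first node or None."""

    def __init__(self):
        self.next = None


class Node:
    def __init__(self, name, tuple_hash):
        self.name = name
        self.next = None
        # Until the node is inserted, pprev caches the raw hash (an int).
        # After insertion it refers to the object whose `next` points here.
        self.pprev = tuple_hash


def add_head(node, head):
    """Insert node at the front of the chain."""
    first = head.next
    node.next = first
    node.pprev = head
    if first is not None:
        first.pprev = node
    head.next = node


def unlink(node):
    """Unlink without any validity check, as a raw list delete would."""
    prev = node.pprev
    prev.next = node.next  # fails if pprev is still the cached hash
    if node.next is not None:
        node.next.pprev = prev


bucket = ListHead()

# cpu z skipped the insert step, so E.pprev still holds the hash value.
e = Node("E", tuple_hash=0x5EED)  # hypothetical hash value
try:
    unlink(e)  # cpu y: delete-from-lists on the never-inserted entry
    crashed = False
except AttributeError:
    crashed = True  # stands in for the kernel crash

# Contrast: if cpu z did insert the entry, unlinking it is harmless.
f = Node("F", tuple_hash=0xBEEF)
add_head(f, bucket)
unlink(f)
chain_empty = bucket.next is None

print(crashed, chain_empty)
```

The unlink routine itself is identical in both cases; what differs is only whether pprev was ever turned from a cached hash into a real link.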
Without 'cpu y' race, 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.
To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock.
Pablo points out that the confirm-bit store could be reordered to happen before the hlist add or the timeout fixup, so switch to set_bit and a before_atomic memory barrier to prevent this.
It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set:
Such an event cannot be distinguished from the above "E is the old incarnation" case: the entry will be skipped.
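The effect of the reordering can be sketched with a small Python model that compares the two orders when the insert step is skipped. Everything here is an illustrative assumption rather than the actual patch: `Entry`, `confirm_old`, `confirm_new`, the `insert_allowed` flag and the bit value are hypothetical, the "NF_ACCEPT" verdict for the success path is assumed, and neither the bucket lock nor the memory barrier is modelled.

```python
# Toy model of the ordering change; memory barriers are not modelled.
from dataclasses import dataclass

IPS_CONFIRMED = 1 << 3  # assumed bit value, for illustration only


@dataclass
class Entry:
    status: int = 0
    in_table: bool = False


def confirm_old(entry: Entry, table: list, insert_allowed: bool) -> str:
    """Pre-patch order: set CONFIRMED first, insertion may then be skipped."""
    entry.status |= IPS_CONFIRMED
    if not insert_allowed:  # e.g. "chaintoolong" or clash resolution
        return "NF_DROP"
    table.append(entry)
    entry.in_table = True
    return "NF_ACCEPT"


def confirm_new(entry: Entry, table: list, insert_allowed: bool) -> str:
    """Patched order: insert first, set CONFIRMED only afterwards."""
    if not insert_allowed:
        return "NF_DROP"
    table.append(entry)
    entry.in_table = True
    entry.status |= IPS_CONFIRMED  # still before the (unmodelled) unlock
    return "NF_ACCEPT"


def confirmed_implies_inserted(entry: Entry) -> bool:
    """The invariant the racing lookup relies on."""
    return not (entry.status & IPS_CONFIRMED) or entry.in_table


old_skipped, new_skipped, new_inserted = Entry(), Entry(), Entry()
old_verdict = confirm_old(old_skipped, [], insert_allowed=False)
new_verdict = confirm_new(new_skipped, [], insert_allowed=False)
ok_verdict = confirm_new(new_inserted, [], insert_allowed=True)

print(confirmed_implies_inserted(old_skipped))   # invariant broken pre-patch
print(confirmed_implies_inserted(new_skipped))
print(confirmed_implies_inserted(new_inserted))
```

With the patched order, a dropped entry never carries the CONFIRMED bit, so a concurrent lookup that still holds a stale pointer to it takes the "skip" path instead of trying to unlink it.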
Also change nf_ct_should_gc() to first check the confirmed bit.
The gc sequence is:
1. Check if entry has expired, if not skip to next entry
2. Obtain a reference to the expired entry.
3. Call nf_ct_should_gc() to double-check step 1.
nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check pas ---truncated---
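The three gc steps listed above, combined with a confirmed-bit-first check, can be sketched as a toy Python loop. This is a model under assumptions, not the kernel implementation: `Entry`, `should_gc`, `gc_scan`, the bit values, the "not dying" condition and the plain integer reference count are all hypothetical stand-ins.

```python
# Toy model of the gc steps; not kernel code.
from dataclasses import dataclass

IPS_CONFIRMED = 1 << 3  # assumed bit values, for illustration only
IPS_DYING = 1 << 9


@dataclass
class Entry:
    name: str
    status: int
    timeout: int
    refcnt: int = 1


def is_expired(entry: Entry, now: int) -> bool:
    return entry.timeout - now <= 0


def should_gc(entry: Entry, now: int) -> bool:
    """Check the confirmed bit first, then re-check expiry."""
    if not (entry.status & IPS_CONFIRMED):
        return False
    return is_expired(entry, now) and not (entry.status & IPS_DYING)


def gc_scan(entries: list, now: int) -> list:
    """Return the names of entries the sketch would delete."""
    deleted = []
    for entry in entries:
        if not is_expired(entry, now):      # step 1
            continue
        entry.refcnt += 1                   # step 2
        if should_gc(entry, now):           # step 3
            entry.status |= IPS_DYING
            deleted.append(entry.name)
        entry.refcnt -= 1
    return deleted


now = 4_300_000_000  # hypothetical current timestamp
entries = [
    Entry("live", IPS_CONFIRMED, now + 30_000),
    Entry("stale", IPS_CONFIRMED, now - 1),
    Entry("reinitialised", 0, 30_000),  # relative timeout, not yet confirmed
]
deleted = gc_scan(entries, now)
print(deleted)
```

In this model the re-initialised entry passes the step 1 expiry check because of its relative timeout, but is rejected in step 3 because the confirmed bit is tested before anything else.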
{
  "affected": [],
  "aliases": [
    "CVE-2025-38472"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-07-28T12:15:29Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nnetfilter: nf_conntrack: fix crash due to removal of uninitialised entry\n\nA crash in conntrack was reported while trying to unlink the conntrack\nentry from the hash bucket list:\n [exception RIP: __nf_ct_delete_from_lists+172]\n [..]\n #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]\n #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]\n #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]\n [..]\n\nThe nf_conn struct is marked as allocated from slab but appears to be in\na partially initialised state:\n\n ct hlist pointer is garbage; looks like the ct hash value\n (hence crash).\n ct-\u003estatus is equal to IPS_CONFIRMED|IPS_DYING, which is expected\n ct-\u003etimeout is 30000 (=30s), which is unexpected.\n\nEverything else looks like normal udp conntrack entry. If we ignore\nct-\u003estatus and pretend its 0, the entry matches those that are newly\nallocated but not yet inserted into the hash:\n - ct hlist pointers are overloaded and store/cache the raw tuple hash\n - ct-\u003etimeout matches the relative time expected for a new udp flow\n rather than the absolute \u0027jiffies\u0027 value.\n\nIf it were not for the presence of IPS_CONFIRMED,\n__nf_conntrack_find_get() would have skipped the entry.\n\nTheory is that we did hit following race:\n\ncpu x \t\t\tcpu y\t\t\tcpu z\n found entry E\t\tfound entry E\n E is expired\t\t\u003cpreemption\u003e\n nf_ct_delete()\n return E to rcu slab\n\t\t\t\t\tinit_conntrack\n\t\t\t\t\tE is re-inited,\n\t\t\t\t\tct-\u003estatus set to 0\n\t\t\t\t\treply tuplehash hnnode.pprev\n\t\t\t\t\tstores hash value.\n\ncpu y found E right before it was deleted on cpu x.\nE is now re-inited on cpu z. cpu y was preempted before\nchecking for expiry and/or confirm bit.\n\n\t\t\t\t\t-\u003erefcnt set to 1\n\t\t\t\t\tE now owned by skb\n\t\t\t\t\t-\u003etimeout set to 30000\n\nIf cpu y were to resume now, it would observe E as\nexpired but would skip E due to missing CONFIRMED bit.\n\n\t\t\t\t\tnf_conntrack_confirm gets called\n\t\t\t\t\tsets: ct-\u003estatus |= CONFIRMED\n\t\t\t\t\tThis is wrong: E is not yet added\n\t\t\t\t\tto hashtable.\n\ncpu y resumes, it observes E as expired but CONFIRMED:\n\t\t\t\u003cresumes\u003e\n\t\t\tnf_ct_expired()\n\t\t\t -\u003e yes (ct-\u003etimeout is 30s)\n\t\t\tconfirmed bit set.\n\ncpu y will try to delete E from the hashtable:\n\t\t\tnf_ct_delete() -\u003e set DYING bit\n\t\t\t__nf_ct_delete_from_lists\n\nEven this scenario doesn\u0027t guarantee a crash:\ncpu z still holds the table bucket lock(s) so y blocks:\n\n\t\t\twait for spinlock held by z\n\n\t\t\t\t\tCONFIRMED is set but there is no\n\t\t\t\t\tguarantee ct will be added to hash:\n\t\t\t\t\t\"chaintoolong\" or \"clash resolution\"\n\t\t\t\t\tlogic both skip the insert step.\n\t\t\t\t\treply hnnode.pprev still stores the\n\t\t\t\t\thash value.\n\n\t\t\t\t\tunlocks spinlock\n\t\t\t\t\treturn NF_DROP\n\t\t\t\u003cunblocks, then\n\t\t\t crashes on hlist_nulls_del_rcu pprev\u003e\n\nIn case CPU z does insert the entry into the hashtable, cpu y will unlink\nE again right away but no crash occurs.\n\nWithout \u0027cpu y\u0027 race, \u0027garbage\u0027 hlist is of no consequence:\nct refcnt remains at 1, eventually skb will be free\u0027d and E gets\ndestroyed via: nf_conntrack_put -\u003e nf_conntrack_destroy -\u003e nf_ct_destroy.\n\nTo resolve this, move the IPS_CONFIRMED assignment after the table\ninsertion but before the unlock.\n\nPablo points out that the confirm-bit-store could be reordered to happen\nbefore hlist add resp. the timeout fixup, so switch to set_bit and\nbefore_atomic memory barrier to prevent this.\n\nIt doesn\u0027t matter if other CPUs can observe a newly inserted entry right\nbefore the CONFIRMED bit was set:\n\nSuch event cannot be distinguished from above \"E is the old incarnation\"\ncase: the entry will be skipped.\n\nAlso change nf_ct_should_gc() to first check the confirmed bit.\n\nThe gc sequence is:\n 1. Check if entry has expired, if not skip to next entry\n 2. Obtain a reference to the expired entry.\n 3. Call nf_ct_should_gc() to double-check step 1.\n\nnf_ct_should_gc() is thus called only for entries that already failed an\nexpiry check. After this patch, once the confirmed bit check pas\n---truncated---",
  "id": "GHSA-979q-6jjc-29jh",
  "modified": "2025-07-28T12:30:35Z",
  "published": "2025-07-28T12:30:34Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-38472"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/2d72afb340657f03f7261e9243b44457a9228ac7"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/76179961c423cd698080b5e4d5583cf7f4fcdde9"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/938ce0e8422d3793fe30df2ed0e37f6bc0598379"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/a47ef874189d47f934d0809ae738886307c0ea22"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/fc38c249c622ff5e3011b8845fd49dbfd9289afc"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}