fkie_cve-2025-38472
Vulnerability from fkie_nvd
Published: 2025-07-28 12:15
Modified: 2025-07-29 14:14
Severity: not yet scored
Summary
In the Linux kernel, the following vulnerability has been resolved:

netfilter: nf_conntrack: fix crash due to removal of uninitialised entry

A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list:

    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state:

  - the ct hlist pointer is garbage; it looks like the ct hash value (hence the crash);
  - ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected;
  - ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like a normal UDP conntrack entry. If we ignore ct->status and pretend it is 0, the entry matches those that are newly allocated but not yet inserted into the hash:
  - ct hlist pointers are overloaded and store/cache the raw tuple hash;
  - ct->timeout matches the relative time expected for a new UDP flow rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry.

The theory is that we hit the following race:

cpu x                   cpu y                   cpu z
 found entry E           found entry E
 E is expired            <preemption>
 nf_ct_delete()
 return E to rcu slab
                                                 init_conntrack
                                                 E is re-inited,
                                                 ct->status set to 0
                                                 reply tuplehash hnnode.pprev
                                                 stores hash value.

cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or the confirm bit.

                                                 ->refcnt set to 1
                                                 E now owned by skb
                                                 ->timeout set to 30000

If cpu y were to resume now, it would observe E as expired but would skip E due to the missing CONFIRMED bit.

                                                 nf_conntrack_confirm gets called
                                                 sets: ct->status |= CONFIRMED
                                                 This is wrong: E is not yet added
                                                 to hashtable.

cpu y resumes and observes E as expired but CONFIRMED:
                         <resumes>
                         nf_ct_expired()
                          -> yes (ct->timeout is 30s)
                         confirmed bit set.

cpu y will try to delete E from the hashtable:
                         nf_ct_delete() -> set DYING bit
                         __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s), so y blocks:

                         wait for spinlock held by z

                                                 CONFIRMED is set but there is no
                                                 guarantee ct will be added to hash:
                                                 "chaintoolong" or "clash resolution"
                                                 logic both skip the insert step.
                                                 reply hnnode.pprev still stores the
                                                 hash value.

                                                 unlocks spinlock
                                                 return NF_DROP
                         <unblocks, then
                          crashes on hlist_nulls_del_rcu pprev>

If cpu z does insert the entry into the hashtable, cpu y will unlink E again right away, but no crash occurs.

Without the 'cpu y' race, the 'garbage' hlist is of no consequence: the ct refcnt remains at 1, the skb is eventually freed, and E gets destroyed via nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock.

Pablo points out that the confirm-bit store could be reordered to happen before the hlist add or the timeout fixup, so switch to set_bit and a before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: such an event cannot be distinguished from the above "E is the old incarnation" case, so the entry will be skipped.

Also change nf_ct_should_gc() to check the confirmed bit first.

The gc sequence is:
 1. Check if the entry has expired; if not, skip to the next entry.
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check pas ---truncated---
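The ordering fix can be illustrated with a small single-threaded userspace model. This is a sketch under stated assumptions, not the kernel patch: struct toy_conn, the TOY_* constants and the toy_* functions are hypothetical names; a C11 seq_cst fence followed by a relaxed fetch-or stands in for smp_mb__before_atomic() plus set_bit(); and the bucket lock, RCU and the real hash table are not modelled. It contrasts the old ordering (CONFIRMED set before the insert outcome is known) with the fixed ordering (CONFIRMED set only after a successful insert), and a gc double-check that tests the confirmed bit first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy status bits standing in for IPS_CONFIRMED and IPS_DYING. */
enum { TOY_CONFIRMED = 1u << 0, TOY_DYING = 1u << 1 };

/* Hypothetical insert outcomes; CLASH and CHAIN_TOO_LONG model the
 * "clash resolution" and "chaintoolong" paths that skip the insert. */
enum toy_insert { TOY_INSERTED, TOY_CLASH, TOY_CHAIN_TOO_LONG };

struct toy_conn {
    atomic_uint status;    /* like ct->status */
    unsigned long timeout; /* relative until confirmed, then absolute */
    bool in_table;         /* stands in for "linked into a hash bucket" */
};

/* Models init_conntrack(): status cleared, timeout still relative. */
static void toy_init(struct toy_conn *ct, unsigned long rel_timeout)
{
    atomic_init(&ct->status, 0u);
    ct->timeout = rel_timeout;
    ct->in_table = false;
}

/* Pre-fix ordering: CONFIRMED is set before we know the insert happens. */
static bool toy_confirm_old(struct toy_conn *ct, enum toy_insert r,
                            unsigned long now)
{
    atomic_fetch_or_explicit(&ct->status, TOY_CONFIRMED, memory_order_relaxed);
    if (r != TOY_INSERTED)
        return false; /* bit stays set on an entry that is not in the table */
    ct->timeout += now;
    ct->in_table = true;
    return true;
}

/* Post-fix ordering: timeout fixup and insert first, then a full fence
 * (cf. smp_mb__before_atomic()) and the bit set (cf. set_bit()), all
 * while the (not modelled) bucket lock would still be held. */
static bool toy_confirm(struct toy_conn *ct, enum toy_insert r,
                        unsigned long now)
{
    if (r != TOY_INSERTED)
        return false;    /* never published, so CONFIRMED stays clear */
    ct->timeout += now;  /* relative -> absolute */
    ct->in_table = true; /* hlist add */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_or_explicit(&ct->status, TOY_CONFIRMED, memory_order_relaxed);
    return true;
}

/* GC double-check, confirmed bit first. The acquire load pairs with the
 * fence above, so a reader that sees CONFIRMED also sees the absolute
 * timeout. Wraparound handling (done in the kernel) is ignored here. */
static bool toy_should_gc(struct toy_conn *ct, unsigned long now)
{
    unsigned int s = atomic_load_explicit(&ct->status, memory_order_acquire);

    if (!(s & TOY_CONFIRMED) || (s & TOY_DYING))
        return false;
    return ct->timeout <= now;
}

/* Returns true if the entry was unlinked. Unlinking an entry that was
 * never inserted corresponds to the reported crash; here it trips an assert. */
static bool toy_gc_one(struct toy_conn *ct, unsigned long now)
{
    if (!toy_should_gc(ct, now))
        return false;
    atomic_fetch_or_explicit(&ct->status, TOY_DYING, memory_order_relaxed);
    assert(ct->in_table && "unlinking an entry that was never inserted");
    ct->in_table = false;
    return true;
}
```

With a re-initialised entry whose relative timeout is 30000 and a current time of 100000, toy_should_gc() returns false under the fixed ordering even after a clash, because CONFIRMED was never set. Under the old ordering the same clash leaves a confirmed, expired-looking entry that is not in the table; calling toy_gc_one() on it would trip the assert, this model's stand-in for the hlist_nulls_del_rcu() crash.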



{
  "cveTags": [],
  "descriptions": [
    {
      "lang": "en",
      "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nnetfilter: nf_conntrack: fix crash due to removal of uninitialised entry\n\nA crash in conntrack was reported while trying to unlink the conntrack\nentry from the hash bucket list:\n    [exception RIP: __nf_ct_delete_from_lists+172]\n    [..]\n #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]\n #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]\n #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]\n    [..]\n\nThe nf_conn struct is marked as allocated from slab but appears to be in\na partially initialised state:\n\n ct hlist pointer is garbage; looks like the ct hash value\n (hence crash).\n ct-\u003estatus is equal to IPS_CONFIRMED|IPS_DYING, which is expected\n ct-\u003etimeout is 30000 (=30s), which is unexpected.\n\nEverything else looks like normal udp conntrack entry.  If we ignore\nct-\u003estatus and pretend its 0, the entry matches those that are newly\nallocated but not yet inserted into the hash:\n  - ct hlist pointers are overloaded and store/cache the raw tuple hash\n  - ct-\u003etimeout matches the relative time expected for a new udp flow\n    rather than the absolute \u0027jiffies\u0027 value.\n\nIf it were not for the presence of IPS_CONFIRMED,\n__nf_conntrack_find_get() would have skipped the entry.\n\nTheory is that we did hit following race:\n\ncpu x \t\t\tcpu y\t\t\tcpu z\n found entry E\t\tfound entry E\n E is expired\t\t\u003cpreemption\u003e\n nf_ct_delete()\n return E to rcu slab\n\t\t\t\t\tinit_conntrack\n\t\t\t\t\tE is re-inited,\n\t\t\t\t\tct-\u003estatus set to 0\n\t\t\t\t\treply tuplehash hnnode.pprev\n\t\t\t\t\tstores hash value.\n\ncpu y found E right before it was deleted on cpu x.\nE is now re-inited on cpu z.  
cpu y was preempted before\nchecking for expiry and/or confirm bit.\n\n\t\t\t\t\t-\u003erefcnt set to 1\n\t\t\t\t\tE now owned by skb\n\t\t\t\t\t-\u003etimeout set to 30000\n\nIf cpu y were to resume now, it would observe E as\nexpired but would skip E due to missing CONFIRMED bit.\n\n\t\t\t\t\tnf_conntrack_confirm gets called\n\t\t\t\t\tsets: ct-\u003estatus |= CONFIRMED\n\t\t\t\t\tThis is wrong: E is not yet added\n\t\t\t\t\tto hashtable.\n\ncpu y resumes, it observes E as expired but CONFIRMED:\n\t\t\t\u003cresumes\u003e\n\t\t\tnf_ct_expired()\n\t\t\t -\u003e yes (ct-\u003etimeout is 30s)\n\t\t\tconfirmed bit set.\n\ncpu y will try to delete E from the hashtable:\n\t\t\tnf_ct_delete() -\u003e set DYING bit\n\t\t\t__nf_ct_delete_from_lists\n\nEven this scenario doesn\u0027t guarantee a crash:\ncpu z still holds the table bucket lock(s) so y blocks:\n\n\t\t\twait for spinlock held by z\n\n\t\t\t\t\tCONFIRMED is set but there is no\n\t\t\t\t\tguarantee ct will be added to hash:\n\t\t\t\t\t\"chaintoolong\" or \"clash resolution\"\n\t\t\t\t\tlogic both skip the insert step.\n\t\t\t\t\treply hnnode.pprev still stores the\n\t\t\t\t\thash value.\n\n\t\t\t\t\tunlocks spinlock\n\t\t\t\t\treturn NF_DROP\n\t\t\t\u003cunblocks, then\n\t\t\t crashes on hlist_nulls_del_rcu pprev\u003e\n\nIn case CPU z does insert the entry into the hashtable, cpu y will unlink\nE again right away but no crash occurs.\n\nWithout \u0027cpu y\u0027 race, \u0027garbage\u0027 hlist is of no consequence:\nct refcnt remains at 1, eventually skb will be free\u0027d and E gets\ndestroyed via: nf_conntrack_put -\u003e nf_conntrack_destroy -\u003e nf_ct_destroy.\n\nTo resolve this, move the IPS_CONFIRMED assignment after the table\ninsertion but before the unlock.\n\nPablo points out that the confirm-bit-store could be reordered to happen\nbefore hlist add resp. 
the timeout fixup, so switch to set_bit and\nbefore_atomic memory barrier to prevent this.\n\nIt doesn\u0027t matter if other CPUs can observe a newly inserted entry right\nbefore the CONFIRMED bit was set:\n\nSuch event cannot be distinguished from above \"E is the old incarnation\"\ncase: the entry will be skipped.\n\nAlso change nf_ct_should_gc() to first check the confirmed bit.\n\nThe gc sequence is:\n 1. Check if entry has expired, if not skip to next entry\n 2. Obtain a reference to the expired entry.\n 3. Call nf_ct_should_gc() to double-check step 1.\n\nnf_ct_should_gc() is thus called only for entries that already failed an\nexpiry check. After this patch, once the confirmed bit check pas\n---truncated---"
    },
    {
      "lang": "es",
      "value": "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: netfilter: nf_conntrack: correcci\u00f3n de fallo debido a la eliminaci\u00f3n de una entrada no inicializada Se inform\u00f3 de un fallo en conntrack al intentar desvincular la entrada de conntrack de la lista de cubos hash: [exception RIP: __nf_ct_delete_from_lists+172] [..] #7 [ff539b5a2b043aa0] nf_ct_delete en ffffffffc124d421 [nf_conntrack] #8 [ff539b5a2b043ad0] nf_ct_gc_expired en ffffffffc124d999 [nf_conntrack] #9 [ff539b5a2b043ae0] __nf_conntrack_find_get en ffffffffc124efbc [nf_conntrack] [..] La estructura nf_conn est\u00e1 marcada como asignada desde slab, pero parece estar parcialmente inicializada: el puntero hlist ct es basura; parece el valor hash ct (de ah\u00ed el fallo). ct-\u0026gt;status es igual a IPS_CONFIRMED|IPS_DYING, que es lo esperado ct-\u0026gt;timeout es 30000 (=30 s), lo cual es inesperado. Todo lo dem\u00e1s parece una entrada conntrack udp normal. Si ignoramos ct-\u0026gt;status y suponemos que es 0, la entrada coincide con las que se acaban de asignar pero que a\u00fan no se han insertado en el hash: - los punteros hlist ct est\u00e1n sobrecargados y almacenan/cachean el hash de la tupla sin procesar - ct-\u0026gt;timeout coincide con el tiempo relativo esperado para un nuevo flujo udp en lugar del valor absoluto de \u0027jiffies\u0027. Si no fuera por la presencia de IPS_CONFIRMED, __nf_conntrack_find_get() habr\u00eda omitido la entrada. La teor\u00eda es que alcanzamos la siguiente ejecuci\u00f3n: cpu x cpu y cpu z encontr\u00f3 la entrada E encontr\u00f3 la entrada EE est\u00e1 vencida  nf_ct_delete() devuelve E a rcu slab init_conntrack E se reinicia, ct-\u0026gt;status establecido en 0 respuesta tuplehash hnnode.pprev almacena el valor hash. cpu y encontr\u00f3 E justo antes de que se eliminara en la cpu x. E ahora se reinicia en la cpu z. La cpu y fue interrumpida antes de verificar la expiraci\u00f3n y/o el bit de confirmaci\u00f3n. 
-\u0026gt;refcnt establecido en 1 E ahora es propiedad de skb -\u0026gt;timeout establecido en 30000 Si la cpu y se reanudara ahora, observar\u00eda que E ha expirado, pero omitir\u00eda E debido a que falta el bit CONFIRMED. nf_conntrack_confirm se llama establece: ct-\u0026gt;status |= CONFIRMED Esto es incorrecto: E a\u00fan no se agreg\u00f3 a la tabla hash. La CPU y se reanuda, observa que E ha expirado pero CONFIRMADO:  nf_ct_expired() -\u0026gt; s\u00ed (ct-\u0026gt;el tiempo de espera es de 30 s) bit confirmado establecido. La CPU y intentar\u00e1 eliminar E de la tabla hash: nf_ct_delete() -\u0026gt; establecer bit MORIR __nf_ct_delete_from_lists Incluso este escenario no garantiza un fallo: la CPU z a\u00fan mantiene el/los bloqueo(s) del dep\u00f3sito de la tabla, por lo que y bloquea: esperar a que z mantenga el bloqueo de giro CONFIRMADO est\u00e1 establecido, pero no hay garant\u00eda de que ct se agregue al hash: la l\u00f3gica \"chaintoolong\" o \"clash resolution\" omiten el paso de inserci\u00f3n. responder hnnode.pprev a\u00fan almacena el valor del hash. desbloquea el bloqueo de giro devolver NF_DROP  En caso de que la CPU z inserte la entrada en la tabla hash, la CPU y desvincular\u00e1 E nuevamente de inmediato, pero no ocurre ning\u00fan fallo. Sin la ejecuci\u00f3n de la CPU y, la lista de memoria basura no tiene importancia: ct refcnt permanece en 1, skb se liberar\u00e1 y E se destruir\u00e1 mediante nf_conntrack_put -\u0026gt; nf_conntrack_destroy -\u0026gt; nf_ct_destroy. Para resolver esto, mueva la asignaci\u00f3n IPS_CONFIRMED despu\u00e9s de la inserci\u00f3n de la tabla, pero antes del desbloqueo. Pablo se\u00f1ala que el almacenamiento de bits de confirmaci\u00f3n podr\u00eda reordenarse para que ocurra antes de la adici\u00f3n de la lista de memoria o de la correcci\u00f3n del tiempo de espera, por lo que se debe cambiar a set_bit y a la barrera de memoria before_atomic para evitarlo. 
No importa si otras CPU pueden observar una entrada reci\u00e9n insertada justo antes de que se establezca el bit CONFIRMED: este evento no se distingue del caso anterior, \"E es la encarnaci\u00f3n anterior\": la entrada se omitir\u00e1. Tambi\u00e9n modifique nf_ct_should_gc() para que primero verifique el bit confirmado. La secuencia de gc es: 1. Verificar si la entrada ---truncado---"
    }
  ],
  "id": "CVE-2025-38472",
  "lastModified": "2025-07-29T14:14:29.590",
  "metrics": {},
  "published": "2025-07-28T12:15:29.003",
  "references": [
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/2d72afb340657f03f7261e9243b44457a9228ac7"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/76179961c423cd698080b5e4d5583cf7f4fcdde9"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/938ce0e8422d3793fe30df2ed0e37f6bc0598379"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/a47ef874189d47f934d0809ae738886307c0ea22"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/fc38c249c622ff5e3011b8845fd49dbfd9289afc"
    }
  ],
  "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
  "vulnStatus": "Awaiting Analysis"
}


