CVE-2025-38242 (GCVE-0-2025-38242)
Vulnerability from cvelistv5
Published: 2025-07-09 10:42
Modified: 2025-07-28 04:15
Summary
In the Linux kernel, the following vulnerability has been resolved:
mm: userfaultfd: fix race of userfaultfd_move and swap cache
This commit fixes two kinds of races, which may have different results:
Barry reported a BUG_ON in commit c50f8e6053b0. The same BUG_ON may be
seen if the filemap lookup returned NULL and a folio is added to the swap
cache after that.
If the other kind of race is triggered (the folio changed after the lookup),
the RSS counters may be corrupted:
[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_ANONPAGES val:-1
[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_SHMEMPAGES val:1
This happens because the folio is being accounted to the wrong VMA.
I'm not sure if there will be any data corruption, though it seems not.
The issues above are critical already.
On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
lookup, and tries to move the found folio to the faulting vma. Currently,
it relies on checking the PTE value to ensure that the moved folio still
belongs to the src swap entry and that no new folio has been added to the
swap cache, which turns out to be unreliable.
While working on and reviewing the swap table series with Barry, the
following existing races were observed and reproduced [1]:
In the example below, move_pages_pte is moving src_pte to dst_pte, where
src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the
swap cache:
CPU1                                            CPU2
userfaultfd_move
  move_pages_pte()
    entry = pte_to_swp_entry(orig_src_pte);
    // Here it got entry = S1
    ... < interrupted> ...
                                                <swapin src_pte, alloc and use folio A>
                                                // folio A is a new allocated folio
                                                // and get installed into src_pte
                                                <frees swap entry S1>
                                                // src_pte now points to folio A, S1
                                                // has swap count == 0, it can be freed
                                                // by folio_swap_swap or swap
                                                // allocator's reclaim.
                                                <try to swap out another folio B>
                                                // folio B is a folio in another VMA.
                                                <put folio B to swap cache using S1 >
                                                // S1 is freed, folio B can use it
                                                // for swap out with no problem.
                                                ...
    folio = filemap_get_folio(S1)
    // Got folio B here !!!
    ... < interrupted again> ...
                                                <swapin folio B and free S1>
                                                // Now S1 is free to be used again.
                                                <swapout src_pte & folio A using S1>
                                                // Now src_pte is a swap entry PTE
                                                // holding S1 again.
    folio_trylock(folio)
    move_swap_pte
      double_pt_lock
      is_pte_pages_stable
      // Check passed because src_pte == S1
      folio_move_anon_rmap(...)
      // Moved invalid folio B here !!!
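The interleaving above can be replayed deterministically in a toy model. The following is only an illustrative sketch, not kernel code: `ToyMM`, `Folio`, `swap_out`, `swap_in` and `replay_first_race` are hypothetical names, the swap cache is modeled as a plain dictionary, and the two CPUs are serialized by hand in the order the trace describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Folio:
    name: str
    swap_entry: Optional[str] = None  # swap cache slot this folio occupies, if any


class ToyMM:
    """Hypothetical stand-in for the state the race touches (not a kernel API)."""

    def __init__(self):
        self.swap_cache = {}   # swap entry -> Folio
        self.src_pte = "S1"    # a swap entry name, or a Folio when mapped

    def swap_out(self, folio, entry):
        # Put the folio into the swap cache under the given entry.
        self.swap_cache[entry] = folio
        folio.swap_entry = entry

    def swap_in(self, entry):
        # Take the folio out of the swap cache, which frees the entry.
        folio = self.swap_cache.pop(entry)
        folio.swap_entry = None
        return folio


def replay_first_race():
    mm = ToyMM()
    folio_a, folio_b = Folio("A"), Folio("B")

    entry = mm.src_pte                 # CPU1: entry = S1 (S1 not in swap cache)

    mm.src_pte = folio_a               # CPU2: swapin src_pte with new folio A, S1 freed
    mm.swap_out(folio_b, "S1")         # CPU2: unrelated folio B reuses S1

    folio = mm.swap_cache.get(entry)   # CPU1: lockless lookup returns folio B

    mm.swap_in("S1")                   # CPU2: swapin folio B, S1 is free again
    mm.swap_out(folio_a, "S1")         # CPU2: swapout folio A using S1
    mm.src_pte = "S1"                  # src_pte is a swap entry PTE holding S1 again

    pte_check_passed = mm.src_pte == entry   # CPU1: PTE-only stability check
    return pte_check_passed, folio.name, mm.swap_cache[entry].name


if __name__ == "__main__":
    print(replay_first_race())
```

In this model the call returns `(True, "B", "A")`: the PTE comparison passes, CPU1 is still holding folio B, and the folio that actually sits in the swap cache for S1 is folio A.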
The race window is very short and requires multiple collisions of multiple
rare events, so it's very unlikely to happen, but with a deliberately
constructed reproducer and increased time window, it can be reproduced
easily.
This can be fixed by checking if the folio returned by filemap is the
valid swap cache folio after acquiring the folio lock.
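A minimal sketch of that extra check, again as a toy model rather than the actual patch: it assumes that "valid swap cache folio" means the folio is still in the swap cache and still records the same swap entry that was read from the source PTE. The names `Folio`, `still_valid_for` and `try_move` and the returned strings are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Folio:
    name: str
    in_swap_cache: bool
    swap_entry: Optional[str]  # entry recorded on the folio while it is in the swap cache


def still_valid_for(folio, entry):
    # Assumed meaning of "valid swap cache folio": still cached, same entry.
    return folio.in_swap_cache and folio.swap_entry == entry


def try_move(folio, entry, current_src_pte):
    # Caller is assumed to hold the folio lock and the page table locks here.
    if current_src_pte != entry:
        return "retry: pte changed"
    if not still_valid_for(folio, entry):
        return "retry: stale folio"
    return "moved"
```

With the state at the end of the first race, folio B has already left the swap cache, so `try_move(Folio("B", False, None), "S1", "S1")` yields a retry instead of a move, while `try_move(Folio("A", True, "S1"), "S1", "S1")` proceeds.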
Another similar race is possible: filemap_get_folio may return NULL, but
folio (A) could be swapped in and then swapped out again using the same
swap entry after the lookup. In such a case, folio (A) may remain in the
swap cache, so it must be moved too:
CPU1                                            CPU2
userfaultfd_move
  move_pages_pte()
    entry = pte_to_swp_entry(orig_src_pte);
    // Here it got entry = S1, and S1 is not in swap cache
    folio = filemap_get
---truncated---
References
Impacted products
{ "containers": { "cna": { "affected": [ { "defaultStatus": "unaffected", "product": "Linux", "programFiles": [ "mm/userfaultfd.c" ], "repo": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git", "vendor": "Linux", "versions": [ { "lessThan": "4c443046d8c9ed8724a4f4c3c2457d3ac8814b2f", "status": "affected", "version": "adef440691bab824e39c1b17382322d195e1fab0", "versionType": "git" }, { "lessThan": "db2ca8074955ca64187a4fb596dd290b9c446cd3", "status": "affected", "version": "adef440691bab824e39c1b17382322d195e1fab0", "versionType": "git" }, { "lessThan": "0ea148a799198518d8ebab63ddd0bb6114a103bc", "status": "affected", "version": "adef440691bab824e39c1b17382322d195e1fab0", "versionType": "git" } ] }, { "defaultStatus": "affected", "product": "Linux", "programFiles": [ "mm/userfaultfd.c" ], "repo": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git", "vendor": "Linux", "versions": [ { "status": "affected", "version": "6.8" }, { "lessThan": "6.8", "status": "unaffected", "version": "0", "versionType": "semver" }, { "lessThanOrEqual": "6.12.*", "status": "unaffected", "version": "6.12.37", "versionType": "semver" }, { "lessThanOrEqual": "6.15.*", "status": "unaffected", "version": "6.15.5", "versionType": "semver" }, { "lessThanOrEqual": "*", "status": "unaffected", "version": "6.16", "versionType": "original_commit_for_fix" } ] } ], "cpeApplicability": [ { "nodes": [ { "cpeMatch": [ { "criteria": "cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*", "versionEndExcluding": "6.12.37", "versionStartIncluding": "6.8", "vulnerable": true }, { "criteria": "cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*", "versionEndExcluding": "6.15.5", "versionStartIncluding": "6.8", "vulnerable": true }, { "criteria": "cpe:2.3:o:linux:linux_kernel:*:*:*:*:*:*:*:*", "versionEndExcluding": "6.16", "versionStartIncluding": "6.8", "vulnerable": true } ], "negate": false, "operator": "OR" } ] } ], "descriptions": [ { "lang": "en", "value": "In the Linux kernel, the 
following vulnerability has been resolved:\n\nmm: userfaultfd: fix race of userfaultfd_move and swap cache\n\nThis commit fixes two kinds of races, they may have different results:\n\nBarry reported a BUG_ON in commit c50f8e6053b0, we may see the same\nBUG_ON if the filemap lookup returned NULL and folio is added to swap\ncache after that.\n\nIf another kind of race is triggered (folio changed after lookup) we\nmay see RSS counter is corrupted:\n\n[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0\ntype:MM_ANONPAGES val:-1\n[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0\ntype:MM_SHMEMPAGES val:1\n\nBecause the folio is being accounted to the wrong VMA.\n\nI\u0027m not sure if there will be any data corruption though, seems no. \nThe issues above are critical already.\n\n\nOn seeing a swap entry PTE, userfaultfd_move does a lockless swap cache\nlookup, and tries to move the found folio to the faulting vma. Currently,\nit relies on checking the PTE value to ensure that the moved folio still\nbelongs to the src swap entry and that no new folio has been added to the\nswap cache, which turns out to be unreliable.\n\nWhile working and reviewing the swap table series with Barry, following\nexisting races are observed and reproduced [1]:\n\nIn the example below, move_pages_pte is moving src_pte to dst_pte, where\nsrc_pte is a swap entry PTE holding swap entry S1, and S1 is not in the\nswap cache:\n\nCPU1 CPU2\nuserfaultfd_move\n move_pages_pte()\n entry = pte_to_swp_entry(orig_src_pte);\n // Here it got entry = S1\n ... 
\u003c interrupted\u003e ...\n \u003cswapin src_pte, alloc and use folio A\u003e\n // folio A is a new allocated folio\n // and get installed into src_pte\n \u003cfrees swap entry S1\u003e\n // src_pte now points to folio A, S1\n // has swap count == 0, it can be freed\n // by folio_swap_swap or swap\n // allocator\u0027s reclaim.\n \u003ctry to swap out another folio B\u003e\n // folio B is a folio in another VMA.\n \u003cput folio B to swap cache using S1 \u003e\n // S1 is freed, folio B can use it\n // for swap out with no problem.\n ...\n folio = filemap_get_folio(S1)\n // Got folio B here !!!\n ... \u003c interrupted again\u003e ...\n \u003cswapin folio B and free S1\u003e\n // Now S1 is free to be used again.\n \u003cswapout src_pte \u0026 folio A using S1\u003e\n // Now src_pte is a swap entry PTE\n // holding S1 again.\n folio_trylock(folio)\n move_swap_pte\n double_pt_lock\n is_pte_pages_stable\n // Check passed because src_pte == S1\n folio_move_anon_rmap(...)\n // Moved invalid folio B here !!!\n\nThe race window is very short and requires multiple collisions of multiple\nrare events, so it\u0027s very unlikely to happen, but with a deliberately\nconstructed reproducer and increased time window, it can be reproduced\neasily.\n\nThis can be fixed by checking if the folio returned by filemap is the\nvalid swap cache folio after acquiring the folio lock.\n\nAnother similar race is possible: filemap_get_folio may return NULL, but\nfolio (A) could be swapped in and then swapped out again using the same\nswap entry after the lookup. 
In such a case, folio (A) may remain in the\nswap cache, so it must be moved too:\n\nCPU1 CPU2\nuserfaultfd_move\n move_pages_pte()\n entry = pte_to_swp_entry(orig_src_pte);\n // Here it got entry = S1, and S1 is not in swap cache\n folio = filemap_get\n---truncated---" } ], "providerMetadata": { "dateUpdated": "2025-07-28T04:15:59.615Z", "orgId": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "shortName": "Linux" }, "references": [ { "url": "https://git.kernel.org/stable/c/4c443046d8c9ed8724a4f4c3c2457d3ac8814b2f" }, { "url": "https://git.kernel.org/stable/c/db2ca8074955ca64187a4fb596dd290b9c446cd3" }, { "url": "https://git.kernel.org/stable/c/0ea148a799198518d8ebab63ddd0bb6114a103bc" } ], "title": "mm: userfaultfd: fix race of userfaultfd_move and swap cache", "x_generator": { "engine": "bippy-1.2.0" } } }, "cveMetadata": { "assignerOrgId": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "assignerShortName": "Linux", "cveId": "CVE-2025-38242", "datePublished": "2025-07-09T10:42:25.396Z", "dateReserved": "2025-04-16T04:51:23.996Z", "dateUpdated": "2025-07-28T04:15:59.615Z", "state": "PUBLISHED" }, "dataType": "CVE_RECORD", "dataVersion": "5.1", "vulnerability-lookup:meta": { "nvd": "{\"cve\":{\"id\":\"CVE-2025-38242\",\"sourceIdentifier\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\",\"published\":\"2025-07-09T11:15:26.233\",\"lastModified\":\"2025-07-10T15:15:26.957\",\"vulnStatus\":\"Awaiting Analysis\",\"cveTags\":[],\"descriptions\":[{\"lang\":\"en\",\"value\":\"In the Linux kernel, the following vulnerability has been resolved:\\n\\nmm: userfaultfd: fix race of userfaultfd_move and swap cache\\n\\nThis commit fixes two kinds of races, they may have different results:\\n\\nBarry reported a BUG_ON in commit c50f8e6053b0, we may see the same\\nBUG_ON if the filemap lookup returned NULL and folio is added to swap\\ncache after that.\\n\\nIf another kind of race is triggered (folio changed after lookup) we\\nmay see RSS counter is corrupted:\\n\\n[ 406.893936] BUG: Bad rss-counter 
state mm:ffff0000c5a9ddc0\\ntype:MM_ANONPAGES val:-1\\n[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0\\ntype:MM_SHMEMPAGES val:1\\n\\nBecause the folio is being accounted to the wrong VMA.\\n\\nI\u0027m not sure if there will be any data corruption though, seems no. \\nThe issues above are critical already.\\n\\n\\nOn seeing a swap entry PTE, userfaultfd_move does a lockless swap cache\\nlookup, and tries to move the found folio to the faulting vma. Currently,\\nit relies on checking the PTE value to ensure that the moved folio still\\nbelongs to the src swap entry and that no new folio has been added to the\\nswap cache, which turns out to be unreliable.\\n\\nWhile working and reviewing the swap table series with Barry, following\\nexisting races are observed and reproduced [1]:\\n\\nIn the example below, move_pages_pte is moving src_pte to dst_pte, where\\nsrc_pte is a swap entry PTE holding swap entry S1, and S1 is not in the\\nswap cache:\\n\\nCPU1 CPU2\\nuserfaultfd_move\\n move_pages_pte()\\n entry = pte_to_swp_entry(orig_src_pte);\\n // Here it got entry = S1\\n ... \u003c interrupted\u003e ...\\n \u003cswapin src_pte, alloc and use folio A\u003e\\n // folio A is a new allocated folio\\n // and get installed into src_pte\\n \u003cfrees swap entry S1\u003e\\n // src_pte now points to folio A, S1\\n // has swap count == 0, it can be freed\\n // by folio_swap_swap or swap\\n // allocator\u0027s reclaim.\\n \u003ctry to swap out another folio B\u003e\\n // folio B is a folio in another VMA.\\n \u003cput folio B to swap cache using S1 \u003e\\n // S1 is freed, folio B can use it\\n // for swap out with no problem.\\n ...\\n folio = filemap_get_folio(S1)\\n // Got folio B here !!!\\n ... 
\u003c interrupted again\u003e ...\\n \u003cswapin folio B and free S1\u003e\\n // Now S1 is free to be used again.\\n \u003cswapout src_pte \u0026 folio A using S1\u003e\\n // Now src_pte is a swap entry PTE\\n // holding S1 again.\\n folio_trylock(folio)\\n move_swap_pte\\n double_pt_lock\\n is_pte_pages_stable\\n // Check passed because src_pte == S1\\n folio_move_anon_rmap(...)\\n // Moved invalid folio B here !!!\\n\\nThe race window is very short and requires multiple collisions of multiple\\nrare events, so it\u0027s very unlikely to happen, but with a deliberately\\nconstructed reproducer and increased time window, it can be reproduced\\neasily.\\n\\nThis can be fixed by checking if the folio returned by filemap is the\\nvalid swap cache folio after acquiring the folio lock.\\n\\nAnother similar race is possible: filemap_get_folio may return NULL, but\\nfolio (A) could be swapped in and then swapped out again using the same\\nswap entry after the lookup. In such a case, folio (A) may remain in the\\nswap cache, so it must be moved too:\\n\\nCPU1 CPU2\\nuserfaultfd_move\\n move_pages_pte()\\n entry = pte_to_swp_entry(orig_src_pte);\\n // Here it got entry = S1, and S1 is not in swap cache\\n folio = filemap_get\\n---truncated---\"},{\"lang\":\"es\",\"value\":\"En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: mm: userfaultfd: corrige la ejecuci\u00f3n de userfaultfd_move y la cach\u00e9 de intercambio. Esta confirmaci\u00f3n corrige dos tipos de ejecuciones, pueden tener resultados diferentes: Barry inform\u00f3 un BUG_ON en el commit c50f8e6053b0, podemos ver el mismo BUG_ON si la b\u00fasqueda del mapa de archivos devolvi\u00f3 NULL y folio se agrega a la cach\u00e9 de intercambio despu\u00e9s de eso. 
Si se activa otro tipo de ejecuci\u00f3n (folio modificado tras la b\u00fasqueda), es posible que el contador RSS est\u00e9 da\u00f1ado: [406.893936] ERROR: Estado incorrecto del contador RSS mm:ffff0000c5a9ddc0 tipo:MM_ANONPAGES val:-1 [406.894071] ERROR: Estado incorrecto del contador RSS mm:ffff0000c5a9ddc0 tipo:MM_SHMEMPAGES val:1 Porque el folio se est\u00e1 contabilizando en la VMA incorrecta. No estoy seguro de si habr\u00e1 alguna corrupci\u00f3n de datos, aunque parece que no. Los problemas anteriores ya son cr\u00edticos. Al ver un PTE de entrada de intercambio, userfaultfd_move realiza una b\u00fasqueda de cach\u00e9 de intercambio sin bloqueo e intenta mover el folio encontrado a la VMA que falla. Actualmente, se basa en la comprobaci\u00f3n del valor de PTE para garantizar que el folio movido siga perteneciendo a la entrada de intercambio src y que no se haya a\u00f1adido ning\u00fan folio nuevo a la cach\u00e9 de intercambio, lo cual resulta poco fiable. Al trabajar y revisar la serie de tablas de intercambio con Barry, se observan y reproducen las siguientes ejecuciones existentes [1]: En el siguiente ejemplo, move_pages_pte mueve src_pte a dst_pte, donde src_pte es una PTE de entrada de intercambio que contiene la entrada de intercambio S1, y S1 no est\u00e1 en la cach\u00e9 de intercambio: CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Aqu\u00ed tiene entrada = S1 ... ... // folio A es un nuevo folio asignado // y se instala en src_pte // src_pte ahora apunta al folio A, S1 // tiene conteo de intercambio == 0, puede liberarse // mediante folio_swap_swap o la recuperaci\u00f3n del asignador de intercambio. // folio B es un folio en otro VMA. // S1 se libera, el folio B puede usarlo // para intercambiar sin problemas. ... folio = filemap_get_folio(S1) // \u00a1\u00a1\u00a1Tengo el folio B aqu\u00ed!!! ... ... // Ahora S1 est\u00e1 libre para volver a usarse. 
// Ahora src_pte es una entrada de intercambio PTE // que mantiene S1 de nuevo. folio_trylock(folio) move_swap_pte double_pt_lock is_pte_pages_stable // Comprobaci\u00f3n aprobada porque src_pte == S1 folio_move_anon_rmap(...) // \u00a1\u00a1\u00a1Se movi\u00f3 el folio B inv\u00e1lido aqu\u00ed!!! La ventana de ejecuci\u00f3n es muy corta y requiere m\u00faltiples colisiones de m\u00faltiples eventos raros, por lo que es muy improbable que suceda, pero con un reproductor construido deliberadamente y una ventana de tiempo mayor, se puede reproducir f\u00e1cilmente. Esto se puede arreglar comprobando si el folio devuelto por filemap es el folio de cach\u00e9 de intercambio v\u00e1lido despu\u00e9s de adquirir el bloqueo de folio. Otra ejecuci\u00f3n similar es posible: filemap_get_folio puede devolver NULL, pero el folio (A) podr\u00eda intercambiarse dentro y fuera de nuevo usando la misma entrada de intercambio despu\u00e9s de la b\u00fasqueda. En tal caso, el folio (A) puede permanecer en el cach\u00e9 de intercambio, por lo que tambi\u00e9n debe moverse: CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Aqu\u00ed obtuvo entry = S1, y S1 no est\u00e1 en el cach\u00e9 de intercambio folio = filemap_get ---truncated---\"}],\"metrics\":{},\"references\":[{\"url\":\"https://git.kernel.org/stable/c/0ea148a799198518d8ebab63ddd0bb6114a103bc\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"},{\"url\":\"https://git.kernel.org/stable/c/4c443046d8c9ed8724a4f4c3c2457d3ac8814b2f\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"},{\"url\":\"https://git.kernel.org/stable/c/db2ca8074955ca64187a4fb596dd290b9c446cd3\",\"source\":\"416baaa9-dc9f-4396-8d5f-8c081fb06d67\"}]}}" } }
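The semver ranges in the record above (affected from 6.8, fixed in 6.12.37, 6.15.5 and 6.16) can be turned into a quick check. This is a rough sketch under stated assumptions: `parse` and `is_affected` are hypothetical helpers, only plain dotted upstream version strings are handled, and distribution kernels that backport the fix under a different version number are not covered.

```python
def parse(version):
    # "6.12.37" -> (6, 12, 37); suffixes such as "-rc1" are not handled.
    return tuple(int(part) for part in version.split("."))


def is_affected(version):
    v = parse(version)
    if v < (6, 8):                                  # before the code was introduced
        return False
    if v >= (6, 16):                                # mainline release carrying the fix
        return False
    if v[:2] == (6, 12) and v >= (6, 12, 37):       # fixed in the 6.12.y stable series
        return False
    if v[:2] == (6, 15) and v >= (6, 15, 5):        # fixed in the 6.15.y stable series
        return False
    return True                                     # everything else from 6.8 onward


if __name__ == "__main__":
    for candidate in ("6.7.12", "6.8", "6.12.36", "6.12.37", "6.15.5", "6.16"):
        print(candidate, is_affected(candidate))
```

Series that fall between the listed fixes (for example 6.13.y or 6.14.y) are reported as affected here because the record's default status for this product entry is "affected".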