github - ghsa-9w23-c35q-mrgj

ghsa-9w23-c35q-mrgj

Vulnerability from github

Published

2025-07-09 12:31

Modified

2025-07-10 15:31

Details

In the Linux kernel, the following vulnerability has been resolved:

mm: userfaultfd: fix race of userfaultfd_move and swap cache

This commit fixes two kinds of races, they may have different results:

Barry reported a BUG_ON in commit c50f8e6053b0, we may see the same BUG_ON if the filemap lookup returned NULL and folio is added to swap cache after that.

If another kind of race is triggered (folio changed after lookup) we may see RSS counter is corrupted:

[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_ANONPAGES val:-1 [ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_SHMEMPAGES val:1

Because the folio is being accounted to the wrong VMA.

I'm not sure if there will be any data corruption though, seems no. The issues above are critical already.

On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache lookup, and tries to move the found folio to the faulting vma. Currently, it relies on checking the PTE value to ensure that the moved folio still belongs to the src swap entry and that no new folio has been added to the swap cache, which turns out to be unreliable.

While working and reviewing the swap table series with Barry, following existing races are observed and reproduced [1]:

In the example below, move_pages_pte is moving src_pte to dst_pte, where src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the swap cache:

CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Here it got entry = S1 ... < interrupted> ... // folio A is a new allocated folio // and get installed into src_pte // src_pte now points to folio A, S1 // has swap count == 0, it can be freed // by folio_swap_swap or swap // allocator's reclaim. // folio B is a folio in another VMA. // S1 is freed, folio B can use it // for swap out with no problem. ... folio = filemap_get_folio(S1) // Got folio B here !!! ... < interrupted again> ... // Now S1 is free to be used again. // Now src_pte is a swap entry PTE // holding S1 again. folio_trylock(folio) move_swap_pte double_pt_lock is_pte_pages_stable // Check passed because src_pte == S1 folio_move_anon_rmap(...) // Moved invalid folio B here !!!

The race window is very short and requires multiple collisions of multiple rare events, so it's very unlikely to happen, but with a deliberately constructed reproducer and increased time window, it can be reproduced easily.

This can be fixed by checking if the folio returned by filemap is the valid swap cache folio after acquiring the folio lock.

Another similar race is possible: filemap_get_folio may return NULL, but folio (A) could be swapped in and then swapped out again using the same swap entry after the lookup. In such a case, folio (A) may remain in the swap cache, so it must be moved too:

CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Here it got entry = S1, and S1 is not in swap cache folio = filemap_get ---truncated---

Show details on source website

JSON

To clipboard

{
  "affected": [],
  "aliases": [
    "CVE-2025-38242"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-07-09T11:15:26Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nmm: userfaultfd: fix race of userfaultfd_move and swap cache\n\nThis commit fixes two kinds of races, they may have different results:\n\nBarry reported a BUG_ON in commit c50f8e6053b0, we may see the same\nBUG_ON if the filemap lookup returned NULL and folio is added to swap\ncache after that.\n\nIf another kind of race is triggered (folio changed after lookup) we\nmay see RSS counter is corrupted:\n\n[  406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0\ntype:MM_ANONPAGES val:-1\n[  406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0\ntype:MM_SHMEMPAGES val:1\n\nBecause the folio is being accounted to the wrong VMA.\n\nI\u0027m not sure if there will be any data corruption though, seems no. \nThe issues above are critical already.\n\n\nOn seeing a swap entry PTE, userfaultfd_move does a lockless swap cache\nlookup, and tries to move the found folio to the faulting vma.  Currently,\nit relies on checking the PTE value to ensure that the moved folio still\nbelongs to the src swap entry and that no new folio has been added to the\nswap cache, which turns out to be unreliable.\n\nWhile working and reviewing the swap table series with Barry, following\nexisting races are observed and reproduced [1]:\n\nIn the example below, move_pages_pte is moving src_pte to dst_pte, where\nsrc_pte is a swap entry PTE holding swap entry S1, and S1 is not in the\nswap cache:\n\nCPU1                               CPU2\nuserfaultfd_move\n  move_pages_pte()\n    entry = pte_to_swp_entry(orig_src_pte);\n    // Here it got entry = S1\n    ... \u003c interrupted\u003e ...\n                                   \u003cswapin src_pte, alloc and use folio A\u003e\n                                   // folio A is a new allocated folio\n                                   // and get installed into src_pte\n                                   \u003cfrees swap entry S1\u003e\n                                   // src_pte now points to folio A, S1\n                                   // has swap count == 0, it can be freed\n                                   // by folio_swap_swap or swap\n                                   // allocator\u0027s reclaim.\n                                   \u003ctry to swap out another folio B\u003e\n                                   // folio B is a folio in another VMA.\n                                   \u003cput folio B to swap cache using S1 \u003e\n                                   // S1 is freed, folio B can use it\n                                   // for swap out with no problem.\n                                   ...\n    folio = filemap_get_folio(S1)\n    // Got folio B here !!!\n    ... \u003c interrupted again\u003e ...\n                                   \u003cswapin folio B and free S1\u003e\n                                   // Now S1 is free to be used again.\n                                   \u003cswapout src_pte \u0026 folio A using S1\u003e\n                                   // Now src_pte is a swap entry PTE\n                                   // holding S1 again.\n    folio_trylock(folio)\n    move_swap_pte\n      double_pt_lock\n      is_pte_pages_stable\n      // Check passed because src_pte == S1\n      folio_move_anon_rmap(...)\n      // Moved invalid folio B here !!!\n\nThe race window is very short and requires multiple collisions of multiple\nrare events, so it\u0027s very unlikely to happen, but with a deliberately\nconstructed reproducer and increased time window, it can be reproduced\neasily.\n\nThis can be fixed by checking if the folio returned by filemap is the\nvalid swap cache folio after acquiring the folio lock.\n\nAnother similar race is possible: filemap_get_folio may return NULL, but\nfolio (A) could be swapped in and then swapped out again using the same\nswap entry after the lookup.  In such a case, folio (A) may remain in the\nswap cache, so it must be moved too:\n\nCPU1                               CPU2\nuserfaultfd_move\n  move_pages_pte()\n    entry = pte_to_swp_entry(orig_src_pte);\n    // Here it got entry = S1, and S1 is not in swap cache\n    folio = filemap_get\n---truncated---",
  "id": "GHSA-9w23-c35q-mrgj",
  "modified": "2025-07-10T15:31:21Z",
  "published": "2025-07-09T12:31:34Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-38242"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/0ea148a799198518d8ebab63ddd0bb6114a103bc"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/4c443046d8c9ed8724a4f4c3c2457d3ac8814b2f"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/db2ca8074955ca64187a4fb596dd290b9c446cd3"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}

CVE-2025-38242 (GCVE-0-2025-38242)

Vulnerability from cvelistv5

Published

2025-07-09 10:42

Modified

2025-07-28 04:15

Severity ?

Summary

In the Linux kernel, the following vulnerability has been resolved: mm: userfaultfd: fix race of userfaultfd_move and swap cache This commit fixes two kinds of races, they may have different results: Barry reported a BUG_ON in commit c50f8e6053b0, we may see the same BUG_ON if the filemap lookup returned NULL and folio is added to swap cache after that. If another kind of race is triggered (folio changed after lookup) we may see RSS counter is corrupted: [ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_ANONPAGES val:-1 [ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0 type:MM_SHMEMPAGES val:1 Because the folio is being accounted to the wrong VMA. I'm not sure if there will be any data corruption though, seems no. The issues above are critical already. On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache lookup, and tries to move the found folio to the faulting vma. Currently, it relies on checking the PTE value to ensure that the moved folio still belongs to the src swap entry and that no new folio has been added to the swap cache, which turns out to be unreliable. While working and reviewing the swap table series with Barry, following existing races are observed and reproduced [1]: In the example below, move_pages_pte is moving src_pte to dst_pte, where src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the swap cache: CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Here it got entry = S1 ... < interrupted> ... <swapin src_pte, alloc and use folio A> // folio A is a new allocated folio // and get installed into src_pte <frees swap entry S1> // src_pte now points to folio A, S1 // has swap count == 0, it can be freed // by folio_swap_swap or swap // allocator's reclaim. <try to swap out another folio B> // folio B is a folio in another VMA. <put folio B to swap cache using S1 > // S1 is freed, folio B can use it // for swap out with no problem. ... folio = filemap_get_folio(S1) // Got folio B here !!! ... < interrupted again> ... <swapin folio B and free S1> // Now S1 is free to be used again. <swapout src_pte & folio A using S1> // Now src_pte is a swap entry PTE // holding S1 again. folio_trylock(folio) move_swap_pte double_pt_lock is_pte_pages_stable // Check passed because src_pte == S1 folio_move_anon_rmap(...) // Moved invalid folio B here !!! The race window is very short and requires multiple collisions of multiple rare events, so it's very unlikely to happen, but with a deliberately constructed reproducer and increased time window, it can be reproduced easily. This can be fixed by checking if the folio returned by filemap is the valid swap cache folio after acquiring the folio lock. Another similar race is possible: filemap_get_folio may return NULL, but folio (A) could be swapped in and then swapped out again using the same swap entry after the lookup. In such a case, folio (A) may remain in the swap cache, so it must be moved too: CPU1 CPU2 userfaultfd_move move_pages_pte() entry = pte_to_swp_entry(orig_src_pte); // Here it got entry = S1, and S1 is not in swap cache folio = filemap_get ---truncated---

References

►

URL

Tags

	https://git.kernel.org/stable/c/4c443046d8c9ed8724a4f4c3c2457d3ac8814b2f
	https://git.kernel.org/stable/c/db2ca8074955ca64187a4fb596dd290b9c446cd3
	https://git.kernel.org/stable/c/0ea148a799198518d8ebab63ddd0bb6114a103bc

Impacted products

Vendor

Product

Version

►

Linux

Version: adef440691bab824e39c1b17382322d195e1fab0
Version: adef440691bab824e39c1b17382322d195e1fab0
Version: adef440691bab824e39c1b17382322d195e1fab0

Linux

Version: 6.8

Show details on NVD website