ghsa-6rc5-mh5g-63wm
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups
In commit 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():
- if (!pmd_present(pmde))
- return SCAN_PMD_NULL;
+ if (pmd_none(pmde))
+ return SCAN_PMD_NONE;
This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include:
A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address.
In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.
However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd.
We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch.
The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit.
Covering all 4 cases for x86 (all checks done on the same pmd value):
1) pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger?
The only relevant path is:
madvise_collapse() -> collapse_pte_mapped_thp()
Where we might just incorrectly report back "success", when really
the memory isn't pmd-backed. This is fine, since split could
happen immediately after (actually) successful madvise_collapse().
So, it should be safe to just assume huge-pmd here.
2) pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated.
3) !pmd_present() && pmd_trans_huge() Not possible.
4) !pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present
I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path).
Also, add a comment above find_pmd_or_thp_or_none() ---truncated---
{ "affected": [], "aliases": [ "CVE-2023-52934" ], "database_specific": { "cwe_ids": [], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2025-03-27T17:15:43Z", "severity": null }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nmm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups\n\nIn commit 34488399fa08 (\"mm/madvise: add file and shmem support to\nMADV_COLLAPSE\") we make the following change to find_pmd_or_thp_or_none():\n\n\t- if (!pmd_present(pmde))\n\t- return SCAN_PMD_NULL;\n\t+ if (pmd_none(pmde))\n\t+ return SCAN_PMD_NONE;\n\nThis was for-use by MADV_COLLAPSE file/shmem codepaths, where\nMADV_COLLAPSE might identify a pte-mapped hugepage, only to have\nkhugepaged race-in, free the pte table, and clear the pmd. Such codepaths\ninclude:\n\nA) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER\n already in the pagecache.\nB) In retract_page_tables(), if we fail to grab mmap_lock for the target\n mm/address.\n\nIn these cases, collapse_pte_mapped_thp() really does expect a none (not\njust !present) pmd, and we want to suitably identify that case separate\nfrom the case where no pmd is found, or it\u0027s a bad-pmd (of course, many\nthings could happen once we drop mmap_lock, and the pmd could plausibly\nundergo multiple transitions due to intervening fault, split, etc). \nRegardless, the code is prepared install a huge-pmd only when the existing\npmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.\n\nHowever, the commit introduces a logical hole; namely, that we\u0027ve allowed\n!none- \u0026\u0026 !huge- \u0026\u0026 !bad-pmds to be classified as genuine\npte-table-mapping-pmds. One such example that could leak through are swap\nentries. The pmd values aren\u0027t checked again before use in\npte_offset_map_lock(), which is expecting nothing less than a genuine\npte-table-mapping-pmd.\n\nWe want to put back the !pmd_present() check (below the pmd_none() check),\nbut need to be careful to deal with subtleties in pmd transitions and\ntreatments by various arch.\n\nThe issue is that __split_huge_pmd_locked() temporarily clears the present\nbit (or otherwise marks the entry as invalid), but pmd_present() and\npmd_trans_huge() still need to return true while the pmd is in this\ntransitory state. For example, x86\u0027s pmd_present() also checks the\n_PAGE_PSE , riscv\u0027s version also checks the _PAGE_LEAF bit, and arm64 also\nchecks a PMD_PRESENT_INVALID bit.\n\nCovering all 4 cases for x86 (all checks done on the same pmd value):\n\n1) pmd_present() \u0026\u0026 pmd_trans_huge()\n All we actually know here is that the PSE bit is set. Either:\n a) We aren\u0027t racing with __split_huge_page(), and PRESENT or PROTNONE\n is set.\n =\u003e huge-pmd\n b) We are currently racing with __split_huge_page(). The danger here\n is that we proceed as-if we have a huge-pmd, but really we are\n looking at a pte-mapping-pmd. So, what is the risk of this\n danger?\n\n The only relevant path is:\n\n\tmadvise_collapse() -\u003e collapse_pte_mapped_thp()\n\n Where we might just incorrectly report back \"success\", when really\n the memory isn\u0027t pmd-backed. This is fine, since split could\n happen immediately after (actually) successful madvise_collapse().\n So, it should be safe to just assume huge-pmd here.\n\n2) pmd_present() \u0026\u0026 !pmd_trans_huge()\n Either:\n a) PSE not set and either PRESENT or PROTNONE is.\n =\u003e pte-table-mapping pmd (or PROT_NONE)\n b) devmap. This routine can be called immediately after\n unlocking/locking mmap_lock -- or called with no locks held (see\n khugepaged_scan_mm_slot()), so previous VMA checks have since been\n invalidated.\n\n3) !pmd_present() \u0026\u0026 pmd_trans_huge()\n Not possible.\n\n4) !pmd_present() \u0026\u0026 !pmd_trans_huge()\n Neither PRESENT nor PROTNONE set\n =\u003e not present\n\nI\u0027ve checked all archs that implement pmd_trans_huge() (arm64, riscv,\npowerpc, longarch, x86, mips, s390) and this logic roughly translates\n(though devmap treatment is unique to x86 and powerpc, and (3) doesn\u0027t\nnecessarily hold in general -- but that doesn\u0027t matter since\n!pmd_present() always takes failure path).\n\nAlso, add a comment above find_pmd_or_thp_or_none()\n---truncated---", "id": "GHSA-6rc5-mh5g-63wm", "modified": "2025-03-27T18:31:25Z", "published": "2025-03-27T18:31:25Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2023-52934" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/96aaaf8666010a39430cecf8a65c7ce2908a030f" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/edb5d0cf5525357652aff6eacd9850b8ced07143" } ], "schema_version": "1.4.0", "severity": [] }
Sightings
Author | Source | Type | Date |
---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.