ghsa-6rc5-mh5g-63wm
Vulnerability from github
Published
2025-03-27 18:31
Modified
2025-03-27 18:31
Details

In the Linux kernel, the following vulnerability has been resolved:

mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups

In commit 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():

-       if (!pmd_present(pmde))
-               return SCAN_PMD_NULL;
+       if (pmd_none(pmde))
+               return SCAN_PMD_NONE;

This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include:

A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address.

In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.

However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd.

We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch.

The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit.

Covering all 4 cases for x86 (all checks done on the same pmd value):

1) pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger?

  The only relevant path is:

madvise_collapse() -> collapse_pte_mapped_thp()

  Where we might just incorrectly report back "success", when really
  the memory isn't pmd-backed.  This is fine, since split could
  happen immediately after (actually) successful madvise_collapse().
  So, it should be safe to just assume huge-pmd here.

2) pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated.

3) !pmd_present() && pmd_trans_huge() Not possible.

4) !pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present

I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path).

Also, add a comment above find_pmd_or_thp_or_none() ---truncated---

Show details on source website


{
  "affected": [],
  "aliases": [
    "CVE-2023-52934"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-03-27T17:15:43Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nmm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups\n\nIn commit 34488399fa08 (\"mm/madvise: add file and shmem support to\nMADV_COLLAPSE\") we make the following change to find_pmd_or_thp_or_none():\n\n\t-       if (!pmd_present(pmde))\n\t-               return SCAN_PMD_NULL;\n\t+       if (pmd_none(pmde))\n\t+               return SCAN_PMD_NONE;\n\nThis was for-use by MADV_COLLAPSE file/shmem codepaths, where\nMADV_COLLAPSE might identify a pte-mapped hugepage, only to have\nkhugepaged race-in, free the pte table, and clear the pmd.  Such codepaths\ninclude:\n\nA) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER\n   already in the pagecache.\nB) In retract_page_tables(), if we fail to grab mmap_lock for the target\n   mm/address.\n\nIn these cases, collapse_pte_mapped_thp() really does expect a none (not\njust !present) pmd, and we want to suitably identify that case separate\nfrom the case where no pmd is found, or it\u0027s a bad-pmd (of course, many\nthings could happen once we drop mmap_lock, and the pmd could plausibly\nundergo multiple transitions due to intervening fault, split, etc). \nRegardless, the code is prepared install a huge-pmd only when the existing\npmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.\n\nHowever, the commit introduces a logical hole; namely, that we\u0027ve allowed\n!none- \u0026\u0026 !huge- \u0026\u0026 !bad-pmds to be classified as genuine\npte-table-mapping-pmds.  One such example that could leak through are swap\nentries.  The pmd values aren\u0027t checked again before use in\npte_offset_map_lock(), which is expecting nothing less than a genuine\npte-table-mapping-pmd.\n\nWe want to put back the !pmd_present() check (below the pmd_none() check),\nbut need to be careful to deal with subtleties in pmd transitions and\ntreatments by various arch.\n\nThe issue is that __split_huge_pmd_locked() temporarily clears the present\nbit (or otherwise marks the entry as invalid), but pmd_present() and\npmd_trans_huge() still need to return true while the pmd is in this\ntransitory state.  For example, x86\u0027s pmd_present() also checks the\n_PAGE_PSE , riscv\u0027s version also checks the _PAGE_LEAF bit, and arm64 also\nchecks a PMD_PRESENT_INVALID bit.\n\nCovering all 4 cases for x86 (all checks done on the same pmd value):\n\n1) pmd_present() \u0026\u0026 pmd_trans_huge()\n   All we actually know here is that the PSE bit is set. Either:\n   a) We aren\u0027t racing with __split_huge_page(), and PRESENT or PROTNONE\n      is set.\n      =\u003e huge-pmd\n   b) We are currently racing with __split_huge_page().  The danger here\n      is that we proceed as-if we have a huge-pmd, but really we are\n      looking at a pte-mapping-pmd.  So, what is the risk of this\n      danger?\n\n      The only relevant path is:\n\n\tmadvise_collapse() -\u003e collapse_pte_mapped_thp()\n\n      Where we might just incorrectly report back \"success\", when really\n      the memory isn\u0027t pmd-backed.  This is fine, since split could\n      happen immediately after (actually) successful madvise_collapse().\n      So, it should be safe to just assume huge-pmd here.\n\n2) pmd_present() \u0026\u0026 !pmd_trans_huge()\n   Either:\n   a) PSE not set and either PRESENT or PROTNONE is.\n      =\u003e pte-table-mapping pmd (or PROT_NONE)\n   b) devmap.  This routine can be called immediately after\n      unlocking/locking mmap_lock -- or called with no locks held (see\n      khugepaged_scan_mm_slot()), so previous VMA checks have since been\n      invalidated.\n\n3) !pmd_present() \u0026\u0026 pmd_trans_huge()\n  Not possible.\n\n4) !pmd_present() \u0026\u0026 !pmd_trans_huge()\n  Neither PRESENT nor PROTNONE set\n  =\u003e not present\n\nI\u0027ve checked all archs that implement pmd_trans_huge() (arm64, riscv,\npowerpc, longarch, x86, mips, s390) and this logic roughly translates\n(though devmap treatment is unique to x86 and powerpc, and (3) doesn\u0027t\nnecessarily hold in general -- but that doesn\u0027t matter since\n!pmd_present() always takes failure path).\n\nAlso, add a comment above find_pmd_or_thp_or_none()\n---truncated---",
  "id": "GHSA-6rc5-mh5g-63wm",
  "modified": "2025-03-27T18:31:25Z",
  "published": "2025-03-27T18:31:25Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2023-52934"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/96aaaf8666010a39430cecf8a65c7ce2908a030f"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/edb5d0cf5525357652aff6eacd9850b8ced07143"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.


Loading…

Loading…