fkie_cve-2025-22034
Vulnerability from fkie_nvd
Published: 2025-04-16 15:15
Modified: 2025-04-17 20:22
Severity ?
Summary
In the Linux kernel, the following vulnerability has been resolved:
mm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs
Patch series "mm: fixes for device-exclusive entries (hmm)", v2.
Discussing the PageTail() call in make_device_exclusive_range() with
Willy, I recently discovered [1] that device-exclusive handling does not
properly work with THP, making the hmm-tests selftests fail if THPs are
enabled on the system.
Looking into this in more detail, I found that hugetlb is not properly
fenced, and I realized that something that had been bugging me for a
while -- how device-exclusive entries interact with mapcounts -- completely
breaks migration/swapout/split/hwpoison handling of these folios while they
have device-exclusive PTEs.
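The fencing named in the patch title (reject FOLL_SPLIT_PMD with hugetlb VMAs) can be pictured with a small user-space model. This is only a sketch, not the kernel code: the flag values, the struct and the function names are hypothetical, and the choice of -EOPNOTSUPP as the error code is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real definitions live in
 * its own headers and differ from these. */
#define MODEL_FOLL_SPLIT_PMD (1u << 0)
#define MODEL_VM_HUGETLB     (1ul << 0)

/* Hypothetical stand-in for a VMA; only the flags matter here. */
struct model_vma {
	unsigned long vm_flags;
};

static bool model_is_hugetlb(const struct model_vma *vma)
{
	return (vma->vm_flags & MODEL_VM_HUGETLB) != 0;
}

/* Returns 0 if the request may proceed, or a negative errno if the
 * combination of GUP flags and VMA type is rejected up front. */
static int model_gup_vma_allowed(const struct model_vma *vma,
				 unsigned int gup_flags)
{
	/* Splitting a PMD makes no sense for hugetlb mappings, so the
	 * request is refused instead of being attempted. */
	if ((gup_flags & MODEL_FOLL_SPLIT_PMD) && model_is_hugetlb(vma))
		return -EOPNOTSUPP;
	return 0;
}
```

In this model, only the combination of a hugetlb VMA and the split flag is refused; either one alone passes the check.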
The program below can be used to allocate 1 GiB worth of pages and make
them device-exclusive on a kernel with CONFIG_TEST_HMM.
Once they are device-exclusive, these folios cannot get swapped out
(/proc/$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much
one forces memory reclaim), and when a memory block is onlined to
ZONE_MOVABLE, trying to offline it will loop forever and complain about
failed migration of a page that should be movable.
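One way to watch the RSS figure mentioned above is to read the "Rss:" line from a smaps_rollup file. The helpers below are a minimal sketch; the function names are hypothetical and the parsing assumes the usual "Rss: <value> kB" line layout.

```c
#include <stdio.h>
#include <string.h>

/* Parse one smaps_rollup line; returns the Rss value in kB, or -1 if the
 * line is not the "Rss:" line or cannot be parsed. */
static long parse_rss_kb(const char *line)
{
	long kb;

	if (strncmp(line, "Rss:", 4) != 0)
		return -1;
	if (sscanf(line + 4, "%ld", &kb) != 1)
		return -1;
	return kb;
}

/* Read Rss (kB) from a smaps_rollup-style file; returns -1 on any failure. */
static long read_rss_kb(const char *path)
{
	char line[256];
	long kb = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		kb = parse_rss_kb(line);
		if (kb >= 0)
			break;
	}
	fclose(f);
	return kb;
}
```

Calling read_rss_kb() on the reproducer's smaps_rollup file before and after forcing reclaim would show whether the value drops; per the description above, on an affected kernel it stays at about 1 GiB.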
# echo offline > /sys/devices/system/memory/memory136/state
# echo online_movable > /sys/devices/system/memory/memory136/state
# ./hmm-swap &
... wait until everything is device-exclusive
# echo offline > /sys/devices/system/memory/memory136/state
[ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x7f20671f7 pfn:0x442b6a
[ 285.196618][T14882] memcg:ffff888179298000
[ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)
[ 285.201734][T14882] raw: ...
[ 285.204464][T14882] raw: ...
[ 285.207196][T14882] page dumped because: migration failure
[ 285.209072][T14882] page_owner tracks the page as allocated
[ 285.210915][T14882] page last allocated via order 0, migratetype Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774
[ 285.216765][T14882] post_alloc_hook+0x197/0x1b0
[ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280
[ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740
[ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540
[ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340
[ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0
[ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0
[ 285.230822][T14882] handle_mm_fault+0x368/0x9f0
...
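The echo commands in the transcript above write a state string to a memory block's sysfs file. The same step could be scripted from C roughly as follows; this is a sketch with hypothetical helper names, it needs root to succeed, and block number 136 is simply the one used in the transcript.

```c
#include <stdio.h>

/* Build the sysfs state path for a memory block; returns 0 on success,
 * -1 if the buffer is too small or formatting fails. */
static int memory_block_state_path(char *buf, size_t len, unsigned int block)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/memory/memory%u/state", block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a state such as "offline" or "online_movable" to the block's
 * state file; returns 0 on success, -1 on any error. */
static int set_memory_block_state(unsigned int block, const char *state)
{
	char path[128];
	FILE *f;
	int ret;

	if (memory_block_state_path(path, sizeof(path), block) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ret = (fputs(state, f) < 0) ? -1 : 0;
	/* The kernel reports a rejected transition when the write is flushed. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

For example, set_memory_block_state(136, "offline") corresponds to the first command in the transcript.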
This series fixes all issues I found so far. There is no easy way to fix
them without a bigger rework/cleanup. I have a bunch of cleanups on top (some
previously sent, some the result of the discussion in v1) that I will send
out separately once this has landed and I get to it.
I wish we could just use some special present PROT_NONE PTEs instead of
these (non-present, non-none) fake-swap entries; but that just results in
the same problem we keep having (lack of spare PTE bits), and staring at
other similar fake-swap entries, that ship has sailed.
With this series, make_device_exclusive() doesn't actually belong in
mm/rmap.c anymore, but I'll leave moving that for another day.
I only tested this series with the hmm-tests selftests due to lack of HW,
so I'd appreciate some testing, especially if the interaction between two
GPUs wanting a device-exclusive entry works as expected.
<program>
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/ioctl.h>
#define HMM_DMIRROR_EXCLUSIVE _IOWR('H', 0x05, struct hmm_dmirror_cmd)
struct hmm_dmirror_cmd {
	__u64 addr;
	__u64 ptr;
	__u64 npages;
	__u64 cpages;
	__u64 faults;
};
const size_t size = 1 * 1024 * 1024 * 1024ul;
const size_t chunk_size = 2 * 1024 * 1024ul;
int m
---truncated---
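The rest of the program is truncated in this record and is not reconstructed here. As a separate illustration of the arithmetic implied by the two constants, the sketch below works out how many 2 MiB chunks make up 1 GiB and fills in a command structure for one chunk. It assumes, without confirmation from the source, that the program processes one chunk per ioctl call; the struct name, helper names and the 4 KiB page size used in the example are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the struct shown above, using fixed-width types so
 * the sketch builds without kernel headers. */
struct dmirror_cmd_sketch {
	uint64_t addr;   /* start of the user range to operate on */
	uint64_t ptr;    /* user buffer the test device copies to/from */
	uint64_t npages; /* number of pages in the range */
	uint64_t cpages; /* output field, left at zero here */
	uint64_t faults; /* output field, left at zero here */
};

/* Number of whole chunks in a region of the given size. */
static size_t chunk_count(size_t size, size_t chunk_size)
{
	return size / chunk_size;
}

/* Describe chunk number chunk_idx of a region that starts at base. */
static struct dmirror_cmd_sketch
describe_chunk(uintptr_t base, size_t chunk_idx, size_t chunk_size,
	       size_t page_size, void *mirror_buf)
{
	struct dmirror_cmd_sketch cmd = {0};

	cmd.addr = (uint64_t)(base + chunk_idx * chunk_size);
	cmd.ptr = (uint64_t)(uintptr_t)mirror_buf;
	cmd.npages = chunk_size / page_size;
	return cmd;
}
```

With the sizes above, the region splits into 512 chunks, and each chunk spans 512 pages if the page size is 4 KiB.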
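References
- https://git.kernel.org/stable/c/2e877ff3492267def06dd50cb165dc9ab8838e7d
- https://git.kernel.org/stable/c/48d28417c66cce2f3b0ba773fcb6695a56eff220
- https://git.kernel.org/stable/c/8977752c8056a6a094a279004a49722da15bace3
- https://git.kernel.org/stable/c/fd900832e8440046627b60697687ab5d04398008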
{ "cveTags": [], "descriptions": [ { "lang": "en", "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nmm/gup: reject FOLL_SPLIT_PMD with hugetlb VMAs\n\nPatch series \"mm: fixes for device-exclusive entries (hmm)\", v2.\n\nDiscussing the PageTail() call in make_device_exclusive_range() with\nWilly, I recently discovered [1] that device-exclusive handling does not\nproperly work with THP, making the hmm-tests selftests fail if THPs are\nenabled on the system.\n\nLooking into more details, I found that hugetlb is not properly fenced,\nand I realized that something that was bugging me for longer -- how\ndevice-exclusive entries interact with mapcounts -- completely breaks\nmigration/swapout/split/hwpoison handling of these folios while they have\ndevice-exclusive PTEs.\n\nThe program below can be used to allocate 1 GiB worth of pages and making\nthem device-exclusive on a kernel with CONFIG_TEST_HMM.\n\nOnce they are device-exclusive, these folios cannot get swapped out\n(proc$pid/smaps_rollup will always indicate 1 GiB RSS no matter how much\none forces memory reclaim), and when having a memory block onlined to\nZONE_MOVABLE, trying to offline it will loop forever and complain about\nfailed migration of a page that should be movable.\n\n# echo offline \u003e /sys/devices/system/memory/memory136/state\n# echo online_movable \u003e /sys/devices/system/memory/memory136/state\n# ./hmm-swap \u0026\n... 
wait until everything is device-exclusive\n# echo offline \u003e /sys/devices/system/memory/memory136/state\n[ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000\n index:0x7f20671f7 pfn:0x442b6a\n[ 285.196618][T14882] memcg:ffff888179298000\n[ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate|\n dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff)\n[ 285.201734][T14882] raw: ...\n[ 285.204464][T14882] raw: ...\n[ 285.207196][T14882] page dumped because: migration failure\n[ 285.209072][T14882] page_owner tracks the page as allocated\n[ 285.210915][T14882] page last allocated via order 0, migratetype\n Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO),\n id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774\n[ 285.216765][T14882] post_alloc_hook+0x197/0x1b0\n[ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280\n[ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740\n[ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540\n[ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340\n[ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0\n[ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0\n[ 285.230822][T14882] handle_mm_fault+0x368/0x9f0\n...\n\nThis series fixes all issues I found so far. There is no easy way to fix\nwithout a bigger rework/cleanup. 
I have a bunch of cleanups on top (some\nprevious sent, some the result of the discussion in v1) that I will send\nout separately once this landed and I get to it.\n\nI wish we could just use some special present PROT_NONE PTEs instead of\nthese (non-present, non-none) fake-swap entries; but that just results in\nthe same problem we keep having (lack of spare PTE bits), and staring at\nother similar fake-swap entries, that ship has sailed.\n\nWith this series, make_device_exclusive() doesn\u0027t actually belong into\nmm/rmap.c anymore, but I\u0027ll leave moving that for another day.\n\nI only tested this series with the hmm-tests selftests due to lack of HW,\nso I\u0027d appreciate some testing, especially if the interaction between two\nGPUs wanting a device-exclusive entry works as expected.\n\n\u003cprogram\u003e\n#include \u003cstdio.h\u003e\n#include \u003cfcntl.h\u003e\n#include \u003cstdint.h\u003e\n#include \u003cunistd.h\u003e\n#include \u003cstdlib.h\u003e\n#include \u003cstring.h\u003e\n#include \u003csys/mman.h\u003e\n#include \u003csys/ioctl.h\u003e\n#include \u003clinux/types.h\u003e\n#include \u003clinux/ioctl.h\u003e\n\n#define HMM_DMIRROR_EXCLUSIVE _IOWR(\u0027H\u0027, 0x05, struct hmm_dmirror_cmd)\n\nstruct hmm_dmirror_cmd {\n\t__u64 addr;\n\t__u64 ptr;\n\t__u64 npages;\n\t__u64 cpages;\n\t__u64 faults;\n};\n\nconst size_t size = 1 * 1024 * 1024 * 1024ul;\nconst size_t chunk_size = 2 * 1024 * 1024ul;\n\nint m\n---truncated---" }, { "lang": "es", "value": "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: mm/gup: rechazo de FOLL_SPLIT_PMD con VMAs hugetlb. Serie de parches \"mm: correcciones para entradas exclusivas de dispositivo (hmm)\", v2. 
Al hablar con Willy sobre la llamada a PageTail() en make_device_exclusive_range(), descubr\u00ed recientemente [1] que la gesti\u00f3n exclusiva de dispositivo no funciona correctamente con THP, lo que provoca que las autopruebas hmm-tests fallen si las THP est\u00e1n habilitadas en el sistema. Al analizar m\u00e1s a fondo, descubr\u00ed que hugetlb no est\u00e1 correctamente protegido y me di cuenta de que algo que me hab\u00eda estado molestando durante mucho tiempo (la interacci\u00f3n de las entradas exclusivas de dispositivo con mapcounts) interrumpe por completo la gesti\u00f3n de migraci\u00f3n/intercambio/divisi\u00f3n/hwpoison de estos folios mientras tienen PTE exclusivas de dispositivo. El programa a continuaci\u00f3n se puede usar para asignar 1 GiB de p\u00e1ginas y convertirlas en exclusivas de dispositivo en un kernel con CONFIG_TEST_HMM. Una vez que son exclusivos del dispositivo, estos folios no se pueden intercambiar (proc$pid/smaps_rollup siempre indicar\u00e1 1 GiB RSS sin importar cu\u00e1nto se fuerce la recuperaci\u00f3n de memoria) y cuando se tiene un bloque de memoria en l\u00ednea en ZONE_MOVABLE, al intentar desconectarlo se repetir\u00e1 eternamente y se quejar\u00e1 sobre la migraci\u00f3n fallida de una p\u00e1gina que deber\u00eda ser movible. # echo offline \u0026gt; /sys/devices/system/memory/memory136/state # echo online_movable \u0026gt; /sys/devices/system/memory/memory136/state # ./hmm-swap \u0026amp; ... wait until everything is device-exclusive # echo offline \u0026gt; /sys/devices/system/memory/memory136/state [ 285.193431][T14882] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x7f20671f7 pfn:0x442b6a [ 285.196618][T14882] memcg:ffff888179298000 [ 285.198085][T14882] anon flags: 0x5fff0000002091c(referenced|uptodate| dirty|active|owner_2|swapbacked|node=1|zone=3|lastcpupid=0x7ff) [ 285.201734][T14882] raw: ... [ 285.204464][T14882] raw: ... 
[ 285.207196][T14882] page dumped because: migration failure [ 285.209072][T14882] page_owner tracks the page as allocated [ 285.210915][T14882] page last allocated via order 0, migratetype Movable, gfp_mask 0x140dca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_ZERO), id 14926, tgid 14926 (hmm-swap), ts 254506295376, free_ts 227402023774 [ 285.216765][T14882] post_alloc_hook+0x197/0x1b0 [ 285.218874][T14882] get_page_from_freelist+0x76e/0x3280 [ 285.220864][T14882] __alloc_frozen_pages_noprof+0x38e/0x2740 [ 285.223302][T14882] alloc_pages_mpol+0x1fc/0x540 [ 285.225130][T14882] folio_alloc_mpol_noprof+0x36/0x340 [ 285.227222][T14882] vma_alloc_folio_noprof+0xee/0x1a0 [ 285.229074][T14882] __handle_mm_fault+0x2b38/0x56a0 [ 285.230822][T14882] handle_mm_fault+0x368/0x9f0 ... Esta serie corrige todos los problemas que he encontrado hasta ahora. No hay una soluci\u00f3n sencilla sin una revisi\u00f3n o limpieza m\u00e1s profunda. Tengo varias correcciones adicionales (algunas enviadas previamente, otras resultantes de la discusi\u00f3n en la v1) que publicar\u00e9 por separado una vez que est\u00e9 disponible y pueda con ello. Ojal\u00e1 pudi\u00e9ramos usar algunas PTE PROT_NONE presentes especiales en lugar de estas entradas de intercambio falso (no presentes, no ninguna); pero eso solo resulta en el mismo problema que seguimos teniendo (falta de bits de PTE de repuesto), y al observar otras entradas de intercambio falso similares, ese barco ya pas\u00f3. Con esta serie, make_device_exclusive() ya no pertenece a mm/rmap.c, pero lo dejar\u00e9 para otro d\u00eda. Solo prob\u00e9 esta serie con las autopruebas hmm-tests debido a la falta de hardware, as\u00ed que agradecer\u00eda algunas pruebas, especialmente si la interacci\u00f3n entre dos GPU que buscan una entrada de dispositivo exclusivo funciona como se espera. 
#include #include #include #include #include #include #include #include #include #include #define HMM_DMIRROR_EXCLUSIVE _IOWR(\u0027H\u0027, 0x05, ---truncado---" } ], "id": "CVE-2025-22034", "lastModified": "2025-04-17T20:22:16.240", "metrics": {}, "published": "2025-04-16T15:15:56.013", "references": [ { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/2e877ff3492267def06dd50cb165dc9ab8838e7d" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/48d28417c66cce2f3b0ba773fcb6695a56eff220" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/8977752c8056a6a094a279004a49722da15bace3" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/fd900832e8440046627b60697687ab5d04398008" } ], "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "vulnStatus": "Awaiting Analysis" }