fkie_cve-2025-38358
Vulnerability from fkie_nvd
Published
2025-07-25 13:15
Modified
2025-07-25 15:29
Severity ?
Summary
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix race between async reclaim worker and close_ctree()
Syzbot reported an assertion failure due to an attempt to add a delayed
iput after we have set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info
state:
WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Modules linked in:
CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Workqueue: btrfs-endio-write btrfs_work_helper
RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420
Code: 4e ad 5d (...)
RSP: 0018:ffffc9000213f780 EFLAGS: 00010293
RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00
RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c
R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001
R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000bd038 CR3: 000000006a142000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
btrfs_put_ordered_extent+0x19f/0x470 fs/btrfs/ordered-data.c:635
btrfs_finish_one_ordered+0x11d8/0x1b10 fs/btrfs/inode.c:3312
btrfs_work_helper+0x399/0xc20 fs/btrfs/async-thread.c:312
process_one_work kernel/workqueue.c:3238 [inline]
process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3321
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402
kthread+0x70e/0x8a0 kernel/kthread.c:464
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
This can happen due to a race with the async reclaim worker like this:
1) The async metadata reclaim worker enters shrink_delalloc(), which calls
btrfs_start_delalloc_roots() with an nr_pages argument that has a value
less than LONG_MAX, and that in turn enters start_delalloc_inodes(),
which sets the local variable 'full_flush' to false because
wbc->nr_to_write is less than LONG_MAX;
2) There it finds inode X in a root's delalloc list, grabs a reference for
inode X (with igrab()), and triggers writeback for it with
filemap_fdatawrite_wbc(), which creates an ordered extent for inode X;
3) The unmount sequence starts from another task: we enter close_ctree()
and flush the workqueue fs_info->endio_write_workers, which waits for
the ordered extent for inode X to complete. When the last reference on
the ordered extent is dropped with btrfs_put_ordered_extent(), the call
to btrfs_add_delayed_iput() does not add the inode to the list of
delayed iputs because it has a refcount of 2, so it decrements the
refcount to 1 and returns;
4) Shortly after at close_ctree() we call btrfs_run_delayed_iputs() which
runs all delayed iputs, and then we set BTRFS_FS_STATE_NO_DELAYED_IPUT
in the fs_info state;
5) The async reclaim worker, after calling filemap_fdatawrite_wbc(), now
calls btrfs_add_delayed_iput() for inode X and there we trigger an
assertion failure since the fs_info state has the flag
BTRFS_FS_STATE_NO_DELAYED_IPUT set.
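The five steps above can be replayed in a small single-threaded C sketch. This is a toy model, not kernel code: the toy_* names, the struct layouts and the counter standing in for the warning are all assumptions made for illustration, and the assumption that the second reference on inode X belongs to the ordered extent is inferred from step 3.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is a hypothetical stand-in, not kernel code. */
struct toy_fs_info {
    bool no_delayed_iput; /* stands in for BTRFS_FS_STATE_NO_DELAYED_IPUT */
    int queued_iputs;     /* inodes sitting on the delayed iput list */
    int warnings;         /* times the validation would have fired */
};

struct toy_inode {
    int refcount;
};

/* Simplified stand-in for btrfs_add_delayed_iput(). */
static void toy_add_delayed_iput(struct toy_fs_info *fs, struct toy_inode *inode)
{
    assert(inode->refcount > 0);
    if (inode->refcount > 1) {
        /* Not the last reference: just drop it, nothing is queued. */
        inode->refcount--;
        return;
    }
    /* Last reference: the inode has to be queued, which is no longer
     * allowed once the flag is set. */
    if (fs->no_delayed_iput)
        fs->warnings++;
    fs->queued_iputs++;
}

/* Simplified stand-in for btrfs_run_delayed_iputs(): drain the list. */
static void toy_run_delayed_iputs(struct toy_fs_info *fs)
{
    fs->queued_iputs = 0;
}

/* Replays steps 1-5 in order and returns the number of warnings. */
int toy_replay_race(void)
{
    struct toy_fs_info fs = { false, 0, 0 };
    struct toy_inode x = { 0 };

    x.refcount++;                  /* step 2: reclaim worker's igrab() */
    x.refcount++;                  /* step 2: reference held for the ordered extent */
    toy_add_delayed_iput(&fs, &x); /* step 3: ordered extent completes, 2 -> 1 */
    toy_run_delayed_iputs(&fs);    /* step 4: list is empty, nothing to run */
    fs.no_delayed_iput = true;     /* step 4: flag set while the worker still holds a ref */
    toy_add_delayed_iput(&fs, &x); /* step 5: last reference, flag already set */
    return fs.warnings;
}
```

In this model toy_replay_race() returns 1: the only call that has to queue the inode is the one made after the flag was set.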
Fix this by setting BTRFS_FS_STATE_NO_DELAYED_IPUT only after we wait for
the async reclaim workers to finish, after we call cancel_work_sync() for
them at close_ctree(), and by running delayed iputs after waiting for the
reclaim workers to finish and before setting the bit.
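The effect of the reordering can be sketched with a toy model that runs the unmount steps in either order. This is not the actual patch: toy_state, toy_add_delayed_iput() and toy_close_ctree() are hypothetical names, and the model assumes inode X starts with two references (one for the reclaim worker, one for the ordered extent).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the actual patch. */
struct toy_state {
    bool no_delayed_iput; /* stands in for BTRFS_FS_STATE_NO_DELAYED_IPUT */
    int refcount;         /* references held on inode X */
    int queued_iputs;     /* entries on the delayed iput list */
    int warnings;         /* times the validation would have fired */
};

/* Simplified stand-in for btrfs_add_delayed_iput(). */
static void toy_add_delayed_iput(struct toy_state *s)
{
    assert(s->refcount > 0);
    if (s->refcount > 1) {
        s->refcount--;          /* not the last reference, nothing queued */
        return;
    }
    if (s->no_delayed_iput)
        s->warnings++;          /* queuing after the flag was set */
    s->queued_iputs++;
}

/* Runs the unmount steps in the old or the fixed order; returns warnings. */
int toy_close_ctree(bool fixed)
{
    struct toy_state s = { false, 2, 0, 0 };

    toy_add_delayed_iput(&s);       /* flush endio_write_workers: 2 -> 1 */
    if (fixed) {
        toy_add_delayed_iput(&s);   /* wait for the reclaim worker first */
        s.queued_iputs = 0;         /* run delayed iputs: final ref dropped */
        s.refcount = 0;
        s.no_delayed_iput = true;   /* set the bit last */
    } else {
        s.queued_iputs = 0;         /* run delayed iputs: list still empty */
        s.no_delayed_iput = true;   /* bit set before the worker is done */
        toy_add_delayed_iput(&s);   /* worker drops its reference too late */
    }
    return s.warnings;
}
```

With these assumptions, toy_close_ctree(false) returns 1 and toy_close_ctree(true) returns 0, because in the fixed order nothing can call the add path once the bit is set.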
This race was recently introduced by commit 19e60b2a95f5 ("btrfs: add
extra warning if delayed iput is added when it's not allowed"). Without
the new validation at btrfs_add_delayed_iput(),
---truncated---
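References
- https://git.kernel.org/stable/c/4693cda2c06039c875f2eef0123b22340c34bfa0
- https://git.kernel.org/stable/c/a26bf338cdad3643a6e7c3d78a172baadba15c1a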
Impacted products
Vendor | Product | Version |
---|
{ "cveTags": [], "descriptions": [ { "lang": "en", "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nbtrfs: fix race between async reclaim worker and close_ctree()\n\nSyzbot reported an assertion failure due to an attempt to add a delayed\niput after we have set BTRFS_FS_STATE_NO_DELAYED_IPUT in the fs_info\nstate:\n\n WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420\n Modules linked in:\n CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller #0 PREEMPT(full)\n Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025\n Workqueue: btrfs-endio-write btrfs_work_helper\n RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420\n Code: 4e ad 5d (...)\n RSP: 0018:ffffc9000213f780 EFLAGS: 00010293\n RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00\n RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000\n RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c\n R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001\n R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100\n FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000\n CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033\n CR2: 00002000000bd038 CR3: 000000006a142000 CR4: 00000000003526f0\n DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000\n DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400\n Call Trace:\n \u003cTASK\u003e\n btrfs_put_ordered_extent+0x19f/0x470 fs/btrfs/ordered-data.c:635\n btrfs_finish_one_ordered+0x11d8/0x1b10 fs/btrfs/inode.c:3312\n btrfs_work_helper+0x399/0xc20 fs/btrfs/async-thread.c:312\n process_one_work kernel/workqueue.c:3238 [inline]\n process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3321\n worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402\n kthread+0x70e/0x8a0 kernel/kthread.c:464\n ret_from_fork+0x3fc/0x770 
arch/x86/kernel/process.c:148\n ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245\n \u003c/TASK\u003e\n\nThis can happen due to a race with the async reclaim worker like this:\n\n1) The async metadata reclaim worker enters shrink_delalloc(), which calls\n btrfs_start_delalloc_roots() with an nr_pages argument that has a value\n less than LONG_MAX, and that in turn enters start_delalloc_inodes(),\n which sets the local variable \u0027full_flush\u0027 to false because\n wbc-\u003enr_to_write is less than LONG_MAX;\n\n2) There it finds inode X in a root\u0027s delalloc list, grabs a reference for\n inode X (with igrab()), and triggers writeback for it with\n filemap_fdatawrite_wbc(), which creates an ordered extent for inode X;\n\n3) The unmount sequence starts from another task, we enter close_ctree()\n and we flush the workqueue fs_info-\u003eendio_write_workers, which waits\n for the ordered extent for inode X to complete and when dropping the\n last reference of the ordered extent, with btrfs_put_ordered_extent(),\n when we call btrfs_add_delayed_iput() we don\u0027t add the inode to the\n list of delayed iputs because it has a refcount of 2, so we decrement\n it to 1 and return;\n\n4) Shortly after at close_ctree() we call btrfs_run_delayed_iputs() which\n runs all delayed iputs, and then we set BTRFS_FS_STATE_NO_DELAYED_IPUT\n in the fs_info state;\n\n5) The async reclaim worker, after calling filemap_fdatawrite_wbc(), now\n calls btrfs_add_delayed_iput() for inode X and there we trigger an\n assertion failure since the fs_info state has the flag\n BTRFS_FS_STATE_NO_DELAYED_IPUT set.\n\nFix this by setting BTRFS_FS_STATE_NO_DELAYED_IPUT only after we wait for\nthe async reclaim workers to finish, after we call cancel_work_sync() for\nthem at close_ctree(), and by running delayed iputs after wait for the\nreclaim workers to finish and before setting the bit.\n\nThis race was recently introduced by commit 19e60b2a95f5 (\"btrfs: add\nextra warning if 
delayed iput is added when it\u0027s not allowed\"). Without\nthe new validation at btrfs_add_delayed_iput(), \n---truncated---" }, { "lang": "es", "value": "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: btrfs: correcci\u00f3n de la ejecuci\u00f3n entre el trabajador de recuperaci\u00f3n as\u00edncrono y close_ctree() Syzbot inform\u00f3 de un error de aserci\u00f3n debido a un intento de agregar una entrada retrasada despu\u00e9s de que hayamos establecido BTRFS_FS_STATE_NO_DELAYED_IPUT en el estado WARNING: CPU: 0 PID: 65 at fs/btrfs/inode.c:3420 btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420 Modules linked in: CPU: 0 UID: 0 PID: 65 Comm: kworker/u8:4 Not tainted 6.15.0-next-20250530-syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025 Workqueue: btrfs-endio-write btrfs_work_helper RIP: 0010:btrfs_add_delayed_iput+0x2f8/0x370 fs/btrfs/inode.c:3420 Code: 4e ad 5d (...) RSP: 0018:ffffc9000213f780 EFLAGS: 00010293 RAX: ffffffff83c635b7 RBX: ffff888058920000 RCX: ffff88801c769e00 RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000000 RBP: 0000000000000001 R08: ffff888058921b67 R09: 1ffff1100b12436c R10: dffffc0000000000 R11: ffffed100b12436d R12: 0000000000000001 R13: dffffc0000000000 R14: ffff88807d748000 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffff888125c53000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 ---truncado--- Call Trace: btrfs_put_ordered_extent+0x19f/0x470 fs/btrfs/ordered-data.c:635 btrfs_finish_one_ordered+0x11d8/0x1b10 fs/btrfs/inode.c:3312 btrfs_work_helper+0x399/0xc20 fs/btrfs/async-thread.c:312 process_one_work kernel/workqueue.c:3238 [inline] process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3321 worker_thread+0x8a0/0xda0 kernel/workqueue.c:3402 kthread+0x70e/0x8a0 kernel/kthread.c:464 ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 
arch/x86/entry/entry_64.S:245 Esto puede suceder debido a una ejecuci\u00f3n con el trabajador de recuperaci\u00f3n as\u00edncrono como este: 1) El trabajador de recuperaci\u00f3n de metadatos as\u00edncrono ingresa a shrink_delalloc(), que llama a btrfs_start_delalloc_roots() con un argumento nr_pages que tiene un valor menor que LONG_MAX, y que a su vez ingresa a start_delalloc_inodes(), que establece la variable local \u0027full_flush\u0027 en falso porque wbc-\u0026gt;nr_to_write es menor que LONG_MAX; 2) All\u00ed encuentra el inodo X en una lista delalloc de la ra\u00edz, toma una referencia para el inodo X (con igrab()), y activa la escritura diferida para \u00e9l con filemap_fdatawrite_wbc(), que crea una extensi\u00f3n ordenada para el inodo X; 3) La secuencia de desmontaje comienza desde otra tarea, ingresamos close_ctree() y limpiamos la cola de trabajo fs_info-\u0026gt;endio_write_workers, que espera a que se complete la extensi\u00f3n ordenada para el inodo X y cuando eliminamos la \u00faltima referencia de la extensi\u00f3n ordenada, con btrfs_put_ordered_extent(), cuando llamamos a btrfs_add_delayed_iput() no agregamos el inodo a la lista de entradas retrasadas porque tiene un refcount de 2, por lo que lo decrementamos a 1 y regresamos; 4) Poco despu\u00e9s, en close_ctree(), llamamos a btrfs_run_delayed_iputs(), que ejecuta todas las entradas retrasadas, y luego establecemos BTRFS_FS_STATE_NO_DELAYED_IPUT en el estado fs_info. 5) El trabajador de recuperaci\u00f3n as\u00edncrono, despu\u00e9s de llamar a filemap_fdatawrite_wbc(), ahora llama a btrfs_add_delayed_iput() para el inodo X y all\u00ed desencadenamos un error de aserci\u00f3n, ya que el estado fs_info tiene el indicador BTRFS_FS_STATE_NO_DELAYED_IPUT establecido. 
Solucione esto estableciendo BTRFS_FS_STATE_NO_DELAYED_IPUT solo despu\u00e9s de esperar a que finalicen los trabajadores de recuperaci\u00f3n as\u00edncronos, despu\u00e9s de llamar a cancel_work_sync() para ellos en close_ctree(), y ejecutando las entradas retrasadas despu\u00e9s de esperar a que finalicen los trabajadores de recuperaci\u00f3n y antes de establecer el bit. Esta ejecuci\u00f3n se introdujo recientemente mediante el commit 19e60b2a95f5 (\"btrfs: a\u00f1adir una advertencia adicional si se a\u00f1ade una entrada retardada cuando no est\u00e1 permitida\"). Sin la nueva validaci\u00f3n en btrfs_add_delayed_iput(), ---truncado---" } ], "id": "CVE-2025-38358", "lastModified": "2025-07-25T15:29:19.837", "metrics": {}, "published": "2025-07-25T13:15:24.573", "references": [ { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/4693cda2c06039c875f2eef0123b22340c34bfa0" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/a26bf338cdad3643a6e7c3d78a172baadba15c1a" } ], "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "vulnStatus": "Awaiting Analysis" }
Sightings
Author | Source | Type | Date |
---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.