fkie_cve-2025-38365
Vulnerability from fkie_nvd
Published
2025-07-25 13:15
Modified
2025-07-25 15:29
Severity
Summary
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix a race between renames and directory logging
We have a race between a rename and directory inode logging: if it
happens and we crash or lose power before the rename completes, the next
time the filesystem is mounted the log replay code ends up deleting the
file that was being renamed.
This is best explained by following a step-by-step analysis of an
interleaving that leads to this situation.
Consider the initial conditions:
1) We are at transaction N;
2) We have directories A and B created in a past transaction (< N);
3) We have inode X corresponding to a file that has 2 hardlinks, one in
directory A and the other in directory B, so we'll name them as
"A/foo_link1" and "B/foo_link2". Both hard links were persisted in a
past transaction (< N);
4) We have inode Y corresponding to a file that has a single hard link and
is located in directory A, we'll name it as "A/bar". This file was also
persisted in a past transaction (< N).
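The initial conditions can be captured in a small toy model, sketched here in Python. The `Inode` and `Directory` classes and the `link_count` helper are hypothetical illustrations, not btrfs data structures. The transaction id and dir index values are arbitrary. The point the model makes is that inode Y's only name is "A/bar", so losing that entry loses the file.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the initial conditions; not btrfs data structures.
N = 10  # current transaction id (arbitrary value)


@dataclass
class Inode:
    ino: str
    last_unlink_trans: int = 0  # transaction id of the last unlink involving it


@dataclass
class Directory:
    name: str
    created_trans: int
    last_unlink_trans: int = 0
    # dir index -> (entry name, inode); index values are arbitrary here
    entries: dict = field(default_factory=dict)


def link_count(inode, dirs):
    """Count the directory entries (hard links) that point at `inode`."""
    return sum(
        1 for d in dirs for (_name, target) in d.entries.values() if target is inode
    )


a = Directory("A", created_trans=N - 1)   # condition 2: created before N
b = Directory("B", created_trans=N - 1)
x = Inode("X")                             # condition 3: two hard links
y = Inode("Y")                             # condition 4: a single hard link
a.entries[2] = ("foo_link1", x)
b.entries[2] = ("foo_link2", x)
a.entries[3] = ("bar", y)

print(link_count(x, [a, b]), link_count(y, [a, b]))  # → 2 1
```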
The steps leading to a file loss are the following and for all of them we
are under transaction N:
1) Link "A/foo_link1" is removed, so inode X's last_unlink_trans field
is updated to N, through btrfs_unlink() -> btrfs_record_unlink_dir();
2) Task A starts a rename for inode Y, with the goal of renaming from
"A/bar" to "A/baz", so we enter btrfs_rename();
3) Task A inserts the new BTRFS_INODE_REF_KEY for inode Y by calling
btrfs_insert_inode_ref();
4) Because the rename happens in the same directory, we don't set the
last_unlink_trans field of directory A's inode to the current
transaction id, that is, we don't call btrfs_record_unlink_dir();
5) Task A then removes the entries from directory A (BTRFS_DIR_ITEM_KEY
and BTRFS_DIR_INDEX_KEY items) when calling __btrfs_unlink_inode()
(actually the dir index item is added as a delayed item, but the
effect is the same);
6) Now, before task A adds the new entry "A/baz" to directory A by
calling btrfs_add_link(), another task, task B, logs inode X;
7) Task B starts an fsync of inode X and, after logging inode X, at
btrfs_log_inode_parent() it calls btrfs_log_all_parents(), since
inode X has a last_unlink_trans value of N, set in step 1;
8) At btrfs_log_all_parents() we search for all parent directories of
inode X using the commit root, so we find directories A and B and log
them. But when logging directory A, we no longer have a dir index item
for inode Y, neither for the old name "A/bar" nor for the new name
"A/baz", since the rename has deleted the old name but has not yet
inserted the new one: task A hasn't yet called btrfs_add_link() to
do that.
Note that logging directory A doesn't fall back to a transaction
commit because its last_unlink_trans has a lower value than the
current transaction's id (see step 4);
9) Task B finishes logging directories A and B and gets back to
btrfs_sync_file() where it calls btrfs_sync_log() to persist the log
tree;
10) Task B successfully persists the log tree (btrfs_sync_log() completes
with success), and then a power failure happens.
We have a log tree without any directory entry for inode Y, so the
log replay code deletes the entry for inode Y, name "A/bar", from the
subvolume tree since it doesn't exist in the log tree and the log
tree is authoritative for its index (we logged a BTRFS_DIR_LOG_INDEX_KEY
item that covers the index range for the dentry that corresponds to
"A/bar").
Since there's no other hard link for inode Y and the log replay code
deletes the name "A/bar", the file is lost.
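The interleaving can be reproduced in a minimal toy model. In this model a directory is a dict from dir index to (name, inode). The log of directory A is a snapshot of its entries plus one index range for which the log is authoritative. The helpers `log_dir` and `replay` are hypothetical stand-ins for the btrfs logging and replay paths. The logged range (2, 3) is an assumption, chosen only so that it covers the index of "A/bar".

```python
# Hypothetical toy model of steps 1-10 for directory A; names are illustrative.
N = 10  # current transaction id (arbitrary value)

# Directory A as persisted before transaction N: dir index -> (name, inode).
committed_a = {2: ("foo_link1", "X"), 3: ("bar", "Y")}
live_a = dict(committed_a)      # in-memory view of A during transaction N
a_last_unlink_trans = N - 1     # never bumped to N in this scenario (step 4)

# Step 1: unlink "A/foo_link1"; btrfs_record_unlink_dir() marks inode X.
del live_a[2]
x_last_unlink_trans = N

# Steps 2-6: same-directory rename "A/bar" -> "A/baz". The old entry is gone
# (step 5) but the new one has not been added yet (step 6).
del live_a[3]


def log_dir(live, dir_last_unlink_trans, trans):
    """Step 8: log a directory, or return None for 'fall back to a commit'."""
    if dir_last_unlink_trans >= trans:
        return None
    # Assumption: the logged BTRFS_DIR_LOG_INDEX_KEY range is indexes 2..3.
    return {"entries": dict(live), "range": (2, 3)}


def replay(committed, log):
    """Log replay: the log is authoritative for its index range, so committed
    entries inside that range that are missing from the log are deleted."""
    lo, hi = log["range"]
    kept = {
        idx: entry
        for idx, entry in committed.items()
        if not (lo <= idx <= hi) or idx in log["entries"]
    }
    kept.update(log["entries"])
    return kept


# Step 7: X's last_unlink_trans is N, so its parent directories get logged.
assert x_last_unlink_trans >= N
log_a = log_dir(live_a, a_last_unlink_trans, N)  # not None: no fallback
# Steps 9-10: the log is synced, power fails, and the log is replayed.
after = replay(committed_a, log_a)
print([name for (name, ino) in after.values() if ino == "Y"])  # → []
```

Deleting "A/foo_link1" during replay is correct, because that unlink really happened. Deleting "A/bar" is the bug: no name for inode Y survives.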
The issue wouldn't happen if task B synced the log only after task A
called btrfs_log_new_name(), which would update the log with the new name
for inode Y ("A/baz").
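The ordering argument can be checked with a similar toy model. The names are hypothetical, a dir index of 4 for "A/baz" is assumed, and the logged range is assumed to cover all of A's indexes. Syncing the log in the middle of the rename loses inode Y. Syncing after the equivalent of btrfs_log_new_name() keeps inode Y under its new name.

```python
# Hypothetical toy model: where does task B's log sync fall relative to the rename?
COMMITTED_A = {3: ("bar", "Y")}  # "A/bar" as persisted before transaction N
NEW_INDEX = 4                    # assumed dir index for the new name "A/baz"


def replay(log):
    """Assume the logged range covers all of A's indexes: committed entries
    missing from the log are dropped, logged entries are (re)inserted."""
    kept = {idx: e for idx, e in COMMITTED_A.items() if idx in log}
    kept.update(log)
    return kept


def run(sync_after_log_new_name):
    live = dict(COMMITTED_A)
    log = {}
    del live[3]                          # the rename removes "A/bar"
    if sync_after_log_new_name:
        live[NEW_INDEX] = ("baz", "Y")   # btrfs_add_link() adds "A/baz"
        log[NEW_INDEX] = ("baz", "Y")    # btrfs_log_new_name() logs it
    log.update(live)                     # task B logs directory A ...
    return replay(log)                   # ... syncs the log, then power fails


print(run(False))  # → {}
print(run(True))   # → {4: ('baz', 'Y')}
```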
Fix this by pinning the log root during renames before removing the old
directory entry, and unpinning af
---truncated---
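The description of the fix is truncated, so the sketch below models only the general idea of pinning, under stated assumptions. The pin is treated as a counter that the sync path waits to drain before persisting anything. This loosely mirrors the btrfs_pin_log_trans()/btrfs_end_log_trans() helpers in fs/btrfs/tree-log.c, but it is not the actual patch. The `LogRoot` class and the task functions are hypothetical. Threading events force the problematic interleaving from steps 6-8.

```python
import threading

# Hypothetical toy model of pinning the log root during a rename. The pin is
# assumed to be a counter that sync_log() waits to drain before persisting.


class LogRoot:
    def __init__(self):
        self.cond = threading.Condition()
        self.writers = 0     # number of tasks currently pinning the log
        self.entries = {}    # logged dir A entries: dir index -> (name, inode)
        self.synced = None   # what sync_log() persisted

    def pin(self):
        with self.cond:
            self.writers += 1

    def unpin(self):
        with self.cond:
            self.writers -= 1
            if self.writers == 0:
                self.cond.notify_all()

    def log_entry(self, idx, entry):
        with self.cond:
            self.entries[idx] = entry

    def sync_log(self):
        with self.cond:
            self.cond.wait_for(lambda: self.writers == 0)
            self.synced = dict(self.entries)


live_a = {3: ("bar", "Y")}   # directory A: "A/bar" points at inode Y
log_root = LogRoot()
old_removed = threading.Event()
b_logged = threading.Event()
proceed = threading.Event()


def task_a_rename():
    log_root.pin()                        # pin before removing the old entry
    del live_a[3]                         # "A/bar" is gone
    old_removed.set()
    proceed.wait()                        # task B runs inside this window
    live_a[4] = ("baz", "Y")              # like btrfs_add_link()
    log_root.log_entry(4, ("baz", "Y"))   # like btrfs_log_new_name()
    log_root.unpin()                      # only now can the log be synced


def task_b_fsync():
    for idx, entry in dict(live_a).items():  # log A mid-rename: nothing for Y
        log_root.log_entry(idx, entry)
    b_logged.set()
    log_root.sync_log()                      # waits until task A unpins


ta = threading.Thread(target=task_a_rename)
tb = threading.Thread(target=task_b_fsync)
ta.start()
old_removed.wait()
tb.start()
b_logged.wait()
proceed.set()
ta.join()
tb.join()
print(log_root.synced)  # → {4: ('baz', 'Y')}
```

Without the pin, whether the persisted log contains "A/baz" would depend on thread scheduling, which is exactly the window described above. With the pin, the sync cannot complete until the new name has been logged.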
References
- https://git.kernel.org/stable/c/2088895d5903082bb9021770b919e733c57edbc1
- https://git.kernel.org/stable/c/3ca864de852bc91007b32d2a0d48993724f4abad
- https://git.kernel.org/stable/c/51bd363c7010d033d3334daf457c824484bf9bf0
- https://git.kernel.org/stable/c/8c6874646c21bd820cf475e2874e62c133954023
- https://git.kernel.org/stable/c/aeeae8feeaae4445a86f9815273e81f902dc1f5b
Impacted products
Vendor | Product | Version |
---|---|---|
NVD status: Awaiting Analysis (no metrics published). Source identifier: 416baaa9-dc9f-4396-8d5f-8c081fb06d67.
Sightings
Author | Source | Type | Date |
---|---|---|---|
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.