fkie_nvd - fkie_cve-2025-37931

fkie_cve-2025-37931

Vulnerability from fkie_nvd

Published

2025-05-20 16:15

Modified

2025-05-21 20:25

Severity ?

Summary

In the Linux kernel, the following vulnerability has been resolved:

btrfs: adjust subpage bit start based on sectorsize

When running machines with 64k page size and a 16k nodesize we started seeing tree log corruption in production. This turned out to be because we were not writing out dirty blocks sometimes, so this in fact affects all metadata writes.

When writing out a subpage EB we scan the subpage bitmap for a dirty range. If the range isn't dirty we do

    bit_start++;

to move onto the next bit. The problem is the bitmap is based on the number of sectors that an EB has. So in this case, we have a 64k pagesize, 16k nodesize, but a 4k sectorsize. This means our bitmap is 4 bits for every node. With a 64k page size we end up with 4 nodes per page.

To make this easier this is how everything looks

    [0         16k       32k       48k     ] logical address
    [0         4         8         12      ] radix tree offset
    [               64k page               ] folio
    [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
    [ | | | |  | | | |   | | | |   | | | | ] bitmap

Now we use all of our addressing based on fs_info->sectorsize_bits, so as you can see the above our 16k eb->start turns into radix entry 4.

When we find a dirty range for our eb, we correctly do bit_start += sectors_per_node, because if we start at bit 0, the next bit for the next eb is 4, to correspond to eb->start 16k.

However if our range is clean, we will do bit_start++, which will now put us offset from our radix tree entries.

In our case, assume that the first time we check the bitmap the block is not dirty, we increment bit_start so now it == 1, and then we loop around and check again. This time it is dirty, and we go to find that start using the following equation

    start = folio_start + bit_start * fs_info->sectorsize;

so in the case above, eb->start 0 is now dirty, and we calculate start as

    0 + 1 * fs_info->sectorsize = 4096
    4096 >> 12 = 1

Now we're looking up the radix tree for 1, and we won't find an eb. What's worse is now we're using bit_start == 1, so we do bit_start += sectors_per_node, which is now 5. If that eb is dirty we will run into the same thing, we will look at an offset that is not populated in the radix tree, and now we're skipping the writeout of dirty extent buffers.

The best fix for this is to not use sectorsize_bits to address nodes, but that's a larger change. Since this is a fs corruption problem fix it simply by always using sectors_per_node to increment the start bit.
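
The bit arithmetic in the summary can be replayed with a small userspace model. The following is a minimal sketch, not the kernel code: `eb_exists`, `scan_folio`, the two bitmap snapshots and the `clean_step` parameter are hypothetical names introduced here for illustration, and the folio is assumed to start at logical address 0. The description does not say how the first block goes from clean to dirty between the two checks, so the sketch simply models that with one bitmap for the first check and another for all later checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Geometry taken from the description: 64k folio, 16k nodes, 4k sectors. */
enum {
    SECTORSIZE_BITS  = 12,
    SECTORSIZE       = 1 << SECTORSIZE_BITS,
    NODESIZE         = 16 * 1024,
    FOLIO_SIZE       = 64 * 1024,
    SECTORS_PER_NODE = NODESIZE / SECTORSIZE,   /* 4 bits per extent buffer */
    BITS_PER_FOLIO   = FOLIO_SIZE / SECTORSIZE, /* 16 bits per folio */
};

/*
 * Hypothetical stand-in for the radix tree lookup: with sectorsize_bits
 * addressing, only offsets 0, 4, 8 and 12 hold an extent buffer.
 */
static bool eb_exists(uint64_t start)
{
    uint64_t index = start >> SECTORSIZE_BITS;

    return start < FOLIO_SIZE && index % SECTORS_PER_NODE == 0;
}

/*
 * Walk the dirty bitmap of one folio.
 *
 * first_view is the bitmap seen by the first check, later_view the bitmap
 * seen by every later check (an assumption of this sketch).
 * clean_step is the increment applied when a bit is clean: 1 reproduces
 * the old behaviour, SECTORS_PER_NODE reproduces the fix.
 *
 * Returns a mask with bit n set for each extent buffer n that was found
 * and would therefore be written out.
 */
static unsigned scan_folio(uint16_t first_view, uint16_t later_view,
                           int clean_step)
{
    unsigned written = 0;
    int bit_start = 0;
    bool first = true;

    assert(clean_step == 1 || clean_step == SECTORS_PER_NODE);

    while (bit_start < BITS_PER_FOLIO) {
        uint16_t bitmap = first ? first_view : later_view;

        first = false;
        if (!(bitmap & (1u << bit_start))) {
            bit_start += clean_step;
            continue;
        }

        /* start = folio_start + bit_start * sectorsize, folio_start == 0 */
        uint64_t start = (uint64_t)bit_start * SECTORSIZE;

        bit_start += SECTORS_PER_NODE;
        if (!eb_exists(start))
            continue; /* lookup miss: this dirty extent buffer is skipped */

        written |= 1u << (start / NODESIZE);
    }
    return written;
}
```

With a first snapshot of 0x0000 and a later snapshot of 0xFFFF, `scan_folio(0x0000, 0xFFFF, 1)` returns 0: every lookup lands on offsets 1, 5, 9 and 13 and nothing is written. With `clean_step` set to `SECTORS_PER_NODE` the same input returns 0xE, meaning the buffers at 16k, 32k and 48k are found. When the bitmap does not change (for example 0x00F0 in both snapshots), both step sizes return 0x2, which is consistent with the description's point that the walk only goes wrong once `bit_start` is knocked off a node boundary.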

References

	Source	URL	Tags
	416baaa9-dc9f-4396-8d5f-8c081fb06d67	https://git.kernel.org/stable/c/396f4002710030ea1cfd4c789ebaf0a6969ab34f
	416baaa9-dc9f-4396-8d5f-8c081fb06d67	https://git.kernel.org/stable/c/b80db09b614cb7edec5bada1bc7c7b0eb3b453ea
	416baaa9-dc9f-4396-8d5f-8c081fb06d67	https://git.kernel.org/stable/c/e08e49d986f82c30f42ad0ed43ebbede1e1e3739

Impacted products

	Vendor	Product	Version

JSON

{
  "cveTags": [],
  "descriptions": [
    {
      "lang": "en",
      "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nbtrfs: adjust subpage bit start based on sectorsize\n\nWhen running machines with 64k page size and a 16k nodesize we started\nseeing tree log corruption in production.  This turned out to be because\nwe were not writing out dirty blocks sometimes, so this in fact affects\nall metadata writes.\n\nWhen writing out a subpage EB we scan the subpage bitmap for a dirty\nrange.  If the range isn\u0027t dirty we do\n\n\tbit_start++;\n\nto move onto the next bit.  The problem is the bitmap is based on the\nnumber of sectors that an EB has.  So in this case, we have a 64k\npagesize, 16k nodesize, but a 4k sectorsize.  This means our bitmap is 4\nbits for every node.  With a 64k page size we end up with 4 nodes per\npage.\n\nTo make this easier this is how everything looks\n\n[0         16k       32k       48k     ] logical address\n[0         4         8         12      ] radix tree offset\n[               64k page               ] folio\n[ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers\n[ | | | |  | | | |   | | | |   | | | | ] bitmap\n\nNow we use all of our addressing based on fs_info-\u003esectorsize_bits, so\nas you can see the above our 16k eb-\u003estart turns into radix entry 4.\n\nWhen we find a dirty range for our eb, we correctly do bit_start +=\nsectors_per_node, because if we start at bit 0, the next bit for the\nnext eb is 4, to correspond to eb-\u003estart 16k.\n\nHowever if our range is clean, we will do bit_start++, which will now\nput us offset from our radix tree entries.\n\nIn our case, assume that the first time we check the bitmap the block is\nnot dirty, we increment bit_start so now it == 1, and then we loop\naround and check again.  
This time it is dirty, and we go to find that\nstart using the following equation\n\n\tstart = folio_start + bit_start * fs_info-\u003esectorsize;\n\nso in the case above, eb-\u003estart 0 is now dirty, and we calculate start\nas\n\n\t0 + 1 * fs_info-\u003esectorsize = 4096\n\t4096 \u003e\u003e 12 = 1\n\nNow we\u0027re looking up the radix tree for 1, and we won\u0027t find an eb.\nWhat\u0027s worse is now we\u0027re using bit_start == 1, so we do bit_start +=\nsectors_per_node, which is now 5.  If that eb is dirty we will run into\nthe same thing, we will look at an offset that is not populated in the\nradix tree, and now we\u0027re skipping the writeout of dirty extent buffers.\n\nThe best fix for this is to not use sectorsize_bits to address nodes,\nbut that\u0027s a larger change.  Since this is a fs corruption problem fix\nit simply by always using sectors_per_node to increment the start bit."
    },
    {
      "lang": "es",
      "value": "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: btrfs: ajustar el inicio del bit de la subp\u00e1gina en funci\u00f3n del tama\u00f1o del sector Al ejecutar m\u00e1quinas con un tama\u00f1o de p\u00e1gina de 64k y un tama\u00f1o de nodo de 16k, comenzamos a ver corrupci\u00f3n en el registro del \u00e1rbol en producci\u00f3n. Esto result\u00f3 ser porque a veces no escrib\u00edamos bloques sucios, por lo que, de hecho, afecta a todas las escrituras de metadatos. Al escribir un EB de subp\u00e1gina, escaneamos el mapa de bits de la subp\u00e1gina en busca de un rango sucio. Si el rango no est\u00e1 sucio, hacemos bit_start++; para pasar al siguiente bit. El problema es que el mapa de bits se basa en la cantidad de sectores que tiene un EB. Entonces, en este caso, tenemos un tama\u00f1o de p\u00e1gina de 64k, un tama\u00f1o de nodo de 16k, pero un tama\u00f1o de sector de 4k. Esto significa que nuestro mapa de bits es de 4 bits para cada nodo. Con un tama\u00f1o de p\u00e1gina de 64k, terminamos con 4 nodos por p\u00e1gina. Para hacer esto m\u00e1s f\u00e1cil as\u00ed es como se ve todo [0 16k 32k 48k ] direcci\u00f3n l\u00f3gica [0 4 8 12 ] desplazamiento del \u00e1rbol de radix [ p\u00e1gina 64k ] folio [ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] b\u00faferes de extensi\u00f3n [ | | | | | | | | | | | | | | | | ] mapa de bits Ahora usamos todo nuestro direccionamiento basado en fs_info-\u0026gt;sectorsize_bits, as\u00ed que como puedes ver arriba nuestro eb-\u0026gt;start de 16k se convierte en la entrada de radix 4. Cuando encontramos un rango sucio para nuestro eb, hacemos correctamente bit_start += sectores_per_node, porque si empezamos en el bit 0, el siguiente bit para el siguiente eb es 4, para corresponder a eb-\u0026gt;start 16k. Sin embargo, si nuestro rango est\u00e1 limpio, haremos bit_start++, que ahora nos pondr\u00e1 en un desplazamiento desde nuestras entradas del \u00e1rbol de radix. 
En nuestro caso, supongamos que la primera vez que comprobamos el mapa de bits, el bloque no est\u00e1 sucio, incrementamos bit_start para que ahora sea == 1, y luego hacemos un bucle y comprobamos de nuevo. Esta vez est\u00e1 sucio, y vamos a encontrar ese inicio usando la siguiente ecuaci\u00f3n start = folio_start + bit_start * fs_info-\u0026gt;sectorsize; as\u00ed que en el caso anterior, eb-\u0026gt;start 0 ahora est\u00e1 sucio, y calculamos start como 0 + 1 * fs_info-\u0026gt;sectorsize = 4096 4096 \u0026gt;\u0026gt; 12 = 1 Ahora estamos buscando el \u00e1rbol de bases para 1, y no encontraremos un eb. Lo que es peor es que ahora estamos usando bit_start == 1, as\u00ed que hacemos bit_start += sectores_por_nodo, que ahora es 5. Si ese eb est\u00e1 sucio, nos encontraremos con lo mismo, veremos un desplazamiento que no est\u00e1 rellenado en el \u00e1rbol de bases, y ahora estamos omitiendo la escritura de los b\u00faferes de extensi\u00f3n sucios. La mejor soluci\u00f3n es no usar sectorsize_bits para direccionar nodos, pero ese es un cambio mayor. Dado que se trata de un problema de corrupci\u00f3n del sistema de archivos, corr\u00edjalo simplemente usando siempre sectores_por_nodo para incrementar el bit de inicio."
    }
  ],
  "id": "CVE-2025-37931",
  "lastModified": "2025-05-21T20:25:16.407",
  "metrics": {},
  "published": "2025-05-20T16:15:29.713",
  "references": [
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/396f4002710030ea1cfd4c789ebaf0a6969ab34f"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/b80db09b614cb7edec5bada1bc7c7b0eb3b453ea"
    },
    {
      "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
      "url": "https://git.kernel.org/stable/c/e08e49d986f82c30f42ad0ed43ebbede1e1e3739"
    }
  ],
  "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67",
  "vulnStatus": "Awaiting Analysis"
}

CVE-2025-37931 (GCVE-0-2025-37931)

Vulnerability from cvelistv5

Published

2025-05-20 15:21

Modified

2025-05-26 05:23

Severity ?

Summary

References

URL

Tags

	https://git.kernel.org/stable/c/b80db09b614cb7edec5bada1bc7c7b0eb3b453ea
	https://git.kernel.org/stable/c/396f4002710030ea1cfd4c789ebaf0a6969ab34f
	https://git.kernel.org/stable/c/e08e49d986f82c30f42ad0ed43ebbede1e1e3739

Impacted products

Vendor

Product

Version

Linux

Version: c4aec299fa8f73f0fd10bc556f936f0da50e3e83

Linux

Version: 5.13
