fkie_cve-2022-49394
Vulnerability from fkie_nvd
Published: 2025-02-26 07:01
Modified: 2025-02-26 07:01
Severity: not yet assigned
Summary
In the Linux kernel, the following vulnerability has been resolved:
blk-iolatency: Fix inflight count imbalances and IO hangs on offline
iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.
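For context, a minimal sketch of this freeze-protected toggle is shown below (C, kernel-flavored; blk_mq_freeze_queue() and blk_mq_unfreeze_queue() are real block-layer primitives, while the surrounding helper and the exact field names are illustrative, not the actual kernel source):

  /*
   * blk_mq_freeze_queue() drains the queue: it returns only after every
   * in-flight request has completed, so all per-cgroup inflight counters
   * are zero while the queue is frozen and the enabled counter can be
   * flipped without unbalancing them.
   */
  static void iolatency_toggle(struct request_queue *q,
                               struct blk_iolatency *blkiolat, bool enable)
  {
          blk_mq_freeze_queue(q);       /* waits for in-flight IOs to drain */
          if (enable)
                  atomic_inc(&blkiolat->enabled);
          else
                  atomic_dec(&blkiolat->enabled);
          blk_mq_unfreeze_queue(q);     /* IO resumes with balanced counts */
  }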
Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also decrement the counter
and trigger disabling. Because this disabling happens without freezing the
queue, it can easily occur while some IOs are in flight, leaking the counts.
This can be demonstrated easily by turning on iolatency for one empty cgroup
while IOs are in flight in other cgroups, and then removing that cgroup. Note
that iolatency must not be enabled anywhere else in the system, so that
removing the cgroup disables iolatency for the whole device.
The following keeps flipping iolatency on and off for sda:

  echo +io > /sys/fs/cgroup/cgroup.subtree_control
  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done
and there's concurrent fio traffic generating direct random reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k
while monitoring with the following drgn script:
  # Run inside the drgn CLI, which pre-imports the Linux helpers used below.
  import time

  while True:
      for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
          for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
              blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
              pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
              if pd.value_() == 0:
                  continue
              iolat = container_of(pd, 'struct iolatency_grp', 'pd')
              inflight = iolat.rq_wait.inflight.counter.value_()
              if inflight:
                  print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                        f'{cgroup_path(css.cgroup).decode("utf-8")}')
      time.sleep(1)
The monitoring output looks like the following:
inflight=1 sda /user.slice
inflight=1 sda /user.slice
...
inflight=14 sda /user.slice
inflight=13 sda /user.slice
inflight=17 sda /user.slice
inflight=15 sda /user.slice
inflight=18 sda /user.slice
inflight=17 sda /user.slice
inflight=20 sda /user.slice
inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
inflight=19 sda /user.slice
inflight=19 sda /user.slice
If a cgroup with a stuck inflight count ends up getting throttled, the
throttled IOs will never be issued, as there is no completion event to wake
them up, leading to an indefinite hang.
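The hang follows from how this style of throttling pairs sleeps with completions. Roughly, in simplified form (illustrative C fragments, not the exact blk-rq-qos source; atomic_inc_below() and the rqw/max_depth names are modeled on the kernel's internal helpers):

  /* Submission side: sleep until inflight fits under the allowed depth. */
  while (!atomic_inc_below(&rqw->inflight, max_depth))
          io_schedule();                  /* only a completion wakes us */

  /* Completion side: the matching decrement and wakeup. */
  if (atomic_dec_return(&rqw->inflight) < max_depth)
          wake_up(&rqw->wait);

With a leaked inflight count, the submission side keeps failing the depth check while no completion is pending to perform the wake_up(), so the submitter sleeps forever.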
This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec(), which is
called from both the iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary because iolatency_pd_offline() is called
under spinlocks, while freezing a request_queue requires a sleepable context.
This also simplifies the code, reducing the line count (excluding comments),
and avoids the unnecessary freezes which previously happened whenever a
cgroup's latency target was newly set or cleared.
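A hedged sketch of the fix's shape (the function and field names approximate the patch and may differ from the final commit):

  /* Runs in process context, where freezing (which sleeps) is legal. */
  static void blk_iolatency_enable_work_fn(struct work_struct *work)
  {
          struct blk_iolatency *blkiolat =
                  container_of(work, struct blk_iolatency, enable_work);
          bool enabled = atomic_read(&blkiolat->enable_cnt);

          if (enabled != blkiolat->enabled) {
                  blk_mq_freeze_queue(blkiolat->rqos.q);  /* no IO in flight */
                  blkiolat->enabled = enabled;
                  blk_mq_unfreeze_queue(blkiolat->rqos.q);
          }
  }

  /* iolatency_set_min_lat_nsec(), reached from both iolatency_set_limit()
   * and iolatency_pd_offline() (the latter under spinlocks), only
   * adjusts enable_cnt and punts: */
  schedule_work(&blkiolat->enable_work);

Because both paths funnel through the same work item, the enabled state is only ever changed with the queue frozen, which keeps the inflight counters balanced.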
References
- https://git.kernel.org/stable/c/515d077ee3085ae343b6bea7fd031f9906645f38
- https://git.kernel.org/stable/c/5b0ff3ebbef791341695b718f8d2870869cf1d01
- https://git.kernel.org/stable/c/77692c02e1517c54f2fd0535f41aa4286ac9f140
- https://git.kernel.org/stable/c/8a177a36da6c54c98b8685d4f914cb3637d53c0d
- https://git.kernel.org/stable/c/968f7a239c590454ffba79c126fbe0e963a0ba78
- https://git.kernel.org/stable/c/a30acbb5dfb7bcc813ad6a18ca31011ac44e5547
- https://git.kernel.org/stable/c/d19fa8f252000d141f9199ca32959c50314e1f05
Impacted products
No impacted product entries have been published yet (vulnerability status: Awaiting Analysis).
{ "cveTags": [], "descriptions": [ { "lang": "en", "value": "In the Linux kernel, the following vulnerability has been resolved:\n\nblk-iolatency: Fix inflight count imbalances and IO hangs on offline\n\niolatency needs to track the number of inflight IOs per cgroup. As this\ntracking can be expensive, it is disabled when no cgroup has iolatency\nconfigured for the device. To ensure that the inflight counters stay\nbalanced, iolatency_set_limit() freezes the request_queue while manipulating\nthe enabled counter, which ensures that no IO is in flight and thus all\ncounters are zero.\n\nUnfortunately, iolatency_set_limit() isn\u0027t the only place where the enabled\ncounter is manipulated. iolatency_pd_offline() can also dec the counter and\ntrigger disabling. As this disabling happens without freezing the q, this\ncan easily happen while some IOs are in flight and thus leak the counts.\n\nThis can be easily demonstrated by turning on iolatency on an one empty\ncgroup while IOs are in flight in other cgroups and then removing the\ncgroup. Note that iolatency shouldn\u0027t have been enabled elsewhere in the\nsystem to ensure that removing the cgroup disables iolatency for the whole\ndevice.\n\nThe following keeps flipping on and off iolatency on sda:\n\n echo +io \u003e /sys/fs/cgroup/cgroup.subtree_control\n while true; do\n mkdir -p /sys/fs/cgroup/test\n echo \u00278:0 target=100000\u0027 \u003e /sys/fs/cgroup/test/io.latency\n sleep 1\n rmdir /sys/fs/cgroup/test\n sleep 1\n done\n\nand there\u0027s concurrent fio generating direct rand reads:\n\n fio --name test --filename=/dev/sda --direct=1 --rw=randread \\\n --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k\n\nwhile monitoring with the following drgn script:\n\n while True:\n for css in css_for_each_descendant_pre(prog[\u0027blkcg_root\u0027].css.address_of_()):\n for pos in hlist_for_each(container_of(css, \u0027struct blkcg\u0027, \u0027css\u0027).blkg_list):\n blkg = container_of(pos, \u0027struct blkcg_gq\u0027, \u0027blkcg_node\u0027)\n pd = blkg.pd[prog[\u0027blkcg_policy_iolatency\u0027].plid]\n if pd.value_() == 0:\n continue\n iolat = container_of(pd, \u0027struct iolatency_grp\u0027, \u0027pd\u0027)\n inflight = iolat.rq_wait.inflight.counter.value_()\n if inflight:\n print(f\u0027inflight={inflight} {disk_name(blkg.q.disk).decode(\"utf-8\")} \u0027\n f\u0027{cgroup_path(css.cgroup).decode(\"utf-8\")}\u0027)\n time.sleep(1)\n\nThe monitoring output looks like the following:\n\n inflight=1 sda /user.slice\n inflight=1 sda /user.slice\n ...\n inflight=14 sda /user.slice\n inflight=13 sda /user.slice\n inflight=17 sda /user.slice\n inflight=15 sda /user.slice\n inflight=18 sda /user.slice\n inflight=17 sda /user.slice\n inflight=20 sda /user.slice\n inflight=19 sda /user.slice \u003c- fio stopped, inflight stuck at 19\n inflight=19 sda /user.slice\n inflight=19 sda /user.slice\n\nIf a cgroup with stuck inflight ends up getting throttled, the throttled IOs\nwill never get issued as there\u0027s no completion event to wake it up leading\nto an indefinite hang.\n\nThis patch fixes the bug by unifying enable handling into a work item which\nis automatically kicked off from iolatency_set_min_lat_nsec() which is\ncalled from both iolatency_set_limit() and iolatency_pd_offline() paths.\nPunting to a work item is necessary as iolatency_pd_offline() is called\nunder spinlocks while freezing a request_queue requires a sleepable context.\n\nThis also simplifies the code reducing LOC sans the comments and avoids the\nunnecessary 
freezes which were happening whenever a cgroup\u0027s latency target\nis newly set or cleared." }, { "lang": "es", "value": "En el kernel de Linux, se ha resuelto la siguiente vulnerabilidad: blk-iolatency: corrige los desequilibrios de recuento en vuelo y los bloqueos de IO en modo sin conexi\u00f3n iolatency necesita rastrear la cantidad de IO en vuelo por cgroup. Como este seguimiento puede ser costoso, se deshabilita cuando ning\u00fan cgroup tiene iolatency configurado para el dispositivo. Para garantizar que los contadores en vuelo se mantengan equilibrados, iolatency_set_limit() congela la request_queue mientras manipula el contador habilitado, lo que garantiza que no haya IO en vuelo y, por lo tanto, todos los contadores sean cero. Desafortunadamente, iolatency_set_limit() no es el \u00fanico lugar donde se manipula el contador habilitado. iolatency_pd_offline() tambi\u00e9n puede dec el contador y activar la desactivaci\u00f3n. Como esta desactivaci\u00f3n ocurre sin congelar el q, esto puede suceder f\u00e1cilmente mientras algunas IO est\u00e1n en vuelo y, por lo tanto, filtrar los recuentos. Esto se puede demostrar f\u00e1cilmente activando iolatency en un cgroup vac\u00edo mientras los IO est\u00e1n en tr\u00e1nsito en otros cgroups y luego eliminando el cgroup. Tenga en cuenta que iolatency no deber\u00eda haberse habilitado en ninguna otra parte del sistema para garantizar que la eliminaci\u00f3n del cgroup deshabilite iolatency para todo el dispositivo. Lo siguiente sigue activando y desactivando iolatency on sda: echo +io \u0026gt; /sys/fs/cgroup/cgroup.subtree_control while true; do mkdir -p /sys/fs/cgroup/test echo \u00278:0 target=100000\u0027 \u0026gt; /sys/fs/cgroup/test/io.latency sleep 1 rmdir /sys/fs/cgroup/test sleep 1 done and there\u0027s concurrent fio generating direct rand reads: fio --name test --filename=/dev/sda --direct=1 --rw=randread \\ --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k while monitoring with the following drgn script: while True: for css in css_for_each_descendant_pre(prog[\u0027blkcg_root\u0027].css.address_of_()): for pos in hlist_for_each(container_of(css, \u0027struct blkcg\u0027, \u0027css\u0027).blkg_list): blkg = container_of(pos, \u0027struct blkcg_gq\u0027, \u0027blkcg_node\u0027) pd = blkg.pd[prog[\u0027blkcg_policy_iolatency\u0027].plid] if pd.value_() == 0: continue iolat = container_of(pd, \u0027struct iolatency_grp\u0027, \u0027pd\u0027) inflight = iolat.rq_wait.inflight.counter.value_() if inflight: print(f\u0027inflight={inflight} {disk_name(blkg.q.disk).decode(\"utf-8\")} \u0027 f\u0027{cgroup_path(css.cgroup).decode(\"utf-8\")}\u0027) time.sleep(1) The monitoring output looks like the following: inflight=1 sda /user.slice inflight=1 sda /user.slice ... inflight=14 sda /user.slice inflight=13 sda /user.slice inflight=17 sda /user.slice inflight=15 sda /user.slice inflight=18 sda /user.slice inflight=17 sda /user.slice inflight=20 sda /user.slice inflight=19 sda /user.slice \u0026lt;- fio stopped, inflight stuck at 19 inflight=19 sda /user.slice inflight=19 sda /user.slice Si un cgroup con inflight atascado termina siendo limitado, las IO limitadas nunca se emitir\u00e1n ya que no hay un evento de finalizaci\u00f3n para despertarlo, lo que genera un bloqueo indefinido. 
Este parche corrige el error al unificar la gesti\u00f3n de habilitaci\u00f3n en un elemento de trabajo que se inicia autom\u00e1ticamente desde iolatency_set_min_lat_nsec(), que se llama desde las rutas iolatency_set_limit() y iolatency_pd_offline(). Es necesario apuntar a un elemento de trabajo ya que iolatency_pd_offline() se llama bajo bloqueos de giro, mientras que congelar una cola de solicitudes requiere un contexto que se pueda suspender. Esto tambi\u00e9n simplifica el c\u00f3digo, lo que reduce el LOC sin los comentarios y evita los bloqueos innecesarios que ocurr\u00edan cada vez que se configuraba o borraba el objetivo de latencia de un cgroup." } ], "id": "CVE-2022-49394", "lastModified": "2025-02-26T07:01:15.983", "metrics": {}, "published": "2025-02-26T07:01:15.983", "references": [ { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/515d077ee3085ae343b6bea7fd031f9906645f38" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/5b0ff3ebbef791341695b718f8d2870869cf1d01" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/77692c02e1517c54f2fd0535f41aa4286ac9f140" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/8a177a36da6c54c98b8685d4f914cb3637d53c0d" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/968f7a239c590454ffba79c126fbe0e963a0ba78" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/a30acbb5dfb7bcc813ad6a18ca31011ac44e5547" }, { "source": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "url": "https://git.kernel.org/stable/c/d19fa8f252000d141f9199ca32959c50314e1f05" } ], "sourceIdentifier": "416baaa9-dc9f-4396-8d5f-8c081fb06d67", "vulnStatus": "Awaiting Analysis" }
Sightings
No sightings have been reported for this vulnerability yet.
Nomenclature
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.