GHSA-qv4f-mr6g-r994
Vulnerability from GitHub
Published: 2025-04-18 15:31
Modified: 2025-07-17 18:31
Details

In the Linux kernel, the following vulnerability has been resolved:

drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

RLCG register access is the mechanism virtual functions (VFs) use to safely access GPU registers in a virtualized environment, including for TLB flushes and register reads. When multiple threads or VFs touch the same registers simultaneously, race conditions can occur, so the driver serializes access through the RLCG interface: only one thread accesses the registers at a time, preventing conflicts and ensuring operations complete correctly. This serialization was implemented with a mutex, which creates two problems: a low-priority task holding the mutex can block a high-priority task that needs it (priority inversion), and a thread that already holds a spinlock must not acquire a mutex at all, because a mutex may sleep. The register access in amdgpu_virt_rlcg_reg_rw sits on a fast code path, so the choice of lock there is critical.
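As a rough sketch of the change the patch title describes (the structure and helpers below are simplified stand-ins rather than the upstream amdgpu code; only the rlcg_reg_lock name and the mutex-to-spinlock conversion come from this advisory), the pattern looks like this:

    /*
     * Hypothetical sketch only: the struct and helpers are stand-ins,
     * not the real amdgpu code.
     */
    #include <linux/mutex.h>
    #include <linux/spinlock.h>

    struct rlcg_demo {
            struct mutex rlcg_reg_mutex;  /* before: sleeping lock */
            spinlock_t   rlcg_reg_lock;   /* after: non-sleeping lock */
    };

    static void rlcg_demo_init(struct rlcg_demo *d)
    {
            mutex_init(&d->rlcg_reg_mutex);
            spin_lock_init(&d->rlcg_reg_lock);
    }

    /* Before: may sleep, so it must not run with a spinlock held. */
    static void rlcg_reg_rw_before(struct rlcg_demo *d)
    {
            mutex_lock(&d->rlcg_reg_mutex);
            /* ... program the RLCG scratch registers, poll for completion ... */
            mutex_unlock(&d->rlcg_reg_mutex);
    }

    /* After: never sleeps, so it can nest under other spinlocks on the fast path. */
    static void rlcg_reg_rw_after(struct rlcg_demo *d)
    {
            unsigned long flags;

            spin_lock_irqsave(&d->rlcg_reg_lock, flags);
            /* ... program the RLCG scratch registers, poll for completion ... */
            spin_unlock_irqrestore(&d->rlcg_reg_lock, flags);
    }

A spinlock preserves the one-thread-at-a-time guarantee while never sleeping, so the helper can safely be reached from code that already holds other spinlocks; the trade-off is that waiters busy-wait for the short register transaction instead of being scheduled out.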

The call stack shows amdgpu_virt_rlcg_reg_rw being called and attempting to acquire the mutex. It is invoked from amdgpu_sriov_wreg, which in turn is called from gmc_v11_0_flush_gpu_tlb.

The [ BUG: Invalid wait context ] report indicates that a thread is trying to acquire a mutex (a sleeping lock) from a context in which it is not allowed to sleep, such as while holding a spinlock.
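In generic terms, the report fires on the shape of code sketched below. The fragment is purely illustrative: invalidate_lock and reg_lock are made-up stand-ins for adev->gmc.invalidate_lock and adev->virt.rlcg_reg_lock, and the function is not the actual amdgpu call chain.

    #include <linux/mutex.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(invalidate_lock);  /* stand-in for adev->gmc.invalidate_lock */
    static DEFINE_MUTEX(reg_lock);            /* stand-in for adev->virt.rlcg_reg_lock  */

    static void flush_tlb_like_path(void)
    {
            unsigned long flags;

            spin_lock_irqsave(&invalidate_lock, flags);  /* enters a non-sleeping (atomic) context */

            /*
             * Acquiring a mutex may sleep, which is forbidden while a
             * spinlock is held; with lockdep enabled this is flagged as
             * "BUG: Invalid wait context".
             */
            mutex_lock(&reg_lock);
            /* ... RLCG-style register write ... */
            mutex_unlock(&reg_lock);

            spin_unlock_irqrestore(&invalidate_lock, flags);
    }

Replacing the inner mutex with a spinlock, as the patch does, removes the sleeping lock from the atomic section.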

Fixes the below:

[  253.013423] =============================
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE
[  253.013464] -----------------------------
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] ffff9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[  253.013877]  #1: ffffb789c008be40 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: ffff9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  <TASK>
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.0170
---truncated---



{
  "affected": [],
  "aliases": [
    "CVE-2025-38104"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-04-18T07:15:43Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\ndrm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV\n\nRLCG Register Access is a way for virtual functions to safely access GPU\nregisters in a virtualized environment., including TLB flushes and\nregister reads. When multiple threads or VFs try to access the same\nregisters simultaneously, it can lead to race conditions. By using the\nRLCG interface, the driver can serialize access to the registers. This\nmeans that only one thread can access the registers at a time,\npreventing conflicts and ensuring that operations are performed\ncorrectly. Additionally, when a low-priority task holds a mutex that a\nhigh-priority task needs, ie., If a thread holding a spinlock tries to\nacquire a mutex, it can lead to priority inversion. register access in\namdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.\n\nThe call stack shows that the function amdgpu_virt_rlcg_reg_rw is being\ncalled, which attempts to acquire the mutex. This function is invoked\nfrom amdgpu_sriov_wreg, which in turn is called from\ngmc_v11_0_flush_gpu_tlb.\n\nThe [ BUG: Invalid wait context ] indicates that a thread is trying to\nacquire a mutex while it is in a context that does not allow it to sleep\n(like holding a spinlock).\n\nFixes the below:\n\n[  253.013423] =============================\n[  253.013434] [ BUG: Invalid wait context ]\n[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G     U     OE\n[  253.013464] -----------------------------\n[  253.013475] kworker/0:1/10 is trying to lock:\n[  253.013487] ffff9f30542e3cf8 (\u0026adev-\u003evirt.rlcg_reg_lock){+.+.}-{3:3}, at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]\n[  253.013815] other info that might help us debug this:\n[  253.013827] context-{4:4}\n[  253.013835] 3 locks held by kworker/0:1/10:\n[  253.013847]  #0: ffff9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680\n[  253.013877]  #1: ffffb789c008be40 ((work_completion)(\u0026wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680\n[  253.013905]  #2: ffff9f3054281838 (\u0026adev-\u003egmc.invalidate_lock){+.+.}-{2:2}, at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]\n[  253.014154] stack backtrace:\n[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G     U     OE      6.12.0-amdstaging-drm-next-lol-050225 #14\n[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE\n[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024\n[  253.014224] Workqueue: events work_for_cpu_fn\n[  253.014241] Call Trace:\n[  253.014250]  \u003cTASK\u003e\n[  253.014260]  dump_stack_lvl+0x9b/0xf0\n[  253.014275]  dump_stack+0x10/0x20\n[  253.014287]  __lock_acquire+0xa47/0x2810\n[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5\n[  253.014321]  lock_acquire+0xd1/0x300\n[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]\n[  253.014562]  ? __lock_acquire+0xa6b/0x2810\n[  253.014578]  __mutex_lock+0x85/0xe20\n[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]\n[  253.014782]  ? sched_clock_noinstr+0x9/0x10\n[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5\n[  253.014808]  ? local_clock_noinstr+0xe/0xc0\n[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]\n[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5\n[  253.015029]  mutex_lock_nested+0x1b/0x30\n[  253.015044]  ? 
mutex_lock_nested+0x1b/0x30\n[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]\n[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]\n[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]\n[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]\n[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]\n[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5\n[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]\n[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]\n[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]\n[  253.0170\n---truncated---",
  "id": "GHSA-qv4f-mr6g-r994",
  "modified": "2025-07-17T18:31:09Z",
  "published": "2025-04-18T15:31:38Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-38104"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/07ed75bfa7ede8bfcfa303fd6efc85db1c8684c7"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/1c0378830e42c98acd69e0289882c8637d92f285"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/5c1741a0c176ae11675a64cb7f2dd21d72db6b91"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/dc0297f3198bd60108ccbd167ee5d9fa4af31ed0"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}


