ghsa-pvrm-6wq8-v932
Vulnerability from github
Published
2025-02-27 03:34
Modified
2025-02-27 03:34
Details

In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error

This patch addresses a race condition for an ODP MR that can result in a CQE with an error on the UMR QP.

During the __mlx5_ib_dereg_mr() flow, the following sequence of calls occurs:

mlx5_revoke_mr() mlx5r_umr_revoke_mr() mlx5r_umr_post_send_wait()

At this point, the lkey is freed from the hardware's perspective.

However, concurrently, mlx5_ib_invalidate_range() might be triggered by another task attempting to invalidate a range for the same freed lkey.

This task will: - Acquire the umem_odp->umem_mutex lock. - Call mlx5r_umr_update_xlt() on the UMR QP. - Since the lkey has already been freed, this can lead to a CQE error, causing the UMR QP to enter an error state [1].

To resolve this race condition, the umem_odp->umem_mutex lock is now also acquired as part of the mlx5_revoke_mr() scope. Upon successful revoke, we set umem_odp->private which points to that MR to NULL, preventing any further invalidation attempts on its lkey.

[1] From dmesg:

infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2

WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib] [..] Call Trace: mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib] mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib] __mmu_notifier_invalidate_range_start+0x1e1/0x240 zap_page_range_single+0xf1/0x1a0 madvise_vma_behavior+0x677/0x6e0 do_madvise+0x1a2/0x4b0 __x64_sys_madvise+0x25/0x30 do_syscall_64+0x6b/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Show details on source website


{
  "affected": [],
  "aliases": [
    "CVE-2025-21732"
  ],
  "database_specific": {
    "cwe_ids": [],
    "github_reviewed": false,
    "github_reviewed_at": null,
    "nvd_published_at": "2025-02-27T03:15:13Z",
    "severity": null
  },
  "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nRDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error\n\nThis patch addresses a race condition for an ODP MR that can result in a\nCQE with an error on the UMR QP.\n\nDuring the __mlx5_ib_dereg_mr() flow, the following sequence of calls\noccurs:\n\nmlx5_revoke_mr()\n mlx5r_umr_revoke_mr()\n mlx5r_umr_post_send_wait()\n\nAt this point, the lkey is freed from the hardware\u0027s perspective.\n\nHowever, concurrently, mlx5_ib_invalidate_range() might be triggered by\nanother task attempting to invalidate a range for the same freed lkey.\n\nThis task will:\n - Acquire the umem_odp-\u003eumem_mutex lock.\n - Call mlx5r_umr_update_xlt() on the UMR QP.\n - Since the lkey has already been freed, this can lead to a CQE error,\n   causing the UMR QP to enter an error state [1].\n\nTo resolve this race condition, the umem_odp-\u003eumem_mutex lock is now also\nacquired as part of the mlx5_revoke_mr() scope.  Upon successful revoke,\nwe set umem_odp-\u003eprivate which points to that MR to NULL, preventing any\nfurther invalidation attempts on its lkey.\n\n[1] From dmesg:\n\n   infiniband rocep8s0f0: dump_cqe:277:(pid 0): WC error: 6, Message: memory bind operation error\n   cqe_dump: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n   cqe_dump: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n   cqe_dump: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n   cqe_dump: 00000030: 00 00 00 00 08 00 78 06 25 00 11 b9 00 0e dd d2\n\n   WARNING: CPU: 15 PID: 1506 at drivers/infiniband/hw/mlx5/umr.c:394 mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]\n   Modules linked in: ip6table_mangle ip6table_natip6table_filter ip6_tables iptable_mangle xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core\n   CPU: 15 UID: 0 PID: 1506 Comm: ibv_rc_pingpong Not tainted 6.12.0-rc7+ #1626\n   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014\n   RIP: 0010:mlx5r_umr_post_send_wait+0x15a/0x2b0 [mlx5_ib]\n   [..]\n   Call Trace:\n   \u003cTASK\u003e\n   mlx5r_umr_update_xlt+0x23c/0x3e0 [mlx5_ib]\n   mlx5_ib_invalidate_range+0x2e1/0x330 [mlx5_ib]\n   __mmu_notifier_invalidate_range_start+0x1e1/0x240\n   zap_page_range_single+0xf1/0x1a0\n   madvise_vma_behavior+0x677/0x6e0\n   do_madvise+0x1a2/0x4b0\n   __x64_sys_madvise+0x25/0x30\n   do_syscall_64+0x6b/0x140\n   entry_SYSCALL_64_after_hwframe+0x76/0x7e",
  "id": "GHSA-pvrm-6wq8-v932",
  "modified": "2025-02-27T03:34:03Z",
  "published": "2025-02-27T03:34:03Z",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-21732"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/5297f5ddffef47b94172ab0d3d62270002a3dcc1"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/abb604a1a9c87255c7a6f3b784410a9707baf467"
    },
    {
      "type": "WEB",
      "url": "https://git.kernel.org/stable/c/b13d32786acabf70a7b04ed24b7468fc3c82977c"
    }
  ],
  "schema_version": "1.4.0",
  "severity": []
}


Log in or create an account to share your comment.




Tags
Taxonomy of the tags.


Loading…

Loading…

Loading…

Sightings

Author Source Type Date

Nomenclature

  • Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
  • Confirmed: The vulnerability is confirmed from an analyst perspective.
  • Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
  • Patched: This vulnerability was successfully patched by the user reporting the sighting.
  • Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
  • Not confirmed: The user expresses doubt about the veracity of the vulnerability.
  • Not patched: This vulnerability was not successfully patched by the user reporting the sighting.


Loading…

Loading…