[PATCH V2 1/2] md: propagate BLK_FEAT_PCI_P2PDMA from member devices
Chaitanya Kulkarni
kch at nvidia.com
Wed Apr 8 00:25:36 PDT 2026
From: Kiran Kumar Modukuri <kmodukuri at nvidia.com>
MD RAID does not propagate BLK_FEAT_PCI_P2PDMA from member devices to
the RAID device, preventing peer-to-peer DMA through the RAID layer even
when all underlying devices support it.
Set BLK_FEAT_PCI_P2PDMA in the raid0, raid1 and raid10 personalities
during queue limits setup, and clear it if any member device lacks
support: in mddev_stack_rdev_limits() during array init, and in
mddev_stack_new_rdev() during hot-add. Parity RAID personalities
(raid4/5/6) are excluded because they need CPU access to data pages for
parity computation, which is incompatible with P2P mappings.
Tested with RAID0/1/10 arrays containing multiple NVMe devices with P2PDMA
support, confirming that peer-to-peer transfers work correctly through
the RAID layer.
Signed-off-by: Kiran Kumar Modukuri <kmodukuri at nvidia.com>
Signed-off-by: Chaitanya Kulkarni <kch at nvidia.com>
---
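For reviewers, the set-then-clear scheme from the commit message can be
modelled in userspace roughly as below. This is only a sketch under
stated assumptions: every demo_* name and DEMO_* constant is a
hypothetical stand-in, not a kernel symbol, and the bit values do not
match the real BLK_FEAT_* flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for block-layer feature bits (values are made up). */
#define DEMO_FEAT_PCI_P2PDMA (1u << 0)
#define DEMO_FEAT_OTHER      (1u << 1)

/* Hypothetical model of one member device of the array. */
struct demo_member {
	bool supports_p2pdma;
};

/*
 * Array init: the personality sets the bit optimistically, then the
 * stacking loop clears it as soon as one member lacks support.
 */
static unsigned int demo_stack_p2pdma(unsigned int features,
				      const struct demo_member *members,
				      size_t nr_members)
{
	size_t i;

	assert(members != NULL || nr_members == 0);

	features |= DEMO_FEAT_PCI_P2PDMA;
	for (i = 0; i < nr_members; i++) {
		if (!members[i].supports_p2pdma)
			features &= ~DEMO_FEAT_PCI_P2PDMA;
	}
	return features;
}

/* Hot-add: the bit is never set here, only cleared. */
static unsigned int demo_hot_add(unsigned int features,
				 const struct demo_member *new_member)
{
	assert(new_member != NULL);

	if (!new_member->supports_p2pdma)
		features &= ~DEMO_FEAT_PCI_P2PDMA;
	return features;
}
```

In this model the array keeps the feature only while every member
supports it, and hot-adding a member can remove the feature but never
add it back.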
 drivers/md/md.c     | 4 ++++
 drivers/md/raid0.c  | 1 +
 drivers/md/raid1.c  | 1 +
 drivers/md/raid10.c | 1 +
 4 files changed, 7 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 521d9b34cd9e..48d7a3ca8c66 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6176,6 +6176,8 @@ int mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim,
 		if ((flags & MDDEV_STACK_INTEGRITY) &&
 		    !queue_limits_stack_integrity_bdev(lim, rdev->bdev))
 			return -EINVAL;
+		if (!blk_queue_pci_p2pdma(rdev->bdev->bd_disk->queue))
+			lim->features &= ~BLK_FEAT_PCI_P2PDMA;
 	}

 	/*
@@ -6231,6 +6233,8 @@ int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
 	lim = queue_limits_start_update(mddev->gendisk->queue);
 	queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
 				mddev->gendisk->disk_name);
+	if (!blk_queue_pci_p2pdma(rdev->bdev->bd_disk->queue))
+		lim.features &= ~BLK_FEAT_PCI_P2PDMA;

 	if (!queue_limits_stack_integrity_bdev(&lim, rdev->bdev)) {
 		pr_err("%s: incompatible integrity profile for %pg\n",
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index ef0045db409f..1cdcafd31744 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -392,6 +392,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.io_opt = lim.io_min * mddev->raid_disks;
 	lim.chunk_sectors = mddev->chunk_sectors;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
+	lim.features |= BLK_FEAT_PCI_P2PDMA;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 16f671ab12c0..b25e661e9738 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3192,6 +3192,7 @@ static int raid1_set_limits(struct mddev *mddev)
 	lim.max_hw_wzeroes_unmap_sectors = 0;
 	lim.logical_block_size = mddev->logical_block_size;
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
+	lim.features |= BLK_FEAT_PCI_P2PDMA;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4901ebe45c87..07a5b734c8f3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3939,6 +3939,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
 	lim.chunk_sectors = mddev->chunk_sectors;
 	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
 	lim.features |= BLK_FEAT_ATOMIC_WRITES;
+	lim.features |= BLK_FEAT_PCI_P2PDMA;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err)
 		return err;
--
2.39.5