[bug report] NVMe/IB: reset_controller needs more than 1min
Max Gurtovoy
mgurtovoy at nvidia.com
Sun Mar 20 08:11:29 PDT 2022
On 3/20/2022 3:03 PM, Sagi Grimberg wrote:
>
>>>> These are very long times for a non-debug kernel...
>>>> Max, do you see the root cause for this?
>>>>
>>>> Yi, does this happen with rxe/siw as well?
>>> Hi Sagi
>>>
>>> rxe/siw take less than 1s.
>>> with rdma_rxe:
>>> # time nvme reset /dev/nvme0
>>> real 0m0.094s
>>> user 0m0.000s
>>> sys 0m0.006s
>>>
>>> with siw:
>>> # time nvme reset /dev/nvme0
>>> real 0m0.097s
>>> user 0m0.000s
>>> sys 0m0.006s
>>>
>>> This is only reproducible with a Mellanox IB card. As I mentioned
>>> before, the reset time went from 3s to 12s after the commit below;
>>> could you check it?
>>>
>>> commit 5ec5d3bddc6b912b7de9e3eb6c1f2397faeca2bc
>>> Author: Max Gurtovoy <maxg at mellanox.com>
>>> Date: Tue May 19 17:05:56 2020 +0300
>>>
>>> nvme-rdma: add metadata/T10-PI support
>>>
>> I couldn't repro these long reset times.
>
> It appears to be when setting up a controller with lots of queues
> maybe?
I'm doing that.
>
>> Nevertheless, the above commit added T10-PI offloads.
>>
>> In this commit, for supported devices we create extra resources in HW
>> (more memory keys per task).
>>
>> I suggested doing this configuration as part of the "nvme connect"
>> command, so that the extra allocation is skipped by default, but
>> during the review I was asked to make it the default behavior.
>
> Don't know if I gave you this feedback or not, but it probably didn't
> occur to the commenter that it would make connection establishment
> take tens of seconds.
It was known that we create more resources than needed for
"non-PI-desired" controllers.
>
>> Sagi/Christoph,
>>
>> WDYT? Should we reconsider the "nvme connect --with_metadata" option?
>
> Maybe you can make these lazily allocated?
You mean something like:
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd4720d37cc0..367ba0bb62ab 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1620,10 +1620,19 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
-				u32 max_integrity_segments)
+static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
 {
 	struct blk_integrity integrity = { };
+	u16 ms = ns->ms;
+	u8 pi_type = ns->pi_type;
+	u32 max_integrity_segments = ns->ctrl->max_integrity_segments;
+	int ret;
+
+	if (ns->ctrl->ops->init_integrity) {
+		ret = ns->ctrl->ops->init_integrity(ns->ctrl);
+		if (ret)
+			return ret;
+	}
 
 	switch (pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
@@ -1644,11 +1653,13 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
 	integrity.tuple_size = ms;
 	blk_integrity_register(disk, &integrity);
 	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
+
+	return 0;
 }
 #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
-				u32 max_integrity_segments)
+static int nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns)
 {
+	return 0;
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
@@ -1853,8 +1864,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms) {
 		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
-		    (ns->features & NVME_NS_METADATA_SUPPORTED))
-			nvme_init_integrity(disk, ns->ms, ns->pi_type,
-					    ns->ctrl->max_integrity_segments);
-		else if (!nvme_ns_has_pi(ns))
+		    (ns->features & NVME_NS_METADATA_SUPPORTED)) {
+			if (nvme_init_integrity(disk, ns))
+				capacity = 0;
+		} else if (!nvme_ns_has_pi(ns))
 			capacity = 0;
 	}
@@ -4395,7 +4406,7 @@ EXPORT_SYMBOL_GPL(nvme_stop_ctrl);
and create the resources for the first namespace we find that is
PI-formatted?
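The transport side of that callback could then look something like the
sketch below (hypothetical; ib_mr_pool_init() and IB_MR_TYPE_INTEGRITY
are the real kernel APIs, while pi_mrs_allocated, pages_per_mr and the
callback wiring are illustrative names only):

static int nvme_rdma_init_integrity(struct nvme_ctrl *nctrl)
{
	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
	int i, ret;

	/* already allocated for this controller, nothing to do */
	if (ctrl->pi_mrs_allocated)
		return 0;

	/*
	 * Allocate the integrity MR pools only now, on the first
	 * PI-formatted namespace we revalidate.
	 */
	for (i = 1; i < nctrl->queue_count; i++) {
		struct nvme_rdma_queue *queue = &ctrl->queues[i];

		ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
				      queue->queue_size, IB_MR_TYPE_INTEGRITY,
				      queue->pages_per_mr,
				      queue->pages_per_mr);
		if (ret)
			return ret;
	}

	ctrl->pi_mrs_allocated = true;
	return 0;
}

with ".init_integrity = nvme_rdma_init_integrity" added to
nvme_rdma_ctrl_ops, so non-PI workloads never pay for the extra memory
keys.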