[PATCH] nvme: fix SRCU protection of nvme_ns_head list

Caleb Sander csander at purestorage.com
Thu Dec 1 13:27:16 PST 2022


On Tue, Nov 22, 2022 at 7:08 AM Sagi Grimberg <sagi at grimberg.me> wrote:
> ...
> The original patch report did not include any sequence that removes all
> namespaces, and given that it came from RockyLinux 8.6 kernel, it is not
> 6.1... Hence I think that we need to understand how a namespace removal
> happened at the same time that the namespace is being scanned. Maybe
> something else is broken.

After some more debugging, I found the main cause:
ns->disk isn't set until after ns is linked into the ns_head list.
The ns is linked in nvme_init_ns_head(),
which nvme_alloc_ns() calls before creating the disk:
static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
struct nvme_ns_ids *ids)
{
// ...
if (nvme_init_ns_head(ns, nsid, ids, id->nmic & NVME_NS_NMIC_SHARED))
goto out_free_queue;

disk = alloc_disk_node(0, node);
if (!disk)
goto out_unlink_ns;
// ...
ns->disk = disk;
// ...
}

It looks like this has never been a problem upstream:
5f432cceb3e9 ("nvme: use blk_mq_alloc_disk") fixed the order before
e7d65803e2bb ("nvme-multipath: revalidate paths during rescan") was committed.
Strange that the Red Hat 8.6 kernel has pulled in only the later commit.

The list traversal here still needs to be protected by (S)RCU,
so I would still push for this patch.
Sagi had suggested using RCU here instead, but discussing with Chris,
it sounds like we reached a consensus to consistently protect this list
with SRCU. We can evaluate switching to RCU later if needed,
but since this is not in the I/O path, I don't see a big performance concern.



More information about the Linux-nvme mailing list