[PATCH nvme-cli] nvme: warn about attaching a namespace to unknown controller

Keith Busch kbusch at kernel.org
Mon Jun 24 12:32:25 PDT 2024


On Sat, Jun 22, 2024 at 04:05:01PM +0530, Nilay Shroff wrote:
> 
> 
> On 6/21/24 20:38, Keith Busch wrote:
> > On Thu, Jun 20, 2024 at 06:25:45PM +0530, Nilay Shroff wrote:
> >> Sometimes it's possible for a multi-controller NVMe disk to have only
> >> one of its controllers discovered by the kernel. And if this happens
> >> then it's also possible for a user to create and then attach a namespace
> >> to a controller which has not been discovered by the kernel. In such a
> >> case the attached namespace can't be used for IO because there's no path
> >> to reach such namespace from the kernel.
> > 
> > Isn't that a pretty normal thing to do, though? Like for an sriov
> > situation, a primary controller attaches namespaces to secondaries, but
> > not to itself.
> Yes correct, but in case of sriov, AFAIK, the "nvme list-secondary ...." shall 
> not list any secondary controller unless the secondary controller's corresponding 
> VF is enabled (i.e. /sys/class/nvme/nvmeX/device/sriov_numvfs set to at least the 
> secondary controller's VF number). 
> 
> So it means that before attaching a namespace to secondaries, the user would first
> set sriov_numvfs. Setting numvfs would then enable the kernel to discover all
> secondary controllers.

You'd usually bind those to vfio, not nvme. So while the kernel knows
those pci functions exist, the nvme driver doesn't necessarily know
about these.

And that's not even an sriov-exclusive thing. You can bind other PFs
just the same, though they just don't have the resource provisioning
features that VFs have. But you can still attach and detach namespaces
to/from them all the same.
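
To illustrate the point about driver binding, here is a minimal sketch that
reports which driver each NVMe-class PCI function is bound to. It relies on
the standard sysfs layout (a "class" attribute and a "driver" symlink under
/sys/bus/pci/devices/<BDF>/). The function name nvme_function_drivers and
its sysfs_pci parameter are hypothetical, chosen only for this example; this
is not something nvme-cli does today.

```python
import os

# PCI class code: mass storage (0x01), NVM (0x08), NVMe I/O controller (0x02).
NVME_PCI_CLASS = 0x010802


def nvme_function_drivers(sysfs_pci="/sys/bus/pci/devices"):
    """Map each NVMe-class PCI function (BDF) to its bound driver name.

    The value is None when no driver is bound to the function.
    """
    result = {}
    for bdf in sorted(os.listdir(sysfs_pci)):
        dev = os.path.join(sysfs_pci, bdf)
        try:
            with open(os.path.join(dev, "class")) as f:
                pci_class = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue  # not a readable PCI device entry, skip it
        if pci_class != NVME_PCI_CLASS:
            continue
        link = os.path.join(dev, "driver")
        # "driver" is a symlink to the bound driver's directory, if any.
        driver = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
        result[bdf] = driver
    return result
```

On a host where one function is handed to vfio-pci, the result would contain
one entry mapped to "nvme" and another mapped to "vfio-pci". The kernel knows
about both functions, but only the first is visible to the nvme driver.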
 
> However, in the proposed patch we're trying to address a case where "nvme list-ctrl ..."
> shows multiple controller entries, but only one of those controllers is discovered
> by the kernel. All controllers shown in the output of "nvme list-ctrl ..." are physical
> controllers on the nvme disk. In this case attaching a namespace to any of the
> undiscovered controllers has a side effect due to which the namespace is rendered unusable.

Unusable from the driver the admin ran the command from. Some other
machine (real or virtual) may be able to use it, or maybe even the same
machine, if you handed some function to a user space driver, and that's
not really an error.

> This patch tries to address this issue by showing a WARNING to the user and optionally
> allowing the user to cancel the attach operation. However, if the user still prefers to
> go ahead without cancelling the attach operation, then nvme-cli would execute this
> command as usual.
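
For reference, the check described in the quoted text could be sketched
roughly as follows. This is an illustration only and not the code from the
patch: it assumes the nvme driver exposes "cntlid" and "subsysnqn"
attributes under /sys/class/nvme/nvmeX/, and the function name
undiscovered_cntlids and its parameters are hypothetical.

```python
import os


def undiscovered_cntlids(target_cntlids, subsysnqn, sysfs_nvme="/sys/class/nvme"):
    """Return the target controller IDs that the nvme driver has not discovered.

    Only controllers belonging to the given subsystem NQN are considered,
    so a matching controller ID on an unrelated device does not count.
    """
    known = set()
    for name in os.listdir(sysfs_nvme):
        ctrl = os.path.join(sysfs_nvme, name)
        try:
            with open(os.path.join(ctrl, "subsysnqn")) as f:
                nqn = f.read().strip()
            with open(os.path.join(ctrl, "cntlid")) as f:
                cntlid = int(f.read().strip())
        except (OSError, ValueError):
            continue  # entry lacks the expected attributes, skip it
        if nqn == subsysnqn:
            known.add(cntlid)
    return sorted(set(target_cntlids) - known)
```

A non-empty result only means the local nvme driver has no path to those
controllers. It says nothing about whether another host, a guest, or a user
space driver can reach them.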

Have you received bug reports of people mistakenly attaching namespaces
to the wrong controller or something?


