kernel oops after nvme_set_queue_count()
Sagi Grimberg
sagi at grimberg.me
Thu Jan 21 04:06:22 EST 2021
> Hi all,
>
> a customer of ours ran into this oops:
>
> [44157.918962] nvme nvme5: I/O 22 QID 0 timeout
> [44163.347467] nvme nvme5: Could not set queue count (880)
> [44163.347551] nvme nvme5: Successfully reconnected (6 attempts)
> [44168.414977] BUG: unable to handle kernel paging request at ffff888e261e7808
> [44168.414988] IP: 0xffff888e261e7808
> [44168.414994] PGD 98c2ae067 P4D 98c2ae067 PUD f57937063 PMD 8000000f660001e3
>
> It's related to this code snippet in drivers/nvme/host/core.c
>
> 	/*
> 	 * Degraded controllers might return an error when setting the queue
> 	 * count. We still want to be able to bring them online and offer
> 	 * access to the admin queue, as that might be only way to fix them up.
> 	 */
> 	if (status > 0) {
> 		dev_err(ctrl->device, "Could not set queue count (%d)\n", status);
> 		*count = 0;
>
>
> causing nvme_set_queue_count() _not_ to return an error, but rather to
> let the reconnect complete.
> Of course, as this failure is due to a timeout (cf. the status code: 880
> is NVME_SC_HOST_PATH_ERROR), the admin queue has been torn down by the
> transport, causing this crash.
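As a quick sanity check on that decode, a minimal standalone snippet
(the constant value mirrors NVME_SC_HOST_PATH_ERROR from
include/linux/nvme.h):

#include <stdio.h>

/* Value copied from include/linux/nvme.h. */
#define NVME_SC_HOST_PATH_ERROR	0x370

int main(void)
{
	int status = 880;	/* from the "Could not set queue count (880)" message */

	printf("status %d == 0x%x, NVME_SC_HOST_PATH_ERROR: %s\n",
	       status, status,
	       status == NVME_SC_HOST_PATH_ERROR ? "yes" : "no");
	return 0;
}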
>
> So, question: _why_ do we ignore the status?
This came from pci, where a controller reset may fail to set up I/O
queues; at least the controller can still accept admin commands so we can
get some diagnostics out of it (perhaps an error log page).
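For reference, a rough, untested sketch of what such a diagnostic fetch
could look like from userspace over just the admin queue; the device node
name is only an example, and the command layout follows
linux/nvme_ioctl.h and the spec's Get Log Page encoding:

/*
 * Sketch: pull the Error Information log page (LID 0x01) using only the
 * admin queue, via NVME_IOCTL_ADMIN_CMD.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

int main(void)
{
	unsigned char log[4096];		/* 64 entries * 64 bytes */
	struct nvme_admin_cmd cmd;
	int fd = open("/dev/nvme5", O_RDONLY);	/* example controller node */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode   = 0x02;			/* Get Log Page */
	cmd.nsid     = 0xffffffff;
	cmd.addr     = (__u64)(uintptr_t)log;
	cmd.data_len = sizeof(log);
	/* cdw10: LID 0x01 (Error Information), NUMDL = dwords - 1 */
	cmd.cdw10    = 0x01 | (__u32)((sizeof(log) / 4 - 1) << 16);

	if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) < 0)
		perror("NVME_IOCTL_ADMIN_CMD");
	else
		printf("fetched %zu bytes of error log\n", sizeof(log));

	close(fd);
	return 0;
}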
> For fabrics I completely fail to see the reason here; even _if_ it
> worked we would end up with a connection for which just the admin queue
> is operable, the state is LIVE, and all information we could glean
> would indicate that the connection is perfectly healthy.
We also had an ADMIN_ONLY state at some point, but that was dropped as
well for reasons I don't remember at the moment.
> It just doesn't have any I/O queues.
> Which will lead to some very confused customers and some very unhappy
> support folks trying to figure out what has happened.
>
> Can we just kill this statement and always return an error?
> In all other cases we are quite trigger-happy with controller reset; why
> not here?
I think we will want to keep the existing behavior for pci, but agree we
probably want to change it for fabrics...
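For illustration only, a rough (untested) sketch of how that pci/fabrics
split could look inside nvme_set_queue_count(), assuming it keys off the
NVME_F_FABRICS ops flag; the real fix may well end up different:

	if (status > 0) {
		/*
		 * Sketch: for fabrics a failure here usually means the
		 * transport is gone (e.g. NVME_SC_HOST_PATH_ERROR), so
		 * fail the call and let error recovery reconnect instead
		 * of going LIVE with zero I/O queues.
		 */
		if (ctrl->ops->flags & NVME_F_FABRICS)
			return -EIO;

		/* Keep the forgiving behavior for pci. */
		dev_err(ctrl->device,
			"Could not set queue count (%d)\n", status);
		*count = 0;
	}

Whether -EIO or a translated NVMe status is the right thing to return is
of course open for discussion.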