[PATCH] NVMe: Shutdown fixes
Keith Busch
keith.busch at intel.com
Wed Nov 25 08:57:03 PST 2015
On Wed, Nov 25, 2015 at 08:42:57AM -0800, Christoph Hellwig wrote:
> > CPU 0 CPU 1
> >
> > blk_mq_check_expired nvme_irq
> > set_bit(REQ_ATOM_COMPLETE) nvme_process_cq
> > nvme_timeout blk_mq_complete_request
> > if (!test_and_set_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags) ||
> > test_and_clear_bit(REQ_ATOM_QUIESCED, &rq->atomic_flags))
> > return BLK_EH_QUIESCED
> > set_bit REQ_ATOM_QUIESCED
> >
> > Both checks on blk_mq_complete_request will fail since the timeout
> > handler hasn't returned "quiesce" yet, and the request will be orphaned.
>
> BLK_EH_QUIESCED is never returned with your patch applied..
I'm just showing why it was broken before. My patch fixes it since we
reap the command and return BLK_EH_HANDLED.
> > Instead we can let nvme_dev_shudtown reap the request and set req->errors
> > directly through one of the two paths described above, then return
> > BLK_EH_HANDLED.
>
> nvme_dev_shutdown will not complete the command that we timed out,
> blk_mq_complete_request will skip it because REQ_ATOM_COMPLETE is set,
> and blk_mq_rq_timed_out will complete after we returned from the timeout
> handler.
I'm not saying nvme_dev_shutdown "completes" the command. I'm just
saying it reaps it and sets an appropriate req->errors. The actual blk-mq
completion happens as you described through nvme_timeout's return code
of BLK_EH_HANDLED.
> > This is implied through dev->bar != NULL, and the nvmeq's cq_vector value.
>
> Well, we're not doing anything in the low-level functions at the moment
> (at the checks in nvme_dev_unmap, and all requests with
> REQ_ATOM_COMPLETE to the list), but it still seems rather fragile to
> rely on these low level functions deep down the stack to get everything
> right.
Agreed.
> > To set an appropriate non-zero result, we'd need to add pci_is_enabled()
> > checks in nvme_setup_io_queues to distinguish a non-fatal command error
> > vs timeout. Is that preferable to the check where I have it?
>
> No, that's not my preference. My preference is to figure out why you
> get a zero req->errors on timed request. That really shouldn't happen
> to start with.
req->errors wouldn't be 0, but nvme_set_queue_count returns "0" queues
if req->errors is non-zero. It's not a fatal error to have 0 queues so
we don't want to propogate that error to the initialization unless it's
a timeout.
We also don't return error codes on create sq/cq commands because an error
here isn't fatal either unless, of course, it's a timeout.
More information about the Linux-nvme
mailing list