[PATCH 04/47] block: provide a new BLK_EH_QUIESCED timeout return value
Christoph Hellwig
hch at lst.de
Tue Nov 24 07:40:19 PST 2015
On Tue, Nov 24, 2015 at 10:16:51AM -0500, Jeff Moyer wrote:
> Hi Christoph,
>
> Christoph Hellwig <hch at lst.de> writes:
>
> > This marks the request as one that's not actually completed yet, but
> > should be reaped next time blk_mq_complete_request comes in. This is
> > useful it the abort handler kicked of a reset that will complete all
> > pending requests.
>
> What's the purpose, though? Is this an optimization?
It allows us to to correctly implement controller reset (like SCSI target
resets) from the timeout handler. The current HANDLED/NOT_HANDLED returns
are not very useful if you want to eventually kick of a reset that will
abort all requests, but needs to ensure the the requests don't get reused
before that. Only SCSI handles that for now, and needs it's own per-LUN
command list and a lot of complex code for that - something we'd like
to avoid for NVMe or other new drivers.
> We've had "fun" problems with races between completion and timeout
> before. I can't say I'm too keen on adding more complexity to this code
> path. Have you considered what happens in your new code when this race
> occurs? I don't expect it to cause any issues in the mq case, since the
> timeout handler should run on the same cpu as the completion code for a
> given request (right?). However, for the old code path, they could run
> in parallel.
>
> blk_complete_request:
> A if (!blk_mark_rq_complete(rq) ||
> B test_and_cleart_bit(REQ_ATOM_QUIESCED, &req->atomic_flags)) {
> C __blk_mq_complete_request(rq);
>
> could run alongside of:
>
> blk_rq_check_expired:
> 1 if (!blk_mark_rq_complete(rq))
> 2 blk_rq_timed_out(rq);
>
> So, if 1 comes before A, we have two cases to consider:
>
> i. the expiration path does not yet set REQ_ATOM_QUIESCED before the
> completion code runs, and so the completion code does nothing.
The command has timed out and sets REQ_ATOM_COMPLETED first,
the the actual completion comes in and does indeed nothing. We now
set REQ_ATOM_QUIESCED and kick off a controller reset, which will
ultimatively complete all commands using blk_mq_complete_request.
Now REQ_ATOM_QUIESCED is set on the command that caused the timeout,
so it will be completed as well.
> ii. the expiration path *does* SET REQ_ATOM_QUIESCED. In this instance,
> will we get yet another completion for the request when the command
> is ultimately retired by the adapter reset?
The command has timed out and sets REQ_ATOM_COMPLETED first, then
REQ_ATOM_QUIESCED as well. Now the actual completion comes in and does
nothing because REQ_ATOM_COMPLETED was set. We will then kick off the
controller reset, which will ultimatively complete all commands using
blk_mq_complete_request. Now REQ_ATOM_QUIESCED is set on the command that
caused the timeout, so it will be completed as well.
More information about the Linux-nvme
mailing list