[PATCH 04/47] block: provide a new BLK_EH_QUIESCED timeout return value

Christoph Hellwig hch at lst.de
Tue Nov 24 07:40:19 PST 2015


On Tue, Nov 24, 2015 at 10:16:51AM -0500, Jeff Moyer wrote:
> Hi Christoph,
> 
> Christoph Hellwig <hch at lst.de> writes:
> 
> > This marks the request as one that's not actually completed yet, but
> > should be reaped next time blk_mq_complete_request comes in.  This is
> > useful it the abort handler kicked of a reset that will complete all
> > pending requests.
> 
> What's the purpose, though?  Is this an optimization?

It allows us to to correctly implement controller reset (like SCSI target
resets) from the timeout handler.  The current HANDLED/NOT_HANDLED returns
are not very useful if you want to eventually kick of a reset that will
abort all requests, but needs to ensure the the requests don't get reused
before that.  Only SCSI handles that for now, and needs it's own per-LUN
command list and a lot of complex code for that - something we'd like
to avoid for NVMe or other new drivers.

> We've had "fun" problems with races between completion and timeout
> before.  I can't say I'm too keen on adding more complexity to this code
> path.  Have you considered what happens in your new code when this race
> occurs?  I don't expect it to cause any issues in the mq case, since the
> timeout handler should run on the same cpu as the completion code for a
> given request (right?).  However, for the old code path, they could run
> in parallel.
> 
> blk_complete_request:
> A  if (!blk_mark_rq_complete(rq) ||
> B      test_and_cleart_bit(REQ_ATOM_QUIESCED, &req->atomic_flags)) {
> C        __blk_mq_complete_request(rq);
> 
> could run alongside of:
> 
> blk_rq_check_expired:
> 1 if (!blk_mark_rq_complete(rq))
> 2   blk_rq_timed_out(rq);
> 
> So, if 1 comes before A, we have two cases to consider:
> 
> i.  the expiration path does not yet set REQ_ATOM_QUIESCED before the
>     completion code runs, and so the completion code does nothing.

The command has timed out and sets REQ_ATOM_COMPLETED first,
the the actual completion comes in and does indeed nothing.  We now
set REQ_ATOM_QUIESCED and kick off a controller reset, which will
ultimatively complete all commands using blk_mq_complete_request.
Now REQ_ATOM_QUIESCED is set on the command that caused the timeout,
so it will be completed as well.

> ii. the expiration path *does* SET REQ_ATOM_QUIESCED.  In this instance,
>     will we get yet another completion for the request when the command
>     is ultimately retired by the adapter reset?

The command has timed out and sets REQ_ATOM_COMPLETED first, then
REQ_ATOM_QUIESCED as well.  Now the actual completion comes in and does
nothing because REQ_ATOM_COMPLETED was set.  We will then kick off the
controller reset, which will ultimatively complete all commands using
blk_mq_complete_request. Now REQ_ATOM_QUIESCED is set on the command that
caused the timeout, so it will be completed as well.



More information about the Linux-nvme mailing list