[PATCH 04/47] block: provide a new BLK_EH_QUIESCED timeout return value
Jeff Moyer
jmoyer at redhat.com
Tue Nov 24 07:51:04 PST 2015
Christoph Hellwig <hch at lst.de> writes:
> On Tue, Nov 24, 2015 at 10:16:51AM -0500, Jeff Moyer wrote:
>> Hi Christoph,
>>
>> Christoph Hellwig <hch at lst.de> writes:
>>
>> > This marks the request as one that's not actually completed yet, but
>> > should be reaped next time blk_mq_complete_request comes in. This is
>> > useful if the abort handler kicked off a reset that will complete all
>> > pending requests.
>>
>> What's the purpose, though? Is this an optimization?
>
> It allows us to correctly implement controller reset (like SCSI target
> resets) from the timeout handler. The current HANDLED/NOT_HANDLED returns
> are not very useful if you want to eventually kick off a reset that will
> abort all requests, but needs to ensure the requests don't get reused
> before that. Only SCSI handles that for now, and needs its own per-LUN
> command list and a lot of complex code for that - something we'd like
> to avoid for NVMe or other new drivers.
Thanks for the explanation. One more question below.
>> We've had "fun" problems with races between completion and timeout
>> before. I can't say I'm too keen on adding more complexity to this code
>> path. Have you considered what happens in your new code when this race
>> occurs? I don't expect it to cause any issues in the mq case, since the
>> timeout handler should run on the same cpu as the completion code for a
>> given request (right?). However, for the old code path, they could run
>> in parallel.
>>
>> blk_complete_request:
>> A if (!blk_mark_rq_complete(rq) ||
>> B test_and_clear_bit(REQ_ATOM_QUIESCED, &rq->atomic_flags)) {
>> C __blk_mq_complete_request(rq);
>>
>> could run alongside of:
>>
>> blk_rq_check_expired:
>> 1 if (!blk_mark_rq_complete(rq))
>> 2 blk_rq_timed_out(rq);
>>
>> So, if 1 comes before A, we have two cases to consider:
>>
>> i. the expiration path does not yet set REQ_ATOM_QUIESCED before the
>> completion code runs, and so the completion code does nothing.
>
> The command has timed out and sets REQ_ATOM_COMPLETED first,
> then the actual completion comes in and does indeed nothing. We now
> set REQ_ATOM_QUIESCED and kick off a controller reset, which will
> ultimately complete all commands using blk_mq_complete_request.
> Now REQ_ATOM_QUIESCED is set on the command that caused the timeout,
> so it will be completed as well.
>
>> ii. the expiration path *does* set REQ_ATOM_QUIESCED. In this instance,
>> will we get yet another completion for the request when the command
>> is ultimately retired by the adapter reset?
>
> The command has timed out and sets REQ_ATOM_COMPLETED first, then
> REQ_ATOM_QUIESCED as well. Now the actual completion comes in and does
> nothing because REQ_ATOM_COMPLETED was set. We will then kick off the
See B above. REQ_ATOM_COMPLETE is set, so the first half of that
statement is false, but then test_and_clear_bit(REQ_ATOM_QUIESCED...)
returns true, so we call __blk_complete_request. So the question is,
will we get a double completion for that request after the reset is
performed?
-Jeff
p.s. That should be __blk_complete_request up there in 'C'.
> controller reset, which will ultimately complete all commands using
> blk_mq_complete_request. Now REQ_ATOM_QUIESCED is set on the command that
> caused the timeout, so it will be completed as well.