[LSF/MM/BPF TOPIC] Improving Zoned Storage Support

Jens Axboe axboe at kernel.dk
Wed Jan 17 13:40:55 PST 2024


On 1/17/24 2:33 PM, Bart Van Assche wrote:
> On 1/17/24 13:14, Jens Axboe wrote:
>>   /* Maps an I/O priority class to a deadline scheduler priority. */
>> @@ -600,6 +604,10 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>       struct request *rq;
>>       enum dd_prio prio;
>>   +    if (test_bit(0, &dd->dispatch_state) ||
>> +        test_and_set_bit(0, &dd->dispatch_state))
>> +        return NULL;
>> +
>>       spin_lock(&dd->lock);
>>       rq = dd_dispatch_prio_aged_requests(dd, now);
>>       if (rq)
>> @@ -616,6 +624,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>       }
>>     unlock:
>> +    clear_bit(0, &dd->dispatch_state);
>>       spin_unlock(&dd->lock);
> 
> Can the above code be simplified by using spin_trylock() instead of
> test_bit() and test_and_set_bit()?

It can't, because you can't assume that dispatch is running just
because dd->lock is already held; the lock is also taken outside of
the dispatch path.
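
To make that concrete, here's a small user-space sketch of the
difference (plain C with pthreads/stdatomic; the dd_like struct and
function names are just illustrative, not the kernel code). Since
dd->lock is also taken on the insert side, a failed trylock only tells
you the lock is busy, while the dedicated bit tells you a dispatch is
actually in flight:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for struct deadline_data, not the kernel code. */
struct dd_like {
        pthread_mutex_t lock;           /* plays the role of dd->lock      */
        atomic_ulong dispatch_state;    /* bit 0: a dispatch is running    */
};

/* Insert path: holds the lock, but is not a dispatch. */
static void insert_requests(struct dd_like *dd)
{
        pthread_mutex_lock(&dd->lock);
        /* ... add requests to the sort/fifo lists ... */
        pthread_mutex_unlock(&dd->lock);
}

/*
 * Dispatch path. A trylock on dd->lock would also fail while
 * insert_requests() holds it, skipping a dispatch for the wrong
 * reason. The explicit bit only backs off when another dispatch
 * really is in progress, mirroring the posted patch.
 */
static bool dispatch_request(struct dd_like *dd)
{
        if ((atomic_load(&dd->dispatch_state) & 1UL) ||
            (atomic_fetch_or(&dd->dispatch_state, 1UL) & 1UL))
                return false;           /* another dispatch is running */

        pthread_mutex_lock(&dd->lock);
        /* ... pick the next request to hand back ... */
        atomic_fetch_and(&dd->dispatch_state, ~1UL);
        pthread_mutex_unlock(&dd->lock);
        return true;
}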

> Please note that whether or not spin_trylock() is used, there is a
> race condition in this approach: if dd_dispatch_request() is called
> just before another CPU calls spin_unlock() from inside
> dd_dispatch_request() then some requests won't be dispatched until the
> next time dd_dispatch_request() is called.

Sure, that's not surprising. What I cared most about here is that we
should not have a race that could stall the queue. If we do race, the
request we're about to return hasn't been returned yet, so we know at
least one request will be issued and the queue will be re-run at its
completion. So yes, we may very well skip an issue; that's a known
property of the change, and the skipped dispatch just gets postponed
to the next queue run.
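
If it helps, here's a deterministic user-space walk-through of that
interleaving (again just an illustrative sketch with stdatomic, not
the kernel code; the function names are mine), showing why the skipped
dispatch is only postponed, not stranded:

#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong dispatch_state;     /* bit 0: dispatch in progress   */
static int queued = 2;                  /* two pending requests: R1, R2  */

/* "CPU 0" wins the gate and picks a request, but hasn't returned yet. */
static int dispatch_enter(void)
{
        if (atomic_fetch_or(&dispatch_state, 1UL) & 1UL)
                return 0;
        queued--;                       /* picks the next queued request */
        return 1;
}

/* "CPU 1" races in while the bit is still set and backs off. */
static int dispatch_try(void)
{
        return !(atomic_load(&dispatch_state) & 1UL);
}

static void dispatch_exit(void)
{
        atomic_fetch_and(&dispatch_state, ~1UL);
}

int main(void)
{
        dispatch_enter();               /* CPU 0 dispatches R1           */
        if (!dispatch_try())            /* CPU 1 backs off, R2 skipped   */
                puts("CPU 1: dispatch already running, backing off");
        dispatch_exit();                /* CPU 0 returns R1 to the core  */

        /*
         * R1 gets issued; its completion re-runs the queue, so the
         * skipped R2 is picked up on the next queue run.
         */
        dispatch_enter();               /* the re-run picks up R2        */
        dispatch_exit();
        printf("requests left queued: %d\n", queued);   /* prints 0 */
        return 0;
}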

The patch is more to demonstrate that it would not take much to fix
this case; at this point it's just a proof-of-concept.

-- 
Jens Axboe