[PATCH 5/6] blk-mq: Fix queue freeze deadlock

Sagi Grimberg sagi at grimberg.me
Wed Jan 18 23:54:23 PST 2017


>>> If hardware queues are stopped for some event, like the device has been
>>> suspended by power management, requests allocated on that hardware queue
>>> are indefinitely stuck causing a queue freeze to wait forever.
>>
>> I have a problem with this patch. IMO, this is a general issue, so
>> why do we tie a fix to calling blk_mq_update_nr_hw_queues()? We might
>> not need to update nr_hw_queues at all. I'm fine with the
>> blk_mq_abandon_stopped_requests but not with its call-site.
>>
>> Usually a driver knows when it wants to abandon all busy requests
>> via blk_mq_tagset_busy_iter(); maybe the right approach is to add
>> a hook for all allocated tags? Or have blk_mq_quiesce_queue get a
>> fail-all-requests parameter from the callers?
>
> This patch is overly aggressive on failing allocated requests. There
> are scenarios where we wouldn't want to abandon them, like if the hw
> context is about to be brought back online, but this patch assumes all
> need to be abandoned. I'll see if there are some other tricks we can have
> a driver do. Thanks for the suggestions.

I agree.

I do think though that this should be driven from the driver, because
for fabrics, we might have some fabric error that triggers a periodic
reconnect. So the "hw context is about to be brought back" is unknown
from the driver pov, and when we delete the controller (because we give
up) this is exactly where we need to abandon the allocated requests.
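
A rough sketch of the policy described above, written as a userspace toy
model rather than kernel code. Every name in it (toy_queue,
toy_handle_event, the CTRL_* events) is made up for illustration and none
of it is blk-mq or nvme API. The only point is that the driver, not the
block layer, picks the moment when allocated requests get abandoned.

```c
/* Userspace toy model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum req_state { REQ_FREE, REQ_ALLOCATED, REQ_FAILED };

/* Events a fabrics driver might see (assumed for this sketch). */
enum ctrl_event { CTRL_RECONNECTING, CTRL_DELETING };

#define TOY_NR_TAGS 4

struct toy_queue {
	bool stopped;
	enum req_state tags[TOY_NR_TAGS];
};

/* Number of allocated requests; a freeze would wait for this to hit zero. */
static size_t toy_busy(const struct toy_queue *q)
{
	size_t i, busy = 0;

	for (i = 0; i < TOY_NR_TAGS; i++)
		if (q->tags[i] == REQ_ALLOCATED)
			busy++;
	return busy;
}

/* Fail every allocated request and return how many were failed. */
static size_t toy_abandon_allocated(struct toy_queue *q)
{
	size_t i, failed = 0;

	for (i = 0; i < TOY_NR_TAGS; i++) {
		if (q->tags[i] == REQ_ALLOCATED) {
			q->tags[i] = REQ_FAILED;
			failed++;
		}
	}
	return failed;
}

/*
 * Driver-side policy: stop the queue on either event, but abandon
 * allocated requests only when the controller is being deleted.
 * During a reconnect they stay allocated, since the hw context may
 * come back.
 */
static size_t toy_handle_event(struct toy_queue *q, enum ctrl_event ev)
{
	assert(q != NULL);

	q->stopped = true;
	if (ev == CTRL_DELETING)
		return toy_abandon_allocated(q);
	return 0;
}
```

With two allocated requests, a CTRL_RECONNECTING event leaves both in
place, while a later CTRL_DELETING event fails both, so nothing is left
for a freeze to wait on.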


