[PATCH RFC 3/3] nvme: delay failover by command quiesce timeout
Sagi Grimberg
sagi at grimberg.me
Wed Apr 16 15:15:16 PDT 2025
>> CQT comes from the controller, and if it is high, it effectively means
>> that the controller cannot handle faster failover reliably. So I think
>> we should leave it as is. It is the vendor's problem.
> Okay, that is one way to approach it. However, because of the hung
> task issue, we would be allowing the vendor to panic the initiator
> with a hung task. Until CCR, and without implementing other checks
> (for events which might not happen), this hung task would happen on
> every messy disconnect with that vendor/array.
It's kind of a pick-your-poison situation, I guess.
We can log an error for controllers that expose overly long CQT...
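A rough userspace sketch of that idea: keep honoring the controller's CQT as the failover delay, but log an error when it looks unreasonably long. The function name `failover_delay_ms`, the 10-second threshold, and the assumption that CQT is expressed in milliseconds are all hypothetical and chosen for illustration; this is not the nvme driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical cutoff for an "overly long" CQT (illustrative only). */
#define CQT_WARN_THRESHOLD_MS 10000u

/*
 * Return the delay, in ms, to wait before failing over I/O.
 * The controller-reported CQT (assumed to be in ms) is used unchanged,
 * since a long CQT is treated as the vendor's problem; if it exceeds
 * the threshold, an error is logged and *warned is set to true.
 */
unsigned int failover_delay_ms(unsigned int cqt_ms, bool *warned)
{
    assert(warned != NULL);
    *warned = false;
    if (cqt_ms > CQT_WARN_THRESHOLD_MS) {
        fprintf(stderr,
                "nvme: controller CQT of %u ms exceeds %u ms; failover will be delayed\n",
                cqt_ms, CQT_WARN_THRESHOLD_MS);
        *warned = true;
    }
    return cqt_ms; /* not clamped: the controller's value is honored */
}
```

Logging rather than clamping keeps the behavior of "leave it as is" while still making a problematic controller visible in the kernel log.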
Not sure we'll see a hung task here, though. It's not as if a kthread is
blocking on this; it's a delayed work item, so I don't think the hung task
watchdog will complain about it...