[PATCH RFC 3/3] nvme: delay failover by command quiesce timeout

Sagi Grimberg sagi at grimberg.me
Wed Apr 16 15:15:16 PDT 2025


>> CQT comes from the controller, and if it is high, it effectively means
>> that the controller cannot handle faster failover reliably. So I think
>> we should leave it as is. It is the vendor problem.
> Okay, that is one way to approach it.  However, because of the hung
> task issue, we would be allowing the vendor to panic the initiator
> with a hung task.  Until CCR, and without implementing other checks
> (for events which might not happen), this hung task would happen on
> every messy disconnect with that vendor/array.

It's kind of a pick-your-poison situation, I guess.
We can log an error for controllers that expose an overly long CQT...
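A minimal userspace sketch of the "log an error" option, assuming CQT is reported in milliseconds. The function failover_delay_ms() and the CQT_WARN_THRESHOLD_MS cutoff are hypothetical and not part of the nvme driver. The delay simply honors whatever CQT the controller advertises; only the logging flags values that look unreasonably long.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cutoff: a CQT above this is treated as "overly long". */
#define CQT_WARN_THRESHOLD_MS 10000u

/*
 * Hypothetical helper: return how long to hold off failover for a
 * controller reporting the given CQT (assumed to be in milliseconds).
 * The controller's value is honored as-is; an error is logged when it
 * exceeds the threshold so the long delay is at least visible.
 */
static uint32_t failover_delay_ms(const char *ctrl_name, uint16_t cqt_ms)
{
    if (cqt_ms > CQT_WARN_THRESHOLD_MS)
        fprintf(stderr,
                "%s: CQT of %u ms exceeds %u ms; failover will be delayed accordingly\n",
                ctrl_name, (unsigned)cqt_ms, CQT_WARN_THRESHOLD_MS);
    return cqt_ms;
}
```

With a 3000 ms CQT this returns 3000 silently; with 60000 ms it still returns 60000 but logs an error first. Warning instead of capping keeps the vendor problem visible without overriding what the controller advertises.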

Not sure we'll see a hung task here, though. It's not like there is a
kthread blocking on this; it's a delayed work item, so I think the
watchdog won't complain about it...


