[PATCH 0/2] nvme: sanitize KATO handling

Chao Leng lengchao at huawei.com
Wed Feb 24 01:42:11 EST 2021



On 2021/2/23 20:07, Hannes Reinecke wrote:
> Hi all,
> 
> one of our customers had been running into a deadlock trying to terminate
> outstanding KATO commands during reset.
> Looking closer at it, I found that we never actually _track_ if a KATO
> command is submitted, so we might happily be sending several KATO commands
> to the same controller simultaneously.
Can you explain how KATO commands can be sent simultaneously?
> Also, I found it slightly odd that we signal a different KATO value to the
> controller than what we're using internally; I would have thought that both
> sides should agree on the same KATO value. And even that wouldn't be so
> bad, but we really should be using the KATO value we announced to the
> controller when setting the request timeout.
> 
> With these patches I attempt to resolve the situation; the first patch
> ensures that only one KATO command to a given controller is outstanding.
> With that the delay between sending KATO commands and the KATO timeout
> are decoupled, and we can follow the recommendation from the base spec
> to send the KATO commands at half the KATO timeout intervals.
> 
> As usual, comments and reviews are welcome.
> 
> Hannes Reinecke (2):
>    nvme: fixup kato deadlock
>    nvme: sanitize KATO setting
> 
>   drivers/nvme/host/core.c    | 22 +++++++++++++++++-----
>   drivers/nvme/host/fabrics.c |  2 +-
>   drivers/nvme/host/nvme.h    |  2 +-
>   3 files changed, 19 insertions(+), 7 deletions(-)
> 
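
To make the discussion concrete, the behaviour described in the cover
letter (at most one KATO command outstanding per controller, keep-alives
sent at half the KATO interval, request timeout equal to the announced
KATO) could be modelled roughly as below. This is only a userspace sketch
under my own assumptions, not the code from the patches: struct ka_ctrl
and every ka_* helper are hypothetical names, and C11 atomics stand in
for whatever tracking the driver actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of per-controller keep-alive state (not driver code). */
struct ka_ctrl {
	unsigned int kato_secs;  /* KATO value announced to the controller */
	atomic_bool ka_inflight; /* true while a keep-alive is outstanding */
};

static void ka_init(struct ka_ctrl *c, unsigned int kato_secs)
{
	c->kato_secs = kato_secs;
	atomic_init(&c->ka_inflight, false);
}

/* Delay between keep-alive submissions: half the announced KATO. */
static unsigned int ka_send_interval_secs(const struct ka_ctrl *c)
{
	return c->kato_secs / 2;
}

/* Timeout applied to the keep-alive request: the full announced KATO. */
static unsigned int ka_request_timeout_secs(const struct ka_ctrl *c)
{
	return c->kato_secs;
}

/*
 * Claim the single keep-alive slot. Returns true if the caller may submit
 * a keep-alive, false if one is already outstanding.
 */
static bool ka_try_start(struct ka_ctrl *c)
{
	return !atomic_exchange(&c->ka_inflight, true);
}

/* Release the slot when the outstanding keep-alive completes or fails. */
static void ka_complete(struct ka_ctrl *c)
{
	bool was_inflight = atomic_exchange(&c->ka_inflight, false);

	/* A completion without a matching submission is a logic error. */
	assert(was_inflight);
	(void)was_inflight; /* silence unused warning when NDEBUG is set */
}
```

In this model a periodic worker would call ka_try_start() every
ka_send_interval_secs() seconds and simply skip the round when it returns
false, so a second keep-alive is never queued behind one that has not
completed yet.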


