[PATCH 0/2] nvme: sanitize KATO handling

Hannes Reinecke hare at suse.de
Tue Feb 23 07:07:26 EST 2021


Hi all,

One of our customers has been running into a deadlock when trying to
terminate outstanding KATO commands during reset.
Looking closer, I found that we never actually _track_ whether a KATO
command has been submitted, so we might happily send several KATO commands
to the same controller simultaneously.
Also, I found it slightly odd that we signal a different KATO value to the
controller than the one we use internally; I would have thought that both
sides should agree on the same KATO value. Even that wouldn't be so bad,
but we really should be using the KATO value we announced to the
controller when setting the request timeout.
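A minimal sketch of keeping host and controller on one KATO value, assuming
the host stores KATO in seconds (the hypothetical field kato_sec) while the
value sent to the controller is in milliseconds, as the NVMe Keep Alive
Timeout field is. Both the advertised value and the keep-alive request
timeout are derived from the same field, so they cannot drift apart. All
names here are illustrative, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical controller settings; not the driver's actual fields. */
struct ka_config {
    uint32_t kato_sec; /* KATO as held by the host, in seconds */
};

/* Value announced to the controller; the spec's KATO field is in ms. */
static uint32_t ka_advertised_kato_ms(const struct ka_config *cfg)
{
    assert(cfg->kato_sec <= UINT32_MAX / 1000u); /* avoid overflow */
    return cfg->kato_sec * 1000u;
}

/*
 * Timeout for the keep-alive request itself: derived from the very value
 * we announced, rather than from a separate internal constant.
 */
static uint32_t ka_request_timeout_ms(const struct ka_config *cfg)
{
    return ka_advertised_kato_ms(cfg);
}
```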

With these patches I attempt to resolve the situation. The first patch
ensures that only one KATO command to a given controller is outstanding.
With that, the interval between sending KATO commands is decoupled from
the KATO timeout, and we can follow the recommendation from the NVMe base
spec to send KATO commands at half the KATO timeout interval.
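Once only one command can be outstanding, the send interval becomes a plain
function of KATO. A minimal sketch of the half-KATO rule, again assuming
KATO is held in seconds and returning the delay in milliseconds;
ka_send_delay_ms() is a hypothetical helper, not a driver function.

```c
#include <assert.h>

/*
 * Delay between keep-alive submissions: half of the KATO period.
 * A KATO of 0 disables keep-alive, so callers should not schedule at all.
 */
static unsigned int ka_send_delay_ms(unsigned int kato_sec)
{
    assert(kato_sec > 0);
    return kato_sec * 500u; /* (kato_sec * 1000) / 2 */
}
```

With a KATO of 5 seconds this sketch queues a keep-alive every 2500 ms, so
the controller should see one well before its timer expires.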

As usual, comments and reviews are welcome.

Hannes Reinecke (2):
  nvme: fixup kato deadlock
  nvme: sanitize KATO setting

 drivers/nvme/host/core.c    | 22 +++++++++++++++++-----
 drivers/nvme/host/fabrics.c |  2 +-
 drivers/nvme/host/nvme.h    |  2 +-
 3 files changed, 19 insertions(+), 7 deletions(-)

-- 
2.29.2



