[PATCH 0/2] nvme: sanitize KATO handling
Chao Leng
lengchao at huawei.com
Wed Feb 24 01:42:11 EST 2021
On 2021/2/23 20:07, Hannes Reinecke wrote:
> Hi all,
>
> one of our customers had been running into a deadlock trying to terminate
> outstanding KATO commands during reset.
> Looking closer at it, I found that we never actually _track_ if a KATO
> command is submitted, so we might happily be sending several KATO commands
> to the same controller simultaneously.
Can you explain how KATO commands can be sent simultaneously?
> Also, I found it slightly odd that we signal a different KATO value to the
> controller than what we're using internally; I would have thought that both
> sides should agree on the same KATO value. And even that wouldn't be so
> bad, but we really should be using the KATO value we announced to the
> controller when setting the request timeout.
>
> With these patches I attempt to resolve the situation; the first patch
> ensures that only one KATO command to a given controller is outstanding.
> With that the delay between sending KATO commands and the KATO timeout
> are decoupled, and we can follow the recommendation from the base spec
> to send the KATO commands at half the KATO timeout intervals.
>
> As usual, comments and reviews are welcome.
>
> Hannes Reinecke (2):
> nvme: fixup kato deadlock
> nvme: sanitize KATO setting
>
> drivers/nvme/host/core.c | 22 +++++++++++++++++-----
> drivers/nvme/host/fabrics.c | 2 +-
> drivers/nvme/host/nvme.h | 2 +-
> 3 files changed, 19 insertions(+), 7 deletions(-)
>
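For reference, the behaviour described in the cover letter above can be modelled roughly as follows. This is only a userspace sketch, not the code from the patches: struct ka_ctrl and the ka_* helpers are hypothetical names invented for illustration, and a real driver would need an atomic test-and-set rather than a plain flag.

```c
#include <stdbool.h>

/* Hypothetical model of per-controller keep-alive state (not the kernel's). */
struct ka_ctrl {
	unsigned int kato_sec;	/* KATO value announced to the controller, in seconds */
	bool ka_in_flight;	/* true while a keep-alive command is outstanding */
};

/* Delay between keep-alive submissions: half the announced KATO, in ms. */
static unsigned int ka_send_interval_ms(const struct ka_ctrl *c)
{
	return c->kato_sec * 1000 / 2;
}

/* Timeout for the keep-alive request itself: the full announced KATO, in ms. */
static unsigned int ka_cmd_timeout_ms(const struct ka_ctrl *c)
{
	return c->kato_sec * 1000;
}

/*
 * Try to submit a keep-alive. Returns false when one is already
 * outstanding, so at most one is in flight per controller.
 * Assumption: single-threaded model; a driver would use an atomic op here.
 */
static bool ka_try_submit(struct ka_ctrl *c)
{
	if (c->ka_in_flight)
		return false;
	c->ka_in_flight = true;
	return true;
}

/* Completion handler: clear the flag so the next keep-alive can be sent. */
static void ka_complete(struct ka_ctrl *c)
{
	c->ka_in_flight = false;
}
```

In this model the send interval and the request timeout are derived separately from the same announced KATO, and a second submission is refused until the first one completes.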