[PATCH 0/2] nvme: sanitize KATO handling

Hannes Reinecke hare at suse.de
Wed Feb 24 02:59:08 EST 2021


On 2/24/21 8:20 AM, Chao Leng wrote:
> 
> 
> On 2021/2/24 15:06, Hannes Reinecke wrote:
>> On 2/24/21 7:42 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/2/23 20:07, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> one of our customers has been running into a deadlock trying to
>>>> terminate outstanding KATO commands during reset.
>>>> Looking closer at it, I found that we never actually _track_ whether
>>>> a KATO command has been submitted, so we might happily be sending
>>>> several KATO commands to the same controller simultaneously.
>>> Can you explain how KATO commands can be sent simultaneously?
>>
>> Sure.
>> Call nvme_start_keep_alive() on a dead connection.
>> Just _after_ the KATO request has been sent,
>> call nvme_start_keep_alive() again.
> Call nvme_start_keep_alive() again? Why?
> Currently only nvme_start_ctrl() calls nvme_start_keep_alive(),
> and ka_work is cancelled synchronously before reconnection starts.
> Did I miss something?

My point was that there _can_ be a queued ka_work entry even while a
KATO command is running.
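
To make that concrete, here is a heavily condensed sketch of the
current keep-alive path (loosely based on drivers/nvme/host/core.c;
error handling and the comp_seen logic are dropped, so don't take it
verbatim). Note that no state records whether a KATO command is
already in flight:

static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
{
        struct nvme_ctrl *ctrl = rq->end_io_data;

        blk_mq_free_request(rq);
        /* re-arm the work item; nothing prevents someone else from
         * queueing ka_work in parallel */
        queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
}

static void nvme_keep_alive_work(struct work_struct *work)
{
        struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
                        struct nvme_ctrl, ka_work);
        struct request *rq;

        /* nothing is set or checked here, so a previous KATO command
         * may still be outstanding when this runs again */
        rq = nvme_alloc_request(ctrl->admin_q, &ctrl->ka_cmd,
                                BLK_MQ_REQ_RESERVED);
        if (IS_ERR(rq))
                return;
        rq->end_io_data = ctrl;
        blk_execute_rq_nowait(rq->q, NULL, rq, 0, nvme_keep_alive_end_io);
}

void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
{
        if (ctrl->kato == 0)
                return;
        /* a second caller, or a call racing with the end_io re-arm
         * above, queues ka_work while the previous KATO command is
         * still in flight */
        queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
}

Hence the point of this series: actually track the outstanding KATO
command instead of blindly re-queueing ka_work.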

And yes, the ka_work entry will be cancelled, but _before_ the
outstanding commands are cancelled.
And cancelling the ka_work entry synchronously has to wait for the
work function if it is already running, which leads to a deadlock if
that function is blocked in blk_mq_get_request() (e.g. if the queue
is already stopped due to recovery).
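
The stop side, for reference, is just a synchronous cancel; a rough
sketch (simplified) of why that hangs during recovery:

void nvme_stop_keep_alive(struct nvme_ctrl *ctrl)
{
        /*
         * Waits for a running nvme_keep_alive_work() instance to
         * finish.  If that instance is blocked allocating the KATO
         * request because the admin queue is already stopped for
         * recovery, this never returns: the outstanding commands
         * that would unblock it are only cancelled later in the
         * teardown sequence.
         */
        cancel_delayed_work_sync(&ctrl->ka_work);
}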

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


