[PATCH 0/2] nvme: sanitize KATO handling
Hannes Reinecke
hare at suse.de
Wed Feb 24 02:59:08 EST 2021
On 2/24/21 8:20 AM, Chao Leng wrote:
>
>
> On 2021/2/24 15:06, Hannes Reinecke wrote:
>> On 2/24/21 7:42 AM, Chao Leng wrote:
>>>
>>>
>>> On 2021/2/23 20:07, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> one of our customers has been running into a deadlock when trying
>>>> to terminate outstanding KATO commands during reset.
>>>> Looking closer at it, I found that we never actually _track_ whether
>>>> a KATO command has been submitted, so we might happily be sending
>>>> several KATO commands to the same controller simultaneously.
>>> Can you explain how KATO commands could end up being sent
>>> simultaneously?
>>
>> Sure.
>> Call nvme_start_keep_alive() on a dead connection.
>> Just _after_ the KATO request has been sent,
>> call nvme_start_keep_alive() again.
> Why would nvme_start_keep_alive() be called again?
> Currently only nvme_start_ctrl() calls nvme_start_keep_alive(),
> and the ka_work will be cancelled synchronously before reconnection
> starts. Did I miss something?
My point was that there _can_ be a queued ka_work entry even while a
KATO command is still running.
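
To make that concrete, here is a minimal sketch of the current flow as
I read it (condensed, not the actual driver source; ctrl->kato,
ctrl->ka_work and nvme_wq are the existing fields, but the bodies are
simplified). Nothing in this path records that a keep-alive command is
in flight, so arming ka_work while one is outstanding will happily
fire a second one:

void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
{
	if (unlikely(ctrl->kato == 0))
		return;
	/* arms the work unconditionally; no 'KATO in flight' state */
	queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
}

static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
{
	struct nvme_ctrl *ctrl = rq->end_io_data;

	blk_mq_free_request(rq);
	/* completion re-arms the work as well; on a dead connection
	 * this completion may never arrive, yet ka_work can still be
	 * queued by another nvme_start_keep_alive() call */
	queue_delayed_work(nvme_wq, &ctrl->ka_work, ctrl->kato * HZ);
}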
And yes, the ka_work entry will be cancelled, but _before_ the
outstanding commands are cancelled.
And cancelling the ka_work entry synchronously has to wait for a
running instance of the work function to finish, which deadlocks if
that instance is blocked in blk_mq_get_request() (e.g. because the
queue has already been stopped for recovery).
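
Roughly, as a sketch (again condensed; whether the allocation really
blocks depends on the flags used, and nvme_cancel_admin_requests() is
a made-up name standing in for whatever eventually fails the
outstanding commands):

static void nvme_keep_alive_work(struct work_struct *work)
{
	struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
					      struct nvme_ctrl, ka_work);
	struct request *rq;

	/*
	 * Without BLK_MQ_REQ_NOWAIT this can block waiting for a tag
	 * or for the queue to become usable again -- which is exactly
	 * the state the queue is in during recovery.
	 */
	rq = blk_mq_alloc_request(ctrl->admin_q, REQ_OP_DRV_IN, 0);
	if (IS_ERR(rq))
		return;
	/* ... set up and fire the keep-alive command ... */
}

/* teardown, in today's order: */
nvme_stop_keep_alive(ctrl);	  /* cancel_delayed_work_sync() waits
				   * for the instance blocked above */
nvme_cancel_admin_requests(ctrl); /* would unblock it, but is only
				   * reached afterwards -> deadlock */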
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer