kernel panic due to a missing work initialization in case of zero
Hou Pu
houpu.main at gmail.com
Thu Apr 22 13:16:35 BST 2021
On Wed, Apr 21, 2021 at 9:29 PM Engel, Amit <Amit.Engel at dell.com> wrote:
>
> Hi Hou,
> Yes, commit 7b96918a173 (nvmet: avoid queuing keep-alive timer if it is disabled) fixes the panic we hit.
Thanks.
>
> One comment:
> It might be more elegant to move
> INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer);
> from nvmet_start_keep_alive_timer to nvmet_alloc_ctrl.
> This way, we will not INIT ka_work each time the keep-alive timer is started
> (each nvmet_set_feat_kato, for example, calls nvmet_start_keep_alive_timer).
> IMO it makes more sense to call INIT_DELAYED_WORK only once (as part of nvmet_alloc_ctrl).
>
> Let me know what you think and if you want me to provide this minor change
>
Yes, this makes more sense AFAIK.
I'm OK with it.
Thanks,
Hou
> Thanks
> Amit
>
> -----Original Message-----
> From: Hou Pu <houpu.main at gmail.com>
> Sent: Wednesday, April 21, 2021 5:32 AM
> To: Engel, Amit
> Cc: linux-nvme at lists.infradead.org; sagi at grimberg.me
> Subject: kernel panic due to a missing work initialization in case of zero
>
> On 4/20/21 11:46, Engel, Amit wrote:
> > Hello,
> >
> > We hit a kernel panic as a result of the below sequence:
> > In the current nvmet implementation, 'nvmet_start_keep_alive_timer'
> > initializes the nvmet_keep_alive_timer work only if kato != 0.
> >
> > When the nvme connect cmd is executed with a zero kato value,
> > 'INIT_DELAYED_WORK(&ctrl->ka_work, nvmet_keep_alive_timer)' is not
> > called.
> >
> > Once a keep-alive cmd arrives, we call 'mod_delayed_work' for a work
> > that has not been initialized. This leads to a kernel WARNING:
> > Apr 20 10:32:59 FNM00190700796-A kernel: WARNING: CPU: 11 PID: 75133
> > at kernel/workqueue.c:1447 __queue_work.cold.55+0xc/0x3c
> > and eventually to a soft lockup.
>
> Hello Engel,
>
> Could you verify this with latest nvme-5.13 branch? I think this might be the same problem as commit 7b96918a173 (nvmet: avoid queuing keep-alive timer if it is disabled) fixed.
>
> Thanks,
> Hou
>
> >
> > A simple fix for this issue (I will post a patch soon) is to
> > initialize the work (as part of 'nvmet_start_keep_alive_timer') even
> > if kato == 0
> >
> > Thanks
> > Amit E
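
For illustration only, here is a minimal userspace C model of the two
init placements discussed in this thread. It is not the nvmet code: every
model_* name is hypothetical, and the stand-in work struct only records
whether a handler was ever set, which is an assumption made to mimic the
"work not initialized" condition behind the WARNING.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct delayed_work (assumption). */
struct model_delayed_work {
	void (*func)(struct model_delayed_work *work); /* NULL until initialized */
	bool pending;
};

/* Stand-in for the controller; only the fields needed here. */
struct model_ctrl {
	unsigned int kato;
	struct model_delayed_work ka_work;
};

static void model_keep_alive_timer(struct model_delayed_work *work)
{
	(void)work; /* handler body is irrelevant for this model */
}

/* Models INIT_DELAYED_WORK: attach the handler, clear pending state. */
static void model_init_delayed_work(struct model_delayed_work *work,
				    void (*fn)(struct model_delayed_work *))
{
	work->func = fn;
	work->pending = false;
}

/*
 * Models mod_delayed_work. Returns false if the work was never
 * initialized, which stands in for the reported kernel WARNING.
 */
static bool model_mod_delayed_work(struct model_delayed_work *work)
{
	if (work->func == NULL)
		return false;
	work->pending = true;
	return true;
}

/* Reported behaviour: the work is initialized only when kato != 0. */
static void model_start_ka_old(struct model_ctrl *ctrl)
{
	if (ctrl->kato == 0)
		return;
	model_init_delayed_work(&ctrl->ka_work, model_keep_alive_timer);
	model_mod_delayed_work(&ctrl->ka_work);
}

/* Suggested placement: initialize once, when the controller is set up. */
static void model_alloc_ctrl(struct model_ctrl *ctrl, unsigned int kato)
{
	ctrl->kato = kato;
	ctrl->ka_work.func = NULL;
	ctrl->ka_work.pending = false;
	model_init_delayed_work(&ctrl->ka_work, model_keep_alive_timer);
}

/* With init moved out, starting the timer only queues the work. */
static void model_start_ka_new(struct model_ctrl *ctrl)
{
	if (ctrl->kato == 0)
		return;
	model_mod_delayed_work(&ctrl->ka_work);
}
```

In this model, with the old placement and kato == 0, a later
model_mod_delayed_work() call returns false because the handler was never
set. With the init done in model_alloc_ctrl(), the same call succeeds
regardless of kato, and starting the timer no longer re-initializes the
work each time.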