[Bug Report] nvme connect deadlock in allocating tag
Sagi Grimberg
sagi at grimberg.me
Sun Apr 28 02:30:30 PDT 2024
On 28/04/2024 12:16, Wangbing Kuang wrote:
> "The error_recovery work should unquiesce the admin_q, which should fail
> fast all pending admin commands,
> so it is unclear to me how the connect process gets stuck."
> I think the reason is: the command can be unquiesced, but the tag cannot be
> returned until the command succeeds.
The error recovery also cancels all pending requests. See
nvme_cancel_admin_tagset().
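To illustrate why cancellation should release the tags, here is a minimal
userspace sketch. It is a toy model, not the kernel code: every toy_* name
and the queue depth are hypothetical, and the only assumption taken from
the discussion is that a cancelled request is completed with an error and
its tag goes back to the pool.

```c
#include <stdbool.h>

#define TOY_NR_TAGS 4    /* hypothetical queue depth */
#define TOY_NO_TAG  (-1) /* where a real allocator would sleep instead */

enum toy_status { TOY_IDLE, TOY_INFLIGHT, TOY_CANCELLED };

struct toy_tagset {
	enum toy_status status[TOY_NR_TAGS];
	bool in_use[TOY_NR_TAGS];
};

/* Hand out the lowest free tag, or TOY_NO_TAG if the pool is exhausted. */
static int toy_get_tag(struct toy_tagset *set)
{
	for (int i = 0; i < TOY_NR_TAGS; i++) {
		if (!set->in_use[i]) {
			set->in_use[i] = true;
			set->status[i] = TOY_INFLIGHT;
			return i;
		}
	}
	return TOY_NO_TAG;
}

/*
 * Model of cancelling a tagset: every in-flight request is completed
 * with an error status and its tag is released. Returns the number of
 * requests cancelled.
 */
static int toy_cancel_all(struct toy_tagset *set)
{
	int cancelled = 0;

	for (int i = 0; i < TOY_NR_TAGS; i++) {
		if (set->in_use[i]) {
			set->status[i] = TOY_CANCELLED;
			set->in_use[i] = false;
			cancelled++;
		}
	}
	return cancelled;
}
```

In this model, once all tags are taken toy_get_tag() fails, and after
toy_cancel_all() the next toy_get_tag() succeeds again. A tag that stays
held after cancellation would therefore point at a request that was never
cancelled.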
>
> "What is step (2) - make nvme io timeout to recover the connection?"
> I use spdk-nvmf-target as the backend. It makes it easy to hang and unhang
> read/write io on the nvmf target. So I just hang the io for over 30
> seconds, which triggers an io timeout event on the linux nvmf host. The io
> timeout then triggers connection recovery.
> By the way, I use multipath=0.
Interesting, does this happen with multipath=Y?
I didn't expect people to be using multipath=0 for fabrics in the past few
years.
>
> "Is this reproducing with upstream nvme? or is this some distro kernel
> where this happens?"
> It is reproduced in a kernel based on v5.15, but I think this is a common
> error.
It would be beneficial to verify this.