Deadlock on device removal event for NVMeF target

Robert LeBlanc robert at leblancnet.us
Thu Jun 29 09:18:26 PDT 2017


Sagi,

Thanks for the update.

On Thu, Jun 29, 2017 at 8:32 AM, Sagi Grimberg <sagi at grimberg.me> wrote:
> Hey Robert,
>
>> Could something like this be causing the D state problem I was seeing
>> in iSER almost a year ago?
>
>
> No, that is a bug in the mlx5 device as far as I'm concerned (although I
> couldn't prove it). I've tried to track it down but without access to
> the FW tools I can't understand what is going on. I've seen this same
> phenomenon with nvmet-rdma before as well.

Do you know who I could contact about it? I can reproduce the problem
pretty easily with two hosts back to back, so it should be easy for
someone with mlx5 Eth devices to replicate.

> It looks like when we perform QP draining in the presence of rdma
> operations it may not complete, meaning that the zero-length rdma write
> never generates a completion. Maybe it has something to do with the qp
> moving to error state when some rdma operations have not completed.
>
>> I tried writing a patch for iSER based on
>> this, but it didn't help. Either the bug is not being triggered in
>> device removal,
>
>
> It's 100% not related to device removal.
>
>> or I didn't line up the statuses correctly. But it
>> seems that things are getting stuck in the work queue and some sort of
>> deadlock is happening so I was hopeful that something similar may be
>> in iSER.
>
>
> The hang is the ULP code waiting for QP drain.

Yeah, the patches I wrote did nothing to help the problem. The only
thing that kind of worked was forcing the queue to drop (maybe I was
just ignoring the old queue, I can't remember exactly), but it was
leaving some stale iSCSI session info around. Now that I've read more
of the iSCSI code, I wonder if I should revisit that. I think Bart
said that the sledgehammer approach I took should not be necessary.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1



More information about the Linux-nvme mailing list