Deadlock on device removal event for NVMeF target

Sagi Grimberg sagi at grimberg.me
Thu Jun 29 07:32:36 PDT 2017


Hey Robert,

> Could something like this be causing the D state problem I was seeing
> in iSER almost a year ago?

No, that is a bug in the mlx5 device as far as I'm concerned (although I
couldn't prove it). I've tried to track it down but without access to
the FW tools I can't understand what is going on. I've seen this same
phenomenon with nvmet-rdma before as well.

It looks like when we drain the QP while RDMA operations are still in
flight, the drain may never complete, meaning that the zero-length RDMA
write never generates a completion. Maybe it has something to do with
the QP moving to the error state while some RDMA operations have not
yet completed.
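
To make the suspected failure mode concrete, here is a toy user-space
model in C. It is only a sketch under stated assumptions, not kernel or
driver code: every name in it (toy_qp, toy_post, toy_poll, toy_drain,
toy_drain_with_inflight) is hypothetical, and the "buggy_device" flag
encodes the unproven guess above that the device stops producing
completions if data operations were still outstanding when the QP went
to error. The polling loop is bounded so the model terminates; the real
wait in the ULP has no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SQ_DEPTH 8

enum wr_kind { WR_DATA, WR_DRAIN_MARKER };

/* Hypothetical stand-in for a send queue: WRs complete in posting order. */
struct toy_qp {
    enum wr_kind sq[SQ_DEPTH];
    size_t head;   /* next WR to complete */
    size_t tail;   /* next free slot */
    bool stuck;    /* assumed misbehavior: no more completions are produced */
};

static bool toy_post(struct toy_qp *qp, enum wr_kind kind)
{
    if (qp->tail == SQ_DEPTH)
        return false;
    qp->sq[qp->tail++] = kind;
    return true;
}

/* Reap one completion if the (modelled) device produces one. */
static bool toy_poll(struct toy_qp *qp, enum wr_kind *kind)
{
    if (qp->stuck || qp->head == qp->tail)
        return false;
    *kind = qp->sq[qp->head++];
    return true;
}

/* Drain: move to error, post a marker WR, wait for the marker to complete. */
static bool toy_drain(struct toy_qp *qp, bool buggy_device, int max_polls)
{
    enum wr_kind kind;

    /* Error transition. Assumption: a buggy device wedges only if data
     * WRs were still pending at this point. */
    qp->stuck = buggy_device && qp->head != qp->tail;

    if (!toy_post(qp, WR_DRAIN_MARKER))
        return false;

    for (int i = 0; i < max_polls; i++) {
        if (toy_poll(qp, &kind) && kind == WR_DRAIN_MARKER)
            return true;   /* marker completed: queue is drained */
    }
    return false;          /* marker never completed: the caller would hang */
}

/* Convenience wrapper: post `inflight` data WRs, then drain. */
static bool toy_drain_with_inflight(size_t inflight, bool buggy_device)
{
    struct toy_qp qp = { .head = 0, .tail = 0, .stuck = false };

    assert(inflight < SQ_DEPTH);   /* leave room for the marker */
    for (size_t i = 0; i < inflight; i++)
        toy_post(&qp, WR_DATA);
    return toy_drain(&qp, buggy_device, 2 * SQ_DEPTH);
}
```

In this model, toy_drain_with_inflight(3, false) succeeds because the
three data WRs and then the marker are all reaped, while
toy_drain_with_inflight(3, true) fails because the marker's completion
never shows up. An idle queue drains fine even with buggy_device set,
which matches the observation that the problem only appears when RDMA
operations are in flight.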

> I tried writing a patch for iSER based on
> this, but it didn't help. Either the bug is not being triggered in
> device removal,

It's 100% not related to device removal.

> or I didn't line up the statuses correctly. But it
> seems that things are getting stuck in the work queue and some sort of
> deadlock is happening so I was hopeful that something similar may be
> in iSER.

The hang is the ULP code waiting for the QP drain to complete.


