nvme_rdma - leaves provider resources allocated
Steve Wise
swise at opengridcomputing.com
Tue Aug 23 09:58:56 PDT 2016
Assume an nvme_rdma host has one attached controller in the RECONNECTING state,
and that controller has failed to reconnect at least once and is therefore in
the delay_schedule time before retrying the connection. At that moment, there
are no cm_ids allocated for that controller because the admin queue and the I/O
queues have been freed, so nvme_rdma cannot get a DEVICE_REMOVAL event from the
rdma_cm. This means that if the underlying provider module is removed, it is
removed with resources still allocated by nvme_rdma. For iw_cxgb4, this causes
a BUG_ON() in gen_pool_destroy() because MRs are still allocated for the
controller.
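To make the sequence concrete, here is a minimal userspace model in C. It is
only a sketch of the bookkeeping described above, not driver code: every name
in it (model_ctrl, model_device, model_device_removal and so on) is
hypothetical, and it assumes that a removal notification can only reach the
controller through a cm_id it currently owns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_state { MODEL_LIVE, MODEL_RECONNECTING };

struct model_device {
	int mrs_allocated;	/* MRs still outstanding in the provider's pool */
};

struct model_ctrl {
	enum model_state state;
	int nr_cm_ids;		/* cm_ids owned by the admin and I/O queues */
	int nr_mrs;		/* MRs the controller still holds */
	struct model_device *dev;
};

/* Bring the controller up: each queue owns a cm_id, and MRs are allocated. */
static void model_ctrl_connect(struct model_ctrl *c, struct model_device *dev,
			       int nr_queues, int nr_mrs)
{
	assert(nr_queues > 0 && nr_mrs >= 0);
	c->state = MODEL_LIVE;
	c->dev = dev;
	c->nr_cm_ids = nr_queues;
	c->nr_mrs = nr_mrs;
	dev->mrs_allocated += nr_mrs;
}

/* A reconnect attempt failed: the queues (and their cm_ids) are freed,
 * but the MRs stay allocated while the controller waits to retry. */
static void model_ctrl_enter_reconnect_delay(struct model_ctrl *c)
{
	c->state = MODEL_RECONNECTING;
	c->nr_cm_ids = 0;
}

/* Device removal is delivered once per cm_id. Returns how many
 * notifications the controller received; with zero cm_ids it hears
 * nothing and therefore releases nothing. */
static int model_device_removal(struct model_ctrl *c)
{
	int delivered = c->nr_cm_ids;

	if (delivered > 0) {
		c->dev->mrs_allocated -= c->nr_mrs;
		c->nr_mrs = 0;
		c->nr_cm_ids = 0;
	}
	return delivered;
}

/* Models the condition checked at pool teardown: outstanding allocations
 * are what would trip the BUG_ON() in gen_pool_destroy(). */
static bool model_pool_can_destroy(const struct model_device *dev)
{
	return dev->mrs_allocated == 0;
}
```

In this model, calling model_ctrl_connect(), then
model_ctrl_enter_reconnect_delay(), then model_device_removal() delivers zero
notifications, and model_pool_can_destroy() reports false. Skipping the
reconnect-delay step lets the removal reach the controller and the pool is
clean afterwards.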
Thoughts on how to fix this?
Thanks,
Steve.