nvme-fabrics: shutdown 1-minute deadlock

Martin Belanger nitram_67 at hotmail.com
Mon Sep 20 08:09:37 PDT 2021


Hello NVMe community,

I ran into a 1-minute deadlock trying to disconnect from a remote discovery controller (over tcp) while the controller is unreachable. The remote controller could be unreachable because the network is down or the controller simply crashed unexpectedly. The cause is irrelevant. Suffice it to say that the kernel does not yet know that the controller is unreachable and we're trying to disconnect from the controller. The problem is that during a disconnect we try to write commands to the remote controller and the default timeout for a write operation is 1 minute (admin_timeout). 

For example, in nvme_shutdown_ctrl() we call reg_write32() (which really invokes nvmf_reg_write32) to set the NVME_CC_SHN_NORMAL bit, and this operation will block waiting for a response that will never come. Interestingly, in the same function we also call reg_read32() to read the CSTS register, but this time we specify a 5-sec timeout (i.e. ctrl->shutdown_timeout). In other words, there is an inconsistency between the write and the read timeouts in that one function. 

Similarly, nvme_disable_ctrl() calls reg_write32() (i.e. nvmf_reg_write32) to clear NVME_CC_ENABLE and NVME_CC_SHN_MASK, and once again this will block for 1 minute if the controller in unreachable. 

I would like to propose that the prototype for reg_write32() be changed to allow for the caller to specify a timeout as follows:

int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val, unsigned timeout);

This timeout will simply be passed to nvmf_reg_write32() and in turn to __nvme_submit_sync_cmd().

When invoking reg_write32(), one would set timeout to 0 (zero) to indicate that the default 1-minute timeout shall be used. Otherwise, a non-0 timeout would overwrite the default. It's only in functions nvme_shutdown_ctrl() and nvme_disable_ctrl() that we would specify a timeout shorter than 1 minute. For example, we could use ctrl->shutdown_timeout as the value for timeout.

I would like to hear your thoughts before I submit a patch. Maybe there's a better or easier way to work around this.

Regards,
Martin Belanger
Engineering Technologist, Dell Inc.


More information about the Linux-nvme mailing list