[PATCH] nvme-pci: Harden drive presence detect in nvme_dev_disable()

Stefan Roese sr at denx.de
Thu May 12 23:18:14 PDT 2022


On 13.05.22 04:00, Keith Busch wrote:
> On Fri, May 13, 2022 at 04:07:18AM +0300, Max Gurtovoy wrote:
>> On 5/12/2022 5:33 PM, Keith Busch wrote:
>>> It might happen if the device requires a Subsystem Reset to activate the new
>>> firmware.
>>
>> but isn't the reset something that the admin should do ?
>>
>> some power-cycle or cold-reboot or other reset.
>>
>> In the reported case the device resets itself. I'm not sure it's expected.
> 
> I'm not familiar with this device or the procedure used to update the firmware,
> but I'm aware many vendors still provide their firmware bundled within their
> own tooling, so simply running that utility could do all sorts of things
> including a device link reset.

We are using standard nvme-cli tools for the FW update process, like:
nvme fw-download ...
nvme fw-commit ...

> If the device is resetting itself without a user or driver app initiating it,
> though, that would be bad behavior outside the spec. But if this harmless patch
> improves interop regardless of device behavior, then it should be okay to
> include.

Yes. The device resetting itself seems to be outside of the spec. Still
we somehow need to handle this situation.

>> BTW, for sure the fix is good for the hot unplug case but FW reset shouldn't
>> cause this scenario IMO.
> 
> Yeah, the patch is fine as far as I'm concerned, though it is impossible to
> prevent all register reads executing concurrently with any random link-down
> event. The commit log indicates reading registers during such an event causes a
> kernel crash, so a more complete fix probably needs to come from the platform
> level.

Agreed. Even with this patch applied, we have still seen Kernel crashes
after the "nvme fw-commit", when the drive reset'ed itself twice (IIRC).
We were able to work around these issues by removing the nvme modules
exactly when the PCIe device was removed (via HP event) for now.

Thanks,
Stefan



More information about the Linux-nvme mailing list