processor reboots if nvme host controller is surprise removed
Keith Busch
kbusch at kernel.org
Mon Sep 21 15:24:03 EDT 2020
On Mon, Sep 21, 2020 at 11:38:48AM -0700, Kallol Biswas wrote:
> I have an issue with powering down an NVMe host controller while fio is active.
> Hoping someone from this list can provide some input so that the
> problem can be resolved or worked around.
>
>
> System info:
>
> description: Motherboard
> product: X570 Phantom Gaming X
> vendor: ASRock
>
> *-cpu
> description: CPU
> product: AMD Ryzen 5 3600 6-Core Processor
>
> Fio is running 50/50 read/write traffic when power to the
> device is removed by external means.
>
>
> A few commands are active in a submission queue.
>
> I/Os time out.
>
> The nvme_timeout routine is called. The first register access is to CSTS.
> Sometimes the read returns 0xffffffff; sometimes it causes the
> processor to restart. When the read returns 0xffffffff, the processor
> restarts on the next access instead, which is to the PCIe config
> register PCI_STATUS.
>
> The root port had a large CTO (Completion Timeout) value. I changed it
> to 0, but that did not help.
>
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not
> Supported ARIFwd-
> DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR+, OBFF
> Disabled ARIFwd-
>
> To:
>
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not
> Supported ARIFwd-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF
> Disabled ARIFwd-
>
> This change does not prevent the processor restart in the nvme_timeout() routine.
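For reference, the two DevCtl2 lines quoted above differ only in the Completion Timeout Value field (bits 3:0 of Device Control 2): "65ms to 210ms" is encoding 6 and "50us to 50ms" is encoding 0, the default. A minimal sketch of that decoding follows. The function name and the sample register values are made up for illustration; the range strings follow what lspci prints.

```python
# Completion Timeout Value encodings, bits 3:0 of PCIe Device Control 2.
# Encodings not listed here are reserved.
COMPLETION_TIMEOUT_RANGES = {
    0x0: "50us to 50ms",   # default range
    0x1: "50us to 100us",
    0x2: "1ms to 10ms",
    0x5: "16ms to 55ms",
    0x6: "65ms to 210ms",
    0x9: "260ms to 900ms",
    0xA: "1s to 3.5s",
    0xD: "4s to 13s",
    0xE: "17s to 64s",
}

DEVCTL2_COMP_TIMEOUT_MASK = 0x000F  # Completion Timeout Value field
DEVCTL2_COMP_TMOUT_DIS = 0x0010     # Completion Timeout Disable bit


def decode_completion_timeout(devctl2):
    """Return (range_string, timeout_disabled) for a raw DevCtl2 value.

    Hypothetical helper, not part of any kernel or pciutils API.
    """
    code = devctl2 & DEVCTL2_COMP_TIMEOUT_MASK
    rng = COMPLETION_TIMEOUT_RANGES.get(code, "reserved")
    disabled = bool(devctl2 & DEVCTL2_COMP_TMOUT_DIS)
    return rng, disabled


if __name__ == "__main__":
    # Assumed raw values: LTR enable (bit 10) set, timeout field 6, then 0.
    for raw in (0x0406, 0x0400):
        print(hex(raw), decode_completion_timeout(raw))
```

Note that encoding 0 only selects the default range; it does not disable the completion timeout, which is what the TimeoutDis bit is for.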
It doesn't sound like your platform handles an unexpected link down.
What does your root port's Link Capabilities register show?
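The fields of interest are the ones lspci -vvv prints on the LnkCap line as "Surprise" and "LLActRep". As a rough sketch, assuming you have the raw 32-bit register value, they can be picked out as below. The bit positions are the ones defined for the Link Capabilities register (PCI_EXP_LNKCAP_SDERC and PCI_EXP_LNKCAP_DLLLARC in the kernel's pci_regs.h); the function name and the sample value are made up for illustration.

```python
# Selected fields of the PCIe Link Capabilities register.
LNKCAP_SDERC = 0x00080000    # bit 19: Surprise Down Error Reporting Capable
LNKCAP_DLLLARC = 0x00100000  # bit 20: Data Link Layer Link Active Reporting Capable


def decode_lnkcap(lnkcap):
    """Decode a few Link Capabilities fields from a raw 32-bit value.

    Hypothetical helper, not part of any kernel or pciutils API.
    """
    return {
        "port": (lnkcap >> 24) & 0xFF,          # bits 31:24, Port Number
        "max_speed_code": lnkcap & 0xF,         # bits 3:0, e.g. 3 means 8GT/s
        "max_width": (lnkcap >> 4) & 0x3F,      # bits 9:4, e.g. 4 means x4
        "surprise_down_reporting": bool(lnkcap & LNKCAP_SDERC),
        "link_active_reporting": bool(lnkcap & LNKCAP_DLLLARC),
    }


if __name__ == "__main__":
    # Assumed sample value: port 0, 8GT/s, x4, ASPM L1, Surprise+, LLActRep+.
    print(decode_lnkcap(0x00180843))
```

If "Surprise" shows as "-" on the root port, the port does not advertise surprise down error reporting.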