processor reboots if nvme host controller is surprise removed

Keith Busch kbusch at kernel.org
Mon Sep 21 15:24:03 EDT 2020


On Mon, Sep 21, 2020 at 11:38:48AM -0700, Kallol Biswas wrote:
> I have an issue with powering down an NVMe host controller while fio is active.
> Hoping someone from this list can provide some input so that the
> problem can be resolved or worked around.
> 
> 
> System info:
> 
> description: Motherboard
>        product: X570 Phantom Gaming X
>        vendor: ASRock
> 
> *-cpu
>           description: CPU
>           product: AMD Ryzen 5 3600 6-Core Processor
> 
> Fio is running a 50/50 read/write workload when power to the
> device is removed by external means.
>
> 
> A few commands are active in a submission queue.
> 
> I/Os time out.
> 
> The nvme_timeout routine is called. Its first register access is a read
> of CSTS. Sometimes that read returns 0xffffffff, and sometimes it causes
> the processor to restart. When the read returns 0xffffffff, the
> processor restarts on the next access instead, a read of the PCIe
> config register PCI_STATUS.
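
For reference, an all-ones value is what a read usually returns when the
device is no longer on the bus and the platform tolerates the failed read.
A minimal sketch of how such a value can be told apart from a live CSTS
follows. The function name and the returned labels are made up for
illustration; only the bit positions come from the NVMe CSTS register
layout. This is not the driver's actual logic.

```python
# Sketch only: classify a 32-bit CSTS read. Helper name and labels are
# hypothetical; bit positions follow the NVMe CSTS register layout.
ALL_ONES = 0xFFFFFFFF

CSTS_RDY = 1 << 0   # controller ready
CSTS_CFS = 1 << 1   # controller fatal status


def classify_csts(csts):
    """Return a coarse label for a raw CSTS value."""
    # Check all-ones first: it also has RDY and CFS set, so testing the
    # individual bits first would misreport a missing device as fatal.
    if csts == ALL_ONES:
        return "device-gone"
    if csts & CSTS_CFS:
        return "controller-fatal"
    if csts & CSTS_RDY:
        return "ready"
    return "not-ready"
```

The ordering matters: 0xffffffff has every status bit set, so the
all-ones test has to come before any per-bit check.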
> 
> The root port had a large completion timeout (CTO) value. I changed it
> to 0, but that still did not help.
> 
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not
> Supported ARIFwd-
> DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR+, OBFF
> Disabled ARIFwd-
> 
> To:
> 
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not
> Supported ARIFwd-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF
> Disabled ARIFwd-
> 
> This change does not prevent the processor restart in the nvme_timeout() routine.
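
As a side note on the two lspci outputs above: the range strings map to the
4-bit Completion Timeout Value field in Device Control 2, and a field value
of 0 is the default "50us to 50ms" range. A rough sketch of that decoding is
below. The function name and the sample register values are assumptions for
illustration; the encodings are the ones lspci prints.

```python
# Sketch only: decode the completion timeout fields of a raw 16-bit
# Device Control 2 value. Function name is hypothetical.
CTO_RANGES = {
    0b0000: "50us to 50ms",    # default range
    0b0001: "50us to 100us",
    0b0010: "1ms to 10ms",
    0b0101: "16ms to 55ms",
    0b0110: "65ms to 210ms",
    0b1001: "260ms to 900ms",
    0b1010: "1s to 3.5s",
    0b1101: "4s to 13s",
    0b1110: "17s to 64s",
}
CTO_VALUE_MASK = 0x000F   # bits 3:0, Completion Timeout Value
CTO_DISABLE = 0x0010      # bit 4, Completion Timeout Disable


def decode_devctl2_timeout(devctl2):
    """Return (range string, timeout_disabled) for a DevCtl2 value."""
    value = devctl2 & CTO_VALUE_MASK
    return CTO_RANGES.get(value, "reserved"), bool(devctl2 & CTO_DISABLE)


# Made-up example value: range 0110b with the disable bit clear.
print(decode_devctl2_timeout(0x0006))
```

Under this reading, "changed it to 0" corresponds to moving from the 0110b
range to the 0000b default, which matches the before and after output.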

It doesn't sound like your platform handles an unexpected link down.
What does your root port's Link Capabilities register show?
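
In case it helps, `lspci -vv` on the root port prints a LnkCap line, and the
raw 32-bit register can be read with something like
`setpci -s <bdf> CAP_EXP+0c.l`. Below is a small sketch for picking out the
bits relevant to an unexpected link down. The function name and the sample
value are made up; the masks match the Link Capabilities definitions in the
kernel's pci_regs.h.

```python
# Sketch only: decode a raw 32-bit Link Capabilities value.
# Function name and sample value are hypothetical.
LNKCAP_SLS = 0x0000000F      # bits 3:0, supported link speed code
LNKCAP_MLW = 0x000003F0      # bits 9:4, maximum link width
LNKCAP_SDERC = 0x00080000    # bit 19, Surprise Down Error Reporting Capable
LNKCAP_DLLLARC = 0x00100000  # bit 20, Data Link Layer Link Active Reporting Capable


def decode_lnkcap(lnkcap):
    """Return the link speed code, width, and surprise-removal related flags."""
    return {
        "speed_code": lnkcap & LNKCAP_SLS,
        "width": (lnkcap & LNKCAP_MLW) >> 4,
        "surprise_down_reporting": bool(lnkcap & LNKCAP_SDERC),
        "link_active_reporting": bool(lnkcap & LNKCAP_DLLLARC),
    }


# Made-up example: speed code 3, x4, link active reporting only.
print(decode_lnkcap(0x00100043))
```

The two flags of interest are the surprise down and link active reporting
bits, since they indicate whether the port can report a link that went away
underneath it.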
