processor reboots if nvme host controller is surprise removed

Kallol Biswas kallol at nucleodyne.com
Mon Sep 21 14:38:48 EDT 2020


I have an issue with powering down an NVMe host controller while fio is active.
I am hoping someone on this list can provide some input so that the
problem can be resolved or worked around.


System info:

description: Motherboard
       product: X570 Phantom Gaming X
       vendor: ASRock

*-cpu
          description: CPU
          product: AMD Ryzen 5 3600 6-Core Processor

An fio job with 50/50 read/write traffic is running when power to the
device is removed by external means.

At that point a few commands are outstanding in a submission queue.

I/Os time out.

The nvme_timeout() routine is then called. Its first register access is a
read of CSTS. Sometimes that read returns 0xffffffff; sometimes it
causes the processor to restart. When it returns 0xffffffff, the
processor instead restarts on the next access, a read of the PCIe
config register PCI_STATUS.

The root port had a large completion timeout (CTO) value. I changed it to 0,
but that did not help. The setting went from:

DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-

To:

DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-

This change does not prevent the processor restart in the nvme_timeout() routine.