nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot

Tue Oct 29 09:07:26 PDT 2024

On Mon, 2024-10-07 at 11:56 -0400, Laurence Oberman wrote:
> On Thu, 2024-10-03 at 15:04 -0600, Keith Busch wrote:
> > On Thu, Sep 26, 2024 at 05:11:05PM -0400, Laurence Oberman wrote:
> > > It was reported to Red Hat, seeing issues with using a
> > > "nvme subsystem-reset /dev/nvme0" command to test resets.
> > 
> > I really dislike that command. The side effects are overkill for the
> > pci transport...
> >  
> > > On multiple servers I tested on two types of nvme attached devices.
> > > These are not the rootfs devices.
> > > 
> > > 1. The front slot (hotplug) devices in a 2.5in format 
> > > reset and after some time recover (what is expected)
> > > 
> > > Example of one working
> > > 
> > > Does not trap and land up as a machine-check
> > 
> > <snip>
> > 
> > > 2. Any kernel upstream latest 6.11, RHEL8 or RHEL9 causes 
> > > a machine check and panics the box when its against a nvme in a 
> > > PCIE slot
> > > 
> > > [  263.862919] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5 Bank 6: ba00000000000e0b
> > > [  263.862924] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8571dce4> {intel_idle+0x54/0x90}
> > 
> > So this wasn't failing before 6.11? As Nilay mentioned, there are some
> > changes on how nvme subsystem reset is handled. The main thing being
> > this ioctl doesn't automatically trigger an nvme reset. I expected
> > delayed recovery might happen, but machine checks are not expected. If
> > this was working before, I can only guess right now that the previous
> > behavior was accessing MMIO and config quicker and triggered a different
> > error path. If you're successful with the PPC patch reverted, I would be
> > interested to hear about it.
> > 
> 
> Hello
> 
> Quick update about this.
> I went back all the way to 6.8 and this still happens.
> I started to think that these HPE servers were more susceptible to the
> machine checks on the PCIE state changes.
> 
> So I tested on a Lenovo and still had panics.
> I do not think this is worth pursuing given that Keith already
> confirmed this is not recommended and way too heavy handed on the
> PCIE path.
> 
> I have told the reporter of this that they are not to use this type of
> fault injection on directly attached nvme devices.
> 
> Thanks
> Laurence
> 
Hello

Finishing off this thread, but I have one final question.
The bottom line is that on certain server hardware the nvme
subsystem-reset command triggers a machine check for NVMe devices in a
PCIe slot, going back quite far in kernel versions, and we panic.

As Keith said, the side effects of that command are overkill for the
PCI transport.

The final, simple question concerns M.2-connected NVMe devices.
Are these expected to reconnect automatically after an nvme
subsystem-reset is issued?

The complaint is the following:

nvme subsystem-reset /dev/nvme0
The device is disconnected as expected, but it requires the following
to reconnect:

echo 1 > /sys/bus/pci/devices/0000:02:00.0/remove
echo 1 > /sys/bus/pci/rescan

Then it is reconnected.
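
For reference, the two manual steps above could be wrapped in a small
helper along these lines. This is only a sketch of the workaround, not a
fix: the function name pci_remove_rescan and the SYSFS_PCI override are
made up for illustration (the override exists only so the helper can be
pointed at a scratch directory), and 0000:02:00.0 is just the address
from the report above.

```shell
#!/bin/sh
# Sketch only: remove one PCI function and rescan the bus so that the
# device is re-enumerated. Needs root when run against the real sysfs.
# SYSFS_PCI is a hypothetical override for the sysfs PCI root.
pci_remove_rescan() {
    prr_bdf="$1"
    prr_root="${SYSFS_PCI:-/sys/bus/pci}"
    prr_dev="$prr_root/devices/$prr_bdf"

    # Refuse to continue if the device node is not present.
    if [ ! -e "$prr_dev/remove" ]; then
        echo "no such PCI device: $prr_bdf" >&2
        return 1
    fi

    # Same two writes as the manual steps: detach, then rescan.
    echo 1 > "$prr_dev/remove" || return 1
    echo 1 > "$prr_root/rescan"
}
```

Usage, as root, with the address from the report:

pci_remove_rescan 0000:02:00.0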

Thanks
Laurence