[PATCH v3 1/1] nvme-pci : Fix EEH failure on ppc after subsystem reset
Keith Busch
kbusch at kernel.org
Mon Jun 24 09:07:28 PDT 2024
On Sat, Jun 22, 2024 at 08:37:02PM +0530, Nilay Shroff wrote:
> On 6/21/24 22:07, Keith Busch wrote:
> >  static inline int nvme_reset_subsystem(struct nvme_ctrl *ctrl)
> >  {
> > +	u32 val;
> >  	int ret;
> >  
> >  	if (!ctrl->subsystem)
> > @@ -657,10 +660,10 @@ static inline int nvme_reset_subsystem(struct nvme_ctrl *ctrl)
> >  		return -EBUSY;
> >  
> >  	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_NSSR, 0x4E564D65);
> > -	if (ret)
> > -		return ret;
> > -
> > -	return nvme_try_sched_reset(ctrl);
> > +	nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
> > +	if (!ret)
> > +		ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &val);
> > +	return ret;
> This is a nice idea! These changes look good. I have tested it on powerpc with
> EEH and I observed that after an NVMe subsystem reset, EEH is able to recover the disk.
> I have also tested it on a platform which *doesn't* support EEH or PCI error recovery,
> and on that platform I observed that the NVMe disk falls through to the dead state.
>
> So I think you may submit a formal patch with this change.
Just a little concerned about the reg_read32 at the end there. A hot-plug
event is a potentially expected outcome of the register write, and that
may unmap the PCI BAR before the read.
And come to think of it, a hot-plug event could also occur before the
reg_write32, for a reason unrelated to the requested subsystem-reset operation...
Anyway, I think this needs a driver-specific op to handle it safely.
I'll send a patch.
More information about the Linux-nvme mailing list