pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)

Bjorn Helgaas helgaas at kernel.org
Mon Apr 30 14:17:40 PDT 2018


On Mon, Apr 30, 2018 at 04:48:15PM -0400, Sinan Kaya wrote:
> Bjorn,
> 
> On 4/28/2018 9:03 AM, okaya at codeaurora.org wrote:
> >> Hmm, if it is the remove() method then kexec does not use it.  kexec uses
> >> the shutdown() method instead.  I missed this detail when I replied.
> > 
> > Portdrv hooks up the remove handler to shutdown. That's why remove is getting called.
> 
> What should we do about this?
> 
> Since there is an actual HW erratum involved, should we quirk this
> root port and not wait, as if remove/shutdown doesn't exist?

I was hoping to avoid a quirk because AFAIK all Intel parts have this
issue, so a quirk will be an ongoing maintenance burden.  I tried to
avoid the timeout delays, e.g., with 40b960831cfa ("PCI: pciehp: Compute
timeout from hotplug command start time").
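
To illustrate the idea behind that commit (this is only a user-space
sketch, not the pciehp code): measure the wait from when the command
was issued, not from when we start waiting, so a command issued long
ago costs no extra delay.  The helper name and the 1000 msec budget in
the usage note are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function.
 *
 * Given the time a command was issued, the current time, and the total
 * time budget for the command (all in msec), return how much longer we
 * should still wait.  If the budget is already used up, return 0 so
 * the caller does not sleep at all.
 */
static uint64_t remaining_wait_ms(uint64_t cmd_started_ms, uint64_t now_ms,
                                  uint64_t timeout_ms)
{
	uint64_t elapsed = now_ms - cmd_started_ms;

	if (elapsed >= timeout_ms)
		return 0;	/* budget exhausted: no further delay */

	return timeout_ms - elapsed;
}
```

With an assumed 1000 msec budget, a command issued 400 msec ago leaves
600 msec to wait, and one issued 65284 msec ago (as in the subject
line) leaves nothing to wait for.  That removes the delay, but not the
message.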

But we still see the alarming messages, so we should probably add a
quirk to get rid of those.

But I haven't given up on the idea of getting rid of the
pciehp_remove() path.  I'm not convinced yet that we actually need to
do anything to shut this device down.  I don't like the assumption
that kexec requires this.  A kexec is fundamentally just a branch,
and anything we do before the branch (i.e., in the old kernel), we
should also be able to do after the branch (i.e., in the kexec-ed
kernel).

> Paul,
> You might want to file a bugzilla so that we can keep our debug
> efforts out of this list.


