[PATCH] PCI: Mark Nvidia GB10 to avoid bus reset
Bjorn Helgaas
helgaas at kernel.org
Wed Jan 14 09:28:32 PST 2026
[+cc Jason, Alex for Nvidia input]
On Wed, Jan 14, 2026 at 06:39:24AM +0000, Johnny-CC Chang (張晋嘉) wrote:
> On Tue, 2025-11-18 at 17:39 +0800, Johnny-CC Chang wrote:
> > On Thu, 2025-11-13 at 10:39 +0100, Lukas Wunner wrote:
> > > On Thu, Nov 13, 2025 at 04:44:06PM +0800, Johnny Chang wrote:
> > > > Nvidia GB10 PCIe hosts will encounter problem occasionally
> > > > after SBR(secondary bus reset) is applied.
> > >
> > > Could you elaborate what kinds of problems occur, how often they
> > > occur, etc?
> >
> > There is about 1/1000 chance that after SBR is applied, any further
> > access via this root port will be blocked and make system crash.
What sort of crash happens?  It's useful if we can include a
breadcrumb that will help people identify the crash and find a fix.

What I would expect is some kind of PCIe error like a config read
timeout or Unsupported Request error.  But usually those just result
in ~0 data back to the CPU, which usually doesn't directly cause a
crash.
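To illustrate the "~0 data" point, here is a minimal userspace sketch of how a caller can notice that a 32-bit config read probably failed. The helper name cfg_read_looks_failed() is hypothetical; in the kernel, the PCI_POSSIBLE_ERROR() macro in include/linux/pci.h serves the same purpose. Note that all ones can occasionally be valid register data, so this is a hint, not proof of an error.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): returns true if a 32-bit
 * config read came back as all ones, which is what is typically
 * returned to the CPU when the read fails on PCIe (for example a
 * timeout or an Unsupported Request completion).
 *
 * All ones may also be legitimate data for some registers, so the
 * caller should treat this only as "possible error".
 */
static bool cfg_read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}
```

A caller would typically log the failure and bail out instead of interpreting the all-ones value as register contents, which is why such errors usually do not crash the system by themselves.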
> I would like to update below description to replace original comment in
> v1 patch, is this information sufficient?
> --------
> /*
> * After SBR(secondary bus reset) is applied on an Nvidia GB10
> * PCIe root port, there is 1/1000 chance that further requests
> * via this root port will be blocked and cause system unstable.
I'm confused about what the topology is.  I first assumed GB10 was a
PCIe Endpoint, since Secondary Bus Reset only applies to devices below
a bridge, so SBR would be applied to a device by a config write to
that bridge.

But you mention a GB10 Root Port here, which obviously is not an
Endpoint, so there's no bridge upstream from the GB10 that could
initiate SBR to the GB10.
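For reference, the "config write to that bridge" mentioned above can be sketched as follows. This is a userspace simulation under stated assumptions, not the kernel implementation: struct fake_bridge and the fake_*() helpers are hypothetical stand-ins for a bridge's config space and its accessors. The register offset (0x3E) and bit (0x40) match PCI_BRIDGE_CONTROL and PCI_BRIDGE_CTL_BUS_RESET from include/uapi/linux/pci_regs.h.

```c
#include <stdint.h>

#define BRIDGE_CONTROL_OFFSET	0x3E	/* same value as PCI_BRIDGE_CONTROL */
#define BRIDGE_CTL_BUS_RESET	0x40	/* same value as PCI_BRIDGE_CTL_BUS_RESET */

/* Hypothetical stand-in for a bridge's 256-byte config space. */
struct fake_bridge {
	uint8_t cfg[256];
	unsigned int resets;	/* counts 0 -> 1 transitions of the reset bit */
};

/* Read a little-endian 16-bit register from the simulated config space. */
static uint16_t fake_read16(const struct fake_bridge *b, unsigned int off)
{
	return (uint16_t)(b->cfg[off] | (b->cfg[off + 1] << 8));
}

/* Write a 16-bit register; note when Secondary Bus Reset gets asserted. */
static void fake_write16(struct fake_bridge *b, unsigned int off, uint16_t v)
{
	uint16_t old = fake_read16(b, off);

	if (off == BRIDGE_CONTROL_OFFSET &&
	    !(old & BRIDGE_CTL_BUS_RESET) && (v & BRIDGE_CTL_BUS_RESET))
		b->resets++;

	b->cfg[off] = (uint8_t)(v & 0xff);
	b->cfg[off + 1] = (uint8_t)(v >> 8);
}

/*
 * Assert and then deassert Secondary Bus Reset in Bridge Control.
 * Real code must hold the bit set for a delay before clearing it and
 * must then wait for the devices below the bridge to become ready;
 * both delays are omitted in this simulation.
 */
static void fake_secondary_bus_reset(struct fake_bridge *b)
{
	uint16_t ctrl = fake_read16(b, BRIDGE_CONTROL_OFFSET);

	fake_write16(b, BRIDGE_CONTROL_OFFSET,
		     (uint16_t)(ctrl | BRIDGE_CTL_BUS_RESET));
	fake_write16(b, BRIDGE_CONTROL_OFFSET,
		     (uint16_t)(ctrl & ~BRIDGE_CTL_BUS_RESET));
}
```

The point of the sketch is that the reset is driven entirely from the bridge's own config space, so the device being reset is whatever sits on the bridge's secondary bus, not the bridge itself.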
If this is actually a GB10 issue, it sounds like a hardware erratum
that lots of users would see and Nvidia would likely be aware of.
Bjorn