Instability in ALL stable and LTS distro kernels (IRQ #16 being disabled, PCIe bus errors, ath10k_pci) in Dell Inspiron 5567
Bandhan Pramanik
bandhanpramanik06.foss at gmail.com
Sat Jul 5 08:00:46 PDT 2025
Hello,
The dmesg log (the older one) is present here:
https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/dmesg.log
The newer dmesg log starts from the first line and was not overwritten by
the ring buffer (I booted with pci=noaer in this case):
https://gist.githubusercontent.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180/raw/78460e6931a055b6776afe756a95d467913d5ebd/updated-dmesg
(The newer one does not have the error recorded.)
Please check the older dmesg instead: the quoted line was taken from it
verbatim, and any additional details around it are there as well.
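To pull the quoted AER line, and any detail lines that follow it, out of a long dmesg log, a small filter like the one below may help. This is a minimal sketch, not a tool from the kernel or from this thread. It assumes every Root Port report has the same shape as the quoted line, and that the detail lines printed right after it mention the source device address (0000:01:00.0 here). The names `extract_aer_events` and `AerEvent` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed shape of a Root Port report, modeled on the quoted line:
# "[ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0"
BDF = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"
HEADER = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\S+\s+(?P<port>" + BDF + r"): "
    r"AER: (?P<kind>.+?) error message received from (?P<src>" + BDF + r")"
)


@dataclass
class AerEvent:
    timestamp: float  # seconds since boot, as printed by dmesg
    root_port: str    # device that received the error message
    kind: str         # e.g. "Uncorrectable (Fatal)"
    source: str       # device that detected the error
    details: list = field(default_factory=list)


def extract_aer_events(lines):
    """Group each AER report with the lines right after it that mention
    the source device (a heuristic, not a kernel guarantee)."""
    events = []
    current = None
    for line in lines:
        match = HEADER.search(line)
        if match:
            current = AerEvent(float(match["ts"]), match["port"],
                               match["kind"], match["src"])
            events.append(current)
        elif current is not None and current.source in line:
            current.details.append(line.rstrip("\n"))
        else:
            current = None  # an unrelated line ends the current report
    return events
```

Passing an open file (for example the downloaded dmesg.log) yields one `AerEvent` per report; an empty `details` list means nothing about the source device followed the report in the captured log.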
Bandhan
On Sat, Jul 5, 2025 at 7:20 PM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> On Sat, Jul 05, 2025 at 01:00:23AM +0530, Bandhan Pramanik wrote:
> > Hi everyone,
> >
> > I'm back after a week, and I did some research.
> >
> > I talked to some folks on IRC and the glaring issue was basically this:
> >
> > > [ 1146.810055] pcieport 0000:00:1c.0: AER: Uncorrectable (Fatal) error message received from 0000:01:00.0
>
> Where is the complete dmesg log from which this is extracted?
>
> > This basically means that the PCIe root port (0000:00:1c.0) is the
> > main problem here.
> >
> > One particular note: this issue can be reproduced on other units of
> > this same laptop model. Therefore, it likely happens on most if not
> > all laptops of this model.
> >
> > For starters, the root port basically manages the communication
> > between the CPU and the device. Now, this root port itself is
> > reporting fatal errors.
> >
> > This is not a Wi-Fi error, but something deeper.
>
> Devices that support AER have extra log registers to capture details
> about an error. A device that detects an error sends a PCIe Error
> Message upstream to a Root Port. The Root Port generates an
> interrupt, which is handled by the aer driver. In this case, the
> 01:00.0 device detected an error and sent an ERR_FATAL message
> upstream, and the 00:1c.0 Root Port received it and generated an
> interrupt. The ERR_FATAL message doesn't contain any details about
> the error itself, so the aer driver looks for the AER registers in the
> 01:00.0 device and logs those details to the dmesg log. Normally
> there would be a few lines after the one you quoted that would include
> those details.
>
> Bjorn
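As Bjorn describes, the details of an error live in the AER registers of the 01:00.0 device, not in the ERR_FATAL message itself. As an illustration only, the sketch below decodes a raw Uncorrectable Error Status value into error names using the bit layout from the PCIe Base Specification (the same bits as the PCI_ERR_UNC_* constants in Linux's pci_regs.h). It assumes the caller already has the raw status, mask, and severity values; a set bit in the Severity register marks that error as Fatal, and masked bits are skipped, since masked errors are not reported. The function name `decode_uncorrectable` is hypothetical.

```python
# Bit positions in the AER Uncorrectable Error Status register
# (PCIe Base Specification; cf. PCI_ERR_UNC_* in Linux pci_regs.h).
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
    23: "MC Blocked TLP",
    24: "AtomicOp Egress Blocked",
    25: "TLP Prefix Blocked Error",
}


def decode_uncorrectable(status, mask=0, severity=None):
    """Return (error name, severity label) for each unmasked status bit set.

    If the Severity register value is unknown, the label is "unknown".
    """
    found = []
    for bit, name in sorted(UNCORRECTABLE_BITS.items()):
        flag = 1 << bit
        if (status & flag) and not (mask & flag):
            if severity is None:
                label = "unknown"
            else:
                label = "Fatal" if (severity & flag) else "Non-Fatal"
            found.append((name, label))
    return found
```

For example, with hypothetical values, `decode_uncorrectable((1 << 5) | (1 << 18), severity=1 << 18)` returns `[("Surprise Down Error", "Non-Fatal"), ("Malformed TLP", "Fatal")]`. The kernel's aer driver normally prints decoded names itself, so a helper like this is mostly useful when only the raw register values were captured.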