RFC on Kdump and PCIe on ARM64

Sinan Kaya okaya at codeaurora.org
Fri Mar 2 06:20:54 PST 2018


On 3/1/2018 7:03 PM, Bjorn Helgaas wrote:
>> 3. The last one is adapter gets into fuzzy state due to not coming
>> out of clean state in the second time init and being rejected by
>> SMMUv3 multiple times.
>>
>> [   16.093441] pci 0000:01:00.0: aer_status: 0x00040000, aer_mask: 0x00000000
>> [   16.099356] pci 0000:01:00.0: Malformed TLP
>> [   16.103522] pci 0000:01:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID
>> [   16.110900] pci 0000:01:00.0: aer_uncor_severity: 0x00062011
>> [   16.116543] pci 0000:01:00.0:   TLP Header: 0a00a000 00008100 01010100 00000000
> I'm not clear on this.  I don't remember what an IOMMU fault looks
> like to an Endpoint.  Are you saying that if an Endpoint sees too many
> of those faults, it gets into this "fuzzy state" (whatever that is :))?
> Is this a hardware defect?  Do we care (this is a kdump kernel, after
> all)?  If we do care, can we fix the device by resetting it?

fuzzy = funky = funny = weird

Regardless of what we do in the IOMMU driver, I think we still have to reset
the endpoint in order to have a clean initialization.

I'm not sure if all endpoint drivers can recover an adapter from a live state.

I wasn't expecting to see a Malformed TLP error. My guess is that it was
caused either by the SMMU returning a CA (Completer Abort) or UR (Unsupported
Request) to the endpoint, or by the adapter still being live in the middle of
driver initialization.
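
For reference, the aer_status value in the quoted log can be decoded by hand.
The following is a minimal sketch, assuming the standard PCIe AER
Uncorrectable Error Status bit layout. The table and the helper names
(aer_uncor_first_name, aer_uncor_print) are made up for illustration and are
not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One named bit of the AER Uncorrectable Error Status register. */
struct aer_bit {
	unsigned int bit;
	const char *name;
};

/* Bit positions as assumed from the PCIe AER capability layout. */
static const struct aer_bit aer_uncor_bits[] = {
	{  4, "Data Link Protocol Error" },
	{  5, "Surprise Down" },
	{ 12, "Poisoned TLP" },
	{ 13, "Flow Control Protocol Error" },
	{ 14, "Completion Timeout" },
	{ 15, "Completer Abort" },
	{ 16, "Unexpected Completion" },
	{ 17, "Receiver Overflow" },
	{ 18, "Malformed TLP" },
	{ 19, "ECRC Error" },
	{ 20, "Unsupported Request" },
	{ 21, "ACS Violation" },
};

#define AER_UNCOR_NBITS (sizeof(aer_uncor_bits) / sizeof(aer_uncor_bits[0]))

/*
 * Hypothetical helper: return the name of the lowest-numbered known bit
 * that is set in status, or NULL if no known bit is set.
 */
const char *aer_uncor_first_name(uint32_t status)
{
	size_t i;

	for (i = 0; i < AER_UNCOR_NBITS; i++)
		if (status & (UINT32_C(1) << aer_uncor_bits[i].bit))
			return aer_uncor_bits[i].name;
	return NULL;
}

/* Hypothetical helper: print every known bit that is set in status. */
void aer_uncor_print(uint32_t status)
{
	size_t i;

	for (i = 0; i < AER_UNCOR_NBITS; i++)
		if (status & (UINT32_C(1) << aer_uncor_bits[i].bit))
			printf("bit %2u: %s\n", aer_uncor_bits[i].bit,
			       aer_uncor_bits[i].name);
}
```

With the value from the log, aer_uncor_first_name(0x00040000) returns
"Malformed TLP" (bit 18), which matches the second line of the quoted output.
A CA or UR would show up as bit 15 or bit 20 instead.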

I think we do care about the adapter coming up properly; otherwise, how would
you collect the dumps from the system?

I was expecting to come in through the network interface and download them
from the target.

That's why I was suggesting an FLR/PM reset etc. when we know that we are
booting a kdump kernel.
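
To illustrate the idea only, here is a userspace sketch, not the kernel
change being discussed. It assumes that /proc/vmcore is present only in the
capture (kdump) kernel and that the endpoint exposes the sysfs "reset"
attribute, which is the case only when the kernel found a usable reset method
for the function. The function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Return 1 if vmcore_path exists, 0 otherwise. Passing "/proc/vmcore" is
 * assumed to identify a capture kernel.
 */
int in_kdump_kernel(const char *vmcore_path)
{
	return access(vmcore_path, F_OK) == 0;
}

/*
 * Write "1" to a sysfs reset attribute to request a function reset.
 * Return 0 on success, -1 if the file cannot be opened or written.
 */
int request_function_reset(const char *reset_path)
{
	FILE *f = fopen(reset_path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs("1", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Request the reset only when running in a kdump kernel.
 * Return 1 if skipped, 0 on success, -1 if the reset request failed.
 */
int reset_if_kdump(const char *vmcore_path, const char *reset_path)
{
	if (!in_kdump_kernel(vmcore_path))
		return 1;
	return request_function_reset(reset_path);
}
```

For the device in the log above, the call would be
reset_if_kdump("/proc/vmcore", "/sys/bus/pci/devices/0000:01:00.0/reset"),
run before the endpoint driver binds. Which reset method the kernel picks
behind that attribute (FLR, PM reset, bus reset) depends on the device.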

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


