[PATCH v7 0/5] Reset PCIe devices to address DMA problem on kdump with iommu

Takao Indoh indou.takao at jp.fujitsu.com
Sun Mar 3 19:56:45 EST 2013


(2013/01/23 9:47), Thomas Renninger wrote:
> On Monday, January 21, 2013 10:11:04 AM Takao Indoh wrote:
>> (2013/01/08 4:09), Thomas Renninger wrote:
> ...
>>> I tried the provided patches first on 2.6.32, then I verified with 3.8-rc2
>>> and in both cases the disk is not detected anymore in
>>> reset_devices (kexec'ed/kdump) case (but things work fine without these
>>> patches).
>>
>> So the problem that the disk is not detected was caused by exactmap
>> problem you guys are discussing? Or still not detected even if exactmap
>> problem is fixed?
> This problem is related to the 5 PCI resetting patches.
> Dumping worked with a 2.6.32 and a 3.8-rc2 kernel, adding the PCI resetting
> patches broke both. I first tried 2.6.32 and verified with 3.8-rc2 to make sure
> I didn't mess up the backport adjustings of the patches to 2.6.32.
>
> Unfortunately this Dell platform takes really long to boot.
> I can give it the one or other test, but please do not bomb me with patches.
>
> For info:
> About the interrupt remapping error interrupt storm in kdump case I tried to
> reproduce on this machine, but never could: The guys who saw that also cannot
> reproduce this anymore.
>
> Two ideas I had about this:
>    - As said already, (also) try to catch the error case and try to reset
>      the device in AER/Specific interrupt remapping error interrupt caught.

I tried this idea but it did not work on megaraid_sas.

I made an experimental patch that resets a device when a DMAR error is
detected on it. What happened was:
1) The megaraid_sas module is loaded.
2) A DMAR error is detected during driver initialization.
3) The device is reset.
4) kdump fails because the disk is not found.

When I tested the patches that reset all devices early in boot, the
disk was recognized correctly, so it seems that resetting a device while
its driver is loading causes a problem. I think we need to reset a device
before its driver is loaded, at the latest.
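
To try the "reset before the driver is loaded" ordering by hand, one
option is the sysfs reset attribute that the kernel exposes for PCI
functions it knows how to reset. The following is only a minimal
user-space sketch, not what the patch series does (the patches reset
devices from the kernel in early boot). The helper names and the
sysfs_root parameter are hypothetical and exist only for this example,
and the reset file is absent for functions without a supported reset
method.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "<sysfs_root>/bus/pci/devices/<bdf>/reset".
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int pci_reset_path(char *buf, size_t len,
                          const char *sysfs_root, const char *bdf)
{
    int n = snprintf(buf, len, "%s/bus/pci/devices/%s/reset",
                     sysfs_root, bdf);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: ask the kernel to reset one PCI function by
 * writing "1" to its sysfs reset attribute. Intended to be called
 * while no driver is bound to the function, e.g. before modprobe.
 * Returns 0 on success or a negative errno value on failure.
 */
static int pci_reset_function_sysfs(const char *sysfs_root, const char *bdf)
{
    char path[256];
    ssize_t written;
    int fd;
    int err = 0;

    if (pci_reset_path(path, sizeof(path), sysfs_root, bdf) != 0)
        return -ENAMETOOLONG;

    /* No O_CREAT: a missing attribute means reset is not supported. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    written = write(fd, "1", 1);
    if (written < 0)
        err = -errno;
    else if (written != 1)
        err = -EIO;

    if (close(fd) < 0 && err == 0)
        err = -errno;

    return err;
}
```

A call such as pci_reset_function_sysfs("/sys", "0000:03:00.0") would be
made as root before loading the driver; the address shown is only a
placeholder for the real bus/device/function of the controller.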

Thanks,
Takao Indoh


>    - Have a look at coreboot, these guys should know how to initialize the PCI
>      subsystem from scratch and might have some well tested PCI resetting
>      code in place already (no idea, just a thought).
>
>      Thomas
>
>



