Kernel panics when using kexec for rebooting

Eric W. Biederman ebiederm at xmission.com
Tue May 14 18:33:29 EDT 2013


Dave Lloyd <dave at davelloyd.com> writes:

> On Tue, May 14, 2013 at 5:01 PM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>
>>
>> Yes this does seem to be all over the place, and memory corruption
>> probably caused by ongoing-dma seems like a reasonable hypothesis.
>
> Thank goodness it's not just me! :-)

It is a classic issue, although I suspect something is unique in your
setup because it has (to my knowledge) not been a widespread problem for
years.

>> The easy first thing to try is to remove all of your kernel modules
>> before you reboot with kexec.  Not infrequently the module remove path
>> is better tested than the device shutdown path.
>
> I'm trying this now. In one panic, the pte referenced was
> 0x100010000000000 which sure looks a whole lot like someone wrote his
> registers in there. It certainly doesn't look like a valid pte.
>
> So far, unloading pata_acpi and pata_amd seems to have eliminated the
> ACPI exception messages. I believe that this resets the device
> properly. Unfortunately, it looks like lots of drivers don't implement
> the pci_driver->shutdown call, so it would make sense that this is a
> relatively widespread problem.

Most devices don't leave DMA set up when you reboot, and beyond that the
generic PCI code clears the bus master DMA bit, which shuts down a lot
more DMA.

So the actual lack of a shutdown method is not as much of an issue as it
might appear.
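
To make the bus master point concrete, here is a minimal user-space
sketch in C. It only models the 16-bit PCI command register as a plain
integer. The bit values follow the standard PCI command register layout
(bit 2 is Bus Master Enable), but clear_bus_master() is a hypothetical
helper written for illustration, not the kernel's actual code path.

```c
#include <stdint.h>

/* Standard PCI command register bits (config space offset 0x04). */
#define PCI_COMMAND_IO      0x1   /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY  0x2   /* respond to memory space accesses */
#define PCI_COMMAND_MASTER  0x4   /* device may initiate DMA (bus master) */

/*
 * Hypothetical helper: return the command word with Bus Master Enable
 * cleared and every other bit left as it was. A device whose bus master
 * bit is clear is not allowed to start new DMA transactions.
 */
static uint16_t clear_bus_master(uint16_t cmd)
{
    return (uint16_t)(cmd & ~PCI_COMMAND_MASTER);
}
```

For a device with I/O, memory, and bus mastering enabled (command word
0x0007), clear_bus_master() yields 0x0003: the device still decodes
accesses, but can no longer write into memory on its own.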

Eric



