Kernel panics when using kexec for rebooting

Eric W. Biederman ebiederm at xmission.com
Tue May 14 19:14:33 EDT 2013


Dave Lloyd <dave at davelloyd.com> writes:

> On Tue, May 14, 2013 at 5:33 PM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>> Dave Lloyd <dave at davelloyd.com> writes:
>>
>>> On Tue, May 14, 2013 at 5:01 PM, Eric W. Biederman
>>> <ebiederm at xmission.com> wrote:
>>>
>>>>
>>>> Yes this does seem to be all over the place, and memory corruption
>>>> probably caused by ongoing-dma seems like a reasonable hypothesis.
>>>
>>> Thank goodness it's not just me! :-)
>>
>> It is a classic issue, although I suspect something is unique in your
>> setup because it has (to my knowledge) not been a widespread problem for
>> years.
>
> It could certainly be buggy hardware. Other details include:
>
> Kernel 3.0.29.0, and we are also using InfiniBand (I believe I
> found a reference to Mellanox hardware potentially causing this
> issue unless the driver is unloaded before rebooting with kexec). The
> potential issue with unloading the IB drivers doesn't bug me nearly as
> much as not unloading pata_amd and pata_acpi causing the ACPI Error
> messages upon reboot with kexec.

Oh. Yeah.  IB definitely sets up memory for ongoing DMA.  So if it
doesn't have a shutdown method and IB traffic comes in during boot, just
about anything could happen.
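
A rough sketch of the unload-before-kexec workaround mentioned in the
quoted text, in case it helps with testing the DMA hypothesis.  It only
builds and prints the commands (a dry run).  The module names mlx4_ib
and mlx4_core, the sample /proc/modules text, and the helper names are
assumptions for illustration, not details taken from this thread;
`kexec -e` also assumes a kernel was already loaded with `kexec -l`.

```python
def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def pre_kexec_commands(proc_modules_text, unload_first):
    """Build the commands to run, in order, before jumping to the new kernel.

    Only modules that are actually loaded get a `modprobe -r` entry;
    the final entry is always `kexec -e`.
    """
    present = loaded_modules(proc_modules_text)
    cmds = [["modprobe", "-r", name] for name in unload_first if name in present]
    cmds.append(["kexec", "-e"])
    return cmds


# Made-up sample in the /proc/modules layout (name size refcount deps state addr).
SAMPLE = (
    "mlx4_ib 1 0 - Live 0x0\n"
    "mlx4_core 1 1 mlx4_ib, Live 0x0\n"
    "pata_amd 1 0 - Live 0x0\n"
)

# Hypothetical unload order: the IB driver first, then the core it depends on.
for cmd in pre_kexec_commands(SAMPLE, ["mlx4_ib", "mlx4_core"]):
    print(" ".join(cmd))  # dry run: print instead of executing
```

If the panics stop once the IB modules are removed before the kexec,
that would point at ongoing DMA rather than at the ACPI errors.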

> I'm inclined to chalk the ACPI Error messages up to potentially buggy
> BIOS/hardware from the vendor since pata_amd and pata_acpi are in wide
> use and I would expect to see more issues reported were there truly an
> issue with rebooting with kexec and not unloading pata_amd and
> pata_acpi.

Maybe.  Or it might be luck of timing: which memory happened to get
stomped on when incoming IB packets arrived.

Eric


