[PATCH 0/3] Cleanup kdump memmap= passing and e820 usage

Eric W. Biederman ebiederm at xmission.com
Wed Feb 6 18:39:50 EST 2013


"H. Peter Anvin" <hpa at zytor.com> writes:

> On 02/06/2013 03:04 PM, Eric W. Biederman wrote:
>>>
>>> There is another important point, why the command line approach
>>> should be preferred:
>>> Backward compatibility and the ability to backport the whole stuff to
>>> fix mmconf in kdump which would be nice for example for SLES11.
>> 
>> Backward compatibility argues for editing the e820 map because we can do
>> that at any time, with no dependencies on any kernel changes.  Only
>> the E820_RAM type will be treated as ram.  Any unrecognized e820 type
>> will be treated as reserved.  The code has always been like that.
>> 
>> A new reserved value would be nice for telling the kernel about areas
>> that really are ram but that it isn't allowed to touch, but it is
>> unnecessary at this point.  Even if we just mark the memory regions we
>> don't use as E820_RESERVED we match what is currently being done.
>> 
>> Since a new reserved value has not been selected let me suggest
>> 0x6b646d70, aka "kdmp" in ASCII.
>> 
>
> I (somewhat) would like to keep the reserved numbers in a small(ish)
> range which argues against that specific constant.  I kind of like
> 0x6bxxxxxx ("k") though, it has some flair to it.

Well if someone doesn't reserve such a constant in a well known place the
historical solution is to pick a random number and hope you don't
collide with someone else's random number.  We are pretty close to that
right now with the e820 map.
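
To make the constant concrete: 0x6b646d70 is simply the four ASCII bytes
"kdmp" read as a big-endian 32-bit number. The sketch below shows that, and
how a kernel that only recognizes E820_RAM would classify such a value. The
type codes 1 and 2 are the standard e820 values for RAM and reserved;
E820_KDUMP is only the value floated in this thread, and the helper names
are made up for illustration, not kernel code.

```python
E820_RAM = 1
E820_RESERVED = 2
E820_KDUMP = 0x6b646d70  # value suggested in this thread, not an allocated type


def tag_to_type(tag: str) -> int:
    """Read an ASCII tag as a big-endian integer, e.g. "kdmp"."""
    return int.from_bytes(tag.encode("ascii"), "big")


def is_usable_ram(e820_type: int) -> bool:
    """Only E820_RAM counts as ram; every other type is left alone."""
    return e820_type == E820_RAM


print(hex(tag_to_type("kdmp")))
print(is_usable_ram(E820_KDUMP))
```

Under that rule an older kernel needs no knowledge of the new type: it just
never hands the region to the page allocator.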

And coming up sometime soonish is how do we do this for the efi memory
map.

We do need to regenerate the map in /sbin/kexec though to handle
the case of memory hotplug (which necessitates reloading our crash
kernel).
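
As a rough sketch of what regenerating the map could look like, the function
below takes an e820-style list of (start, size, type) entries and retypes all
RAM outside one kept range. The single kept range, the tuple layout and the
name restrict_ram() are assumptions made for brevity; this is not the
kexec-tools implementation.

```python
E820_RAM = 1
E820_RESERVED = 2


def restrict_ram(entries, keep_start, keep_end, hidden_type=E820_RESERVED):
    """Return a new map where RAM outside [keep_start, keep_end) is retyped.

    entries is a list of (start, size, type) tuples; non-RAM entries are
    copied through unchanged.
    """
    out = []
    for start, size, typ in entries:
        end = start + size
        if typ != E820_RAM:
            out.append((start, size, typ))
            continue
        lo = max(start, keep_start)
        hi = min(end, keep_end)
        if lo >= hi:
            # No overlap with the kept range: hide the whole entry.
            out.append((start, size, hidden_type))
            continue
        if start < lo:
            out.append((start, lo - start, hidden_type))
        out.append((lo, hi - lo, E820_RAM))
        if hi < end:
            out.append((hi, end - hi, hidden_type))
    return out


first_kernel_map = [
    (0x0, 0x9F000, E820_RAM),
    (0x100000, 0x3FF00000, E820_RAM),
    (0xFEC00000, 0x1000, E820_RESERVED),
]
for entry in restrict_ram(first_kernel_map, 0x2000000, 0x6000000):
    print([hex(v) for v in entry])
```

After a memory hotplug event the same function would simply be rerun on the
updated map before the crash kernel is reloaded.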

>> For backwards compatibility I prefer editing the e820 map in
>> /sbin/kexec.
>> 
>> 
>> My real preference would be to define a command line option that will
>> work on all architectures that implement kdump, as the crashkernel option
>> does.  Unfortunately it looks like that ship has sailed, and there isn't
>> enough desire to fix this to come up with a generic option that will
>> work on more than just x86.  But if we could get past the kernel
>> versioning and figure out an arch-generic solution it might be worth it.
>> 
>
> What would that option look like?

Probably something like "usemem=<size>@<addr>,..."
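
Purely to illustrate the shape of such an option, here is a sketch of a
parser for it. No usemem= option with this syntax exists as far as this
thread is concerned; the K/M/G suffixes are an assumption borrowed from how
memmap= sizes are usually written, and parse_usemem() is a hypothetical
name.

```python
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Parse a decimal or 0x-prefixed number with an optional K/M/G suffix."""
    if not text:
        raise ValueError("empty number")
    scale = SUFFIXES.get(text[-1].upper())
    if scale is not None:
        return int(text[:-1], 0) * scale
    return int(text, 0)


def parse_usemem(arg: str):
    """Turn "usemem=<size>@<addr>,..." into a list of (addr, size) pairs."""
    prefix = "usemem="
    if not arg.startswith(prefix):
        raise ValueError("not a usemem= option")
    ranges = []
    for item in arg[len(prefix):].split(","):
        size_text, sep, addr_text = item.partition("@")
        if not sep:
            raise ValueError("missing @<addr> in " + repr(item))
        ranges.append((parse_size(addr_text), parse_size(size_text)))
    return ranges


print(parse_usemem("usemem=64M@16M,0x1000@0x9000"))
```

Nothing in that format is x86 specific, which is the point of preferring it
over passing an edited firmware memory map.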

>>> kexec-tools can detect the kernel version of the kernel which is loaded
>>> as kdump/crash kernel. If its version is:
>>> "$MAINLINE_VERSION_THE_CHANGE_GETS_INTRODUCED"
>>> or newer, things are fine.
>>> But if the kernel version is older, there is no way for kexec-tools to
>>> find out whether the older kernel may have the feature included.
>>> That's bad!
>> 
>> That is totally unnecessary for the e820 map because anything
>> unrecognized is treated as reserved, and for the sufficiently paranoid
>> we don't need to use a new memory type.
>
> The only issue is if kdump needs the memory it is going to dump to be
> mapped; we don't map reserved memory anymore unless explicitly requested
> via ioremap().  Does it?

I don't think that it makes sense for the memory to be permanently
mapped.  Even at 4MB per terabyte with 2M pages for the bigger systems
that becomes a noticeable amount of our memory to reserve for kdump.
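
The 4MB figure can be checked with a little arithmetic. The sketch below
assumes 8-byte page-table entries, as on x86-64, and counts only the entries
at the level that maps the pages themselves, ignoring the much smaller upper
levels of the page table.

```python
PTE_SIZE = 8  # bytes per page-table entry on x86-64 (assumption stated above)


def mapping_overhead(mem_bytes: int, page_bytes: int) -> int:
    """Bytes of lowest-level page-table entries needed to map mem_bytes."""
    return (mem_bytes // page_bytes) * PTE_SIZE


TB = 1 << 40
print(mapping_overhead(TB, 2 << 20) // (1 << 20), "MB per TB with 2M pages")
print(mapping_overhead(TB, 4 << 10) // (1 << 20), "MB per TB with 4K pages")
```

One terabyte is 2^19 pages of 2MB each, and 2^19 entries of 8 bytes come to
4MB, so the cost grows linearly with the amount of memory being dumped.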

In the general picture we do need to track the memory so that we
remember how the memory should be cached or we run into the possibility
of getting the caching bits set into an inconsistent state.

There is presently work to modify /dev/oldmem and /proc/vmcore so
that they are mmapable, so that userspace can control how much is
ioremapped at once.  Currently on the larger systems there is a
major performance problem with mapping a single page at a time and
copying that to userspace.

>> The existing e820 handling for unknown types is much much better.  It
>> just treats them as reserved and goes about its merry way.
>
> It sounds like this is the way to go.

It certainly looks good.  We still need someone with the time to write
the patch and test it.

Eric


