[Patch v2] align crash_notes allocation to make it be inside one physical page

Baoquan He bhe at redhat.com
Mon Aug 3 16:01:52 PDT 2015


Hi Andrew,

Thanks a lot for your review and suggestions.

On 08/03/15 at 03:04pm, Andrew Morton wrote:
> On Mon,  3 Aug 2015 20:50:43 +0800 Baoquan He <bhe at redhat.com> wrote:
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1620,7 +1620,16 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
> >  static int __init crash_notes_memory_init(void)
> >  {
> >  	/* Allocate memory for saving cpu registers. */
> > -	crash_notes = alloc_percpu(note_buf_t);
> > +	size_t size, align;
> > +	int order;
> > +
> > +	size = sizeof(note_buf_t);
> > +	order = get_count_order(size);
> > +	align = min_t(size_t, (1<<order), PAGE_SIZE);
> > +
> > +	WARN_ON(size > PAGE_SIZE);
> > +
> > +	crash_notes = __alloc_percpu(size, align);
> 
> A code comment would be helpful - the reason for this code's existence
> is otherwise utterly unobvious.

Will add one in the new post.

> 
> I think it can be done this way:
> 
> 	align = min(roundup_pow_of_two(sizeof(note_buf_t)), PAGE_SIZE);
> 
> 
> I never noticed get_count_order() before.  afaict it does the same as
> order_base_2(), except get_count_order() generates better code and has
> a ridiculous name.

OK, will change the code as you suggested.
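
For illustration, here is a minimal userspace sketch of why this
alignment keeps a note buffer inside one page. It is not kernel code:
the sketch_* helpers are hypothetical stand-ins, 4 KiB pages are
assumed, and any size passed in (for example 340 bytes) is an arbitrary
example rather than the real sizeof(note_buf_t).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Userspace stand-in for the kernel's roundup_pow_of_two(); n must be > 0. */
static size_t sketch_roundup_pow_of_two(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Alignment as suggested: min(roundup_pow_of_two(size), PAGE_SIZE). */
static size_t sketch_note_align(size_t size)
{
	size_t pow2 = sketch_roundup_pow_of_two(size);

	return pow2 < SKETCH_PAGE_SIZE ? pow2 : SKETCH_PAGE_SIZE;
}

/* Returns 1 if [addr, addr + size) lies within a single page, else 0. */
static int sketch_in_one_page(uintptr_t addr, size_t size)
{
	return addr / SKETCH_PAGE_SIZE == (addr + size - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Walk every suitably aligned address in a two-page window and check
 * that an object of the given size never straddles a page boundary.
 * Only meaningful for 0 < size <= SKETCH_PAGE_SIZE.
 */
static int sketch_never_crosses(size_t size)
{
	size_t align = sketch_note_align(size);
	uintptr_t addr;

	assert(size > 0 && size <= SKETCH_PAGE_SIZE);
	for (addr = 0; addr < 2 * SKETCH_PAGE_SIZE; addr += align)
		if (!sketch_in_one_page(addr, size))
			return 0;
	return 1;
}
```

The check holds because the alignment is a power of two that divides
the page size, so each aligned slot sits entirely inside one page and
the object is no larger than its slot.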

> 
> And I think the WARN_ON can be replaced with a
> BUILD_BUG_ON(sizeof>PAGE_SIZE)?  That would avoid adding runtime
> overhead.

I am not sure about this. BUILD_BUG_ON will break the kernel build.
Before we found the root cause, several workaround fixes were introduced
to skip this kind of crash_notes:

  c4082f3 vmcore: continue vmcore initialization if PT_NOTE is found empty
  38dfac8 vmcore: prevent PT_NOTE p_memsz overflow during header update

That means that if sizeof(note_buf_t) > PAGE_SIZE really happens, the
normal kernel still works well, and the kdump kernel still works but we
lose those crash_notes. And if sizeof(note_buf_t) is bigger than
PAGE_SIZE on some ARCH, the design here must be changed, either to avoid
using a percpu variable or to adjust that ARCH's note_buf_t. That may
take quite a long time to discuss and review. Compared with this, it may
be better to tolerate dumping a vmcore with incomplete crash_notes for a
while, until a new design is adopted.
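
To make the trade-off concrete, here is a small userspace sketch of the
two kinds of check. It is only an analogy: static_assert stands in for
BUILD_BUG_ON, sketch_warn_on() is a hypothetical stand-in for WARN_ON,
and the 340-byte sketch_note_buf_t and 4 KiB page size are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical note buffer; the real note_buf_t size is per-ARCH. */
typedef struct {
	unsigned char bytes[340];
} sketch_note_buf_t;

/*
 * Build-time check, in the spirit of BUILD_BUG_ON: if the condition
 * were false, compilation would stop and no binary would be produced.
 */
static_assert(sizeof(sketch_note_buf_t) <= SKETCH_PAGE_SIZE,
	      "note buffer larger than a page");

/*
 * Run-time check, in the spirit of WARN_ON: report the problem and
 * return the condition, but let the caller carry on.
 */
static int sketch_warn_on(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/* Returns 1 (after warning) if size exceeds a page, 0 otherwise. */
static int sketch_check_note_size(size_t size)
{
	return sketch_warn_on(size > SKETCH_PAGE_SIZE,
			      "note buffer larger than a page");
}
```

With the build-time form an oversized buffer stops the build for that
configuration, while the run-time form only warns and continues, which
matches the behaviour argued for above.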

Thanks
Baoquan


