[PATCH] vmcoreinfo: Warn if we exceed vmcoreinfo data size

Wed Nov 9 09:00:14 PST 2022

On 11/8/22 17:04, Baoquan He wrote:
> On 11/08/22 at 03:48pm, Andrew Morton wrote:
>> On Thu, 27 Oct 2022 13:50:08 -0700 Stephen Brennan <stephen.s.brennan at oracle.com> wrote:
>>
>>> Though vmcoreinfo is intended to be small, at just one page, useful
>>> information is still added to it, so we risk running out of space.
>>> Currently there is no runtime check to see whether the vmcoreinfo buffer
>>> has been exhausted. Add a warning for this case.
>>>
>>> My static checking tool[1] indicates that a good upper bound for
>>> vmcoreinfo size is currently 3415 bytes, but the best time to add
>>> warnings is before the risk becomes too high.
>>>
>>> ...
>>>
>>> --- a/kernel/crash_core.c
>>> +++ b/kernel/crash_core.c
>>> @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>>>   	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>>>   
>>>   	vmcoreinfo_size += r;
>>> +
>>> +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
>>> +		  "vmcoreinfo data exceeds allocated size, truncating");
>>>   }
>>
>> Seems that vmcoreinfo_append_str() will truncate (ie: corrupt) the
>> final entry when limiting the overall data size to VMCOREINFO_BYTES.
>> And that final entry will be missing any terminating \n or \0.
>>
>> Is all this desirable, or should we be checking for (and warning about)
>> sufficient space _before_ appending this string?
> 
> 
> Hmm, once we really reach that point, a truncated vmcoreinfo will not be
> of much use for later vmcore dumping and analysis. As we can see,
> arch_crash_save_vmcoreinfo() is called at the end of
> crash_save_vmcoreinfo_init(). E.g. on x86_64, phys_base,
> init_top_pgt, etc. are very important for memory layout analysis.
> Fortunately, an insufficient vmcoreinfo page won't impact the normal
> running of the kernel.
> 
> So, the current change looks good to me.
> 
> My further thought is whether we should print the truncated or first
> skipped entry in the warning so that people know better what's happening,
> even though in either case the fix will be to add one more page to the
> vmcoreinfo buffer. No strong opinion though.

This is a bit nicer: it would save us from needing to figure it out from
the stack. Of course, regardless of _which_ line puts us over the limit, it
seems like the response is the same: increase the size or remove info. It's
just a matter of how much to increase or how much to remove.
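
For illustration only, here is a minimal userspace sketch of the variant
that names the offending entry. The model_* names and the 32-byte
MODEL_BYTES are made up for this example (the real limit is
VMCOREINFO_BYTES), and vsnprintf()/fprintf() stand in for the kernel's
formatting and WARN_ONCE(). It is a model of the behaviour being discussed,
not the code in kernel/crash_core.c.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for VMCOREINFO_BYTES, kept tiny so it fills quickly. */
#define MODEL_BYTES 32

static char model_data[MODEL_BYTES];
static size_t model_size;
static int model_warned;            /* one-shot flag, mimicking WARN_ONCE() */
static char model_first_lost[0x50]; /* entry that first filled the buffer */

void model_append_str(const char *fmt, ...)
{
	va_list args;
	char buf[0x50];
	int n;
	size_t r, room;

	va_start(args, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);
	if (n < 0)
		return;

	/* vsnprintf() returns the untruncated length; clamp to what buf holds. */
	r = (size_t)n < sizeof(buf) ? (size_t)n : sizeof(buf) - 1;

	/* Clamp again to the space left, so the final entry may be cut short. */
	room = MODEL_BYTES - model_size;
	if (r > room)
		r = room;

	if (r)
		memcpy(&model_data[model_size], buf, r);
	model_size += r;
	assert(model_size <= MODEL_BYTES);

	/*
	 * Same condition as the patch: warn once when the buffer is full.
	 * Note this also fires when an entry fills the buffer exactly,
	 * even though nothing was lost in that case.
	 */
	if (model_size == MODEL_BYTES && !model_warned) {
		model_warned = 1;
		strcpy(model_first_lost, buf);
		fprintf(stderr,
			"model: data exceeds allocated size when adding: %s\n",
			buf);
	}
}
```

With only two bytes of room left, appending "SYMBOL(phys_base)=1000\n" to this
model stores just "SY" and reports the whole entry once; later appends are
dropped silently.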

I'm happy with it either way.

Thanks,
Stephen

> 
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index a0eb4d5cf557..8ba4dd90694d 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>   	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>   
>   	vmcoreinfo_size += r;
> +
> +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
> +		  "vmcoreinfo data exceeds allocated size when adding: %s\n", buf);
>   }
>   
>   /*
>