[PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
bhe at redhat.com
Tue May 13 22:44:13 PDT 2014
On 05/09/14 at 05:36am, Atsushi Kumagai wrote:
> There was more than 10 MB of memory remaining in your case (A pattern),
> and it was probably used mostly as page cache, so it sounds like OOM
> couldn't happen since such pages are reclaimable.
>
> I tried to reproduce OOM in my environment.
> Unfortunately, I couldn't get a chance to use a large memory machine,
> so I controlled the bitmap buffer size with --cyclic-buffer like below:
>
> / # free
>            total      used      free    shared   buffers
> Mem:       37544     19796     17748         0        56
> Swap:          0         0         0
> Total:     37544     19796     17748
> / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8000 /proc/vmcore /mnt/tmp/dumpfile.E
> Copying data : [100.0 %] |
>
> The dumpfile is saved to /mnt/tmp/dumpfile.E.
>
> makedumpfile Completed.
> VmHWM: 16456 kB
Hi Atsushi,
What is the dump target in your test? The OOM happened if and only if
the dump target was NFS. With other dump targets, including ssh and local
fs, it never happened even though those cyclic buffer size bugs existed.
We added some debug code and ran tests. The results show that the OOM
happened when only about 2M of memory was left. But when we dropped the
page cache after every 1000 writes, the dump completed fine. The OOM
happened when the page cache needed to allocate pages for writing.
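The debug change itself is not included in this mail. As a minimal sketch
of the idea only, assuming a wrapper around write(2) and the standard
/proc/sys/vm/drop_caches control file, it could look roughly like the
following. The names drop_page_cache(), write_with_cache_drop() and
DROP_INTERVAL are hypothetical and are not taken from makedumpfile:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical interval, matching the "every 1000 writes" in the test. */
#define DROP_INTERVAL 1000

/*
 * Ask the kernel to drop clean page cache by writing "1" to the
 * drop_caches control file (normally /proc/sys/vm/drop_caches, root only).
 * Returns 0 on success, -1 on failure.
 */
static int drop_page_cache(const char *ctl_path)
{
    FILE *fp;
    int rc;

    sync();  /* write back dirty pages first so they become droppable */
    fp = fopen(ctl_path, "w");
    if (fp == NULL)
        return -1;
    rc = (fputs("1\n", fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}

/*
 * Wrapper around write(2): counts successful calls and drops the page
 * cache after every DROP_INTERVAL-th one. A failed drop is ignored,
 * since this is debug instrumentation only.
 */
static ssize_t write_with_cache_drop(int fd, const void *buf, size_t len,
                                     const char *ctl_path)
{
    static unsigned long nr_writes;
    ssize_t ret = write(fd, buf, len);

    if (ret >= 0 && ++nr_writes % DROP_INTERVAL == 0)
        (void)drop_page_cache(ctl_path);
    return ret;
}
```

In a kdump environment the last argument would be "/proc/sys/vm/drop_caches".
This only serves to confirm that page cache pressure is what triggers the
OOM; it is not meant as a fix.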
So could you adjust your test to dump to NFS with ELF format or lzo
compression? I think an NFS dump has a heavy page cache effect, which is
different from the other targets.
I will take machines with 100G and 10G of memory and test them separately,
then paste the results along with our configuration.
Thanks
Baoquan
> / #
>
> As above, OOM didn't happen even when makedumpfile consumed most of the
> available memory (the remains were only 1MB).
>
> Of course, OOM happened when the memory usage exceeded the limit:
>
> / # free
>            total      used      free    shared   buffers
> Mem:       37544     21608     15936         0       368
> Swap:          0         0         0
> Total:     37544     21608     15936
> / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8192 /proc/vmcore /mnt/tmp/dumpfile.E
> Copying data : [ 0.0 %] /Out of memory: Kill process 1389 (makedumpfile_st) score 428 or sacrifice child
> Killed process 1389 (makedumpfile_st) total-vm:22196kB, anon-rss:16524kB, file-rss:8kB
> KILL
> / #
>
>
> I think we should investigate why OOM happened in your environment,
> otherwise we can't decide on a safe limit for a user process's memory usage.
>
>
> Thanks
> Atsushi Kumagai
>
> >>
> >> By the way, I'm going on holiday for 8 days, I can't reply
> >> during that period. Thanks in advance.
> >