[PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
Atsushi Kumagai
kumagai-atsushi at mxc.nes.nec.co.jp
Sun Apr 27 22:04:46 PDT 2014
>> >> >> It definitely will cause OOM. My test machine has 100G of memory, so
>> >> >> per the old code its needed_size is 3200K*2 == 6.4M. If only 15M of
>> >> >> free memory is currently left, free_size will be 15M*0.4, which is 6M,
>> >> >> so info->bufsize_cyclic is assigned 6M and only 3M is left for other
>> >> >> uses, e.g. page cache and dynamic allocation. OOM will happen.
>> >> >>
>> >> >
>> >> >BTW, in our case, there was about 30M of free memory when we started
>> >> >saving the dump. The difference should be due to my coarse estimation above.
>> >>
>> >> Thanks for your description, I understand that situation and
>> >> the nature of the problem.
>> >>
>> >> That is, the assumption that 20% of free memory is enough for
>> >> makedumpfile can be broken if free memory is too small.
>> >> If your machine had 200GB of memory, OOM would happen even after
>> >> fixing the over-allocation bug.
>> >
>> >Why? In cyclic mode, shouldn't makedumpfile's memory usage be fixed
>> >and not depend on the amount of RAM present in the system?
>>
>> Strictly speaking, it's not fixed, just restricted by the safe
>> limit (80% of free memory), as shown below:
>>
>
>What I meant was memory allocation is of the order O(1) and not O(N).
>
>> - bitmap size: used for 1st and 2nd bitmaps
>> - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
>>
>>  pattern                                      | bitmap size | remains
>>  ----------------------------------------------+-------------+----------
>>  A. 100G memory with the over-allocation bug   |   12.8 MB   |  17.2 MB
>>  B. 100G memory with fixed makedumpfile        |    6.4 MB   |  23.6 MB
>>  C. 200G memory with fixed makedumpfile        |   12.8 MB   |  17.2 MB
>>  D. 300G memory with fixed makedumpfile        |   19.2 MB   |  10.8 MB
>>  E. 400G memory with fixed makedumpfile        |   24.0 MB   |   6.0 MB
>>  F. 500G memory with fixed makedumpfile        |   24.0 MB   |   6.0 MB
>>  ...
>>
>> Baoquan got OOM in pattern A and didn't get it in B, so pattern C must
>> also fail due to OOM, since it has the same footprint as A. This is just
>> what I wanted to say.
>
>Ok, so here the bitmap size is growing because we have not hit the 80% of
>available memory limit yet, but it gets capped at 24MB once we do. I think
>that's fine. That's what I was looking for.
>
>Now the key question that remains is whether using 80% of free memory for
>bitmaps is too much. Are other things happening in the system which consume
>memory, so that OOM hits because memory is not available? If that's the
>case we probably need to lower the amount of memory allocated to bitmaps,
>say 70%, 60%, or maybe 50%. But this should be data driven.
>
>
>>
>> >Also, so even 30MB is not sufficient to run makedumpfile. That looks
>> >like a lot of free memory to me.
>>
>> According to the table above and Baoquan's report, 23.6MB was enough
>> but 17.2MB was too little for the other work; that sounds like too large
>> a requirement to me as well.
>>
>> >> I don't think this is a problem; it's natural that a lack of memory
>> >> causes OOM. However, there is something we can do for improvement.
>> >>
>> >> What I think is:
>> >>
>> >> 1. Use a constant value as the safe limit when calculating bufsize_cyclic,
>> >>    instead of 80% of free memory. This value must be enough for
>> >>    makedumpfile's work except the bitmaps.
>> >>
>> >> 2. If free memory is smaller than that value, makedumpfile gives up
>> >>    and exits early.
>> >
>> >What do we gain by makedumpfile giving up? The system will reboot. The
>> >system will reboot anyway after OOM.
>>
>> Oh, you got a point.
>>
>> >>
>> >> This change may reduce the possibility of running out of memory, but
>> >> the required memory size will change with every version, so maintaining
>> >> it sounds tough to me.
>> >
>> >I think we need to dive deeper to figure out why 30MB of free memory
>> >is not sufficient. To me something looks wrong here.
>>
>> I agree. I'm trying to get the peak memory footprint excluding
>> cyclic_buffer with the memory cgroup (memory.max_usage_in_bytes), but
>> I haven't managed it yet.
>
>That sounds like a good idea.
>
>I am wondering if we are doing anything which requires the kernel to
>allocate memory and that leads to OOM.
max_usage_in_bytes seems to include page cache, so I used a simpler way,
the patch at the bottom of this mail, instead.
Here is the result:
                parameter               | result
 dump_lv | buffer [KiB] | mmap (=4MiB)  | VmHWM [KiB]
---------+--------------+---------------+-------------
 d31     |           1  |      on       |  5872
 Ed31    |           1  |      on       |  5804
 d31     |           1  |      off      |  1784
 Ed31    |           1  |      off      |  1720
 d31     |        1000  |      on       |  8884 (*1)
 Ed31    |        1000  |      on       |  7812
 d31     |        1000  |      off      |  4788 (*1)
 Ed31    |        1000  |      off      |  3720
*1. The 2nd bitmap is allocated twice due to the bug found by Arthur Zou.
According to these results and the estimation below, which follows from
the design of makedumpfile,

    base size + (buffer size * 2) + (mmap size) = VmHWM

the base size seems to be about 2 MB, so about 6MB (base + mmap) is the
minimum that must remain available. If 20% of the available memory is
less than 6MB, OOM will happen.
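As a sanity check on that estimation (this arithmetic is mine, not part of
the measurements): plugging the Ed31 rows from the table above, which are
not affected by the double-allocation bug, back into the formula gives a
base of roughly 1.7MB, consistent with the "about 2 MB" above. A minimal
sketch of that calculation:

#include <stdio.h>

int main(void)
{
	/* Ed31 rows from the table above: buffer [KiB], mmap on/off,
	 * measured VmHWM [KiB].  The 4MiB mmap figure is also taken
	 * from the measurement setup. */
	struct { long buf_kib; int mmap_on; long vmhwm_kib; } rows[] = {
		{    1, 1, 5804 },
		{ 1000, 1, 7812 },
		{ 1000, 0, 3720 },
	};
	const long mmap_kib = 4 * 1024;
	unsigned long i;

	for (i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
		/* base = VmHWM - (buffer * 2) - mmap */
		long base = rows[i].vmhwm_kib
			  - rows[i].buf_kib * 2
			  - (rows[i].mmap_on ? mmap_kib : 0);
		printf("buffer=%4ld KiB mmap=%-3s -> base ~ %ld KiB\n",
		       rows[i].buf_kib, rows[i].mmap_on ? "on" : "off", base);
	}
	return 0;
}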
As for Baoquan's case, the remaining 17.2MB sounds like enough even if
makedumpfile consumes 6MB for the other work.
So I don't know yet why OOM happened, but the current safety
limit still looks reasonable to me, at least.
By the way, I noticed it's better to subtract 4MB (for mmap)
from the available memory before calculating the bitmap buffer
size. I'll do this anyway.
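For illustration, here is a rough sketch of what that adjusted calculation
could look like. This is not the actual makedumpfile code; the names and
constants are stand-ins, and needed_size is assumed to cover both bitmaps.

/*
 * Sketch: reserve the 4MiB mmap() buffer off the top of free memory,
 * then apply the existing 80% safety limit to what is left for the
 * 1st and 2nd bitmaps.
 */
#define MMAP_RESERVE	(4ULL * 1024 * 1024)	/* mmap() read buffer */
#define SAFE_RATIO	0.8			/* current safety limit */

unsigned long long
calc_bitmap_budget(unsigned long long free_memory,	/* bytes of free memory */
		   unsigned long long needed_size)	/* bytes for both bitmaps */
{
	unsigned long long limit;

	if (free_memory <= MMAP_RESERVE)
		return 0;	/* not even room for the mmap buffer */

	/* subtract the mmap reserve first, then take 80% of the rest */
	limit = (unsigned long long)((free_memory - MMAP_RESERVE) * SAFE_RATIO);

	return (needed_size <= limit) ? needed_size : limit;
}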
Thanks
Atsushi Kumagai
diff --git a/makedumpfile.c b/makedumpfile.c
index 0b31932..4b565f1 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -37,6 +37,14 @@ struct DumpInfo *info = NULL;
 char filename_stdout[] = FILENAME_STDOUT;
+void
+print_VmHWM(void)
+{
+	char cmd[64];
+	sprintf(cmd, "grep VmHWM /proc/%d/status", getpid());
+	system(cmd);
+}
+
 static void first_cycle(mdf_pfn_t start, mdf_pfn_t max, struct cycle *cycle)
 {
 	cycle->start_pfn = round(start, info->pfn_cyclic);
@@ -9357,5 +9365,7 @@ out:
 	}
 	free_elf_info();
+	print_VmHWM();
+
 	return retcd;
 }
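As an aside, if spawning grep via system() is a concern in a memory-tight
second kernel, the VmHWM line could also be read directly from
/proc/self/status. This is just an illustrative alternative, not part of
the patch above:

#include <stdio.h>
#include <string.h>

void
print_VmHWM_direct(void)
{
	char line[256];
	FILE *fp = fopen("/proc/self/status", "r");

	if (fp == NULL)
		return;

	/* print only the "VmHWM:" line, e.g. "VmHWM:      5872 kB" */
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strncmp(line, "VmHWM:", 6) == 0) {
			fputs(line, stdout);
			break;
		}
	}
	fclose(fp);
}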