[PATCH] makedumpfile: print spinner in progress information
HATAYAMA Daisuke
d.hatayama at jp.fujitsu.com
Mon Oct 28 22:42:44 EDT 2013
(2013/10/29 10:26), HATAYAMA Daisuke wrote:
> (2013/10/25 13:07), Atsushi Kumagai wrote:
>> Hello HATAYAMA-san,
>>
>> (2013/10/25 9:55), HATAYAMA Daisuke wrote:
>>> On systems with huge memory, the percentage in the progress
>>> information is updated at a very slow interval, because 1 percent
>>> of 1 TiB of memory is about 10 GiB, which makes it look as if the
>>> system has frozen. Confused users might then be tempted to push the
>>> reset button to recover the system. We want to avoid such a
>>> situation as much as possible.
>>>
>>> To address the issue, this patch adds a spinner that rotates in the
>>> order /, |, \ and - next to the percentage progress indicator, which
>>> helps users see that the system is still active and the crash dump
>>> process is still making progress.
>>>
>>> This code is borrowed from the diskdump code.
>>>
>>> An example looks like this:
>>>
>>> Copying data : [ 0 %] /
>>> Copying data : [ 8 %] |
>>> Copying data : [ 11 %] \
>>> Copying data : [ 14 %] -
>>> Copying data : [ 16 %] /
>>> ...
>>> Copying data : [ 99 %] /
>>> Copying data : [100 %] |
>>
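For reference, a minimal sketch of that kind of spinner, written as a
standalone helper. This is illustrative only, not the actual
makedumpfile or diskdump code; the function name and output format
merely mirror the quoted example.

	/* Illustrative sketch only: redraw the progress line and advance
	 * the spinner on every call, so the display keeps moving even
	 * while the percentage is stuck at the same value. */
	#include <stdio.h>

	static void
	print_progress_sketch(const char *msg, unsigned long long current,
			      unsigned long long end)
	{
		static const char spinner[] = { '/', '|', '\\', '-' };
		static unsigned int i;
		int percent = end ? (int)(current * 100 / end) : 100;

		fprintf(stderr, "\r%s : [%3d %%] %c", msg, percent,
			spinner[i++ % sizeof(spinner)]);
	}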
>> I like it, but I have a comment.
>>
>> 6109 int
>> 6110 write_kdump_pages_cyclic(struct cache_data *cd_header, struct cache_data *cd_page,
>> 6111 struct page_desc *pd_zero, off_t *offset_data)
>> 6112 {
>> ...
>> 6156 per = info->num_dumpable / 100;
>> ...
>> 6178 for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> 6179
>> 6180 if ((num_dumped % per) == 0)
>> 6181 print_progress(PROGRESS_COPY, num_dumped, info->num_dumpable);
>>
>> The interval between calls to print_progress() still looks long if
>> num_dumpable is huge.
>> So how about fixing this, e.g., by making the interval time based?
>>
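One possible shape of such a time-based interval, as a rough sketch (my
own illustration, not a proposed patch; the names and signature are
assumptions, and as the follow-up below notes, calling time() on every
page still has a cost on very large systems):

	/* Rough sketch of a time-based interval: refresh the display at
	 * most once per second, no matter how many pages one percent of
	 * the dump covers. */
	#include <stdio.h>
	#include <time.h>

	static void
	print_progress_timed(const char *msg, unsigned long long current,
			     unsigned long long end)
	{
		static time_t last;
		time_t now = time(NULL);

		if (now == last)
			return;		/* already updated within this second */
		last = now;
		fprintf(stderr, "\r%s : [%3llu %%]", msg,
			end ? current * 100 / end : 100ULL);
	}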
>
> I wrote a simple benchmark for the time-based interval, shown below,
> which measures the total time consumed by calling the time() system
> call with and without vDSO.
> Both results seem acceptable to me.
> I'll reflect this change in the next version.
>
> $ ./bench
> total: 21.059131
> average: 0.000000
> total: 65.558263
> average: 0.000000
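The benchmark source is not included in the mail; a minimal sketch of
that kind of measurement could look like the following (the loop count
and output format are assumptions, not the original bench):

	/* Minimal sketch of a bench for the per-call cost of time(2). */
	#include <stdio.h>
	#include <time.h>

	int main(void)
	{
		const unsigned long loops = 100000000UL;  /* arbitrary count */
		struct timespec t0, t1;
		unsigned long i;
		double total;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < loops; i++)
			(void)time(NULL);
		clock_gettime(CLOCK_MONOTONIC, &t1);

		total = (t1.tv_sec - t0.tv_sec)
			+ (t1.tv_nsec - t0.tv_nsec) / 1e9;
		printf("total: %f\n", total);
		printf("average: %f\n", total / loops);
		return 0;
	}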
>
This conclusion was wrong. Sorry. For example, on our FJ 12 TiB system we collected
an approximately 300 GiB crash dump in about 40 minutes. If we remove the
"if ((num_dumped % per) == 0)" check and call time() in print_progress() on every
loop iteration, the total time spent invoking the time() system call is about
65 * 12 = 780 sec = 13 min. That is about 20 % of the whole crash dump time, which
is obviously problematic.
Instead, I think it is better to increase the number of calls to print_progress(),
e.g.:
	per = info->num_dumpable / 10000;
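
As a sketch, that finer granularity could be factored out like this
(the zero guard is my own addition for very small dumps, not part of
the proposal above):

	/* Sketch of the proposed granularity: one progress update per
	 * 0.01 % of dumpable pages instead of per 1 %. */
	static unsigned long long
	progress_step(unsigned long long num_dumpable)
	{
		unsigned long long per = num_dumpable / 10000;

		return per ? per : 1;	/* avoid modulo by zero */
	}

The copy loop would then test (num_dumped % progress_step(info->num_dumpable)) == 0
before calling print_progress(), as in the quoted code.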
--
Thanks.
HATAYAMA, Daisuke