[PATCH] kexec: add resriction on the kexec_load

zhong jiang zhongjiang at huawei.com
Tue Jul 19 20:38:02 PDT 2016


On 2016/7/20 10:07, Eric W. Biederman wrote:
> zhongjiang <zhongjiang at huawei.com> writes:
>
>> From: zhong jiang <zhongjiang at huawei.com>
>>
>> I hit the following question when run trinity in my system. The
>> kernel is 3.4 version. but the mainline have same question to be
>> solved. The root cause is the segment size is too large, it can
>> expand the most of the area or the whole memory, therefore, it
>> may waste an amount of time to abtain a useable page. and other
>> cases will block until the test case quit. at the some time,
>> OOM will come up.
> 5MiB is way too small.  I have seen vmlinux images not to mention
> ramdisks that get larger than that.  Depending on the system
> 1GiB might not be an unreasonable ramdisk size.  AKA run an entire live
> system out of a ramfs.  It works well if you have enough memory.
>
> I think there is a practical limit at about 50% of memory (because we
> need two copies in memory the source and the destination pages), but
> anything else is pretty much reasonable and should have a fair chance of
> working.
>
> A limit that reflected that reality above would be interesting.
> Anything else will likely cause someone trouble in the futrue.
>
> Eric
  In addition, I had tested when set max segment size to 1G when system memory have 32G,
  the rlock probabilistic come up when trinity run.
>> ck time:20160628120131-243c5
>> rlock reason:SOFT-WATCHDOG detected! on cpu 5.
>> CPU 5 Pid: 9485, comm: trinity-c5
>> RIP: 0010:[<ffffffff8111a4cf>]  [<ffffffff8111a4cf>] next_zones_zonelist+0x3f/0x60
>> RSP: 0018:ffff88088783bc38  EFLAGS: 00000283
>> RAX: ffff8808bffd9b08 RBX: ffff88088783bbb8 RCX: ffff88088783bd30
>> RDX: ffff88088f15a248 RSI: 0000000000000002 RDI: 0000000000000000
>> RBP: ffff88088783bc38 R08: ffff8808bffd8d80 R09: 0000000412c4d000
>> R10: 0000000412c4e000 R11: 0000000000000000 R12: 0000000000000002
>> R13: 0000000000000000 R14: ffff8808bffd9b00 R15: 0000000000000000
>> FS:  00007f91137ee700(0000) GS:ffff88089f2a0000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 000000000016161a CR3: 0000000887820000 CR4: 00000000000407e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process trinity-c5 (pid: 9485, threadinfo ffff88088783a000, task ffff88088f159980)
>> Stack:
>>  ffff88088783bd88 ffffffff81106eac ffff8808bffd8d80 0000000000000000
>>  0000000000000000 ffffffff8124c2be 0000000000000001 000000000000001e
>>  0000000000000000 ffffffff8124c2be 0000000000000002 ffffffff8124c2be
>> Call Trace:
>>  [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>>  [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
>>  [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
>>  [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
>>  [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
>>  [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
>>  [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
>>  [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
>>  [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b
>>
>> The patch just add condition on sanity_check_segment_list to
>> restriction the segment size.
>>
>> Signed-off-by: zhong jiang <zhongjiang at huawei.com>
>> ---
>>  arch/x86/include/asm/kexec.h |  1 +
>>  kernel/kexec_core.c          | 12 ++++++++++++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index d2434c1..b31a723 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -67,6 +67,7 @@ struct kimage;
>>  /* Memory to backup during crash kdump */
>>  #define KEXEC_BACKUP_SRC_START	(0UL)
>>  #define KEXEC_BACKUP_SRC_END	(640 * 1024UL)	/* 640K */
>> +#define KEXEC_MAX_SEGMENT_SIZE	(5 * 1024 * 1024UL)	/* 5M */
>>  
>>  /*
>>   * CPU does not save ss and sp on stack if execution is already
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 448127d..35c5159 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -209,6 +209,18 @@ int sanity_check_segment_list(struct kimage *image)
>>  			return result;
>>  	}
>>  
>> +
>> +	/* Verity all segment size donnot exceed the specified size.
>> + 	 * if segment size from user space is too large,  a large 
>> + 	 * amount of time will be wasted when allocating page. so,
>> + 	 * softlockup may be come up.
>> + 	 */
>> +	for (i = 0; i< nr_segments; i++) {
>> +		if (image->segment[i].memsz > KEXEC_MAX_SEGMENT_SIZE)
>> +			return result;
>> +	}
>> +
>> +
>>  	/*
>>  	 * Verify we have good destination addresses.  Normally
>>  	 * the caller is responsible for making certain we don't
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo at kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont at kvack.org"> email at kvack.org </a>
>
> .
>





More information about the kexec mailing list