[PATCH v2] kexec: add restriction on the kexec_load
zhong jiang
zhongjiang at huawei.com
Sat Jul 23 06:37:11 PDT 2016
On 2016/7/23 3:58, Andrew Morton wrote:
> On Fri, 22 Jul 2016 13:36:22 +0800 zhongjiang <zhongjiang at huawei.com> wrote:
>
>> From: zhong jiang <zhongjiang at huawei.com>
>>
>> I hit the following issue when running trinity on my system. The
>> kernel is version 3.4, but mainline has the same issue. The root
>> cause is that the segment size is too large: it can cover most
>> or all of memory, so the kernel may spend a long time trying to
>> obtain a usable page. Other cases will block until the test case
>> quits. At the same time, OOM will occur.
>>
>> Call Trace:
>> [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
>> [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
>> [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
>> [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
>> [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
>> [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
>> [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
>> [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
>> [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b
>>
>> The patch adds a condition to sanity_check_segment_list() to
>> restrict the segment size.
>>
>> ...
>>
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -148,6 +148,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
>> int sanity_check_segment_list(struct kimage *image)
>> {
>> int result, i;
>> + unsigned long total_segments = 0;
>> unsigned long nr_segments = image->nr_segments;
>>
>> /*
>> @@ -209,6 +210,21 @@ int sanity_check_segment_list(struct kimage *image)
>> return result;
>> }
>>
>> + /* Verity all segment size donnot exceed the specified size.
>> + * if segment size from user space is too large, a large
>> + * amount of time will be wasted when allocating page. so,
>> + * softlockup may be come up.
>> + */
>> + for (i = 0; i < nr_segments; i++) {
>> + if (image->segment[i].memsz > (totalram_pages / 2))
>> + return result;
>> +
>> + total_segments += image->segment[i].memsz;
>> + }
>> +
>> + if (total_segments > (totalram_pages / 2))
>> + return result;
>> +
>> /*
>> * Verify we have good destination addresses. Normally
>> * the caller is responsible for making certain we don't
> This needed a few adjustments for pending changes in linux-next's
> sanity_check_segment_list(). Mainly s/return result/return -EINVAL/.
> I also tweaked the patch changelog. Please check.
>
> From: zhong jiang <zhongjiang at huawei.com>
> Subject: kexec: add restriction on kexec_load() segment sizes
>
> I hit the following issue when running trinity on my system. The kernel is
> version 3.4, but mainline has the same issue.
>
> The root cause is that the segment size is too large, so the kernel spends
> too long trying to allocate a page. Other cases will block until the test
> case quits. Also, OOM conditions will occur.
>
> Call Trace:
> [<ffffffff81106eac>] __alloc_pages_nodemask+0x14c/0x8f0
> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [<ffffffff8124c2be>] ? trace_hardirqs_on_thunk+0x3a/0x3c
> [<ffffffff8113e5ef>] alloc_pages_current+0xaf/0x120
> [<ffffffff810a0da0>] kimage_alloc_pages+0x10/0x60
> [<ffffffff810a15ad>] kimage_alloc_control_pages+0x5d/0x270
> [<ffffffff81027e85>] machine_kexec_prepare+0xe5/0x6c0
> [<ffffffff810a0d52>] ? kimage_free_page_list+0x52/0x70
> [<ffffffff810a1921>] sys_kexec_load+0x141/0x600
> [<ffffffff8115e6b0>] ? vfs_write+0x100/0x180
> [<ffffffff8145fbd9>] system_call_fastpath+0x16/0x1b
>
> The patch changes sanity_check_segment_list() to verify that no segment is
> larger than half of memory.
>
> Link: http://lkml.kernel.org/r/1469165782-13193-1-git-send-email-zhongjiang@huawei.com
> Signed-off-by: zhong jiang <zhongjiang at huawei.com>
> Cc: Eric Biederman <ebiederm at xmission.com>
> Cc: Vivek Goyal <vgoyal at redhat.com>
> Cc: Dave Young <dyoung at redhat.com>
> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
> ---
>
> kernel/kexec_core.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff -puN kernel/kexec_core.c~kexec-add-resriction-on-the-kexec_load kernel/kexec_core.c
> --- a/kernel/kexec_core.c~kexec-add-resriction-on-the-kexec_load
> +++ a/kernel/kexec_core.c
> @@ -154,6 +154,7 @@ static struct page *kimage_alloc_page(st
> int sanity_check_segment_list(struct kimage *image)
> {
> int i;
> + unsigned long total_segments = 0;
> unsigned long nr_segments = image->nr_segments;
>
> /*
> @@ -214,6 +215,21 @@ int sanity_check_segment_list(struct kim
> return -EINVAL;
> }
>
> + /* Verity all segment size donnot exceed the specified size.
> + * if segment size from user space is too large, a large
> + * amount of time will be wasted when allocating page. so,
> + * softlockup may be come up.
> + */
> + for (i = 0; i < nr_segments; i++) {
> + if (image->segment[i].memsz > (totalram_pages / 2))
> + return -EINVAL;
> +
> + total_segments += image->segment[i].memsz;
> + }
> +
> + if (total_segments > (totalram_pages / 2))
> + return -EINVAL;
> +
> /*
> * Verify we have good destination addresses. Normally
> * the caller is responsible for making certain we don't
> _
>
>
>
>
> Also, I tweaked the comments a bit:
>
> --- a/kernel/kexec_core.c~kexec-add-resriction-on-the-kexec_load-fix
> +++ a/kernel/kexec_core.c
> @@ -215,10 +215,10 @@ int sanity_check_segment_list(struct kim
> return -EINVAL;
> }
>
> - /* Verity all segment size donnot exceed the specified size.
> - * if segment size from user space is too large, a large
> - * amount of time will be wasted when allocating page. so,
> - * softlockup may be come up.
> + /*
> + * Verify that no segment is larger than half of memory. If a segment
> + * from userspace is too large, a large amount of time will be wasted
> + * allocating pages, which can cause a soft lockup.
> */
> for (i = 0; i < nr_segments; i++) {
> if (image->segment[i].memsz > (totalram_pages / 2))
> _
>
>
> Eric ack?
>
> .
>
I am sorry, I think that PAGE_SHIFT is more suitable than 12. It should look as follows:
if (total_segments > (totalram_pages << PAGE_SHIFT) / 2)
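
To illustrate why the shift matters: memsz is a size in bytes, while totalram_pages is a count of pages, so the two can only be compared after converting to the same unit. Below is a minimal userspace sketch of the check with both sides in bytes. It is not the kernel code: the sketch_ names, the fixed shift of 12, and passing totalram_pages as a parameter are assumptions made only so that it compiles outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. In the kernel this would be PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12UL

/* Hypothetical stand-in for the memsz field of a kexec segment. */
struct sketch_segment {
	unsigned long memsz;	/* segment size in bytes */
};

/*
 * Return true if no single segment, and not the sum of all segments,
 * exceeds half of RAM. totalram_pages is a page count, so it is shifted
 * by the page shift to get a limit in bytes before comparing.
 */
static bool sketch_segments_fit(const struct sketch_segment *seg, size_t nr,
				unsigned long totalram_pages)
{
	unsigned long limit = (totalram_pages << SKETCH_PAGE_SHIFT) / 2;
	unsigned long total = 0;
	size_t i;

	assert(seg != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (seg[i].memsz > limit)
			return false;

		total += seg[i].memsz;
	}

	return total <= limit;
}
```

With 1024 pages of RAM the limit is 2 MiB, so a single 4096-byte segment passes. Comparing memsz directly against totalram_pages / 2 (that is, 512) would reject the same segment, which is the unit mismatch the shift avoids.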