[PATCH] arm64: mm: Fix false positives in W+X checking
Jeffrey Hugo
jhugo at codeaurora.org
Fri Apr 27 13:01:48 PDT 2018
On 4/27/2018 1:30 PM, Laura Abbott wrote:
> On 04/25/2018 09:37 AM, Will Deacon wrote:
>> On Wed, Apr 25, 2018 at 10:36:25AM -0600, Jeffrey Hugo wrote:
>>> On 4/25/2018 10:23 AM, Kees Cook wrote:
>>>> On Wed, Apr 25, 2018 at 8:12 AM, Jeffrey Hugo <jhugo at codeaurora.org>
>>>> wrote:
>>>>> On 4/25/2018 9:03 AM, Will Deacon wrote:
>>>>>> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>>>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>>>> index 2dbb2c9..40d45fd 100644
>>>>>>> --- a/arch/arm64/mm/mmu.c
>>>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>>>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>>>>>> update_mapping_prot(__pa_symbol(__start_rodata), (unsigned
>>>>>>> long)__start_rodata,
>>>>>>> section_size, PAGE_KERNEL_RO);
>>>>>>> + /*
>>>>>>> + * load_module() results in W+X mappings, which are
>>>>>>> cleaned up
>>>>>>> with
>>>>>>> + * call_rcu_sched(). Let's make sure that queued work is
>>>>>>> flushed
>>>>>>> so
>>>>>>> + * that we don't hit false positives.
>>>>>>> + */
>>>>>>> + rcu_barrier_sched();
>>>>>>> debug_checkwx();
>>>>>>> }
>>>>>>
>>>>>>
>>>>>> Whilst this looks correct to me, it looks to me like other
>>>>>> architectures
>>>>>> (e.g. x86) would be affected by this problem as well. Perhaps it
>>>>>> would be
>>>>>> better to solve it in the core code before invoking
>>>>>> mark_rodata_ro, or is
>>>>>> there some reason that we only run into this on arm64?
>>>>>>
>>>>>
>>>>> Thanks for the review.
>>>>>
>>>>> I was actually pondering pushing this into ptdump_check_wx() so
>>>>> that the
>>>>> "barrier" hit is only observed with CONFIG_DEBUG_WX.
>>>>>
>>>>> I agree, in principal this is not an arm64 specific issue. I do
>>>>> not have
>>>>> sufficient equipment to validate other architectures. On QDF2400, the
>>>>> reproduction rate is very low, roughly 1-3 instances in 2000-3000
>>>>> reboots.
>>>>> I do have a system simulator which happens to repro the issue 100%
>>>>> of the
>>>>> time, which is what I used to debug and test this fix, before
>>>>> applying it to
>>>>> QDF2400.
>>>>>
>>>>> I'm waffling. I see the benefit of fixing this in common code, but
>>>>> the
>>>>> "core" functionality of mark_rodata_ro doesn't need this barrier...
>>>>>
>>>>> I suppose I can push it up to core code, and see what the rest of the
>>>>> community says. Is that what you recommend?
>>>>
>>>> I think fixing this in the general case makes sense.
>>>
>>> Ok. I'll give it until Monday to see if Laura has any insights into
>>> x86,
>>> and then spin a v2.
>>
>> Thanks, Jeffrey.
>>
>> Will
>>
>
> I set up some test code that does request_module in a late_initcall
> and stuck a delay and print in do_free_init and triggered it:
>
> [ 6.115319] ? module_memfree+0x10/0x10
> [ 6.116082] rcu_process_callbacks+0x1bd/0x7f0
> [ 6.116895] __do_softirq+0xd9/0x2af
> [ 6.117588] irq_exit+0xa9/0xb0
> [ 6.118193] smp_apic_timer_interrupt+0x67/0x120
> [ 6.118993] apic_timer_interrupt+0xf/0x20
> [ 6.119732] </IRQ>
> [ 6.120265] RIP: 0010:__change_page_attr_set_clr+0x667/0xc80
> [ 6.121363] RSP: 0018:ffffc90000197c60 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff13
> [ 6.123417] RAX: 00000000000001e8 RBX: 80000000000000e3 RCX:
> ffffffff81c031d1
> [ 6.124677] RDX: 0000000000000000 RSI: ffffc90000197dd8 RDI:
> 80000000028000e3
> [ 6.125905] RBP: ffff8800029e7000 R08: 00000000ffffd801 R09:
> 000ffffffffff000
> [ 6.127114] R10: 0000000000000001 R11: 000ffffffffff000 R12:
> 80000000000000e3
> [ 6.128350] R13: 00000000000029e7 R14: 0000000000000200 R15:
> 80000000000000e3
> [ 6.129522] ? __alloc_pages_nodemask+0xfc/0x220
> [ 6.130218] ? smp_call_function_many+0x1e7/0x230
> [ 6.130935] ? load_new_mm_cr3+0xe0/0xe0
> [ 6.131566] __change_page_attr_set_clr+0xc59/0xc80
> [ 6.132293] ? cpumask_next+0x16/0x20
> [ 6.132894] change_page_attr_set_clr+0x131/0x470
> [ 6.133601] set_memory_rw+0x21/0x30
> [ 6.134172] free_init_pages+0x59/0x80
> [ 6.134766] ? rest_init+0xb0/0xb0
> [ 6.135322] kernel_init+0x14/0x100
> [ 6.135885] ret_from_fork+0x35/0x40
>
> This is part way through mark_rodata_ro which does the debug_checkwx()
> so I think it could still be an issue on x86. The set_memory_* functions
> on x86 are a bit more involved than arm64 and may provide more
> opportunities for the rcu callbacks to run. I'm guessing arm64 may just
> be loading modules that could be particularly likely to load other
> modules.
>
> So if you're willing to take some of my hand waving, I think this
> should go in generic code since this is difficult to reproduce.
Thank you for the follow up Laura. In my opinion, there is no hand
waiving. I feel you synthetically demonstrated the issue on x86, thus
confirming this is a multi-architectural issue.
I will work on a v2 moving the barrier to common code as suggested.
--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
More information about the linux-arm-kernel
mailing list