[PATCH] arm: Use kernel mm when updating section permissions

Kees Cook keescook at chromium.org
Fri Nov 6 11:12:00 PST 2015


On Fri, Nov 6, 2015 at 11:08 AM, Kees Cook <keescook at chromium.org> wrote:
> On Fri, Nov 6, 2015 at 10:44 AM, Laura Abbott <labbott at redhat.com> wrote:
>> On 11/05/2015 05:15 PM, Kees Cook wrote:
>>>
>>> On Thu, Nov 5, 2015 at 5:05 PM, Laura Abbott <labbott at redhat.com> wrote:
>>>>
>>>> On 11/05/2015 08:27 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>>
>>>>> On Thu, Nov 05, 2015 at 08:20:42AM -0800, Laura Abbott wrote:
>>>>>>
>>>>>>
>>>>>> On 11/05/2015 01:46 AM, Russell King - ARM Linux wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 04, 2015 at 05:00:39PM -0800, Laura Abbott wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Currently, read only permissions are not being applied even
>>>>>>>> when CONFIG_DEBUG_RODATA is set. This is because section_update
>>>>>>>> uses current->mm for adjusting the page tables. current->mm
>>>>>>>> need not be equivalent to the kernel version. Use pgd_offset_k
>>>>>>>> to get the proper page directory for updating.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> What are you trying to achieve here?  You can't use these functions
>>>>>>> at run time (after the first thread has been spawned) to change
>>>>>>> permissions, because there will be multiple copies of the kernel
>>>>>>> section mappings, and those copies will not get updated.
>>>>>>>
>>>>>>> In any case, this change will probably break kexec and ftrace, as
>>>>>>> the running thread will no longer see the updated page tables.
>>>>>>>
>>>>>>
>>>>>> I think I was hitting that exact problem with multiple copies
>>>>>> not getting updated. The section_update code was being called
>>>>>> and I was seeing the tables get updated but nothing was being
>>>>>> applied when I tried to write to text or check the debugfs
>>>>>> page table. The current flow is:
>>>>>>
>>>>>> rest_init -> kernel_thread(kernel_init) and from that thread
>>>>>> mark_rodata_ro. So mark_rodata_ro is always going to happen
>>>>>> in a thread.
>>>>>>
>>>>>> Do we need to update for both init_mm and the first running
>>>>>> thread?
>>>>>
>>>>>
>>>>>
>>>>> The "first running thread" is merely coincidental for things like kexec.
>>>>>
>>>>> Hmm.  Actually, I think the existing code _should_ be fine.  At the
>>>>> point where mark_rodata_ro() is, we should still be using init_mm, so
>>>>> updating the current threads page tables should actually be updating
>>>>> the swapper_pg_dir.
>>>>
>>>>
>>>>
>>>> That doesn't seem to hold true. Based on what I'm seeing, we lose
>>>> the the guarantee of init_mm after the first exec. If usermodehelper
>>>> gets called to load a module, that triggers an exec and the kernel
>>>> thread is no longer using init_mm after that. I'm testing with the
>>>> multi-v7 defconfig which uses the smsc911x driver which loads a
>>>> module during initcall. That gets called before mark_rodata_ro so
>>>> the init_mm is never updated. I verified that disabling smsc911x
>>>> makes things work as expected. I suspect the testing was never done
>>>> with a driver that tried to call usermodehelper during init time.
>>>
>>>
>>> Ooooh. Nice catch. Yeah, my testing didn't include that case.
>>>
>>>> I got as far as narrowing it down that it happens after the
>>>> usermodehelper
>>>> but I wasn't able to pinpoint where exactly the switch happened. It seems
>>>> like we need to have the page tables set up before any initcalls
>>>> happen otherwise we risk having an exec create stray processes which we
>>>> can't update.
>>>
>>>
>>> Can we just make mark_rodata_ro() a no-op and do the RO setting
>>> earlier when we do the NX setting?
>>>
>>
>> Unfortunately no. The time we are doing the nx setting is before we've
>> finished
>> with the initmem so we need the initmem to be finished and freed before we
>> can
>> mark anything RO.
>>
>> More importantly, the NX settings are also not getting set. Compare before:
>>
>> ---[ Kernel Mapping ]---
>> 0xc0000000-0xc0300000           3M     RW NX
>> 0xc0300000-0xc1300000          16M     RW x
>> 0xc1300000-0xcc000000         173M     RW NX
>> 0xcc000000-0xcc040000         256K     RW NX     MEM/BUFFERABLE/WC
>> 0xcc040000-0xcc100000         768K     RW NX     MEM/CACHED/WBRA
>> 0xcc100000-0xcc280000        1536K     RW NX     MEM/BUFFERABLE/WC
>> 0xcc280000-0xd0000000       62976K     RW NX     MEM/CACHED/WBRA
>> 0xd0000000-0xd0200000           2M     RW NX
>>
>> and after
>>
>> ---[ Kernel Mapping ]---
>> 0xc0000000-0xc0300000           3M     RW NX
>> 0xc0300000-0xc0c00000           9M     ro x
>> 0xc0c00000-0xc1100000           5M     ro NX
>> 0xc1100000-0xcc000000         175M     RW NX
>> 0xcc000000-0xcc040000         256K     RW NX     MEM/BUFFERABLE/WC
>> 0xcc040000-0xcc100000         768K     RW NX     MEM/CACHED/WBRA
>> 0xcc100000-0xcc280000        1536K     RW NX     MEM/BUFFERABLE/WC
>> 0xcc280000-0xd0000000       62976K     RW NX     MEM/CACHED/WBRA
>> 0xd0000000-0xd0200000           2M     RW NX
>>
>>
>> with my test patch. I think setting both current->active_mm and &init_mm
>> is sufficient. Maybe explicitly setting swapper_pg_dir would be cleaner?
>>
>> Is there a test that should be running in a CI somewhere to catch cases like
>> this where the permissions are not working as expected
>
> I wrote these for lkdtm -- actually just mentioned it here:
> http://lwn.net/Articles/663531/
>
> EXEC_DATA
> EXEC_STACK
> EXEC_KMALLOC
> EXEC_VMALLOC
> EXEC_USERSPACE
> ACCESS_USERSPACE
> WRITE_RO
> WRITE_KERN
>
> Each of those should Oops the kernel if things are working correctly.
> I'm not aware of a public CI that currently handles checking for
> expected Oops via lkdtm.
>
> The other tests I ran when building this were to turn ftrace on and
> off. If that works for you, then this patch seems fine. (AIUI, the
> code would be unchanged from original when running ftrace, so I would
> expect this to work.)

Hi Kevin and Kernel CI folks,

Could lkdtm get added to the kernel-CI workflows? Extracting and
validating Oops details when poking lkdtm would be extremely valuable
for these cases. :)

-Kees

>
> -Kees
>
>>
>> My test patch that seems to be working:
>>
>> ----8<-----
>>
>> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
>> index 8a63b4c..6276b234 100644
>> --- a/arch/arm/mm/init.c
>> +++ b/arch/arm/mm/init.c
>> @@ -627,12 +627,10 @@ static struct section_perm ro_perms[] = {
>>   * safe to be called with preemption disabled, as under stop_machine().
>>   */
>>  static inline void section_update(unsigned long addr, pmdval_t mask,
>> -                                 pmdval_t prot)
>> +                                 pmdval_t prot, struct mm_struct *mm)
>>  {
>> -       struct mm_struct *mm;
>>         pmd_t *pmd;
>>  -      mm = current->active_mm;
>>         pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);
>>   #ifdef CONFIG_ARM_LPAE
>> @@ -656,7 +654,7 @@ static inline bool arch_has_strict_perms(void)
>>         return !!(get_cr() & CR_XP);
>>  }
>>  -#define set_section_perms(perms, field)       {
>> \
>> +#define set_section_perms(perms, field, all)   {                       \
>>         size_t i;                                                       \
>>         unsigned long addr;                                             \
>>                                                                         \
>> @@ -674,31 +672,35 @@ static inline bool arch_has_strict_perms(void)
>>                                                                         \
>>                 for (addr = perms[i].start;                             \
>>                      addr < perms[i].end;                               \
>> -                    addr += SECTION_SIZE)                              \
>> +                    addr += SECTION_SIZE) {                            \
>>                         section_update(addr, perms[i].mask,             \
>> -                                      perms[i].field);                 \
>> +                                      perms[i].field, current->active_mm);
>> \
>> +                       if (all)                                        \
>> +                               section_update(addr, perms[i].mask,     \
>> +                                       perms[i].field, &init_mm);      \
>> +               }                                                       \
>>         }                                                               \
>>  }
>>  -static inline void fix_kernmem_perms(void)
>> +void fix_kernmem_perms(void)
>>  {
>> -       set_section_perms(nx_perms, prot);
>> +       set_section_perms(nx_perms, prot, true);
>>  }
>>   #ifdef CONFIG_DEBUG_RODATA
>>  void mark_rodata_ro(void)
>>  {
>> -       set_section_perms(ro_perms, prot);
>> +       set_section_perms(ro_perms, prot, true);
>>  }
>>   void set_kernel_text_rw(void)
>>  {
>> -       set_section_perms(ro_perms, clear);
>> +       set_section_perms(ro_perms, clear, false);
>>  }
>>   void set_kernel_text_ro(void)
>>  {
>> -       set_section_perms(ro_perms, prot);
>> +       set_section_perms(ro_perms, prot, false);
>>  }
>>  #endif /* CONFIG_DEBUG_RODATA */
>>
>>
>>
>>
>>
>
>
>
> --
> Kees Cook
> Chrome OS Security



-- 
Kees Cook
Chrome OS Security



More information about the linux-arm-kernel mailing list