[PATCH] arm64: fix the overlap between the kernel image and vmalloc address

Ard Biesheuvel ard.biesheuvel at linaro.org
Mon Apr 24 13:56:43 EDT 2017


On 24 April 2017 at 18:52, Laura Abbott <labbott at redhat.com> wrote:
> On 04/24/2017 08:51 AM, Mark Rutland wrote:
>> On Mon, Apr 24, 2017 at 09:28:48PM +0800, zhong jiang wrote:
>>> On 2017/4/24 18:44, Mark Rutland wrote:
>>>> So the issue is that we have the callchain below for a kernel image
>>>> address:
>>>>
>>>> read_kcore()
>>>> ->is_vmalloc_or_module_addr() // returns true
>>>> ->vread()
>>>> -->aligned_vread()
>>>> --->vmalloc_to_page()
>>>>
>>>> In is_vmalloc{,or_module}_addr() we just check the addr against
>>>> VMALLOC_START and VMALLOC_END, so they will return true for a kernel
>>>> image address.
>>>>
>>>> Then, we call vmalloc_to_page(). While this only handles mappings made
>>>> at page granularity, the kernel image mapping may have used sections. So
>>>> this tries a bogus walk to the pte level.
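
For anyone following along, the generic check really is just a range
comparison, so once the image is mapped inside the vmalloc region it is
indistinguishable from an ordinary vmalloc allocation. Roughly, from
include/linux/mm.h:

        static inline bool is_vmalloc_addr(const void *x)
        {
                unsigned long addr = (unsigned long)x;

                return addr >= VMALLOC_START && addr < VMALLOC_END;
        }

And since vmalloc_to_page() only checks p*d_none() on the way down, a present
pud/pmd block entry covering the image gets treated as a table pointer and
dereferenced as if it were one.
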
>>
>>>> Should we special-case kernel image handling, e.g. with new
>>>> is_kernel_image_addr() / kernel_image_to_page() helpers?
>>
>>>   Yes, that seems to be the best way to implement it without a performance regression.
>>
>>> The following patch is the implementation. Any thoughts?
>>>
>>> Signed-off-by: zhong jiang <zhongjiang at huawei.com>
>>>
>>>   diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index b84615b..851ac35 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -475,6 +475,15 @@ static inline bool is_vmalloc_addr(const void *x)
>>>         return false;
>>>  #endif
>>>  }
>>> +
>>> +static inline bool is_kernel_image_addr(const void *x)
>>> +{
>>> +       unsigned long addr = (unsigned long)x;
>>> +
>>> +       return addr >= (unsigned long)_stext && addr < (unsigned long)_end;
>>> +
>>> +}
>>> +
>>>  #ifdef CONFIG_MMU
>>>  extern int is_vmalloc_or_module_addr(const void *x);
>>>  #else
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index 3ca82d4..9a9ef65 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -229,6 +229,42 @@ int is_vmalloc_or_module_addr(const void *x)
>>>         return is_vmalloc_addr(x);
>>>  }
>>>
>>> +static struct page *kernel_image_to_page(const void *kernel_addr, pgd_t *pgd)
>>> +{
>>> +       unsigned long addr = (unsigned long)kernel_addr;
>>> +       struct page *page = NULL;
>>> +       pud_t *pud;
>>> +       pmd_t *pmd;
>>> +       pte_t *pte;
>>> +
>>> +       if (pgd_none(*pgd))
>>> +               goto out;
>>> +
>>> +       pud = pud_offset(pgd, addr);
>>> +       if (pud_none(*pud))
>>> +               goto out;
>>> +
>>> +       if (pud_sect(*pud))
>>> +               return pud_page(*pud);
>>
>> The *_sect() helpers are arch-specific, so we cannot use them in generic
>> code. This would need to be architecture-specific.
>>
>> Secondly, this will return the head page of the section regardless of
>> which page in the section the address corresponds to.
>>
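
If this were made arch-specific, the section cases would also need to add the
offset of the address within the block. On arm64 that could look something
like the below (untested, and assuming the usual vmemmap layout where the
struct pages covering a block are contiguous):

        if (pud_sect(*pud))
                return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);

with the equivalent using PMD_MASK/pmd_page() for the pmd case.
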
>>> +
>>> +       pmd = pmd_offset(pud, addr);
>>> +       if (pmd_none(*pmd))
>>> +               goto out;
>>> +
>>> +       if (pmd_sect(*pmd))
>>> +               return pmd_page(*pmd);
>>
>> Likewise on both counts.
>>
>>> +
>>> +       pte = pte_offset_kernel(pmd, addr);
>>> +       if (pte_none(*pte))
>>> +               goto out;
>>> +
>>> +       page = pte_page(*pte);
>>> +
>>> +out:
>>> +       return page;
>>> +
>>> +}
>>
>> Given we know what the address should map to, I don't think we need to
>> walk the page tables here. I think this can be:
>>
>> static struct page *kernel_image_to_page(const void *addr)
>> {
>>       return virt_to_page(lm_alias(addr));
>> }
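
For reference, lm_alias() is defined in include/linux/mm.h as (roughly):

        #define lm_alias(x)     __va(__pa_symbol(x))

i.e. it translates a kernel image address into its linear map alias, and
virt_to_page() then derives the struct page arithmetically from that address
rather than by walking the page tables, so the granularity of the image
mapping no longer matters.
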
>>
>>> +
>>>  /*
>>>    * Walk a vmap address to the struct page it maps.
>>>    */
>>> @@ -244,6 +280,9 @@ struct page *vmalloc_to_page(const void *vmalloc_addr)
>>>          */
>>>         VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
>>>
>>> +       if (is_kernel_image_addr(vmalloc_addr))
>>> +               return kernel_image_to_page(vmalloc_addr, pgd);
>>
>> It's not clear to me that this is the right place for this to live.
>>
>> It might be best to code the kernel image logic directly in kcore (and
>> kmem), assuming everyone's OK with that approach.
>>
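
If we do it there, a rough and untested sketch for read_kcore() would be to
handle the image case ahead of the existing vmalloc one, reusing the
is_kernel_image_addr() helper from the patch and copying through the linear
alias so that vread()/vmalloc_to_page() never see image addresses:

                if (is_kernel_image_addr((void *)start)) {
                        /* the image may be block mapped: use the linear alias */
                        if (copy_to_user(buffer, lm_alias((void *)start), tsz))
                                return -EFAULT;
                } else if (is_vmalloc_or_module_addr((void *)start)) {
                        /* existing vread() path, unchanged */
                        ...
                } else {
                        /* existing kern_addr_valid() fallback, unchanged */
                        ...
                }

read_kmem() would need the same treatment.
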
>
> That will fix kcore and kmem but this will show up in other places too.
> We've gone through and made sure that virt_addr_valid returns
> true if and only if virt_to_page returns a valid address. I don't know
> if we can make as strong a claim about is_vmalloc_addr and
> vmalloc_to_page in all cases but is_vmalloc_addr should not return true
> for the kernel image. That would at least let kcore fall back to
> kern_addr_valid which should correctly handle the kernel image.
> The suggestion to move the kernel image out of VMALLOC_START/VMALLOC_END
> seems like the best approach although I haven't tried a prototype
> at all.
>

Moving the kernel into the vmalloc region was kind of the point, for
KASLR. Undoing that means either disabling KASLR, or splitting the
vmalloc region into a KASLR region only for the kernel, and a vmalloc
region like we had when vmlinux lived in the linear region.

In general, I think we should be able to deal with different kinds of
mappings with different granularity in the vmalloc region. If
necessary, we could introduce a VM_xxx flag for the kernel to
distinguish such regions from ordinary VM_MAP regions.
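
A rough sketch of that, with VM_KERNEL_IMAGE as a purely illustrative name for
such a flag (on arm64 it could be set in map_kernel(), which already registers
the image segments via vm_area_add_early()), would be to let vmalloc_to_page()
check the owning area instead of blindly walking to the pte level:

        /* hypothetical flag, using a free bit in vmalloc.h */
        #define VM_KERNEL_IMAGE         0x00000100

        struct vm_struct *area = find_vm_area(vmalloc_addr);

        if (area && (area->flags & VM_KERNEL_IMAGE))
                return virt_to_page(lm_alias(vmalloc_addr));

That would keep the special casing in one place rather than teaching every
caller about the kernel image.
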


