[PATCH v2 2/2] ARM: support PHYS_OFFSET minimum aligned at 64KiB boundary

Sun Sep 27 21:30:08 EDT 2020

On 2020/9/22 20:30, Leizhen (ThunderTown) wrote:
> 
> 
> On 2020/9/21 16:53, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2020/9/21 14:47, Ard Biesheuvel wrote:
>>> On Mon, 21 Sep 2020 at 05:35, Leizhen (ThunderTown)
>>> <thunder.leizhen at huawei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/9/17 22:00, Ard Biesheuvel wrote:
>>>>> On Tue, 15 Sep 2020 at 22:06, Russell King - ARM Linux admin
>>>>> <linux at armlinux.org.uk> wrote:
>>>>>>
>>>>>> On Tue, Sep 15, 2020 at 09:16:15PM +0800, Zhen Lei wrote:
>>>>>>> Currently, only support the kernels where the base of physical memory is
>>>>>>> at a 16MiB boundary. Because the add/sub instructions only contains 8bits
>>>>>>> unrotated value. But we can use one more "add/sub" instructions to handle
>>>>>>> bits 23-16. The performance will be slightly affected.
>>>>>>>
>>>>>>> Since most boards meet 16 MiB alignment, so add a new configuration
>>>>>>> option ARM_PATCH_PHYS_VIRT_RADICAL (default n) to control it. Say Y if
>>>>>>> anyone really needs it.
>>>>>>>
>>>>>>> All r0-r7 (r1 = machine no, r2 = atags or dtb, in the start-up phase) are
>>>>>>> used in __fixup_a_pv_table() now, but the callee saved r11 is not used in
>>>>>>> the whole head.S file. So choose it.
>>>>>>>
>>>>>>> Because the calculation of "y = x + __pv_offset[63:24]" have been done,
>>>>>>> so we only need to calculate "y = y + __pv_offset[23:16]", that's why
>>>>>>> the parameters "to" and "from" of __pv_stub() and __pv_add_carry_stub()
>>>>>>> in the scope of CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL are all passed "t"
>>>>>>> (above y).
>>>>>>>
>>>>>>> Signed-off-by: Zhen Lei <thunder.leizhen at huawei.com>
>>>>>>> ---
>>>>>>>  arch/arm/Kconfig              | 18 +++++++++++++++++-
>>>>>>>  arch/arm/include/asm/memory.h | 16 +++++++++++++---
>>>>>>>  arch/arm/kernel/head.S        | 25 +++++++++++++++++++------
>>>>>>>  3 files changed, 49 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
>>>>>>> index e00d94b16658765..19fc2c746e2ce29 100644
>>>>>>> --- a/arch/arm/Kconfig
>>>>>>> +++ b/arch/arm/Kconfig
>>>>>>> @@ -240,12 +240,28 @@ config ARM_PATCH_PHYS_VIRT
>>>>>>>         kernel in system memory.
>>>>>>>
>>>>>>>         This can only be used with non-XIP MMU kernels where the base
>>>>>>> -       of physical memory is at a 16MB boundary.
>>>>>>> +       of physical memory is at a 16MiB boundary.
>>>>>>>
>>>>>>>         Only disable this option if you know that you do not require
>>>>>>>         this feature (eg, building a kernel for a single machine) and
>>>>>>>         you need to shrink the kernel to the minimal size.
>>>>>>>
>>>>>>> +config ARM_PATCH_PHYS_VIRT_RADICAL
>>>>>>> +     bool "Support PHYS_OFFSET minimum aligned at 64KiB boundary"
>>>>>>> +     default n
>>>>>>
>>>>>> Please drop the "default n" - this is the default anyway.
>>>>>>
>>>>>>> @@ -236,6 +243,9 @@ static inline unsigned long __phys_to_virt(phys_addr_t x)
>>>>>>>        * in place where 'r' 32 bit operand is expected.
>>>>>>>        */
>>>>>>>       __pv_stub((unsigned long) x, t, "sub", __PV_BITS_31_24);
>>>>>>> +#ifdef CONFIG_ARM_PATCH_PHYS_VIRT_RADICAL
>>>>>>> +     __pv_stub((unsigned long) t, t, "sub", __PV_BITS_23_16);
>>>>>>
>>>>>> t is already unsigned long, so this cast is not necessary.
>>>>>>
>>>>>> I've been debating whether it would be better to use "movw" for this
>>>>>> for ARMv7.  In other words:
>>>>>>
>>>>>>         movw    tmp, #16-bit
>>>>>>         adds    %Q0, %1, tmp, lsl #16
>>>>>>         adc     %R0, %R0, #0
>>>>>>
>>>>>> It would certainly be less instructions, but at the cost of an
>>>>>> additional register - and we'd have to change the fixup code to
>>>>>> know about movw.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>
>>>>> Since LPAE implies v7, we can use movw unconditionally, which is nice.
>>>>>
>>>>> There is no need to use an additional temp register, as we can use the
>>>>> register holding the high word. (There is no need for the mov_hi macro
>>>>> to be separate)
>>>>>
>>>>> 0:     movw    %R0, #low offset >> 16
>>>>>        adds    %Q0, %1, %R0, lsl #16
>>>>> 1:     mov     %R0, #high offset
>>>>>        adc     %R0, %R0, #0
>>>>>        .pushsection .pv_table,"a"
>>>>>        .long 0b, 1b
>>>>>        .popsection
>>>>>
>>>>> The only problem is distinguishing the two mov instructions from each
>>>>
>>>> The #high offset can also consider use movw, it just save two bytes in
>>>> the thumb2 scenario. We can store different imm16 value for high_offset
>>>> and low_offset, so that we can distinguish them in __fixup_a_pv_table().
>>>>
>>>> This will make the final implementation of the code look more clear and
>>>> consistent, especially THUMB2.
>>>>
>>>> Let me try it.
>>>>
>>>
>>> Hello Zhen Lei,
>>>
>>> I am looking into this as well:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm-p2v-v2
>>>
>>> Could you please test this version on your hardware?
>>
>> OK, I will test it on my boards.
> Hi Ard Biesheuvel:
>   I have tested it on 16MiB aligned + LE board, it works well. I've asked my colleagues
> from other departments to run it on 2MiB aligned + BE board. He will do it tomorrow.

Hi, Ard Biesheuvel:
  I'm sorry to keep you waiting so long. You patch series works well on 2MiB aligned + BE board
also. I spent a lot of time, because our 2MiB aligned + BE board loads zImage. Therefore, special
processing is required for the following code:

arch/arm/boot/compressed/head.S:
#ifdef CONFIG_AUTO_ZRELADDR
                mov     r4, pc
                and     r4, r4, #0xf8000000		//currently only support 128MiB alignment
                add     r4, r4, #TEXT_OFFSET
#else

This is a special scenario that does not conflict with your code framework. So I'm trying to fix it.

Tested-by: Zhen Lei <thunder.leizhen at huawei.com>

> 
> 
>>
>>>
>>> .
>>>
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>
>> .
>>