[PATCH v4 2/3] arm64: vmlinux.ld: Add .mmuoff.{text, data} sections

James Morse james.morse at arm.com
Thu Aug 18 04:39:49 PDT 2016


Hi Ard,

On 17/08/16 18:50, Ard Biesheuvel wrote:
> On 15 August 2016 at 19:12, James Morse <james.morse at arm.com> wrote:
>> Resume from hibernate needs to clean any text executed by the kernel with
>> the MMU off to the PoC. Collect these functions together into a new
>> .mmuoff.text section. __boot_cpu_mode and secondary_holding_pen_release
>> are data that is read or written with the MMU off. Add these to a new
>> .mmuoff.data section.
>>
>> This covers booting of secondary cores and the cpu_suspend() path used
>> by cpu-idle and suspend-to-ram.
>>
>> The bulk of head.S is not included, as the primary boot code is only ever
>> executed once, the kernel never needs to ensure it is cleaned to a
>> particular point in the cache.

>> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
>> index b77f58355da1..4230eeeeabf5 100644
>> --- a/arch/arm64/kernel/head.S
>> +++ b/arch/arm64/kernel/head.S
>> @@ -621,17 +622,31 @@ set_cpu_boot_mode_flag:
>>  ENDPROC(set_cpu_boot_mode_flag)
>>
>>  /*
>> + * Values in this section are written with the MMU off, but read with the
>> + * MMU on. Writers will invalidate the corresponding address, discarding
>> + * a 'Cache Writeback Granule' (CWG) worth of data. Align these variables
>> + * to the architectural maximum of 2K.
>> + */
>> +       .pushsection ".mmuoff.data", "aw"
>> +       .align 11
>> +/*
>>   * We need to find out the CPU boot mode long after boot, so we need to
>>   * store it in a writable variable.
>>   *
>>   * This is not in .bss, because we set it sufficiently early that the boot-time
>>   * zeroing of .bss would clobber it.
>>   */
>> -       .pushsection    .data..cacheline_aligned
>> -       .align  L1_CACHE_SHIFT
>>  ENTRY(__boot_cpu_mode)
>>         .long   BOOT_CPU_MODE_EL2
>>         .long   BOOT_CPU_MODE_EL1
>> +/*
>> + * The booting CPU updates the failed status @__early_cpu_boot_status,
>> + * with MMU turned off.
>> + */
>> +ENTRY(__early_cpu_boot_status)
>> +       .long   0
>> +
>> +       .align 11
> 
> How is this supposed to work? Is secondary_holding_pen_release
> expected to be covered by this region as well?

In this section, but not in the same CWG:
__boot_cpu_mode and __early_cpu_boot_status are written with the mmu off, then
the corresponding cache area is invalidated.

secondary_holding_pen_release works the other way round, it is written with the
mmu on, then clean+invalidated, to be read with the mmu off.

I grouped them together in an older version, Mark pointed out that the
maintenance of one could corrupt the other if they fall within a CWG of each
other. [0]


> Wouldn't it be better to handle this alignment in the linker script?

Its not just alignment of the section, but the alignment between mmuoff:read and
mmuoff:write variables. Maybe it would be cleared if they were in separate sections?

(I should at least smother it with more comments)


> (if you even need it, but see below)
> 
>>         .popsection
>>
>>         /*
>> @@ -687,6 +702,7 @@ __secondary_switched:
>>         mov     x29, #0
>>         b       secondary_start_kernel
>>  ENDPROC(__secondary_switched)
>> +       .popsection
>>
>>  /*
>>   * The booting CPU updates the failed status @__early_cpu_boot_status,
>> @@ -706,12 +722,6 @@ ENDPROC(__secondary_switched)
>>         dc      ivac, \tmp1                     // Invalidate potentially stale cache line
>>         .endm
>>
>> -       .pushsection    .data..cacheline_aligned
>> -       .align  L1_CACHE_SHIFT
>> -ENTRY(__early_cpu_boot_status)
>> -       .long   0
>> -       .popsection
>> -
>>  /*
>>   * Enable the MMU.
>>   *

>> diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
>> index 659963d40bb4..bbab3d886516 100644
>> --- a/arch/arm64/kernel/vmlinux.lds.S
>> +++ b/arch/arm64/kernel/vmlinux.lds.S
>> @@ -120,6 +120,9 @@ SECTIONS
>>                         IRQENTRY_TEXT
>>                         SOFTIRQENTRY_TEXT
>>                         ENTRY_TEXT
>> +                       __mmuoff_text_start = .;
>> +                       *(.mmuoff.text)
>> +                       __mmuoff_text_end = .;
>>                         TEXT_TEXT
>>                         SCHED_TEXT
>>                         LOCK_TEXT
>> @@ -186,6 +189,11 @@ SECTIONS
>>         _sdata = .;
>>         RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
>>         PECOFF_EDATA_PADDING
> 
> This padding needs to be at the end, it is intended to make the size
> of Image a multiple of 512.

Ah, I didn't realise that, readelf says I broke this.
I will move it to be before.

(secondary_holding_pen_rel appears after head.S's align 11s, so the next symbol
isn't 2KB aligned)


> Alternatively, you could get rid of it
> completely, I guess, if the end of .mmuoff.text is expected to be 2 KB
> aligned (but I wonder if you need to)
> 
>> +       .mmuoff.data : {
> 
>  .mmuoff.data : ALIGN (SZ_2K) {
> 
>> +               __mmuoff_data_start = .;
>> +               *(.mmuoff.data)
> 
> . = ALIGN(SZ_2K);
> 
> However, if the invalidation occurs before .bss is cleared (with the
> caches on), perhaps there is no need to align the end of this section?

We invalidate something in this section via secondary_entry() ->
set_cpu_boot_mode_flag(), so this can happen any time.
My understanding is that CWG is maximum amount of data that will be invalidated
and at compile time we assume its worst case of 2KB. Aligning the end is to stop
anything else being located in this worst case range.


> 
>> +               __mmuoff_data_end = .;
>> +       }
>>         _edata = .;
>>
>>         BSS_SECTION(0, 0, 0)


Thanks,


James


[0] https://patchwork.kernel.org/patch/9203423/



More information about the linux-arm-kernel mailing list