[v3 PATCH 0/6] arm64: support FEAT_BBM level 2 and large block mapping when rodata=full

Thu May 29 11:30:40 PDT 2025

On 29/05/2025 18:35, Yang Shi wrote:
> 
> 
> On 5/29/25 8:33 AM, Ryan Roberts wrote:
>> On 29/05/2025 09:48, Ryan Roberts wrote:
>>
>> [...]
>>
>>>>>> Regarding the linear map repainting, I had a chat with Catalin, and he
>>>>>> reminded me of a potential problem: if you are doing the repainting with
>>>>>> the machine stopped, you can't allocate memory at that point, because a
>>>>>> CPU may have been inside the allocator when it stopped. And I think you
>>>>>> need to allocate intermediate pgtables, right? Do you have a solution to
>>>>>> that problem? I guess one approach would be to figure out how much memory
>>>>>> you will need and pre-allocate it before stopping the machine?
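A minimal userspace sketch of the pre-allocation idea, assuming the worst-case page count is known up front. The pool is filled while the machine is still running, and the stopped-machine callback only pops from it, so it never takes an allocator lock or sleeps. `struct pgtable_pool`, `pool_fill()`, `pool_take()` and `pool_drain()` are hypothetical names, not existing kernel APIs, and `aligned_alloc()` stands in for the page allocator.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096UL  /* assumption: 4K granule */

/* Hypothetical pool of pre-zeroed pgtable pages, filled before stopping. */
struct pgtable_pool {
	void **pages;
	size_t nr;  /* pages still available */
};

/* Runs with the machine still live: allocation may block here. */
static int pool_fill(struct pgtable_pool *pool, size_t nr_pages)
{
	pool->nr = 0;
	pool->pages = calloc(nr_pages, sizeof(*pool->pages));
	if (!pool->pages)
		return -1;
	for (; pool->nr < nr_pages; pool->nr++) {
		void *p = aligned_alloc(PAGE_SIZE, PAGE_SIZE);

		if (!p)
			return -1;  /* caller drains what was filled */
		memset(p, 0, PAGE_SIZE);
		pool->pages[pool->nr] = p;
	}
	return 0;
}

/* Runs in the stopped-machine callback: no locks, no sleeping. */
static void *pool_take(struct pgtable_pool *pool)
{
	return pool->nr ? pool->pages[--pool->nr] : NULL;
}

/* Frees pages that were never consumed, once the machine runs again. */
static void pool_drain(struct pgtable_pool *pool)
{
	while (pool->nr)
		free(pool->pages[--pool->nr]);
	free(pool->pages);
	pool->pages = NULL;
}
```

Pages handed out by `pool_take()` become pgtables and are owned by the caller; only the leftovers are returned by `pool_drain()`.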
>>>>> OK, I don't remember us discussing this problem before. I think we can do
>>>>> something like what kpti does. When creating the linear map we know how
>>>>> many PUD and PMD block mappings are created; if we record those counts,
>>>>> they tell us how many pages we need for repainting the linear map.
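Assuming a 4K granule (512 entries per table) and that repainting splits every block all the way down to page mappings, the recorded counts translate into a worst-case page budget roughly as follows: each PMD block needs one new PTE table, and each PUD block needs one new PMD table plus one PTE table for each of its 512 PMD entries. The helper name is hypothetical, and contiguous mappings are ignored in this sketch.

```c
#include <stddef.h>

#define PTRS_PER_PMD 512UL  /* assumption: 4K granule, 512 entries per table */

/*
 * Hypothetical helper: worst-case number of new pgtable pages needed to
 * split every recorded PUD and PMD block mapping down to page mappings.
 */
static size_t repaint_pages_needed(size_t nr_pud_blocks, size_t nr_pmd_blocks)
{
	/* One PMD table per PUD block, plus one PTE table per PMD entry in it. */
	size_t per_pud = 1 + PTRS_PER_PMD;

	/* Each PMD block needs exactly one PTE table. */
	return nr_pud_blocks * per_pud + nr_pmd_blocks;
}
```

For example, 16 GiB mapped entirely with 1 GiB PUD blocks gives 16 * 513 = 8208 pages, about 32 MiB of pgtables reserved up front.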
>>>> Looking at the kpti code further, it looks like kpti also allocates memory
>>>> with the machine stopped, but it only calls the allocator on CPU 0.
>>> Oh yes, I hadn't spotted that. It looks like a special case that may be ok for
>>> kpti though; it's allocating a fairly small amount of memory (max levels=5 so
>>> max order=3) and it's doing it with GFP_ATOMIC. So if my understanding of the
>>> page allocator is correct, then this should be allocated from a per-cpu reserve?
>>> Which means that it never needs to take a lock that other, stopped CPUs could be
>>> holding. And GFP_ATOMIC guarantees that the thread will never sleep, which I
>>> think is not allowed while the machine is stopped.
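The "max levels=5 so max order=3" arithmetic, assuming one page per pgtable level, is just the smallest power-of-two block of pages that covers the request: 2^3 = 8 >= 5. A sketch with a hypothetical helper name (the kernel has its own order helpers; this is not one of them):

```c
/* Hypothetical helper: smallest order such that (1UL << order) >= nr_pages. */
static unsigned int pages_to_order(unsigned long nr_pages)
{
	unsigned int order = 0;

	while ((1UL << order) < nr_pages)
		order++;
	return order;
}
```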
> 
> The pcp should be set up by then, but I don't think it is actually populated
> until the first allocation happens IIRC.
> 
>>>
>>>> IIUC this
>>>> guarantees the code will not be called on a CPU which was inside the allocator
>>>> when it stopped because CPU 0 is running stop_machine().
>>> My concern was a bit more general; if any other CPU was inside the allocator
>>> holding a lock when the machine was stopped, then if CPU 0 comes along and makes
>>> a call to the allocator that requires the lock, then we have a deadlock.
>>>
>>> All that said, looking at the stop_machine() docs, it says:
>>>
>>>   * Description: This causes a thread to be scheduled on every cpu,
>>>   * each of which disables interrupts.  The result is that no one is
>>>   * holding a spinlock or inside any other preempt-disabled region when
>>>   * @fn() runs.
>>>
>>> So I think my deadlock concern was unfounded. I think as long as you can
>>> guarantee that fn() won't try to sleep then you should be safe? So I guess
>>> allocating from within fn() should be safe as long as you use GFP_ATOMIC?
> 
> Yes, the deadlock should not be a concern.
> 
> The other comment also said:
> 
>  * On each target cpu, @fn is run in a process context with the highest priority
>  * preempting any task on the cpu and monopolizing it.
> 
> Since fn runs in process context, shouldn't sleeping be OK? For the kpti and
> linear map repainting use cases, sleeping should only happen when the
> allocation has to reclaim memory because there isn't enough free memory. But
> I do agree GFP_ATOMIC is safer.

Interrupts are disabled so I can't imagine sleeping is a good idea...

> 
>> I just had another conversation about this internally, and there is another
>> concern; we obviously don't want to modify the pgtables while other CPUs that
>> don't support BBML2 could be accessing them. Even in stop_machine() this may be
>> possible if the CPU stacks and task structure (for example) are allocated out of
>> the linear map.
>>
>> So we need to be careful to follow the pattern used by kpti; all secondary CPUs
>> need to switch to the idmap (which is installed in TTBR0) then install the
>> reserved map in TTBR1, then wait for CPU 0 to repaint the linear map, then have
>> the secondary CPUs switch TTBR1 back to swapper then switch back out of idmap.
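A userspace model of that handshake, using pthreads and C11 atomics in place of CPUs and pgtables. Every name here is hypothetical, and the real secondary path has to run from the idmap in assembly, which the comments only mark. What it shows is the ordering that matters: CPU 0 must not touch swapper until every secondary has parked, and secondaries must not return to swapper until CPU 0 has published the update.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NR_SECONDARIES 3  /* assumption: fixed CPU count for the model */

static atomic_int parked;        /* secondaries that have left swapper */
static atomic_int repaint_done;  /* set by "CPU 0" once swapper is updated */
static atomic_int resumed;       /* secondaries back on swapper */
static int linear_map_gen;       /* stands in for the swapper pgtables */
static int seen_gen[NR_SECONDARIES];

/* Models a non-BBML2 secondary CPU. */
static void *secondary(void *arg)
{
	int *seen = arg;

	/* Real code: run from the idmap, then set TTBR1 to the reserved pgtable. */
	atomic_fetch_add(&parked, 1);
	while (!atomic_load(&repaint_done))
		;  /* spin without touching the "linear map" */
	/* Real code: set TTBR1 back to swapper, then leave the idmap. */
	*seen = linear_map_gen;  /* ordered after CPU 0's update via repaint_done */
	atomic_fetch_add(&resumed, 1);
	return NULL;
}

/* Models CPU 0 (BBML2-capable). Returns the new generation, or -1. */
static int run_repaint(void)
{
	pthread_t tid[NR_SECONDARIES];
	int i, created;

	atomic_store(&parked, 0);
	atomic_store(&repaint_done, 0);
	atomic_store(&resumed, 0);

	for (created = 0; created < NR_SECONDARIES; created++)
		if (pthread_create(&tid[created], NULL, secondary, &seen_gen[created]))
			break;

	if (created == NR_SECONDARIES) {
		while (atomic_load(&parked) < NR_SECONDARIES)
			;  /* wait until nobody is using swapper */
		linear_map_gen++;  /* the "repaint" */
	}
	/* seq_cst store: publishes the update (or releases threads on error). */
	atomic_store(&repaint_done, 1);

	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return created == NR_SECONDARIES ? linear_map_gen : -1;
}
```

After `run_repaint()` returns, every entry in `seen_gen` holds the post-repaint generation, which is the property the TTBR1 switch is meant to guarantee for the real pgtables.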
> 
> So would the code below be OK?
> 
> cpu_install_idmap();
> /* busy-loop until CPU 0 has finished repainting */
> cpu_uninstall_idmap();

Once you have installed the idmap, you'll need to call a function by its PA so
you are actually executing out of the idmap. And you will need to be in assembly
so you don't need the stack, and you'll need to switch TTBR1 to the reserved
pgtable, so that the CPU has no access to the swapper pgtable (which CPU 0 is
able to modify).

You may well be able to reuse __idmap_kpti_secondary in proc.S, or lightly
refactor it to work for both the existing idmap_kpti_install_ng_mappings case,
and your case.

Thanks,
Ryan

> 
>>
>> Given CPU 0 supports BBML2, I think it can just update the linear map live,
>> without needing to do the idmap dance?
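The core of the repaint is splitting a block into next-level entries that map the same addresses with the same attributes. FEAT_BBM level 2 is, as discussed in this series, what should let CPU 0 swap the live block entry for a table entry pointing at them without the break-before-make sequence. A sketch of the split for one PMD block, assuming a 4K granule, 48-bit output addresses and stage-1 descriptors; the helper and macro names are hypothetical, and the contiguous bit (52) is deliberately not copied.

```c
#include <stdint.h>

/* Assumptions: 4K granule, 48-bit output addresses, stage-1 descriptors. */
#define PTRS_PER_PTE    512
#define PAGE_SHIFT      12
#define PMD_SHIFT       21
#define DESC_TYPE_MASK  UINT64_C(0x3)
#define DESC_BLOCK      UINT64_C(0x1)  /* level 1/2 block descriptor */
#define DESC_PAGE       UINT64_C(0x3)  /* level 3 page descriptor */
#define OA_MASK(shift)  ((UINT64_C(1) << 48) - (UINT64_C(1) << (shift)))
#define LOWER_ATTRS     UINT64_C(0x0000000000000ffc)  /* bits [11:2] */
#define UPPER_ATTRS     UINT64_C(0xffe0000000000000)  /* bits [63:53] */

/*
 * Hypothetical helper: fill a PTE table with the page mappings equivalent
 * to one PMD block, keeping its attributes. Returns -1 if pmd isn't a block.
 */
static int split_pmd_block(uint64_t pmd, uint64_t pte_table[PTRS_PER_PTE])
{
	uint64_t attrs = pmd & (LOWER_ATTRS | UPPER_ATTRS);
	uint64_t oa = pmd & OA_MASK(PMD_SHIFT);
	int i;

	if ((pmd & DESC_TYPE_MASK) != DESC_BLOCK)
		return -1;

	for (i = 0; i < PTRS_PER_PTE; i++)
		pte_table[i] = (oa + ((uint64_t)i << PAGE_SHIFT)) | attrs | DESC_PAGE;
	return 0;
}
```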
> 
> Yes, I think so too.
> 
> Thanks,
> Yang
> 
>>
>> Thanks,
>> Ryan
>>
>>
>>> Thanks,
>>> Ryan
>>>
>