[PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests

Suzuki K Poulose suzuki.poulose at arm.com
Thu Apr 9 07:18:43 PDT 2026



On 09/04/2026 10:38, Suzuki K Poulose wrote:
> On 07/04/2026 18:21, Catalin Marinas wrote:
>> On Tue, Apr 07, 2026 at 10:57:35AM +0100, Suzuki K Poulose wrote:
>>> On 02/04/2026 21:43, Catalin Marinas wrote:
>>>> On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
>>>>>    int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>>>>>    {
>>>>>        int ret;
>>>>> -    /*
>>>>> -     * !BBML2_NOABORT systems should not be trying to change permissions on
>>>>> -     * anything that is not pte-mapped in the first place. Just return early
>>>>> -     * and let the permission change code raise a warning if not already
>>>>> -     * pte-mapped.
>>>>> -     */
>>>>> -    if (!system_supports_bbml2_noabort())
>>>>> -        return 0;
>>>>> -
>>>>>        /*
>>>>>         * If the region is within a pte-mapped area, there is no need to try to
>>>>>         * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
>>>>>         * change permissions from atomic context so for those cases (which are
>>>>>         * always pte-mapped), we must not go any further because taking the
>>>>> -     * mutex below may sleep.
>>>>> +     * mutex below may sleep. Do not call force_pte_mapping() here because
>>>>> +     * it could return a confusing result if called from a secondary cpu
>>>>> +     * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
>>>>> +     * what we need.
>>>>>         */
>>>>> -    if (force_pte_mapping() || is_kfence_address((void *)start))
>>>>> +    if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
>>>>>            return 0;
>>>>> +    if (!system_supports_bbml2_noabort()) {
>>>>> +        /*
>>>>> +         * !BBML2_NOABORT systems should not be trying to change
>>>>> +         * permissions on anything that is not pte-mapped in the first
>>>>> +         * place. Just return early and let the permission change code
>>>>> +         * raise a warning if not already pte-mapped.
>>>>> +         */
>>>>> +        if (system_capabilities_finalized())
>>>>> +            return 0;
>>>>> +
>>>>> +        /*
>>>>> +         * Boot-time: split_kernel_leaf_mapping_locked() allocates from
>>>>> +         * page allocator. Can't split until it's available.
>>>>> +         */
>>>>> +        if (WARN_ON(!page_alloc_available))
>>>>> +            return -EBUSY;
>>>>> +
>>>>> +        /*
>>>>> +         * Boot-time: Started secondary cpus but don't know if they
>>>>> +         * support BBML2_NOABORT yet. Can't allow splitting in this
>>>>> +         * window in case they don't.
>>>>> +         */
>>>>> +        if (WARN_ON(num_online_cpus() > 1))
>>>>> +            return -EBUSY;
>>>>> +    }
>>>>
>>>> I think sashiko is overcautious here
>>>> (https://sashiko.dev/#/patchset/20260330161705.3349825-1-ryan.roberts@arm.com)
>>>> but it has a somewhat valid point from the perspective of
>>>> num_online_cpus() semantics. We can have num_online_cpus() == 1 while
>>>> having a secondary CPU just booted and with its MMU enabled. I don't
>>>> think we can have any asynchronous tasks running at that point to
>>>> trigger a split though. Even async_init() is called after smp_init().
>>>>
>>>> An option may be to attempt cpus_read_trylock() as this lock is taken by
>>>> _cpu_up(). If it fails, return -EBUSY, otherwise check num_online_cpus()
>>>> and unlock (and return -EBUSY if secondaries already started).
>>>>
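
To make the control flow of the trylock option above concrete, here is a
rough userspace model in C. It is a sketch only, not kernel code: every
model_* name is hypothetical, the hotplug lock is approximated by a single
atomic counter (-1 meaning the writer, i.e. the _cpu_up() side, holds it),
and model_online_cpus stands in for num_online_cpus().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock model: -1 = writer (_cpu_up side) holds it, n >= 0 = n readers. */
static atomic_int model_hotplug_lock;
/* Hypothetical stand-in for num_online_cpus(). */
static int model_online_cpus = 1;

/* Try to take the lock for reading; never blocks. */
static bool model_cpus_read_trylock(void)
{
	int cur = atomic_load(&model_hotplug_lock);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the writer check is repeated. */
		if (atomic_compare_exchange_weak(&model_hotplug_lock, &cur, cur + 1))
			return true;
	}
	return false;
}

static void model_cpus_read_unlock(void)
{
	int prev = atomic_fetch_sub(&model_hotplug_lock, 1);

	assert(prev > 0);	/* unlock without a matching trylock is a bug */
}

/* Returns 0 if a boot-time split may proceed, -EBUSY otherwise. */
static int model_boot_time_split_allowed(void)
{
	int ret = 0;

	/* A CPU is being brought up right now: refuse rather than wait. */
	if (!model_cpus_read_trylock())
		return -EBUSY;

	/* Secondaries already started: their BBML2_NOABORT support is unknown. */
	if (model_online_cpus > 1)
		ret = -EBUSY;

	model_cpus_read_unlock();
	return ret;
}
```

With one CPU online and no bring-up in flight the model returns 0; it
returns -EBUSY if the writer side holds the lock or a secondary is online.
Whether the real hotplug lock can be used this early in boot is not
something this sketch answers.
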
>>>> Another thing I couldn't get my head around - IIUC is_realm_world()
>>>> won't return true for map_mem() yet (if in a realm).
>>>
>>> That is correct. map_mem() comes from paging_init(), which gets called
>>> before arm64_rsi_init(). Realm check was delayed until psci_xx_init().
>>> We had a version which parsed the DT for PSCI conduit early enough
>>> to be able to make the SMC calls to detect the Realm. But there
>>> were concerns around it.
>>
>> Ah, yes, I remember.
>>
>> Does it mean that commit 42be24a4178f ("arm64: Enable memory encrypt for
>> Realms") was broken without rodata=full w.r.t. the linear map? Commit
> 
> Apparently, it looks like we missed this when we demoted the RSI
> detection later.
> 
>> a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
>> introduced force_pte_mapping() but it just copied the logic in the
>> existing can_set_direct_map(). Looking at the linear_map_requires_bbml2
>> assignment, we get (!is_realm_world() && is_realm_world()) and it
>> cancels out, no effect on it but we don't get pte mappings either (even
>> if we don't have BBML2).
> 
> Yep, that's right.
>>
>> I think we need at least some safety checks:
>>
>> 1. BBML2_NOABORT support on the boot CPU - continue with the existing
>>     logic (as per Ryan's series)
>>
>> 2. !system_supports_bbml2_noabort() - split in
>>     linear_map_maybe_split_to_ptes(). This does not currently happen
>>     because linear_map_requires_bbml2 may be false in the absence of
>>     rodata=full. Not sure how to fix this without some variable telling
>>     us how the linear map was mapped. The requires_bbml2 flag doesn't
>>
>> 3. Panic in arm64_rsi_init() if !BBML2_NOABORT on the boot CPU _and_ we
>>     have block mappings already. People can avoid it with rodata=full
> 
> It looks like this will be a common case :-(

Having another look, by default, arm64 boots with rodata=full, and users
have to explicitly lower the bar by setting rodata=off or rodata=noalias.
So this has been keeping us running ;-).
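
To spell out the reasoning, here is a small userspace model in C of the
decision made at map_mem() time. It is a simplification under stated
assumptions, not the kernel code: the model_* names and struct early_state
are hypothetical, kfence and debug_pagealloc are left out, and the
predicates only mirror the shape discussed in this thread (force pte
mappings when there is no BBML2_NOABORT and a reason to change
permissions later; otherwise rely on BBML2 to split).

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs as seen when map_mem() runs. */
struct early_state {
	bool rodata_full;	/* rodata=full in effect (the default) */
	bool realm_detected;	/* what is_realm_world() reports at this point */
	bool bbml2_noabort;	/* boot CPU supports BBML2_NOABORT */
};

/* Simplified stand-in for can_set_direct_map(). */
static bool model_can_set_direct_map(const struct early_state *s)
{
	return s->rodata_full || s->realm_detected;
}

/* Simplified stand-in for force_pte_mapping(). */
static bool model_force_pte_mapping(const struct early_state *s)
{
	return !s->bbml2_noabort && (s->rodata_full || s->realm_detected);
}

/* Mirrors the shape of the linear_map_requires_bbml2 assignment. */
static bool model_linear_map_requires_bbml2(const struct early_state *s)
{
	return !model_force_pte_mapping(s) && model_can_set_direct_map(s);
}

/*
 * True when the linear map gets block mappings and nothing is set up to
 * split them later. A realm detected after this point then has no way
 * to change permissions at page granularity.
 */
static bool model_unsplittable_blocks(const struct early_state *s)
{
	return !model_force_pte_mapping(s) &&
	       !model_linear_map_requires_bbml2(s);
}
```

In this model, rodata_full = true never ends up with unsplittable blocks:
without BBML2_NOABORT the map is pte-mapped, with it the split path is
armed. With rodata_full = false and realm_detected still false (the realm
has not been detected yet), model_unsplittable_blocks() is true regardless
of bbml2_noabort.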

With rodata=off, I get the following for a Realm boot:

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: arch/arm64/mm/pageattr.c:61 at pageattr_pmd_entry+0x78/0xe0, CPU#0: swapper/0
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 7.0.0-rc1+ #1889 PREEMPT
[    0.000000] Hardware name: linux,dummy-virt (DT)
[    0.000000] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : pageattr_pmd_entry+0x78/0xe0
[    0.000000] lr : walk_pgd_range+0x43c/0x970
[    0.000000] sp : ffff800082343b70
[    0.000000] x29: ffff800082343b70 x28: fff0000019600000 x27: fff0000019580000
[    0.000000] x26: ffff800082343c98 x25: fff000001d57ffff x24: fff000001fffe000
[    0.000000] x23: ffff8000810ae698 x22: fff000001fffd650 x21: fff0000019780000
[    0.000000] x20: fff000001d580000 x19: 0000000000000000 x18: 0000000000000030
[    0.000000] x17: 0000000000004000 x16: 000000009fffc000 x15: 0000000000000020
[    0.000000] x14: 0000000000003be4 x13: 0000000000000020 x12: 0000000000000000
[    0.000000] x11: 0000000000000016 x10: 0000000000000015 x9 : 0000000000000013
[    0.000000] x8 : 0000000000000015 x7 : 0000000080000000 x6 : 0000000000000000
[    0.000000] x5 : 0078000099400405 x4 : fff000001fffd650 x3 : ffff800082343c98
[    0.000000] x2 : 0000000000080000 x1 : fff0000019580000 x0 : 0000000000000001
[    0.000000] Call trace:
[    0.000000]  pageattr_pmd_entry+0x78/0xe0 (P)
[    0.000000]  walk_kernel_page_table_range_lockless+0x60/0xa0
[    0.000000]  update_range_prot+0x80/0x128
[    0.000000]  __set_memory_enc_dec.part.0+0x88/0x258
[    0.000000]  realm_set_memory_decrypted+0x54/0x98
[    0.000000]  set_memory_decrypted+0x38/0x58
[    0.000000]  swiotlb_update_mem_attributes+0x44/0x58
[    0.000000]  mem_init+0x24/0x38
[    0.000000]  mm_core_init+0x94/0x140
[    0.000000]  start_kernel+0x544/0xa18
[    0.000000]  __primary_switched+0x88/0x98
[    0.000000] ---[ end trace 0000000000000000 ]---


Suzuki

> 
>>
>> 4. If (3) is a common case, a better alternative is to rewrite the
>>     linear map sometime after arm64_rsi_init() but before we call
>>     split_kernel_leaf_mapping().
> 
> We will explore this route.
> 
> The other option is to move the RSI detection (and the PSCI probe)
> earlier to be able to make better decisions early on. I will play with
> that a bit too.
> 
> Suzuki
> 
> 
>>
> 



