[PATCH] arm64/pageattr: Propagate return value from __change_memory_common

Mon Nov 10 21:08:38 PST 2025

On 11/10/25 8:55 PM, Dev Jain wrote:
>
> On 11/11/25 10:14 am, Yang Shi wrote:
>>
>>
>> On 11/10/25 8:37 PM, Dev Jain wrote:
>>>
>>> On 11/11/25 9:47 am, Yang Shi wrote:
>>>>
>>>>
>>>> On 11/10/25 7:39 PM, Dev Jain wrote:
>>>>>
>>>>> On 05/11/25 9:27 am, Dev Jain wrote:
>>>>>>
>>>>>> On 04/11/25 6:26 pm, Will Deacon wrote:
>>>>>>> On Tue, Nov 04, 2025 at 09:06:12AM +0530, Dev Jain wrote:
>>>>>>>> On 04/11/25 12:15 am, Yang Shi wrote:
>>>>>>>>> On 11/3/25 7:16 AM, Will Deacon wrote:
>>>>>>>>>> On Mon, Nov 03, 2025 at 11:43:06AM +0530, Dev Jain wrote:
>>>>>>>>>>> Post a166563e7ec3 ("arm64: mm: support large block mapping when
>>>>>>>>>>> rodata=full"),
>>>>>>>>>>> __change_memory_common has a real chance of failing due to 
>>>>>>>>>>> split
>>>>>>>>>>> failure.
>>>>>>>>>>> Before that commit, this line was introduced in c55191e96caa,
>>>>>>>>>>> still having
>>>>>>>>>>> a chance of failing if it needs to allocate pagetable memory in
>>>>>>>>>>> apply_to_page_range, although that has never been observed 
>>>>>>>>>>> to be true.
>>>>>>>>>>> In general, we should always propagate the return value to 
>>>>>>>>>>> the caller.
>>>>>>>>>>>
>>>>>>>>>>> Cc: stable at vger.kernel.org
>>>>>>>>>>> Fixes: c55191e96caa ("arm64: mm: apply r/o permissions of VM
>>>>>>>>>>> areas to its linear alias as well")
>>>>>>>>>>> Signed-off-by: Dev Jain <dev.jain at arm.com>
>>>>>>>>>>> ---
>>>>>>>>>>> Based on Linux 6.18-rc4.
>>>>>>>>>>>
>>>>>>>>>>>    arch/arm64/mm/pageattr.c | 5 ++++-
>>>>>>>>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/arch/arm64/mm/pageattr.c 
>>>>>>>>>>> b/arch/arm64/mm/pageattr.c
>>>>>>>>>>> index 5135f2d66958..b4ea86cd3a71 100644
>>>>>>>>>>> --- a/arch/arm64/mm/pageattr.c
>>>>>>>>>>> +++ b/arch/arm64/mm/pageattr.c
>>>>>>>>>>> @@ -148,6 +148,7 @@ static int change_memory_common(unsigned
>>>>>>>>>>> long addr, int numpages,
>>>>>>>>>>>        unsigned long size = PAGE_SIZE * numpages;
>>>>>>>>>>>        unsigned long end = start + size;
>>>>>>>>>>>        struct vm_struct *area;
>>>>>>>>>>> +    int ret;
>>>>>>>>>>>        int i;
>>>>>>>>>>>          if (!PAGE_ALIGNED(addr)) {
>>>>>>>>>>> @@ -185,8 +186,10 @@ static int change_memory_common(unsigned
>>>>>>>>>>> long addr, int numpages,
>>>>>>>>>>>        if (rodata_full && (pgprot_val(set_mask) == 
>>>>>>>>>>> PTE_RDONLY ||
>>>>>>>>>>>                    pgprot_val(clear_mask) == PTE_RDONLY)) {
>>>>>>>>>>>            for (i = 0; i < area->nr_pages; i++) {
>>>>>>>>>>> - __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>>> +            ret =
>>>>>>>>>>> __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>>>                               PAGE_SIZE, set_mask, clear_mask);
>>>>>>>>>>> +            if (ret)
>>>>>>>>>>> +                return ret;
>>>>>>>>>> Hmm, this means we can return failure half-way through the 
>>>>>>>>>> operation. Is
>>>>>>>>>> that something callers are expecting to handle? If so, how 
>>>>>>>>>> can they tell
>>>>>>>>>> how far we got?
>>>>>>>>> IIUC the callers don't have to know whether it is half-way or not
>>>>>>>>> because the callers will change the permission back (e.g. to 
>>>>>>>>> RW) for the
>>>>>>>>> whole range when freeing memory.
>>>>>>>> Yes, it is the caller's responsibility to set 
>>>>>>>> VM_FLUSH_RESET_PERMS flag.
>>>>>>>> Upon vfree(), it will change the direct map permissions back to 
>>>>>>>> RW.
>>>>>>> Ok, but vfree() ends up using update_range_prot() to do that and 
>>>>>>> if we
>>>>>>> need to worry about that failing (as per your commit message), then
>>>>>>> we're in trouble because the calls to set_area_direct_map() are 
>>>>>>> unchecked.
>>>>>>>
>>>>>>> In other words, this patch is either not necessary or it is 
>>>>>>> incomplete.
>>>>>>
>>>>>> Here is the relevant email, in the discussion between Ryan and Yang:
>>>>>>
>>>>>> https://lore.kernel.org/all/fe52a1d8-5211-4962-afc8-c3f9caf64119@os.amperecomputing.com/ 
>>>>>>
>>>>>>
>>>>>> We had concluded that all callers of set_memory_ro() or 
>>>>>> set_memory_rox() (which require the
>>>>>> linear map perm change back to default, upon vfree() ) will call 
>>>>>> it for the entire region (vm_struct).
>>>>>> So, when we do the set_direct_map_invalid_noflush, it is 
>>>>>> guaranteed that the region has already
>>>>>> been split. So this call cannot fail.
>>>>>>
>>>>>> https://lore.kernel.org/all/f8898c87-8f49-4ef2-86ae-b60bcf67658c@os.amperecomputing.com/ 
>>>>>>
>>>>>>
>>>>>> This email notes that there is some code doing set_memory_rw() 
>>>>>> and unnecessarily setting the VM_FLUSH_RESET_PERMS
>>>>>> flag, but in that case we don't care about the 
>>>>>> set_direct_map_invalid_noflush call failing because the protections
>>>>>> are already RW.
>>>>>>
>>>>>> Although we had also observed that all of this is fragile and 
>>>>>> depends on the caller doing the
>>>>>> correct thing. The real solution should be somehow getting rid of 
>>>>>> the BBM style invalidation.
>>>>>> Ryan had proposed some methods in that email thread.
>>>>>>
>>>>>> One solution which I had thought of, is that, observe that we are 
>>>>>> doing an overkill by
>>>>>> setting the linear map to invalid and then default, for the 
>>>>>> *entire* region. What we
>>>>>> can do is iterate over the linear map alias of the vm_struct 
>>>>>> *area and only change permission
>>>>>> back to RW for the pages which are *not* RW. And, those relevant 
>>>>>> mappings are guaranteed to
>>>>>> be split because they were changed from RW to not RW.
>>>>>
>>>>> @Yang and Ryan,
>>>>>
>>>>> I saw Yang's patch here:
>>>>> https://lore.kernel.org/all/20251023204428.477531-1-yang@os.amperecomputing.com/ 
>>>>>
>>>>> and realized that currently we are splitting away the linear map 
>>>>> alias of the *entire* region.
>>>>>
>>>>> Shouldn't this then imply that set_direct_map_invalid_noflush will 
>>>>> never fail, since even
>>>>>
>>>>> a set_memory_rox() call on a single page will split the linear map 
>>>>> for the entire region,
>>>>>
>>>>> and thus there is no fragility here which we were discussing 
>>>>> about? I may be forgetting
>>>>>
>>>>> something, this linear map stuff is confusing enough already.
>>>>
>>>> It still may fail due to page table allocation failure when doing 
>>>> split. But it is still fine. We may run into 3 cases:
>>>>
>>>> 1. set_memory_rox succeed to split the whole range, then 
>>>> set_direct_map_invalid_noflush() will succeed too
>>>> 2. set_memory_rox fails to split, for example, just change partial 
>>>> range permission due to page table allocation failure, then 
>>>> set_direct_map_invalid_noflush() may
>>>>    a. successfully change the permission back to default till where 
>>>> set_memory_rox fails at since that range has been successfully 
>>>> split. It is ok since the remaining range is actually not changed 
>>>> to ro by set_memory_rox at all
>>>>    b. successfully change the permission back to default for the 
>>>> whole range (for example, memory pressure is mitigated when 
>>>> set_direct_map_invalid_noflush() is called). It is definitely fine 
>>>> as well
>>>
>>> Correct, what I mean to imply here is that, your patch will break 
>>> this? If set_memory_* is applied on x till y, your patch changes the 
>>> linear map alias
>>>
>>> only from x till y - set_direct_map_invalid_noflush instead operates 
>>> on 0 till size - 1, where 0 <=x <=y <= size - 1. So, it may 
>>> encounter a -ENOMEM
>>>
>>> on [0, x) range while invalidating, and that is *not* okay because 
>>> we must reset back [0, x) to default?
>>
>> I see your point now. But I think the callers need to guarantee they 
>> call set_memory_rox and set_direct_map_invalid_noflush on the same 
>> range, right? Currently kernel just calls them on the whole area.
>
> Nope. The fact that the kernel changes protections, and undoes the 
> changed protections, on the *entire* alias of the vm_struct region, 
> protects us from the fragility we were talking about earlier.

This is what I meant "kernel just calls them on the whole area".

>
> Suppose you have a range from 0 till size - 1, and you call 
> set_memory_* on a random point (page) p. The argument we discussed 
> above is independent of p, which lets us drop our
>
> previous erroneous conclusion that all of this works because no caller 
> does a partial set_memory_*.

Sorry I don't follow you. What "erroneous conclusion" do you mean? You 
can call set_memory_* on a random point, but 
set_direct_map_invalid_noflush() should be called on the random point 
too. The current code of set_area_direct_map() doesn't consider this 
case because there is no such call. Is this what you meant?

>
>
> I would like to send a patch clearly documenting this behaviour, 
> assuming no one else finds a hole in this reasoning.

Proper comment to explain the subtle behavior is definitely welcome.

Thanks,
Yang

>
>
>>
>> Thanks,
>> Yang
>>
>>>
>>>
>>>>
>>>> Hopefully I don't miss anything.
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Will
>>>>>>
>>>>
>>