[PATCH] arm64/pageattr: Propagate return value from __change_memory_common
Dev Jain
dev.jain at arm.com
Mon Nov 10 20:55:24 PST 2025
On 11/11/25 10:14 am, Yang Shi wrote:
>
>
> On 11/10/25 8:37 PM, Dev Jain wrote:
>>
>> On 11/11/25 9:47 am, Yang Shi wrote:
>>>
>>>
>>> On 11/10/25 7:39 PM, Dev Jain wrote:
>>>>
>>>> On 05/11/25 9:27 am, Dev Jain wrote:
>>>>>
>>>>> On 04/11/25 6:26 pm, Will Deacon wrote:
>>>>>> On Tue, Nov 04, 2025 at 09:06:12AM +0530, Dev Jain wrote:
>>>>>>> On 04/11/25 12:15 am, Yang Shi wrote:
>>>>>>>> On 11/3/25 7:16 AM, Will Deacon wrote:
>>>>>>>>> On Mon, Nov 03, 2025 at 11:43:06AM +0530, Dev Jain wrote:
>>>>>>>>>> Post a166563e7ec3 ("arm64: mm: support large block mapping when
>>>>>>>>>> rodata=full"), __change_memory_common has a real chance of failing
>>>>>>>>>> due to split failure. Before that commit, this line was introduced
>>>>>>>>>> in c55191e96caa, still having a chance of failing if it needs to
>>>>>>>>>> allocate pagetable memory in apply_to_page_range, although that has
>>>>>>>>>> never been observed to be true. In general, we should always
>>>>>>>>>> propagate the return value to the caller.
>>>>>>>>>>
>>>>>>>>>> Cc: stable at vger.kernel.org
>>>>>>>>>> Fixes: c55191e96caa ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")
>>>>>>>>>> Signed-off-by: Dev Jain <dev.jain at arm.com>
>>>>>>>>>> ---
>>>>>>>>>> Based on Linux 6.18-rc4.
>>>>>>>>>>
>>>>>>>>>> arch/arm64/mm/pageattr.c | 5 ++++-
>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>>>>>>>> index 5135f2d66958..b4ea86cd3a71 100644
>>>>>>>>>> --- a/arch/arm64/mm/pageattr.c
>>>>>>>>>> +++ b/arch/arm64/mm/pageattr.c
>>>>>>>>>> @@ -148,6 +148,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>>>  	unsigned long size = PAGE_SIZE * numpages;
>>>>>>>>>>  	unsigned long end = start + size;
>>>>>>>>>>  	struct vm_struct *area;
>>>>>>>>>> +	int ret;
>>>>>>>>>>  	int i;
>>>>>>>>>>
>>>>>>>>>>  	if (!PAGE_ALIGNED(addr)) {
>>>>>>>>>> @@ -185,8 +186,10 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>>>  	if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
>>>>>>>>>>  			    pgprot_val(clear_mask) == PTE_RDONLY)) {
>>>>>>>>>>  		for (i = 0; i < area->nr_pages; i++) {
>>>>>>>>>> -			__change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>> +			ret = __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>>  					       PAGE_SIZE, set_mask, clear_mask);
>>>>>>>>>> +			if (ret)
>>>>>>>>>> +				return ret;
>>>>>>>>> Hmm, this means we can return failure half-way through the
>>>>>>>>> operation. Is that something callers are expecting to handle?
>>>>>>>>> If so, how can they tell how far we got?
>>>>>>>> IIUC the callers don't have to know whether it is half-way or not,
>>>>>>>> because the callers will change the permission back (e.g. to RW) for
>>>>>>>> the whole range when freeing memory.
>>>>>>> Yes, it is the caller's responsibility to set the VM_FLUSH_RESET_PERMS
>>>>>>> flag. Upon vfree(), it will change the direct map permissions back
>>>>>>> to RW.
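>>>>>>>
>>>>>>> To make that concrete, here is a minimal sketch of the expected
>>>>>>> caller pattern (illustrative only; return values and error handling
>>>>>>> trimmed):
>>>>>>>
>>>>>>> static int example_user(void)
>>>>>>> {
>>>>>>> 	void *buf = vmalloc(PAGE_SIZE);
>>>>>>>
>>>>>>> 	if (!buf)
>>>>>>> 		return -ENOMEM;
>>>>>>>
>>>>>>> 	/* Ask vfree() to reset the linear map alias back to RW
>>>>>>> 	 * (and flush) before the pages are freed. */
>>>>>>> 	set_vm_flush_reset_perms(buf);
>>>>>>> 	set_memory_ro((unsigned long)buf, 1);
>>>>>>>
>>>>>>> 	/* ... use the read-only buffer ... */
>>>>>>>
>>>>>>> 	vfree(buf);
>>>>>>> 	return 0;
>>>>>>> }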
>>>>>> Ok, but vfree() ends up using update_range_prot() to do that and if we
>>>>>> need to worry about that failing (as per your commit message), then
>>>>>> we're in trouble because the calls to set_area_direct_map() are
>>>>>> unchecked.
>>>>>>
>>>>>> In other words, this patch is either not necessary or it is incomplete.
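>>>>>>
>>>>>> (For reference, set_area_direct_map() is roughly the following,
>>>>>> paraphrased from mm/vmalloc.c rather than quoted verbatim; note that
>>>>>> the callback's return value is simply dropped:)
>>>>>>
>>>>>> static int set_area_direct_map(const struct vm_struct *area,
>>>>>> 			       int (*set_direct_map)(struct page *page))
>>>>>> {
>>>>>> 	int i;
>>>>>>
>>>>>> 	for (i = 0; i < area->nr_pages; i++)
>>>>>> 		if (page_address(area->pages[i]))
>>>>>> 			set_direct_map(area->pages[i]);	/* unchecked */
>>>>>>
>>>>>> 	return 0;
>>>>>> }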
>>>>>
>>>>> Here is the relevant email, from the discussion between Ryan and Yang:
>>>>>
>>>>> https://lore.kernel.org/all/fe52a1d8-5211-4962-afc8-c3f9caf64119@os.amperecomputing.com/
>>>>>
>>>>> We had concluded that all callers of set_memory_ro() or set_memory_rox()
>>>>> (which require the linear map permissions to be changed back to default
>>>>> upon vfree()) call it for the entire region (vm_struct). So, when we do
>>>>> set_direct_map_invalid_noflush(), the region is guaranteed to have
>>>>> already been split, and the call cannot fail.
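>>>>>
>>>>> For context, in the pre-split-support code, arm64's
>>>>> set_direct_map_invalid_noflush() boiled down to roughly the following
>>>>> (post a166563e7ec3 the permission change goes through the range-update
>>>>> path instead, where the split is the step that can fail):
>>>>>
>>>>> int set_direct_map_invalid_noflush(struct page *page)
>>>>> {
>>>>> 	struct page_change_data data = {
>>>>> 		.set_mask = __pgprot(0),
>>>>> 		.clear_mask = __pgprot(PTE_VALID),
>>>>> 	};
>>>>>
>>>>> 	if (!can_set_direct_map())
>>>>> 		return 0;
>>>>>
>>>>> 	return apply_to_page_range(&init_mm,
>>>>> 				   (unsigned long)page_address(page),
>>>>> 				   PAGE_SIZE, change_page_range, &data);
>>>>> }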
>>>>>
>>>>> https://lore.kernel.org/all/f8898c87-8f49-4ef2-86ae-b60bcf67658c@os.amperecomputing.com/
>>>>>
>>>>> This email notes that there is some code doing set_memory_rw() and
>>>>> unnecessarily setting the VM_FLUSH_RESET_PERMS flag, but in that case we
>>>>> don't care about the set_direct_map_invalid_noflush() call failing,
>>>>> because the protections are already RW.
>>>>>
>>>>> Although we had also observed that all of this is fragile and depends on
>>>>> the caller doing the correct thing. The real solution would be to somehow
>>>>> get rid of the BBM-style invalidation; Ryan had proposed some methods in
>>>>> that email thread.
>>>>>
>>>>> One solution I had thought of: observe that we are doing overkill by
>>>>> setting the linear map to invalid and then back to default for the
>>>>> *entire* region. What we can do instead is iterate over the linear map
>>>>> alias of the vm_struct *area and change permissions back to RW only for
>>>>> the pages which are *not* RW. Those mappings are guaranteed to already
>>>>> be split, because they were changed from RW to non-RW in the first
>>>>> place. A rough sketch follows.
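>>>>>
>>>>> Something like this (a rough sketch only; linear_map_pte_is_rw() is a
>>>>> made-up helper that walks the linear map and reports whether the PTE is
>>>>> writable, and the set/clear masks mirror what set_memory_rw() uses):
>>>>>
>>>>> static void reset_changed_linear_aliases(struct vm_struct *area)
>>>>> {
>>>>> 	int i;
>>>>>
>>>>> 	for (i = 0; i < area->nr_pages; i++) {
>>>>> 		u64 addr = (u64)page_address(area->pages[i]);
>>>>>
>>>>> 		/* Only touch aliases that were actually made non-RW;
>>>>> 		 * those are guaranteed to be pte-mapped already, so
>>>>> 		 * __change_memory_common() cannot need a split here. */
>>>>> 		if (!linear_map_pte_is_rw(addr))
>>>>> 			__change_memory_common(addr, PAGE_SIZE,
>>>>> 					       __pgprot(PTE_WRITE),
>>>>> 					       __pgprot(PTE_RDONLY));
>>>>> 	}
>>>>> }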
>>>>
>>>> @Yang and Ryan,
>>>>
>>>> I saw Yang's patch here:
>>>> https://lore.kernel.org/all/20251023204428.477531-1-yang@os.amperecomputing.com/
>>>>
>>>> and realized that currently we split the linear map alias of the
>>>> *entire* region.
>>>>
>>>> Shouldn't this then imply that set_direct_map_invalid_noflush() will
>>>> never fail, since even a set_memory_rox() call on a single page will
>>>> split the linear map for the entire region, and thus there is none of
>>>> the fragility we were discussing? I may be forgetting something; this
>>>> linear map stuff is confusing enough already.
>>>
>>> It may still fail due to page table allocation failure when doing the
>>> split. But that is still fine. We may run into three cases:
>>>
>>> 1. set_memory_rox succeeds in splitting the whole range; then
>>>    set_direct_map_invalid_noflush() will succeed too.
>>> 2. set_memory_rox fails to split, for example it changes the permission
>>>    of only a partial range due to page table allocation failure; then
>>>    set_direct_map_invalid_noflush() may
>>>    a. successfully change the permission back to default up to the point
>>>       where set_memory_rox failed, since that range has been successfully
>>>       split. This is ok, since the remaining range was never changed to
>>>       ro by set_memory_rox at all.
>>>    b. successfully change the permission back to default for the whole
>>>       range (for example, if the memory pressure has been mitigated by
>>>       the time set_direct_map_invalid_noflush() is called). That is
>>>       definitely fine as well.
>>
>> Correct. What I mean to imply here is that your patch would break this:
>> if set_memory_* is applied on x till y, your patch changes the linear map
>> alias only from x till y, whereas set_direct_map_invalid_noflush operates
>> on 0 till size - 1, where 0 <= x <= y <= size - 1. So it may encounter
>> -ENOMEM on the [0, x) range while invalidating, and that is *not* okay,
>> because we must reset [0, x) back to default?
>
> I see your point now. But I think the callers need to guarantee that they
> call set_memory_rox and set_direct_map_invalid_noflush on the same range,
> right? Currently the kernel just calls them on the whole area.
Nope. The fact that the kernel changes protections, and undoes the changed
protections, on the *entire* linear map alias of the vm_struct region is
what protects us from the fragility we were talking about earlier. Suppose
you have a range from 0 till size - 1, and you call set_memory_* on a
random point (page) p. The argument we discussed above is independent of p,
which lets us drop our previous, erroneous conclusion that all of this
works only because no caller does a partial set_memory_*.
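
Concretely, both directions walk the whole area, regardless of the
sub-range passed to set_memory_*() (a paraphrased sketch, not the literal
in-tree code):

	/* set_memory_*() path with rodata_full: the linear alias of *every*
	 * page in the area is changed, and therefore split. */
	for (i = 0; i < area->nr_pages; i++)
		__change_memory_common((u64)page_address(area->pages[i]),
				       PAGE_SIZE, set_mask, clear_mask);

	/* vfree() path with VM_FLUSH_RESET_PERMS: the reset walks the same
	 * set of pages, so it only ever touches already-split mappings and
	 * can never itself require a split. */
	set_area_direct_map(area, set_direct_map_invalid_noflush);
	set_area_direct_map(area, set_direct_map_default_noflush);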
I would like to send a patch clearly documenting this behaviour,
assuming no one else finds a hole in this reasoning.
>
> Thanks,
> Yang
>
>>> Hopefully I don't miss anything.
>>>
>>> Thanks,
>>> Yang
>>>>>> Will