[PATCH] arm64/pageattr: Propagate return value from __change_memory_common
Yang Shi
yang at os.amperecomputing.com
Tue Nov 11 15:45:17 PST 2025
On 11/10/25 9:12 PM, Dev Jain wrote:
>
> On 11/11/25 10:38 am, Yang Shi wrote:
>>
>>
>> On 11/10/25 8:55 PM, Dev Jain wrote:
>>>
>>> On 11/11/25 10:14 am, Yang Shi wrote:
>>>>
>>>>
>>>> On 11/10/25 8:37 PM, Dev Jain wrote:
>>>>>
>>>>> On 11/11/25 9:47 am, Yang Shi wrote:
>>>>>>
>>>>>>
>>>>>> On 11/10/25 7:39 PM, Dev Jain wrote:
>>>>>>>
>>>>>>> On 05/11/25 9:27 am, Dev Jain wrote:
>>>>>>>>
>>>>>>>> On 04/11/25 6:26 pm, Will Deacon wrote:
>>>>>>>>> On Tue, Nov 04, 2025 at 09:06:12AM +0530, Dev Jain wrote:
>>>>>>>>>> On 04/11/25 12:15 am, Yang Shi wrote:
>>>>>>>>>>> On 11/3/25 7:16 AM, Will Deacon wrote:
>>>>>>>>>>>> On Mon, Nov 03, 2025 at 11:43:06AM +0530, Dev Jain wrote:
>>>>>>>>>>>>> Post a166563e7ec3 ("arm64: mm: support large block mapping when
>>>>>>>>>>>>> rodata=full"), __change_memory_common has a real chance of failing
>>>>>>>>>>>>> due to split failure. Before that commit, this line was introduced
>>>>>>>>>>>>> in c55191e96caa, still having a chance of failing if it needs to
>>>>>>>>>>>>> allocate pagetable memory in apply_to_page_range, although that has
>>>>>>>>>>>>> never been observed to be true. In general, we should always
>>>>>>>>>>>>> propagate the return value to the caller.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cc: stable at vger.kernel.org
>>>>>>>>>>>>> Fixes: c55191e96caa ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")
>>>>>>>>>>>>> Signed-off-by: Dev Jain <dev.jain at arm.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> Based on Linux 6.18-rc4.
>>>>>>>>>>>>>
>>>>>>>>>>>>> arch/arm64/mm/pageattr.c | 5 ++++-
>>>>>>>>>>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>>>>>>>>>>>> index 5135f2d66958..b4ea86cd3a71 100644
>>>>>>>>>>>>> --- a/arch/arm64/mm/pageattr.c
>>>>>>>>>>>>> +++ b/arch/arm64/mm/pageattr.c
>>>>>>>>>>>>> @@ -148,6 +148,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>>>>>>  	unsigned long size = PAGE_SIZE * numpages;
>>>>>>>>>>>>>  	unsigned long end = start + size;
>>>>>>>>>>>>>  	struct vm_struct *area;
>>>>>>>>>>>>> +	int ret;
>>>>>>>>>>>>>  	int i;
>>>>>>>>>>>>>  
>>>>>>>>>>>>>  	if (!PAGE_ALIGNED(addr)) {
>>>>>>>>>>>>> @@ -185,8 +186,10 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>>>>>>>>>>>  	if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
>>>>>>>>>>>>>  			    pgprot_val(clear_mask) == PTE_RDONLY)) {
>>>>>>>>>>>>>  		for (i = 0; i < area->nr_pages; i++) {
>>>>>>>>>>>>> -			__change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>>>>> +			ret = __change_memory_common((u64)page_address(area->pages[i]),
>>>>>>>>>>>>>  					       PAGE_SIZE, set_mask, clear_mask);
>>>>>>>>>>>>> +			if (ret)
>>>>>>>>>>>>> +				return ret;
>>>>>>>>>>>> Hmm, this means we can return failure half-way through the operation.
>>>>>>>>>>>> Is that something callers are expecting to handle? If so, how can they
>>>>>>>>>>>> tell how far we got?
>>>>>>>>>>> IIUC the callers don't have to know whether it is half-way or not,
>>>>>>>>>>> because the callers will change the permission back (e.g. to RW) for
>>>>>>>>>>> the whole range when freeing memory.
>>>>>>>>>> Yes, it is the caller's responsibility to set the VM_FLUSH_RESET_PERMS
>>>>>>>>>> flag. Upon vfree(), it will change the direct map permissions back to RW.
>>>>>>>>> Ok, but vfree() ends up using update_range_prot() to do that and, if we
>>>>>>>>> need to worry about that failing (as per your commit message), then we're
>>>>>>>>> in trouble because the calls to set_area_direct_map() are unchecked.
>>>>>>>>>
>>>>>>>>> In other words, this patch is either not necessary or it is incomplete.
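For reference, the vfree() path in question looks roughly like this; a
simplified sketch of mm/vmalloc.c from memory, not verbatim, but the key point
holds: the return values of the set_direct_map callbacks are dropped on the
floor:

        static void set_area_direct_map(const struct vm_struct *area,
                                        int (*set_direct_map)(struct page *page))
        {
                int i;

                /* the return value of set_direct_map() is ignored for every page */
                for (i = 0; i < area->nr_pages; i++)
                        if (page_address(area->pages[i]))
                                set_direct_map(area->pages[i]);
        }

        static void vm_reset_perms(struct vm_struct *area)
        {
                /* ... compute the flush range over the area's pages ... */
                set_area_direct_map(area, set_direct_map_invalid_noflush);
                /* ... flush the TLB / vmap aliases for that range ... */
                set_area_direct_map(area, set_direct_map_default_noflush);
        }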
>>>>>>>>
>>>>>>>> Here is the relevant email, in the discussion between Ryan and Yang:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/all/fe52a1d8-5211-4962-afc8-c3f9caf64119@os.amperecomputing.com/
>>>>>>>>
>>>>>>>> We had concluded that all callers of set_memory_ro() or set_memory_rox()
>>>>>>>> (which require the linear map permissions to be changed back to default
>>>>>>>> upon vfree()) will call it for the entire region (vm_struct). So, when we
>>>>>>>> do the set_direct_map_invalid_noflush(), it is guaranteed that the region
>>>>>>>> has already been split, so that call cannot fail.
>>>>>>>>
>>>>>>>> https://lore.kernel.org/all/f8898c87-8f49-4ef2-86ae-b60bcf67658c@os.amperecomputing.com/
>>>>>>>>
>>>>>>>>
>>>>>>>> This email notes that there is some code doing set_memory_rw() and
>>>>>>>> unnecessarily setting the VM_FLUSH_RESET_PERMS flag, but in that case we
>>>>>>>> don't care about the set_direct_map_invalid_noflush() call failing because
>>>>>>>> the protections are already RW.
>>>>>>>>
>>>>>>>> We had also observed, though, that all of this is fragile and depends on
>>>>>>>> the caller doing the correct thing. The real solution would be to somehow
>>>>>>>> get rid of the BBM-style invalidation; Ryan had proposed some methods in
>>>>>>>> that email thread.
>>>>>>>>
>>>>>>>> One solution I had thought of: observe that we are doing overkill by
>>>>>>>> setting the linear map to invalid and then back to default for the
>>>>>>>> *entire* region. What we can do instead is iterate over the linear map
>>>>>>>> alias of the vm_struct *area and only change permissions back to RW for
>>>>>>>> the pages which are *not* RW. Those relevant mappings are guaranteed to
>>>>>>>> already be split, because they were changed from RW to not-RW.
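A minimal sketch of that idea (linear_map_page_is_rw() is a hypothetical
helper here; nothing with that name exists today):

        /* Hypothetical: reset only the pages whose linear-map alias was changed. */
        static void reset_non_rw_pages(const struct vm_struct *area)
        {
                int i;

                for (i = 0; i < area->nr_pages; i++) {
                        struct page *page = area->pages[i];

                        if (!page_address(page))
                                continue;
                        /*
                         * A page that was made non-RW must already sit on a split
                         * (page-size) mapping, so resetting it cannot hit -ENOMEM.
                         */
                        if (!linear_map_page_is_rw(page))       /* hypothetical */
                                set_direct_map_default_noflush(page);
                }
        }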
>>>>>>>
>>>>>>> @Yang and Ryan,
>>>>>>>
>>>>>>> I saw Yang's patch here:
>>>>>>> https://lore.kernel.org/all/20251023204428.477531-1-yang@os.amperecomputing.com/
>>>>>>>
>>>>>>> and realized that currently we are splitting away the linear map alias of
>>>>>>> the *entire* region.
>>>>>>>
>>>>>>> Shouldn't this then imply that set_direct_map_invalid_noflush() will never
>>>>>>> fail, since even a set_memory_rox() call on a single page will split the
>>>>>>> linear map for the entire region, and thus there is no fragility here of
>>>>>>> the kind we were discussing? I may be forgetting something; this linear
>>>>>>> map stuff is confusing enough already.
>>>>>>
>>>>>> It still may fail due to page table allocation failure when doing the
>>>>>> split. But it is still fine. We may run into 3 cases:
>>>>>>
>>>>>> 1. set_memory_rox() succeeds in splitting the whole range, then
>>>>>>    set_direct_map_invalid_noflush() will succeed too.
>>>>>> 2. set_memory_rox() fails to split, for example it only changes the
>>>>>>    permission of a partial range due to page table allocation failure,
>>>>>>    then set_direct_map_invalid_noflush() may:
>>>>>>    a. successfully change the permission back to default up to the point
>>>>>>       where set_memory_rox() failed, since that range has been
>>>>>>       successfully split. That is ok since the remaining range was never
>>>>>>       changed to ro by set_memory_rox() at all.
>>>>>>    b. successfully change the permission back to default for the whole
>>>>>>       range (for example, the memory pressure is mitigated by the time
>>>>>>       set_direct_map_invalid_noflush() is called). That is definitely
>>>>>>       fine as well.
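The caller pattern these cases assume is roughly the following (a simplified,
hypothetical example; error handling omitted):

        void *p = vmalloc(size);
        int ret;

        /* have vfree() reset the direct map for the whole area */
        set_vm_flush_reset_perms(p);

        /* may fail partway if splitting the linear-map alias needs memory */
        ret = set_memory_rox((unsigned long)p, size >> PAGE_SHIFT);

        /* ... */

        vfree(p);       /* cases 1, 2a and 2b above describe what happens here */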
>>>>>
>>>>> Correct. What I mean to imply here is that your patch may break this: if
>>>>> set_memory_* is applied from x till y, your patch changes the linear map
>>>>> alias only from x till y, whereas set_direct_map_invalid_noflush() instead
>>>>> operates on 0 till size - 1, where 0 <= x <= y <= size - 1. So it may
>>>>> encounter an -ENOMEM on the [0, x) range while invalidating, and that is
>>>>> *not* okay because we must reset [0, x) back to default?
>>>>
>>>> I see your point now. But I think the callers need to guarantee that they
>>>> call set_memory_rox() and set_direct_map_invalid_noflush() on the same
>>>> range, right? Currently the kernel just calls them on the whole area.
>>>
>>> Nope. The fact that the kernel changes protections, and undoes those
>>> changes, on the *entire* linear map alias of the vm_struct region is what
>>> protects us from the fragility we were talking about earlier.
>>
>> This is what I meant "kernel just calls them on the whole area".
>>
>>>
>>> Suppose you have a range from 0 till size - 1, and you call set_memory_* on
>>> a random point (page) p. The argument we discussed above is independent of
>>> p, which lets us drop our previous erroneous conclusion that all of this
>>> works because no caller does a partial set_memory_*.
>>
>> Sorry, I don't follow you. What "erroneous conclusion" do you mean? You can
>> call set_memory_* on a random point, but set_direct_map_invalid_noflush()
>> should then be called on that random point too. The current code of
>> set_area_direct_map() doesn't consider this case because there is no such
>> call. Is this what you meant?
>
>
> I was referring to the discussion in the linear map work - I think we
> had concluded that we don't need to worry about the BBM style
> invalidation failing, *because* no one does a partial set_memory_*.
Yes, we don't have to worry about it.
>
> What I am saying is that we don't care whether the caller does a partial or
> a full set_memory_*; we are still safe, because the linear map alias change
> on both sides (set_memory_* -> __change_memory_common(), and
> vm_reset_perms() -> set_area_direct_map()) operates on the entire region.
Yes, this is the current behavior. My patch changes change_memory_common() to
just update permissions for the range requested by the callers instead of
assuming it should change the entire region, although no one calls
set_memory_* on a partial range today. Shall set_area_direct_map() be aware
of a potential partial range change from set_memory_*()? Maybe. But it is
just called from vfree(), which frees the entire region.
What happens if someone does something crazy, for example calls set_memory_*
on a partial range, then calls vfree()? IIUC, it is fine as well. It is still
covered by the 3 cases that I mentioned in the previous email, if I'm not
missing anything, right?
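For illustration, the kind of sequence I mean (a hypothetical, simplified
example; the sizes are made up):

        /* 8-page area; only the first 2 pages are made read-only */
        void *p = __vmalloc(8 * PAGE_SIZE, GFP_KERNEL);

        set_vm_flush_reset_perms(p);
        set_memory_ro((unsigned long)p, 2);     /* partial range */

        /*
         * vfree() still resets the direct map for the whole 8-page area:
         * the aliases of the first 2 pages were already split when they
         * were made ro, and the remaining 6 pages are still RW, so even
         * if the invalid/default pass cannot split them, their
         * permissions already match the default.
         */
        vfree(p);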
Thanks,
Yang
>
>
>>
>>>
>>>
>>> I would like to send a patch clearly documenting this behaviour,
>>> assuming no one else finds a hole in this reasoning.
>>
>> Proper comment to explain the subtle behavior is definitely welcome.
>>
>> Thanks,
>> Yang
>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Hopefully I don't miss anything.
>>>>>>
>>>>>> Thanks,
>>>>>> Yang
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Will
>>>>>>>>
>>>>>>
>>>>
>>