[PATCH 3/4] arm64: mm: support large block mapping when rodata=full

Wed Aug 6 17:44:16 PDT 2025

On 8/6/25 12:20 AM, Ryan Roberts wrote:
> On 05/08/2025 19:53, Yang Shi wrote:
>
> [...]
>
>>>> +    arch_enter_lazy_mmu_mode();
>>>> +    ret = split_pgd(pgd_offset_k(start), start, end);
>>> My instinct still remains that it would be better not to iterate over the range
>>> here, but instead call a "split(start); split(end);" since we just want to split
>>> the start and end. So the code would be simpler and probably more performant if
>>> we get rid of all the iteration.
>> It should be more performant for splitting large range, especially the range
>> includes leaf mappings at different levels. But I had some optimization to skip
>> leaf mappings in this version, so it should be close to your implementation from
>> performance perspective. And it just walks the page table once instead of twice.
>> It should be more efficient for small split, for example, 4K.
> I guess this is the crux of our disagreement. I think the "walks the table once
> for 4K" is a micro optimization, which I doubt we would see on any benchmark
> results. In the absence of data, I'd prefer the simpler, smaller, easier to
> understand version.

I ran a simple benchmark using the module stressor from stress-ng, with
the following command line:
stress-ng --module 1 --module-name loop --module-ops 1000

It basically loads the loop module 1000 times. With your implementation
I saw a slight slowdown (2% - 3%, average time over 5 iterations) on my
AmpereOne machine. Per this data, it shouldn't result in any noticeable
slowdown for real-life workloads.
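
For anyone who wants to repeat the comparison, here is a minimal sketch of
one way to take the average. It is an assumption, not the exact procedure
behind the numbers above: the helper names mean_runtime and slowdown_pct
are hypothetical, it measures wall-clock time rather than anything
stress-ng reports itself, and it relies on GNU date for the %N format.

```sh
#!/bin/sh
# Sketch only: helper names and the wall-clock method are assumptions.

# mean_runtime RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints the mean elapsed time in seconds.
mean_runtime() {
    runs=$1
    shift
    total_ns=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)          # nanoseconds since epoch (GNU date)
        "$@" >/dev/null 2>&1         # output and exit status are ignored
        end=$(date +%s%N)
        total_ns=$((total_ns + end - start))
        i=$((i + 1))
    done
    awk -v t="$total_ns" -v n="$runs" 'BEGIN { printf "%.3f\n", t / n / 1e9 }'
}

# slowdown_pct BASE NEW
# Prints how much slower NEW is than BASE, as a percentage of BASE.
slowdown_pct() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.2f\n", (n - b) / b * 100 }'
}
```

Booting each kernel in turn, one would run
"mean_runtime 5 stress-ng --module 1 --module-name loop --module-ops 1000"
as root, then pass the two resulting means to slowdown_pct.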

Thanks,
Yang

>
> Both implementations are on list now; perhaps the maintainers can steer us.
>
> Thanks,
> Ryan