[PATCH v6 2/2] mm: use mapping_max_folio_order() for force_thp_readahead order
Pedro Falcato
pfalcato at suse.de
Wed Jun 3 04:51:21 PDT 2026
On Wed, Jun 03, 2026 at 11:10:45AM +0100, Usama Arif wrote:
>
>
> On 02/06/2026 18:35, Pedro Falcato wrote:
> > On Sat, May 30, 2026 at 05:16:29PM +0200, Jan Kara wrote:
> >> On Fri 29-05-26 15:11:54, Usama Arif wrote:
> >>> On 29/05/2026 14:40, Pedro Falcato wrote:
> >>>> On Fri, May 29, 2026 at 01:19:03PM +0100, Usama Arif wrote:
> >>>>>
> >>>>> which means mapping_max_folio_order(mapping) <= MAX_PAGECACHE_ORDER <= HPAGE_PMD_ORDER is always
> >>>>> true, and you don't need the min3(..) in your diff.
> >>>>>
> >>>>> Now the question is, then, why not just do:
> >>>>>
> >>>>> 	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && (vm_flags & VM_HUGEPAGE)) {
> >>>>> 		if (mapping_large_folio_support(mapping)) {
> >>>>> 			force_thp_readahead = true;
> >>>>> 			thp_order = min_t(unsigned int,
> >>>>> 					  mapping_max_folio_order(mapping),
> >>>>> 					  get_order(SZ_2M));
> >>>>> 		}
> >>>>> 	}
> >>>>>
> >>>>>
> >>>>> This is because this will regress the 16K ARM case where we already got 32M
> >>>>> folios. Someone might upgrade the kernel and start getting 2M folios now.
> >>>>
> >>>> So maybe limit to 32MB? It's still arbitrary but at least you get simpler
> >>>> logic. If the architecture does not support 32MiB folios, it will clamp
> >>>> the maximum folio order to HPAGE_PMD_ORDER, and you get the same result.
> >>>>
> >>>> Does this sound correct?
> >>>>
> >>>
> >>> Yes, so if we replace it with SZ_32M, it sounds correct. I just think
> >>> the 32M size is too large. But as you pointed out, even 2M can be too large...
> >>
> >> So AFAIU the practical discussion is about two options:
> >>
> >> 1) limiting at 2MB with a slightly more complicated logic to keep mapping at
> >> PMD order for 16k pagesize on ARM but use 2MB pages for 64k pagesize on ARM
> >>
> >> or
> >>
> >> 2) limit at 32MB with simple logic which results in larger (32MB) folios
> >> with 16k and 64k pagesize on ARM and thus larger memory overhead.
> >>
> >> I'd like to maybe offer option 3): limit at 2MB with simple logic. This
> >> will reduce folio size on 16k pagesize ARM compared to 1) but do we really
> >> care? I.e., is there big enough practical performance impact with contpte
> >> and other tricks ARM is playing?
> >>
> >
> > arm64 16K contpte tops out at 256KB TLB entries. It's quite a lot smaller than
> > a PMD entry. Also, something that was discussed at LSFMM was its effectiveness.
> > Apparently, most of the gains seem to sit on actually having a larger page size
> > (perhaps Dev/Ryan can comment; sadly the slides were not posted anywhere on
> > the ML, so I don't have numbers).
> >
> > To me, the question is quite clear: do we trust users that say "please give me
> > hugepages" enough to unconditionally give them hugepages? I would assume the
> > answer lies somewhere between "yes" and "no", but 32MB I would say is not
> > particularly excessive. 512MB is... much worse.
> >
>
> I think the other question also is, if the userspace asks for hugepages, is it asking
> for the biggest possible one? I think the answer is yes on 4K base page size when
> largest is 2M, but maybe not the case for 16K and 64K.
Yep, fully agree, the interface itself is limited. Though perhaps userspace
itself would not know...
>
> /sys/kernel/mm/transparent_hugepage/hugepages-* is supposed to be used
> for anon only, but maybe in the future we could use that to determine the size
> of THP to give to the user for file mappings here? E.g. we could have used it
> to determine the biggest size that has madvise (or always) set and used that
> here. It's probably a much bigger discussion.
Hmm, I'm not a huge fan of those toggles (that not many people know how to
toggle), maybe using those would be a mistake. Do note that shmem already has
its own set of confusing toggles, which work similarly to these anon ones.
But, yes, it's a much bigger discussion :)
--
Pedro