[PATCH v6 0/9] variable-order, large folios for anonymous memory

Yang Shi shy828301 at gmail.com
Tue Oct 31 11:29:34 PDT 2023


On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts <ryan.roberts at arm.com> wrote:
>
> On 31/10/2023 11:50, Ryan Roberts wrote:
> > On 06/10/2023 21:06, David Hildenbrand wrote:
> > [...]
> >>
> >> Change 2: sysfs interface.
> >>
> >> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
> >> agree.
> >>
> >> What we expose there and how, is TBD. Again, not a friend of "orders" and
> >> bitmaps at all. We can do better if we want to go down that path.
> >>
> >> Maybe we should take a look at hugetlb, and how they added support for multiple
> >> sizes. What *might* make sense could be (depending on which values we actually
> >> support!)
> >>
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
> >>
> >> Each one would contain an "enabled" and "defrag" file. We want something minimal
> >> first? Start with the "enabled" option.
> >>
> >>
> >> enabled: always [global] madvise never
> >>
> >> Initially, we would set it for PMD-sized THP to "global" and for everything else
> >> to "never".
> >
> > Hi David,
> >
> > I've just started coding this, and it occurs to me that I might need a small
> > clarification here; the existing global "enabled" control is used to drive
> > decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> > proposed new per-size "enabled" is implicitly only controlling anon memory (for
> > now).
> >
> > 1) Is this potentially confusing for the user? Should we rename the per-size
> > controls to "anon_enabled"? Or is it preferable to jsut keep it vague for now so
> > we can reuse the same control for file-backed memory in future?
> >
> > 2) The global control will continue to drive the file-backed memory decision
> > (for now), even when hugepages-2048kB/enabled != "global"; agreed?
> >
> > Thanks,
> > Ryan
> >
>
> Also, an implementation question:
>
> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
> (although it does honour MADV_NOHUGEPAGE and the prctl); It will return true
> regardless. Is that by design? It couldn't fathom any reasoning from the commit log:

The enabled="never" is for anonymous VMAs, DAX VMAs are typically file VMAs.

>
> bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                         bool smaps, bool in_pf, bool enforce_sysfs)
> {
>         if (!vma->vm_mm)                /* vdso */
>                 return false;
>
>         /*
>          * Explicitly disabled through madvise or prctl, or some
>          * architectures may disable THP for some mappings, for
>          * example, s390 kvm.
>          * */
>         if ((vm_flags & VM_NOHUGEPAGE) ||
>             test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>                 return false;
>         /*
>          * If the hardware/firmware marked hugepage support disabled.
>          */
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
>                 return false;
>
>         /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
>         if (vma_is_dax(vma))
>                 return in_pf;  <<<<<<<<
>
>         ...
> }
>
>



More information about the linux-arm-kernel mailing list