[PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp

Fri Sep 19 03:53:05 PDT 2025

On 2025/9/19 16:14, Lance Yang wrote:
> 
> 
> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>> I think where possible we really only want to identify problematic
>>>> (tagged) pages and skip them. And we should either look into fixing KSM
>>>> as well or finding out why KSM is not affected.
>>>
>>> Yeah. Seems like we could introduce a new helper,
>>> folio_test_mte_tagged(struct folio *folio). By default, it would return
>>> false, and architectures like arm64 can override it.
>>
>> If we add a new helper it should instead express the semantics that we 
>> cannot deduplicate.
> 
> Agreed.
> 
>>
>> For THP, I recall that only some pages might be tagged. So likely we 
>> want to check per page.
> 
> Yes, a per-page check would be simpler.
> 
>>
>>>
>>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
>>
>> I think it's supported for THP per page. Only for hugetlb we tag the 
>> whole thing through the head page instead of individual pages.
> 
> Right. That's exactly what I meant.
> 
>>
>>> The MTE
>>> status actually comes from the VM_MTE flag in the VMA that maps it.
>>>
>>
>> During the rmap walk we could check the VMA flag, but there would be 
>> no way to just stop the THP shrinker scanning this page early.
>>
>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>> {
>>>     bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>
>>>     VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>
>>>     /*
>>>      * If the folio is tagged, ensure ordering with a likely subsequent
>>>      * read of the tags.
>>>      */
>>>     if (ret)
>>>         smp_rmb();
>>>     return ret;
>>> }
>>>
>>> static inline bool page_mte_tagged(struct page *page)
>>> {
>>>     bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>
>>>     VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>
>>>     /*
>>>      * If the page is tagged, ensure ordering with a likely subsequent
>>>      * read of the tags.
>>>      */
>>>     if (ret)
>>>         smp_rmb();
>>>     return ret;
>>> }
>>>
>>> contpte_set_ptes()
>>>     __set_ptes()
>>>         __set_ptes_anysz()
>>>             __sync_cache_and_tags()
>>>                 mte_sync_tags()
>>>                     set_page_mte_tagged()
>>>
>>> Then we could have the THP shrinker skip any folios that are identified
>>> as MTE-tagged.
>>
>> Likely we should just do something like (maybe we want better naming)
>>
>> #ifndef page_is_mergable
>> #define page_is_mergable(page) (true)
>> #endif
> 
> 
> Maybe something like page_is_optimizable()? Just a thought ;p
> 
>>
>> And for arm64 have it be
>>
>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>
>>
>> And then do
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 1f0813b956436..1cac9093918d6 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>
>>          for (i = 0; i < folio_nr_pages(folio); i++) {
>>                  kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>> +               if (page_is_mergable(folio_page(folio, i)) &&
>> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>                          num_zero_pages++;
>>                          if (num_zero_pages > khugepaged_max_ptes_none) {
>>                                  kunmap_local(kaddr);
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 946253c398072..476a9a9091bd3 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>
>>          if (PageCompound(page))
>>                  return false;
>> +       if (!page_is_mergable(page))
>> +               return false;
>>          VM_BUG_ON_PAGE(!PageAnon(page), page);
>>          VM_BUG_ON_PAGE(!PageLocked(page), page);
>>          VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
> 
> Looks good to me!
> 
>>
>>
>> For KSM, similarly just bail out early. But still wondering if this is
>> already checked somehow for KSM.
> 
> +1 I'm looking for a machine to test it on.

Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
activity for those pages ...
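
For reference, the per-page check proposed above can be modeled in userspace
roughly as follows. This is only a sketch, not kernel code: everything
prefixed with sketch_ is a hypothetical stand-in (struct sketch_page models
nothing but the tag bit and the page contents, and sketch_page_is_zero()
stands in for the !memchr_inv() test). Only the page_is_mergable() name and
the counting logic are taken from the diff above, and the name itself may
still change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page: only the tag bit and the data. */
struct sketch_page {
	bool mte_tagged;
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Hypothetical stand-in for arm64's page_mte_tagged(). */
static bool sketch_page_mte_tagged(const struct sketch_page *page)
{
	return page->mte_tagged;
}

/* "Architecture" override: a tagged page must not be deduplicated. */
#define page_is_mergable(page) (!sketch_page_mte_tagged(page))

/* Generic fallback, skipped here because the override is already defined. */
#ifndef page_is_mergable
#define page_is_mergable(page) (true)
#endif

/* Userspace replacement for the !memchr_inv(kaddr, 0, PAGE_SIZE) test. */
static bool sketch_page_is_zero(const struct sketch_page *page)
{
	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
		if (page->data[i])
			return false;
	return true;
}

/*
 * Same shape as the loop in thp_underused(): count zero-filled subpages,
 * but only those that may be merged, and report "underused" once the
 * count exceeds max_none (khugepaged_max_ptes_none in the kernel).
 */
static bool sketch_thp_underused(const struct sketch_page *pages, size_t nr,
				 size_t max_none)
{
	size_t num_zero_pages = 0;

	assert(pages != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (page_is_mergable(&pages[i]) &&
		    sketch_page_is_zero(&pages[i])) {
			num_zero_pages++;
			if (num_zero_pages > max_none)
				return true;
		}
	}
	return false;
}
```

With this shape a tagged subpage is simply never counted as a zero page, so
a folio made up mostly of tagged pages is not reported as underused, while
its untagged zero-filled subpages still count as before.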