[v3 PATCH 1/2] hugetlb: arm64: add mte support

Mon Sep 9 10:38:25 PDT 2024

On 9/9/24 1:43 AM, David Hildenbrand wrote:
> On 06.09.24 19:59, Yang Shi wrote:
>> Enable MTE support for hugetlb.
>>
>> The MTE page flags will be set on the folio only.  When copying
>> hugetlb folio (for example, CoW), the tags for all subpages will be 
>> copied
>> when copying the first subpage.
>>
>> When freeing hugetlb folio, the MTE flags will be cleared.
>>
>> Signed-off-by: Yang Shi <yang at os.amperecomputing.com>
>> ---
>>   arch/arm64/include/asm/hugetlb.h | 15 ++++++-
>>   arch/arm64/include/asm/mman.h    |  3 +-
>>   arch/arm64/include/asm/mte.h     | 67 ++++++++++++++++++++++++++++++++
>>   arch/arm64/kernel/hibernate.c    |  7 ++++
>>   arch/arm64/kernel/mte.c          | 25 +++++++++++-
>>   arch/arm64/kvm/guest.c           | 16 ++++++--
>>   arch/arm64/kvm/mmu.c             | 11 ++++++
>>   arch/arm64/mm/copypage.c         | 33 +++++++++++++---
>>   fs/hugetlbfs/inode.c             |  2 +-
>>   9 files changed, 166 insertions(+), 13 deletions(-)
>>
>> v3: * Fixed the build error when !CONFIG_ARM64_MTE.
>>      * Incorporated the comment from David to have hugetlb folio
>>        specific APIs for manipulating the page flags.
>>      * Don't assume the first page is the head page since huge page copy
>>        can start from any subpage.
>> v2: * Reimplemented the patch to fix the comments from Catalin.
>>      * Added test cases (patch #2) per Catalin.
>>
>> diff --git a/arch/arm64/include/asm/hugetlb.h 
>> b/arch/arm64/include/asm/hugetlb.h
>> index 293f880865e8..06f621c5cece 100644
>> --- a/arch/arm64/include/asm/hugetlb.h
>> +++ b/arch/arm64/include/asm/hugetlb.h
>> @@ -11,6 +11,7 @@
>>   #define __ASM_HUGETLB_H
>>     #include <asm/cacheflush.h>
>> +#include <asm/mte.h>
>>   #include <asm/page.h>
>>     #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
>> @@ -18,9 +19,21 @@
>>   extern bool arch_hugetlb_migration_supported(struct hstate *h);
>>   #endif
>>   +#ifdef CONFIG_ARM64_MTE
>> +#define CLEAR_FLAGS (BIT(PG_dcache_clean) | BIT(PG_mte_tagged) | \
>> +             BIT(PG_mte_lock))
>> +#else
>> +#define CLEAR_FLAGS BIT(PG_dcache_clean)
>> +#endif
>> +
>>   static inline void arch_clear_hugetlb_flags(struct folio *folio)
>>   {
>> -    clear_bit(PG_dcache_clean, &folio->flags);
>> +    if (!system_supports_mte()) {
>> +        clear_bit(PG_dcache_clean, &folio->flags);
>> +        return;
>> +    }
>> +
>> +    folio->flags &= ~CLEAR_FLAGS;
>
> In contrast to clear_bit, this is now not an atomic operation anymore. 
> Could we have concurrent modifications (locking the folio? mte?) where 
> we could mess up (IOW, is there a reason we don't do __clear_bit in 
> existing code)?

AFAICT, I don't see there is any concurrent modification to them. 
arch_clear_hugetlb_flags() is called when:
   * hugetlb folio is freed
   * new hugetlb folio is enqueued (not available for allocation yet)

And is is protected by hugetlb_lock, all hugetlb folio allocation is 
serialized by the same lock. PG_mte_* flags can't be set before hugetlb 
folio is allocated. So I don't think atomic operation is actually 
necessary. Not sure if I miss something or not.

But all arches use clear_bit(). As you suggested below, it may be safer 
start with clear_bit().

>
> Maybe start with:
>
> static inline void arch_clear_hugetlb_flags(struct folio *folio)
> {
>     clear_bit(PG_dcache_clean, &folio->flags);
> #ifdef CONFIG_ARM64_MTE
>     if (system_supports_mte()) {
>         clear_bit(PG_mte_tagged, &folio->flags);
>         clear_bit(PG_mte_lock, &folio->flags);
>     }
> #endif
> }
>
> And if you can argue that atomics are not required, convert all to 
> __clear_bit() and have the compiler optimize it for you.
>
>>   }
>>   #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags
>>   > diff --git a/arch/arm64/include/asm/mte.h 
>> b/arch/arm64/include/asm/mte.h
>> index 0f84518632b4..cec9fb6fec3b 100644
>> --- a/arch/arm64/include/asm/mte.h
>> +++ b/arch/arm64/include/asm/mte.h
>> @@ -41,6 +41,8 @@ void mte_free_tag_storage(char *storage);
>>     static inline void set_page_mte_tagged(struct page *page)
>>   {
>> +    VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>> +
>>       /*
>>        * Ensure that the tags written prior to this function are visible
>>        * before the page flags update.
>> @@ -51,6 +53,8 @@ static inline void set_page_mte_tagged(struct page 
>> *page)
>>     static inline bool page_mte_tagged(struct page *page)
>>   {
>> +    VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>> +
>>       bool ret = test_bit(PG_mte_tagged, &page->flags);
>>         /*
>> @@ -76,6 +80,8 @@ static inline bool page_mte_tagged(struct page *page)
>>    */
>>   static inline bool try_page_mte_tagging(struct page *page)
>>   {
>> +    VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>> +
>>       if (!test_and_set_bit(PG_mte_lock, &page->flags))
>>           return true;
>
> [...]
>
>> +static inline void set_folio_hugetlb_mte_tagged(struct folio *folio)
>> +{
>> +}
>> +
>> +static inline bool folio_hugetlb_mte_tagged(struct folio *folio)
>> +{
>> +    return false;
>> +}
>> +
>> +static inline bool try_folio_hugetlb_mte_tagging(struct folio *folio)
>> +{
>> +    return false;
>> +}
>
> I would suggest to stick to the format of our folio_test / folio_set 
> ... functions. Please refer to 
> folio_set_hugetlb_migratable/folio_set_hugetlb_hwpoison/ ...
>
> Something like:
>
> folio_test_hugetlb_mte_tagged
> folio_set_hugetlb_mte_tagged

I don't have strong opinion on these two. They are straightforward 
enough and the users can easily map them to the page_* variants.

>
> But the semantics of try_folio_hugetlb_mte_tagging() are a bit less 
> obvious. I would suggest
>
> folio_test_and_set_hugetlb_mte_lock()

This one is a little bit hard to map to try_page_mte_tagging() TBH.

>
>
> We should probably clean up the page_* variants separately.

Probably. The most confusing one is actually try_page_mte_tagging(). We 
should probably change it to test_and_set_page_mte_lock(). But anyway 
they are just used by arm64, so I don't worry too much and I'd like to 
hear from arm64 maintainers.

>
> But ARM maintainers can feel free to intervene here.
>
>> +#endif
>> +
>>   static inline void mte_disable_tco_entry(struct task_struct *task)
>>   {
>>       if (!system_supports_mte())
>> diff --git a/arch/arm64/kernel/hibernate.c 
>> b/arch/arm64/kernel/hibernate.c
>> index 02870beb271e..ebf81fffa79d 100644
>> --- a/arch/arm64/kernel/hibernate.c
>> +++ b/arch/arm64/kernel/hibernate.c
>> @@ -266,10 +266,17 @@ static int swsusp_mte_save_tags(void)
>>           max_zone_pfn = zone_end_pfn(zone);
>>           for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
>>               struct page *page = pfn_to_online_page(pfn);
>> +            struct folio *folio;
>>                 if (!page)
>>                   continue;
>
> Nit: I would drop this empty line.

OK

>
>> +            folio = page_folio(page);
>> +
>> +            if (folio_test_hugetlb(folio) &&
>> +                !folio_hugetlb_mte_tagged(folio))
>> +                continue;
>> +
>>               if (!page_mte_tagged(page))
>>                   continue;
>>   diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index 6174671be7c1..c8b13bf36fc6 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -38,7 +38,22 @@ EXPORT_SYMBOL_GPL(mte_async_or_asymm_mode);
>>   void mte_sync_tags(pte_t pte, unsigned int nr_pages)
>>   {
>>       struct page *page = pte_page(pte);
>> -    unsigned int i;
>> +    struct folio *folio = page_folio(page);
>> +    unsigned long i;
>> +
>> +    if (folio_test_hugetlb(folio)) {
>> +        unsigned long nr = folio_nr_pages(folio);
>
> Nit: empty line please.

OK

>
>> +        /* Hugetlb MTE flags are set for head page only */
>> +        if (try_folio_hugetlb_mte_tagging(folio)) {
>> +            for (i = 0; i < nr; i++, page++)
>> +                mte_clear_page_tags(page_address(page));
>> +            set_folio_hugetlb_mte_tagged(folio);
>> +        }
>> +
>> +        smp_wmb();
>
> We already do have one in set_folio_hugetlb_mte_tagged() [and 
> try_folio_hugetlb_mte_tagging() does some magic as well], do we really 
> need this smp_wmb()?
>
> In general, I think checkpatch will tell you to document memory 
> barriers and their counterparts thoroughly.

It is not used to guarantee the PG_mte_tagged flag is visible, it is 
used to guarantee the tags are visible before PTE is set. The 
non-hugetlb path has:

         /* ensure the tags are visible before the PTE is set */
         smp_wmb();

I should keep the comment for hugetlb path as well.

>
>> +
>> +        return;
>> +    }
>
>
>
>