[PATCH] hugetlbfs: add MTE support

Yang Shi yang at os.amperecomputing.com
Tue Jul 9 10:42:58 PDT 2024



On 7/4/24 6:44 AM, Catalin Marinas wrote:
> On Tue, Jul 02, 2024 at 05:15:12PM -0700, Yang Shi wrote:
>> On 7/2/24 5:04 PM, Yang Shi wrote:
>>> On 7/2/24 5:34 AM, Catalin Marinas wrote:
>>>> Last time I checked, about a year ago, this was not sufficient. One
>>>> issue is that there's no arch_clear_hugetlb_flags() implemented by your
>>>> patch, leaving PG_arch_{2,3} set on a page. The other issue was that I
>>>> initially tried to do this only on the head page but this did not go
>>>> well with the folio_copy() -> copy_highpage() which expects the
>>>> PG_arch_* flags on each individual page. The alternative was for
>>>> arch_clear_hugetlb_flags() to iterate over all the pages in a folio.
>>> Thanks for pointing this out. I did miss this point. I took a quick look
>>> at when the PG_ flags are set. IIUC, it is set by post_alloc_hook() for
>>> order-0 anonymous folio (clearing page and tags) and set_ptes() for
>>> others (just clear tags), for example, THP and hugetlb.
>>>
>>> I can see THP does set the PG_mte_tagged flag for each sub pages. But it
>>> seems it does not do it for hugetlb if I read the code correctly. The
>>> call path is:
>>>
>>> hugetlb_fault() ->
>>>    hugetlb_no_page->
>>>      set_huge_pte_at ->
>>>        __set_ptes() ->
>>>          __sync_cache_and_tags() ->
>>>
>>>
>>> The __set_ptes() is called in a loop:
>>>
>>> if (!pte_present(pte)) {
>>>          for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>>              __set_ptes(mm, addr, ptep, pte, 1);
>>>          return;
>>>      }
>>>
>>> The ncontig and pgsize are returned by num_contig_ptes(). For example,
>>> 2M hugetlb, ncontig is 1 and pgsize is 2M IIUC. So it means actually
>>> just the head page has PG_mte_tagged set. If so the copy_highpage() will
>>> just copy the old head page's flag to the new head page, and the tag.
>>> All the sub pages don't have PG_mte_tagged set.
>>>
>>> Is it expected behavior? I'm supposed we need tags for every sub pages
>>> too, right?
>> We should need something like the below to have tags and page flag set up
>> for each sub page:
>>
>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> index 3f09ac73cce3..528164deef27 100644
>> --- a/arch/arm64/mm/hugetlbpage.c
>> +++ b/arch/arm64/mm/hugetlbpage.c
>> @@ -228,9 +228,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned
>> long addr,
>>          int ncontig;
>>          unsigned long pfn, dpfn;
>>          pgprot_t hugeprot;
>> +       unsigned long nr = sz >> PAGE_SHIFT;
>>
>>          ncontig = num_contig_ptes(sz, &pgsize);
>>
>> +       __sync_cache_and_tags(pte, nr);
>> +
>>          if (!pte_present(pte)) {
>>                  for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>                          __set_ptes(mm, addr, ptep, pte, 1);
> We only need this for the last loop when we have a present pte. I'd also
> avoid calling __set_ptes(...., 1) if we call the __sync_cache_and_tags()
> here already. Basically just unroll __set_ptes() in set_huge_pte_at()
> and call __set_pte() directly.
>
> It might be better to convert those page flag checks to only happen on
> the head page. My stashed changes from over a year ago (before we had
> more folio conversions) below. However, as I mentioned, I got stuck on
> folio_copy() which also does a cond_resched() between copy_highpage().

We can have the page flags set for head only for hugetlb page. For 
copy_highpage(), we should be able to do something like the below:

if  page_is_head && page_is_hugetlb && page_has_mte_tagged
     set page_mte_tagged flags
     copy tags for all sub pages
else // <-- tail page or non-hugetlb page
     current copy_highpage implementation


The hugetlb folio can't go away under us since migration path should pin 
it so the status of folio is stable. The preemption caused by 
cond_resched() should be fine too due to the pin and the page table 
entry keeps being migration entry until migration is done, so every one 
should just see migration entry and wait for migration is done.

The other concerned user of copy_highpage() is uprobe, but it also pins 
the page then doing copy and it is called with holding write mmap_lock.

IIUC, it should work if I don't miss something. This also should have no 
impact on HVO. The overhead for other users of copy_highpage() should be 
also acceptable.

>
> ---------8<--------------------------------
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index f5bcb0dc6267..a84ad0e46f12 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -49,7 +49,9 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
>   		return;
>   
>   	if (try_page_mte_tagging(page)) {
> -		mte_clear_page_tags(page_address(page));
> +		unsigned long i, nr_pages = compound_nr(page);
> +		for (i = 0; i < nr_pages; i++)
> +			mte_clear_page_tags(page_address(page + i));
>   		set_page_mte_tagged(page);
>   	}
>   }
> @@ -57,22 +59,23 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
>   void mte_sync_tags(pte_t old_pte, pte_t pte)
>   {
>   	struct page *page = pte_page(pte);
> -	long i, nr_pages = compound_nr(page);
> -	bool check_swap = nr_pages == 1;
> +	bool check_swap = !PageCompound(page);
>   	bool pte_is_tagged = pte_tagged(pte);
>   
>   	/* Early out if there's nothing to do */
>   	if (!check_swap && !pte_is_tagged)
>   		return;
>   
> +	/*
> +	 * HugeTLB pages are always fully mapped, so only setting head page's
> +	 * PG_mte_* flags is enough.
> +	 */
> +	if (PageHuge(page))
> +		page = compound_head(page);
> +
>   	/* if PG_mte_tagged is set, tags have already been initialised */
> -	for (i = 0; i < nr_pages; i++, page++) {
> -		if (!page_mte_tagged(page)) {
> -			mte_sync_page_tags(page, old_pte, check_swap,
> -					   pte_is_tagged);
> -			set_page_mte_tagged(page);
> -		}
> -	}
> +	if (!page_mte_tagged(page))
> +		mte_sync_page_tags(page, old_pte, check_swap, pte_is_tagged);
>   
>   	/* ensure the tags are visible before the PTE is set */
>   	smp_wmb();
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 5626ddb540ce..692198d8c0d2 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -1079,7 +1079,7 @@ long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>   
>   			/* uaccess failed, don't leave stale tags */
>   			if (num_tags != MTE_GRANULES_PER_PAGE)
> -				mte_clear_page_tags(page);
> +				mte_clear_page_tags(page_address(page));
>   			set_page_mte_tagged(page);
>   
>   			kvm_release_pfn_dirty(pfn);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 31d7fa4c7c14..b531b1d75cda 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1173,11 +1173,10 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>   	if (!kvm_has_mte(kvm))
>   		return;
>   
> -	for (i = 0; i < nr_pages; i++, page++) {
> -		if (try_page_mte_tagging(page)) {
> -			mte_clear_page_tags(page_address(page));
> -			set_page_mte_tagged(page);
> -		}
> +	if (try_page_mte_tagging(page)) {
> +		for (i = 0; i < nr_pages; i++)
> +			mte_clear_page_tags(page_address(page + i));
> +		set_page_mte_tagged(page);
>   	}
>   }
>   
>




More information about the linux-arm-kernel mailing list