[PATCH] arm64: tlb: Fix TLBI RANGE operand
Gavin Shan
gshan at redhat.com
Wed Apr 3 04:37:30 PDT 2024
On 4/3/24 18:58, Marc Zyngier wrote:
> On Wed, 03 Apr 2024 07:49:29 +0100,
> Gavin Shan <gshan at redhat.com> wrote:
>>
>> KVM/arm64 relies on TLBI RANGE feature to flush TLBs when the dirty
>> bitmap is collected by VMM and the corresponding PTEs need to be
>> write-protected again. Unfortunately, the operand passed to the TLBI
>> RANGE instruction isn't correctly sorted out by commit d1d3aa98b1d4
>> ("arm64: tlb: Use the TLBI RANGE feature in arm64"). It leads to
>> crash on the destination VM after live migration because some of the
>> dirty pages are missed.
>>
>> For example, I have a VM where 8GB memory is assigned, starting from
>> 0x40000000 (1GB). Note that the host has 4KB as the base page size.
>> All TLBs for VM can be covered by one TLBI RANGE operation. However,
>> I received 0xffff708000040000 as the operand, which is wrong; the
>> correct one should be 0x00007f8000040000. From the wrong operand, we
>> have 3 and 1 for SCALE (bits[45:44]) and NUM (bits[43:39]), so only
>> 1GB instead of 8GB of memory is covered.
>>
>> Fix the macro __TLBI_RANGE_NUM() so that the correct NUM and TLBI
>> RANGE operand are provided.
>>
>> Fixes: d1d3aa98b1d4 ("arm64: tlb: Use the TLBI RANGE feature in arm64")
>> Cc: stable at kernel.org # v5.10+
>> Reported-by: Yihuang Yu <yihyu at redhat.com>
>> Signed-off-by: Gavin Shan <gshan at redhat.com>
>> ---
>> arch/arm64/include/asm/tlbflush.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index 3b0e8248e1a4..07c4fb4b82b4 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -166,7 +166,7 @@ static inline unsigned long get_trans_granule(void)
>> */
>> #define TLBI_RANGE_MASK GENMASK_ULL(4, 0)
>> #define __TLBI_RANGE_NUM(pages, scale) \
>> - ((((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
>> + ((((pages) >> (5 * (scale) + 1)) - 1) & TLBI_RANGE_MASK)
>>
>> /*
>> * TLB Invalidation
>
> This looks pretty wrong, by the very definition of the comment that's
> just above:
>
> <quote>
> /*
> * Generate 'num' values from -1 to 30 with -1 rejected by the
> * __flush_tlb_range() loop below.
> */
> </quote>
>
> With your change, num can't ever be negative, and that breaks
> __flush_tlb_range_op():
>
> <quote>
> num = __TLBI_RANGE_NUM(pages, scale); \
> if (num >= 0) { \
> addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> scale, num, tlb_level); \
> __tlbi(r##op, addr); \
> if (tlbi_user) \
> __tlbi_user(r##op, addr); \
> start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
> pages -= __TLBI_RANGE_PAGES(num, scale); \
> } \
> scale--; \
> </quote>
>
> We'll then shove whatever value we've found in the TLBI operation,
> leading to unknown results instead of properly adjusting the scale to
> issue a smaller invalidation.
>
Marc, thanks for your review and comments.
Indeed, this patch is incomplete at the least. I think we need __TLBI_RANGE_NUM()
to return values in the range [-1, 31] instead of [-1, 30], to be consistent with
MAX_TLBI_RANGE_PAGES. -1 will still be rejected by the following loop. I'm not
100% sure I've got the calculation right, though.
/*
 * Generate 'num' values in the range [-1, 31], with -1 rejected
 * by the __flush_tlb_range() loop below.
 */
#define __TLBI_RANGE_NUM(pages, scale)                                  \
        ({                                                              \
                int __next = (pages) & (1ULL << (5 * (scale) + 6));     \
                int __mask = ((pages) >> (5 * (scale) + 1)) &           \
                             TLBI_RANGE_MASK;                           \
                int __num = (((pages) >> (5 * (scale) + 1)) - 1) &      \
                            TLBI_RANGE_MASK;                            \
                (__next || __mask) ? __num : -1;                        \
        })
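
Something along these lines can be used to sanity-check the macro from
user space (just a sketch: GENMASK_ULL(4, 0) is expanded by hand, and the
statement expression needs GNU C):

#include <stdio.h>

#define TLBI_RANGE_MASK         0x1fULL /* GENMASK_ULL(4, 0) */
#define __TLBI_RANGE_NUM(pages, scale)                                  \
        ({                                                              \
                int __next = (pages) & (1ULL << (5 * (scale) + 6));     \
                int __mask = ((pages) >> (5 * (scale) + 1)) &           \
                             TLBI_RANGE_MASK;                           \
                int __num = (((pages) >> (5 * (scale) + 1)) - 1) &      \
                            TLBI_RANGE_MASK;                            \
                (__next || __mask) ? __num : -1;                        \
        })

int main(void)
{
        unsigned long long pages = 0x200000;    /* 8GB with 4KB base pages */
        int scale;

        /* Expect num = 31 at scale = 3, so one operation covers it all */
        for (scale = 3; scale >= 0; scale--)
                printf("scale=%d num=%d\n", scale,
                       __TLBI_RANGE_NUM(pages, scale));
        return 0;
}

It should print num=31 for scale 3 and -1 for the lower scales, which is
exactly what the loop needs to cover the whole 8GB in one go.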
Alternatively, we can limit the number of pages to be invalidated in
arch/arm64/kvm/hyp/pgtable.c::kvm_tlb_flush_vmid_range(), because the maximum
capacity is (MAX_TLBI_RANGE_PAGES - 1) rather than MAX_TLBI_RANGE_PAGES, as
the comment for __flush_tlb_range_nosync() says.
- inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
+ inval_pages = min(pages, MAX_TLBI_RANGE_PAGES - 1);
static inline void __flush_tlb_range_nosync(...)
{
        :
        /*
         * When not using TLB range ops, we can handle up to
         * (MAX_DVM_OPS - 1) pages;
         * when using TLB range ops, we can handle up to
         * (MAX_TLBI_RANGE_PAGES - 1) pages.
         */
        if ((!system_supports_tlb_range() &&
             (end - start) >= (MAX_DVM_OPS * stride)) ||
            pages >= MAX_TLBI_RANGE_PAGES) {
                flush_tlb_mm(vma->vm_mm);
                return;
        }
}
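
To put numbers on that: MAX_TLBI_RANGE_PAGES corresponds to SCALE#3/NUM#31
and is exactly the 8GB case from the commit message. A quick user-space
check of the arithmetic (the two macros are transcribed from tlbflush.h,
widened to unsigned long long for portability):

#include <stdio.h>

/* Transcribed from arch/arm64/include/asm/tlbflush.h */
#define __TLBI_RANGE_PAGES(num, scale)  \
        ((unsigned long long)((num) + 1) << (5 * (scale) + 1))
#define MAX_TLBI_RANGE_PAGES    __TLBI_RANGE_PAGES(31, 3)

int main(void)
{
        /* 0x200000 pages * 4KB = 8GB, i.e. the whole VM in one range op */
        printf("MAX_TLBI_RANGE_PAGES     = %#llx pages\n",
               MAX_TLBI_RANGE_PAGES);
        /* Clamping to one page less keeps the current macro's num
         * within [-1, 30] for every scale in [0, 3] */
        printf("MAX_TLBI_RANGE_PAGES - 1 = %#llx pages\n",
               MAX_TLBI_RANGE_PAGES - 1);
        return 0;
}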
Please let me know which way is better.
> I think the problem is that you are triggering NUM=31 and SCALE=3,
> which the current code cannot handle as per the comment above
> __flush_tlb_range_op() (we can't do NUM=30 and SCALE=4, obviously).
>
Yes, exactly.
> Can you try the untested patch below?
>
>
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 3b0e8248e1a4..b71a1cece802 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -379,10 +379,6 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> * 3. If there is 1 page remaining, flush it through non-range operations. Range
> * operations can only span an even number of pages. We save this for last to
> * ensure 64KB start alignment is maintained for the LPA2 case.
> - *
> - * Note that certain ranges can be represented by either num = 31 and
> - * scale or num = 0 and scale + 1. The loop below favours the latter
> - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> */
> #define __flush_tlb_range_op(op, start, pages, stride, \
> asid, tlb_level, tlbi_user, lpa2) \
> @@ -407,6 +403,7 @@ do { \
> \
> num = __TLBI_RANGE_NUM(pages, scale); \
> if (num >= 0) { \
> + num += 1; \
> addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
> scale, num, tlb_level); \
> __tlbi(r##op, addr); \
>
Thanks, but I don't think it's going to work. The loop runs infinitely:
when @pages is 0x200000, the condition 'if (num >= 0)' can't be met for any
@scale in 3/2/1/0, so @scale keeps decreasing and leaves the valid range
[0, 3] without @pages ever shrinking. I ported the chunk of code to user
space and can see this with added printf() messages.
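
In case it helps, this is roughly what my user-space port boils down to (a
simplified sketch of the __flush_tlb_range_op() loop with your 'num += 1'
change applied; I stop once @scale goes out of range instead of letting the
shift count go negative):

#include <stdio.h>

#define TLBI_RANGE_MASK         0x1fULL /* GENMASK_ULL(4, 0) */
/* The current (unpatched) macro: num stays in [-1, 30] */
#define __TLBI_RANGE_NUM(pages, scale)                                  \
        ((int)(((pages) >> (5 * (scale) + 1)) & TLBI_RANGE_MASK) - 1)
#define __TLBI_RANGE_PAGES(num, scale)                                  \
        ((unsigned long long)((num) + 1) << (5 * (scale) + 1))

int main(void)
{
        unsigned long long pages = 0x200000;    /* 8GB with 4KB base pages */
        int scale = 3;
        int iter;

        for (iter = 0; iter < 8 && pages > 0; iter++) {
                int num = __TLBI_RANGE_NUM(pages, scale);

                printf("iter=%d scale=%d num=%d pages=%#llx\n",
                       iter, scale, num, pages);
                if (num >= 0) {
                        num += 1;       /* your proposed adjustment */
                        pages -= __TLBI_RANGE_PAGES(num, scale);
                }
                if (--scale < 0)
                        break;          /* @scale leaves [0, 3] */
        }

        if (pages > 0)
                printf("%#llx pages left, the real loop would spin\n", pages);
        return 0;
}

All four iterations print num=-1 and @pages never shrinks, so the real
loop would spin forever.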
Thanks,
Gavin