[tlb] e242a269fa: WARNING:at_mm/mmu_gather.c:#tlb_gather_mmu

Mon Nov 23 12:51:08 EST 2020

Hmm, this is interesting but my x86-fu is a bit lacking:

On Sun, Nov 22, 2020 at 11:11:58PM +0800, kernel test robot wrote:
> commit: e242a269fa4b7aee0b157ce5b1d7d12179fc3c44 ("[PATCH 5/6] tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm()")
> url: https://github.com/0day-ci/linux/commits/Will-Deacon/tlb-Fix-access-and-soft-dirty-bit-management/20201120-223809
> base: https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git for-next/core

[...]

> [   14.182822] WARNING: CPU: 0 PID: 1 at mm/mmu_gather.c:293 tlb_gather_mmu+0x40/0x99

This fires because free_ldt_pgtables() initialises an mmu_gather() with
an end address > TASK_SIZE. In other words, this code:

	unsigned long start = LDT_BASE_ADDR;
	unsigned long end = LDT_END_ADDR;

	if (!boot_cpu_has(X86_FEATURE_PTI))
		return;

	tlb_gather_mmu(&tlb, mm, start, end);

seems to be passing kernel addresses to tlb_gather_mmu(), which will cause
the range adjusment logic in __tlb_adjust_range() to round the base down
to TASK_SIZE afaict. At which point, I suspect the low-level invalidation
routine replaces the enormous range with a fullmm flush (see the check in
flush_tlb_mm_range()).

If that's the case (and I would appreciate some input from somebody who
knows what an LDT is), then I think the right answer is to replace this with
a call to tlb_gather_mmu_fullmm, although I haven't ever anticipated these
things working on kernel addresses and whether that would do the right kind
of invalidation for x86 w/ PTI. A quick read of the code suggests it should
work out...

Will