[PATCH 0/3] kasan: Fix ordering between MTE tag colouring and page->flags

Catalin Marinas catalin.marinas at arm.com
Fri May 20 06:01:47 PDT 2022


On Thu, May 19, 2022 at 11:45:04PM +0200, Andrey Konovalov wrote:
> On Tue, May 17, 2022 at 8:09 PM Catalin Marinas <catalin.marinas at arm.com> wrote:
> > That's more of an RFC to get a discussion started. I plan to eventually
> > apply the third patch reverting the page_kasan_tag_reset() calls under
> > arch/arm64 since they don't cover all cases (the race is rare and we
> > haven't hit anything yet but it's possible).
> >
> > On a system with MTE and KASAN_HW_TAGS enabled, when a page is allocated
> > kasan_unpoison_pages() sets a random tag and saves it in page->flags so
> > that page_to_virt() re-creates the correct tagged pointer. We need to
> > ensure that the in-memory tags are visible before setting the
> > page->flags:
> >
> > P0 (__kasan_unpoison_range):    P1 (access via virt_to_page):
> >   Wtags=x                         Rflags=x
> >     |                               |
> >     | DMB                           | address dependency
> >     V                               V
> >   Wflags=x                        Rtags=x
> 
> This is confusing: the paragraph mentions page_to_virt() and the
> diagram - virt_to_page(). I assume it should be page_to_virt().

Yes, it should be page_to_virt().
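
To make the required ordering concrete, the unpoison side needs roughly
the following (a sketch, not the actual patch; set_mem_tags() is a
hypothetical stand-in for the MTE tag-setting loop, while smp_wmb() and
page_kasan_tag_set() are the real kernel primitives):

	/* P0 side of the diagram above */
	static void unpoison_sketch(struct page *page, unsigned int order,
				    u8 tag)
	{
		/* Wtags=x: colour the memory (hypothetical helper) */
		set_mem_tags(page_address(page), PAGE_SIZE << order, tag);

		/* DMB: order the tag writes before the flags write */
		smp_wmb();

		/* Wflags=x: page_to_virt() re-creates the tagged pointer */
		page_kasan_tag_set(page, tag);
	}

On the P1 side no barrier is needed since there is an address dependency
from the page->flags read to the tag access.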

> alloc_pages(), which calls kasan_unpoison_pages(), has to return
> before page_to_virt() can be called. So they can only race if the tags
> don't get propagated to memory before alloc_pages() returns, right?
> This is why you say that the race is rare?

Yeah, it involves another CPU getting the pfn or page address and trying
to access the memory before the tags are propagated (not necessarily to
DRAM; they can be at some cache level or just stuck in a write buffer).
It's unlikely but still possible.

See a somewhat related recent memory ordering fix that was found in
actual testing:

https://git.kernel.org/arm64/c/1d0cb4c8864a

> > If such a page is mapped in user-space with PROT_MTE, the architecture
> > code will set the tag to 0 and a subsequent page_to_virt() dereference
> > will fault. We currently try to fix this by resetting the tag in
> > page->flags so that it is 0xff (match-all, not faulting). However,
> > setting the tags and flags can race with another CPU reading the flags
> > (page_to_virt()) and barriers can't help, e.g.:
> >
> > P0 (mte_sync_page_tags):        P1 (memcpy from virt_to_page):
> >                                   Rflags!=0xff
> >   Wflags=0xff
> >   DMB (doesn't help)
> >   Wtags=0
> >                                   Rtags=0   // fault
> 
> So this change, effectively, makes the tag in page->flags for GFP_USER
> pages to be reset at allocation time. And the current approach of
> resetting the tag when the kernel is about to access these pages is
> not good because: 1. it's inconvenient to track all places where this
> should be done and 2. the tag reset can race with page_to_virt() even
> with patch #1 applied. Is my understanding correct?

Yes. Regarding (1), it's pretty impractical. There are some clear places
like copy_user_highpage() where we could untag the page address. In
other places it may not be as simple. We could try to reset the page
flags when we do a get_user_pages() to cover another class. But we still
have swap and page migration, which may read a page with a mismatched
tag.
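
For illustration, the copy_highpage() call site currently looks roughly
like this (simplified from arch/arm64/mm/copypage.c; this is one of the
page_kasan_tag_reset() calls that patch #3 reverts):

	void copy_highpage(struct page *to, struct page *from)
	{
		void *kto = page_address(to);
		void *kfrom = page_address(from);

		copy_page(kto, kfrom);

		if (system_supports_mte() &&
		    test_bit(PG_mte_tagged, &from->flags)) {
			set_bit(PG_mte_tagged, &to->flags);
			page_kasan_tag_reset(to);
			/*
			 * smp_wmb() orders the flags write before the tag
			 * writes, but as in the diagram above it cannot
			 * stop another CPU that already read the old
			 * page->flags from accessing the new tags.
			 */
			smp_wmb();
			mte_copy_page_tags(kto, kfrom);
		}
	}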

> This will reset the tags for all kinds of GFP_USER allocations, not
> only for the ones intended for MAP_ANONYMOUS and RAM-based file
> mappings, for which userspace can set tags, right? This will thus
> weaken in-kernel MTE for pages whose tags can't even be set by
> userspace. Is there a way to deal with this?

That's correct, it will weaken some of the allocations where the user
doesn't care about MTE. And TBH, I'm not sure it covers all cases
either (can we have an anonymous or memfd page mapped in user space that
was not allocated with GFP_USER?).
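
To make it concrete, the allocation-time reset would end up as something
along these lines in post_alloc_hook() (a sketch only, assuming the
__GFP_SKIP_KASAN_UNPOISON spelling from the cover letter is OR'd into
GFP_USER; the exact placement and condition are up for discussion):

	/* mm/page_alloc.c, post_alloc_hook() sketch */
	if (kasan_hw_tags_enabled() &&
	    (gfp_flags & __GFP_SKIP_KASAN_UNPOISON)) {
		/*
		 * User page: give it the match-all tag at allocation
		 * time, before any other CPU can observe page->flags.
		 */
		for (i = 0; i != 1 << order; i++)
			page_kasan_tag_reset(page + i);
	} else {
		kasan_unpoison_pages(page, order, init);
	}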

Another option would be to lock the page, but set_pte_at() seems to be
called for both locked and unlocked pages.

Any suggestions are welcome.

> > Since clearing the flags in the arch code doesn't work, try to do this
> > at page allocation time by a new flag added to GFP_USER. Could we
> > instead add __GFP_SKIP_KASAN_UNPOISON rather than a new flag?
> 
> Why do we need a new flag? Can we just check & GFP_USER instead?

GFP_USER is not a single flag but a combination of flags, none of which
explicitly says it's meant for user space.
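
For reference, from include/linux/gfp.h:

	#define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)

__GFP_HARDWALL only enforces the cpuset memory allocation policy, so
none of these bits on its own identifies a page that will end up mapped
in user space.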

-- 
Catalin


