[PATCH v2 0/4] kasan: Fix ordering between MTE tag colouring and page->flags

Andrey Konovalov andreyknvl at gmail.com
Thu Feb 2 04:59:29 PST 2023


On Thu, Feb 2, 2023 at 6:25 AM Kuan-Ying Lee (李冠穎)
<Kuan-Ying.Lee at mediatek.com> wrote:
>
> On Fri, 2022-06-10 at 16:21 +0100, Catalin Marinas wrote:
> > Hi,
> >
> > That's a second attempt at fixing the race between setting the
> > allocation (in-memory) tags in a page and the corresponding logical
> > tag in page->flags. Initial version here:
> >
> > https://lore.kernel.org/r/20220517180945.756303-1-catalin.marinas@arm.com
> >
> > This new series does not introduce any new GFP flags but instead
> > always skips unpoisoning of the user pages (we already skip the
> > poisoning on free). Any unpoisoned page will have the page->flags
> > tag reset.
> >
> > For the background:
> >
> > On a system with MTE and KASAN_HW_TAGS enabled, when a page is
> > allocated, kasan_unpoison_pages() sets a random tag and saves it in
> > page->flags so that page_to_virt() re-creates the correct tagged
> > pointer. We need to ensure that the in-memory tags are visible
> > before setting the page->flags:
> >
> > P0 (__kasan_unpoison_range):    P1 (access via virt_to_page):
> >   Wtags=x                         Rflags=x
> >     |                               |
> >     | DMB                           | address dependency
> >     V                               V
> >   Wflags=x                        Rtags=x
> >
> > The first patch changes the order of page unpoisoning with the tag
> > storing in page->flags. page_kasan_tag_set() has the right barriers
> > through try_cmpxchg().
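
To make the intended ordering concrete, here is a minimal user-space
sketch in C11. It is only a model, not the kernel code: struct
model_page, model_unpoison() and model_access() are hypothetical names,
a release store stands in for the DMB, and an acquire load stands in
for the address dependency on the reader side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define GRANULES      16    /* hypothetical number of tag granules per page */
#define TAG_MATCH_ALL 0xff  /* logical tag that never faults */

/* Hypothetical model of a page: one logical tag plus in-memory tags. */
struct model_page {
	_Atomic uint8_t flags_tag;   /* models the tag kept in page->flags */
	uint8_t mem_tags[GRANULES];  /* models the MTE allocation tags */
};

static void model_page_init(struct model_page *p)
{
	for (int i = 0; i < GRANULES; i++)
		p->mem_tags[i] = 0xfe;
	atomic_init(&p->flags_tag, TAG_MATCH_ALL);
}

/* P0: write the in-memory tags first, then publish the logical tag. */
static void model_unpoison(struct model_page *p, uint8_t tag)
{
	for (int i = 0; i < GRANULES; i++)
		p->mem_tags[i] = tag;                    /* Wtags=x */
	atomic_store_explicit(&p->flags_tag, tag,
			      memory_order_release);     /* DMB; Wflags=x */
}

/* P1: read the logical tag, then check it against the memory tag.
 * Returns 1 if the access would pass the tag check, 0 if it would fault. */
static int model_access(struct model_page *p, int granule)
{
	assert(granule >= 0 && granule < GRANULES);
	uint8_t ptr_tag = atomic_load_explicit(&p->flags_tag,
					       memory_order_acquire); /* Rflags */
	if (ptr_tag == TAG_MATCH_ALL)
		return 1;
	return p->mem_tags[granule] == ptr_tag;          /* Rtags */
}
```

With this ordering, a reader that observes the new logical tag is also
guaranteed to observe the matching in-memory tags. Swapping the two
writes in model_unpoison() would reopen the window described above.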
> >
> > If a page is mapped in user-space with PROT_MTE, the architecture
> > code will set the allocation tag to 0 and a subsequent page_to_virt()
> > dereference will fault. We currently try to fix this by resetting the
> > tag in page->flags so that it is 0xff (match-all, not faulting).
> > However, setting the tags and flags can race with another CPU reading
> > the flags (page_to_virt()) and barriers can't help, e.g.:
> >
> > P0 (mte_sync_page_tags):        P1 (memcpy from virt_to_page):
> >                                   Rflags!=0xff
> >   Wflags=0xff
> >   DMB (doesn't help)
> >   Wtags=0
> >                                   Rtags=0   // fault
> >
> > Since clearing the flags in the arch code doesn't work, do this at
> > page allocation time when __GFP_SKIP_KASAN_UNPOISON is passed.
> >
> > Thanks.
> >
> > Catalin Marinas (4):
> >   mm: kasan: Ensure the tags are visible before the tag in page->flags
> >   mm: kasan: Skip unpoisoning of user pages
> >   mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
> >   arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
> >
> >  arch/arm64/kernel/hibernate.c |  5 -----
> >  arch/arm64/kernel/mte.c       |  9 ---------
> >  arch/arm64/mm/copypage.c      |  9 ---------
> >  arch/arm64/mm/fault.c         |  1 -
> >  arch/arm64/mm/mteswap.c       |  9 ---------
> >  include/linux/gfp.h           |  2 +-
> >  mm/kasan/common.c             |  3 ++-
> >  mm/page_alloc.c               | 19 ++++++++++---------
> >  8 files changed, 13 insertions(+), 44 deletions(-)
> >
>
> Hi kasan maintainers,
>
> We hit the following issue on the android-6.1 devices with MTE and HW
> tag kasan enabled.
>
> I observe that the anon flags have neither skip_kasan_poison nor
> skip_kasan_unpoison set, and the kasantag looks wrong.
>
> AFAIK, the kasantag in the anon flags needs to be 0x0.
>
> [   71.953938] [T1403598] FramePolicy:
> [name:report&]=========================================================
> =========
> [   71.955305] [T1403598] FramePolicy: [name:report&]BUG: KASAN:
> invalid-access in copy_page+0x10/0xd0
> [   71.956476] [T1403598] FramePolicy: [name:report&]Read at addr
> f0ffff81332a8000 by task FramePolicy/3598
> [   71.957673] [T1403598] FramePolicy: [name:report_hw_tags&]Pointer
> tag: [f0], memory tag: [ff]
> [   71.958746] [T1403598] FramePolicy: [name:report&]
> [   71.959354] [T1403598] FramePolicy: CPU: 4 PID: 3598 Comm:
> FramePolicy Tainted: G S      W  OE      6.1.0-mainline-android14-0-
> ga8a53f83b9e4 #1
> [   71.960978] [T1403598] FramePolicy: Hardware name: MT6985(ENG) (DT)
> [   71.961767] [T1403598] FramePolicy: Call trace:
> [   71.962338] [T1403598] FramePolicy:  dump_backtrace+0x108/0x158
> [   71.963097] [T1403598] FramePolicy:  show_stack+0x20/0x48
> [   71.963782] [T1403598] FramePolicy:  dump_stack_lvl+0x6c/0x88
> [   71.964512] [T1403598] FramePolicy:  print_report+0x2cc/0xa64
> [   71.965263] [T1403598] FramePolicy:  kasan_report+0xb8/0x138
> [   71.965986] [T1403598] FramePolicy:  __do_kernel_fault+0xd4/0x248
> [   71.966782] [T1403598] FramePolicy:  do_bad_area+0x38/0xe8
> [   71.967484] [T1403598] FramePolicy:  do_tag_check_fault+0x24/0x38
> [   71.968261] [T1403598] FramePolicy:  do_mem_abort+0x48/0xb0
> [   71.968973] [T1403598] FramePolicy:  el1_abort+0x44/0x68
> [   71.969646] [T1403598] FramePolicy:  el1h_64_sync_handler+0x68/0xb8
> [   71.970440] [T1403598] FramePolicy:  el1h_64_sync+0x68/0x6c
> [   71.971146] [T1403598] FramePolicy:  copy_page+0x10/0xd0
> [   71.971824] [T1403598] FramePolicy:  copy_user_highpage+0x20/0x40
> [   71.972603] [T1403598] FramePolicy:  wp_page_copy+0xd0/0x9f8
> [   71.973344] [T1403598] FramePolicy:  do_wp_page+0x374/0x3b0
> [   71.974056] [T1403598] FramePolicy:  handle_mm_fault+0x3ec/0x119c
> [   71.974833] [T1403598] FramePolicy:  do_page_fault+0x344/0x4ac
> [   71.975583] [T1403598] FramePolicy:  do_mem_abort+0x48/0xb0
> [   71.976294] [T1403598] FramePolicy:  el0_da+0x4c/0xe0
> [   71.976934] [T1403598] FramePolicy:  el0t_64_sync_handler+0xd4/0xfc
> [   71.977725] [T1403598] FramePolicy:  el0t_64_sync+0x1a0/0x1a4
> [   71.978451] [T1403598] FramePolicy: [name:report&]
> [   71.979057] [T1403598] FramePolicy: [name:report&]The buggy address
> belongs to the physical page:
> [   71.980173] [T1403598] FramePolicy:
> [name:debug&]page:fffffffe04ccaa00 refcount:14 mapcount:13
> mapping:0000000000000000 index:0x7884c74 pfn:0x1732a8
> [   71.981849] [T1403598] FramePolicy:
> [name:debug&]memcg:faffff80c0241000
> [   71.982680] [T1403598] FramePolicy: [name:debug&]anon flags:
> 0x43c000000048003e(referenced|uptodate|dirty|lru|active|swapbacked|arch
> _2|zone=1|kasantag=0xf)
> [   71.984446] [T1403598] FramePolicy: raw: 43c000000048003e
> fffffffe04b99648 fffffffe04cca308 f2ffff8103390831
> [   71.985684] [T1403598] FramePolicy: raw: 0000000007884c74
> 0000000000000000 0000000e0000000c faffff80c0241000
> [   71.986919] [T1403598] FramePolicy: [name:debug&]page dumped
> because: kasan: bad access detected
> [   71.988022] [T1403598] FramePolicy: [name:report&]
> [   71.988624] [T1403598] FramePolicy: [name:report&]Memory state
> around the buggy address:
> [   71.989641] [T1403598] FramePolicy:  ffffff81332a7e00: fe fe fe fe
> fe fe fe fe fe fe fe fe fe fe fe fe
> [   71.990811] [T1403598] FramePolicy:  ffffff81332a7f00: fe fe fe fe
> fe fe fe fe fe fe fe fe fe fe fe fe
> [   71.991982] [T1403598] FramePolicy: >ffffff81332a8000: ff ff ff ff
> f0 f0 fc fc fc fc fc fc fc f0 f0 f3
> [   71.993149] [T1403598] FramePolicy:
> [name:report&]                   ^
> [   71.993972] [T1403598] FramePolicy:  ffffff81332a8100: f3 f3 f3 f3
> f3 f3 f0 f0 f8 f8 f8 f8 f8 f8 f8 f0
> [   71.995141] [T1403598] FramePolicy:  ffffff81332a8200: f0 fb fb fb
> fb fb fb fb f0 f0 fe fe fe fe fe fe
> [   71.996332] [T1403598] FramePolicy:
> [name:report&]=========================================================
> =========
>
> Originally, I suspected that some userspace pages had been migrated,
> so that page->flags was lost and then re-generated by alloc_pages().
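
As a side note on reading the report: the sketch below decodes the two
tag values by hand. It assumes that the page dump prints the raw
kasantag field and that the kernel stores the logical tag XORed with
0xff (as page_kasan_tag() does in mainline, as far as I know), so a raw
field of 0x0 means match-all 0xff. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The pointer tag lives in the top byte of a tagged kernel address. */
static uint8_t pointer_tag(uint64_t addr)
{
	return (uint8_t)(addr >> 56);
}

/* Assumption: page->flags keeps the tag inverted, so that a zeroed
 * field decodes to the match-all tag 0xff. */
static uint8_t flags_field_to_tag(uint8_t raw_field)
{
	return (uint8_t)(raw_field ^ 0xff);
}
```

Under these assumptions, the faulting address f0ffff81332a8000 carries
pointer tag 0xf0, and kasantag=0xf decodes to the same 0xf0, while the
expected raw value 0x0 would decode to 0xff.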

Hi Kuan-Ying,

There recently was a similar crash due to incorrectly implemented sampling.

Do you have the following patch in your tree?

https://android.googlesource.com/kernel/common/+/9f7f5a25f335e6e1484695da9180281a728db7e2

If not, please sync your 6.1 tree with the Android common kernel.
Hopefully this will fix the issue.

Thanks!


