[PATCH v2 0/4] kasan: Fix ordering between MTE tag colouring and page->flags
Kuan-Ying Lee (李冠穎)
Kuan-Ying.Lee at mediatek.com
Wed Feb 1 21:25:00 PST 2023
On Fri, 2022-06-10 at 16:21 +0100, Catalin Marinas wrote:
> Hi,
>
> That's a second attempt at fixing the race between setting the
> allocation (in-memory) tags in a page and the corresponding logical tag
> in page->flags. Initial version here:
>
> https://lore.kernel.org/r/20220517180945.756303-1-catalin.marinas@arm.com
>
> This new series does not introduce any new GFP flags but instead always
> skips unpoisoning of the user pages (we already skip the poisoning on
> free). Any unpoisoned page will have the page->flags tag reset.
>
> For the background:
>
> On a system with MTE and KASAN_HW_TAGS enabled, when a page is allocated
> kasan_unpoison_pages() sets a random tag and saves it in page->flags so
> that page_to_virt() re-creates the correct tagged pointer. We need to
> ensure that the in-memory tags are visible before setting the
> page->flags:
>
> P0 (__kasan_unpoison_range):    P1 (access via virt_to_page):
>   Wtags=x                         Rflags=x
>     |                               |
>     | DMB                           | address dependency
>     V                               V
>   Wflags=x                        Rtags=x
>
> The first patch changes the order of page unpoisoning with the tag
> storing in page->flags. page_kasan_tag_set() has the right barriers
> through try_cmpxchg().
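The ordering in the diagram above can be modelled in user space with C11
atomics. This is only a sketch under stated assumptions, not the kernel
code: mem_tag, page_flags, TAG_SHIFT, publish_tag() and
reader_sees_matching_tag() are hypothetical names, the field position is
an arbitrary illustration value, and a release cmpxchg plus an acquire
load stand in for the DMB on P0 and the address dependency on P1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for one tagged granule and one page->flags word. */
static _Atomic uint8_t mem_tag;      /* models the in-memory allocation tag */
static _Atomic uint64_t page_flags;  /* models page->flags */

#define TAG_SHIFT 54                 /* illustration value, not a kernel constant */
#define TAG_MASK  0xffULL

/* P0: write the in-memory tag first (Wtags), then publish it in the flags
 * word (Wflags). The release ordering on the successful compare-exchange
 * keeps the tag store from being reordered after the flags update. */
void publish_tag(uint8_t tag)
{
	uint64_t old, next;

	atomic_store_explicit(&mem_tag, tag, memory_order_relaxed);

	old = atomic_load_explicit(&page_flags, memory_order_relaxed);
	do {
		next = (old & ~(TAG_MASK << TAG_SHIFT)) |
		       ((uint64_t)tag << TAG_SHIFT);
	} while (!atomic_compare_exchange_weak_explicit(&page_flags, &old, next,
							memory_order_release,
							memory_order_relaxed));
}

/* P1: read the flags (Rflags) with acquire ordering, then the in-memory
 * tag (Rtags). Returns 1 when the tag taken from the flags matches. */
int reader_sees_matching_tag(void)
{
	uint64_t flags = atomic_load_explicit(&page_flags, memory_order_acquire);
	uint8_t logical = (uint8_t)((flags >> TAG_SHIFT) & TAG_MASK);

	return logical == atomic_load_explicit(&mem_tag, memory_order_relaxed);
}
```

With a single writer, a reader that observes the new tag in the flags
word is then guaranteed to also observe the matching in-memory tag.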
>
> If a page is mapped in user-space with PROT_MTE, the architecture code
> will set the allocation tag to 0 and a subsequent page_to_virt()
> dereference will fault. We currently try to fix this by resetting the
> tag in page->flags so that it is 0xff (match-all, not faulting).
> However, setting the tags and flags can race with another CPU reading
> the flags (page_to_virt()) and barriers can't help, e.g.:
>
> P0 (mte_sync_page_tags):        P1 (memcpy from virt_to_page):
>                                   Rflags!=0xff
>   Wflags=0xff
>   DMB (doesn't help)
>   Wtags=0
>                                   Rtags=0   // fault
>
> Since clearing the flags in the arch code doesn't work, do this at
> page allocation time instead, when __GFP_SKIP_KASAN_UNPOISON is passed.
>
> Thanks.
>
> Catalin Marinas (4):
>   mm: kasan: Ensure the tags are visible before the tag in page->flags
>   mm: kasan: Skip unpoisoning of user pages
>   mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON
>   arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
>
>  arch/arm64/kernel/hibernate.c |  5 -----
>  arch/arm64/kernel/mte.c       |  9 ---------
>  arch/arm64/mm/copypage.c      |  9 ---------
>  arch/arm64/mm/fault.c         |  1 -
>  arch/arm64/mm/mteswap.c       |  9 ---------
>  include/linux/gfp.h           |  2 +-
>  mm/kasan/common.c             |  3 ++-
>  mm/page_alloc.c               | 19 ++++++++++---------
>  8 files changed, 13 insertions(+), 44 deletions(-)
>
Hi KASAN maintainers,
We hit the following issue on android-6.1 devices with MTE and HW-tags
KASAN enabled.
In the page dump below, the anon page's flags have neither
skip_kasan_poison nor skip_kasan_unpoison set, and the kasantag value
(0xf) looks wrong.
AFAIK, the kasantag of an anon page needs to be 0x0.
[ 71.953938] [T1403598] FramePolicy:
[name:report&]=========================================================
=========
[ 71.955305] [T1403598] FramePolicy: [name:report&]BUG: KASAN:
invalid-access in copy_page+0x10/0xd0
[ 71.956476] [T1403598] FramePolicy: [name:report&]Read at addr
f0ffff81332a8000 by task FramePolicy/3598
[ 71.957673] [T1403598] FramePolicy: [name:report_hw_tags&]Pointer
tag: [f0], memory tag: [ff]
[ 71.958746] [T1403598] FramePolicy: [name:report&]
[ 71.959354] [T1403598] FramePolicy: CPU: 4 PID: 3598 Comm: FramePolicy Tainted: G S W OE 6.1.0-mainline-android14-0-ga8a53f83b9e4 #1
[ 71.960978] [T1403598] FramePolicy: Hardware name: MT6985(ENG) (DT)
[ 71.961767] [T1403598] FramePolicy: Call trace:
[ 71.962338] [T1403598] FramePolicy: dump_backtrace+0x108/0x158
[ 71.963097] [T1403598] FramePolicy: show_stack+0x20/0x48
[ 71.963782] [T1403598] FramePolicy: dump_stack_lvl+0x6c/0x88
[ 71.964512] [T1403598] FramePolicy: print_report+0x2cc/0xa64
[ 71.965263] [T1403598] FramePolicy: kasan_report+0xb8/0x138
[ 71.965986] [T1403598] FramePolicy: __do_kernel_fault+0xd4/0x248
[ 71.966782] [T1403598] FramePolicy: do_bad_area+0x38/0xe8
[ 71.967484] [T1403598] FramePolicy: do_tag_check_fault+0x24/0x38
[ 71.968261] [T1403598] FramePolicy: do_mem_abort+0x48/0xb0
[ 71.968973] [T1403598] FramePolicy: el1_abort+0x44/0x68
[ 71.969646] [T1403598] FramePolicy: el1h_64_sync_handler+0x68/0xb8
[ 71.970440] [T1403598] FramePolicy: el1h_64_sync+0x68/0x6c
[ 71.971146] [T1403598] FramePolicy: copy_page+0x10/0xd0
[ 71.971824] [T1403598] FramePolicy: copy_user_highpage+0x20/0x40
[ 71.972603] [T1403598] FramePolicy: wp_page_copy+0xd0/0x9f8
[ 71.973344] [T1403598] FramePolicy: do_wp_page+0x374/0x3b0
[ 71.974056] [T1403598] FramePolicy: handle_mm_fault+0x3ec/0x119c
[ 71.974833] [T1403598] FramePolicy: do_page_fault+0x344/0x4ac
[ 71.975583] [T1403598] FramePolicy: do_mem_abort+0x48/0xb0
[ 71.976294] [T1403598] FramePolicy: el0_da+0x4c/0xe0
[ 71.976934] [T1403598] FramePolicy: el0t_64_sync_handler+0xd4/0xfc
[ 71.977725] [T1403598] FramePolicy: el0t_64_sync+0x1a0/0x1a4
[ 71.978451] [T1403598] FramePolicy: [name:report&]
[ 71.979057] [T1403598] FramePolicy: [name:report&]The buggy address
belongs to the physical page:
[ 71.980173] [T1403598] FramePolicy:
[name:debug&]page:fffffffe04ccaa00 refcount:14 mapcount:13
mapping:0000000000000000 index:0x7884c74 pfn:0x1732a8
[ 71.981849] [T1403598] FramePolicy:
[name:debug&]memcg:faffff80c0241000
[ 71.982680] [T1403598] FramePolicy: [name:debug&]anon flags: 0x43c000000048003e(referenced|uptodate|dirty|lru|active|swapbacked|arch_2|zone=1|kasantag=0xf)
[ 71.984446] [T1403598] FramePolicy: raw: 43c000000048003e
fffffffe04b99648 fffffffe04cca308 f2ffff8103390831
[ 71.985684] [T1403598] FramePolicy: raw: 0000000007884c74
0000000000000000 0000000e0000000c faffff80c0241000
[ 71.986919] [T1403598] FramePolicy: [name:debug&]page dumped
because: kasan: bad access detected
[ 71.988022] [T1403598] FramePolicy: [name:report&]
[ 71.988624] [T1403598] FramePolicy: [name:report&]Memory state
around the buggy address:
[ 71.989641] [T1403598] FramePolicy: ffffff81332a7e00: fe fe fe fe
fe fe fe fe fe fe fe fe fe fe fe fe
[ 71.990811] [T1403598] FramePolicy: ffffff81332a7f00: fe fe fe fe
fe fe fe fe fe fe fe fe fe fe fe fe
[ 71.991982] [T1403598] FramePolicy: >ffffff81332a8000: ff ff ff ff
f0 f0 fc fc fc fc fc fc fc f0 f0 f3
[ 71.993149] [T1403598] FramePolicy:
[name:report&] ^
[ 71.993972] [T1403598] FramePolicy: ffffff81332a8100: f3 f3 f3 f3
f3 f3 f0 f0 f8 f8 f8 f8 f8 f8 f8 f0
[ 71.995141] [T1403598] FramePolicy: ffffff81332a8200: f0 fb fb fb
fb fb fb fb f0 f0 fe fe fe fe fe fe
[ 71.996332] [T1403598] FramePolicy:
[name:report&]=========================================================
=========
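To make the numbers in the report above easier to cross-check, here is a
small user-space sketch that decodes them. It rests on assumptions: the
tag field position (an 8-bit field at bit 54) is only inferred from this
dump (zone=1 in the top two bits, kasantag=0xf right below it) and
depends on the kernel config, and the XOR with 0xff mirrors how I read
page_kasan_tag() in include/linux/mm.h. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this config only, inferred from the dump above. */
#define TAG_PGSHIFT_ASSUMED 54
#define TAG_MASK_ASSUMED    0xffULL

/* Raw field as printed by the page dump ("kasantag=0x..."). */
uint8_t stored_tag(uint64_t flags)
{
	return (uint8_t)((flags >> TAG_PGSHIFT_ASSUMED) & TAG_MASK_ASSUMED);
}

/* Logical tag used to rebuild the pointer: the stored value is XORed
 * with 0xff, so a stored 0x0 means the match-all tag 0xff. */
uint8_t logical_tag(uint64_t flags)
{
	return (uint8_t)(stored_tag(flags) ^ 0xff);
}

/* Pointer tag: top byte of the kernel virtual address. */
uint8_t pointer_tag(uint64_t addr)
{
	return (uint8_t)(addr >> 56);
}

/* Replays the values from the report above. */
void replay_report(void)
{
	uint64_t flags = 0x43c000000048003eULL;  /* "anon flags" line */
	uint64_t addr  = 0xf0ffff81332a8000ULL;  /* "Read at addr" line */

	assert(stored_tag(flags) == 0x0f);        /* kasantag=0xf */
	assert(logical_tag(flags) == 0xf0);       /* tag placed in the pointer */
	assert(pointer_tag(addr) == 0xf0);        /* "Pointer tag: [f0]" */
	assert(logical_tag(0) == 0xff);           /* kasantag=0x0 is match-all */
}
```

Under these assumptions the pointer tag f0 comes straight from
kasantag=0xf in page->flags, while the memory tag is ff, hence the
mismatch. With kasantag=0x0 the pointer would carry the match-all tag
0xff and the access would not fault.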
Originally, I suspected that some userspace pages had been migrated, so
that the old page->flags were lost and the new page's flags were
re-generated by alloc_pages().
I tried the following diff, but it didn't help.
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..ed2065908418 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -51,6 +51,7 @@
#include <linux/random.h>
#include <linux/sched/sysctl.h>
#include <linux/memory-tiers.h>
+#include <linux/kasan.h>
#include <asm/tlbflush.h>
@@ -611,6 +612,14 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
if (!folio_test_hugetlb(folio))
mem_cgroup_migrate(folio, newfolio);
+
+#ifdef CONFIG_KASAN_HW_TAGS
+ if (kasan_hw_tags_enabled()) {
+ if (folio_test_skip_kasan_poison(folio))
+ folio_set_skip_kasan_poison(newfolio);
+ page_kasan_tag_set(&newfolio->page, page_kasan_tag(&folio->page));
+ }
+#endif
}
EXPORT_SYMBOL(folio_migrate_flags);
After I reverted this patchset (4 patches), the issue disappeared.