[PATCH v5] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Tejas Belagod Tejas.Belagod at arm.com
Fri Jun 25 09:21:07 PDT 2021



> -----Original Message-----
> From: Szabolcs Nagy <Szabolcs.Nagy at arm.com>
> Sent: Friday, June 25, 2021 3:15 PM
> To: Will Deacon <will at kernel.org>
> Cc: Catalin Marinas <Catalin.Marinas at arm.com>; Peter Collingbourne
> <pcc at google.com>; Vincenzo Frascino <Vincenzo.Frascino at arm.com>; Evgenii
> Stepanov <eugenis at google.com>; Linux ARM <linux-arm-kernel at lists.infradead.org>;
> Tejas Belagod <Tejas.Belagod at arm.com>
> Subject: Re: [PATCH v5] arm64: mte: allow async MTE to be upgraded to sync on
> a per-CPU basis
> 
> The 06/25/2021 13:39, Will Deacon wrote:
> > On Fri, Jun 25, 2021 at 01:01:37PM +0100, Catalin Marinas wrote:
> > > Thanks, that's useful. I guess since the _MTAG_ENABLE tunable is not
> > > ABI, the user app can't rely on what the glibc has configured.
> > > Arguably, since it's driven from outside the application (env), we
> > > could say the same for sysfs, though for the glibc case, the user
> > > app is still able to override it before the first thread is
> > > created (or per-thread). I assume glibc only issues the prctl() once, not for
> > > every new thread.
> 
> note: in the end the tunable is like
> 
> GLIBC_TUNABLES=glibc.mem.tagging=3 ./exe
> 
> not _MTAG_ENABLE.
> 
> and yes the setting comes from outside and glibc calls prctl once.
> 
> > > So we can document that the mode requested by the app is an
> > > indication, the system may change it to another value (and back-port
> > > documentation to 5.10). If we get a request from developers to
> > > honour a specific mode, we can add a new PR_MTE_TCF_EXACT bit or
> > > something but it's not essential we do it now.
> > >
> > > So if we allow the kernel to change the user requested mode (via
> > > sysfs), I think we still have two more issues to clarify:
> > >
> > > 1. Do we allow only "upgrade" (for some meaning of this) or sysfs can
> > >    downgrade to a less strict mode. I'd go for upgrade here to a
> > >    stricter check as in Peter's patch.
> > >
> > > 2. Should the sysfs upgrade the PR_MTE_TCF_NONE? _MTAG_ENABLE does that,
> > >    so I'd say yes.
> > >
> > > Any other thoughts are welcome.
> >
> > As I mentioned before, I think the sysfs interface should offer:
> >
> > 	"task"	: Honour whatever the task has asked for (default)
> > 	"async" : Force async on this CPU
> > 	"sync"  : Force sync on this CPU
> >
> > I don't think we should upgrade PR_MTE_TCF_NONE unless we also have a "none"
> > option in here. I originally suggested that, but in hindsight it feels
> > like a bad idea because a task could SIGILL on migration. So what
> > we're saying is that PR_MTE_TCF_SYNC and PR_MTE_TCF_ASYNC will always
> > enable MTE on success, but the reporting mode is a hint.
> >
> > I don't think upgrade/downgrade makes a lot of sense given that the
> > sysfs controls can be changed at any point in time. It should just be an
> > override.
> >
> > This means that we can force async for CPUs where sync mode is
> > horribly slow, whilst honouring the task's request on CPUs which are
> > better implemented.
> 
> i think a user should be able to ask for sync check mode for a process and get an
> error if that's not possible.
> 
> at least this is the semantics that makes sense in glibc. i think it's very confusing
> if somebody explicitly asks for sync checks to debug something but then gets
> useless diagnostics because somebody else tried to second guess their
> performance tradeoff preferences. (if sync check is too slow on a cpu then the
> user can taskset to a cpu that's not slow or just use other debugging method,
> silent override sounds bad.)

Sorry I'm late to the party - just catching up with this thread.

I'm not a kernel/glibc expert, so please correct me if I'm wrong here. I see two themes in this discussion: the usage model of the TCF mode (how control over it is shared between the user and the system) and its implementation.

From a user's perspective, they should be able to get the TCF mode they asked for, or at least a warning if that's not possible for whatever reason (performance, etc.). From the system's/kernel's perspective, if the system wants to use MTE, it should be able to provide the most performant (or the strictest) TCF mode per CPU as it sees fit; with sysfs set to 'task', the user's request (e.g. via the glibc tunable) would override it. Whether a user setting can downgrade a system-set TCF level is another question. Probably not, in which case the user needs to be warned.

So what happens if a process with TCF_NONE migrates from a CPU set to TCF_NONE to one set to TCF_ASYNC? Should the kernel restrict process migration between CPUs according to the TCF setting? Wouldn't that itself hurt performance by increasing contention?

Implementation-wise, the sysfs interface offering 'task', 'async' and 'sync' makes sense to me, as it fits well if we use the above as a guiding principle.

Thanks,
Tejas.


More information about the linux-arm-kernel mailing list