[PATCH] arm64: mte: allow async MTE to be upgraded to sync on a per-CPU basis

Tue Jun 8 14:55:49 PDT 2021

On Tue, Jun 8, 2021 at 12:54 PM Peter Collingbourne <pcc at google.com> wrote:
>
> On Tue, Jun 8, 2021 at 7:44 AM Catalin Marinas <catalin.marinas at arm.com> wrote:
> >
> > On Thu, Jun 03, 2021 at 10:49:24AM -0700, Peter Collingbourne wrote:
> > > On Thu, Jun 3, 2021 at 6:01 AM Vincenzo Frascino
> > > <vincenzo.frascino at arm.com> wrote:
> > > > On 6/3/21 12:24 AM, Peter Collingbourne wrote:
> > > > > On some CPUs the performance of MTE in synchronous mode is the same
> > > > > as that of asynchronous mode. This makes it worthwhile to enable
> > > > > synchronous mode on those CPUs when asynchronous mode is requested,
> > > > > in order to gain the error detection benefits of synchronous mode
> > > > > without the performance downsides. Therefore, make it possible for CPUs
> > > > > to opt into upgrading to synchronous mode via a new mte-prefer-sync
> > > > > device tree attribute.
> > > > >
> > > >
> > > > I had a look at your patch and I think that there are few points that are worth
> > > > mentioning:
> > > > 1) The approach you are using is per-CPU hence we might end up with a system
> > > > that has some PE configured as sync and some configured as async. We currently
> > > > support only a system wide setting.
> > >
> > > This is the intent. On e.g. a big/little system this means that we
> > > would effectively have sampling of sync MTE faults at a higher rate
> > > than a pure userspace implementation could achieve, at zero cost.
> > >
> > > > 2) async and sync have slightly different semantics (e.g. in sync mode the
> > > > access does not take place and it requires emulation) this means that a mixed
> > > > configuration affects the ABI.
> > >
> > > We considered the ABI question and think that is somewhat academic.
> > > While it's true that we would prevent the first access from succeeding
> > > (and, more visibly, use SEGV_MTESERR in the signal rather than
> > > SEGV_MTEAERR) I'm not aware of a reasonable way that a userspace
> > > program could depend on the access succeeding.
> >
> > It's more about whether some software relies on the async mode only for
> > logging without any intention of handling the synchronous faults. In
> > such scenario, the async signal is fairly simple, it logs and continues
> > safely (well, as "safe" as before MTE). With the sync mode, however, the
> > signal handler will have to ensure the access took place either before
> > continuing, either by emulating, restarting the instruction with
> > PSTATE.TCO set or by falling back to the async mode.
> >
> > IOW, I don't expect all programs making use of MTE to simply die on an
> > MTE fault (though I guess they'll be in a minority but we still need to
> > allow such scenarios).
>
> Okay, allowing such programs seems reasonable to me. The question is
> then whether the new behavior should be an opt-in or an opt-out.
>
> > > While it's slightly
> > > more plausible that there could be a dependency on the signal type, we
> > > don't depend on that in Android, at least not in a way that would lead
> > > to worse outcomes if we get MTESERR instead of MTEAERR (it would lead
> > > to better outcomes, in the form of a more accurate/detailed crash
> > > report, which is what motivates this change). I also checked glibc and
> > > they don't appear to have any dependencies on the signal type, or
> > > indeed have any detailed crash reporting at all as far as I can tell.
> > > Furthermore, basically nobody has hardware at the moment so I don't
> > > think we would be breaking any actual users by doing this.
> >
> > While there's no user-space out there yet, given that MTE was merged in
> > 5.10 and that's an LTS kernel, we'll see software running on this kernel
> > version at some point in the next few years. So any fix will need
> > backporting but an ABI change for better performance on specific SoCs
> > hardly counts as a fix.
>
> Okay, seems reasonable enough.
>
> > I'm ok with improving the ABI but not breaking the current one, hence
> > the suggestion for a new PR_MTE_TCF_* field or maybe a new bit that
> > allows the mode to be "upgraded", for some definition of this (for
> > example, if TCF_NONE is as fast as TCF_ASYNC, should NONE be upgraded?)
>
> I think the possibility of upgrading NONE to ASYNC is a good reason to
> make this an opt-in, since it alters the behavior in a more
> significant way than ASYNC to SYNC.
>
> > > > 3) In your patch you use DT to enforce sync mode on a CPU, probably it is better
> > > > to have an MIDR scheme to mark these CPUs.
> > >
> > > Okay, so in your scheme we would say that e.g. all Cortex-A510 CPUs
> > > should be subject to this treatment. Can we guarantee that all
> > > Cortex-A510 CPUs would have the same performance for sync and async or
> > > could the system designer tweak some aspect of the system such that
> > > they could get different performance? The possibility of the latter is
> > > what led me to specify the information via DT.
> >
> > While it's more of a CPU microarchitecture issue, there's indeed a good
> > chance that the SoC has a non-trivial impact on the performance of the
> > synchronous mode, so it may tip the balance one way or another.
> >
> > Another idea would be to introduce a PR_MTE_TCF_DEFAULT mode together
> > with some /sys/*/cpu*/ entries to control the default MTE mode per CPU.
> > We'd leave it entirely to user-space (privileged user), maybe it even
> > wants to run some checks before tuning the default mode per CPU.
>
> That's an interesting idea, but it sounds like it wouldn't permit
> upgrading from NONE as a separate feature from upgrading from ASYNC.

For the userspace API, I like the idea of a new bit flag with the
meaning "upgrade to a stricter mode with similar performance".
On a hypothetical CPU where SYNC mode has negligible overhead this
could upgrade NONE all the way to SYNC.
It would also allow transparent opt-in to any future modes, like asymmetric.

A /sys interface in addition to DT would be nice to have, but with
this API we already have an escape hatch in the userspace - remove the
bit, or even have libc forcibly remove the bit for all prctl calls.

>
> > I'm pretty sure the sync vs async decision is not a clear cut (e.g. sync
> > may still be slower but within a tolerable margin for certain
> > benchmarks). Leaving the decision to the hardware vendor, hard-coded in
> > the DT, is probably not the best option (nor is the MIDR). Some future
> > benchmark with a weird memory access pattern could expose slowness in
> > the sync mode and vendors will scramble on how to change the DT already
> > deployed (maybe they can do this OTA already).
>
> Perhaps we can let the default upgrading settings be specified via DT,
> and allow them to be overridden later via sysfs. That seems like the
> best of both worlds, since I think that in most cases DT should be
> enough, while the ability to override should satisfy the remaining
> cases. It also means that ACPI users have a way to use this feature
> until it gets added to the spec.
>
> By the way, if we want to allow upgrading from NONE in the future, or
> upgrading to asymmetric mode, perhaps we should spell the option
> "mte-upgrade-async" and have it take an argument in the range 0-3
> (corresponding to the SCTLR_EL1.TCF bits) specifying the desired mode
> to upgrade to. Then we could have e.g. "mte-upgrade-none" later with
> the same semantics.
>
> Peter