[RFC PATCH v2 0/2] Simplify kernel-mode NEON

Dave Martin Dave.Martin at arm.com
Wed May 24 08:54:32 PDT 2017


On Wed, May 24, 2017 at 08:29:06AM -0700, Ard Biesheuvel wrote:
> On 24 May 2017 at 07:42, Dave Martin <Dave.Martin at arm.com> wrote:
> > This series aims to simplify kernel-mode NEON.
> >
> > The main motivation for these changes is that supporting kernel-mode
> > NEON alongside SVE is tricky with the current framework: the current
> > partial save/restore mechanisms would need additional porting to
> > save/restore the extended SVE vector bits, and this renders the cost
> > saving of partial save/restore rather doubtful -- even if not all vector
> > registers are saved, the save/restore cost will still grow with
> > increasing vector length.  We could get the mechanics of this to work in
> > principle, but it doesn't feel like the right thing to do.
> >
> > If we withdraw kernel-mode NEON support for hardirq context, accept some
> > extra softirq latency and disable nesting, then we can simplify the code
> > by always saving the task context directly into task_struct and
> > deferring all restore to ret_to_user.  Previous discussions with Ard
> > Biesheuvel suggest that this is a feasible approach and reasonably
> > aligned with other architectures.
> >
> > The main impact on client code is that callers must check that
> > kernel-mode NEON is usable in the current context and fall back to a
> > non-NEON implementation when necessary.  Ard has already done some work
> > on this. [1]
> >
> > The interesting changes are all in patch 2: the first patch just adds a
> > header inclusion guard that I noted was missing.
> >
> > [1] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git kernel-mode-neon
> >
> >
> > I've only build-tested so far.
> >
> > Ard, do you have any suggestions for how to test these changes
> > effectively?
> >
> 
> IIRC, a zd1211 based USB wifi stick will use the mac80211 crypto

Do you have one of those?

> routines that execute in softirq context (and I am sure there are
> others). If you run a VPN over that, and only enable a single CPU, I
> would expect most code paths to get exercised when pushing a lot of
> data over that. In userland, you can run something like 'openssl speed
> -evp aes-128-ctr' in a loop to exercise the userland part, although it
> would be better to check the correctness as well (which 'speed' will
> not do for you)

Another approach would be to hack up a softirq irritator that just does

	kernel_neon_begin();

	/* write garbage to FPSIMD regs */

	kernel_neon_end();

alongside some userspace test that will spot corruption.
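
To make the idea concrete, here is a rough userspace-only toy model of
what such a test would be looking for.  It is not kernel code: every
sim_* name is made up for illustration, the "registers" are just an
array, and the save/restore logic is only my reading of the scheme
described in the cover letter (save into the task on begin, defer
restore to return-to-user).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_NREGS 32	/* hypothetical: one 64-bit slot per simulated register */

/* Stand-in for the per-task FPSIMD save area in task_struct. */
struct sim_task {
	uint64_t saved[SIM_NREGS];
	bool need_restore;	/* restore is pending until "return to user" */
};

/* Stand-in for the live FPSIMD register file of a single CPU. */
static uint64_t sim_regs[SIM_NREGS];

static void sim_kernel_neon_begin(struct sim_task *t)
{
	/* Save the task's state once; it stays saved until ret_to_user. */
	if (!t->need_restore) {
		memcpy(t->saved, sim_regs, sizeof(sim_regs));
		t->need_restore = true;
	}
}

static void sim_kernel_neon_end(struct sim_task *t)
{
	/* Nothing is restored here: the restore is deferred. */
	assert(t->need_restore);
}

static void sim_ret_to_user(struct sim_task *t)
{
	if (t->need_restore) {
		memcpy(sim_regs, t->saved, sizeof(sim_regs));
		t->need_restore = false;
	}
}

/* The "softirq irritator": claim the registers and trash them. */
static void sim_irritator(struct sim_task *t)
{
	sim_kernel_neon_begin(t);
	for (int i = 0; i < SIM_NREGS; i++)
		sim_regs[i] = 0xdeadbeefdeadbeefULL;
	sim_kernel_neon_end(t);
}

/* Per-round, per-register value the "userspace test" expects to see. */
static uint64_t sim_pattern(unsigned int round, int i)
{
	return ((uint64_t)round << 32) | (uint64_t)i;
}

/* The "userspace test": returns false if any register was corrupted. */
static bool sim_user_check(struct sim_task *t, unsigned int rounds)
{
	for (unsigned int r = 0; r < rounds; r++) {
		for (int i = 0; i < SIM_NREGS; i++)
			sim_regs[i] = sim_pattern(r, i);

		sim_irritator(t);	/* softirq hits while the task runs */
		sim_ret_to_user(t);	/* deferred restore happens here */

		for (int i = 0; i < SIM_NREGS; i++)
			if (sim_regs[i] != sim_pattern(r, i))
				return false;
	}
	return true;
}
```

With a zero-initialised struct sim_task, sim_user_check() should come
back true for any number of rounds; drop the sim_ret_to_user() call
and it should fail on the first round, which is the kind of breakage
the real test would need to catch.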


Writing softirq handlers seems to be a bit of a black art though...
I was wondering whether it would make sense to hack something into
the timer softirq code, but I'm not well clued-up on how that stuff
works.

[...]

Cheers
---Dave



