[PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context

Ard Biesheuvel ardb at kernel.org
Thu Dec 15 03:48:19 PST 2022


On Thu, 15 Dec 2022 at 11:51, Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 11:27, Linus Walleij <linus.walleij at linaro.org> wrote:
> > >
> > > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb at kernel.org> wrote:
> > >
> > > > We currently only permit kernel mode NEON in process context, to avoid
> > > > the need to preserve/restore the NEON register file when taking an
> > > > exception while running in the kernel.
> > > >
> > > > Like we did on arm64, we can relax this restriction substantially, by
> > > > permitting kernel mode NEON from softirq context, while ensuring that
> > > > softirq processing is disabled when the NEON is being used in task
> > > > context. This guarantees that only NEON context belonging to user space
> > > > needs to be preserved and restored, which is already taken care of.
> > > >
> > > > This is especially relevant for network encryption, where incoming
> > > > frames are typically handled in softirq context, and deferring software
> > > > decryption to a kernel thread or falling back to C code are both
> > > > undesirable from a performance PoV.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > >
> > > So boosting WireGuard as primary SW network encryption user?
> >
> > Essentially, although the use case that inspired this work is related
> > to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
> > ~3x faster than WG's chacha20poly1305, which makes the performance
> > overhead of asynchronous completion even more significant. (Note that
> > GCM needs the AES and PMULL instructions which are usually only
> > available when running the 32-bit kernel on a 64-bit core, whereas
> > chacha20poly1305 uses ordinary NEON instructions.)
> >
> > But Martin responded with a Tested-by regarding chacha20poly1305 on
> > IPsec (not WG) where there is also a noticeable speedup, so WG on
> > ARM32 should definitely benefit from this as well.
>
> It'll be interesting to see whether there is any noticeable difference
> with my WG VPN.
>

Using WireGuard with the same 32-bit KVM guest communicating with its
64-bit host using virtio-net, I get a 44% speedup in the host->guest
direction (371 -> 535 Mbits/sec at the receiver). The other direction
performs essentially the same (220 vs 219 Mbits/sec), which is
unsurprising as it doesn't involve NEON crypto in softirq context at
all.

BEFORE
======

ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[  5] local 192.168.11.1 port 40144 connected to 192.168.11.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  25.8 MBytes   216 Mbits/sec    0    397 KBytes
[  5]   1.00-2.00   sec  25.9 MBytes   217 Mbits/sec    0    397 KBytes
[  5]   2.00-3.00   sec  27.0 MBytes   226 Mbits/sec    0    397 KBytes
[  5]   3.00-4.00   sec  26.5 MBytes   222 Mbits/sec    0    397 KBytes
[  5]   4.00-5.00   sec  26.2 MBytes   220 Mbits/sec    0    397 KBytes
[  5]   5.00-6.00   sec  26.1 MBytes   219 Mbits/sec    0    436 KBytes
[  5]   6.00-7.00   sec  26.2 MBytes   220 Mbits/sec    0    458 KBytes
[  5]   7.00-8.00   sec  26.2 MBytes   220 Mbits/sec    0    458 KBytes
[  5]   8.00-9.00   sec  26.5 MBytes   222 Mbits/sec    0    480 KBytes
[  5]   9.00-10.00  sec  26.9 MBytes   225 Mbits/sec    0    480 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   263 MBytes   221 Mbits/sec    0             sender
[  5]   0.00-10.00  sec   262 MBytes   220 Mbits/sec                  receiver


ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[  5] local 192.168.11.2 port 46340 connected to 192.168.11.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  47.5 MBytes   398 Mbits/sec    0   1.75 MBytes
[  5]   1.00-2.00   sec  45.0 MBytes   377 Mbits/sec   18   1.35 MBytes
[  5]   2.00-3.00   sec  43.8 MBytes   367 Mbits/sec    0   1.47 MBytes
[  5]   3.00-4.00   sec  45.0 MBytes   377 Mbits/sec    0   1.56 MBytes
[  5]   4.00-5.00   sec  45.0 MBytes   377 Mbits/sec    0   1.63 MBytes
[  5]   5.00-6.00   sec  42.5 MBytes   357 Mbits/sec    0   1.68 MBytes
[  5]   6.00-7.00   sec  43.8 MBytes   367 Mbits/sec    0   1.71 MBytes
[  5]   7.00-8.00   sec  43.8 MBytes   367 Mbits/sec    0   1.73 MBytes
[  5]   8.00-9.00   sec  45.0 MBytes   377 Mbits/sec    0   1.74 MBytes
[  5]   9.00-10.00  sec  43.8 MBytes   367 Mbits/sec    0   1.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   445 MBytes   373 Mbits/sec   18             sender
[  5]   0.00-10.04  sec   444 MBytes   371 Mbits/sec                  receiver

iperf Done.


AFTER
=====

ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[  5] local 192.168.11.1 port 44004 connected to 192.168.11.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  26.2 MBytes   220 Mbits/sec    0    399 KBytes
[  5]   1.00-2.00   sec  25.9 MBytes   217 Mbits/sec    0    399 KBytes
[  5]   2.00-3.00   sec  26.0 MBytes   218 Mbits/sec    0    444 KBytes
[  5]   3.00-4.00   sec  26.8 MBytes   225 Mbits/sec    0    485 KBytes
[  5]   4.00-5.00   sec  26.4 MBytes   222 Mbits/sec    0    542 KBytes
[  5]   5.00-6.00   sec  26.6 MBytes   223 Mbits/sec    0    568 KBytes
[  5]   6.00-7.00   sec  25.4 MBytes   213 Mbits/sec    0    568 KBytes
[  5]   7.00-8.00   sec  25.9 MBytes   217 Mbits/sec    0    568 KBytes
[  5]   8.00-9.00   sec  26.7 MBytes   224 Mbits/sec    0    568 KBytes
[  5]   9.00-10.00  sec  25.9 MBytes   217 Mbits/sec    0    568 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   262 MBytes   220 Mbits/sec    0             sender
[  5]   0.00-9.99   sec   261 MBytes   219 Mbits/sec                  receiver

iperf Done.

ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[  5] local 192.168.11.2 port 49838 connected to 192.168.11.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  61.2 MBytes   514 Mbits/sec    0   1.59 MBytes
[  5]   1.00-2.00   sec  66.2 MBytes   555 Mbits/sec    0   1.67 MBytes
[  5]   2.00-3.00   sec  65.0 MBytes   545 Mbits/sec   79   1.24 MBytes
[  5]   3.00-4.00   sec  63.8 MBytes   535 Mbits/sec    0   1.36 MBytes
[  5]   4.00-5.00   sec  63.8 MBytes   535 Mbits/sec    0   1.46 MBytes
[  5]   5.00-6.00   sec  63.8 MBytes   535 Mbits/sec    0   1.53 MBytes
[  5]   6.00-7.00   sec  62.5 MBytes   524 Mbits/sec    0   1.59 MBytes
[  5]   7.00-8.00   sec  65.0 MBytes   545 Mbits/sec   99   1.18 MBytes
[  5]   8.00-9.00   sec  65.0 MBytes   545 Mbits/sec    0   1.25 MBytes
[  5]   9.00-10.00  sec  65.0 MBytes   545 Mbits/sec    0   1.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   641 MBytes   538 Mbits/sec  178             sender
[  5]   0.00-10.02  sec   638 MBytes   535 Mbits/sec                  receiver

iperf Done.
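
As a sanity check on the 44% figure quoted above, here is a minimal
Python sketch that recomputes the relative change from the receiver-side
bitrates in the iperf3 summaries. The helper name speedup_percent is made
up for illustration and is not part of iperf3 or the patch; the four
bitrates are copied from the BEFORE and AFTER summary lines.

```python
def speedup_percent(before_mbps: float, after_mbps: float) -> float:
    """Relative throughput change in percent (positive means faster)."""
    return (after_mbps / before_mbps - 1.0) * 100.0


# Receiver-side bitrates in Mbits/sec, taken from the iperf3 summaries.
# host->guest: the guest decrypts incoming frames in softirq context.
host_to_guest = speedup_percent(before_mbps=371, after_mbps=535)
# guest->host: the 32-bit guest only encrypts, in task context.
guest_to_host = speedup_percent(before_mbps=220, after_mbps=219)

print(f"host->guest: {host_to_guest:+.1f}%")
print(f"guest->host: {guest_to_host:+.1f}%")
```

The host->guest result rounds to the 44% quoted in the message, and the
guest->host change is well under 1%, which is within the run-to-run
variation visible in the per-second intervals.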


