[PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context
Ard Biesheuvel
ardb at kernel.org
Thu Dec 15 03:48:19 PST 2022
On Thu, 15 Dec 2022 at 11:51, Russell King (Oracle)
<linux at armlinux.org.uk> wrote:
>
> On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 11:27, Linus Walleij <linus.walleij at linaro.org> wrote:
> > >
> > > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <ardb at kernel.org> wrote:
> > >
> > > > We currently only permit kernel mode NEON in process context, to avoid
> > > > the need to preserve/restore the NEON register file when taking an
> > > > exception while running in the kernel.
> > > >
> > > > Like we did on arm64, we can relax this restriction substantially, by
> > > > permitting kernel mode NEON from softirq context, while ensuring that
> > > > softirq processing is disabled when the NEON is being used in task
> > > > context. This guarantees that only NEON context belonging to user space
> > > > needs to be preserved and restored, which is already taken care of.
> > > >
> > > > This is especially relevant for network encryption, where incoming
> > > > frames are typically handled in softirq context, and deferring software
> > > > decryption to a kernel thread or falling back to C code are both
> > > > undesirable from a performance PoV.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > >
> > > So boosting WireGuard as primary SW network encryption user?
> >
> > Essentially, although the use case that inspired this work is related
> > to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
> > ~3x faster than WG's chacha20poly1305, which makes the performance
> > overhead of asynchronous completion even more significant. (Note that
> > GCM needs the AES and PMULL instructions which are usually only
> > available when running the 32-bit kernel on a 64-bit core, whereas
> > chacha20poly1305 uses ordinary NEON instructions.)
> >
> > But Martin responded with a Tested-by regarding chacha20poly1305 on
> > IPsec (not WG) where there is also a noticeable speedup, so WG on
> > ARM32 should definitely benefit from this as well.
>
> It'll be interesting to see whether there is any noticeable difference
> with my WG VPN.
>
Using WireGuard with the same 32-bit KVM guest communicating with its
64-bit host using virtio-net, I get a 44% speedup in the host->guest
direction (373 -> 538 Mbits/sec at the sender). The other direction
performs essentially the same (221 vs 220 Mbits/sec), which is
unsurprising as it doesn't involve NEON crypto in softirq context at
all.
BEFORE
======
ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[ 5] local 192.168.11.1 port 40144 connected to 192.168.11.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 25.8 MBytes 216 Mbits/sec 0 397 KBytes
[ 5] 1.00-2.00 sec 25.9 MBytes 217 Mbits/sec 0 397 KBytes
[ 5] 2.00-3.00 sec 27.0 MBytes 226 Mbits/sec 0 397 KBytes
[ 5] 3.00-4.00 sec 26.5 MBytes 222 Mbits/sec 0 397 KBytes
[ 5] 4.00-5.00 sec 26.2 MBytes 220 Mbits/sec 0 397 KBytes
[ 5] 5.00-6.00 sec 26.1 MBytes 219 Mbits/sec 0 436 KBytes
[ 5] 6.00-7.00 sec 26.2 MBytes 220 Mbits/sec 0 458 KBytes
[ 5] 7.00-8.00 sec 26.2 MBytes 220 Mbits/sec 0 458 KBytes
[ 5] 8.00-9.00 sec 26.5 MBytes 222 Mbits/sec 0 480 KBytes
[ 5] 9.00-10.00 sec 26.9 MBytes 225 Mbits/sec 0 480 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 263 MBytes 221 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 262 MBytes 220 Mbits/sec receiver
ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[ 5] local 192.168.11.2 port 46340 connected to 192.168.11.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 47.5 MBytes 398 Mbits/sec 0 1.75 MBytes
[ 5] 1.00-2.00 sec 45.0 MBytes 377 Mbits/sec 18 1.35 MBytes
[ 5] 2.00-3.00 sec 43.8 MBytes 367 Mbits/sec 0 1.47 MBytes
[ 5] 3.00-4.00 sec 45.0 MBytes 377 Mbits/sec 0 1.56 MBytes
[ 5] 4.00-5.00 sec 45.0 MBytes 377 Mbits/sec 0 1.63 MBytes
[ 5] 5.00-6.00 sec 42.5 MBytes 357 Mbits/sec 0 1.68 MBytes
[ 5] 6.00-7.00 sec 43.8 MBytes 367 Mbits/sec 0 1.71 MBytes
[ 5] 7.00-8.00 sec 43.8 MBytes 367 Mbits/sec 0 1.73 MBytes
[ 5] 8.00-9.00 sec 45.0 MBytes 377 Mbits/sec 0 1.74 MBytes
[ 5] 9.00-10.00 sec 43.8 MBytes 367 Mbits/sec 0 1.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 445 MBytes 373 Mbits/sec 18 sender
[ 5] 0.00-10.04 sec 444 MBytes 371 Mbits/sec receiver
iperf Done.
AFTER
=====
ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[ 5] local 192.168.11.1 port 44004 connected to 192.168.11.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 26.2 MBytes 220 Mbits/sec 0 399 KBytes
[ 5] 1.00-2.00 sec 25.9 MBytes 217 Mbits/sec 0 399 KBytes
[ 5] 2.00-3.00 sec 26.0 MBytes 218 Mbits/sec 0 444 KBytes
[ 5] 3.00-4.00 sec 26.8 MBytes 225 Mbits/sec 0 485 KBytes
[ 5] 4.00-5.00 sec 26.4 MBytes 222 Mbits/sec 0 542 KBytes
[ 5] 5.00-6.00 sec 26.6 MBytes 223 Mbits/sec 0 568 KBytes
[ 5] 6.00-7.00 sec 25.4 MBytes 213 Mbits/sec 0 568 KBytes
[ 5] 7.00-8.00 sec 25.9 MBytes 217 Mbits/sec 0 568 KBytes
[ 5] 8.00-9.00 sec 26.7 MBytes 224 Mbits/sec 0 568 KBytes
[ 5] 9.00-10.00 sec 25.9 MBytes 217 Mbits/sec 0 568 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 262 MBytes 220 Mbits/sec 0 sender
[ 5] 0.00-9.99 sec 261 MBytes 219 Mbits/sec receiver
iperf Done.
ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[ 5] local 192.168.11.2 port 49838 connected to 192.168.11.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 61.2 MBytes 514 Mbits/sec 0 1.59 MBytes
[ 5] 1.00-2.00 sec 66.2 MBytes 555 Mbits/sec 0 1.67 MBytes
[ 5] 2.00-3.00 sec 65.0 MBytes 545 Mbits/sec 79 1.24 MBytes
[ 5] 3.00-4.00 sec 63.8 MBytes 535 Mbits/sec 0 1.36 MBytes
[ 5] 4.00-5.00 sec 63.8 MBytes 535 Mbits/sec 0 1.46 MBytes
[ 5] 5.00-6.00 sec 63.8 MBytes 535 Mbits/sec 0 1.53 MBytes
[ 5] 6.00-7.00 sec 62.5 MBytes 524 Mbits/sec 0 1.59 MBytes
[ 5] 7.00-8.00 sec 65.0 MBytes 545 Mbits/sec 99 1.18 MBytes
[ 5] 8.00-9.00 sec 65.0 MBytes 545 Mbits/sec 0 1.25 MBytes
[ 5] 9.00-10.00 sec 65.0 MBytes 545 Mbits/sec 0 1.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 641 MBytes 538 Mbits/sec 178 sender
[ 5] 0.00-10.02 sec 638 MBytes 535 Mbits/sec receiver
iperf Done.
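
For reference, the 44% figure can be reproduced from the sender summary
lines of the four runs above. The snippet below is only a sketch of that
arithmetic; the direction labels and the helper name speedup_percent are
made up for illustration, and the numbers are copied by hand from the
iperf3 output rather than parsed from it.

```python
# Sender bitrates in Mbits/sec, copied from the iperf3 summary lines above.
# "guest->host" is the run started on vm32, "host->guest" the run started
# on the host (labels are illustrative, not part of the iperf3 output).
before = {"guest->host": 221, "host->guest": 373}
after = {"guest->host": 220, "host->guest": 538}


def speedup_percent(old, new):
    """Relative change in throughput from old to new, in percent."""
    return (new - old) / old * 100.0


for direction in before:
    pct = speedup_percent(before[direction], after[direction])
    print(f"{direction}: {before[direction]} -> {after[direction]} "
          f"Mbits/sec ({pct:+.1f}%)")
```

The host->guest direction comes out at roughly +44%, while the
guest->host direction changes by less than 1%, which is within the
run-to-run variation visible in the per-second intervals.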