[BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")

Janne Grunau j at jannau.net
Tue May 21 14:23:10 PDT 2024



On Tue, May 21, 2024, at 22:21, Mark Brown wrote:
> On Tue, May 21, 2024 at 10:06:36PM +0200, Janne Grunau wrote:
>
>> Running SIMD intense workloads in user space apparently increase the
>> reproduction odds. When running AV1 decoding using dav1d in parallel
>> errors appear faster. Errors manifest either in changed decoder
>> output, fio verification errors or both. I'm using `dav1d -i
>> sample.ivf --muxer xxh3 -o -` here as user space SIMD payload but I'd
>> assume the exact SIMD user space code doesn't matter as long as it
>> runs on all CPU cores.
>
> There's fp-stress in tools/testing/selftests/arm64/fp which will run two
> copies of fpsimd-test (from the same directory) per core in parallel
> looking for corruption in the FPSIMD registers.  Specify '-t -1' and
> it'll run for ever.

It's hard (impossible?) to reproduce with just fio and fp-stress: 5 consecutive fio runs (each 40-45 seconds) completed without a verification error in fio or a mismatch from fpsimd-test.

With AV1 decoding running in parallel, each fio run shows at least one of a decoding mismatch, a fio verification error, or mismatches from fpsimd-test, as below:

| # FPSIMD-6-0: Mismatch: PID=2110, iteration=9989281, reg=0
| # FPSIMD-6-0:   Expected [3e0800103e0840103e0880103e08c010]
| # FPSIMD-6-0:   Got      [f0f3cf35dea2f3ea41a13a27d8d9369b]
| # Sending signals, timeout remaining: -1
| ...
| # Sending signals, timeout remaining: -1
| # FPSIMD-7-0: Mismatch: PID=2112, iteration=15635931, reg=6
| # FPSIMD-7-0:   Expected [400806b0400846b0400886b04008c6b0]
| # FPSIMD-7-0:   Got      [b24b366b6b3b470ce763389ad425a33d]
| # FPSIMD-4-0: Mismatch: PID=2106, iteration=13371905, reg=0
| # FPSIMD-4-0:   Expected [3a0800103a0840103a0880103a08c010]
| # FPSIMD-4-0:   Got      [b9a7d504554cc797724fab09aa988e2f]
| # Sending signals, timeout remaining: -1
| ...
| # Sending signals, timeout remaining: -1
| # FPSIMD-5-0: Mismatch: PID=2108, iteration=14880477, reg=0
| # FPSIMD-5-0:   Expected [3c0800d03c0840d03c0880d03c08c0d0]
| # FPSIMD-5-0:   Got      [c550b9a0f0947cd38aa17241e129f9a6]
| # FPSIMD-7-1: Mismatch: PID=2113, iteration=17682959, reg=0
| # FPSIMD-7-1:   Expected [410800f0410840f0410880f04108c0f0]
| # FPSIMD-7-1:   Got      [98a43b8ae2157b6b30497714c52bf6d6]
| # FPSIMD-2-0: Mismatch: PID=2102, iteration=16482263, reg=0
| # FPSIMD-2-0:   Expected [3608007036084070360880703608c070]
| # FPSIMD-2-0:   Got      [1cf724cdaf8f997338ee499a5f1f33e7]

In the majority of cases the mismatch is reported for reg=0.
Running just fp-stress and AV1 decoding without fio reports no errors.
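
For anyone collecting longer logs, the per-register distribution can be tallied with a small script. This is only a rough sketch: the regular expression is derived from the "Mismatch" lines quoted above, not from the fp-stress sources, and the function name tally_registers and the SAMPLE string are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this mail (an assumption,
# not taken from the fp-stress source), e.g.
# "| # FPSIMD-6-0: Mismatch: PID=2110, iteration=9989281, reg=0"
MISMATCH = re.compile(
    r"#\s*(?P<test>FPSIMD-\d+-\d+): Mismatch: PID=(?P<pid>\d+), "
    r"iteration=(?P<iteration>\d+), reg=(?P<reg>\d+)"
)


def tally_registers(log_text):
    """Count mismatch reports per register index (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MISMATCH.search(line)
        if match:
            counts[int(match.group("reg"))] += 1
    return counts


# The six "Mismatch" header lines from the log above, copied verbatim.
SAMPLE = """\
| # FPSIMD-6-0: Mismatch: PID=2110, iteration=9989281, reg=0
| # FPSIMD-7-0: Mismatch: PID=2112, iteration=15635931, reg=6
| # FPSIMD-4-0: Mismatch: PID=2106, iteration=13371905, reg=0
| # FPSIMD-5-0: Mismatch: PID=2108, iteration=14880477, reg=0
| # FPSIMD-7-1: Mismatch: PID=2113, iteration=17682959, reg=0
| # FPSIMD-2-0: Mismatch: PID=2102, iteration=16482263, reg=0
"""

if __name__ == "__main__":
    for reg, count in sorted(tally_registers(SAMPLE).items()):
        print(f"reg={reg}: {count} mismatch(es)")
```

On the excerpt above this counts five of the six mismatches for reg=0 and one for reg=6.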

The fio testing probably caused segfaults in sddm / kwin on the test system using llvmpipe.

best regards

Janne


