[BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")

Janne Grunau j at jannau.net
Tue May 21 13:06:36 PDT 2024


Hej,

On Tue, May 21, 2024, at 20:34, Will Deacon wrote:
> Hi Johannes,
>
> On Tue, May 21, 2024 at 08:22:08AM +0200, Johannes Nixdorf wrote:
>> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
>> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
>> usage with the downstream Asahi Linux kernel.
>
> Damn, but thanks for the update. I have to ask, but are you absolutely
> sure this was with 2632e2521769 reverted? If you're able to double-check
> that, it would be great, since we're having trouble reproducing the
> issue.

I can reproduce the issue with v6.8 and 2632e2521769 reverted on M1 (t8103). Reproduction with 2632e2521769 reverted is harder, though: I've seen multiple fio runs complete without verification errors, whereas with plain v6.8 verification errors are hit after a few seconds.
Running SIMD-intensive workloads in user space apparently increases the reproduction odds. When AV1 decoding with dav1d runs in parallel, errors appear faster. Errors manifest as changed decoder output, fio verification errors, or both. I'm using `dav1d -i sample.ivf --muxer xxh3 -o -` here as the user space SIMD payload, but I'd assume the exact user space SIMD code doesn't matter as long as it runs on all CPU cores.
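
A minimal sketch of how the "changed decoder output" check could be automated, assuming the dav1d command above writes the xxh3 hash to stdout. The helper names `run_once` and `count_mismatches`, the worker and round counts, and the use of Python are illustrative assumptions, not the setup actually used here. The first run is taken as the reference, so the sketch only detects instability between runs, not a wrong reference.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Command quoted in the mail; sample.ivf stands for any AV1 clip at hand.
DAV1D_CMD = ["dav1d", "-i", "sample.ivf", "--muxer", "xxh3", "-o", "-"]


def run_once(cmd):
    """Run cmd once and return its raw stdout (expected: the hash output)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # progress output is not part of the result
        check=True,
    )
    return result.stdout


def count_mismatches(cmd, workers, rounds):
    """Run cmd `rounds` times in each of `workers` threads and count
    how many runs produce output different from an initial reference run."""
    reference = run_once(cmd)

    def worker(_index):
        # Each thread blocks on its own child process, so the decoders
        # themselves run concurrently.
        return sum(run_once(cmd) != reference for _ in range(rounds))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(worker, range(workers)))
```

Calling `count_mismatches(DAV1D_CMD, workers=os.cpu_count(), rounds=10)` (after `import os`) while fio runs against the dm-crypt device would then report how many decoder runs disagreed with the reference; on an unaffected kernel the expected count is 0.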

>> This prompted me to adjust the reproducer to be closer to the desktop use
>> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
>> kernel mode NEON at context switch"). With the vanilla kernel before the
>> commit or that commit reverted on the Asahi Linux kernel the new reproducer
>> also sees no bug, and interactive usage seems fine.
>
> I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD"), so it sounds like I should revert
> aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> context switch") as well while we work to reproduce the issue.

v6.8 with both 2632e2521769 and aefbab8e77eb reverted no longer reproduces the errors. AV1 decoding produces a stable hash as expected, and fio does not report any verification errors.

best regards

Janne


