[PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
Marc Zyngier
maz at kernel.org
Thu Mar 9 01:01:29 PST 2023
On Mon, 06 Mar 2023 22:08:04 +0000,
Colton Lewis <coltonlewis at google.com> wrote:
>
> Hi Marc,
>
> First of all, thanks for your previous responses to my comments. Many of
> them clarified things I did not fully understand on my own.
>
> As I stated in another email, I've been testing this series on ECV
> capable hardware. Things look good but I have been able to reproduce a
> consistent assertion failure in this selftest when setting a
> sufficiently large physical offset. I have so far not been able to
> determine the cause of the failure and wonder if you have any insight as
> to what might be causing this and how to debug.
>
> The following example reproduces the error every time I have tried:
>
> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm# ./aarch64/arch_timer -O 0xffff
> ==== Test Assertion Failure ====
> aarch64/arch_timer.c:239: false
> pid=48094 tid=48095 errno=4 - Interrupted system call
> 1 0x4010fb: test_vcpu_run at arch_timer.c:239
> 2 0x42a5bf: start_thread at pthread_create.o:0
> 3 0x46845b: thread_start at clone.o:0
> Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
The fun part is that you can see similar things without the series:
==== Test Assertion Failure ====
aarch64/arch_timer.c:239: false
pid=647 tid=651 errno=4 - Interrupted system call
1 0x00000000004026db: test_vcpu_run at arch_timer.c:239
2 0x00007fffb13cedd7: ?? ??:0
3 0x00007fffb1437e9b: ?? ??:0
Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
That's on a vanilla kernel (6.2-rc4) on an M1, with the test run in a
loop without any arguments. After a few iterations, it blows up.
>
> Observations:
>
> - Failure always occurs at stage 3 or 4 (physical timer stages)
> - xcnt_diff_us is always slightly less than 10000, or 10 ms
> - Reducing offset size reduces the probability of failure linearly (for
> example, -O 0x8000 will fail close to half the time)
> - Failure occurs with a wide range of different period values and
> whether or not migrations happen
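A quick sanity check on the numbers in the quoted assert may help here.
This is only a sketch: it assumes the first two "values" are xcnt and
cval in that order (the log does not label them), and the helper name
shortfall() is made up for illustration. It computes how many ticks the
counter was short of cval and compares that with the -O offset.

```python
# Values copied from the quoted "Failed guest assert" output.
# ASSUMPTION: the first value is xcnt and the second is cval.
XCNT = 2500645901305
CVAL = 2500645961845
OFFSET = 0xffff  # physical offset passed with -O


def shortfall(xcnt: int, cval: int) -> int:
    """Ticks by which the counter reading falls short of the compare value."""
    return cval - xcnt


if __name__ == "__main__":
    gap = shortfall(XCNT, CVAL)
    print(f"shortfall: {gap} ticks")
    print(f"offset:    {OFFSET} ticks")
    print(f"shortfall within offset: {gap <= OFFSET}")
```

Under that assumption the counter is 60540 ticks short of cval, which is
less than the 0xffff (65535) offset. That would fit the observation that
the failure rate scales with the offset, but a single sample does not
establish the cause.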
The problem is that I don't understand enough of the test to make a
judgement call. I hardly get *what* it is testing. Do you?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel mailing list