[PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets

Marc Zyngier maz at kernel.org
Thu Mar 9 01:01:29 PST 2023


On Mon, 06 Mar 2023 22:08:04 +0000,
Colton Lewis <coltonlewis at google.com> wrote:
> 
> Hi Marc,
> 
> First of all, thanks for your previous responses to my comments. Many of
> them clarified things I did not fully understand on my own.
> 
> As I stated in another email, I've been testing this series on ECV
> capable hardware. Things look good but I have been able to reproduce a
> consistent assertion failure in this selftest when setting a
> sufficiently large physical offset. I have so far not been able to
> determine the cause of the failure and wonder if you have any insight as
> to what might be causing this and how to debug.
> 
> The following example reproduces the error every time I have tried:
> 
> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm# ./aarch64/arch_timer -O 0xffff
> ==== Test Assertion Failure ====
>   aarch64/arch_timer.c:239: false
>   pid=48094 tid=48095 errno=4 - Interrupted system call
>      1  0x4010fb: test_vcpu_run at arch_timer.c:239
>      2  0x42a5bf: start_thread at pthread_create.o:0
>      3  0x46845b: thread_start at clone.o:0
>   Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

The fun part is that you can see similar things without the series:

==== Test Assertion Failure ====
  aarch64/arch_timer.c:239: false
  pid=647 tid=651 errno=4 - Interrupted system call
     1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
     2  0x00007fffb13cedd7: ?? ??:0
     3  0x00007fffb1437e9b: ?? ??:0
  Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
values: 2, 3; 0, vcpu 3; stage; 4; iter: 3

That's on a vanilla kernel (6.2-rc4) on an M1, with the test run in a
loop without any argument. After a few iterations, it blows up.
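
For anyone wanting to try the same thing, a loop along these lines should
do. This is only a sketch: the helper name run_until_fail and the
iteration cap are made up for illustration, and it is not necessarily
the loop used for the run above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the selftests): run a command
# repeatedly and stop at the first non-zero exit status.
#   $1      maximum number of iterations
#   $2...   command to run, with its arguments
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        # The command's own output is left untouched so that any
        # assertion message stays visible.
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}
```

From the kselftest kvm directory, something like
"run_until_fail 100 ./aarch64/arch_timer" would then run the test
without any argument until the first failure, or give up after 100
clean passes. The cap only keeps the loop from running forever on a
machine where the failure does not show up.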

>
> Observations:
> 
> - Failure always occurs at stage 3 or 4 (physical timer stages)
> - xcnt_diff_us is always slightly less than 10000, or 10 ms
> - Reducing offset size reduces the probability of failure linearly (for
>   example, -O 0x8000 will fail close to half the time)
> - Failure occurs with a wide range of different period values and
>   whether or not migrations happen

The problem is that I don't understand enough of the test to make a
judgement call. I hardly get *what* it is testing. Do you?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


