[PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets

Marc Zyngier maz at kernel.org
Mon Mar 13 04:43:01 PDT 2023


On Fri, 10 Mar 2023 19:26:47 +0000,
Colton Lewis <coltonlewis at google.com> wrote:
> 
> Marc Zyngier <maz at kernel.org> writes:
> 
> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> >> ./aarch64/arch_timer -O 0xffff
> >> ==== Test Assertion Failure ====
> >>    aarch64/arch_timer.c:239: false
> >>    pid=48094 tid=48095 errno=4 - Interrupted system call
> >>       1  0x4010fb: test_vcpu_run at arch_timer.c:239
> >>       2  0x42a5bf: start_thread at pthread_create.o:0
> >>       3  0x46845b: thread_start at clone.o:0
> >>    Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
> 
> > The fun part is that you can see similar things without the series:
> 
> > ==== Test Assertion Failure ====
> >    aarch64/arch_timer.c:239: false
> >    pid=647 tid=651 errno=4 - Interrupted system call
> >       1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
> >       2  0x00007fffb13cedd7: ?? ??:0
> >       3  0x00007fffb1437e9b: ?? ??:0
> >    Failed guest assert: config_iter + 1 == irq_iter at
> > aarch64/arch_timer.c:188
> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
> 
> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
> > without any argument in a loop. After a few iterations, it blows.

I finally got to the bottom of that one. This is yet another case of
the test making the assumption that spurious interrupts don't exist...

Here, the timer interrupt has been masked at the source, but the GIC
(or its emulation) can be slow to retire it. So we take it again,
spuriously, and account it as a true interrupt. None of the asserts in
the timer handler fire because they only check the *previous* state.

Eventually, the interrupt retires and we progress to the next
iteration. But in the meantime, we have incremented the irq counter by
the number of spurious events, and the test fails.

The obvious fix is to check for the timer state in the handler and
exit early if the timer interrupt is masked or the timer disabled.
With that, I don't see these failures anymore.

I've folded that into the patch that already deals with some spurious
events.
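
For illustration only, the early exit described above can be sketched
as a small userspace model. This is not the actual selftest code or
the folded-in patch: struct timer_model and both function names are
made up for the example, and the handler only mimics what the guest
would do. The only architectural detail is the layout of the timer
control register (ENABLE in bit 0, IMASK in bit 1, ISTATUS in bit 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions of the timer control register. */
#define CTL_ENABLE	(1u << 0)
#define CTL_IMASK	(1u << 1)
#define CTL_ISTATUS	(1u << 2)

/* Hypothetical stand-in for the state the test tracks per vCPU. */
struct timer_model {
	uint32_t ctl;		/* modelled timer control register */
	unsigned int irq_iter;	/* interrupts accounted as genuine */
};

/* True if the timer can legitimately be signalling an interrupt. */
static bool timer_irq_expected(uint32_t ctl)
{
	return (ctl & CTL_ENABLE) && !(ctl & CTL_IMASK);
}

/*
 * Modelled interrupt handler. Returns true if the interrupt was
 * accounted, false if it was treated as spurious and ignored.
 */
static bool timer_irq_handler(struct timer_model *t)
{
	/*
	 * The interrupt was already masked (or the timer disabled) at
	 * the source, but the interrupt controller has not retired it
	 * yet: bail out without touching the counter.
	 */
	if (!timer_irq_expected(t->ctl))
		return false;

	/* A genuine interrupt implies the timer condition is met. */
	assert(t->ctl & CTL_ISTATUS);

	/* Mask at the source, then account exactly one interrupt. */
	t->ctl |= CTL_IMASK;
	t->irq_iter++;
	return true;
}
```

With this check in place, calling the handler again while IMASK is
still set leaves irq_iter untouched, so a config_iter + 1 == irq_iter
style comparison is not thrown off by late redeliveries.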

	M.

-- 
Without deviation from the norm, progress is not possible.


