[PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets

Colton Lewis coltonlewis at google.com
Tue Mar 14 10:47:01 PDT 2023


Marc Zyngier <maz at kernel.org> writes:

> On Fri, 10 Mar 2023 19:26:47 +0000,
> Colton Lewis <coltonlewis at google.com> wrote:

>> Marc Zyngier <maz at kernel.org> writes:

>> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm# ./aarch64/arch_timer -O 0xffff
>> >> ==== Test Assertion Failure ====
>> >>    aarch64/arch_timer.c:239: false
>> >>    pid=48094 tid=48095 errno=4 - Interrupted system call
>> >>       1  0x4010fb: test_vcpu_run at arch_timer.c:239
>> >>       2  0x42a5bf: start_thread at pthread_create.o:0
>> >>       3  0x46845b: thread_start at clone.o:0
>> >>    Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
>> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

>> > The fun part is that you can see similar things without the series:

>> > ==== Test Assertion Failure ====
>> >    aarch64/arch_timer.c:239: false
>> >    pid=647 tid=651 errno=4 - Interrupted system call
>> >       1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
>> >       2  0x00007fffb13cedd7: ?? ??:0
>> >       3  0x00007fffb1437e9b: ?? ??:0
>> >    Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
>> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3

>> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
>> > without any argument in a loop. After a few iterations, it blows.

> I finally got to the bottom of that one. This is yet another case of
> the test making the assumption that spurious interrupts don't exist...

That's great!

> Here, the timer interrupt has been masked at the source, but the GIC
> (or its emulation) can be slow to retire it. So we take it again,
> spuriously, and account it as a true interrupt. None of the asserts in
> the timer handler fire because they only check the *previous* state.

> Eventually, the interrupt retires and we progress to the next
> iteration. But in the meantime, we have incremented the irq counter by
> the number of spurious events, and the test fails.

> The obvious fix is to check for the timer state in the handler and
> exit early if the timer interrupt is masked or the timer disabled.
> With that, I don't see these failures anymore.

> I've folded that into the patch that already deals with some spurious
> events.
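
For anyone following along, here is a rough sketch of how I read the
fix described above. It is not the code from the patch. The names
(fake_timer, timer_irq_is_spurious, fake_timer_irq_handler,
irq_count_after_replays) are made up for illustration, and the timer
is just a struct standing in for the real control register. The only
thing taken from the architecture is the CNTx_CTL bit layout (ENABLE
is bit 0, IMASK is bit 1, ISTATUS is bit 2).

```c
#include <stdbool.h>
#include <stdint.h>

/* CNT{P,V}_CTL_EL0 bit layout as defined by the Arm architecture. */
#define CTL_ENABLE	(1ULL << 0)
#define CTL_IMASK	(1ULL << 1)
#define CTL_ISTATUS	(1ULL << 2)

/* Hypothetical stand-in for the guest-visible timer state. */
struct fake_timer {
	uint64_t ctl;		/* models the timer control register */
	unsigned int irq_iter;	/* interrupts accounted as genuine */
};

/*
 * Hypothetical helper: an interrupt taken while the timer is disabled
 * or masked at the source cannot be a genuine timer event.
 */
static bool timer_irq_is_spurious(uint64_t ctl)
{
	return !(ctl & CTL_ENABLE) || (ctl & CTL_IMASK);
}

/* Sketch of a handler that exits early on spurious interrupts. */
static void fake_timer_irq_handler(struct fake_timer *t)
{
	if (timer_irq_is_spurious(t->ctl))
		return;

	t->irq_iter++;
	/* Mask at the source, as a handler would for a genuine event. */
	t->ctl |= CTL_IMASK;
}

/*
 * Deliver one genuine interrupt followed by 'replays' late deliveries
 * of the same, already masked, interrupt. Returns the resulting count.
 */
static unsigned int irq_count_after_replays(unsigned int replays)
{
	struct fake_timer t = { .ctl = CTL_ENABLE | CTL_ISTATUS };
	unsigned int i;

	fake_timer_irq_handler(&t);		/* genuine event */
	for (i = 0; i < replays; i++)
		fake_timer_irq_handler(&t);	/* slow-to-retire replays */

	return t.irq_iter;
}
```

With the early exit, irq_count_after_replays() returns 1 no matter how
many replays arrive. Without it, every replay would bump irq_iter,
which is the kind of mismatch the "config_iter + 1 == irq_iter" assert
catches.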

I'll take a look, and I'll keep your questions about my hardware in
mind should I find any issues. Yes, it has ECV and CNTPOFF. No, I
didn't try turning it off for this, because my issue occurred only
when setting a physical offset, and that can't be done without ECV.
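
To spell out the ECV dependency, here is a minimal sketch of decoding
the ECV field from an ID_AA64MMFR0_EL1 value. As far as I know the
field sits in bits [63:60], with 0b0001 meaning ECV is implemented and
0b0010 additionally meaning CNTPOFF_EL2 is implemented. The helper
names are made up, and the register value is passed in as a parameter
because how you obtain it depends on the environment.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64MMFR0_EL1.ECV field position and encodings. */
#define MMFR0_ECV_SHIFT		60
#define MMFR0_ECV_MASK		0xfULL
#define MMFR0_ECV_NI		0x0	/* ECV not implemented */
#define MMFR0_ECV_IMP		0x1	/* ECV implemented */
#define MMFR0_ECV_CNTPOFF	0x2	/* ECV plus CNTPOFF_EL2 */

/* Hypothetical helper: extract the ECV field from a register value. */
static unsigned int mmfr0_ecv_level(uint64_t mmfr0)
{
	return (unsigned int)((mmfr0 >> MMFR0_ECV_SHIFT) & MMFR0_ECV_MASK);
}

/* Hypothetical helper: a physical counter offset needs CNTPOFF_EL2. */
static bool mmfr0_has_cntpoff(uint64_t mmfr0)
{
	return mmfr0_ecv_level(mmfr0) >= MMFR0_ECV_CNTPOFF;
}
```

Only a value at or above the CNTPOFF encoding passes this check. A CPU
with plain ECV, or with no ECV at all, does not.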


