[PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
Colton Lewis
coltonlewis at google.com
Tue Mar 14 10:47:01 PDT 2023
Marc Zyngier <maz at kernel.org> writes:
> On Fri, 10 Mar 2023 19:26:47 +0000,
> Colton Lewis <coltonlewis at google.com> wrote:
>> Marc Zyngier <maz at kernel.org> writes:
>> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
>> >> ./aarch64/arch_timer -O 0xffff
>> >> ==== Test Assertion Failure ====
>> >> aarch64/arch_timer.c:239: false
>> >> pid=48094 tid=48095 errno=4 - Interrupted system call
>> >> 1 0x4010fb: test_vcpu_run at arch_timer.c:239
>> >> 2 0x42a5bf: start_thread at pthread_create.o:0
>> >> 3 0x46845b: thread_start at clone.o:0
>> >> Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
>> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
>> > The fun part is that you can see similar things without the series:
>> > ==== Test Assertion Failure ====
>> > aarch64/arch_timer.c:239: false
>> > pid=647 tid=651 errno=4 - Interrupted system call
>> > 1 0x00000000004026db: test_vcpu_run at arch_timer.c:239
>> > 2 0x00007fffb13cedd7: ?? ??:0
>> > 3 0x00007fffb1437e9b: ?? ??:0
>> > Failed guest assert: config_iter + 1 == irq_iter at
>> > aarch64/arch_timer.c:188
>> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
>> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
>> > without any argument in a loop. After a few iterations, it blows.
> I finally got to the bottom of that one. This is yet another case of
> the test making the assumption that spurious interrupts don't exist...
That's great!
> Here, the timer interrupt has been masked at the source, but the GIC
> (or its emulation) can be slow to retire it. So we take it again,
> spuriously, and account it as a true interrupt. None of the asserts in
> the timer handler fire because they only check the *previous* state.
> Eventually, the interrupt retires and we progress to the next
> iteration. But in the meantime, we have incremented the irq counter by
> the number of spurious events, and the test fails.
> The obvious fix is to check for the timer state in the handler and
> exit early if the timer interrupt is masked or the timer disabled.
> With that, I don't see these failures anymore.
> I've folded that into the patch that already deals with some spurious
> events.
I'll be looking at it and will keep in mind your questions about my
hardware should I find any issues. Yes, it has ECV and CNTPOFF, but no, I
didn't try turning it off for this, because my issue occurred only when
setting a physical offset, and that can't be done without ECV.