[RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

Andrew Jones drjones at redhat.com
Tue Feb 24 06:55:31 PST 2015


On Fri, Feb 20, 2015 at 04:36:26PM +0100, Andrew Jones wrote:
> On Fri, Feb 20, 2015 at 02:37:25PM +0000, Ard Biesheuvel wrote:
> > On 20 February 2015 at 14:29, Andrew Jones <drjones at redhat.com> wrote:
> > > So looks like the 3 orders of magnitude greater number of traps
> > > (only to el2) don't impact kernel compiles.
> > >
> > 
> > OK, good! That was what I was hoping for, obviously.
> > 
> > > Then I thought I'd be able to quick measure the number of cycles
> > > a trap to el2 takes with this kvm-unit-tests test
> > >
> > > int main(void)
> > > {
> > >         unsigned long start, end;
> > >         unsigned int sctlr;
> > >
> > >         asm volatile(
> > >         "       mrs %0, sctlr_el1\n"
> > >         "       msr pmcr_el0, %1\n"
> > >         : "=&r" (sctlr) : "r" (5));
> > >
> > >         asm volatile(
> > >         "       mrs %0, pmccntr_el0\n"
> > >         "       msr sctlr_el1, %2\n"
> > >         "       mrs %1, pmccntr_el0\n"
> > >         : "=&r" (start), "=&r" (end) : "r" (sctlr));
> > >
> > >         printf("%llx\n", end - start);
> > >         return 0;
> > > }
> > >
> > > after applying this patch to kvm
> > >
> > > diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> > > index bb91b6fc63861..5de39d740aa58 100644
> > > --- a/arch/arm64/kvm/hyp.S
> > > +++ b/arch/arm64/kvm/hyp.S
> > > @@ -770,7 +770,7 @@
> > >
> > >         mrs     x2, mdcr_el2
> > >         and     x2, x2, #MDCR_EL2_HPMN_MASK
> > > -       orr     x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> > > +//     orr     x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> > >         orr     x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
> > >
> > >         // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
> > >
> > > But I get zero for the cycle count. Not sure what I'm missing.
> > >
> > 
> > No clue tbh. Does the counter work as expected in the host?
> >
> 
> Guess not. I dropped the test into a module_init and inserted
> it on the host. Always get zero for pmccntr_el0 reads. Or, if
> I set it to something non-zero with a write, then I always get
> that back - no increments. pmcr_el0 looks OK... I had forgotten
> to set bit 31 of pmcntenset_el0, but doing that still doesn't
> help. Anyway, I assume the problem is me. I'll keep looking to
> see what I'm missing.
>

I returned to this and found that the problem was indeed me. I needed
yet another enable bit set: the filter register (pmccfiltr_el0) has to
be told to count cycles while in el2. I've attached the code for the
curious. With this series applied, the trapped sctlr_el1 write takes
mean=6999 cycles, std_dev=242. Run on the host, or in a guest running
on a host without this patch series (after TVM traps have been
disabled), I get a pretty consistent 40 cycles.

I checked how many vm-sysreg traps we do during the kernel compile
benchmark. It's 124924. So it's a bit strange that we don't see the
benchmark taking 10 to 20 seconds longer on average. I should probably
double check my runs. In any case, while I like the approach of this
series, the overhead is looking non-negligible.
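
For anyone wanting to redo the arithmetic, here is a rough sketch of how
the trap count and the per-write cycle figures quoted above combine into
a total. The helper names and the clock frequency parameter are
hypothetical; no CPU frequency was reported for these runs, so the
conversion to wall-clock time is only as good as whatever value is
assumed for it.

```c
#include <stdint.h>

/*
 * Extra cycles attributable to trapping: number of traps times the
 * difference between the trapped and untrapped cost of one write.
 */
static uint64_t trap_overhead_cycles(uint64_t traps, uint64_t trapped,
				     uint64_t untrapped)
{
	if (trapped < untrapped)
		return 0;
	return traps * (trapped - untrapped);
}

/*
 * Convert a cycle count to milliseconds. cpu_mhz is an ASSUMED clock
 * frequency supplied by the caller, not a value measured in this thread.
 */
static uint64_t cycles_to_ms(uint64_t cycles, uint64_t cpu_mhz)
{
	if (cpu_mhz == 0)
		return 0;
	return cycles / (cpu_mhz * 1000);
}
```

Feeding in 124924 traps with 6999 and 40 cycles gives the total extra
cycle count; the millisecond figure then scales inversely with the
assumed frequency.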

drew
-------------- next part --------------
#include <libcflat.h>

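/*
 * Enable the PMU cycle counter and let it count while in el2, so that
 * cycles spent handling a trap are included in the measurement.
 */
static void prep_cc(void)
{
	asm volatile(
	"	msr pmovsclr_el0, %0\n"
	"	msr pmccfiltr_el0, %1\n"
	"	msr pmcntenset_el0, %2\n"
	"	msr pmcr_el0, %3\n"
	"	isb\n"
	:
	: "r" (1UL << 31),	/* pmovsclr_el0.C: clear cycle overflow flag */
	  "r" (1UL << 27),	/* pmccfiltr_el0.NSH: count cycles in el2 */
	  "r" (1UL << 31),	/* pmcntenset_el0.C: enable cycle counter */
	  "r" (1UL << 6 | 1UL << 2 | 1UL << 0));	/* pmcr_el0: LC | C | E */
}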

int main(void)
{
	unsigned long start, end;
	unsigned long sctlr;	/* mrs/msr transfer a full 64-bit register */
	int i, zeros = 0;

	asm volatile("mrs %0, sctlr_el1" : "=&r" (sctlr));
	prep_cc();

	for (i = 0; i < 100000; ++i) {
		asm volatile(
		"	mrs %0, pmccntr_el0\n"
		"	msr sctlr_el1, %2\n"
		"	mrs %1, pmccntr_el0\n"
		"	isb\n"
		: "=&r" (start), "=&r" (end) : "r" (sctlr));

		if ((i % 10) == 0)
			printf("\n");

		printf(" %lu", end - start);

		if ((end - start) == 0) {
			++zeros;
			prep_cc();
		}
	}

	printf("\nnum zero counts = %d\n", zeros);
	return 0;
}
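
The test above only prints the raw deltas; how the mean and std_dev
were derived from them is not shown. One way to accumulate them inside
the loop, without floating point, is sketched below. The names cc_stats,
cc_stats_add and isqrt_u64 are hypothetical, the variance is the
population variance (an assumption about which flavour was reported),
and the intermediate products can overflow 64 bits for much larger
sample counts or cycle values than the ones seen here.

```c
#include <stdint.h>

/* Running totals needed for mean and variance. */
struct cc_stats {
	uint64_t n;	/* number of samples */
	uint64_t sum;	/* sum of samples */
	uint64_t sumsq;	/* sum of squared samples */
};

static void cc_stats_add(struct cc_stats *s, uint64_t sample)
{
	s->n++;
	s->sum += sample;
	s->sumsq += sample * sample;
}

/* Integer mean, rounded down. */
static uint64_t cc_stats_mean(const struct cc_stats *s)
{
	return s->n ? s->sum / s->n : 0;
}

/* Population variance: (n * sumsq - sum^2) / n^2, rounded down. */
static uint64_t cc_stats_variance(const struct cc_stats *s)
{
	if (!s->n)
		return 0;
	return (s->n * s->sumsq - s->sum * s->sum) / (s->n * s->n);
}

/* Integer square root (floor), bit-by-bit method. */
static uint64_t isqrt_u64(uint64_t x)
{
	uint64_t r = 0, bit = (uint64_t)1 << 62;

	while (bit > x)
		bit >>= 2;
	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}
```

In the loop one would call cc_stats_add(&s, end - start) for each
iteration and print cc_stats_mean(&s) and
isqrt_u64(cc_stats_variance(&s)) at the end.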

