[PATCH] ARM: KVM: iterate over all CPUs for CPU compatibility check
Andre Przywara
andre.przywara at linaro.org
Fri Apr 19 08:58:57 EDT 2013
On 04/17/2013 11:12 AM, Christoffer Dall wrote:
>
> On Wed, Apr 17, 2013 at 1:16 AM, Marc Zyngier <marc.zyngier at arm.com>
> wrote:
>
> On Wed, 17 Apr 2013 10:08:12 +0200, Andre Przywara
> <andre.przywara at linaro.org> wrote:
> > On 04/16/2013 06:33 PM, Marc Zyngier wrote:
> >> On Tue, 16 Apr 2013 09:26:26 -0700, Christoffer Dall
> >> <cdall at cs.columbia.edu> wrote:
> >>> On Mon, Apr 15, 2013 at 6:48 AM, Will Deacon <will.deacon at arm.com>
> >>> wrote:
> >>>> On Mon, Apr 15, 2013 at 02:13:55PM +0100, Andre Przywara wrote:
> >>>>> On 04/15/2013 11:52 AM, Alexander Spyridakis wrote:
> >>>>>> I've run on this problem before, while trying to run KVM guests
> >>>>>> on A7 cores.
> >>>>>>
> >>>>>> For some reason the 3rd A7 hangs in arch/arm/kvm/init.S, on the
> >>>>>> instruction that updates HSCTLR between the two isbs on
> >>>>>> __do_hyp_init (mcr p15, 4, r0, c1, c0, 0). If you boot the system
> >>>>>> with maxcpus=4 then init_hyp_mode() will not hang on the A7
> >>>>>> cluster. Other than that from my limited testing KVM on A7 works
> >>>>>> on a usual linux guest. I also tried to only boot the 3rd A7 core
> >>>>>> to rule out any racing issues, but still the same behaviour
> >>>>>> applies.
> >>>>>
> >>>>> Could well be the same issue here. I chased it down till CPU 2 goes
> >>>>> into HYP mode to do the initialization.
> >>>>> I am running with maxcpus=3 (this increases the likelihood that
> >>>>> kvm_target_cpu() runs on an A15), so CPU #2 is the only A7.
> >>>>> As the HYP mode exception table is empty except for the HVC trap, it
> >>>>> may be looping here. I am trying now to get the PC of the faulty
> >>>>> instruction.
> >>>>
> >>>> Yes, it sounds like you're taking a recursive fault because the
> >>>> vectors aren't installed yet. Is there any chance you can find out
> >>>> what value you end up writing (or trying to write) to the HSCTLR
> >>>> please?
> >>>>
> >>> Actually I'm a little confused, wasn't Andre seeing a halt on an A15
> >>> cpu, not an A7? Or is the theory that an A7 locks up and the calling
> >>> A15 hangs on the SMP call to cpu_init_hyp_mode, waiting for the A7 to
> >>> complete?
> >>
> >> Yes, A15 hanging, not A7. That's why I'm strongly opposed to this
> >> patch. I'm pretty sure the A7s only have a side effect that triggers a
> >> kernel bug on the A15 side. Before taking *any* patch around this, we
> >> should understand the issue fully, and not start patching random stuff
> >> just because Linus is going to tag 3.9.
> >
> > I think there is a misunderstanding. The RCU watchdog was complaining
> > because the A15 wasn't making any progress. As Christoffer said, this is
> > because it was waiting for CPU 2 to return from the SMP call. It is
> > actually the A7 hanging inside HYP mode.
> > I tried some ways to get information out of there, but had no luck so
> > far. The different mapping between HYP and SVC doesn't make it easy to
> > dump some variables, but I am still working on it (but only half steam
>
> You could force a full mapping of the kernel text in HYP. Ugly, but
> should work.
>
> > because I am home looking after my sick daughter). So for now I assume
> > that it is the HSCTLR setting Alexander observed already.
>
> I'll give it a go today or tomorrow, depending how quickly I can get rid
> of my backlog after a couple of days off work.
>
> Assuming this is an A7 hanging on HSCTLR access, it should be pretty easy
> to narrow down by booting only on the A7s, leaving the A15s held in
> reset.
>
>
> You could also try installing a vector handler early and detect faults,
> and add an alternative return path from the init function with some
> error reporting value in r0 or something like that, just for debugging,
> naturally, but that could be a way to detect if we really are taking
> recursive faults here.
OK, I added code to return earlier on CPUs not from cluster 0.
Indeed it hangs on the HSCTLR write. The two A15s pass this instruction,
writing 0x30c5187F into the register.
This means all the fixed bits for the A15 are correct, with C, A, M and I
set and WXN, EE and TE cleared. FI was also cleared.
The A7 wanted to write the very same value. I tried to set bit 21 (FI),
which the A7 TRM kind of hints at, but no change.
Before the HSCTLR write, the register reads 0x30c50878, with SCTLR being
0x30c5387d.
So the code wants to set M, A, C and I in HSCTLR. Interestingly, SCTLR
has the V bit set, could that be an issue?
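
For anyone who wants to double-check the bit arithmetic, here is a rough
sketch that decodes the three values quoted in this mail. The bit positions
are an assumption taken from the ARMv7-A SCTLR layout (M=0, A=1, C=2, I=12,
V=13, WXN=19, FI=21, EE=25, TE=30) and should be verified against the ARM
ARM; the helper name set_flags is made up for illustration and is not part
of the kernel.

```python
# Assumed ARMv7-A control-register bit positions (verify against the ARM ARM).
# Only the bits named in this mail are listed; reserved bits are ignored.
BITS = {"M": 0, "A": 1, "C": 2, "I": 12, "V": 13,
        "WXN": 19, "FI": 21, "EE": 25, "TE": 30}


def set_flags(value):
    """Return the names of the listed bits that are set in value."""
    return [name for name, bit in BITS.items() if value & (1 << bit)]


# Values quoted in this mail.
HSCTLR_BEFORE = 0x30c50878   # HSCTLR as read before the write
HSCTLR_WRITTEN = 0x30c5187F  # value the init code writes
SCTLR = 0x30c5387d           # SCTLR at the same point

# Bits that the write flips relative to the previous register content.
changed = HSCTLR_BEFORE ^ HSCTLR_WRITTEN

print("written :", set_flags(HSCTLR_WRITTEN))
print("changed :", hex(changed), set_flags(changed))
print("SCTLR   :", set_flags(SCTLR))
```

Under these assumptions the only difference between the old and the new
HSCTLR value is M, A, C and I, and V shows up in SCTLR only.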
Regards,
Andre.