[PATCH] GICv3: Add restart handler to detach CPU from GICv3

Marc Zyngier maz at kernel.org
Thu Jan 5 02:37:50 PST 2023


On Wed, 04 Jan 2023 19:52:33 +0000,
Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> 
> On Wed, 2023-01-04 at 18:48 +0000, Marc Zyngier wrote:
> > On Wed, 04 Jan 2023 17:23:42 +0000,
> > Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > > 
> > > On Wed, 2023-01-04 at 16:50 +0000, Marc Zyngier wrote:
> > > > On Wed, 04 Jan 2023 16:04:14 +0000,
> > > > Mark Rutland <mark.rutland at arm.com> wrote:
> > > > > 
> > > > > On Tue, Jan 03, 2023 at 04:27:07PM +0000, Joakim Tjernlund wrote:
> > > > > > On Fri, 2022-12-16 at 17:21 +0100, Joakim Tjernlund wrote:
> > > > > > 
> > > > > > Ping?
> > > > > 
> > > > > To whom?
> > > > > 
> > > > > You don't appeared to have Cc'd any relevant maintainer, and people are still
> > > > > on holiday, so it's extremely likely this will be missed.
> > > > 
> > > > That, plus nobody reads the list looking for this sort of things.
> > > > 
> > > > > 
> > > > > For the maintainer, please use scripts/get_maintainer.pl, e.g.
> > > > > 
> > > > > > [mark at lakrids:~/src/linux]% ./scripts/get_maintainer.pl -f drivers/irqchip/irq-gic-v3.c
> > > > > > Thomas Gleixner <tglx at linutronix.de> (maintainer:IRQCHIP DRIVERS)
> > > > > > Marc Zyngier <maz at kernel.org> (maintainer:IRQCHIP DRIVERS)
> > > > > > linux-kernel at vger.kernel.org (open list:IRQCHIP DRIVERS)
> > > > > 
> > > > > Note: I've Cc'd Marc, who wrote the GICv3 driver.
> > > > 
> > > > Cheers Mark, much appreciated.
> > > 
> > > Sorry for missing that extra maintainer CC:
> > > 
> > > > 
> > > > > 
> > > > > > > Needed for reboot without resetting the whole GIC
> > > > > 
> > > > > This doesn't really explain what you're trying to do nor why.
> > > > > 
> > > > > Why do you need to "reboot without resetting the whole GIC" ?
> > > > > 
> > > > > Do you encounter a problem if we try to reset the whole GIC?
> > > > > 
> > > > > Is this for kexec?
> > > > > 
> > > > > Is this for some use-case enabled by out-of-tree code?
> > > > 
> > > > All valid questions. This smells of a terrible hack...
> > > 
> > > Yes, all god Q's.
> > 
> > And no answer?
> 
> No kexec, out of tree kernel with minor mods
> > 
> > > 
> > > > 
> > > > The interesting aspect is that this is only done when DS=1, probably
> > > > meaning that they are doing this in a VM. it also rely on some
> > > 
> > > Nope, on our custom target.
> > 
> > And you run with DS=1? On bare metal? Humpf...
> 
> Yes, we control all SW running on this target.
> 
> > 
> > > 
> > > > (unbounded) UNPRED behaviour as ProcessorSleep is entered without
> > > > any consideration for Group0... Good luck with that.
> 
> No Group0 IRQ used(or so I think but I could be mistaken)
> 
> > > 
> > > hmm, I am doing the same as PM does which also needs DS=1 so I figured
> > > this was uncontroversial.
> > 
> > I seriously doubt anyone is actually using that code in anger. I
> > always found it dodgy, and I'm pretty sure it is totally broken.
> 
> I see
> 
> > 
> > > 
> > > > 
> > > > Anyway, I don't think we want any of this stuff. Certainly not without
> > > > a cast-iron justification.
> > > 
> > > We use several cores but only one run Linux so a Linux reboot will
> > > only reset its own core.
> > 
> > And what happens with the distributor? One of the key assumption of
> > the GIC architecture is that there is only one operating system in
> > control of it, the whole of it. The only way to share it is by
> > virtualising it. Shades of Jailhouse...
> 
> Distributor is not touched until Linux is going up again
> 
> > 
> > > 
> > > Without this patch I either need to reset the GIC as part of the
> > > reboot or I get RWP timeout when linux starts again. Resetting the
> > > GIC kills IRQ on the other cores for a long time which is unwanted.
> > 
> > But that's what happens anyway when Linux boots (we reinitialise the
> > whole distributor). So what is this about?
> 
> At that point IRQ will only be lost shortly until the other cores
> are reloaded by Linux user space

How different is that from reseting the GIC altogether?

> 
> > 
> > > 
> > > The RWP timeout comes from lost HW handshake between core and GIC.
> > > Is there another way to regain the HW handshake ?
> > 
> > That's the firmware's job. But overall, getting these timeouts means
> > your GIC has locked up. It would be more interesting to understand
> 
> FW here is u-boot starting from EL3
> 
> > *why* you get in that situation, which I presume is due to the way the
> > driver initialises itself.
> > 
> > Assuming you use kexec to reboot your Linux instance, does the
> > following help (totally untested)?
> 
> no kexec and no it did not help.
> I get the first RWP here:
> static void __init gic_dist_init(void)
> {
> 	unsigned int i;
> 	u64 affinity;
> 	void __iomem *base = gic_data.dist_base;
> 	u32 val;
> 
> 	/* Disable the distributor */
> 	writel_relaxed(0, base + GICD_CTLR);
> 	gic_dist_wait_for_rwp(); <------ RWP timeout
> 
> I guess it is impossible to recover GIC HW handshake once it is lost ?

This is trying to teardown the distributor, but the downstream
redistributor (and CPU interface) are still up. The architecture
doesn't specify what can happen in this case, and I wouldn't be
surprised if you were triggering some really bad HW bugs. Please
provide the exact version of your GIC (at least the GICD_IIDR value).

Since you're in control of your firmware, you should reset things
there, because even if we hack something here for the boot CPU, this
says nothing of the other CPUs.

	M.

-- 
Without deviation from the norm, progress is not possible.



More information about the linux-arm-kernel mailing list