[PATCH] GICv3: Add restart handler to detach CPU from GICv3

Joakim Tjernlund Joakim.Tjernlund at infinera.com
Wed Jan 4 11:52:33 PST 2023


On Wed, 2023-01-04 at 18:48 +0000, Marc Zyngier wrote:
> On Wed, 04 Jan 2023 17:23:42 +0000,
> Joakim Tjernlund <Joakim.Tjernlund at infinera.com> wrote:
> > 
> > On Wed, 2023-01-04 at 16:50 +0000, Marc Zyngier wrote:
> > > On Wed, 04 Jan 2023 16:04:14 +0000,
> > > Mark Rutland <mark.rutland at arm.com> wrote:
> > > > 
> > > > On Tue, Jan 03, 2023 at 04:27:07PM +0000, Joakim Tjernlund wrote:
> > > > > On Fri, 2022-12-16 at 17:21 +0100, Joakim Tjernlund wrote:
> > > > > 
> > > > > Ping?
> > > > 
> > > > To whom?
> > > > 
> > > > You don't appeared to have Cc'd any relevant maintainer, and people are still
> > > > on holiday, so it's extremely likely this will be missed.
> > > 
> > > That, plus nobody reads the list looking for this sort of things.
> > > 
> > > > 
> > > > For the maintainer, please use scripts/get_maintainer.pl, e.g.
> > > > 
> > > > > [mark at lakrids:~/src/linux]% ./scripts/get_maintainer.pl -f drivers/irqchip/irq-gic-v3.c
> > > > > Thomas Gleixner <tglx at linutronix.de> (maintainer:IRQCHIP DRIVERS)
> > > > > Marc Zyngier <maz at kernel.org> (maintainer:IRQCHIP DRIVERS)
> > > > > linux-kernel at vger.kernel.org (open list:IRQCHIP DRIVERS)
> > > > 
> > > > Note: I've Cc'd Marc, who wrote the GICv3 driver.
> > > 
> > > Cheers Mark, much appreciated.
> > 
> > Sorry for missing that extra maintainer CC:
> > 
> > > 
> > > > 
> > > > > > Needed for reboot without resetting the whole GIC
> > > > 
> > > > This doesn't really explain what you're trying to do nor why.
> > > > 
> > > > Why do you need to "reboot without resetting the whole GIC" ?
> > > > 
> > > > Do you encounter a problem if we try to reset the whole GIC?
> > > > 
> > > > Is this for kexec?
> > > > 
> > > > Is this for some use-case enabled by out-of-tree code?
> > > 
> > > All valid questions. This smells of a terrible hack...
> > 
> > Yes, all god Q's.
> 
> And no answer?

No kexec, out of tree kernel with minor mods
> 
> > 
> > > 
> > > The interesting aspect is that this is only done when DS=1, probably
> > > meaning that they are doing this in a VM. it also rely on some
> > 
> > Nope, on our custom target.
> 
> And you run with DS=1? On bare metal? Humpf...

Yes, we control all SW running on this target.

> 
> > 
> > > (unbounded) UNPRED behaviour as ProcessorSleep is entered without
> > > any consideration for Group0... Good luck with that.

No Group0 IRQ used(or so I think but I could be mistaken)

> > 
> > hmm, I am doing the same as PM does which also needs DS=1 so I figured
> > this was uncontroversial.
> 
> I seriously doubt anyone is actually using that code in anger. I
> always found it dodgy, and I'm pretty sure it is totally broken.

I see

> 
> > 
> > > 
> > > Anyway, I don't think we want any of this stuff. Certainly not without
> > > a cast-iron justification.
> > 
> > We use several cores but only one run Linux so a Linux reboot will
> > only reset its own core.
> 
> And what happens with the distributor? One of the key assumption of
> the GIC architecture is that there is only one operating system in
> control of it, the whole of it. The only way to share it is by
> virtualising it. Shades of Jailhouse...

Distributor is not touched until Linux is going up again

> 
> > 
> > Without this patch I either need to reset the GIC as part of the
> > reboot or I get RWP timeout when linux starts again. Resetting the
> > GIC kills IRQ on the other cores for a long time which is unwanted.
> 
> But that's what happens anyway when Linux boots (we reinitialise the
> whole distributor). So what is this about?

At that point IRQ will only be lost shortly until the other cores are reloaded by Linux user space

> 
> > 
> > The RWP timeout comes from lost HW handshake between core and GIC.
> > Is there another way to regain the HW handshake ?
> 
> That's the firmware's job. But overall, getting these timeouts means
> your GIC has locked up. It would be more interesting to understand

FW here is u-boot starting from EL3

> *why* you get in that situation, which I presume is due to the way the
> driver initialises itself.
> 
> Assuming you use kexec to reboot your Linux instance, does the
> following help (totally untested)?

no kexec and no it did not help.
I get the first RWP here:
static void __init gic_dist_init(void)
{
	unsigned int i;
	u64 affinity;
	void __iomem *base = gic_data.dist_base;
	u32 val;

	/* Disable the distributor */
	writel_relaxed(0, base + GICD_CTLR);
	gic_dist_wait_for_rwp(); <------ RWP timeout

I guess it is impossible to recover GIC HW handshake once it is lost ?


> 
> 	M.
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 997104d4338e..0db35a07ffdb 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1186,12 +1186,25 @@ static int gic_dist_supports_lpis(void)
>  static void gic_cpu_init(void)
>  {
>  	void __iomem *rbase;
> +	bool g0;
>  	int i;
>  
>  	/* Register ourselves with the rest of the world */
>  	if (gic_populate_rdist())
>  		return;
>  
> +	g0 = gic_has_group0();
> +
> +	if (read_gicreg(ICC_IGRPEN1_EL1) ||
> +	    (g0 && read_gicreg(ICC_IGRPEN0_EL1))) {
> +		if (g0)
> +			write_gicreg(0, ICC_IGRPEN0_EL1);
> +
> +		write_gicreg(0, ICC_IGRPEN1_EL1);
> +		isb();
> +		gic_enable_redist(false);
> +	}
> +
>  	gic_enable_redist(true);
>  
>  	WARN((gic_data.ppi_nr > 16 || GIC_ESPI_NR != 0) &&
> 



More information about the linux-arm-kernel mailing list