GICv3: RWP timeout, gone fishing

Joakim Tjernlund Joakim.Tjernlund at infinera.com
Wed May 4 15:27:35 PDT 2022


On Fri, 2022-04-22 at 12:19 +0200, Joakim Tjernlund wrote:
> After a reset of one cluster (I have 2 clusters with 2 A53 cores in each; only one core runs Linux) I see a bunch of
>   GICv3: RWP timeout, gone fishing
> when Linux is booting up, and IRQs aren't working; the console hangs when starting user space.
> 
> The cluster reset is implemented in the Linux reboot code, so everything should shut down properly
> before the cluster reset.
> 
> kernel 5.15.26
> 
> What could be the cause? Where do I even begin to look?
>  
>  Jocke
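
For context, the message is printed by the driver's RWP poll, which waits for the Register Write Pending bit in GICD_CTLR (or GICR_CTLR) to clear after a control-register write and gives up after a bounded number of polls. A rough user-space model of that poll follows. The names wait_for_rwp(), read_ctlr_fn and fake_ctlr() are made up for illustration and are not the kernel's; the real loop also delays between polls.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICD_CTLR_RWP (1U << 31)	/* Register Write Pending, as in arm-gic-v3.h */

/* Hypothetical stand-in for an MMIO read of GICD_CTLR. */
typedef uint32_t (*read_ctlr_fn)(void *ctx);

/*
 * Poll until RWP clears or the retry budget is used up.
 * Returns true if the pending write completed, false on timeout
 * (the case in which the kernel logs "RWP timeout, gone fishing").
 */
static bool wait_for_rwp(read_ctlr_fn read_ctlr, void *ctx, uint32_t max_polls)
{
	while (read_ctlr(ctx) & GICD_CTLR_RWP) {
		if (max_polls == 0)
			return false;
		max_polls--;
	}
	return true;
}

/* Fake register for the model: RWP stays set for *ctx more reads, then clears. */
static uint32_t fake_ctlr(void *ctx)
{
	uint32_t *reads_left = ctx;

	if (*reads_left == 0)
		return 0;
	(*reads_left)--;
	return GICD_CTLR_RWP;
}
```

If RWP never clears, as it might when the distributor or a redistributor is left in an inconsistent state across a reset, every call takes the timeout branch, which would match a burst of these messages during boot.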

I tried to fix these RWP timeouts by implementing a restart handler that mimics what the driver does for CPU PM:
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -23,6 +23,7 @@
 #include <linux/irqchip/arm-gic-common.h>
 #include <linux/irqchip/arm-gic-v3.h>
 #include <linux/irqchip/irq-partition-percpu.h>
+#include <linux/reboot.h>
 
 #include <asm/cputype.h>
 #include <asm/exception.h>
@@ -811,6 +812,7 @@ static void __init gic_dist_init(void)
        gic_dist_config(base, GIC_LINE_NR, gic_dist_wait_for_rwp);
 
        val = GICD_CTLR_ARE_NS | GICD_CTLR_ENABLE_G1A | GICD_CTLR_ENABLE_G1;
        if (gic_data.rdists.gicd_typer2 & GICD_TYPER2_nASSGIcap) {
                pr_info("Enabling SGIs without active state\n");
                val |= GICD_CTLR_nASSGIreq;
@@ -1359,6 +1361,22 @@ static void gic_cpu_pm_init(void)
 static inline void gic_cpu_pm_init(void) { }
 #endif /* CONFIG_CPU_PM */
 
+static int gicv3_restart_notify(struct notifier_block *nb,
+                              unsigned long mode, void *cmd)
+{
+       if (1 || gic_dist_security_disabled()) {
+               gic_write_grpen1(0);
+               gic_enable_redist(false);
+       }
+
+       return NOTIFY_DONE;
+}
+
+static struct notifier_block gicv3_restart_nb = {
+       .notifier_call = gicv3_restart_notify,
+       .priority = 0, /* Call late/last */
+};
+
 static struct irq_chip gic_chip = {
        .name                   = "GICv3",
        .irq_mask               = gic_mask_irq,
@@ -1843,6 +1861,8 @@ static int __init gic_init_bases(void __iomem *dist_base,
        gic_cpu_init();
        gic_smp_init();
        gic_cpu_pm_init();
+       //register_reboot_notifier(&gicv3_restart_nb);
+       register_restart_handler(&gicv3_restart_nb);

However, this does not work until I move local_irq_disable() in machine_restart()
so that the restart handler runs before IRQs are turned off:
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -125,8 +125,10 @@ void machine_power_off(void)
  */
 void machine_restart(char *cmd)
 {
+       
+
        /* Disable interrupts first */
-       local_irq_disable();
+       //local_irq_disable();
        smp_send_stop();
 
        /*
@@ -138,6 +140,7 @@ void machine_restart(char *cmd)
 
        /* Now call the architecture specific reboot code. */
        do_kernel_restart(cmd);
+       local_irq_disable();
 
        /*
         * Whoops - the architecture was unable to reboot.
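
On the ordering side: handlers added with register_restart_handler() are called from do_kernel_restart() in descending priority order, so a priority of 0 does mean "called last". A minimal user-space model of such a priority-ordered chain follows. It is a sketch, not the kernel's notifier code; struct handler, chain_register(), chain_call() and record() are hypothetical names, and 128 is only used here as an example of a higher priority.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000

struct handler {
	int (*call)(struct handler *h);
	int priority;		/* higher priority runs earlier */
	struct handler *next;
};

/* Insert so the list stays sorted by descending priority;
 * equal priorities keep their registration order. */
static void chain_register(struct handler **head, struct handler *h)
{
	while (*head && (*head)->priority >= h->priority)
		head = &(*head)->next;
	h->next = *head;
	*head = h;
}

/* Call handlers front to back; stop early if one sets NOTIFY_STOP_MASK.
 * Returns the number of handlers that were called. */
static int chain_call(struct handler *head)
{
	int called = 0;

	for (; head; head = head->next) {
		called++;
		if (head->call(head) & NOTIFY_STOP_MASK)
			break;
	}
	return called;
}

/* Example handler for the model: records the order of invocation. */
static int order[8];
static int n_order;

static int record(struct handler *h)
{
	order[n_order++] = h->priority;
	return NOTIFY_DONE;
}
```

One consequence worth checking: if a higher-priority restart handler actually resets the system, a priority-0 handler registered this way is never reached.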

I am probably barking up the wrong tree here too, any clues?

Another related question: gic_dist_security_disabled() returns false on my system, and I am not
sure whether that is the best/most common setup. I can change U-Boot to disable security here, since we control all
SW running in this system.
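
For reference, gic_dist_security_disabled() reports the DS bit of GICD_CTLR. Below is a small decoder for a raw GICD_CTLR value, which may help when dumping the register from U-Boot or a debugger. Bit positions follow the kernel's arm-gic-v3.h naming (the view the non-secure kernel uses); struct gicd_ctlr_bits and decode_gicd_ctlr() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

#define GICD_CTLR_RWP		(1U << 31)	/* Register Write Pending */
#define GICD_CTLR_DS		(1U << 6)	/* Disable Security */
#define GICD_CTLR_ARE_NS	(1U << 4)	/* Affinity routing enable */
#define GICD_CTLR_ENABLE_G1A	(1U << 1)
#define GICD_CTLR_ENABLE_G1	(1U << 0)

struct gicd_ctlr_bits {
	bool rwp;
	bool ds;
	bool are_ns;
	bool enable_g1a;
	bool enable_g1;
};

/* Split a raw GICD_CTLR value into the flags the driver cares about. */
static struct gicd_ctlr_bits decode_gicd_ctlr(uint32_t val)
{
	struct gicd_ctlr_bits b = {
		.rwp        = (val & GICD_CTLR_RWP) != 0,
		.ds         = (val & GICD_CTLR_DS) != 0,
		.are_ns     = (val & GICD_CTLR_ARE_NS) != 0,
		.enable_g1a = (val & GICD_CTLR_ENABLE_G1A) != 0,
		.enable_g1  = (val & GICD_CTLR_ENABLE_G1) != 0,
	};

	return b;
}
```

A value with ds clear corresponds to gic_dist_security_disabled() returning false, which is the situation described above.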

 Jocke

PS.
   Using reboot_notifier instead is even worse, but perhaps this is the way forward anyway?



More information about the linux-arm-kernel mailing list