[PATCH 0/3] soc/tegra: Prevent the PMC driver from corrupting interrupt routing

Marc Zyngier maz at kernel.org
Mon Oct 5 09:06:37 EDT 2020


On 2020-10-05 12:22, Thierry Reding wrote:
> On Mon, Oct 05, 2020 at 12:14:40PM +0100, Marc Zyngier wrote:
>> Jon recently reported that one of the Tegra systems (Jetson TX2, aka
>> tegra186) stopped booting with the introduction of the "IPI as IRQs"
>> series. After a few weeks of head scratching and complete puzzlement,
>> I obtained a board and started looking at what was happening.
>> 
>> The interrupt hierarchy looks like this:
>> 
>> 	[DEVICE] -A-> [PMC] -B-> [GIC]
>> 
>> which seems simple enough. However, not all the devices attached to
>> the PMC follow this hierarchy, and in some cases, the 'B' link isn't
>> present in the HW. In other cases, neither 'A' nor 'B' are present.
>> And yet the PMC driver creates such linkages using random hwirq values
>> for the non-existent links, potentially overriding existing mappings
>> in the process. "What could possibly go wrong?"
> 
> Yes, that would've been my fault. It seemed like the right thing to do
> at the time, but the way you describe it makes it obvious that it was
> not. I can't say I understand why this would've worked prior to the
> rework that made this surface, though.

Because until these IPI patches, the range 0-7 never ever appeared
as actual hwirqs in the GIC domain. SGIs were handled in the GIC
code, behind the core kernel's back. As soon as we start using
an actual domain mapping for hwirq 0, the PMC driver starts messing
with it.

>> It turns out that for the 'B' link, the PMC driver uses hwirq 0, which
>> is SGI0 for the GIC, and used as the rescheduling IPI. Obviously, this
>> doesn't go very well, nor very far, as the IPI gets routed to random
>> drivers. Also, as the handling flow has been overridden, this
>> interrupt never gets deactivated and can't fire anymore. Yes, this is
>> bad.
>> 
>> The 'A' link is less problematic, but the hwirq value is still out of
>> the irqdomain range, and gets remapped every time a new 'A'-less
>> driver comes up.
>> 
>> Instead, let's trim the unused hierarchy levels as needed. This
>> requires some checks in the upper levels of the hierarchy as we now
>> have optional levels, but this looks a lot saner than what we
>> currently have. With this, tegra186 is back booting on -next.
>> 
>> I haven't tested any wake-up stuff, nor any other nvidia system (this
>> is the only one I have). If people agree to these changes, I can take
>> them via the irqchip tree so that they make it into the next merge
>> window.
> 
> Yeah, it sounds like this needs to go in ideally before the rework that
> caused this to surface in order to preserve bisectibility. But if it
> goes in afterwards that's probably fine as well.

It's easy to take it as part of the same pull request as the IPI
stuff. Not fully bisectable for this platform, but close enough.
I may even be able to merge this in *before* the IPI patches
(I'd need to rebuild irq/irqchip-next, but that won't alter the commit
ids of the individual patches as they are on separate branches).

> Let Jon and myself do a bit of testing with this to verify that the 
> wake
> up paths are still working.

Sure. Let me know what you find.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...



More information about the linux-arm-kernel mailing list