[PATCH] irqchip/msi-lib: Honor the MSI_FLAG_PCI_MSI_MASK_PARENT flag

Mon Jun 30 01:59:29 PDT 2025

On Fri, May 23 2025 at 10:06, Marc Zyngier wrote:
> On Sat, 17 May 2025 20:59:10 +0100,
> Thomas Gleixner <tglx at linutronix.de> wrote:
>> 
>> On Sat, May 17 2025 at 11:30, Marc Zyngier wrote:
>> > +	/*
>> > +	 * If the parent domain insists on being in charge of masking, obey
>> > +	 * blindly. The default mask/unmask become the shutdown/enable
>> > +	 * callbacks, ensuring that we correctly start/stop the interrupt.
>> > + 	 * We make a point in not using the irq_disable() in order to
>> > +	 * preserve the "lazy disable" behaviour.
>> > +	 */
>> > +	if (info->flags & MSI_FLAG_PCI_MSI_MASK_PARENT) {
>> > +		chip->irq_shutdown	= chip->irq_mask;
>> > +		chip->irq_enable	= chip->irq_unmask;
>> 
>> This is only correct, when the chip does not have dedicated
>> irq_shutdown/enable callbacks.
>
> The chip structure provided by the PCI MSI code doesn't provide such
> callback, meaning that they are unused for the whole hierarchy.

Fair enough, but it still stinks.

>> And I really hate the asymmetry of this.
>
> So do I, but that's how the lazy disable thing currently works. Drop
> the bizarre asymmetry on irq_disable, and we can make this nicely
> symmetric as well.

Well, it's not that bizarre and it has a massive performance win if the
thing does not need to go out to the hardware in some scenarios. Don't
ask about the main use case. Mentioning it is probably considered a
violation of the United Nations Convention Against Torture (UNCAT).

>> > +		chip->irq_mask		= irq_chip_mask_parent;
>> > +		chip->irq_unmask	= irq_chip_unmask_parent;
>> > +	}
>> 
>> I'm still trying to understand, what's the actual problem is you are
>> trying to solve.
>
> I'm trying to remove some overhead from machines that don't need to
> suffer from this nonsense double masking. Specially in VMs when
> masking/unmasking requires *two* extremely costly exits (write +
> synchronising read-back). This change reduces the overhead
> significantly by only masking where it actually matters.
>
>> MSIs are edge type interrupts, so the interrupt handling hotpath usually
>> does not mask at all. The only time masking happens is when it's lazy
>> disabled or during affinity changes, which is not the end of the world.
>
> And that's part of the problem. The lazy disable ends up being way
> more costly than it should when the interrupt fires during the
> "disabled but not quite" phase, and in turn makes the critical section
> delineated by disable_irq()/enable_irq() more expensive.
>
> So while, as you put it, it's "not the end of the world", this seems
> to me like a valuable optimisation.

I understand, but this needs more thoughts. Doing this wholesale for all
potential PCI/MSI parent domains which require MASK_PARTN makes me more
than nervous.

> Another possible improvement would be to teach the PCI code it can
> still rely on masking even when the endpoint is not capable of masking
> individual MSIs.

Well, it relies on that today already if the underlying parent domain is
capable of masking. If not, it hopes that nothing bad happens, which is
the only option we have :(

It get's worse when the device does not support masking _and_ the parent
domain does not provide immutable MSI messages because then the MSI
message write becomes a horrorshow. For illustration see the mess in
arch/x86/kernel/apic/msi.c::msi_set_affinity(), which is a violation of
above mentioned convention as well. Despite the fact that this has been
known for decades, RISC-V went ahead and replicated that trainwreck in
the IMSIC IP block. Oh well....

I sat down and stared at it in the few moments where the heat wave did
not completely shutdown my brain. As usual this ended in a larger
cleanup and overhaul... At the end I went and created a new pair of chip
callbacks and the corresponding logic around it. A preview of the whole
pile is at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git irq/msi

Thanks,

        tglx