[PATCH 3/4] x86: hyperv: Expose hv_map_msi_interrupt function

Michael Kelley mhklinux at outlook.com
Thu Jun 19 15:02:03 PDT 2025


From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Wednesday, June 18, 2025 2:08 PM
> 
> On 6/11/2025 4:07 PM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Tuesday, June 10, 2025 4:52 PM
> >>
> >> From: Stanislav Kinsburskii <skinsburskii at linux.microsoft.com>
> >
> > The preferred patch Subject prefix is "x86/hyperv:"
> >
> 
> Thank you for clarifying - I thought I saw some precedent for x86: hyperv:
> but must have been mistaken.
> 
> >>
> >> This patch moves a part of currently internal logic into the
> >> hv_map_msi_interrupt function and makes it globally available helper
> >> function, which will be used to map PCI interrupts in case of root
> >> partition.
> >
> > Avoid "this patch" in commit messages.  Suggest:
> >
> > Create a helper function hv_map_msi_interrupt() that contains some
> > logic that is currently internal to irqdomain.c. Make the helper function
> > globally available so it can be used to map PCI interrupts when running
> > in the root partition.
> >
> 
> Thanks, I'll rephrase.
> 
> >>
> >> Signed-off-by: Stanislav Kinsburskii <skinsburskii at linux.microsoft.com>
> >> Signed-off-by: Nuno Das Neves <nunodasneves at linux.microsoft.com>
> >> ---
> >>  arch/x86/hyperv/irqdomain.c     | 47 ++++++++++++++++++++++++---------
> >>  arch/x86/include/asm/mshyperv.h |  2 ++
> >>  2 files changed, 36 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> >> index 31f0d29cbc5e..82f3bafb93d6 100644
> >> --- a/arch/x86/hyperv/irqdomain.c
> >> +++ b/arch/x86/hyperv/irqdomain.c
> >> @@ -169,13 +169,40 @@ static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> >>  	return dev_id;
> >>  }
> >>
> >> -static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector,
> >> -				struct hv_interrupt_entry *entry)
> >> +/**
> >> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
> >> + * @data:      Describes the IRQ
> >> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
> >> + *
> >> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT hypercall.
> >> + */
> >> +int hv_map_msi_interrupt(struct irq_data *data,
> >> +			 struct hv_interrupt_entry *out_entry)
> >>  {
> >> -	union hv_device_id device_id = hv_build_pci_dev_id(dev);
> >> +	struct msi_desc *msidesc;
> >> +	struct pci_dev *dev;
> >> +	union hv_device_id device_id;
> >> +	struct hv_interrupt_entry dummy;
> >> +	struct irq_cfg *cfg = irqd_cfg(data);
> >> +	const cpumask_t *affinity;
> >> +	int cpu;
> >> +	u64 res;
> >>
> >> -	return hv_map_interrupt(device_id, false, cpu, vector, entry);
> >> +	msidesc = irq_data_get_msi_desc(data);
> >> +	dev = msi_desc_to_pci_dev(msidesc);
> >> +	device_id = hv_build_pci_dev_id(dev);
> >> +	affinity = irq_data_get_effective_affinity_mask(data);
> >> +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> >
> > Is the cpus_read_lock held at this point? I'm not sure what the
> > overall calling sequence looks like. If it is not held, the CPU that
> > is selected could go offline before hv_map_interrupt() is called.
> > This computation of the target CPU is the same as in the code
> > before this patch, but that existing code looks like it has the
> > same problem.
> >
> 
> Thanks for pointing it out - It *looks* like the read lock is not held
> everywhere this could be called, so it could indeed be a problem.
> 
> I've been thinking about different ways around this but I lack the
> knowledge to have an informed opinion about it:
> 
> - We could take the cpu read lock in this function, would that work?
> 
> - I'm not actually sure why the code is getting the first cpu off the effective
>   affinity mask in the first place. It is possible to get the apic id (and hence
>   the cpu) already associated with the irq, as per e.g. x86_vector_msi_compose_msg()
>   Maybe we could get the cpu that way, assuming that doesn't have a similar issue.
> 
> - We could just let this race happen, maybe the outcome isn't too catastrophic?
> 
> What do you think?

I would have to study further to provide good answers to your questions, as
I don't have deep knowledge of this area off the top of my head. The code
looked suspicious because the point of AND'ing the affinity with
cpu_online_mask is presumably to avoid assigning the interrupt to a CPU
that is offline. That's a valid intent, since assigning the interrupt to an
offline CPU would indeed be problematic.

But as written, the code is inherently racy unless cpus_read_lock() is
held. I'm on vacation all next week, and probably won't be able to look at
this again until early July. So the best I can do for now is flag the issue.

Michael

> 
> >> +
> >> +	res = hv_map_interrupt(device_id, false, cpu, cfg->vector,
> >> +			       out_entry ? out_entry : &dummy);
> >> +	if (!hv_result_success(res))
> >> +		pr_err("%s: failed to map interrupt: %s",
> >> +		       __func__, hv_result_to_string(res));
> >
> > hv_map_interrupt() already outputs a message if the hypercall
> > fails. Is another message needed here?
> >
> 
> It does print the function name, which gives additional context.
> Probably it can just be removed however.
> 
> >> +
> >> +	return hv_result_to_errno(res);
> >
> > The error handling is rather messed up. First hv_map_interrupt()
> > sometimes returns a Linux errno (not negated), and sometimes a
> > hypercall result. The errno is EINVAL, which has value "22", which is
> > the same as hypercall result HV_STATUS_ACKNOWLEDGED. And
> > the hypercall result returned from hv_map_interrupt() is just
> > the result code, not the full 64-bit status, as hv_map_interrupt()
> > has already done hv_result(status). Hence the "res" input arg to
> > hv_result_to_errno() isn't really the correct input. For example,
> > if the hypercall returns U64_MAX, that won't be caught by
> > hv_result_to_errno() since the value has been truncated to
> > 32 bits. Fixing all this will require some unscrambling.
> >
> 
> Good point, it's pretty messed up! I think in v2 I'll add a patch to
> clean this up first.
> 
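
For reference, the two ambiguities described above can be reproduced in a
small userspace model. This is only a sketch: the model_* and MODEL_* names
are hypothetical, the 16-bit result mask and the -EOPNOTSUPP/-EIO return
values are assumptions made for illustration, and the real helpers may
differ in detail. The exact width of the result field doesn't matter for
the argument; what matters is that the all-ones status is no longer
recognizable once the value has been narrowed.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values; the mask width is an assumption of this model. */
#define MODEL_RESULT_MASK		0xffffULL
#define MODEL_STATUS_ACKNOWLEDGED	22

/* Models extracting the result code from a 64-bit hypercall status. */
static uint64_t model_hv_result(uint64_t status)
{
	return status & MODEL_RESULT_MASK;
}

/*
 * Models a converter that special-cases the all-ones status before
 * looking at the result code. The errno choices are placeholders.
 */
static int model_result_to_errno(uint64_t status)
{
	if (status == UINT64_MAX)
		return -EOPNOTSUPP;

	switch (model_hv_result(status)) {
	case 0:
		return 0;
	default:
		return -EIO;
	}
}

/*
 * Models a mapper with mixed return conventions: a positive errno on
 * bad arguments, otherwise the already-narrowed hypercall result.
 */
static uint64_t model_map(int bad_args, uint64_t hypercall_status)
{
	if (bad_args)
		return EINVAL;	/* positive errno, not a hypercall result */

	return model_hv_result(hypercall_status);
}
```

With this model, model_map(1, 0) returns 22, which a caller cannot tell
apart from MODEL_STATUS_ACKNOWLEDGED. And passing the output of
model_map(0, UINT64_MAX) to model_result_to_errno() skips the all-ones
check, because the value was narrowed before the converter ever saw it.
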
> >>  }
> >> +EXPORT_SYMBOL_GPL(hv_map_msi_interrupt);
> >>
> >>  static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
> >>  {
> >> @@ -190,10 +217,8 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> >>  {
> >>  	struct msi_desc *msidesc;
> >>  	struct pci_dev *dev;
> >> -	struct hv_interrupt_entry out_entry, *stored_entry;
> >> +	struct hv_interrupt_entry *stored_entry;
> >>  	struct irq_cfg *cfg = irqd_cfg(data);
> >> -	const cpumask_t *affinity;
> >> -	int cpu;
> >>  	u64 status;
> >>
> >>  	msidesc = irq_data_get_msi_desc(data);
> >> @@ -204,9 +229,6 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> >>  		return;
> >>  	}
> >>
> >> -	affinity = irq_data_get_effective_affinity_mask(data);
> >> -	cpu = cpumask_first_and(affinity, cpu_online_mask);
> >> -
> >>  	if (data->chip_data) {
> >>  		/*
> >>  		 * This interrupt is already mapped. Let's unmap first.
> >> @@ -235,15 +257,14 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> >>  		return;
> >>  	}
> >>
> >> -	status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry);
> >> +	status = hv_map_msi_interrupt(data, stored_entry);
> >>  	if (status != HV_STATUS_SUCCESS) {
> >
> > hv_map_msi_interrupt() returns an errno, so testing for HV_STATUS_SUCCESS
> > is bogus.
> >
> 
> Thanks, noted.
> 
> >>  		kfree(stored_entry);
> >>  		return;
> >>  	}
> >>
> >> -	*stored_entry = out_entry;
> >>  	data->chip_data = stored_entry;
> >> -	entry_to_msi_msg(&out_entry, msg);
> >> +	entry_to_msi_msg(data->chip_data, msg);
> >>
> >>  	return;
> >>  }
> >> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> >> index 5ec92e3e2e37..843121465ddd 100644
> >> --- a/arch/x86/include/asm/mshyperv.h
> >> +++ b/arch/x86/include/asm/mshyperv.h
> >> @@ -261,6 +261,8 @@ static inline void hv_apic_init(void) {}
> >>
> >>  struct irq_domain *hv_create_pci_msi_domain(void);
> >>
> >> +int hv_map_msi_interrupt(struct irq_data *data,
> >> +			 struct hv_interrupt_entry *out_entry);
> >>  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> >>  		struct hv_interrupt_entry *entry);
> >>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> >> --
> >> 2.34.1



