[PATCH 3/4] x86: hyperv: Expose hv_map_msi_interrupt function
Michael Kelley
mhklinux at outlook.com
Thu Jun 19 15:02:03 PDT 2025
From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Wednesday, June 18, 2025 2:08 PM
>
> On 6/11/2025 4:07 PM, Michael Kelley wrote:
> > From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Tuesday, June 10, 2025 4:52 PM
> >>
> >> From: Stanislav Kinsburskii <skinsburskii at linux.microsoft.com>
> >
> > The preferred patch Subject prefix is "x86/hyperv:"
> >
>
> Thank you for clarifying - I thought I saw some precedent for x86: hyperv:
> but must have been mistaken.
>
> >>
> >> This patch moves part of the currently internal logic into the
> >> hv_map_msi_interrupt function and makes it a globally available helper
> >> function, which will be used to map PCI interrupts in the root
> >> partition.
> >
> > Avoid "this patch" in commit messages. Suggest:
> >
> > Create a helper function hv_map_msi_interrupt() that contains some
> > logic that is currently internal to irqdomain.c. Make the helper function
> > globally available so it can be used to map PCI interrupts when running
> > in the root partition.
> >
>
> Thanks, I'll rephrase.
>
> >>
> >> Signed-off-by: Stanislav Kinsburskii <skinsburskii at linux.microsoft.com>
> >> Signed-off-by: Nuno Das Neves <nunodasneves at linux.microsoft.com>
> >> ---
> >> arch/x86/hyperv/irqdomain.c | 47 ++++++++++++++++++++++++---------
> >> arch/x86/include/asm/mshyperv.h | 2 ++
> >> 2 files changed, 36 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> >> index 31f0d29cbc5e..82f3bafb93d6 100644
> >> --- a/arch/x86/hyperv/irqdomain.c
> >> +++ b/arch/x86/hyperv/irqdomain.c
> >> @@ -169,13 +169,40 @@ static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> >> return dev_id;
> >> }
> >>
> >> -static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector,
> >> - struct hv_interrupt_entry *entry)
> >> +/**
> >> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
> >> + * @data: Describes the IRQ
> >> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
> >> + *
> >> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT hypercall.
> >> + */
> >> +int hv_map_msi_interrupt(struct irq_data *data,
> >> + struct hv_interrupt_entry *out_entry)
> >> {
> >> - union hv_device_id device_id = hv_build_pci_dev_id(dev);
> >> + struct msi_desc *msidesc;
> >> + struct pci_dev *dev;
> >> + union hv_device_id device_id;
> >> + struct hv_interrupt_entry dummy;
> >> + struct irq_cfg *cfg = irqd_cfg(data);
> >> + const cpumask_t *affinity;
> >> + int cpu;
> >> + u64 res;
> >>
> >> - return hv_map_interrupt(device_id, false, cpu, vector, entry);
> >> + msidesc = irq_data_get_msi_desc(data);
> >> + dev = msi_desc_to_pci_dev(msidesc);
> >> + device_id = hv_build_pci_dev_id(dev);
> >> + affinity = irq_data_get_effective_affinity_mask(data);
> >> + cpu = cpumask_first_and(affinity, cpu_online_mask);
> >
> > Is the cpus_read_lock held at this point? I'm not sure what the
> > overall calling sequence looks like. If it is not held, the CPU that
> > is selected could go offline before hv_map_interrupt() is called.
> > This computation of the target CPU is the same as in the code
> > before this patch, but that existing code looks like it has the
> > same problem.
> >
>
> Thanks for pointing it out. It *looks* like the read lock is not held
> everywhere this could be called, so it could indeed be a problem.
>
> I've been thinking about different ways around this but I lack the
> knowledge to have an informed opinion about it:
>
> - We could take the CPU hotplug read lock (cpus_read_lock()) in this function. Would that work?
>
> - I'm not actually sure why the code takes the first CPU from the effective
> affinity mask in the first place. It is possible to get the APIC ID (and hence
> the CPU) already associated with the IRQ, as done in e.g. x86_vector_msi_compose_msg().
> Maybe we could get the CPU that way, assuming that doesn't have a similar issue.
>
> - We could just let this race happen, maybe the outcome isn't too catastrophic?
>
> What do you think?

I would have to study further to provide good answers to your questions, as
I don't have deep knowledge of this area off the top of my head. The code
looked suspicious because AND'ing the affinity with the cpu_online_mask in
the first place is presumably meant to prevent assigning the interrupt to a CPU
that is offline. That's a valid intent, since such an assignment would indeed be
problematic.

But as written, the code is inherently racy unless cpus_read_lock() is
held. I'm on vacation all next week and probably won't be able to look at
this again until early July, so the best I can do for now is flag the issue.

Michael
>
> >> +
> >> + res = hv_map_interrupt(device_id, false, cpu, cfg->vector,
> >> + out_entry ? out_entry : &dummy);
> >> + if (!hv_result_success(res))
> >> + pr_err("%s: failed to map interrupt: %s",
> >> + __func__, hv_result_to_string(res));
> >
> > hv_map_interrupt() already outputs a message if the hypercall
> > fails. Is another message needed here?
> >
>
> It does print the function name, which gives some additional context,
> but it can probably just be removed.
>
> >> +
> >> + return hv_result_to_errno(res);
> >
> > The error handling is rather messed up. First, hv_map_interrupt()
> > sometimes returns a Linux errno (not negated), and sometimes a
> > hypercall result. The errno is EINVAL, whose value is 22, the
> > same as the hypercall result HV_STATUS_ACKNOWLEDGED. Second,
> > the hypercall result returned from hv_map_interrupt() is just
> > the result code, not the full 64-bit status, because hv_map_interrupt()
> > has already applied hv_result(status). Hence the "res" input arg to
> > hv_result_to_errno() isn't really the correct input. For example,
> > if the hypercall returns U64_MAX, hv_result_to_errno() won't catch
> > it because the value has already been truncated to
> > 32 bits. Fixing all this will require some unscrambling.
> >
>
> Good point, it's pretty messed up! I think in v2 I'll add a patch to
> clean this up first.
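A small userspace model of the two problems described above. All names are hypothetical; the value 22 for HV_STATUS_ACKNOWLEDGED is taken from the comment, and reducing the status to its result code is modeled as keeping the low 16 bits, an assumption based on hv_result(). When one int carries either a positive errno or a hypercall result code, EINVAL and HV_STATUS_ACKNOWLEDGED become indistinguishable, and once the status has been reduced to its result code, the U64_MAX sentinel can no longer be detected. status_to_errno() is a hypothetical conversion loosely in the spirit of hv_result_to_errno(); its errno choices are assumptions.

```c
/* Userspace model, NOT kernel code. All names and values are assumptions. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_HV_STATUS_SUCCESS      0
#define MODEL_HV_STATUS_ACKNOWLEDGED 22       /* value quoted in the review */
#define MODEL_HV_RESULT_MASK         0xFFFFULL /* assumed result-code bits */

/* Modeled on hv_result(): keep only the result code of a 64-bit status. */
static unsigned int model_hv_result(uint64_t status)
{
	return (unsigned int)(status & MODEL_HV_RESULT_MASK);
}

/* Old convention: one int holds either a positive errno or a result code. */
int old_style_map(int bad_args, uint64_t hv_status)
{
	if (bad_args)
		return EINVAL;                        /* positive errno, 22 on Linux */
	return (int)model_hv_result(hv_status);   /* full status already lost */
}

/* Hypothetical cleaner convention: convert the FULL status exactly once. */
int status_to_errno(uint64_t status)
{
	if (status == UINT64_MAX)   /* sentinel only visible in the full status */
		return -EOPNOTSUPP;
	if (model_hv_result(status) == MODEL_HV_STATUS_SUCCESS)
		return 0;
	return -EIO;                /* generic fallback for any failure code */
}
```

The design point is to keep the full 64-bit status until the single place where it is converted to a negative errno, and to return only negative errnos from there on.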
>
> >> }
> >> +EXPORT_SYMBOL_GPL(hv_map_msi_interrupt);
> >>
> >> static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
> >> {
> >> @@ -190,10 +217,8 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> >> {
> >> struct msi_desc *msidesc;
> >> struct pci_dev *dev;
> >> - struct hv_interrupt_entry out_entry, *stored_entry;
> >> + struct hv_interrupt_entry *stored_entry;
> >> struct irq_cfg *cfg = irqd_cfg(data);
> >> - const cpumask_t *affinity;
> >> - int cpu;
> >> u64 status;
> >>
> >> msidesc = irq_data_get_msi_desc(data);
> >> @@ -204,9 +229,6 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> >> return;
> >> }
> >>
> >> - affinity = irq_data_get_effective_affinity_mask(data);
> >> - cpu = cpumask_first_and(affinity, cpu_online_mask);
> >> -
> >> if (data->chip_data) {
> >> /*
> >> * This interrupt is already mapped. Let's unmap first.
> >> @@ -235,15 +257,14 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> >> return;
> >> }
> >>
> >> - status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry);
> >> + status = hv_map_msi_interrupt(data, stored_entry);
> >> if (status != HV_STATUS_SUCCESS) {
> >
> > hv_map_msi_interrupt() returns an errno, so testing for HV_STATUS_SUCCESS
> > is bogus.
> >
>
> Thanks, noted.
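On the caller side, a sketch of the convention the comment implies: hv_map_msi_interrupt() returns 0 or a negative errno, so the caller can hold the result in an int and test it for nonzero rather than compare it to HV_STATUS_SUCCESS. (Both are 0 on success, so the existing comparison works by accident, but it reads as if a hypercall status were returned.) This userspace model also mirrors the patch's handling of a NULL out_entry via a local dummy; model_map_msi(), model_compose(), struct model_entry, and the vector value 0x30 are all hypothetical.

```c
/* Userspace model, NOT kernel code. All names and values are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_entry {
	unsigned int vector;
	int cpu;
};

/* Stand-in for hv_map_msi_interrupt(): returns 0 or a negative errno. */
int model_map_msi(int fail, struct model_entry *out_entry)
{
	struct model_entry dummy;
	/* out_entry may be NULL; write to a local dummy in that case. */
	struct model_entry *e = out_entry ? out_entry : &dummy;

	if (fail)
		return -EIO;
	e->vector = 0x30;
	e->cpu = 0;
	return 0;
}

/* Caller modeled on hv_irq_compose_msi_msg(): check the errno, free on error. */
struct model_entry *model_compose(int fail)
{
	struct model_entry *stored = malloc(sizeof(*stored));
	int ret;

	if (!stored)
		return NULL;
	ret = model_map_msi(fail, stored);
	if (ret) {              /* any nonzero value is a negative errno */
		free(stored);
		return NULL;
	}
	return stored;
}
```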
>
> >> kfree(stored_entry);
> >> return;
> >> }
> >>
> >> - *stored_entry = out_entry;
> >> data->chip_data = stored_entry;
> >> - entry_to_msi_msg(&out_entry, msg);
> >> + entry_to_msi_msg(data->chip_data, msg);
> >>
> >> return;
> >> }
> >> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> >> index 5ec92e3e2e37..843121465ddd 100644
> >> --- a/arch/x86/include/asm/mshyperv.h
> >> +++ b/arch/x86/include/asm/mshyperv.h
> >> @@ -261,6 +261,8 @@ static inline void hv_apic_init(void) {}
> >>
> >> struct irq_domain *hv_create_pci_msi_domain(void);
> >>
> >> +int hv_map_msi_interrupt(struct irq_data *data,
> >> + struct hv_interrupt_entry *out_entry);
> >> int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> >> struct hv_interrupt_entry *entry);
> >> int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
> >> --
> >> 2.34.1