[PATCH 3/4] x86: hyperv: Expose hv_map_msi_interrupt function

Wed Jun 18 14:08:26 PDT 2025

On 6/11/2025 4:07 PM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves at linux.microsoft.com> Sent: Tuesday, June 10, 2025 4:52 PM
>>
>> From: Stanislav Kinsburskii <skinsburskii at linux.microsoft.com>
> 
> The preferred patch Subject prefix is "x86/hyperv:"
> 

Thank you for clarifying - I thought I saw some precedent for x86: hyperv:
but must have been mistaken.

>>
>> This patch moves a part of currently internal logic into the
>> hv_map_msi_interrupt function and makes it globally available helper
>> function, which will be used to map PCI interrupts in case of root
>> partition.
> 
> Avoid "this patch" in commit messages.  Suggest:
> 
> Create a helper function hv_map_msi_interrupt() that contains some
> logic that is currently internal to irqdomain.c. Make the helper function
> globally available so it can be used to map PCI interrupts when running
> in the root partition.
> 

Thanks, I'll rephrase.

>>
>> Signed-off-by: Stanislav Kinsburskii <skinsburskii at linux.microsoft.com>
>> Signed-off-by: Nuno Das Neves <nunodasneves at linux.microsoft.com>
>> ---
>>  arch/x86/hyperv/irqdomain.c     | 47 ++++++++++++++++++++++++---------
>>  arch/x86/include/asm/mshyperv.h |  2 ++
>>  2 files changed, 36 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
>> index 31f0d29cbc5e..82f3bafb93d6 100644
>> --- a/arch/x86/hyperv/irqdomain.c
>> +++ b/arch/x86/hyperv/irqdomain.c
>> @@ -169,13 +169,40 @@ static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
>>  	return dev_id;
>>  }
>>
>> -static int hv_map_msi_interrupt(struct pci_dev *dev, int cpu, int vector,
>> -				struct hv_interrupt_entry *entry)
>> +/**
>> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
>> + * @data:      Describes the IRQ
>> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
>> + *
>> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT hypercall.
>> + */
>> +int hv_map_msi_interrupt(struct irq_data *data,
>> +			 struct hv_interrupt_entry *out_entry)
>>  {
>> -	union hv_device_id device_id = hv_build_pci_dev_id(dev);
>> +	struct msi_desc *msidesc;
>> +	struct pci_dev *dev;
>> +	union hv_device_id device_id;
>> +	struct hv_interrupt_entry dummy;
>> +	struct irq_cfg *cfg = irqd_cfg(data);
>> +	const cpumask_t *affinity;
>> +	int cpu;
>> +	u64 res;
>>
>> -	return hv_map_interrupt(device_id, false, cpu, vector, entry);
>> +	msidesc = irq_data_get_msi_desc(data);
>> +	dev = msi_desc_to_pci_dev(msidesc);
>> +	device_id = hv_build_pci_dev_id(dev);
>> +	affinity = irq_data_get_effective_affinity_mask(data);
>> +	cpu = cpumask_first_and(affinity, cpu_online_mask);
> 
> Is the cpus_read_lock held at this point? I'm not sure what the
> overall calling sequence looks like. If it is not held, the CPU that
> is selected could go offline before hv_map_interrupt() is called.
> This computation of the target CPU is the same as in the code
> before this patch, but that existing code looks like it has the
> same problem.
> 

Thanks for pointing it out - It *looks* like the read lock is not held
everywhere this could be called, so it could indeed be a problem.

I've been thinking about different ways around this but I lack the
knowledge to have an informed opinion about it:

- We could take the cpu read lock in this function, would that work?

- I'm not actually sure why the code is getting the first cpu off the effective
  affinity mask in the first place. It is possible to get the apic id (and hence
  the cpu) already associated with the irq, as per e.g. x86_vector_msi_compose_msg()
  Maybe we could get the cpu that way, assuming that doesn't have a similar issue.

- We could just let this race happen, maybe the outcome isn't too catastrophic?

What do you think?

>> +
>> +	res = hv_map_interrupt(device_id, false, cpu, cfg->vector,
>> +			       out_entry ? out_entry : &dummy);
>> +	if (!hv_result_success(res))
>> +		pr_err("%s: failed to map interrupt: %s",
>> +		       __func__, hv_result_to_string(res));
> 
> hv_map_interrupt() already outputs a message if the hypercall
> fails. Is another message needed here?
> 

It does print the function name, which gives additional context.
Probably it can just be removed however.

>> +
>> +	return hv_result_to_errno(res);
> 
> The error handling is rather messed up. First hv_map_interrupt()
> sometimes returns a Linux errno (not negated), and sometimes a
> hypercall result. The errno is EINVAL, which has value "22", which is
> the same as hypercall result HV_STATUS_ACKNOWLEDGED. And
> the hypercall result returned from hv_map_interrupt() is just
> the result code, not the full 64-bit status, as hv_map_interrupt()
> has already done hv_result(status). Hence the "res" input arg to
> hv_result_to_errno() isn't really the correct input. For example,
> if the hypercall returns U64_MAX, that won't be caught by
> hv_result_to_errno() since the value has been truncated to
> 32 bits. Fixing all this will require some unscrambling.
> 

Good point, it's pretty messed up! I think in v2 I'll add a patch to
clean this up first.

>>  }
>> +EXPORT_SYMBOL_GPL(hv_map_msi_interrupt);
>>
>>  static inline void entry_to_msi_msg(struct hv_interrupt_entry *entry, struct msi_msg *msg)
>>  {
>> @@ -190,10 +217,8 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>>  {
>>  	struct msi_desc *msidesc;
>>  	struct pci_dev *dev;
>> -	struct hv_interrupt_entry out_entry, *stored_entry;
>> +	struct hv_interrupt_entry *stored_entry;
>>  	struct irq_cfg *cfg = irqd_cfg(data);
>> -	const cpumask_t *affinity;
>> -	int cpu;
>>  	u64 status;
>>
>>  	msidesc = irq_data_get_msi_desc(data);
>> @@ -204,9 +229,6 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>>  		return;
>>  	}
>>
>> -	affinity = irq_data_get_effective_affinity_mask(data);
>> -	cpu = cpumask_first_and(affinity, cpu_online_mask);
>> -
>>  	if (data->chip_data) {
>>  		/*
>>  		 * This interrupt is already mapped. Let's unmap first.
>> @@ -235,15 +257,14 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>>  		return;
>>  	}
>>
>> -	status = hv_map_msi_interrupt(dev, cpu, cfg->vector, &out_entry);
>> +	status = hv_map_msi_interrupt(data, stored_entry);
>>  	if (status != HV_STATUS_SUCCESS) {
> 
> hv_map_msi_interrupt() returns an errno, so testing for HV_STATUS_SUCCESS
> is bogus.
> 

Thanks, noted.

>>  		kfree(stored_entry);
>>  		return;
>>  	}
>>
>> -	*stored_entry = out_entry;
>>  	data->chip_data = stored_entry;
>> -	entry_to_msi_msg(&out_entry, msg);
>> +	entry_to_msi_msg(data->chip_data, msg);
>>
>>  	return;
>>  }
>> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
>> index 5ec92e3e2e37..843121465ddd 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -261,6 +261,8 @@ static inline void hv_apic_init(void) {}
>>
>>  struct irq_domain *hv_create_pci_msi_domain(void);
>>
>> +int hv_map_msi_interrupt(struct irq_data *data,
>> +			 struct hv_interrupt_entry *out_entry);
>>  int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
>>  		struct hv_interrupt_entry *entry);
>>  int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
>> --
>> 2.34.1