[RFC 4/7] arm64: irqflags: Use ICC sysregs to implement IRQ masking

Fri Nov 3 07:30:16 PDT 2017

On 03/11/17 11:57, Julien Thierry wrote:
> 
> 
> On 17/10/17 10:12, Christoffer Dall wrote:
>> On Tue, Oct 17, 2017 at 09:36:51AM +0100, Julien Thierry wrote:
>>>
>>>
>>> On 16/10/17 22:18, Christoffer Dall wrote:
>>>> On Wed, Oct 11, 2017 at 02:00:59PM +0100, Julien Thierry wrote:
>>>>> From: Daniel Thompson <daniel.thompson at linaro.org>
>>>>>
>>>>> Currently irqflags is implemented using the PSR's I bit. It is possible
>>>>> to implement irqflags by using the co-processor interface to the GIC.
>>>>> Using the co-processor interface makes it feasible to simulate NMIs
>>>>> using GIC interrupt prioritization.
>>>>>
>>>>> This patch changes the irqflags macros to modify, save and restore
>>>>> ICC_PMR_EL1. This has a substantial knock on effect for the rest of
>>>>> the kernel. There are four reasons for this:
>>>>>
>>>>> 1. The state of the PMR becomes part of the interrupt context and must be
>>>>>     saved and restored during exceptions. It is saved on the stack as part
>>>>>     of the saved context when an interrupt/exception is taken.
>>>>>
>>>>> 2. The hardware automatically masks the I bit (at boot, during traps, etc).
>>>>>     When the I bit is set by hardware we must add code to switch from I
>>>>>     bit masking and PMR masking:
>>>>>         - For IRQs, this is done after the interrupt has been acknowledged
>>>>>         	 avoiding the need to unmask.
>>>>>         - For other exceptions, this is done right after saving the context.
>>>>>
>>>>> 3. Some instructions, such as wfi, require that the PMR not be used
>>>>>     for interrupt masking. Before calling these instructions we must
>>>>>     switch from PMR masking to I bit masking.
>>>>>     This is also the case when KVM runs a guest, if the CPU receives
>>>>>     an interrupt from the host, interrupts must not be masked in PMR
>>>>>     otherwise the GIC will not signal it to the CPU.
>>>>>
>>>>> 4. We use the alternatives system to allow a single kernel to boot and
>>>>>     be switched to the alternative masking approach at runtime.
>>>>>
>>>>> Signed-off-by: Daniel Thompson <daniel.thompson at linaro.org>
>>>>> [julien.thierry at arm.com: changes reflected in commit,
>>>>> 			 message, fixes, renaming]
>>>>> Signed-off-by: Julien Thierry <julien.thierry at arm.com>
>>>>> Cc: Catalin Marinas <catalin.marinas at arm.com>
>>>>> Cc: Will Deacon <will.deacon at arm.com>
>>>>> Cc: Christoffer Dall <christoffer.dall at linaro.org>
>>>>> Cc: Marc Zyngier <marc.zyngier at arm.com>
>>>>> Cc: Thomas Gleixner <tglx at linutronix.de>
>>>>> Cc: Jason Cooper <jason at lakedaemon.net>
>>>> ---
>>>>
>>>> I just gave a quick look to the KVM part here but didn't get into
>>>> whether this rather invasive change is warranted or not (it's definitely
>>>> entertaining though).
>>>>
>>>
>>> I appreciate you looking into this, it wasn't very clear to me how to deal
>>> with that for KVM.
>>>
>>>> [...]
>>>>
>>>>
>>>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>>>> index e923b58..070e8a5 100644
>>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>>> @@ -38,6 +38,10 @@
>>>>>   #include <kvm/arm_arch_timer.h>
>>>>>   #include <kvm/arm_pmu.h>
>>>>>
>>>>> +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
>>>>> +#include <asm/arch_gicv3.h>
>>>>> +#endif
>>>>> +
>>>>>   #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>>>>>
>>>>>   #define KVM_VCPU_MAX_FEATURES 4
>>>>> @@ -332,7 +336,30 @@ int kvm_unmap_hva_range(struct kvm *kvm,
>>>>>   void kvm_arm_resume_guest(struct kvm *kvm);
>>>>>
>>>>>   u64 __kvm_call_hyp(void *hypfn, ...);
>>>>> +
>>>>> +#ifdef CONFIG_USE_ICC_SYSREGS_FOR_IRQFLAGS
>>>>> +/*
>>>>> + * If we use PMR to disable interrupts, interrupts happening while the
>>>>
>>>> I would describe this as 'masking' interrupts, as opposed to 'disabling'
>>>> interrupts, in general, and probably we can be more precise by
>>>> categorizing interrupts as 'normal' vs. 'priority (NMIs)' or something
>>>> like that.
>>>>
>>>
>>> Right, I'll do that.
>>>
>>>>> + * guest is executing will not be signaled to the host by the GIC  (and
>>>>
>>>> signaled to the CPU.  (Whether this then causes an exception to EL2
>>>> depends on the HCR_EL2.IMO flag and what KVM does with that exception).
>>>>
>>>
>>> True, I'll fix that.
>>>
>>>>> + * any call to a waiting kvm_kick_many_cpus will likely cause a
>>>>> + * deadlock).
>>>>
>>>> This kick thing is an unccessary specific example to bring out here, I
>>>> don't think it adds to the general understanding.
>>>>
>>>
>>> Yes, makes sense.
>>>
>>>>> + * We need to switch back to using PSR.I.
>>>>> + */
>>>>> +#define kvm_call_hyp(f, ...)						\
>>>>> +	({								\
>>>>> +		u64 res;						\
>>>>> +		unsigned long flags;					\
>>>>> +									\
>>>>> +		flags = arch_local_irq_save();				\
>>>>> +		gic_end_pmr_masking();					\
>>>>> +		res = __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__);	\
>>>>> +		gic_start_pmr_masking();				\
>>>>> +		arch_local_irq_restore(flags);				\
>>>>> +		res;							\
>>>>
>>>> This is in the critical path, so ideally this could instead be done
>>>> inside the function, because the trap will set PSTATE.I already, so it
>>>> should be possible to reduce this to a set of save/clear/restore
>>>> operations after __kvm_call_hyp.
>>>
>>> Hmmm, you are talking about the function called by __kvm_call_hyp, right?
>>>
>>> But this means we'll need to add the save/clear/restore to each function
>>> that can be called by __kvm_call_hyp, no?
>>
>> No, you only need to do this in the case where you run the guedt,
>> __kvm_vcpu_run, because all the other callers don't need to be able to
>> take interrupts as they are under control of the host.
>>
>>>
>>> Also in the VHE case we don't have the trap setting the PSTATE.I (the code
>>> calling kvm_call_hyp disables interrupts before entering the guest, but now
>>> the function disabling interrupts is just masking them with PMR).
>>>
>>
>> Yes, for VHE you'd have to do something more.  One option would be to
>> disable interrupts using PSTATE.I in kvm_arch_vcpu_ioctl_run() for VHE,
>> another option is to simply mask interrupts in the PSTATE only for VHE
>> in __kvm_vcpu_run.
>>
> 
> I agree this should work, but I wonder whether it is worth splitting the 
> different places we have to deal with that.
> 
> Do you think there is good performance improvement to be had here?
> 
> Marc, what's your opinion on saving/restoring the PMR state in the hyp 
> run code, relying on the I bit being always set before hand by hvc or 
> explicitly when in VHE?

One thing I'm not overly keen on is to multiply the locations where we
reconfigure the interrupt delivery, because it is overly fragile.

If I have followed what should happen, we have the following situation:

- pre-VHE: after HVC, unmask the PMR so that we can get interrupts when
the guest is running. Restore it to its normal value before returning to
the host

- VHE: Disable interrupts (mimicking the HVC), unmask PMR. Do the
opposite on return.

At the moment, you do a bit too much, but that's because we only have a
single entry point (__kvm_call_hyp). With Christoffer's VHE rework, we
get a different entry point per architecture, probably making it simpler
to hack something there.

You can have a look at those patches and see if that would make things
easier...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...