[PATCH v5 07/14] KVM: ARM: World-switch implementation
Christoffer Dall
c.dall at virtualopensystems.com
Tue Jan 15 23:08:24 EST 2013
On Tue, Jan 15, 2013 at 9:08 PM, Christoffer Dall
<c.dall at virtualopensystems.com> wrote:
> On Tue, Jan 15, 2013 at 4:43 AM, Gleb Natapov <gleb at redhat.com> wrote:
>> On Tue, Jan 08, 2013 at 01:39:24PM -0500, Christoffer Dall wrote:
>>> Provides a complete world-switch implementation to switch to other guests
>>> running in non-secure modes. Includes Hyp exception handlers that
>>> capture the necessary exception information and store it in the
>>> VCPU and KVM structures.
>>>
>>> The following Hyp-ABI is also documented in the code:
>>>
>>> Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
>>> Switching to Hyp mode is done through a simple HVC #0 instruction. The
>>> exception vector code will check that the HVC comes from VMID==0 and if
>>> so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
>>> - r0 contains a pointer to a HYP function
>>> - r1, r2, and r3 contain arguments to the above function.
>>> - The HYP function will be called with its arguments in r0, r1 and r2.
>>> On HYP function return, we return directly to SVC.
>>>
>>> A call to a function executing in Hyp mode is performed like the following:
>>>
>>> <svc code>
>>> ldr r0, =BSYM(my_hyp_fn)
>>> ldr r1, =my_param
>>> hvc #0 ; Call my_hyp_fn(my_param) from HYP mode
>>> <svc code>
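
(For reference, the host-side C entry point for this ABI is the
kvm_call_hyp() helper declared later in this patch; a minimal usage
sketch, reusing the __kvm_flush_vm_context call that also appears in
update_vttbr() below:)

        /*
         * kvm_call_hyp() places the HYP function pointer in r0 and up to
         * three arguments in r1-r3, then issues "hvc #0" as described in
         * the Hyp-ABI above.
         */
        kvm_call_hyp(__kvm_flush_vm_context);
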
>>>
>>> Otherwise, the world-switch is pretty straightforward. All state that
>>> can be modified by the guest is first backed up on the Hyp stack and the
>>> VCPU values are loaded onto the hardware. State that is not loaded, but
>>> is theoretically modifiable by the guest, is protected through the
>>> virtualization features to generate a trap and cause software emulation.
>>> Upon guest return, all state is restored from the hardware onto the VCPU
>>> struct and the original state is restored from the Hyp stack onto the
>>> hardware.
>>>
>>> SMP support uses a VMPIDR calculated from the host MPIDR with the low
>>> bits overridden by the KVM vcpu_id; contributed by Marc Zyngier.
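
(Roughly, in C, that VMPIDR construction amounts to the sketch below;
the exact width of the overridden field is an assumption here, not the
patch's literal code:)

        /* Sketch only: build the VMPIDR value from the host MPIDR and the
         * KVM vcpu_id (the overridden low byte is assumed, not taken from
         * the patch itself). */
        static u32 sketch_vmpidr(struct kvm_vcpu *vcpu)
        {
                u32 mpidr = read_cpuid_mpidr();

                return (mpidr & ~0xffU) | vcpu->vcpu_id; /* low bits <- vcpu_id */
        }
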
>>>
>>> Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
>>> a separate patch into the appropriate patches introducing the
>>> functionality. Note that the VMIDs are stored per VM as required by the ARM
>>> architecture reference manual.
>>>
>>> To support VFP/NEON we trap those instructions using the HCPTR. When
>>> we trap, we switch the FPU. After a guest exit, the VFP state is
>>> returned to the host. When disabling access to floating point
>>> instructions, we also mask FPEXC_EN in order to avoid the guest
>>> receiving Undefined instruction exceptions before we have a chance to
>>> switch back the floating point state. We are reusing vfp_hard_struct,
>>> so we depend on VFPv3 being enabled in the host kernel; if not, we still
>>> trap cp10 and cp11 in order to inject an undefined instruction exception
>>> whenever the guest tries to use VFP/NEON. VFP/NEON support developed by
>>> Antonios Motakis and Rusty Russell.
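
(In C terms the lazy switching described above boils down to the sketch
below; the patch itself programs these registers from the assembly in
interrupts.S, and fmrx()/fmxr() stand for the kernel's VFP access
helpers, so this is illustrative only:)

        /* Sketch: before entering the guest, trap cp10/cp11 (VFP/NEON)
         * accesses so the first guest FP instruction bounces to Hyp,
         * where the VFP state is switched from vfp_host to vfp_guest. */
        u32 hcptr = HCPTR_TCP(10) | HCPTR_TCP(11);   /* value written to HCPTR */

        /* When taking FP access away again, also clear FPEXC_EN so the
         * guest traps to Hyp instead of taking an Undef exception before
         * the host VFP state has been restored. */
        u32 fpexc = fmrx(FPEXC) & ~FPEXC_EN;         /* written back with fmxr() */
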
>>>
>>> Aborts that are permission faults, and not stage-1 page table walk faults,
>>> do not report the faulting address in the HPFAR. We have to resolve the
>>> IPA, and store it just like the HPFAR register on the VCPU struct. If
>>> the IPA cannot be resolved, it means another CPU is playing with the
>>> page tables, and we simply restart the guest. This quirk was fixed by
>>> Marc Zyngier.
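
(A C-flavoured sketch of that handling; resolve_guest_ipa() is a
hypothetical helper standing in for the stage-1 walk the patch performs
in the Hyp vectors, and the return value follows the handle_exit()
convention from arm.c where > 0 means re-enter the guest:)

        /* Sketch, hypothetical helper names: on a Stage-2 permission fault
         * the HPFAR is not valid, so derive the IPA from the faulting VA. */
        if ((vcpu->arch.hsr & HSR_FSC_TYPE) == FSC_PERM) {
                phys_addr_t ipa;

                if (!resolve_guest_ipa(vcpu, vcpu->arch.hxfar, &ipa))
                        return 1; /* stage-1 tables changed under us: restart guest */

                /* store IPA[39:12] in HPFAR format (bits [31:4]) */
                vcpu->arch.hpfar = (ipa >> 8) & HPFAR_MASK;
        }
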
>>>
>>> Reviewed-by: Marcelo Tosatti <mtosatti at redhat.com>
>>> Signed-off-by: Rusty Russell <rusty.russell at linaro.org>
>>> Signed-off-by: Antonios Motakis <a.motakis at virtualopensystems.com>
>>> Signed-off-by: Marc Zyngier <marc.zyngier at arm.com>
>>> Signed-off-by: Christoffer Dall <c.dall at virtualopensystems.com>
>>> ---
>>> arch/arm/include/asm/kvm_arm.h | 51 ++++
>>> arch/arm/include/asm/kvm_host.h | 10 +
>>> arch/arm/kernel/asm-offsets.c | 25 ++
>>> arch/arm/kvm/arm.c | 187 ++++++++++++++++
>>> arch/arm/kvm/interrupts.S | 396 +++++++++++++++++++++++++++++++++++
>>> arch/arm/kvm/interrupts_head.S | 443 +++++++++++++++++++++++++++++++++++++++
>>> 6 files changed, 1108 insertions(+), 4 deletions(-)
>>> create mode 100644 arch/arm/kvm/interrupts_head.S
>>>
>>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>>> index fb22ee8..a3262a2 100644
>>> --- a/arch/arm/include/asm/kvm_arm.h
>>> +++ b/arch/arm/include/asm/kvm_arm.h
>>> @@ -98,6 +98,18 @@
>>> #define TTBCR_T0SZ 3
>>> #define HTCR_MASK (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
>>>
>>> +/* Hyp System Trap Register */
>>> +#define HSTR_T(x) (1 << x)
>>> +#define HSTR_TTEE (1 << 16)
>>> +#define HSTR_TJDBX (1 << 17)
>>> +
>>> +/* Hyp Coprocessor Trap Register */
>>> +#define HCPTR_TCP(x) (1 << x)
>>> +#define HCPTR_TCP_MASK (0x3fff)
>>> +#define HCPTR_TASE (1 << 15)
>>> +#define HCPTR_TTA (1 << 20)
>>> +#define HCPTR_TCPAC (1 << 31)
>>> +
>>> /* Hyp Debug Configuration Register bits */
>>> #define HDCR_TDRA (1 << 11)
>>> #define HDCR_TDOSA (1 << 10)
>>> @@ -144,6 +156,45 @@
>>> #else
>>> #define VTTBR_X (5 - KVM_T0SZ)
>>> #endif
>>> +#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
>>> +#define VTTBR_BADDR_MASK (((1LLU << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
>>> +#define VTTBR_VMID_SHIFT (48LLU)
>>> +#define VTTBR_VMID_MASK (0xffLLU << VTTBR_VMID_SHIFT)
>>> +
>>> +/* Hyp Syndrome Register (HSR) bits */
>>> +#define HSR_EC_SHIFT (26)
>>> +#define HSR_EC (0x3fU << HSR_EC_SHIFT)
>>> +#define HSR_IL (1U << 25)
>>> +#define HSR_ISS (HSR_IL - 1)
>>> +#define HSR_ISV_SHIFT (24)
>>> +#define HSR_ISV (1U << HSR_ISV_SHIFT)
>>> +#define HSR_FSC (0x3f)
>>> +#define HSR_FSC_TYPE (0x3c)
>>> +#define HSR_WNR (1 << 6)
>>> +
>>> +#define FSC_FAULT (0x04)
>>> +#define FSC_PERM (0x0c)
>>> +
>>> +/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
>>> +#define HPFAR_MASK (~0xf)
>>>
>>> +#define HSR_EC_UNKNOWN (0x00)
>>> +#define HSR_EC_WFI (0x01)
>>> +#define HSR_EC_CP15_32 (0x03)
>>> +#define HSR_EC_CP15_64 (0x04)
>>> +#define HSR_EC_CP14_MR (0x05)
>>> +#define HSR_EC_CP14_LS (0x06)
>>> +#define HSR_EC_CP_0_13 (0x07)
>>> +#define HSR_EC_CP10_ID (0x08)
>>> +#define HSR_EC_JAZELLE (0x09)
>>> +#define HSR_EC_BXJ (0x0A)
>>> +#define HSR_EC_CP14_64 (0x0C)
>>> +#define HSR_EC_SVC_HYP (0x11)
>>> +#define HSR_EC_HVC (0x12)
>>> +#define HSR_EC_SMC (0x13)
>>> +#define HSR_EC_IABT (0x20)
>>> +#define HSR_EC_IABT_HYP (0x21)
>>> +#define HSR_EC_DABT (0x24)
>>> +#define HSR_EC_DABT_HYP (0x25)
>>>
>>> #endif /* __ARM_KVM_ARM_H__ */
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index 1de6f0d..ddb09da 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -21,6 +21,7 @@
>>>
>>> #include <asm/kvm.h>
>>> #include <asm/kvm_asm.h>
>>> +#include <asm/fpstate.h>
>>>
>>> #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
>>> #define KVM_USER_MEM_SLOTS 32
>>> @@ -85,6 +86,14 @@ struct kvm_vcpu_arch {
>>> u32 hxfar; /* Hyp Data/Inst Fault Address Register */
>>> u32 hpfar; /* Hyp IPA Fault Address Register */
>>>
>>> + /* Floating point registers (VFP and Advanced SIMD/NEON) */
>>> + struct vfp_hard_struct vfp_guest;
>>> + struct vfp_hard_struct *vfp_host;
>>> +
>>> + /*
>>> + * Anything that is not used directly from assembly code goes
>>> + * here.
>>> + */
>>> /* Interrupt related fields */
>>> u32 irq_lines; /* IRQ and FIQ levels */
>>>
>>> @@ -112,6 +121,7 @@ struct kvm_one_reg;
>>> int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>> int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
>>> u64 kvm_call_hyp(void *hypfn, ...);
>>> +void force_vm_exit(const cpumask_t *mask);
>>>
>>> #define KVM_ARCH_WANT_MMU_NOTIFIER
>>> struct kvm;
>>> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
>>> index c985b48..c8b3272 100644
>>> --- a/arch/arm/kernel/asm-offsets.c
>>> +++ b/arch/arm/kernel/asm-offsets.c
>>> @@ -13,6 +13,9 @@
>>> #include <linux/sched.h>
>>> #include <linux/mm.h>
>>> #include <linux/dma-mapping.h>
>>> +#ifdef CONFIG_KVM_ARM_HOST
>>> +#include <linux/kvm_host.h>
>>> +#endif
>>> #include <asm/cacheflush.h>
>>> #include <asm/glue-df.h>
>>> #include <asm/glue-pf.h>
>>> @@ -146,5 +149,27 @@ int main(void)
>>> DEFINE(DMA_BIDIRECTIONAL, DMA_BIDIRECTIONAL);
>>> DEFINE(DMA_TO_DEVICE, DMA_TO_DEVICE);
>>> DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE);
>>> +#ifdef CONFIG_KVM_ARM_HOST
>>> + DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm));
>>> + DEFINE(VCPU_MIDR, offsetof(struct kvm_vcpu, arch.midr));
>>> + DEFINE(VCPU_CP15, offsetof(struct kvm_vcpu, arch.cp15));
>>> + DEFINE(VCPU_VFP_GUEST, offsetof(struct kvm_vcpu, arch.vfp_guest));
>>> + DEFINE(VCPU_VFP_HOST, offsetof(struct kvm_vcpu, arch.vfp_host));
>>> + DEFINE(VCPU_REGS, offsetof(struct kvm_vcpu, arch.regs));
>>> + DEFINE(VCPU_USR_REGS, offsetof(struct kvm_vcpu, arch.regs.usr_regs));
>>> + DEFINE(VCPU_SVC_REGS, offsetof(struct kvm_vcpu, arch.regs.svc_regs));
>>> + DEFINE(VCPU_ABT_REGS, offsetof(struct kvm_vcpu, arch.regs.abt_regs));
>>> + DEFINE(VCPU_UND_REGS, offsetof(struct kvm_vcpu, arch.regs.und_regs));
>>> + DEFINE(VCPU_IRQ_REGS, offsetof(struct kvm_vcpu, arch.regs.irq_regs));
>>> + DEFINE(VCPU_FIQ_REGS, offsetof(struct kvm_vcpu, arch.regs.fiq_regs));
>>> + DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc));
>>> + DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr));
>>> + DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines));
>>> + DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.hsr));
>>> + DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.hxfar));
>>> + DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.hpfar));
>>> + DEFINE(VCPU_HYP_PC, offsetof(struct kvm_vcpu, arch.hyp_pc));
>>> + DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr));
>>> +#endif
>>> return 0;
>>> }
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 9b4566e..c94d278 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -40,6 +40,7 @@
>>> #include <asm/kvm_arm.h>
>>> #include <asm/kvm_asm.h>
>>> #include <asm/kvm_mmu.h>
>>> +#include <asm/kvm_emulate.h>
>>>
>>> #ifdef REQUIRES_VIRT
>>> __asm__(".arch_extension virt");
>>> @@ -49,6 +50,10 @@ static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
>>> static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
>>> static unsigned long hyp_default_vectors;
>>>
>>> +/* The VMID used in the VTTBR */
>>> +static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
>>> +static u8 kvm_next_vmid;
>>> +static DEFINE_SPINLOCK(kvm_vmid_lock);
>>>
>>> int kvm_arch_hardware_enable(void *garbage)
>>> {
>>> @@ -276,6 +281,8 @@ int __attribute_const__ kvm_target_cpu(void)
>>>
>>> int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>> {
>>> + /* Force users to call KVM_ARM_VCPU_INIT */
>>> + vcpu->arch.target = -1;
>>> return 0;
>>> }
>>>
>>> @@ -286,6 +293,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>>> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>> {
>>> vcpu->cpu = cpu;
>>> + vcpu->arch.vfp_host = this_cpu_ptr(kvm_host_vfp_state);
>>> }
>>>
>>> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>> @@ -318,12 +326,189 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>>>
>>> int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
>> As far as I can see, the function is unused.
>>
>>> {
>>> + return v->mode == IN_GUEST_MODE;
>>> +}
>>> +
>>> +/* Just ensure a guest exit from a particular CPU */
>>> +static void exit_vm_noop(void *info)
>>> +{
>>> +}
>>> +
>>> +void force_vm_exit(const cpumask_t *mask)
>>> +{
>>> + smp_call_function_many(mask, exit_vm_noop, NULL, true);
>>> +}
>> There is make_all_cpus_request() for that. It actually sends IPIs only
>> to cpus that are running vcpus.
>>
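
(For reference, the generic helper Gleb points at is what the common
code's remote-TLB-flush path is built on; a sketch of how arch code
reaches it today, via its exported wrapper rather than directly:)

        /*
         * kvm_flush_remote_tlbs() raises KVM_REQ_TLB_FLUSH on every vcpu
         * and, through make_all_cpus_request(), IPIs only the cpus that
         * are actually running a vcpu in guest mode.
         */
        kvm_flush_remote_tlbs(kvm);
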
>>> +
>>> +/**
>>> + * need_new_vmid_gen - check that the VMID is still valid
>>> + * @kvm: The VM's VMID to check
>>> + *
>>> + * return true if there is a new generation of VMIDs being used
>>> + *
>>> + * The hardware supports only 256 values with the value zero reserved for the
>>> + * host, so we check if an assigned value belongs to a previous generation,
>>> + * which requires us to assign a new value. If we're the first to use a
>>> + * VMID for the new generation, we must flush necessary caches and TLBs on all
>>> + * CPUs.
>>> + */
>>> +static bool need_new_vmid_gen(struct kvm *kvm)
>>> +{
>>> + return unlikely(kvm->arch.vmid_gen != atomic64_read(&kvm_vmid_gen));
>>> +}
>>> +
>>> +/**
>>> + * update_vttbr - Update the VTTBR with a valid VMID before the guest runs
>>> + * @kvm The guest that we are about to run
>>> + *
>>> + * Called from kvm_arch_vcpu_ioctl_run before entering the guest to ensure the
>>> + * VM has a valid VMID, otherwise assigns a new one and flushes corresponding
>>> + * caches and TLBs.
>>> + */
>>> +static void update_vttbr(struct kvm *kvm)
>>> +{
>>> + phys_addr_t pgd_phys;
>>> + u64 vmid;
>>> +
>>> + if (!need_new_vmid_gen(kvm))
>>> + return;
>>> +
>>> + spin_lock(&kvm_vmid_lock);
>>> +
>>> + /*
>>> + * We need to re-check the vmid_gen here to ensure that if another vcpu
>>> + * already allocated a valid vmid for this vm, then this vcpu should
>>> + * use the same vmid.
>>> + */
>>> + if (!need_new_vmid_gen(kvm)) {
>>> + spin_unlock(&kvm_vmid_lock);
>>> + return;
>>> + }
>>> +
>>> + /* First user of a new VMID generation? */
>>> + if (unlikely(kvm_next_vmid == 0)) {
>>> + atomic64_inc(&kvm_vmid_gen);
>>> + kvm_next_vmid = 1;
>>> +
>>> + /*
>>> + * On SMP we know no other CPUs can use this CPU's or each
>>> + * other's VMID after force_vm_exit returns since the
>>> + * kvm_vmid_lock blocks them from reentry to the guest.
>>> + */
>>> + force_vm_exit(cpu_all_mask);
>>> + /*
>>> + * Now broadcast TLB + ICACHE invalidation over the inner
>>> + * shareable domain to make sure all data structures are
>>> + * clean.
>>> + */
>>> + kvm_call_hyp(__kvm_flush_vm_context);
>>> + }
>>> +
>>> + kvm->arch.vmid_gen = atomic64_read(&kvm_vmid_gen);
>>> + kvm->arch.vmid = kvm_next_vmid;
>>> + kvm_next_vmid++;
>>> +
>>> + /* update vttbr to be used with the new vmid */
>>> + pgd_phys = virt_to_phys(kvm->arch.pgd);
>>> + vmid = ((u64)(kvm->arch.vmid) << VTTBR_VMID_SHIFT) & VTTBR_VMID_MASK;
>>> + kvm->arch.vttbr = pgd_phys & VTTBR_BADDR_MASK;
>>> + kvm->arch.vttbr |= vmid;
>>> +
>>> + spin_unlock(&kvm_vmid_lock);
>>> +}
>>> +
>>> +/*
>>> + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>> + * proper exit to QEMU.
>>> + */
>>> +static int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>>> + int exception_index)
>>> +{
>>> + run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
>>> return 0;
>>> }
>>>
>>> +/**
>>> + * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
>>> + * @vcpu: The VCPU pointer
>>> + * @run: The kvm_run structure pointer used for userspace state exchange
>>> + *
>>> + * This function is called through the VCPU_RUN ioctl called from user space. It
>>> + * will execute VM code in a loop until the time slice for the process is used
>>> + * or some emulation is needed from user space in which case the function will
>>> + * return with return value 0 and with the kvm_run structure filled in with the
>>> + * required data for the requested emulation.
>>> + */
>>> int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>> {
>>> - return -EINVAL;
>>> + int ret;
>>> + sigset_t sigsaved;
>>> +
>>> + /* Make sure they initialize the vcpu with KVM_ARM_VCPU_INIT */
>>> + if (unlikely(vcpu->arch.target < 0))
>>> + return -ENOEXEC;
>>> +
>>> + if (vcpu->sigset_active)
>>> + sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
>>> +
>>> + ret = 1;
>>> + run->exit_reason = KVM_EXIT_UNKNOWN;
>>> + while (ret > 0) {
>>> + /*
>>> + * Check conditions before entering the guest
>>> + */
>>> + cond_resched();
>>> +
>>> + update_vttbr(vcpu->kvm);
>>> +
>>> + local_irq_disable();
>>> +
>>> + /*
>>> + * Re-check atomic conditions
>>> + */
>>> + if (signal_pending(current)) {
>>> + ret = -EINTR;
>>> + run->exit_reason = KVM_EXIT_INTR;
>>> + }
>>> +
>>> + if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>> + local_irq_enable();
>>> + continue;
>>> + }
>>> +
>>> + /**************************************************************
>>> + * Enter the guest
>>> + */
>>> + trace_kvm_entry(*vcpu_pc(vcpu));
>>> + kvm_guest_enter();
>>> + vcpu->mode = IN_GUEST_MODE;
>> You need to set mode to IN_GUEST_MODE before disabling interrupts and
>> check that mode != EXITING_GUEST_MODE after disabling interrupts but
>> before entering the guest. This way you will catch kicks that were sent
>> between setting the mode and disabling the interrupts. Also you need
>> to check vcpu->requests and exit if it is not empty. I see that you do
>> not use vcpu->requests at all, but you should, since common kvm code
>> assumes that it is used. make_all_cpus_request() uses it for instance.
>>
>
> I don't quite agree, but almost:
>
> Why would you set IN_GUEST_MODE before disabling interrupts? The only
> reason I can see for it to be a requirement is to leverage an implicit
> memory barrier. Receiving the IPI in this little window does nothing
> (the smp_cross_call is a noop).
>
> Checking that mode != EXITING_GUEST_MODE is equally useless in my
> opinion; as I read the requests code, the only reason for this mode is
> to avoid sending an IPI twice.
>
> Kicks sent between setting the mode and disabling the interrupts are
> not the point; the point is to check the requests field (which we
> don't use at all on ARM, and generic code also doesn't use on ARM)
> after disabling interrupts and after setting IN_GUEST_MODE.
>
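
(For reference, the pattern being discussed looks roughly like the
sketch below; the relative order of disabling interrupts and setting
IN_GUEST_MODE is exactly the part under debate, the invariant being
that the request/signal check comes after both and before the
world-switch:)

        local_irq_disable();
        vcpu->mode = IN_GUEST_MODE;
        smp_mb();               /* pairs with the barrier on the kick/request side */

        if (vcpu->requests || signal_pending(current)) {
                vcpu->mode = OUTSIDE_GUEST_MODE;
                local_irq_enable();
                continue;       /* back around the run loop, do not enter the guest */
        }

        /* ... world-switch into the guest ... */
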
> The patch below fixes your issues, and while I would push back on
> anything other than direct bug fixes at this point, the current code is
> semantically incorrect wrt. KVM vcpu requests, so it's worth a fix,
> and the patch itself is trivial.
>
[...]
Actually, I take that back: the kvm_vcpu_block function does make a
request, which we don't need to handle, so adding code that checks for
features we don't support is useless at this point. Please ignore the
patch I sent earlier.
Later on we can change some of the code to use the vcpu->features map
if there's a real benefit, but right now the priority is to merge this
code, so anything that's not a bugfix should not go in.
The srcu lock is a real bug though, and should be fixed.
-Christoffer