[PATCH] KVM: arm64: Unregister HYP sections from kmemleak in protected mode

Thu Jul 29 09:42:15 PDT 2021

On Thu, Jul 29, 2021 at 02:50:16PM +0100, Marc Zyngier wrote:
> Booting a KVM host in protected mode with kmemleak quickly results
> in a pretty bad crash, as kmemleak doesn't know that the HYP sections
> have been taken away.
> 
> Make the unregistration from kmemleak part of marking the sections
> as HYP-private. The rest of the HYP-specific data is obtained via
> the page allocator, which is not subjected to kmemleak.
> 
> Fixes: 90134ac9cabb ("KVM: arm64: Protect the .hyp sections from the host")
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> Cc: Quentin Perret <qperret at google.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: stable at vger.kernel.org # 5.13
> ---
>  arch/arm64/kvm/arm.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e9a2b8f27792..23f12e602878 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -15,6 +15,7 @@
>  #include <linux/fs.h>
>  #include <linux/mman.h>
>  #include <linux/sched.h>
> +#include <linux/kmemleak.h>
>  #include <linux/kvm.h>
>  #include <linux/kvm_irqfd.h>
>  #include <linux/irqbypass.h>
> @@ -1960,8 +1961,12 @@ static inline int pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
>  }
>  
>  #define pkvm_mark_hyp_section(__section)		\
> +({							\
> +	u64 sz = __section##_end - __section##_start;	\
> +	kmemleak_free_part(__section##_start, sz);	\
>  	pkvm_mark_hyp(__pa_symbol(__section##_start),	\
> -			__pa_symbol(__section##_end))
> +		      __pa_symbol(__section##_end));	\
> +})

Using kmemleak_free_part() is fine in principle as this is not a slab
object. However, the above would call the function even for ranges that
are not tracked at all by kmemleak (text, idmap). Luckily, kmemleak won't
complain unless you #define DEBUG in the file (initially I tried to
warn all the time but I couldn't fix all the callbacks).
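
To illustrate the point about untracked ranges, here is a minimal
userspace sketch. It is not the kernel's kmemleak code: the model_*
names, the fixed-size table and the choice to drop the whole containing
range (rather than split it, as a partial free would) are all
assumptions made for illustration. It only shows that a "free part"
request on a range that was never registered finds nothing and leaves
the tracked state untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's kmemleak implementation. */
struct tracked_range {
	uintptr_t start;
	size_t size;
	bool live;
};

#define MAX_TRACKED 8
static struct tracked_range tracked[MAX_TRACKED];
static size_t nr_tracked;

/* Register a range with the model, e.g. a data or BSS region. */
static bool model_track(uintptr_t start, size_t size)
{
	if (nr_tracked == MAX_TRACKED)
		return false;
	tracked[nr_tracked].start = start;
	tracked[nr_tracked].size = size;
	tracked[nr_tracked].live = true;
	nr_tracked++;
	return true;
}

/* Return true if addr falls inside a range the model still tracks. */
static bool model_is_tracked(uintptr_t addr)
{
	for (size_t i = 0; i < nr_tracked; i++) {
		const struct tracked_range *r = &tracked[i];

		if (r->live && addr >= r->start && addr < r->start + r->size)
			return true;
	}
	return false;
}

/*
 * Model of a "free part" request. Returns true if a tracked range
 * containing [start, start + size) was found and dropped, false if the
 * range was never tracked. Simplification: the whole containing range
 * is dropped instead of being split around the freed part.
 */
static bool model_free_part(uintptr_t start, size_t size)
{
	assert(size > 0);

	for (size_t i = 0; i < nr_tracked; i++) {
		struct tracked_range *r = &tracked[i];

		if (r->live && start >= r->start &&
		    start + size <= r->start + r->size) {
			r->live = false;
			return true;
		}
	}
	/* Untracked range (think text, idmap): nothing to do, no complaint. */
	return false;
}
```

In this model, calling model_free_part() on an address range that was
never passed to model_track() simply returns false, which is the
harmless case described above for the text and idmap sections.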

If it were just the BSS, I would move the kmemleak_free_part() call to
finalize_hyp_mode(), but there's the __hyp_rodata section as well.

I think we have some inconsistency with .hyp.rodata which falls under
_sdata.._edata while the kernel's own .rodata goes immediately after
_etext. Should we move __hyp_rodata outside _sdata.._edata as well? It
would benefit from the RO NX marking (probably more useful without the
protected mode). If this works, we'd only need to call kmemleak once for
the BSS.

-- 
Catalin