[PATCH] kvm: fix gpu passthrough into vm on arm64

Wed Mar 23 03:12:15 PDT 2022

Please use my *working* email address (look in the MAINTAINERS file for 
the up-to-date one).

On 2022-03-23 01:25, xieming wrote:
> 1) when passing through some PCIe devices, such as AMD GPUs,
>         kvm will report an "Unsupported FSC:" error.

Please detail this. What values of FSC? In what circumstances?

> 
> 2) the main reason is that kvm sets the memory type to
>    PAGE_S2_DEVICE (Device-nGnRE), but in the guest OS, all device I/O
>    memory (including the gpu driver's TTM memory) is mapped as
>    MT_NORMAL_NC when ioremapping.
> 
> 3) according to the ARM64 stage1 & stage2 combining rules,
>    the memory type attributes are ordered as follows:
>    Normal-WB < Normal-WT < Normal-NC < Device-GRE < Device-nGRE <
>    Device-nGnRE < Device-nGnRnE
>    Normal-WB is the weakest, Device-nGnRnE is the strongest.
> 
>    Refer to the 'Arm Architecture Reference Manual Armv8,
>    for Armv8-A architecture profile' pdf, chapter B2.8, and to the
>    'ARM System Memory Management Unit Architecture
>    Specification SMMU architecture version 3.0 and version 3.1' pdf,
>    chapter 13.1.5.
> 
> 4) therefore, setting the I/O memory attribute of the VM to
>    Device-nGnRE is a big mistake: it forces all device memory accesses
>    in the virtual machine to be aligned.
> 
>    To summarize: the stage2 memory type cannot be stronger than stage1 in
>    the arm64 architecture.

How do you suggest KVM finds out about what the guest wants and
what the device supports?
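
For reference, the combining rule quoted in point 3 can be modelled as
"the stronger of the two attributes wins". This is only a minimal sketch:
the enum and function names are made up for illustration (they are not
kernel identifiers), and it assumes the ordering listed in the patch with
no stage-2 forced write-back (FEAT_S2FWB) in play.

```c
#include <assert.h>

/*
 * Hypothetical model, not kernel code: memory types ordered from
 * weakest to strongest, exactly as listed in the quoted patch.
 */
enum mem_type_model {
	MT_MODEL_NORMAL_WB,
	MT_MODEL_NORMAL_WT,
	MT_MODEL_NORMAL_NC,
	MT_MODEL_DEVICE_GRE,
	MT_MODEL_DEVICE_nGRE,
	MT_MODEL_DEVICE_nGnRE,
	MT_MODEL_DEVICE_nGnRnE,
};

/* The effective attribute is the stronger of stage 1 and stage 2. */
static inline enum mem_type_model
combine_s1_s2(enum mem_type_model s1, enum mem_type_model s2)
{
	return s1 > s2 ? s1 : s2;
}

/* The case from the patch: guest asks for Normal-NC, stage 2 is Device-nGnRE. */
static inline void check_passthrough_case(void)
{
	assert(combine_s1_s2(MT_MODEL_NORMAL_NC, MT_MODEL_DEVICE_nGnRE) ==
	       MT_MODEL_DEVICE_nGnRE);
}
```

Under this model the guest's Normal-NC request is overridden by the
stage-2 Device-nGnRE mapping, which is the behaviour the patch description
is complaining about.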

> 
> Signed-off-by: xieming <xieming at kylinos.cn>
> ---
>  virt/kvm/arm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 11103b75c596..9b7fb13f4546 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1209,7 +1209,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  	pfn = __phys_to_pfn(pa);
> 
>  	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
> -		pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE);
> +		pte_t pte = pfn_pte(pfn, PAGE_S2);
> 
>  		if (writable)
>  			pte = kvm_s2pte_mkwrite(pte);

No, this cannot be a blanket change. This means that the
guest will be able to obtain a cacheable mapping on devices,
allow reordering, and other things that are likely to *break*
the system. You also have no business calling this function
outside of KVM.

You are asking us to trust the guest. There is no way this
is acceptable. If the device supports NORMAL_NC, this should
be known by the host kernel and exposed to KVM.
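
To illustrate the shape of what I mean, here is a rough sketch in which
the stage-2 type is chosen from information owned by the host, and the
default stays Device. Everything in it is hypothetical: the struct, the
flag and the function do not exist in the kernel, and how the host would
actually learn that a region tolerates Normal-NC is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
enum s2_type_model {
	S2_MODEL_DEVICE_nGnRE,
	S2_MODEL_NORMAL_NC,
};

/* Assumed per-region description, filled in by the host, never by the guest. */
struct host_region_info {
	bool host_allows_normal_nc;
};

/*
 * Pick the stage-2 memory type. Without explicit host knowledge the
 * mapping stays Device-nGnRE, so the guest cannot relax it on its own.
 */
static inline enum s2_type_model
pick_s2_type(const struct host_region_info *info)
{
	if (info && info->host_allows_normal_nc)
		return S2_MODEL_NORMAL_NC;

	return S2_MODEL_DEVICE_nGnRE;
}
```

The point of the sketch is only the direction of trust: the relaxed
attribute is opt-in per region and decided by the host, instead of being a
blanket change to every ioremap'd range.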

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...