[PATCH v1 2/2] KVM: arm64: allow the VM to select DEVICE_* and NORMAL_NC for IO memory

Lorenzo Pieralisi lpieralisi at kernel.org
Mon Sep 11 07:57:51 PDT 2023


On Thu, Sep 07, 2023 at 11:14:59AM -0700, ankita at nvidia.com wrote:
> From: Ankit Agrawal <ankita at nvidia.com>
> 
> Linux allows device drivers to map IO memory on a per-page basis using
> "write combining" or WC. This is often done using
> pgprot_writecombing(). The driver knows which pages can support WC

pgprot_writecombine() ?

> access and the proper programming model to generate this IO. Generally
> the use case is to boost performance by using write combining to
> generate larger PCIe MemWr TLPs.

First off, this changeset does not affect *only* Linux guests, obviously.

I understand that's the use case you are after but this change is
targeting all VMs, it must be clear.

Then WC and mapping to PCI TLPs, either you describe that in details
(NormalNC vs device-nGnRE and resulting SystemBus<->PCI transactions) or
you don't describe it at all, as it stands I don't know how to use
this information.

> Allow VMs to select DEVICE_* or NORMAL_NC on a page by page basis for
> all IO memory. This puts the VM in charge of the memory attributes,
> and removes the KVM override to DEVICE_nGnRE.
> 
> Ultimately this makes pgprot_writecombing() work correctly in VMs and

pgprot_writecombine() ?

> allows drivers like mlx5 to fully operate their HW.
> 
> After some discussions with ARM and CPU architects we reached the
> conclusion there was no need for KVM to prevent the VM from selecting
> between DEVICE_* and NORMAL_NC for IO memory in VMs. There was a fear
> that NORMAL_NC could result in uncontained failures, but upon deeper
> analysis it turns out there are already possible cases for uncontained
> failures with DEVICE types too. Ultimately the platform must be
> implemented in a way that ensures that all DEVICE_* and NORMAL_NC
> accesses have no uncontained failures.

I would reorder/rephrase this changelog as follows:

- Describe what the problem is (ie KVM default s2 mappings)
- Describe how you are solving it
- Add a link to the documentation that states why it is safe to do
  that and the resulting s1/s2 mappings combination

It must be clear why from a legacy standpoint this is a safe change
to apply.

> Fortunately real platforms do tend to implement this.

Remove this sentence, it adds no information for someone who
is chasing bugs or just wants to understand the change itself.

> This patch makes the VM's memory attributes behave as follows:

"The resulting stage 1 and stage 2 memory types and cacheability
attributes are as follows":

>  S1           |   S2          |  Result
>  NORMAL-WB    |  NORMAL-NC    |  NORMAL-NC
>  NORMAL-WT    |  NORMAL-NC    |  NORMAL-NC
>  NORMAL-NC    |  NORMAL-NC    |  NORMAL-NC
>  DEVICE<attr> |  NORMAL-NC    |  DEVICE<attr>
> 
> See section D8.5.5 of DDI0487_I_a_a-profile_architecture_reference_manual.pdf
> for details.

D8.5 I assume.

I commented on the log because it must be easy to follow, it is an
important change (and there is a lot of background information
required to understand it - please report it).

The patch itself must be reviewed by arm64/kvm maintainers; the changes
(given the background work that led to them) seem fine to me (please CC
me on next versions).

Thanks
Lorenzo

> Signed-off-by: Ankit Agrawal <ankita at nvidia.com>
> ---
>  arch/arm64/include/asm/memory.h | 2 ++
>  arch/arm64/kvm/hyp/pgtable.c    | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index fde4186cc387..c247e5f29d5a 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -147,6 +147,7 @@
>   * Memory types for Stage-2 translation
>   */
>  #define MT_S2_NORMAL		0xf
> +#define MT_S2_NORMAL_NC	0x5
>  #define MT_S2_DEVICE_nGnRE	0x1
>  
>  /*
> @@ -154,6 +155,7 @@
>   * Stage-2 enforces Normal-WB and Device-nGnRE
>   */
>  #define MT_S2_FWB_NORMAL	6
> +#define MT_S2_FWB_NORMAL_NC	5
>  #define MT_S2_FWB_DEVICE_nGnRE	1
>  
>  #ifdef CONFIG_ARM64_4K_PAGES
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index ccd291b6893d..a80949002191 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -696,7 +696,7 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
>  				kvm_pte_t *ptep)
>  {
>  	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> -	kvm_pte_t attr = device ? KVM_S2_MEMATTR(pgt, DEVICE_nGnRE) :
> +	kvm_pte_t attr = device ? KVM_S2_MEMATTR(pgt, NORMAL_NC) :
>  			    KVM_S2_MEMATTR(pgt, NORMAL);
>  	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
>  
> -- 
> 2.17.1
> 



More information about the linux-arm-kernel mailing list