[PATCH RFC v2 1/1] wiotlb: split buffer into 32-bit default and 64-bit extra zones

Christoph Hellwig hch at infradead.org
Tue Sep 20 00:55:03 PDT 2022


On Sat, Aug 20, 2022 at 12:42:50AM -0700, Dongli Zhang wrote:
> Hello,
> 
> I used to send out RFC v1 to introduce an extra io_tlb_mem (created with
> SWIOTLB_ANY) in addition to the default io_tlb_mem (32-bit).  The
> dev->dma_io_tlb_mem is set to either default or the extra io_tlb_mem,
> depending on dma mask. However, that is not good for setting
> dev->dma_io_tlb_mem at swiotlb layer transparently as suggested by
> Christoph Hellwig.
> 
> https://lore.kernel.org/all/20220609005553.30954-1-dongli.zhang@oracle.com/
> 
> Therefore, this is another RFC v2 implementation following a different
> direction. The core ideas are:
> 
> 1. The swiotlb is splited into two zones, io_tlb_mem->zone[0] (32-bit) and
> io_tlb_mem->zone[1] (64-bit).
> 
> struct io_tlb_mem {
> 	struct io_tlb_zone zone[SWIOTLB_NR];
> 	struct dentry *debugfs;
> 	bool late_alloc;
> 	bool force_bounce;
> 	bool for_alloc;
> 	bool has_extra;
> };
> 
> struct io_tlb_zone {
> 	phys_addr_t start;
> 	phys_addr_t end;
> 	void *vaddr;
> 	unsigned long nslabs;
> 	unsigned long used;
> 	unsigned int nareas;
> 	unsigned int area_nslabs;
> 	struct io_tlb_area *areas;
> 	struct io_tlb_slot *slots;
> };
> 
> 2. By default, only io_tlb_mem->zone[0] is available. The
> io_tlb_mem->zone[1] is allocated conditionally if:
> 
> - the "swiotlb=" is configured to allocate extra buffer, and
> - the SWIOTLB_EXTRA is set in the flag (this is to make sure arch(s) other
>   than x86/sev/xen will not enable it until it is fully tested by each
>   arch, e.g., mips/powerpc). Currently it is enabled for x86 and xen.
> 
> 3. During swiotlb map, whether zone[0] (32-bit) or zone[1] (64-bit
> SWIOTLB_ANY)
> is used depends on min_not_zero(*dev->dma_mask, dev->bus_dma_limit).
> 
> To test the RFC v2, here is the QEMU command line.
> 
> qemu-system-x86_64 -smp 8 -m 32G -enable-kvm -vnc :5 -hda disk.img \
> -kernel path-to-linux/arch/x86_64/boot/bzImage \
> -append "root=/dev/sda1 init=/sbin/init text console=ttyS0 loglevel=7 swiotlb=32768,4194304,force" \
> -net nic -net user,hostfwd=tcp::5025-:22 \
> -device nvme,drive=nvme01,serial=helloworld -drive file=test.qcow2,if=none,id=nvme01 \
> -serial stdio
> 
> There is below in syslog. The extra 8GB buffer is allocated.
> 
> [    0.152251] software IO TLB: area num 8.
> ... ...
> [    3.706088] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [    3.707334] software IO TLB: mapped default [mem 0x00000000bbfd7000-0x00000000bffd7000] (64MB)
> [    3.708585] software IO TLB: mapped extra [mem 0x000000061cc00000-0x000000081cc00000] (8192MB)
> 
> After the FIO is triggered over NVMe, the 64-bit buffer is used.
> 
> $ cat /sys/kernel/debug/swiotlb/io_tlb_nslabs_extra
> 4194304
> $ cat /sys/kernel/debug/swiotlb/io_tlb_used_extra
> 327552
> 
> Would you mind helping if this is the right direction to go?
> 
> Thank you very much!
> 
> Cc: Konrad Wilk <konrad.wilk at oracle.com>
> Cc: Joe Jin <joe.jin at oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang at oracle.com>
> ---
>  arch/arm/xen/mm.c                      |   2 +-
>  arch/mips/pci/pci-octeon.c             |   5 +-
>  arch/x86/include/asm/xen/swiotlb-xen.h |   2 +-
>  arch/x86/kernel/pci-dma.c              |   6 +-
>  drivers/xen/swiotlb-xen.c              |  18 +-
>  include/linux/swiotlb.h                |  73 +++--

> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> index 3d826c0..4edfa42 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -125,7 +125,7 @@ static int __init xen_mm_init(void)
>  		return 0;
>  
>  	/* we can work with the default swiotlb */
> -	if (!io_tlb_default_mem.nslabs) {
> +	if (!io_tlb_default_mem.zone[SWIOTLB_DF].nslabs) {
>  		rc = swiotlb_init_late(swiotlb_size_or_default(),
>  				       xen_swiotlb_gfp(), NULL);
>  		if (rc < 0)

First thing we need before doing anything about multiple default
pools is to get all the knowledge of details hidden inside swiotlb.c.

For swiotlb_init_late that seems easy as we can just move the check
into it.

> diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
> index e457a18..0bf0859 100644
> --- a/arch/mips/pci/pci-octeon.c
> +++ b/arch/mips/pci/pci-octeon.c
> @@ -654,6 +654,9 @@ static int __init octeon_pci_setup(void)
>  		octeon_pci_mem_resource.end =
>  			octeon_pci_mem_resource.start + (1ul << 30);
>  	} else {
> +		struct io_tlb_mem *mem = &io_tlb_default_mem;
> +		struct io_tlb_zone *zone = &mem->zone[SWIOTLB_DF];
> +
>  		/* Remap the Octeon BAR 0 to map 128MB-(128MB+4KB) */
>  		octeon_npi_write32(CVMX_NPI_PCI_CFG04, 128ul << 20);
>  		octeon_npi_write32(CVMX_NPI_PCI_CFG05, 0);
> @@ -664,7 +667,7 @@ static int __init octeon_pci_setup(void)
>  
>  		/* BAR1 movable regions contiguous to cover the swiotlb */
>  		octeon_bar1_pci_phys =
> -			io_tlb_default_mem.start & ~((1ull << 22) - 1);
> +			zone->start & ~((1ull << 22) - 1);

But we'll need to do something about this mess.  I'll need help from
the octeon maintainer on it.

> -	x86_swiotlb_flags |= SWIOTLB_ANY;
> +	x86_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_EXTRA;

I don't think this is how it is suppoed to be.  SWIOTLB_ANY already
says give me a pool with no addressing constrains.  We don't need
two pools without that.  EXTRA is also not exactly a very helpful
name here.

>  #ifdef CONFIG_X86
> -int xen_swiotlb_fixup(void *buf, unsigned long nslabs)
> +int xen_swiotlb_fixup(void *buf, unsigned long nslabs, unsigned int flags)
>  {
>  	int rc;
>  	unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
>  	unsigned int i, dma_bits = order + PAGE_SHIFT;
>  	dma_addr_t dma_handle;
>  	phys_addr_t p = virt_to_phys(buf);
> +	unsigned int max_dma_bits = 32;

I think live will be a lot simple if the addressing bits are passed to
this function, and not some kind of flags.

> +#define SWIOTLB_DF	0
> +#define SWIOTLB_EX	1
> +#define SWIOTLB_NR	2

These names are not very descriptive.



More information about the linux-arm-kernel mailing list