[PATCH 7/8] swiotlb: respect min_align_mask

Robin Murphy robin.murphy at arm.com
Thu Feb 4 18:13:45 EST 2021


On 2021-02-04 19:30, Christoph Hellwig wrote:
> Respect the min_align_mask in struct device_dma_parameters in swiotlb.
> 
> There are two parts to it:
>   1) for the lower bits of the alignment inside the io tlb slot, just
>      extend the size of the allocation and leave the start of the slot
>      empty
>   2) for the high bits ensure we find a slot that matches the high bits
>      of the alignment to avoid wasting too much memory
> 
> Based on an earlier patch from Jianxiong Gao <jxgao at google.com>.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   kernel/dma/swiotlb.c | 49 +++++++++++++++++++++++++++++++++++++-------
>   1 file changed, 42 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 6a2439826a1ba4..ab3192142b9906 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,6 +468,18 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>   	}
>   }
>   
> +/*
> + * Return the offset into an iotlb slot required to keep the device happy.
> + */
> +static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
> +{
> +	unsigned min_align_mask = dma_get_min_align_mask(dev);
> +
> +	if (!min_align_mask)
> +		return 0;

I doubt that early return is beneficial - even if the compiler can 
convert it into a csel, it'll then be doing unnecessary work just to 
throw away a cheaply-calculated 0 in favour of the hard-coded 0 in the 
one case where the branch matters ;)
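
Concretely, dropping the early return leaves something like this 
(untested sketch; the behaviour should be identical either way):

	static unsigned int swiotlb_align_offset(struct device *dev, u64 addr)
	{
		/* A zero min_align_mask computes a zero offset anyway */
		return addr & dma_get_min_align_mask(dev) &
			((1 << IO_TLB_SHIFT) - 1);
	}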

> +	return addr & min_align_mask & ((1 << IO_TLB_SHIFT) - 1);

(BTW, for readability throughout, "#define IO_TLB_SIZE (1 << 
IO_TLB_SHIFT)" sure wouldn't go amiss...)

> +}
> +
>   /*
>    * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
>    */
> @@ -478,6 +490,16 @@ static inline unsigned long get_max_slots(unsigned long boundary_mask)
>   	return nr_slots(boundary_mask + 1);
>   }
>   
> +static inline bool check_alignment(phys_addr_t orig_addr,
> +		dma_addr_t tbl_dma_addr, unsigned int index,
> +		unsigned int min_align_mask)
> +{
> +	if (!min_align_mask)
> +		return true;

Ditto - skipping the 5 or so operations here is unlikely to outweigh a 
branch on anything that matters, and again a csel would be a net loss 
since (x & 0) == (y & 0) still gives the correct answer.
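
i.e. just (untested):

	static inline bool check_alignment(phys_addr_t orig_addr,
			dma_addr_t tbl_dma_addr, unsigned int index,
			unsigned int min_align_mask)
	{
		/* min_align_mask == 0 zeroes both sides, so this still
		 * returns true in that case */
		return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) &
				min_align_mask) ==
			(orig_addr & min_align_mask);
	}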

> +	return ((tbl_dma_addr + (index << IO_TLB_SHIFT)) & min_align_mask) ==
> +		(orig_addr & min_align_mask);
> +}
> +
>   static unsigned int wrap_index(unsigned int index)
>   {
>   	if (index >= io_tlb_nslabs)
> @@ -489,9 +511,11 @@ static unsigned int wrap_index(unsigned int index)
>    * Find a suitable number of IO TLB entries size that will fit this request and
>    * allocate a buffer from that IO TLB pool.
>    */
> -static int find_slots(struct device *dev, size_t alloc_size,
> -		dma_addr_t tbl_dma_addr)
> +static int find_slots(struct device *dev, phys_addr_t orig_addr,
> +		size_t alloc_size, dma_addr_t tbl_dma_addr)
>   {
> +	unsigned int min_align_mask = dma_get_min_align_mask(dev) &
> +			~((1 << IO_TLB_SHIFT) - 1);
>   	unsigned int max_slots = get_max_slots(dma_get_seg_boundary(dev));
>   	unsigned int nslots = nr_slots(alloc_size), stride = 1;
>   	unsigned int index, wrap, count = 0, i;
> @@ -503,7 +527,9 @@ static int find_slots(struct device *dev, size_t alloc_size,
>   	 * For mappings greater than or equal to a page, we limit the stride
>   	 * (and hence alignment) to a page size.
>   	 */
> -	if (alloc_size >= PAGE_SIZE)
> +	if (min_align_mask)
> +		stride = (min_align_mask + 1) >> IO_TLB_SHIFT;

So this can't underflow to a zero stride because "min_align_mask" at 
this point is really just the high-order bits, representing the number 
of iotlb slots needed to meet the requirement, right? (It took a good 5 
minutes to realise this wasn't doing what I initially thought it did...)

In that case, a) could the local var be called something like 
iotlb_align_mask to clarify that it's *not* just a copy of the device's 
min_align_mask, and b) maybe just have an unconditional initialisation 
that works either way:

	stride = (min_align_mask >> IO_TLB_SHIFT) + 1;
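
(e.g. assuming the usual IO_TLB_SHIFT of 11, a device min_align_mask of 
0xfff masks down to 0x800 here, giving a stride of (0x800 >> 11) + 1 = 
2 slots; a zero mask falls out naturally as stride 1 with no special 
case.)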

In fact with that, I think we could just mask orig_addr with 
~(IO_TLB_SIZE - 1) in the call to check_alignment() below, or shift 
everything down by IO_TLB_SHIFT in check_alignment() itself, instead of 
mangling min_align_mask at all (I'm assuming we do need to ignore the 
low-order bits of orig_addr at this point).
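
Rough sketch of that last idea, assuming tbl_dma_addr is always at 
least slot-aligned (untested, and "iotlb_align_mask" is just the 
illustrative name from above):

	/* in find_slots(), with no mangling and no special case: */
	unsigned int iotlb_align_mask =
		dma_get_min_align_mask(dev) >> IO_TLB_SHIFT;
	...
	stride = iotlb_align_mask + 1;

	/* and check_alignment() compares slot numbers, so the sub-slot
	 * bits of orig_addr drop out by construction: */
	static inline bool check_alignment(phys_addr_t orig_addr,
			dma_addr_t tbl_dma_addr, unsigned int index,
			unsigned int iotlb_align_mask)
	{
		return (((tbl_dma_addr >> IO_TLB_SHIFT) + index) &
				iotlb_align_mask) ==
			((orig_addr >> IO_TLB_SHIFT) & iotlb_align_mask);
	}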

Robin.


