[PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

Peter Ujfalusi peter.ujfalusi at ti.com
Tue Jan 14 02:43:22 PST 2020

Christoph, Robin,

On 09/01/2020 16.49, Christoph Hellwig wrote:
> On Wed, Jan 08, 2020 at 03:20:07PM +0000, Robin Murphy wrote:
>>> The problem - I think - is that the DMA_BIT_MASK(32) from
>>> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as physical
>>> address along the call path so the dma_pfn_offset is applied to it and
>>> the check will fail, saying that DMA_BIT_MASK(32) can not be supported.
>> But that's the thing - in isolation, that is entirely correct. Considering 
>> ZONE_DMA32 for simplicity, in general the zone is expected to cover the 
>> physical address range 0x0000_0000 - 0xffff_ffff (because DMA offsets are 
>> relatively rare), and a device with a dma_pfn_offset of more than 
>> (0x1_0000_0000 >> PAGE_SHIFT) *cannot* support that range with any mask, 
>> because the DMA address itself would have to be negative.
> Note that ZONE_DMA32 is irrelevant in this particular case, as we are
> talking about arm32.  But with ZONE_DMA instead this roughly makes sense.
>> The problem is that platforms with esoteric memory maps have no right thing 
>> to do. If the base of RAM is at at 0x1_0000_0000 or higher, the "correct" 
>> ZONE_DMA32 would be empty while ZONE_NORMAL above it would not, and last 
>> time I looked that makes the page allocator break badly. So the standard 
>> bodge on such platforms is to make ZONE_DMA32 cover not the first 4GB of 
>> *PA space*, but the first 4GB of *RAM*, wherever that happens to be. That 
>> then brings different problems - now the page allocator is happy and 
>> successfully returns GFP_DMA32 allocations from the range 0x8_0000_0000 - 
>> 0x8_ffff_ffff that are utterly useless to 32-bit devices with zero 
>> dma_pfn_offset - see the AMD Seattle SoC for the prime example of that. If 
>> on the other hand all devices are guaranteed to have a dma_pfn_offset that 
>> puts the base of RAM at DMA address 0 then GFP_DMA32 allocations do end up 
>> working as expected, but now the original assumption of where ZONE_DMA32 
>> actually is is broken, so generic code unaware of the 
>> platform/architecture-specific bodge will be misled - that's the case 
>> you're running into.
>> Having thought this far, if there's a non-hacky way to reach in and grab 
>> ZONE_DMA{32} such that dma_direct_supported() could use zone_end_pfn() 
>> instead of trying to assume either way, that might be the most robust 
>> general solution.
> zone_dma_bits is our somewhat ugly way to try to poke into this
> information, although the way it is done right now sucks pretty badly.

In my view the handling of dma_pfn_offset is just incorrect as it is applied to _any_ address.
According to DT specification dma-ranges:
"Value type: <empty> or <prop-encoded-array> encoded as an arbitrary
number of (child-bus-address, parent-bus-address, length) triplets."

Yet in drivers/of/ we only take the _first_ triplet and ignore the rest.

The dma_pfn_offset should be only applied to paddr in the range:
parent-bus-address to parent-bus-address+length
for anything outside of this the dma_pfn_offset is 0.

conversion back from dma to paddr should consider the offset in range:
child-bus-address to child-bus-address+length
and 0 for everything outside of this.

To correctly handle the dma-ranges we would need something like this in device.h:
+struct dma_ranges {
+       u64 paddr;
+       u64 dma_addr;
+       u64 size;
+	unsigned long pfn_offset;

struct device {
-	unsigned long	dma_pfn_offset;
+       struct dma_ranges *dma_ranges;
	int dma_ranges_cnt;

Then when we currently use dma_pfn_offset we would have:

unsigned long __phys_to_dma_pfn_offset(struct device *dev, phys_addr_t paddr)
	int i;

	if (!dev->dma_ranges)
		return 0;

	for (i = 0; i < dev->dma_ranges_cnt; i++) {
		struct dma_ranges *range = &dev->dma_ranges[i];
		if (paddr >= range->paddr &&
		    paddr <= (range->paddr + range->size))
			return range->pfn_offset;

	return 0;

unsigned long __dma_to_phys_pfn_offset(struct device *dev, dma_addr_t dma_addr)
	int i;

	for (i = 0; i < dev->dma_ranges_cnt; i++) {
		struct dma_ranges *range = &dev->dma_ranges[i];
		if (dma_addr >= range->dma_addr &&
		    dma_addr <= (range->dma_addr + range->size))
			return range->pfn_offset;

	return 0;

For existing drivers/archs setting dma_pfn_offset we can:
if (dev->dma_ranges_cnt == 1 && dev->dma_ranges[0].pfn_offset && !dev->dma_ranges[0].size)
	return dev->dma_ranges[0].pfn_offset;

and they would have to set up one struct dma_ranges.

One of the issue with this is that the struct dma_ranges would need to be allocated for
all devices, so there should be a some clever way need to be invented to use pointers
as much as we can.

> The patch I sent to Peter in December was trying to convey that
> information in a way similar to what the arm32 legacy dma code does, but
> it didn't work, so I'll need to find some time to sit down and figure out
> why.

But, while we get a proper solution can we get the following patch in to fix the regression?
Basically we are falling back to what works (and was used before commit ad3c7b18c5b362be5dbd0f2c0bcf1fd5fd659315).

commit 8c3c36b377c139603a9dff5c58dac59865f1ac0f
Author: Peter Ujfalusi <peter.ujfalusi at ti.com>
Date:   Thu Dec 19 15:07:25 2019 +0200

    arm: mm: dma-mapping: Fix dma_supported() when dev->dma_pfn_offset is not 0
    We can only use direct mapping when LPAE is enabled if the dma_pfn_offset
    is 0, otherwise valid dma_masks will be rejected and the DMA support is
    going to be denied for peripherals, or DMA drivers.
    Cc: Stable <stable at vger.kernel.org> #v5.3+
    Signed-off-by: Peter Ujfalusi <peter.ujfalusi at ti.com>

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 9414d72f664b..e07ec1ea3865 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1100,15 +1100,6 @@ int arm_dma_supported(struct device *dev, u64 mask)
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
-	/*
-	 * When CONFIG_ARM_LPAE is set, physical address can extend above
-	 * 32-bits, which then can't be addressed by devices that only support
-	 * 32-bit DMA.
-	 * Use the generic dma-direct / swiotlb ops code in that case, as that
-	 * handles bounce buffering for us.
-	 */
-		return NULL;
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2313,6 +2304,15 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
 		dma_ops = arm_get_iommu_dma_map_ops(coherent);
+	else if (IS_ENABLED(CONFIG_ARM_LPAE) && !dev->dma_pfn_offset)
+		/*
+		 * When CONFIG_ARM_LPAE is set, physical address can extend
+		 * above * 32-bits, which then can't be addressed by devices
+		 * that only support 32-bit DMA.
+		 * Use the generic dma-direct / swiotlb ops code in that case,
+		 * as that handles bounce buffering for us.
+		 */
+		dma_ops = NULL;
 		dma_ops = arm_get_dma_map_ops(coherent);

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

More information about the linux-arm-kernel mailing list