[PATCH V2 3/3] arm: mvebu: Add hardware I/O Coherency support
Gregory CLEMENT
gregory.clement at free-electrons.com
Tue Nov 20 16:01:53 EST 2012
Hello,
On 11/19/2012 01:50 PM, Marek Szyprowski wrote:
> Hello,
>
> On 11/16/2012 10:45 AM, Gregory CLEMENT wrote:
>> Armada 370 and XP come with an unit called coherency fabric. This unit
>> allows to use the Armada 370/XP as a nearly coherent architecture. The
>> coherency mechanism uses snoop filters to ensure the coherency between
>> caches, DRAM and devices. This mechanism needs a synchronization
>> barrier which guarantees that all the memory writes initiated by the
>> devices have reached their target and do not reside in intermediate
>> write buffers. That's why the architecture is not totally coherent and
>> we need to provide our own functions for some DMA operations.
>>
>> Beside the use of the coherency fabric, the device units will have to
>> set the attribute flag of the decoding address window to select the
>> accurate coherency process for the memory transaction. This is done
>> each device driver programs the DRAM address windows. The value of the
>> attribute set by the driver is retrieved through the
>> orion_addr_map_cfg struct filled during the early initialization of
>> the platform.
>>
>> Signed-off-by: Gregory CLEMENT <gregory.clement at free-electrons.com>
>> Reviewed-by: Yehuda Yitschak <yehuday at marvell.com>
>> ---
>> .../devicetree/bindings/arm/coherency-fabric.txt | 9 ++-
>> arch/arm/boot/dts/armada-370-xp.dtsi | 3 +-
>> arch/arm/mach-mvebu/addr-map.c | 3 +
>> arch/arm/mach-mvebu/coherency.c | 73 ++++++++++++++++++++
>> 4 files changed, 85 insertions(+), 3 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/arm/coherency-fabric.txt b/Documentation/devicetree/bindings/arm/coherency-fabric.txt
>> index 2bfbf67..17d8cd1 100644
>> --- a/Documentation/devicetree/bindings/arm/coherency-fabric.txt
>> +++ b/Documentation/devicetree/bindings/arm/coherency-fabric.txt
>> @@ -5,12 +5,17 @@ Available on Marvell SOCs: Armada 370 and Armada XP
>> Required properties:
>>
>> - compatible: "marvell,coherency-fabric"
>> -- reg: Should contain,coherency fabric registers location and length.
>> +
>> +- reg: Should contain coherency fabric registers location and
>> + length. First pair for the coherency fabric registers, second pair
>> + for the per-CPU fabric registers registers.
>>
>> Example:
>>
>> coherency-fabric at d0020200 {
>> compatible = "marvell,coherency-fabric";
>> - reg = <0xd0020200 0xb0>;
>> + reg = <0xd0020200 0xb0>,
>> + <0xd0021810 0x1c>;
>> +
>> };
>>
>> diff --git a/arch/arm/boot/dts/armada-370-xp.dtsi b/arch/arm/boot/dts/armada-370-xp.dtsi
>> index b0d075b..98a6b26 100644
>> --- a/arch/arm/boot/dts/armada-370-xp.dtsi
>> +++ b/arch/arm/boot/dts/armada-370-xp.dtsi
>> @@ -38,7 +38,8 @@
>>
>> coherency-fabric at d0020200 {
>> compatible = "marvell,coherency-fabric";
>> - reg = <0xd0020200 0xb0>;
>> + reg = <0xd0020200 0xb0>,
>> + <0xd0021810 0x1c>;
>> };
>>
>> soc {
>> diff --git a/arch/arm/mach-mvebu/addr-map.c b/arch/arm/mach-mvebu/addr-map.c
>> index fe454a4..595f6b7 100644
>> --- a/arch/arm/mach-mvebu/addr-map.c
>> +++ b/arch/arm/mach-mvebu/addr-map.c
>> @@ -108,6 +108,9 @@ static int __init armada_setup_cpu_mbus(void)
>>
>> addr_map_cfg.bridge_virt_base = mbus_unit_addr_decoding_base;
>>
>> + if (of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric"))
>> + addr_map_cfg.hw_io_coherency = 1;
>> +
>> /*
>> * Disable, clear and configure windows.
>> */
>> diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
>> index 20a0ccc..153fcfa 100644
>> --- a/arch/arm/mach-mvebu/coherency.c
>> +++ b/arch/arm/mach-mvebu/coherency.c
>> @@ -22,6 +22,8 @@
>> #include <linux/of_address.h>
>> #include <linux/io.h>
>> #include <linux/smp.h>
>> +#include <linux/dma-mapping.h>
>> +#include <linux/platform_device.h>
>> #include <asm/smp_plat.h>
>> #include "armada-370-xp.h"
>>
>> @@ -32,11 +34,14 @@
>> * value matching its virtual mapping
>> */
>> static void __iomem *coherency_base = ARMADA_370_XP_REGS_VIRT_BASE + 0x20200;
>> +static void __iomem *coherency_cpu_base;
>>
>> /* Coherency fabric registers */
>> #define COHERENCY_FABRIC_CTL_OFFSET 0x0
>> #define COHERENCY_FABRIC_CFG_OFFSET 0x4
>>
>> +#define IO_SYNC_BARRIER_CTL_OFFSET 0x0
>> +
>> static struct of_device_id of_coherency_table[] = {
>> {.compatible = "marvell,coherency-fabric"},
>> { /* end of list */ },
>> @@ -75,6 +80,70 @@ int set_cpu_coherent(unsigned int hw_cpu_id, int smp_group_id)
>> return 0;
>> }
>>
>> +static inline void mvebu_hwcc_sync_io_barrier(void)
>> +{
>> + writel(0x1, coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET);
>> + while (readl(coherency_cpu_base + IO_SYNC_BARRIER_CTL_OFFSET) & 0x1);
>> +}
>> +
>> +static dma_addr_t mvebu_hwcc_dma_map_page(struct device *dev, struct page *page,
>> + unsigned long offset, size_t size,
>> + enum dma_data_direction dir,
>> + struct dma_attrs *attrs)
>> +{
>> + if (dir != DMA_TO_DEVICE)
>> + mvebu_hwcc_sync_io_barrier();
>> + return pfn_to_dma(dev, page_to_pfn(page)) + offset;
>> +}
>> +
>> +
>> +static void mvebu_hwcc_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>> + size_t size, enum dma_data_direction dir,
>> + struct dma_attrs *attrs)
>> +{
>> + if (dir != DMA_TO_DEVICE)
>> + mvebu_hwcc_sync_io_barrier();
>> +}
>> +
>> +static void mvebu_hwcc_dma_sync(struct device *dev, dma_addr_t dma_handle,
>> + size_t size, enum dma_data_direction dir)
>> +{
>> + if (dir != DMA_TO_DEVICE)
>> + mvebu_hwcc_sync_io_barrier();
>> +}
>> +
>> +static struct dma_map_ops mvebu_hwcc_dma_ops = {
>> + .alloc = arm_coherent_dma_alloc,
>> + .free = arm_coherent_dma_free,
>
> Are you sure that arm_coherent_dma_{alloc,free} are right functions for Your
> architecture? If I understand right, You need to do implicit synchronization
> (io barrier) between CPU transactions and device transactions.
>
> dma_alloc_coherent() provides memory which can be used simultaneously by
> both
> CPU and devices, so without such barrier the memory won't be coherent.
>
> IMHO You should use arm_dma_{alloc,free} functions as Your hardware is not
> truly coherent. Then Your mvebu_hwcc_dma_ops will look very similar to
> dmabounce_ops from arch/arm/common/dmabounce.c (custom functions only for
> map/unmap page and sync_single_for_cpu/device).
You are totally right. In our first internal version based on older
kernel (3.2 or 3.4). We had set arch_is_coherent() to 1, and added
some hook in __dma_single_cpu_to_dev(), __dma_single_dev_to_cpu(),
__dma_page_cpu_to_dev() and __dma_page_dev_to_cpu(). So when
arch_is_coherent() were removed and when we switched to dma_ops, I had
assumed that we set the architecture as coherent modulo the modified
function. But I didn't realize that in this older kernel there were no
functions for coherent_alloc for arm. So it was wrong to use
the new arm_coherent_dma_alloc.
I made the change you have suggested and I will sent a new version
very soon.
Thanks for you review!
>
>> + .mmap = arm_dma_mmap,
>> + .unmap_page = mvebu_hwcc_dma_unmap_page,
>
> Please reorder entries to get map and unmap together.
OK, I will.
Gregory
More information about the linux-arm-kernel
mailing list