[PATCH v12 1/6] drm: Add arch arm64 for drm_clflush_virt_range

Michael Cheng michael.cheng at intel.com
Mon Mar 7 08:52:06 PST 2022


Ah Thanks for the great feedback!

@Lucas or @Matt, could you please chime in?

Michael Cheng

On 2022-03-02 11:10 a.m., Robin Murphy wrote:
> On 2022-03-02 15:55, Michael Cheng wrote:
>> Thanks for the feedback Robin!
>>
>> Sorry my choices of word weren't that great, but what I meant is to 
>> understand how ARM flushes a range of dcache for device drivers, and 
>> not an equal to x86 clflush.
>>
>> I believe the concern is if the CPU writes an update, that update 
>> might only be sitting in the CPU cache and never make it to device 
>> memory where the device can see it; there are specific places that we 
>> are supposed to flush the CPU caches to make sure our updates are 
>> visible to the hardware.
>
> Ah, OK, if it's more about ordering, and it's actually write buffers 
> rather than caches that you care about flushing, then we might be a 
> lot safer, phew!
>
> For a very simple overview, in a case where the device itself needs to 
> observe memory writes in the correct order, e.g.:
>
>     data_descriptor.valid = 1;
>
>     clflush(&data_descriptor);
>
>     command_descriptor.data = &data_descriptor
>
>     writel(/* control register to read command to then read data */)
>
> then dma_wmb() between the first two writes should be the right tool 
> to ensure that the command does not observe the command update while 
> the data update is still sat somewhere in a CPU write buffer.
>
> If you want a slightly stronger notion that, at a given point, all 
> prior writes have actually been issued and should now be visible 
> (rather than just that they won't become visible in the wrong order 
> whenever they do), then wmb() should suffice on arm64.
>
> Note that wioth arm64 memory types, a Non-Cacheable mapping of DRAM 
> for a non-coherent DMA mapping, or of VRAM in a prefetchable BAR, can 
> still be write-buffered, so barriers still matter even when actual 
> cache maintenance ops don't (and as before if you're trying to perform 
> cache maintenance outside the DMA API then you've already lost 
> anyway). MMIO registers should be mapped as Device memory via 
> ioremap(), which is not bufferable, hence the barrier implicit in 
> writel() effectively pushes out any prior buffered writes ahead of a 
> register write, which is why we don't need to worry about this most of 
> the time.
>
> This is only a very rough overview, though, and I'm not familiar 
> enough with x86 semantics, your hardware, or the exact use-case to be 
> able to say whether barriers alone are anywhere near the right answer 
> or not.
>
> Robin.
>
>>
>> +Matt Roper
>>
>> Matt, Lucas, any feed back here?
>>
>> On 2022-03-02 4:49 a.m., Robin Murphy wrote:
>>> On 2022-02-25 19:27, Michael Cheng wrote:
>>>> Hi Robin,
>>>>
>>>> [ +arm64 maintainers for their awareness, which would have been a 
>>>> good thing to do from the start ]
>>>>
>>>>   * Thanks for adding the arm64 maintainer and sorry I didn't rope 
>>>> them
>>>>     in sooner.
>>>>
>>>> Why does i915 need to ensure the CPU's instruction cache is 
>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>
>>>>   * Also thanks for pointing this out. Initially I was using
>>>>     dcache_clean_inval_poc, which seem to be the equivalently to what
>>>>     x86 is doing for dcache flushing, but it was giving me build 
>>>> errors
>>>>     since its not on the global list of kernel symbols. And after
>>>>     revisiting the documentation for caches_clean_inval_pou, it won't
>>>>     fly for what we are trying to do. Moving forward, what would 
>>>> you (or
>>>>     someone in the ARM community) suggest we do? Could it be 
>>>> possible to
>>>>     export dcache_clean_inval_poc as a global symbol?
>>>
>>> Unlikely, unless something with a legitimate need for CPU-centric 
>>> cache maintenance like kexec or CPU hotplug ever becomes modular.
>>>
>>> In the case of a device driver, it's not even the basic issues of 
>>> assuming to find direct equivalents to x86 semantics in other CPU 
>>> architectures, or effectively reinventing parts of the DMA API, it's 
>>> even bigger than that. Once you move from being integrated in a 
>>> single vendor's system architecture to being on a discrete card, you 
>>> fundamentally *no longer have any control over cache coherency*. 
>>> Whether the host CPU architecture happens to be AArch64, RISC-V, or 
>>> whatever doesn't really matter, you're at the mercy of 3rd-party 
>>> PCIe and interconnect IP vendors, and SoC integrators. You'll find 
>>> yourself in systems where PCIe simply cannot snoop any caches, where 
>>> you'd better have the correct DMA API calls in place to have any 
>>> hope of even the most basic functionality working properly; you'll 
>>> find yourself in systems where even if the PCIe root complex claims 
>>> to support No Snoop, your uncached traffic will still end up 
>>> snooping stale data that got prefetched back into caches you thought 
>>> you'd invalidated; you'll find yourself in systems where your memory 
>>> attributes may or may not get forcibly rewritten by an IOMMU 
>>> depending on the kernel config and/or command line.
>>>
>>> It's not about simply finding a substitute for clflush, it's that 
>>> the reasons you have for using clflush in the first place can no 
>>> longer be assumed to be valid.
>>>
>>> Robin.
>>>
>>>> On 2022-02-25 10:24 a.m., Robin Murphy wrote:
>>>>> [ +arm64 maintainers for their awareness, which would have been a 
>>>>> good thing to do from the start ]
>>>>>
>>>>> On 2022-02-25 03:24, Michael Cheng wrote:
>>>>>> Add arm64 support for drm_clflush_virt_range. caches_clean_inval_pou
>>>>>> performs a flush by first performing a clean, follow by an 
>>>>>> invalidation
>>>>>> operation.
>>>>>>
>>>>>> v2 (Michael Cheng): Use correct macro for cleaning and 
>>>>>> invalidation the
>>>>>>             dcache. Thanks Tvrtko for the suggestion.
>>>>>>
>>>>>> v3 (Michael Cheng): Replace asm/cacheflush.h with linux/cacheflush.h
>>>>>>
>>>>>> v4 (Michael Cheng): Arm64 does not export dcache_clean_inval_poc 
>>>>>> as a
>>>>>>             symbol that could be use by other modules, thus use
>>>>>>             caches_clean_inval_pou instead. Also this version
>>>>>>                 removes include for cacheflush, since its already
>>>>>>             included base on architecture type.
>>>>>>
>>>>>> Signed-off-by: Michael Cheng <michael.cheng at intel.com>
>>>>>> Reviewed-by: Matt Roper <matthew.d.roper at intel.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_cache.c | 5 +++++
>>>>>>   1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_cache.c 
>>>>>> b/drivers/gpu/drm/drm_cache.c
>>>>>> index c3e6e615bf09..81c28714f930 100644
>>>>>> --- a/drivers/gpu/drm/drm_cache.c
>>>>>> +++ b/drivers/gpu/drm/drm_cache.c
>>>>>> @@ -174,6 +174,11 @@ drm_clflush_virt_range(void *addr, unsigned 
>>>>>> long length)
>>>>>>         if (wbinvd_on_all_cpus())
>>>>>>           pr_err("Timed out waiting for cache flush\n");
>>>>>> +
>>>>>> +#elif defined(CONFIG_ARM64)
>>>>>> +    void *end = addr + length;
>>>>>> +    caches_clean_inval_pou((unsigned long)addr, (unsigned 
>>>>>> long)end);
>>>>>
>>>>> Why does i915 need to ensure the CPU's instruction cache is 
>>>>> coherent with its data cache? Is it a self-modifying driver?
>>>>>
>>>>> Robin.
>>>>>
>>>>> (Note that the above is somewhat of a loaded question, and I do 
>>>>> actually have half an idea of what you're trying to do here and 
>>>>> why it won't fly, but I'd like to at least assume you've read the 
>>>>> documentation of the function you decided was OK to use)
>>>>>
>>>>>> +
>>>>>>   #else
>>>>>>       WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
>>>>>>   #endif



More information about the linux-arm-kernel mailing list