[PATCH 2/3] riscv: Implement Zicbom-based cache management operations

Heiko Stübner heiko at sntech.de
Wed Jun 15 09:56:40 PDT 2022


Hi Christoph,

Am Freitag, 10. Juni 2022, 07:56:08 CEST schrieb Christoph Hellwig:
> On Fri, Jun 10, 2022 at 02:43:07AM +0200, Heiko Stuebner wrote:
> > +config RISCV_ISA_ZICBOM
> > +	bool "Zicbom extension support for non-coherent dma operation"
> > +	select ARCH_HAS_DMA_PREP_COHERENT
> > +	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> > +	select ARCH_HAS_SYNC_DMA_FOR_CPU
> > +	select ARCH_HAS_SETUP_DMA_OPS
> > +	select DMA_DIRECT_REMAP
> > +	select RISCV_ALTERNATIVE
> > +	default y
> > +	help
> > +	   Adds support to dynamically detect the presence of the ZICBOM extension
> 
> Overly long line here.

fixed

> 
> > +	   (Cache Block Management Operations) and enable its usage.
> > +
> > +	   If you don't know what to do here, say Y.
> 
> But more importantly I think the whole text here is not very helpful.
> What users care about is non-coherent DMA support.  What extension is
> used for that is rather secondary.

I guess it might make sense to split that in some way.
I.e. Zicbom provides one implementation for handling non-coherence,
the D1 uses different (but very similar) instructions while the SoC on the
Beagle-V does something completely different.

So I guess it could make sense to have a general DMA_NONCOHERENT option
and which gets selected by the relevant users.

This also fixes the issue that Zicbom needs a very new binutils
but if beagle-v support happens that wouldn't need that.


> Also please capitalize DMA.

fixed

> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> > +{
> > +	switch (dir) {
> > +	case DMA_TO_DEVICE:
> > +		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size, riscv_cbom_block_size);
> > +		break;
> > +	case DMA_FROM_DEVICE:
> > +		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size, riscv_cbom_block_size);
> > +		break;
> > +	case DMA_BIDIRECTIONAL:
> > +		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size, riscv_cbom_block_size);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> 
> Pleae avoid all these crazy long lines.  and use a logical variable
> for the virtual address.  And why do you pass that virtual address
> as an unsigned long to ALT_CMO_OP?  You're going to make your life
> much easier if you simply always pass a pointer.

fixed all of those.
And of course you're right, not having the cast when calling ALT_CMO_OP
makes things definitly a lot nicer looking.

> Last but not last, does in RISC-V clean mean writeback and flush mean
> writeback plus invalidate?  If so the code is correct, but the choice
> of names in the RISC-V spec is extremely unfortunate.

clean: 
    makes data [...] visible to a set of non-coherent agents [...] by
    performing a write transfer of a copy of a cache block [...]

flush:
    performs a clean followed by an invalidate

So that's a yes to your question

> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> > +{
> > +	switch (dir) {
> > +	case DMA_TO_DEVICE:
> > +		break;
> > +	case DMA_FROM_DEVICE:
> > +	case DMA_BIDIRECTIONAL:
> > +		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size, riscv_cbom_block_size);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +}
> 
> Same comment here and in few other places.

fixed

> > +
> > +void arch_dma_prep_coherent(struct page *page, size_t size)
> > +{
> > +	void *flush_addr = page_address(page);
> > +
> > +	memset(flush_addr, 0, size);
> > +	ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size, riscv_cbom_block_size);
> > +}
> 
> arch_dma_prep_coherent should never zero the memory, that is left
> for the upper layers.`

fixed

> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > +		const struct iommu_ops *iommu, bool coherent)
> > +{
> > +	/* If a specific device is dma-coherent, set it here */
> 
> This comment isn't all that useful.

ok, I've dropped it

> > +	dev->dma_coherent = coherent;
> > +}
> 
> But more importantly, this assums that once this code is built all
> devices are non-coherent by default.  I.e. with this patch applied
> and the config option enabled we'll now suddenly start doing cache
> management operations or setups that didn't do it before.

If I'm reading things correctly [0], the default for those functions
is for those to be empty - but defined in the coherent case.

When you look at the definition of ALT_CMO_OP

#define ALT_CMO_OP(_op, _start, _size, _cachesize)                      \
asm volatile(ALTERNATIVE_2(                                             \
        __nops(6),                                                      \

 you'll see that it's default variant is to do nothing and it doing any
non-coherency voodoo is only patched in if the Zicbom extension
(or T-Head errata) is detected at runtime.

So in the coherent case (with the memset removed as you suggested),
the arch_sync_dma_* and arch_dma_prep_coherent functions end up as
something like

void arch_dma_prep_coherent(struct page *page, size_t size)
{
        void *flush_addr = page_address(page);

        nops(6);
}

which is very mich similar to the defaults [0] I guess, or am I
overlooking something?

Thanks for taking the time for that review
Heiko


[0] https://elixir.bootlin.com/linux/latest/source/include/linux/dma-map-ops.h#L293





More information about the linux-riscv mailing list