[PATCH] arm64: optimize memcpy_{from,to}io() and memset_io()
Catalin Marinas
catalin.marinas at arm.com
Thu Oct 9 03:16:14 PDT 2014
On Thu, Oct 09, 2014 at 03:39:33AM +0100, Joonwoo Park wrote:
> On Fri, Oct 03, 2014 at 05:31:42PM +0100, Catalin Marinas wrote:
> > On Tue, Jul 29, 2014 at 11:28:26PM -0700, Joonwoo Park wrote:
> > > diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
> > > index 7d37ead..c0e3ab1 100644
> > > --- a/arch/arm64/kernel/io.c
> > > +++ b/arch/arm64/kernel/io.c
> > > @@ -20,18 +20,34 @@
> > > #include <linux/types.h>
> > > #include <linux/io.h>
> > >
> > > +#define IO_CHECK_ALIGN(v, a) ((((unsigned long)(v)) & ((a) - 1)) == 0)
> >
> > Can you not use just IS_ALIGNED?
>
> Will do. I would need to cast from/to to unsigned long.
Or define IO_CHECK_ALIGN as IS_ALIGNED((unsigned long)(v), a).
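
A minimal user-space sketch of that wrapper, for illustration only. The
IS_ALIGNED() below is a local stand-in for the kernel macro, not the kernel's
definition, and uintptr_t is an assumption that replaces the unsigned long
cast so the snippet builds outside the kernel:

```c
#include <stdint.h>

/* Local stand-in for the kernel's IS_ALIGNED(); 'a' must be a power of two. */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/*
 * Do the pointer-to-integer cast once, inside the wrapper, so callers can
 * keep passing pointers. uintptr_t is used here in place of unsigned long
 * (an assumption made for user-space portability).
 */
#define IO_CHECK_ALIGN(v, a) IS_ALIGNED((uintptr_t)(v), (uintptr_t)(a))
```

With this, IO_CHECK_ALIGN(from, 8) needs no cast at the call sites.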
> > > + from++;
> > > + to++;
> > > count--;
> > > - *t = readb(from);
> > > - t++;
> > > + }
> > > +
> > > + while (count >= 8) {
> > > + *(u64 *)to = readq_relaxed(from);
> > > + from += 8;
> > > + to += 8;
> > > + count -= 8;
> > > + }
> > > +
> > > + while (count) {
> > > + *(u8 *)to = readb_relaxed(from);
> > > from++;
> > > + to++;
> > > + count--;
> > > }
> > > + __iormb();
> >
> > We don't need this barrier here. In the readl() implementation, its use
> > is for ordering between I/O polling and DMA buffer access.
>
> The barriers here and down below are for accessing different devices in a row.
> I thought that was your suggestion too.
> http://lists.infradead.org/pipermail/linux-arm-kernel/2012-September/123178.html
I think we should leave them out until we find a use case. I currently
don't see any (writel/readl etc. still have the barriers).
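
For reference, a rough user-space model of the loop shape under discussion,
with no trailing barrier. This is only a sketch: model_memcpy_fromio() is a
hypothetical name, the source is ordinary memory rather than an __iomem
mapping, plain volatile loads stand in for readb_relaxed()/readq_relaxed(),
and only the source is aligned before the 64-bit loop (the destination is
written through memcpy() so it may be unaligned):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; 'from' is plain memory, not device memory. */
void model_memcpy_fromio(void *to, const volatile void *from, size_t count)
{
	unsigned char *d = to;
	const volatile unsigned char *s = from;

	/* Head: copy single bytes until the source is 8-byte aligned. */
	while (count && ((uintptr_t)s & 7)) {
		*d++ = *s++;
		count--;
	}

	/* Bulk: one 64-bit load per iteration from the aligned source. */
	while (count >= 8) {
		uint64_t v = *(const volatile uint64_t *)s;

		memcpy(d, &v, sizeof(v));	/* destination may be unaligned */
		s += 8;
		d += 8;
		count -= 8;
	}

	/* Tail: copy whatever is left one byte at a time. */
	while (count) {
		*d++ = *s++;
		count--;
	}
	/* Deliberately no barrier here, as suggested above. */
}
```

Any ordering against a later register access would then come from the
barrier in that access (writel/readl), not from the copy routine.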
> > > + while (count && !IO_CHECK_ALIGN(p, 8)) {
> > > + writeb_relaxed(c, p);
> >
> > Using dst directly here should work (__raw_writeb takes the same
> > type as the second argument).
>
> Will update.
>
> I'm not quite sure whether the barriers are really unnecessary.
> The situation I'm worried about is something like 'memcpy_io(device A);
> memcpy_io(device B);', where I think memcpy_io() needs to guarantee the
> order.
Without barriers, ordering between device A and B would not be
guaranteed. But do you have a scenario where this ordering actually
matters? In most cases we have something like:

	memcpy_io(device A);	// mapped as Device or Normal NonCacheable
	writel(device B);	// or an I/O register of device A

Here writel() has the correct barrier.
--
Catalin