[PATCH] arm64: optimize memcpy_{from,to}io() and memset_io()
Joonwoo Park
joonwoop at codeaurora.org
Mon Oct 13 21:04:50 PDT 2014
On Thu, Oct 09, 2014 at 11:16:14AM +0100, Catalin Marinas wrote:
> On Thu, Oct 09, 2014 at 03:39:33AM +0100, Joonwoo Park wrote:
> > On Fri, Oct 03, 2014 at 05:31:42PM +0100, Catalin Marinas wrote:
> > > On Tue, Jul 29, 2014 at 11:28:26PM -0700, Joonwoo Park wrote:
> > > > diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
> > > > index 7d37ead..c0e3ab1 100644
> > > > --- a/arch/arm64/kernel/io.c
> > > > +++ b/arch/arm64/kernel/io.c
> > > > @@ -20,18 +20,34 @@
> > > > #include <linux/types.h>
> > > > #include <linux/io.h>
> > > >
> > > > +#define IO_CHECK_ALIGN(v, a) ((((unsigned long)(v)) & ((a) - 1)) == 0)
> > >
> > > Can you not use just IS_ALIGNED?
> >
> > Will do. I would need to cast from/to with unsigned long.
>
> Or define IO_CHECK_ALIGN as IS_ALIGNED((unsigned long)v, a)
>
> > > > + from++;
> > > > + to++;
> > > > count--;
> > > > - *t = readb(from);
> > > > - t++;
> > > > + }
> > > > +
> > > > + while (count >= 8) {
> > > > + *(u64 *)to = readq_relaxed(from);
> > > > + from += 8;
> > > > + to += 8;
> > > > + count -= 8;
> > > > + }
> > > > +
> > > > + while (count) {
> > > > + *(u8 *)to = readb_relaxed(from);
> > > > from++;
> > > > + to++;
> > > > + count--;
> > > > }
> > > > + __iormb();
> > >
> > > We don't need this barrier here. In the readl() implementation, it's use
> > > is for ordering between I/O polling and DMA buffer access.
> >
> > The barriers here and down below are for accessing different devices in a row.
> > I thought that's what your suggestion too.
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2012-September/123178.html
>
> I think we should leave them out until we find a use case. I currently
> don't see any (writel/readl etc. still have the barriers).
>
> > > > + while (count && !IO_CHECK_ALIGN(p, 8)) {
> > > > + writeb_relaxed(c, p);
> > >
> > > Using dst here directly here should work (__raw_writeb takes the same
> > > type as the second argument).
> >
> > Will update.
> >
> > I'm quite not sure if barriers are not needed or not indeed.
> > The situation I'm worried about is like 'memcpy_io(device A);
> > memcpy_io(device B);' which I think memcpy_io() needs to guarantee the
> > order.
>
> Without barriers, ordering between device A and B would not be
> guaranteed. But do you have a scenario where this ordering actually
> matters? Most case we have something like:
>
> memcpy_io(device A); // mapped as Device or Normal NonCacheable
> writel(device B); // or an I/O register of device A
>
> Here writel() has the correct barrier.
I don't have such use case that ordering matters either. I agree that we should leave until it's needed.
Thanks,
Joonwoo
>
> --
> Catalin
--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
More information about the linux-arm-kernel
mailing list