[RFC PATCH V2 4/4] arm64: mm: implement get_user_pages_fast
Steve Capper
steve.capper at linaro.org
Tue Mar 11 06:14:52 EDT 2014
On Tue, Feb 11, 2014 at 03:48:59PM +0000, Catalin Marinas wrote:
> On Thu, Feb 06, 2014 at 04:18:51PM +0000, Steve Capper wrote:
> > An implementation of get_user_pages_fast for arm64. It is based on the
> > arm implementation (it has the added ability to walk huge puds), which
> > is loosely based on the PowerPC implementation. We disable interrupts in the
> > walker to prevent the call_rcu_sched pagetable freeing code from
> > running under us.
> >
> > We also explicitly fire an IPI in the Transparent HugePage splitting
> > case to prevent splits from interfering with the fast_gup walker.
> > As THP splits are relatively rare, this should not have a noticeable
> > overhead.
> >
> > Signed-off-by: Steve Capper <steve.capper at linaro.org>
> > ---
> > arch/arm64/include/asm/pgtable.h | 4 +
> > arch/arm64/mm/Makefile | 2 +-
> > arch/arm64/mm/gup.c | 297 +++++++++++++++++++++++++++++++++++++++
>
> Why don't you make a generic gup.c implementation and let architectures
> select it? I don't see much arm64-specific code in here.
Hi Catalin,
I've had a stab at generalising the gup, but I've found that it varies
too much between architectures to make this practical for me:
* x86 blocks on TLB invalidate so does not need the speculative page
cache logic. Also x86 does not have 64-bit single-copy atomicity for
pte reads, so needs a workaround.
* mips is similar-ish to x86.
* powerpc has extra is_hugepd codepaths to identify huge pages.
* superh has sub-architecture pte flags and no 64-bit single-copy
atomicity.
* sparc has hypervisor tlb logic for the pte flags.
* s390 has extra pmd dereference logic and extra barriers that I do not
quite understand.
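To illustrate the single-copy atomicity point, here is a minimal userspace
sketch of the kind of workaround needed when a 64-bit pte can only be read as
two 32-bit halves: read low, read high, then re-read low and retry if it
changed. The struct and function names are hypothetical, C11 atomics stand in
for the kernel's read barriers, and the retry is only sound if the writer
updates the halves in a compatible order, which is assumed here rather than
shown.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit pte stored as two 32-bit words. */
struct split_pte {
	_Atomic uint32_t low;
	_Atomic uint32_t high;
};

/*
 * Read both halves without a 64-bit atomic load. If the low word
 * changed while the high word was being read, the pair may be torn,
 * so start again. Acquire loads keep the three reads in program order.
 */
static uint64_t read_split_pte(struct split_pte *p)
{
	uint32_t low, high;

	do {
		low = atomic_load_explicit(&p->low, memory_order_acquire);
		high = atomic_load_explicit(&p->high, memory_order_acquire);
	} while (low != atomic_load_explicit(&p->low, memory_order_acquire));

	return ((uint64_t)high << 32) | low;
}
```

An architecture with 64-bit single-copy atomic pte reads can use a single
load instead, which is one reason the walkers differ.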
My plan was to introduce pte_special(.) for arm with LPAE, add
pte_special logic to fast_gup and share the fast_gup between arm and
arm64.
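As a rough sketch of what I mean by pte_special logic in the walker: the fast
path should only take a reference when the entry is present, has the
permissions asked for, and is not marked special; anything else falls back to
the slow path. The flag bits and the function name below are hypothetical
placeholders for illustration, not the real arm or arm64 pte layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; NOT real arm/arm64 pte bit positions. */
#define SKETCH_PTE_PRESENT	(UINT64_C(1) << 0)
#define SKETCH_PTE_WRITE	(UINT64_C(1) << 1)
#define SKETCH_PTE_SPECIAL	(UINT64_C(1) << 2)

/*
 * Return true if the fast walker may take a reference on the page
 * behind this pte. Every required bit must be set and the special
 * bit must be clear; otherwise the caller should bail out.
 */
static bool sketch_gup_pte_ok(uint64_t pte, bool write)
{
	uint64_t need = SKETCH_PTE_PRESENT;

	if (write)
		need |= SKETCH_PTE_WRITE;

	return (pte & (need | SKETCH_PTE_SPECIAL)) == need;
}
```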
Does this approach sound reasonable?
Thanks,
--
Steve