[RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin

Jason Gunthorpe jgg at ziepe.ca
Mon Feb 8 16:30:23 EST 2021


On Mon, Feb 08, 2021 at 08:35:31PM +0000, Song Bao Hua (Barry Song) wrote:
> 
> 
> > From: Jason Gunthorpe [mailto:jgg at ziepe.ca]
> > Sent: Tuesday, February 9, 2021 7:34 AM
> > To: David Hildenbrand <david at redhat.com>
> > Cc: Wangzhou (B) <wangzhou1 at hisilicon.com>; linux-kernel at vger.kernel.org;
> > iommu at lists.linux-foundation.org; linux-mm at kvack.org;
> > linux-arm-kernel at lists.infradead.org; linux-api at vger.kernel.org; Andrew
> > Morton <akpm at linux-foundation.org>; Alexander Viro <viro at zeniv.linux.org.uk>;
> > gregkh at linuxfoundation.org; Song Bao Hua (Barry Song)
> > <song.bao.hua at hisilicon.com>; kevin.tian at intel.com;
> > jean-philippe at linaro.org; eric.auger at redhat.com; Liguozhu (Kenneth)
> > <liguozhu at hisilicon.com>; zhangfei.gao at linaro.org; chensihang (A)
> > <chensihang1 at hisilicon.com>
> > Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
> > pin
> > 
> > On Mon, Feb 08, 2021 at 09:14:28AM +0100, David Hildenbrand wrote:
> > 
> > > People are constantly struggling with the effects of long term pinnings
> > > under user space control, like we already have with vfio and RDMA.
> > >
> > > And here we are, adding yet another, easier way to mess with core MM in the
> > > same way. This feels like a step backwards to me.
> > 
> > Yes, this seems like a very poor candidate to be a system call in this
> > format. Much too narrow, poorly specified, and with possible security
> > implications in allowing any process whatsoever to pin memory.
> > 
> > I keep encouraging people to explore a standard shared SVA interface
> > that can cover all these topics (and no, uacce is not that
> > interface); that seems much more natural.
> > 
> > I still haven't seen an explanation of why DMA is so special here;
> > migration and so forth jitter the CPU too, and environments that care
> > about jitter have to turn this stuff off.
> 
> This paper has a good explanation:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7482091
> 
> mainly because a CPU page fault can be handled directly on the CPU, and
> we have many CPUs. But an IO page fault takes a much longer path, which
> makes it 3-80x slower than a CPU page fault:
> event queued in hardware -> interrupt -> CPU handles the page fault
> -> response returned to the IOMMU/device -> I/O continues.
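
As a reference point for the CPU side of that comparison, here is a
minimal userspace sketch (my own illustrative microbenchmark, not from
the patch or the paper) that times first-touch minor faults on an
anonymous mapping:

/* Hypothetical microbenchmark: cost of CPU-side first-touch faults. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t pagesz = sysconf(_SC_PAGESIZE);
	size_t npages = 1 << 16;
	size_t len = npages * pagesz;
	struct timespec t0, t1;

	/* Fresh anonymous mapping: first touch of each page faults. */
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < npages; i++)
		buf[i * pagesz] = 1;	/* one write per page, one fault */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	uint64_t ns = (t1.tv_sec - t0.tv_sec) * 1000000000ULL +
		      (t1.tv_nsec - t0.tv_nsec);
	printf("%zu faults, %.0f ns/fault\n", npages, (double)ns / npages);

	munmap(buf, len);
	return 0;
}

The IOPF path adds the queue/interrupt/response round trip on top of
the fault handling itself, which is where the multiplier comes from.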

The justification for this was migration scenarios, and migration is
short. Only if you take a fault on what you are migrating does it slow
down the CPU.

Are you also working with HW where the IOMMU becomes invalidated after
a migration and doesn't reload?

i.e., not true SVA, but the sort of emulated SVA we see in a lot of
places?

It would be much better to work on improving that to have closer sync
with the CPU page table than to use pinning.
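
For comparison, here is a minimal sketch of the long-term-pin approach
using only existing interfaces (mlock(2); illustrative only, this is
not the proposed mempinfd API):

/* Illustrative only: pre-fault and lock a buffer with mlock(2) so the
 * device should not take IO page faults on it. Locked pages are
 * charged against RLIMIT_MEMLOCK. Note mlock() prevents swap-out but
 * not page migration/compaction, which is why drivers needing stable
 * DMA addresses use pin_user_pages(FOLL_LONGTERM) instead.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16 << 20;	/* 16 MiB "DMA" buffer */

	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	if (mlock(buf, len)) {	/* faults in and locks every page */
		perror("mlock");
		return 1;
	}

	memset(buf, 0, len);	/* touch: already resident, no faults */

	/* ... hand the buffer to the accelerator for SVA DMA ... */

	munlock(buf, len);
	munmap(buf, len);
	return 0;
}

With proper IOMMU/CPU page table sync none of this should be needed;
the device just resolves the fault the way the CPU does.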

Jason


