[RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin

Jason Gunthorpe jgg at ziepe.ca
Mon Feb 8 16:30:23 EST 2021


On Mon, Feb 08, 2021 at 08:35:31PM +0000, Song Bao Hua (Barry Song) wrote:
> 
> 
> > From: Jason Gunthorpe [mailto:jgg at ziepe.ca]
> > Sent: Tuesday, February 9, 2021 7:34 AM
> > To: David Hildenbrand <david at redhat.com>
> > Cc: Wangzhou (B) <wangzhou1 at hisilicon.com>; linux-kernel at vger.kernel.org;
> > iommu at lists.linux-foundation.org; linux-mm at kvack.org;
> > linux-arm-kernel at lists.infradead.org; linux-api at vger.kernel.org; Andrew
> > Morton <akpm at linux-foundation.org>; Alexander Viro <viro at zeniv.linux.org.uk>;
> > gregkh at linuxfoundation.org; Song Bao Hua (Barry Song)
> > <song.bao.hua at hisilicon.com>; kevin.tian at intel.com;
> > jean-philippe at linaro.org; eric.auger at redhat.com; Liguozhu (Kenneth)
> > <liguozhu at hisilicon.com>; zhangfei.gao at linaro.org; chensihang (A)
> > <chensihang1 at hisilicon.com>
> > Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
> > pin
> > 
> > On Mon, Feb 08, 2021 at 09:14:28AM +0100, David Hildenbrand wrote:
> > 
> > > People are constantly struggling with the effects of long term pinnings
> > > under user space control, like we already have with vfio and RDMA.
> > >
> > > And here we are, adding yet another, easier way to mess with core MM in the
> > > same way. This feels like a step backwards to me.
> > 
> > Yes, this seems like a very poor candidate to be a system call in this
> > format. Much too narrow, poorly specified, and with possible security
> > implications in allowing any process whatsoever to pin memory.
> > 
> > I keep encouraging people to explore a standard shared SVA interface
> > that can cover all these topics (and no, uacce is not that
> > interface); that seems much more natural.
> > 
> > I still haven't seen an explanation of why DMA is so special here;
> > migration and so forth jitter the CPU too, and environments that care
> > about jitter have to turn this stuff off.
> 
> This paper has a good explanation:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7482091
> 
> mainly because a CPU page fault can be handled directly on the CPU, and
> we have many CPUs. But an IO page fault takes a much longer path, which
> makes it 3-80x slower than a CPU page fault:
> event queued in hardware -> interrupt -> CPU handles the page fault
> -> response returned to the IOMMU/device -> I/O continues.
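
As a reference point for the CPU side of that comparison, here is a
minimal userspace sketch (my own illustrative microbenchmark, not from
the patch or the paper) that times first-touch minor faults on an
anonymous mapping:

/* Hypothetical microbenchmark: cost of CPU-side first-touch faults. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t pagesz = sysconf(_SC_PAGESIZE);
	size_t npages = 1 << 16;
	size_t len = npages * pagesz;
	struct timespec t0, t1;

	/* Fresh anonymous mapping: first touch of each page faults. */
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < npages; i++)
		buf[i * pagesz] = 1;	/* one write per page, one fault */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	uint64_t ns = (t1.tv_sec - t0.tv_sec) * 1000000000ULL +
		      (t1.tv_nsec - t0.tv_nsec);
	printf("%zu faults, %.0f ns/fault\n", npages, (double)ns / npages);

	munmap(buf, len);
	return 0;
}

The IOPF path adds the queue/interrupt/response round trip on top of
the fault handling itself, which is where the multiplier comes from.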

The justification for this was migration scenarios, and migration is
short. Only if you take a fault on what you are migrating does it slow
down the CPU.

Are you also working with HW where the IOMMU becomes invalidated after
a migration and doesn't reload?

i.e., not true SVA, but the sort of emulated SVA we see in a lot of
places?

It would be much better to work on improving that to have closer sync
with the CPU page table than to use pinning.
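
For comparison, here is a minimal sketch of the long-term-pin approach
using only existing interfaces (mlock(2); illustrative only, this is
not the proposed mempinfd API):

/* Illustrative only: pre-fault and lock a buffer with mlock(2) so the
 * device should not take IO page faults on it. Locked pages are
 * charged against RLIMIT_MEMLOCK. Note mlock() prevents swap-out but
 * not page migration/compaction, which is why drivers needing stable
 * DMA addresses use pin_user_pages(FOLL_LONGTERM) instead.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16 << 20;	/* 16 MiB "DMA" buffer */

	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	if (mlock(buf, len)) {	/* faults in and locks every page */
		perror("mlock");
		return 1;
	}

	memset(buf, 0, len);	/* touch: already resident, no faults */

	/* ... hand the buffer to the accelerator for SVA DMA ... */

	munlock(buf, len);
	munmap(buf, len);
	return 0;
}

With proper IOMMU/CPU page table sync none of this should be needed;
the device just resolves the fault the way the CPU does.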

Jason


