[PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()

Christian König christian.koenig at amd.com
Sun Oct 3 23:58:35 PDT 2021


I'm not following this discussion too closely, but I try to look into it 
from time to time.

Am 01.10.21 um 19:45 schrieb Jason Gunthorpe:
> On Fri, Oct 01, 2021 at 11:01:49AM -0600, Logan Gunthorpe wrote:
>
>> In device-dax, the refcount is only used to prevent the device, and
>> therefore the pages, from going away on device unbind. Pages cannot be
>> recycled, as you say, as they are mapped linearly within the device. The
>> address space invalidation is done only when the device is unbound.
> By address space invalidation I mean invalidation of the VMA that is
> pointing to those pages.
>
> device-dax may not have an issue with use-after-VMA-invalidation by
> its very nature since every PFN always points to the same
> thing. fsdax and this p2p stuff are different though.
>
>> Before the invalidation, an active flag is cleared to ensure no new
>> mappings can be created while the unmap is proceeding.
>> unmap_mapping_range() should sequence itself with the TLB flush and
> AFAIK unmap_mapping_range() kicks off the TLB flush and then
> returns. It doesn't always wait for the flush to fully finish. I.e. some
> cases use RCU to lock the page table against GUP fast and so the
> put_page() doesn't happen until the call_rcu completes - after a grace
> period. The unmap_mapping_range() does not wait for grace periods.

Wow, wait a second. That is quite a bummer. At least in all GEM/TTM 
based graphics drivers that could potentially cause a lot of trouble.

I've just double-checked, and in quite a number of places we certainly 
assume that when unmap_mapping_range() returns, the PTE is gone and the 
TLB flush has completed.

Do you have more information on when and why that can happen?

Thanks,
Christian.


