[PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages

Dave Chinner david at fromorbit.com
Mon Mar 27 16:02:06 PDT 2023


On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> Prevent filesystems from doing things which sleep in their map_pages
> method.  This is in preparation for a pagefault path protected only
> by RCU.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
> ---
>  Documentation/filesystems/locking.rst |  4 ++--
>  mm/memory.c                           | 11 ++++++++---
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 922886fefb7f..8a80390446ba 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -645,7 +645,7 @@ ops		mmap_lock	PageLocked(page)
>  open:		yes
>  close:		yes
>  fault:		yes		can return with page locked
> -map_pages:	yes
> +map_pages:	read
>  page_mkwrite:	yes		can return with page locked
>  pfn_mkwrite:	yes
>  access:		yes
> @@ -661,7 +661,7 @@ locked. The VM will unlock the page.
>  
>  ->map_pages() is called when VM asks to map easy accessible pages.
>  Filesystem should find and map pages associated with offsets from "start_pgoff"
> -till "end_pgoff". ->map_pages() is called with page table locked and must
> +till "end_pgoff". ->map_pages() is called with the RCU lock held and must
>  not block.  If it's not possible to reach a page without blocking,
>  filesystem should skip it. Filesystem should use set_pte_range() to setup
>  page table entry. Pointer to entry associated with the page is passed in
> diff --git a/mm/memory.c b/mm/memory.c
> index 8071bb17abf2..a7edf6d714db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>  	/* The page offset of vmf->address within the VMA. */
>  	pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>  	pgoff_t from_pte, to_pte;
> +	vm_fault_t ret;
>  
>  	/* The PTE offset of the start address, clamped to the VMA. */
>  	from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
> @@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>  			return VM_FAULT_OOM;
>  	}
>  
> -	return vmf->vma->vm_ops->map_pages(vmf,
> -		vmf->pgoff + from_pte - pte_off,
> -		vmf->pgoff + to_pte - pte_off);
> +	rcu_read_lock();
> +	ret = vmf->vma->vm_ops->map_pages(vmf,
> +			vmf->pgoff + from_pte - pte_off,
> +			vmf->pgoff + to_pte - pte_off);
> +	rcu_read_unlock();
> +
> +	return ret;

Doesn't this mean that the rcu_read_lock/unlock can be removed from
filemap_map_pages()? i.e. all callers are now already under
rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
be put in filemap_map_pages() if you are worried about callers not
holding it...

Otherwise it looks fine.

-Dave.
-- 
Dave Chinner
david at fromorbit.com



More information about the linux-afs mailing list