[PATCH v6 01/11] filemap: Pass address_space mapping to ->free_folio()
David Hildenbrand
david at redhat.com
Wed Sep 17 07:52:40 PDT 2025
On 16.09.25 08:23, Hugh Dickins wrote:
> On Fri, 12 Sep 2025, Roy, Patrick wrote:
>
>> From: Elliot Berman <quic_eberman at quicinc.com>
>>
>> When guest_memfd removes memory from the host kernel's direct map,
>> direct map entries must be restored before the memory is freed again. To
>> do so, ->free_folio() needs to know whether a gmem folio was direct map
>> removed in the first place though. While possible to keep track of this
>> information on each individual folio (e.g. via page flags), direct map
>> removal is an all-or-nothing property of the entire guest_memfd, so it
>> is less error prone to just check the flag stored in the gmem inode's
>> private data. However, by the time ->free_folio() is called,
>> folio->mapping might be cleared. To still allow access to the address
>> space from which the folio was just removed, pass it in as an additional
>> argument to ->free_folio, as the mapping is well-known to all callers.
>>
>> Link: https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade32f@redhat.com/
>> Suggested-by: David Hildenbrand <david at redhat.com>
>> Acked-by: David Hildenbrand <david at redhat.com>
>> Signed-off-by: Elliot Berman <quic_eberman at quicinc.com>
>> [patrick: rewrite shortlog for new usecase]
>> Signed-off-by: Patrick Roy <roypat at amazon.co.uk>
>> ---
>> Documentation/filesystems/locking.rst | 2 +-
>> fs/nfs/dir.c | 11 ++++++-----
>> fs/orangefs/inode.c | 3 ++-
>> include/linux/fs.h | 2 +-
>> mm/filemap.c | 9 +++++----
>> mm/secretmem.c | 3 ++-
>> mm/vmscan.c | 4 ++--
>> virt/kvm/guest_memfd.c | 3 ++-
>> 8 files changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
>> index aa287ccdac2f..74c97287ec40 100644
>> --- a/Documentation/filesystems/locking.rst
>> +++ b/Documentation/filesystems/locking.rst
>> @@ -262,7 +262,7 @@ prototypes::
>> sector_t (*bmap)(struct address_space *, sector_t);
>> void (*invalidate_folio) (struct folio *, size_t start, size_t len);
>> bool (*release_folio)(struct folio *, gfp_t);
>> - void (*free_folio)(struct folio *);
>> + void (*free_folio)(struct address_space *, struct folio *);
>> int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
>> int (*migrate_folio)(struct address_space *, struct folio *dst,
>> struct folio *src, enum migrate_mode);
>
> Beware, that is against the intent of free_folio().
>
> Since its 2.6.37 origin in 6072d13c4293 ("Call the filesystem back
> whenever a page is removed from the page cache"), freepage() or
> free_folio() has intentionally NOT taken a struct address_space *mapping,
> because that structure may already be freed by the time free_folio() is
> called, if the last folio holding it has now been freed.
Thanks for noticing that Hugh, very good point!
>
> Maybe something has changed since then, or maybe it happens to be safe
> just in the context in which you want to use it; but it is against the
> principle of free_folio(). (Maybe an rcu_read_lock() could be added
> in __remove_mapping() to make it safe nowadays? maybe not welcome.)
Let me dig into the callers:
1) filemap_free_folio()
filemap_free_folio() looks up the callback through
mapping->a_ops->free_folio. Nothing happens in-between that lookup and
the callback so we should be good.
2) replace_page_cache_folio()
replace_page_cache_folio() similarly looks up the callback through
mapping->a_ops->free_folio. We do some operations afterwards, but
essentially store the new folio in the page cache and remove the old one.
The only caller is fuse_try_move_folio(), and IIUC both folios are
locked, preventing concurrent truncation and the mapping going away.
3) __remove_mapping()
__remove_mapping() also looks up the callback through
mapping->a_ops->free_folio.
Before we call free_folio() we remove the folio from the pagecache
(__filemap_remove_folio) to then drop locks and call free_folio().
We're only holding the folio lock at that point.
So yes I agree, truncate_inode_pages_final() could be racing with
__remove_mapping().c That's probably exactly what the docs describe
regarding reclaim.
rcu_read_lock() should indeed work, or some other mechanism that keeps
truncate_inode_pages_final() from succeeding in this racy situation.
Alternatively I guess we would have to use another callback.
--
Cheers
David / dhildenb
More information about the linux-riscv
mailing list