[LSF/MM/BPF TOPIC] Removing GFP_NOFS

Vlastimil Babka (SUSE) vbabka at kernel.org
Thu Feb 8 08:02:07 PST 2024


On 1/9/24 05:47, Dave Chinner wrote:
> On Thu, Jan 04, 2024 at 09:17:16PM +0000, Matthew Wilcox wrote:
>> This is primarily a _FILESYSTEM_ track topic.  All the work has already
>> been done on the MM side; the FS people need to do their part.  It could
>> be a joint session, but I'm not sure there's much for the MM people
>> to say.
>> 
>> There are situations where we need to allocate memory, but cannot call
>> into the filesystem to free memory.  Generally this is because we're
>> holding a lock or we've started a transaction, and attempting to write
>> out dirty folios to reclaim memory would result in a deadlock.
>> 
>> The old way to solve this problem is to specify GFP_NOFS when allocating
>> memory.  This conveys little information about what is being protected
>> against, and so it is hard to know when it might be safe to remove.
>> It's also a reflex -- many filesystem authors use GFP_NOFS by default
>> even when they could use GFP_KERNEL because there's no risk of deadlock.
>> 
>> The new way is to use the scoped APIs -- memalloc_nofs_save() and
>> memalloc_nofs_restore().  These should be called when we start a
>> transaction or take a lock that would cause a GFP_KERNEL allocation to
>> deadlock.  Then just use GFP_KERNEL as normal.  The memory allocators
>> can see the nofs situation is in effect and will not call back into
>> the filesystem.
> 
> So in rebasing the XFS kmem.[ch] removal patchset I've been working
> on, there is a clear memory allocator function that we need to be
> scoped: __GFP_NOFAIL.
> 
> All of the allocations done through the existing XFS kmem.[ch]
> interfaces (i.e just about everything) have __GFP_NOFAIL semantics
> added except in the explicit cases where we add KM_MAYFAIL to
> indicate that the allocation can fail.
> 
> The result of this conversion to remove GFP_NOFS is that I'm also
> adding *dozens* of __GFP_NOFAIL annotations because we effectively
> scope that behaviour.
> 
> Hence I think this discussion needs to consider that __GFP_NOFAIL is
> also widely used within critical filesystem code that cannot
> gracefully recover from memory allocation failures, and that this
> would also be useful to scope....
> 
> Yeah, I know, mm developers hate __GFP_NOFAIL. We've been using
> these semantics NOFAIL in XFS for over 2 decades and the sky hasn't
> fallen. So can we get memalloc_nofail_{save,restore}() so that we
> can change the default allocation behaviour in certain contexts
> (e.g. the same contexts we need NOFS allocations) to be NOFAIL
> unless __GFP_RETRY_MAYFAIL or __GFP_NORETRY are set?

Your points and Kent's proposal of scoped GFP_NOWAIT [1] suggests to me this
is no longer FS-only topic as this isn't just about converting to the scoped
apis, but also how they should be improved.

[1] http://lkml.kernel.org/r/Zbu_yyChbCO6b2Lj@tiehlicka

> We already have memalloc_noreclaim_{save/restore}() for turning off
> direct memory reclaim for a given context (i.e. equivalent of
> clearing __GFP_DIRECT_RECLAIM), so if we are going to embrace scoped
> allocation contexts, then we should be going all in and providing
> all the contexts that filesystems actually need....
> 
> -Dave.




More information about the Linux-nvme mailing list