[LSF/MM/BPF TOPIC] Removing GFP_NOFS

Michal Hocko mhocko at suse.com
Thu Feb 8 09:33:51 PST 2024


On Thu 08-02-24 17:02:07, Vlastimil Babka (SUSE) wrote:
> On 1/9/24 05:47, Dave Chinner wrote:
> > On Thu, Jan 04, 2024 at 09:17:16PM +0000, Matthew Wilcox wrote:
> >> This is primarily a _FILESYSTEM_ track topic.  All the work has already
> >> been done on the MM side; the FS people need to do their part.  It could
> >> be a joint session, but I'm not sure there's much for the MM people
> >> to say.
> >> 
> >> There are situations where we need to allocate memory, but cannot call
> >> into the filesystem to free memory.  Generally this is because we're
> >> holding a lock or we've started a transaction, and attempting to write
> >> out dirty folios to reclaim memory would result in a deadlock.
> >> 
> >> The old way to solve this problem is to specify GFP_NOFS when allocating
> >> memory.  This conveys little information about what is being protected
> >> against, and so it is hard to know when it might be safe to remove.
> >> It's also a reflex -- many filesystem authors use GFP_NOFS by default
> >> even when they could use GFP_KERNEL because there's no risk of deadlock.
> >> 
> >> The new way is to use the scoped APIs -- memalloc_nofs_save() and
> >> memalloc_nofs_restore().  These should be called when we start a
> >> transaction or take a lock that would cause a GFP_KERNEL allocation to
> >> deadlock.  Then just use GFP_KERNEL as normal.  The memory allocators
> >> can see the nofs situation is in effect and will not call back into
> >> the filesystem.
> > 
> > So in rebasing the XFS kmem.[ch] removal patchset I've been working
> > on, there is a clear memory allocator function that we need to be
> > scoped: __GFP_NOFAIL.
> > 
> > All of the allocations done through the existing XFS kmem.[ch]
> > interfaces (i.e just about everything) have __GFP_NOFAIL semantics
> > added except in the explicit cases where we add KM_MAYFAIL to
> > indicate that the allocation can fail.
> > 
> > The result of this conversion to remove GFP_NOFS is that I'm also
> > adding *dozens* of __GFP_NOFAIL annotations because we effectively
> > scope that behaviour.
> > 
> > Hence I think this discussion needs to consider that __GFP_NOFAIL is
> > also widely used within critical filesystem code that cannot
> > gracefully recover from memory allocation failures, and that this
> > would also be useful to scope....
> > 
> > Yeah, I know, mm developers hate __GFP_NOFAIL. We've been using
> > these semantics NOFAIL in XFS for over 2 decades and the sky hasn't
> > fallen. So can we get memalloc_nofail_{save,restore}() so that we
> > can change the default allocation behaviour in certain contexts
> > (e.g. the same contexts we need NOFS allocations) to be NOFAIL
> > unless __GFP_RETRY_MAYFAIL or __GFP_NORETRY are set?
> 
> Your points and Kent's proposal of scoped GFP_NOWAIT [1] suggests to me this
> is no longer FS-only topic as this isn't just about converting to the scoped
> apis, but also how they should be improved.

A scoped GFP_NOFAIL context is slightly easier from the semantic POV than
scoped GFP_NOWAIT as it doesn't add a potentially unexpected failure
mode. It is still tricky to deal with GFP_NOWAIT requests inside the
NOFAIL scope, because insisting on the scoped NOFAIL semantic would turn
such a request into a non-failing busy wait for memory.

On the other hand, we can define the behavior along the lines of what you
propose for RETRY_MAYFAIL and NORETRY, i.e. a NOWAIT request is still
allowed to fail inside the scope. Existing NOWAIT users had better handle
allocation failures regardless of the external allocation scope.

Overriding that scoped NOFAIL semantic with RETRY_MAYFAIL or NORETRY
resembles the existing PF_MEMALLOC and GFP_NOMEMALLOC semantic and I do
not see an immediate problem with that.
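
To make the override rule concrete, here is a minimal user-space sketch.
It is not kernel code: the M_* bits and the model_* helpers are made up
for illustration, and only the save/restore shape mirrors
memalloc_nofs_save()/memalloc_nofs_restore(). It assumes the policy
discussed above, where explicit RETRY_MAYFAIL/NORETRY and non-sleeping
requests keep their ability to fail inside the scope.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; NOT the kernel's gfp_t values. */
#define M_DIRECT_RECLAIM 0x1u /* request may sleep and reclaim */
#define M_NOFAIL         0x2u /* request must not fail */
#define M_RETRY_MAYFAIL  0x4u /* try hard, but failure is allowed */
#define M_NORETRY        0x8u /* give up early, failure is allowed */

/* Stands in for a per-task scope bit (the kernel keeps those in task flags). */
static bool scope_nofail;

/* Enter the scope and return the previous state so scopes can nest. */
static bool model_nofail_save(void)
{
	bool old = scope_nofail;

	scope_nofail = true;
	return old;
}

/* Leave the scope by restoring whatever the matching save returned. */
static void model_nofail_restore(bool old)
{
	assert(scope_nofail); /* must pair with an earlier save */
	scope_nofail = old;
}

/* Flags the allocator would act on for a request made with @flags. */
static unsigned int model_effective_flags(unsigned int flags)
{
	if (!scope_nofail)
		return flags;
	/* An explicit opt-out at the call site overrides the scope. */
	if (flags & (M_RETRY_MAYFAIL | M_NORETRY))
		return flags;
	/* Non-sleeping (NOWAIT-like) requests stay failable: no busy wait. */
	if (!(flags & M_DIRECT_RECLAIM))
		return flags;
	return flags | M_NOFAIL;
}
```

With this model a plain sleeping request picks up M_NOFAIL between save
and restore, while a request carrying M_NORETRY, or one without
M_DIRECT_RECLAIM, is passed through unchanged.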

Having more NOFAIL allocations is not great, but if the alternative is
to emulate them by implementing the nofail semantic outside of the
allocator, then it is better to have those retries inside the allocator
IMO.
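
For reference, emulating nofail outside of the allocator usually boils
down to a loop like the one below. This is a user-space sketch only:
model_try_alloc() is a made-up stand-in for a failable allocation that
fails a fixed number of times, and a real caller would sleep or back off
between attempts rather than spin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Model state: how many upcoming attempts fail, and how many were made. */
static int failures_left = 3;
static int attempts;

/* Hypothetical failable allocation: fails until failures_left hits zero. */
static void *model_try_alloc(size_t size)
{
	attempts++;
	if (failures_left > 0) {
		failures_left--;
		return NULL;
	}
	return malloc(size);
}

/* Caller-side nofail emulation: retry until the allocation succeeds. */
static void *model_alloc_nofail(size_t size)
{
	void *p;

	assert(size > 0); /* malloc(0) may return NULL and loop forever */
	while (!(p = model_try_alloc(size)))
		; /* a real caller would back off here instead of spinning */
	return p;
}
```

The allocator cannot tell that such a caller will never give up, which
is why moving the retry inside the allocator is preferable.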

> [1] http://lkml.kernel.org/r/Zbu_yyChbCO6b2Lj@tiehlicka
> 
> > We already have memalloc_noreclaim_{save/restore}() for turning off
> > direct memory reclaim for a given context (i.e. equivalent of
> > clearing __GFP_DIRECT_RECLAIM), so if we are going to embrace scoped
> > allocation contexts, then we should be going all in and providing
> > all the contexts that filesystems actually need....
> > 
> > -Dave.

-- 
Michal Hocko
SUSE Labs


