[PATCH v3 3/3] iova: defer maple tree erase on GFP_ATOMIC failure
Liam R. Howlett
liam at infradead.org
Thu Jun 18 07:50:56 PDT 2026
On 26/06/17 03:04PM, Jason Gunthorpe wrote:
> On Wed, Jun 17, 2026 at 01:45:27PM -0400, Liam R. Howlett wrote:
> > On 26/06/15 08:56AM, Jason Gunthorpe wrote:
> > > On Fri, Jun 12, 2026 at 02:44:06PM -0400, Liam R. Howlett wrote:
> > > > > Currently it never returns a failure to the caller. Look at mas_erase():
> > > > >
> > > > > entry = mas_state_walk(mas);
> > > > > if (!entry)
> > > > > return NULL;
> > > > > [..]
> > > > > if (mas_is_err(mas))
> > > > > goto out;
> > > > > [..]
> > > > > out:
> > > > > mas_destroy(mas);
> > > > > return entry;
> > > > >
> > > > > There is no propogation of ENOMEM, it returns success. No caller
> > > > > checks for any error here either.
> > > >
> > > > At one point this was considered to be impossible to fail, and it is
> > > > documented to return the entry or null.
> > >
> > > I think that is the right API design..
> >
> > Callers can check mas_is_err() and check for xa_err(mas) == -ENOMEM.
> > I'm going to add a note about it to the documentation of the erase
> > function.
>
> That's something for mas_erase, but doesn't help mtree_erase() ..
Indeed.
I'd rather avoid the leaking of the very much unlikely failure checks.
...
> > >
> > > > I think, in your case, hitting an XA_ZERO_ENTRY would be necessary to
> > > > indicate that we cannot reuse this particular location until it is
> > > > correctly dealt with? Or is the maple tree the only reason it is
> > > > considered unusable?
> > >
> > > Yeah, it would be be a maple tree issue only. Defered rebalancing
> > > leave space unavailable.
> > >
> > > This is a case where there is no sane way to handle destroy
> > > failure. You can't return an error code from dma_unmap() for
> > > example. So the only reason for this complexity is because maple tree
> > > exposes a failable erase to it's caller..
> >
> > I think this is even more complicated by the contexts it is called in -
> > that is, we cannot preallocate prior to going into this state either?
>
> Yes, in this case at least the context is GFP_ATOMIC and there is no
> way to pre-allocate.. But that seems like another issue since the
> mas_erase does not support GFP_ATOMIC anyhow..
Yes. If you need atomic allocations then you should stick to the API
that allows you to pass in the GFP flags. It may not be clear to users
that allocations might be required for erase, so I should document that.
>
> > > Eg _iommufd_destroy_mmap() is in trouble too, it cargo culted the
> > > no-check mt_erase.
> >
> > Are you sure that's not okay? The mt_mmap tree is allocated with an
> > internal spinlock. In this case, the lock will be dropped, the
> > allocation will be satisfied and the erase operation will retry.
>
> Is this what I was asking before? Under some conditions the allocation
> can not fail because in the modern kernel we don't allow small
> GFP_KERNEL allocations to fail?
>
> Otherwise this:
>
> if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
> mtree_unlock(mas->tree);
> mas_alloc_nodes(mas, gfp);
> mtree_lock(mas->tree);
> } else {
> mas_alloc_nodes(mas, gfp);
> }
>
> Is always called with GFP_KERNEL for erase. It doesn't seem like
> external lock has any impact if mas_alloc_nodes can fail or not?
>
> It looks like if you have an external lock then the hard wired
> GFP_KERNEL in mtree_erase/mas_erase mean the lock has to be a sleeping
> kind to use those functions.
>
Ah, right.
Yes. The idea here is Too Small To Fail.
Also, note that we call our first allocation with NOWAIT, so that may
have already started the process of getting memory (I think?).
Also note that this is in sheaves, so we'd have to have nothing left in
the maple tree kmem_cache, the sheave, and no page laying around (1 page
is 16 nodes..), and the NOWAIT to have no effect when arriving here.
> If that's the case it should be documented like this too :)
Yeah, very much should add some notes here... but also maybe stop them
from doing it by adding this to mas_erase():
if (mt_external_lock(mas->tree))
might_alloc(GFP_KERNEL);
Otherwise, I'll just be telling people they didn't read the docs.
Thanks,
Liam
More information about the maple-tree
mailing list