[PATCH v3 3/3] iova: defer maple tree erase on GFP_ATOMIC failure

Wed Jun 17 10:45:27 PDT 2026

On 26/06/15 08:56AM, Jason Gunthorpe wrote:
> On Fri, Jun 12, 2026 at 02:44:06PM -0400, Liam R. Howlett wrote:
> > > Currently it never returns a failure to the caller. Look at mas_erase():
> > > 
> > > 	entry = mas_state_walk(mas);
> > > 	if (!entry)
> > > 		return NULL;
> > > [..]
> > > 	if (mas_is_err(mas))
> > > 		goto out;
> > > [..]
> > > out:
> > > 	mas_destroy(mas);
> > > 	return entry;
> > > 
> > > There is no propogation of ENOMEM, it returns success. No caller
> > > checks for any error here either.
> > 
> > At one point this was considered to be impossible to fail, and it is
> > documented to return the entry or null.
> 
> I think that is the right API design..

Callers can check mas_is_err() and check for xa_err(mas) == -ENOMEM.
I'm going to add a note about it to the documentation of the erase
function.

> 
> > I think I should probably change this to return -ENOMEM, fix the docs on
> > it, and probably audit the callers (most use external locks or are
> > early-boot-so-don't-worry-about-it).  Any issue here should also be
> > caught by lockdep pretty quickly.
> 
> Locking aside, I don't think most of the callers can handle an -ENOMEM
> return at all..
> 
> > mas_store_gfp() works if you know the range of your entry.  You could
> > also write the XA_ZERO_ENTRY over the entry so that there is no internal
> > node changes - just a value swap.  If you do this, you have to be
> > careful when reading things back when using the mas_ interface.
> 
> That's probably a good option. Try to store NULL, if that ENOMEM's
> then drop a ZERO_ENTRY and set a bit someplace so that the next
> allocation has to sweep the tree and clean the ZERO's. At least that
> shifts the ENOMEM to an alloc side operation where it can be handled.

Why is the retry GFP_ATOMIC on a timer of 10ms?

Also, why is this patch set using an external spinlock only created to
manage the tree?  Why isn't it using an internal lock?  Is it just to
avoid the possibility of the unlock?

> 
> > I think, in your case, hitting an XA_ZERO_ENTRY would be necessary to
> > indicate that we cannot reuse this particular location until it is
> > correctly dealt with?  Or is the maple tree the only reason it is
> > considered unusable?
> 
> Yeah, it would be be a maple tree issue only. Defered rebalancing
> leave space unavailable.
> 
> This is a case where there is no sane way to handle destroy
> failure. You can't return an error code from dma_unmap() for
> example. So the only reason for this complexity is because maple tree
> exposes a failable erase to it's caller..

I think this is even more complicated by the contexts it is called in -
that is, we cannot preallocate prior to going into this state either?

> 
> IDK, could it handle it internally somehow? Or at least document a
> reasonable pattern with some helpers?
> 
> Eg _iommufd_destroy_mmap() is in trouble too, it cargo culted the
> no-check mt_erase.

Are you sure that's not okay?  The mt_mmap tree is allocated with an
internal spinlock.  In this case, the lock will be dropped, the
allocation will be satisfied and the erase operation will retry.

Thanks,
Liam