[PATCH v3 3/3] iova: defer maple tree erase on GFP_ATOMIC failure

Jason Gunthorpe jgg at ziepe.ca
Wed Jun 17 11:04:19 PDT 2026


On Wed, Jun 17, 2026 at 01:45:27PM -0400, Liam R. Howlett wrote:
> On 26/06/15 08:56AM, Jason Gunthorpe wrote:
> > On Fri, Jun 12, 2026 at 02:44:06PM -0400, Liam R. Howlett wrote:
> > > > Currently it never returns a failure to the caller. Look at mas_erase():
> > > > 
> > > > 	entry = mas_state_walk(mas);
> > > > 	if (!entry)
> > > > 		return NULL;
> > > > [..]
> > > > 	if (mas_is_err(mas))
> > > > 		goto out;
> > > > [..]
> > > > out:
> > > > 	mas_destroy(mas);
> > > > 	return entry;
> > > > 
> > > > There is no propogation of ENOMEM, it returns success. No caller
> > > > checks for any error here either.
> > > 
> > > At one point this was considered to be impossible to fail, and it is
> > > documented to return the entry or null.
> > 
> > I think that is the right API design..
> 
> Callers can check mas_is_err() and check for xa_err(mas) == -ENOMEM.
> I'm going to add a note about it to the documentation of the erase
> function.

That's something for mas_erase, but doesn't help mtree_erase() ..

> Why is the retry GFP_ATOMIC on a timer of 10ms?
> 
> Also, why is this patch set using an external spinlock only created to
> manage the tree?  Why isn't it using an internal lock?  Is it just to
> avoid the possibility of the unlock?

IDK, seem like good questions

> > 
> > > I think, in your case, hitting an XA_ZERO_ENTRY would be necessary to
> > > indicate that we cannot reuse this particular location until it is
> > > correctly dealt with?  Or is the maple tree the only reason it is
> > > considered unusable?
> > 
> > Yeah, it would be be a maple tree issue only. Defered rebalancing
> > leave space unavailable.
> > 
> > This is a case where there is no sane way to handle destroy
> > failure. You can't return an error code from dma_unmap() for
> > example. So the only reason for this complexity is because maple tree
> > exposes a failable erase to it's caller..
> 
> I think this is even more complicated by the contexts it is called in -
> that is, we cannot preallocate prior to going into this state either?

Yes, in this case at least the context is GFP_ATOMIC and there is no
way to pre-allocate.. But that seems like another issue since the
mas_erase does not support GFP_ATOMIC anyhow..

> > Eg _iommufd_destroy_mmap() is in trouble too, it cargo culted the
> > no-check mt_erase.
> 
> Are you sure that's not okay?  The mt_mmap tree is allocated with an
> internal spinlock.  In this case, the lock will be dropped, the
> allocation will be satisfied and the erase operation will retry.

Is this what I was asking before? Under some conditions the allocation
can not fail because in the modern kernel we don't allow small
GFP_KERNEL allocations to fail?

Otherwise this:

	if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
		mtree_unlock(mas->tree);
		mas_alloc_nodes(mas, gfp);
		mtree_lock(mas->tree);
	} else {
		mas_alloc_nodes(mas, gfp);
	}

Is always called with GFP_KERNEL for erase. It doesn't seem like
external lock has any impact if mas_alloc_nodes can fail or not?

It looks like if you have an external lock then the hard wired
GFP_KERNEL in mtree_erase/mas_erase mean the lock has to be a sleeping
kind to use those functions.

If that's the case it should be documented like this too :)

Jason



More information about the maple-tree mailing list