[PATCH v3 3/3] iova: defer maple tree erase on GFP_ATOMIC failure
Jason Gunthorpe
jgg at ziepe.ca
Wed Jun 17 11:04:19 PDT 2026
On Wed, Jun 17, 2026 at 01:45:27PM -0400, Liam R. Howlett wrote:
> On 26/06/15 08:56AM, Jason Gunthorpe wrote:
> > On Fri, Jun 12, 2026 at 02:44:06PM -0400, Liam R. Howlett wrote:
> > > > Currently it never returns a failure to the caller. Look at mas_erase():
> > > >
> > > > entry = mas_state_walk(mas);
> > > > if (!entry)
> > > > return NULL;
> > > > [..]
> > > > if (mas_is_err(mas))
> > > > goto out;
> > > > [..]
> > > > out:
> > > > mas_destroy(mas);
> > > > return entry;
> > > >
> > > > There is no propogation of ENOMEM, it returns success. No caller
> > > > checks for any error here either.
> > >
> > > At one point this was considered to be impossible to fail, and it is
> > > documented to return the entry or null.
> >
> > I think that is the right API design..
>
> Callers can check mas_is_err() and check for xa_err(mas) == -ENOMEM.
> I'm going to add a note about it to the documentation of the erase
> function.
That's something for mas_erase, but doesn't help mtree_erase() ..
> Why is the retry GFP_ATOMIC on a timer of 10ms?
>
> Also, why is this patch set using an external spinlock only created to
> manage the tree? Why isn't it using an internal lock? Is it just to
> avoid the possibility of the unlock?
IDK, seem like good questions
> >
> > > I think, in your case, hitting an XA_ZERO_ENTRY would be necessary to
> > > indicate that we cannot reuse this particular location until it is
> > > correctly dealt with? Or is the maple tree the only reason it is
> > > considered unusable?
> >
> > Yeah, it would be be a maple tree issue only. Defered rebalancing
> > leave space unavailable.
> >
> > This is a case where there is no sane way to handle destroy
> > failure. You can't return an error code from dma_unmap() for
> > example. So the only reason for this complexity is because maple tree
> > exposes a failable erase to it's caller..
>
> I think this is even more complicated by the contexts it is called in -
> that is, we cannot preallocate prior to going into this state either?
Yes, in this case at least the context is GFP_ATOMIC and there is no
way to pre-allocate.. But that seems like another issue since the
mas_erase does not support GFP_ATOMIC anyhow..
> > Eg _iommufd_destroy_mmap() is in trouble too, it cargo culted the
> > no-check mt_erase.
>
> Are you sure that's not okay? The mt_mmap tree is allocated with an
> internal spinlock. In this case, the lock will be dropped, the
> allocation will be satisfied and the erase operation will retry.
Is this what I was asking before? Under some conditions the allocation
can not fail because in the modern kernel we don't allow small
GFP_KERNEL allocations to fail?
Otherwise this:
if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
mtree_unlock(mas->tree);
mas_alloc_nodes(mas, gfp);
mtree_lock(mas->tree);
} else {
mas_alloc_nodes(mas, gfp);
}
Is always called with GFP_KERNEL for erase. It doesn't seem like
external lock has any impact if mas_alloc_nodes can fail or not?
It looks like if you have an external lock then the hard wired
GFP_KERNEL in mtree_erase/mas_erase mean the lock has to be a sleeping
kind to use those functions.
If that's the case it should be documented like this too :)
Jason
More information about the maple-tree
mailing list