[PATCH v3 3/3] iova: defer maple tree erase on GFP_ATOMIC failure

Thu Jun 18 21:54:13 PDT 2026

On 26/06/18 11:51PM, Rik van Riel wrote:
> On Thu, 2026-06-18 at 12:24 -0300, Jason Gunthorpe wrote:
> > 
> > - I like your idea for Rik to try to store NULL to erase, on failure
> >   store ZERO_ENTRY, and then set a note on the next alloc to clean the
> >   ZERO_ENTRYs?
> > 
> Is there any efficient way to find those XA_ZERO_ENTRYs?
> 
> Without a good way to find them, we might still need the
> llist to clean them up, though I agree that cleaning them
> up from the allocator side looks cleaner than doing it
> from an external worker, and I did make that change for v4.

No, but how often is there a failure?  Would walking the tree and
writing NULLs over the zero entries be out of the question (I really
don't know, sorry if it's a dumb question)?

Something like this totally untested code:

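/* Start pfn_lo at the maximum so the first zero entry can lower it. */
unsigned long pfn_lo = ULONG_MAX;
unsigned long pfn_hi = 0;
bool found = false;
void *entry;

mas_lock(&mas);
mas_set(&mas, 0);
/* Entries come back in ascending order, so pfn_hi only grows. */
mas_for_each(&mas, entry, ULONG_MAX) {
        if (entry == XA_ZERO_ENTRY) {
                if (mas.index < pfn_lo)
                        pfn_lo = mas.index;
                pfn_hi = mas.last;
                found = true;
        }
}

/*
 * One store over [pfn_lo, pfn_hi].  Note this also clears any live
 * entries in between, so it is only right if the zero entries are
 * contiguous.
 */
if (found) {
        mas_set_range(&mas, pfn_lo, pfn_hi);
        mas_store_gfp(&mas, NULL, GFP_KERNEL);
}

mas_unlock(&mas);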

If you want to optimise it, you can record just the first failure in
each contiguous run of failures and reuse pfn_hi to set the range (are
the areas usually contiguous?).  It's probably better to free any
memory you can, since you just ran out of memory anyway.  Although
it's only 16B... how many are usually processed in a group?
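
To make the contiguous-run idea concrete, here is a minimal userspace
sketch, not kernel code and only lightly thought through.  struct
range_entry, struct erase_run and collect_zero_runs() are hypothetical
names made up for illustration; the sorted array stands in for the tree
walk and the zero flag stands in for XA_ZERO_ENTRY.  It only merges
zero entries that are directly adjacent, so no live entry ends up
inside an erase range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one tree entry covering [index, last]. */
struct range_entry {
	unsigned long index;	/* first pfn covered */
	unsigned long last;	/* last pfn covered, inclusive */
	bool zero;		/* true models XA_ZERO_ENTRY, false a live entry */
};

/* One range that could be erased with a single NULL store. */
struct erase_run {
	unsigned long lo;
	unsigned long hi;
};

/*
 * Walk entries sorted by index and merge directly adjacent zero
 * entries into runs.  A live entry or a gap ends the current run.
 * Returns the number of runs written to out (at most max_out).
 */
static size_t collect_zero_runs(const struct range_entry *e, size_t n,
				struct erase_run *out, size_t max_out)
{
	size_t nr = 0;
	bool open = false;	/* is out[nr - 1] still extendable? */

	assert(e != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (!e[i].zero) {
			open = false;
			continue;
		}
		/* Extend the current run if this entry touches its end. */
		if (open && e[i].index == out[nr - 1].hi + 1) {
			out[nr - 1].hi = e[i].last;
			continue;
		}
		if (nr == max_out)
			break;
		/* Otherwise remember this entry as the start of a new run. */
		out[nr].lo = e[i].index;
		out[nr].hi = e[i].last;
		nr++;
		open = true;
	}
	return nr;
}
```

In the kernel version each run would then presumably get one
mas_set_range() plus mas_store_gfp() of NULL, instead of one store per
failed entry.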

Would a normal workqueue be okay to do the freeing?

Thanks,
Liam