[PATCH] nvme: Drop WQ_MEM_RECLAIM flag from core workqueues
Jason Gunthorpe
jgg at ziepe.ca
Mon Apr 12 14:04:02 BST 2021
On Mon, Apr 12, 2021 at 02:49:09PM +0200, Daniel Wagner wrote:
> I've grepped through the code and didn't find anything which supports
> the guarantee claim. Neither mm nor schedule seems to care about this
> flag nor workqueue.c (except the early init bits). Or I must miss
> something.
It is pretty complicated, but WQ_MEM_RECLAIM preallocates a thread:
  static int init_rescuer(struct workqueue_struct *wq)
  {
	if (!(wq->flags & WQ_MEM_RECLAIM))
		return 0;

	rescuer = alloc_worker(NUMA_NO_NODE);
This comment explains it:
 * Workqueue rescuer thread function.  There's one rescuer for each
 * workqueue which has WQ_MEM_RECLAIM set.
 *
 * Regular work processing on a pool may block trying to create a new
 * worker which uses GFP_KERNEL allocation which has slight chance of
 * developing into deadlock if some works currently on the same queue
 * need to be processed to satisfy the GFP_KERNEL allocation.  This is
 * the problem rescuer solves.
 *
 * When such condition is possible, the pool summons rescuers of all
 * workqueues which have works queued on the pool and let them process
 * those works so that forward progress can be guaranteed.
 *
 * This should happen rarely.
Basically, the allocation that matters in the workqueue is the one
that creates a worker, so pre-allocating a worker ensures the work can
continue to progress without depending on further allocations.
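To make that concrete, here is a rough userspace toy model. It is not
the kernel implementation: every toy_* name is made up for
illustration, and "memory pressure" is just a flag that makes worker
creation fail. It only shows why a worker allocated up front keeps the
queue moving when a new one cannot be created:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct toy_worker {
	int id;
};

struct toy_wq {
	struct toy_worker *rescuer;	/* preallocated, or NULL */
};

enum toy_result {
	TOY_RAN_BY_WORKER,	/* a fresh worker could be created */
	TOY_RAN_BY_RESCUER,	/* creation failed, the rescuer took over */
	TOY_STALLED,		/* creation failed and there is no rescuer */
};

/* Simulated memory pressure: when set, creating a worker fails. */
static bool toy_oom;

static struct toy_worker *toy_alloc_worker(void)
{
	if (toy_oom)
		return NULL;
	return calloc(1, sizeof(struct toy_worker));
}

/* Same shape as init_rescuer(): only "reclaim" queues get a rescuer. */
static int toy_init_wq(struct toy_wq *wq, bool mem_reclaim)
{
	assert(wq != NULL);
	wq->rescuer = NULL;
	if (!mem_reclaim)
		return 0;
	wq->rescuer = toy_alloc_worker();
	return wq->rescuer ? 0 : -1;
}

static void toy_destroy_wq(struct toy_wq *wq)
{
	assert(wq != NULL);
	free(wq->rescuer);
	wq->rescuer = NULL;
}

/* Try to run one work item: new worker first, rescuer as fallback. */
static enum toy_result toy_run_work(struct toy_wq *wq)
{
	struct toy_worker *w;

	assert(wq != NULL);
	w = toy_alloc_worker();
	if (w) {
		free(w);
		return TOY_RAN_BY_WORKER;
	}
	return wq->rescuer ? TOY_RAN_BY_RESCUER : TOY_STALLED;
}
```

With toy_oom set, a queue initialised with mem_reclaim == true still
makes progress through its rescuer, while one initialised without it
stalls.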
This is why work on a WQ_MEM_RECLAIM workqueue cannot recurse back
into the allocator: it would get the rescuer thread stuck at a point
when all other threads are already stuck.
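A similarly rough toy model of that failure mode, again in userspace
with made-up toy_* names and a plain counter standing in for free
memory. The single rescuer runs queued items in order; an item that
needs memory while none is free blocks it, so the item behind it that
would have freed memory never runs:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy state: pages currently available to allocate. */
static int toy_free_pages;

enum toy_kind {
	TOY_WORK_ALLOCATES,	/* needs a free page before it can finish */
	TOY_WORK_RECLAIMS,	/* frees a page when it runs */
};

/*
 * Run the queue in order on the one rescuer.  Returns how many items
 * completed before the rescuer blocked, or n if it never blocked.
 */
static size_t toy_rescuer_run(const enum toy_kind *queue, size_t n)
{
	size_t done;

	assert(queue != NULL || n == 0);
	for (done = 0; done < n; done++) {
		if (queue[done] == TOY_WORK_ALLOCATES) {
			if (toy_free_pages == 0)
				break;	/* rescuer now waits on the allocator */
			toy_free_pages--;
		} else {
			toy_free_pages++;
		}
	}
	return done;
}
```

With toy_free_pages at zero, a queue of { TOY_WORK_ALLOCATES,
TOY_WORK_RECLAIMS } completes nothing, even though the second item
alone would have unblocked the first.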
To remove WQ_MEM_RECLAIM you have to make assertions about the calling
contexts and blocking contexts of the workqueue, not what the work
itself is doing.
Jason