[RFC PATCH 01/11] workqueue: Add a decrement-after-return and wake if 0 facility

Tue Sep 5 07:50:16 PDT 2017

Tejun Heo <tj at kernel.org> wrote:

> Given how work items are used, I think this is too inviting to abuses
> where people build complex event chains through these counters and
> those chains would be completely opaque.  If the goal is protecting
> .text of a work item, can't we just do that?  Can you please describe
> your use case in more detail?

With one of my latest patches to AFS, there's a set of cell records, where
each cell has a manager work item that maintains that cell, including
refreshing DNS records and excising expired records from the list.  Performing
the excision in the manager work item makes handling the fscache index cookie
easier (you can't have two cookies attached to the same object), amongst other
things.

There's also an overseer work item that maintains a single expiry timer for
all the cells and queues the per-cell work items to do DNS updates and cell
removal.

The reason that the overseer exists is that it makes it easier to do a put on
a cell.  The caller decrements the cell refcount and then wants to schedule the
cell for destruction - but it's no longer permitted to touch the cell.  I
could use atomic_dec_and_lock(), but that's messy.  It's cleaner just to set
the timer on the overseer and leave it to that.

However, if someone does rmmod, I have to be able to clean everything up.  The
overseer timer may be queued or running; the overseer may be queued *and*
running and may get queued again by the timer; and each cell's work item may
be queued *and* running and may get queued again by the manager.

> Why can't it be done via the usual "flush from exit"?

Well, it can, but you need a flush for each separate level of dependencies,
where one dependency will kick off another level of dependency during the
cleanup.

So what I think I would have to do is set a flag to say that no one is allowed
to set the timer now (this shouldn't happen outside of server or volume cache
clearance), delete the timer synchronously, flush the work queue four times
and then do an RCU barrier.

However, since I have volumes with dependencies on servers and cells, possibly
with their own managers, I think I may need up to 12 flushes, possibly with
interspersed RCU barriers.

It's much simpler to count out the objects than to try and get the flushing
right.
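
To illustrate what I mean by counting out, here is a rough userspace sketch
using pthreads.  It is only an analogue of the idea and not the kernel
facility proposed in the patch: every name in it (struct outstanding,
outstanding_get(), outstanding_put(), outstanding_wait(), cell_manager(),
run_demo()) is made up for the example, and threads stand in for the per-cell
work items.  Each object takes a count while it lives and drops it as its
last act; teardown drops its own count and sleeps until the total reaches 0.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical count of objects still outstanding.  It starts at 1 (held by
 * the teardown path) so that it cannot reach 0 before teardown begins. */
struct outstanding {
	int             count;
	pthread_mutex_t lock;
	pthread_cond_t  zero;
};

static void outstanding_init(struct outstanding *o)
{
	o->count = 1;
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->zero, NULL);
}

static void outstanding_get(struct outstanding *o)
{
	pthread_mutex_lock(&o->lock);
	o->count++;
	pthread_mutex_unlock(&o->lock);
}

/* Decrement and wake the waiter if the count hit 0. */
static void outstanding_put(struct outstanding *o)
{
	pthread_mutex_lock(&o->lock);
	if (--o->count == 0)
		pthread_cond_broadcast(&o->zero);
	pthread_mutex_unlock(&o->lock);
}

/* Drop the initial count, then sleep until every object has counted out. */
static void outstanding_wait(struct outstanding *o)
{
	outstanding_put(o);
	pthread_mutex_lock(&o->lock);
	while (o->count != 0)
		pthread_cond_wait(&o->zero, &o->lock);
	pthread_mutex_unlock(&o->lock);
}

#define NR_CELLS 4

struct cell {
	struct outstanding *o;
	int cleaned;
};

/* Stand-in for a per-cell manager: clean up, then count out last. */
static void *cell_manager(void *arg)
{
	struct cell *cell = arg;

	cell->cleaned = 1;
	outstanding_put(cell->o);
	return NULL;
}

/* Returns 0 if every started cell was cleaned before the wait returned. */
static int run_demo(void)
{
	struct outstanding o;
	struct cell cells[NR_CELLS];
	pthread_t threads[NR_CELLS];
	int i, started = 0, cleaned = 0;

	outstanding_init(&o);
	for (i = 0; i < NR_CELLS; i++) {
		cells[i].o = &o;
		cells[i].cleaned = 0;
		outstanding_get(&o);
		if (pthread_create(&threads[i], NULL, cell_manager, &cells[i])) {
			outstanding_put(&o);	/* thread never started */
			break;
		}
		started++;
	}

	outstanding_wait(&o);	/* one wait, however many levels there are */
	for (i = 0; i < started; i++)
		cleaned += cells[i].cleaned;

	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	pthread_cond_destroy(&o.zero);
	pthread_mutex_destroy(&o.lock);
	return cleaned == started ? 0 : -1;
}
```

Calling run_demo() from a test harness should return 0.  The point is that
the waiter doesn't need to know how many levels of dependency there are: an
object that spawns another just takes a count for it before dropping its own.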

David