[RFC PATCH 00/18] kthreads/signal: Safer kthread API and signal handling

Peter Zijlstra peterz at infradead.org
Fri Jun 5 09:22:16 PDT 2015


On Fri, Jun 05, 2015 at 05:00:59PM +0200, Petr Mladek wrote:
> Workqueue
> 
> 
> Workqueues are quite popular and many kthreads have already been
> converted into them.
> 
> Workqueues make it possible to split the kthread's function into
> even more pieces and reach the common check point more often. This
> is especially useful when a kthread handles several kinds of work
> and is woken whenever some of it needs doing. Then we can queue
> the appropriate work item instead of waking the whole kthread and
> letting it check what exactly needs to be done.
> 
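
For reference, the conversion typically looks something like the
sketch below (the foo_* names and the foo_process_rx() helper are
made up for illustration): instead of waking a dedicated kthread
that then has to work out why it was woken, the event handler
queues the one specific piece of work.

	#include <linux/interrupt.h>
	#include <linux/workqueue.h>

	struct foo_dev {
		struct work_struct rx_work;	/* INIT_WORK()ed at probe time */
	};

	static void foo_rx_work(struct work_struct *work)
	{
		struct foo_dev *dev = container_of(work, struct foo_dev, rx_work);

		foo_process_rx(dev);	/* handles exactly this one event type */
	}

	static irqreturn_t foo_irq(int irq, void *data)
	{
		struct foo_dev *dev = data;

		/* was: wake_up_process(dev->kthread); */
		schedule_work(&dev->rx_work);
		return IRQ_HANDLED;
	}
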
> But there are many kthreads that need to loop many times before
> some work is finished, e.g. khugepaged, virtio_balloon, and
> jffs2_garbage_collect_thread. These would have to re-queue the
> work item repeatedly from within its own work function, or chain
> several work items together. That would be strange semantics.
> 
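
Concretely, such a kthread's main loop would turn into a work
function that has to re-queue itself, roughly as in this minimal
sketch (gc_one_pass() and gc_finished() are made-up placeholders):

	#include <linux/workqueue.h>

	struct gc_state {
		struct workqueue_struct *wq;
		struct work_struct work;
	};

	static void gc_work_func(struct work_struct *work)
	{
		struct gc_state *gc = container_of(work, struct gc_state, work);

		gc_one_pass(gc);

		/* not finished: the work item must queue itself again */
		if (!gc_finished(gc))
			queue_work(gc->wq, &gc->work);
	}
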
> Workqueues allow sharing the same kthread among several users.
> This helps reduce the number of running kthreads, which is
> especially useful when you would otherwise need one kthread
> per CPU.
> 
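
The per-CPU case then becomes a matter of queueing one work item
per CPU into the shared worker pools, something like this sketch
(the foo_work items are hypothetical and assumed to have been
INIT_WORK()ed already):

	#include <linux/percpu.h>
	#include <linux/workqueue.h>

	static DEFINE_PER_CPU(struct work_struct, foo_work);

	static void foo_queue_on_all_cpus(void)
	{
		int cpu;

		/* one item per CPU, all serviced by the shared kworkers */
		for_each_online_cpu(cpu)
			queue_work_on(cpu, system_wq, per_cpu_ptr(&foo_work, cpu));
	}
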
> But this might also be a disadvantage. Just look at the output of
> the "ps" command and see the many [kworker*] processes. One might
> see this as a black hole. If a kworker makes the system busy, it
> is less obvious what the problem is compared with the old "simple"
> and dedicated kthreads.
> 
> Yes, we could add some debugging tools for workqueues, but that
> would be yet another non-standard thing that developers and
> system administrators would need to understand.
> 
> Another thing is that workqueues have their own logic for
> scheduling work items onto the worker kthreads. If we move even
> more tasks there, it might need even more love. In any case, this
> extra level of scheduling adds another layer of complexity when
> debugging problems.

There are a lot more problems with workqueues:

 - they're not regular tasks, so none of the regular task controls
   work on them. This means everything scheduler-related: CPU
   affinity, nice, and the RT/deadline scheduling policies. Instead
   there is some half-baked secondary interface for some of these
   (see the sketch after this list).

   But this also very much includes things like cgroups, which
   brings me to the second point.

 - it's oblivious to cgroups (just as it is to RT priority, for
   example), both leading to priority inversion. A work item
   enqueued from a deep/limited cgroup does not inherit the
   submitting task's cgroup; instead the work is run from the root
   cgroup.

   This breaks cgroup isolation, all the more so when a large part
   of the actual work is done from workqueues (as some workloads
   end up being). Instead of being able to control the work, it all
   ends up in the root cgroup, outside of any control.
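
To illustrate the asymmetry (a rough sketch, not complete code; the
foo_* names are made up): a dedicated kthread is a normal task that
all the usual knobs apply to, while a work item has no task of its
own, only the limited per-workqueue attributes.

	#include <linux/kthread.h>
	#include <linux/sched.h>
	#include <linux/workqueue.h>

	static struct task_struct *foo_start_kthread(int (*fn)(void *), void *data)
	{
		struct sched_param param = { .sched_priority = 10 };
		struct task_struct *t = kthread_run(fn, data, "foo_worker");

		if (!IS_ERR(t)) {
			/* the standard task controls just work ... */
			set_cpus_allowed_ptr(t, cpumask_of(2));
			sched_setscheduler(t, SCHED_FIFO, &param);
			/* ... and userspace can taskset/chrt it or move
			 * its pid into a cgroup, since it shows up as a
			 * normal task. */
		}
		return t;
	}

	static struct workqueue_struct *foo_create_wq(void)
	{
		/* for a workqueue the best we get is WQ_SYSFS, which
		 * exposes only a cpumask and a nice level under
		 * /sys/devices/virtual/workqueue/foo_wq/ -- nothing
		 * deadline- or cgroup-aware, and it applies to whatever
		 * kworker happens to run the items. */
		return alloc_workqueue("foo_wq", WQ_UNBOUND | WQ_SYSFS, 0);
	}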




