[PATCH 1/4] nvme-tcp: per-controller I/O workqueues

Tejun Heo tj at kernel.org
Wed Jul 3 10:07:06 PDT 2024


Hello,

On Wed, Jul 03, 2024 at 06:16:32PM +0300, Sagi Grimberg wrote:
...
> > > OK, wonder what is the cost here. Is it in ALL conditions better
> > > than a single workqueue?
> > 
> > Well, clearly not on memory-limited systems; a workqueue per controller
> > takes up more memory than a single one. And it's questionable whether
> > such a system isn't underprovisioned for nvme anyway.

Each workqueue does take up some memory, but it's not enormous (I think it's
512 + 512 * nr_cpus + some extra + a rescuer if WQ_MEM_RECLAIM is set). Each
workqueue is just a frontend to the shared backend worker pools, so splitting
one workqueue into multiple that do about the same work usually won't create
more workers.

> > We will see a higher scheduler interaction as the scheduler needs to
> > switch between workqueues, but that was kinda the idea. And I doubt one

This isn't necessarily true. The backend worker pools don't care whether you
have one or multiple workqueues. For per-cpu workqueues, the concurrency
management applies across different workqueues. For unbound workqueues,
because the concurrency limit is per workqueue, if there are enough concurrent
work items being queued, the number of concurrently running kworkers may go
up, but that's just because the total concurrency went up. Whether you have
one or many workqueues, as long as they share the same properties, they map to
the same backend worker pools and execute exactly the same way.
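
To make that concrete, here's a minimal sketch (made-up names, nothing to do
with the actual nvme-tcp patch):

#include <linux/workqueue.h>

static struct workqueue_struct *wq_a, *wq_b;

static int example_init(void)
{
        /*
         * Both workqueues map to the same unbound backend worker pools,
         * but max_active (the third argument) is tracked per workqueue:
         * up to four items from each may run at once, independently.
         */
        wq_a = alloc_workqueue("example_a", WQ_UNBOUND, 4);
        wq_b = alloc_workqueue("example_b", WQ_UNBOUND, 4);
        if (!wq_a || !wq_b)
                return -ENOMEM; /* sketch only; real code would unwind */
        return 0;
}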

> > can measure it; the overhead of switching between workqueues should be
> > pretty much identical to the overhead of switching between work items.

They are identical.

> > I could do some measurements, but really I don't think it'll yield any
> > surprising results.
> 
> I'm just not used to seeing drivers create non-global workqueues. I've seen
> some filesystems have workqueues per-super, but
> it's not a common pattern around the kernel.
> 
> Tejun,
> Is this a pattern that we should pursue? Do multiple symmetric workqueues
> really work better (faster, with less overhead) than
> a single global workqueue?

Yeah, there's nothing wrong with creating multiple workqueues if it's for the
right reasons. Here are some reasons I can think of (a rough sketch of the
pattern follows the list):

- Not wanting to share the concurrency limit so that one device can't
  interfere with another. Not sharing a rescuer may also have *some* benefits,
  although I doubt it'd be all that noticeable.

- To get separate flush domains. e.g. If you want to be able to do
  flush_workqueue() on the work items that service one device without
  getting affected by work items from other devices.

- To get different per-device workqueue attributes - e.g. maybe you wanna
  confine workers serving a specific device to a subset of CPUs or give them
  higher priority.
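
Roughly, the per-device pattern looks something like the following. This is
just a sketch with made-up names - it's not the nvme-tcp patch, and WQ_HIGHPRI
is only there to stand in for "give this device's workers an attribute":

#include <linux/workqueue.h>

/* Hypothetical per-device structure; not from the actual patch. */
struct example_dev {
        struct workqueue_struct *wq;
        struct work_struct io_work;
};

static void example_io_work(struct work_struct *work)
{
        /* ... service I/O for one device ... */
}

static int example_dev_init(struct example_dev *dev, int idx)
{
        INIT_WORK(&dev->io_work, example_io_work);

        dev->wq = alloc_workqueue("example_dev_%d",
                                  WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM,
                                  0, idx);
        return dev->wq ? 0 : -ENOMEM;
}

static void example_dev_quiesce(struct example_dev *dev)
{
        /*
         * Separate flush domain: waits only for work queued on this
         * device's workqueue, not for other devices' work items.
         */
        flush_workqueue(dev->wq);
}

static void example_dev_exit(struct example_dev *dev)
{
        destroy_workqueue(dev->wq);     /* drains remaining work, then frees */
}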

Note that separating workqueues does not necessarily change how things are
executed. e.g. You don't get your own kworkers.

Thanks.

-- 
tejun


