[PATCH 4/7] blk-mq: allow the driver to pass in an affinity mask
Christoph Hellwig
hch at lst.de
Tue Sep 6 09:50:56 PDT 2016
[adding Thomas as it's about the affinity_mask he (we) added to the
IRQ core]
On Tue, Sep 06, 2016 at 10:39:28AM -0400, Keith Busch wrote:
> > Always the previous one. Below is a patch to get us back to the
> > previous behavior:
>
> No, that's not right.
>
> Here's my topology info:
>
> # numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 0 size: 15745 MB
> node 0 free: 15319 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> node 1 size: 16150 MB
> node 1 free: 15758 MB
> node distances:
> node   0   1
>   0:  10  21
>   1:  21  10
How do you get that mapping? Does this CPU use hyperthreading and
thus expose siblings through topology_sibling_cpumask? That's the
only thing the old code used for any sort of special casing.
I'll need to see if I can find a system with such a mapping to reproduce this.
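For reference, the sibling special casing in the old code boils down to
roughly the following (a simplified sketch of the idea, not a verbatim
copy of blk-mq-cpumap.c): only the first sibling of each core claims a
fresh queue, the remaining siblings reuse it.

#include <linux/cpumask.h>
#include <linux/topology.h>

/* Return the first hyperthread sibling of @cpu (or @cpu itself). */
static int get_first_sibling(unsigned int cpu)
{
        unsigned int ret;

        ret = cpumask_first(topology_sibling_cpumask(cpu));
        if (ret < nr_cpu_ids)
                return ret;
        return cpu;
}

/*
 * Sibling-aware CPU to queue map: a first sibling claims the next queue,
 * its hyperthread siblings share that queue.
 */
static void map_queues_sibling_aware(unsigned int *map, unsigned int nr_queues)
{
        unsigned int cpu, first_sibling, queue = 0;

        for_each_possible_cpu(cpu) {
                first_sibling = get_first_sibling(cpu);
                if (first_sibling == cpu)
                        map[cpu] = queue++ % nr_queues;
                else
                        map[cpu] = map[first_sibling];
        }
}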
> If I have 16 vectors, the affinity_mask generated by what you're doing
> looks like 0000ffff, CPUs 0-15. So the first 16 bits are set since each
> of those is the first unique CPU, getting a unique vector just like you
> wanted. If an unset bit just means share with the previous, then all of
> my thread siblings (CPUs 16-31) get to share with CPU 15. That's awful!
>
> What we want for my CPU topology is the 16th CPU to pair with CPU 0,
> 17 pairs with 1, 18 with 2, and so on. You can't convey that information
> with this scheme. We need affinity_masks per vector.
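If I understand you correctly, what you're describing is roughly the
untested sketch below: one cpumask per vector, where each first sibling
claims the next vector and pulls its hyperthread siblings into the same
mask, so with your topology vector 0 ends up with {0,16}, vector 1 with
{1,17}, and so on.

#include <linux/cpumask.h>
#include <linux/topology.h>

static void build_per_vector_masks(struct cpumask *masks, unsigned int nr_vecs)
{
        unsigned int cpu, vec;

        for (vec = 0; vec < nr_vecs; vec++)
                cpumask_clear(&masks[vec]);

        vec = 0;
        for_each_possible_cpu(cpu) {
                /* only first siblings claim a new vector */
                if (cpu != cpumask_first(topology_sibling_cpumask(cpu)))
                        continue;

                /* the whole sibling group shares this vector's mask */
                cpumask_or(&masks[vec], &masks[vec],
                           topology_sibling_cpumask(cpu));
                vec = (vec + 1) % nr_vecs;
        }
}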
We actually have per-vector masks, but they are hidden inside the IRQ
core and awkward to use. We could do the get_first_sibling magic
in the blk-mq queue mapping (and in fact with the current code I guess
we need to). Or we could take a step back from trying to emulate the
old code and look at NUMA nodes instead of siblings, as some folks
suggested a while ago.
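A node-based mapping could look roughly like the sketch below (just to
illustrate the idea, the helper and its interface are made up): split the
queues across the online nodes and let every CPU of a node share the
queues assigned to that node.

#include <linux/cpumask.h>
#include <linux/kernel.h>
#include <linux/nodemask.h>
#include <linux/topology.h>

static void map_queues_by_node(unsigned int *map, unsigned int nr_queues)
{
        unsigned int per_node = max_t(unsigned int, 1,
                                      nr_queues / num_online_nodes());
        unsigned int n = 0;
        int node;

        for_each_online_node(node) {
                unsigned int base = (n++ * per_node) % nr_queues;
                unsigned int i = 0, cpu;

                /* all CPUs of this node share the node's slice of queues */
                for_each_cpu(cpu, cpumask_of_node(node)) {
                        map[cpu] = base + (i % per_node);
                        i++;
                }
        }
}

For the topology above with 16 vectors that would give each node 8 queues
shared by its 16 CPUs, trading the sibling pairing for locality.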