[PATCH v7] NVMe: conversion to blk-mq

Jens Axboe axboe at fb.com
Fri Jun 13 08:11:37 PDT 2014


On 06/13/2014 09:05 AM, Keith Busch wrote:
> On Fri, 13 Jun 2014, Jens Axboe wrote:
>> On 06/12/2014 06:06 PM, Keith Busch wrote:
>>> When cancelling IOs, we have to check if the hwctx has valid tags
>>> for some reason. I have 32 cores in my system and as many queues, but
>>
>> It's because unused queues are torn down, to save memory.
>>
>>> blk-mq is only using half of those queues and freed the "tags" for the
>>> rest after they'd been initialized without telling the driver. Why is
>>> blk-mq not utilizing all my queues?
>>
>> You have 31 + 1 queues, so only 31 mappable queues. blk-mq symmetrically
>> distributes these, so you should have a core + thread sibling on 16
>> queues. And yes, that leaves 15 idle hardware queues for this specific
>> case. I like the symmetry, it makes it more predictable if things are
>> spread out evenly.
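
To expand on the teardown point: the cancel path only needs to skip the
hw queues whose tags were already freed. Roughly something like this
(sketch only, nvme_cancel_hctx_ios() is a made-up stand-in for whatever
per-queue cancel helper the driver ends up with):

static void nvme_cancel_all_ios(struct request_queue *q)
{
        struct blk_mq_hw_ctx *hctx;
        unsigned int i;

        queue_for_each_hw_ctx(q, hctx, i) {
                /* unused hw queues had their tags torn down, skip them */
                if (!hctx->tags)
                        continue;

                nvme_cancel_hctx_ios(hctx);
        }
}
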
> 
> You'll see performance differences on some workloads that depend on which
> cores your process runs on and which one services the interrupt. We can
> play games with cores and see what happens on my 32 cpu system. I usually
> run 'irqbalance --hintpolicy=exact' for best performance, but that doesn't
> do anything with blk-mq since the affinity hint is gone.

Huh wtf, that hint is not supposed to be gone. I'm guessing it went away
with the removal of the manual queue assignments.
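
Bringing it back from the ->init_hctx() hook should be straightforward,
roughly like the below (sketch only; the nvme_dev/nvme_queue field names
here are from memory and may not match v7 exactly):

static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
                          unsigned int hctx_idx)
{
        struct nvme_dev *dev = data;
        struct nvme_queue *nvmeq = dev->queues[hctx_idx + 1];

        hctx->driver_data = nvmeq;

        /* re-register the hint so irqbalance --hintpolicy=exact works */
        irq_set_affinity_hint(dev->entry[nvmeq->cq_vector].vector,
                              hctx->cpumask);
        return 0;
}

Plus the matching irq_set_affinity_hint(vector, NULL) before the vector
is freed on teardown.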

> I ran the following script several times on each version of the
> driver. It pins a sequential read test to cores 0, 8, and 16. The
> device is local to the NUMA node containing cores 0-7 and 16-23; the
> second test runs on the remote node and the third on the thread
> sibling of core 0. Results were averaged, but very consistent anyway.
> The system was otherwise idle.
> 
>  # for i in $(seq 0 8 16); do
>   > let "cpu=1<<$i"
>   > cpu=`echo $cpu | awk '{printf "%#x\n", $1}'`
>   > taskset ${cpu} dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000000 iflag=direct
>   > done
> 
> Here are the performance drops observed with blk-mq, using the existing
> driver as the baseline:
> 
>  CPU : Drop
>  ....:.....
>    0 : -6%
>    8 : -36%
>   16 : -12%

We need the hints back for sure. I'll run some of the same tests and
verify to be sure. Out of curiosity, what is the topology like on your
box? Are 0/1 siblings, and 0..7 one node?
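
For reference, the symmetric spread I described above just gives each
core/sibling pair its own hw queue. Purely illustrative (this is not the
actual blk-mq mapping code, and it assumes CPU n and n + 16 are thread
siblings, which is what your node layout suggests):

static unsigned int first_sibling(unsigned int cpu, unsigned int nr_cpus)
{
        /* assumption for this example: n and n + nr_cpus/2 are siblings */
        return cpu % (nr_cpus / 2);
}

static void example_map_queues(unsigned int *map, unsigned int nr_cpus,
                               unsigned int nr_queues)
{
        unsigned int cpu, queue = 0;

        for (cpu = 0; cpu < nr_cpus; cpu++) {
                unsigned int first = first_sibling(cpu, nr_cpus);

                if (first == cpu)
                        map[cpu] = queue++ % nr_queues; /* new pair, next queue */
                else
                        map[cpu] = map[first];          /* share sibling's queue */
        }
}

With 32 CPUs and 31 usable queues that ends up with 16 queues doing work
and 15 sitting idle, which matches what you're seeing.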
