[PATCHv2 0/3] nvme-tcp: improve scalability

Wed Jul 10 07:06:44 PDT 2024

On 7/10/24 13:56, Sagi Grimberg wrote:
> 
> 
> On 08/07/2024 10:10, Hannes Reinecke wrote:
>> Hi all,
>>
>> for workloads with a lot of controllers we run into workqueue contention,
>> where the single workqueue is not able to service requests fast enough,
>> leading to spurious I/O errors and connect resets during high load.
>> This patchset improves the situation by improving the fairness between
>> rx and tx scheduling, introducing per-controller workqueues,
>> and distributing the load according to the blk-mq cpu mapping.
>> With this we reduce the spurious I/O errors and improve the overall
>> performance for highly contended workloads.
>>
>> All performance numbers are derived from the 'tiobench-example.fio'
> 
> Did you keep the fio file unmodified? I'd suggest running it for longer,
> say 60 seconds per workload. 512 MB is a very short benchmark...

Not for 32 queues :-)
But yeah, I can keep it running for slightly longer.

Not making much progress, mind; your 'softirq' patch definitely speeds
up receiving, but seems to be messing up the write side such that I'm
basically guaranteed to hit I/O timeouts on WRITE :-(

Keep on debugging ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich