[PATCHv2 0/3] nvme-tcp: improve scalability
Sagi Grimberg
sagi at grimberg.me
Mon Jul 15 23:31:53 PDT 2024
On 08/07/2024 10:10, Hannes Reinecke wrote:
> Hi all,
>
> for workloads with a lot of controllers we run into workqueue contention,
> where the single workqueue is not able to service requests fast enough,
> leading to spurious I/O errors and connect resets during high load.
> This patchset improves the situation by improving the fairness between
> rx and tx scheduling, introducing per-controller workqueues,
> and distributing the load according to the blk-mq cpu mapping.
> With this we reduce the spurious I/O errors and improve the overall
> performance for highly contended workloads.
>
> All performance numbers are derived from the 'tiobench-example.fio'
> sample from the fio sources, running on a 96 core machine with one
> subsystem and two paths, each path exposing 32 queues.
> Backend is nvmet using an Intel DC P3700 NVMe SSD.
>
> Changes to the initial submission:
> - Make the changes independent from the 'wq_unbound' parameter
> - Drop changes to the workqueue
> - Add patch to improve rx/tx fairness
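
For readers following along, here is a minimal sketch of the
per-controller workqueue plus blk-mq-mapping idea described in the
cover letter above. It is not the actual patch; all struct and function
names (demo_tcp_*) are made up, and only standard kernel APIs such as
alloc_workqueue() and queue_work_on() are used.

/*
 * Hypothetical sketch, not the real nvme-tcp change: one workqueue per
 * controller instead of a single global one, and io_work queued on the
 * CPU that blk-mq mapped the hardware context to.
 */
#include <linux/workqueue.h>
#include <linux/blk-mq.h>
#include <linux/cpumask.h>
#include <linux/errno.h>

struct demo_tcp_ctrl;

struct demo_tcp_queue {
	struct work_struct	io_work;
	struct demo_tcp_ctrl	*ctrl;
	int			io_cpu;	/* CPU picked from the blk-mq mapping */
};

struct demo_tcp_ctrl {
	struct workqueue_struct	*io_wq;	/* per-controller, not global */
	struct blk_mq_tag_set	tag_set;
	struct demo_tcp_queue	*queues;
	int			instance;
};

/*
 * A WQ_MEM_RECLAIM workqueue per controller so that many controllers no
 * longer contend on one shared workqueue.
 */
static int demo_tcp_alloc_ctrl_wq(struct demo_tcp_ctrl *ctrl)
{
	ctrl->io_wq = alloc_workqueue("demo_tcp_wq_%d",
				      WQ_MEM_RECLAIM | WQ_HIGHPRI,
				      0, ctrl->instance);
	return ctrl->io_wq ? 0 : -ENOMEM;
}

/*
 * Pick the io CPU for an I/O queue from the default blk-mq hctx map so
 * the socket work runs on a CPU that actually submits to this queue
 * (qid 0 is assumed to be the admin queue here).
 */
static void demo_tcp_set_queue_io_cpu(struct demo_tcp_ctrl *ctrl,
				      struct demo_tcp_queue *queue, int qid)
{
	struct blk_mq_queue_map *map = &ctrl->tag_set.map[HCTX_TYPE_DEFAULT];
	int cpu;

	queue->io_cpu = WORK_CPU_UNBOUND;
	for_each_online_cpu(cpu) {
		if (map->mq_map[cpu] == (unsigned int)(qid - 1)) {
			queue->io_cpu = cpu;
			break;
		}
	}
}

/* Kick the queue's io_work on its mapped CPU, on the controller's own wq. */
static void demo_tcp_queue_io_work(struct demo_tcp_queue *queue)
{
	queue_work_on(queue->io_cpu, queue->ctrl->io_wq, &queue->io_work);
}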
Hey Hannes, were you able to make progress here?
Thought I'd ask...