[PATCH 2/4] nvme-tcp: align I/O cpu with blk-mq mapping

Sagi Grimberg sagi at grimberg.me
Wed Jul 3 12:47:56 PDT 2024


>>
>>>>
>>>> And it makes the 'CPU hogged' messages go away, which is a bonus in 
>>>> itself...
>>>
>>> Which messages? aren't these messages saying that the work spent too 
>>> much time? why are you describing the case where the work does not get
>>> cpu quota to run?
>>
>> I mean these messages:
>>
>> workqueue: nvme_tcp_io_work [nvme_tcp] hogged CPU for >10000us 32771 
>> times, consider switching to WQ_UNBOUND
>
> That means that we are spending too much time in io_work. This is a 
> separate bug. If you look at nvme_tcp_io_work, it has
> a stop condition after 1 millisecond. However, when we call 
> nvme_tcp_try_recv() it just keeps receiving from the socket until
> the socket receive buffer has no more payload. So in theory nothing 
> prevents the io_work from looping there forever.
>
> This is indeed a bug that we need to address. Probably by setting 
> rd_desc.count to some limit, decrementing it for every
> skb that we consume, and if we reach that limit while there are more 
> skbs pending, breaking out and self-requeueing.
>
> If we indeed spend much time processing a single queue in io_work, it 
> is possible that we have a starvation problem
> that is escalating to the timeouts you are seeing.
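
To make the rd_desc.count idea above a bit more concrete, roughly
something like this (untested sketch, written from memory, so the
surrounding fields may not match the current tree exactly;
NVME_TCP_RECV_BUDGET is a made-up name and value):

/* give tcp_read_sock() a per-iteration skb budget instead of an
 * undecremented count of 1 */
#define NVME_TCP_RECV_BUDGET	16

static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
{
	struct socket *sock = queue->sock;
	struct sock *sk = sock->sk;
	read_descriptor_t rd_desc;
	int consumed;

	rd_desc.arg.data = queue;
	rd_desc.count = NVME_TCP_RECV_BUDGET;	/* today: 1, never decremented */
	lock_sock(sk);
	queue->nr_cqe = 0;
	consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
	release_sock(sk);
	return consumed;
}

/* ...and at the end of nvme_tcp_recv_skb(), decrement desc->count once
 * per skb, so tcp_read_sock() breaks out when the budget is exhausted
 * and io_work can self-requeue if the socket still has payload. */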

BTW, if this is indeed the root cause of this issue, we can probably 
combine the softirq approach with this:
in data_ready, consume a limited amount directly in softirq (say 32k 
bytes), and if there is more, schedule io_work
to consume the rest, pacing from there in limits of, say, 512k bytes or 
so...
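
For the hybrid, the shape would be something like the below (again
untested, and hand-waving over the fact that we can't simply take
lock_sock() from the data_ready softirq path; nvme_tcp_try_recv_budget()
is an imaginary helper that receives at most 'budget' bytes and returns
how many bytes are still queued):

static void nvme_tcp_data_ready(struct sock *sk)
{
	struct nvme_tcp_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (likely(queue && queue->rd_enabled) &&
	    !test_bit(NVME_TCP_Q_POLLING, &queue->flags)) {
		/* consume a limited amount directly in softirq context */
		if (nvme_tcp_try_recv_budget(queue, SZ_32K) > 0)
			/* more still queued: let io_work pace the rest,
			 * e.g. in ~512k chunks per invocation */
			queue_work_on(queue->io_cpu, nvme_tcp_wq,
				      &queue->io_work);
	}
	read_unlock_bh(&sk->sk_callback_lock);
}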

We may actually end up eating the cake and having it too. But let's see 
if this is indeed the root cause first.


