[PATCH 1/8] nvme-tcp: switch TX deadline to microseconds and make it configurable

Wed Jul 17 23:30:20 PDT 2024

On 7/17/24 23:03, Sagi Grimberg wrote:
> 
> 
> On 16/07/2024 10:36, Hannes Reinecke wrote:
>> The current TX deadline for the workqueue is pretty much arbitrary,
>> and jiffies is not the best choice for sub-millisecond granularity.
>> So make the deadline configurable via a config option and switch to
>> ktime instead of jiffies.
> 
> I thought we agreed on byte-based limits. Did you try to look at how
> that behaves?
> 
I tried, but this is not the objective of this patch.

The main reason for this patch is that we are trying to measure
at sub-jiffy granularity. When running with HZ=100 (as I did),
a jiffy is 10 milliseconds long, so the call
'msecs_to_jiffies(1)' becomes questionable: it cannot express
anything shorter than one jiffy.

So while you can have the jiffy count displayed in microseconds,
you cannot _measure_ a value below 10 milliseconds.
Any calculation based on it will be inaccurate, which
is quite harmful when you try to do an analysis based on
those numbers.
Switching to ktime gives better precision and, with
that, more reliable data for analysing latency.
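
To illustrate the rounding, here is a small userspace sketch (not kernel
code). The helpers model_msecs_to_jiffies(), model_jiffies_to_usecs() and
effective_deadline_us() are hypothetical names made up for this example,
and they only model the round-up behaviour for HZ values that divide 1000
evenly; the real kernel helpers handle more cases.

```c
#include <stdint.h>

/*
 * Simplified model of msecs_to_jiffies(): convert milliseconds to
 * jiffies, rounding up. Assumes hz <= 1000 and 1000 % hz == 0.
 */
static inline uint64_t model_msecs_to_jiffies(uint64_t ms, uint64_t hz)
{
	return (ms * hz + 999) / 1000;
}

/*
 * Simplified model of jiffies_to_usecs(): one jiffy lasts
 * 1000000 / hz microseconds. Assumes 1000000 % hz == 0.
 */
static inline uint64_t model_jiffies_to_usecs(uint64_t j, uint64_t hz)
{
	return j * (1000000 / hz);
}

/*
 * Nominal length, in microseconds, of a deadline requested in
 * milliseconds once it has been rounded to whole jiffies.
 */
static inline uint64_t effective_deadline_us(uint64_t ms, uint64_t hz)
{
	return model_jiffies_to_usecs(model_msecs_to_jiffies(ms, hz), hz);
}
```

Under this model, a requested 1 ms deadline becomes one full jiffy:
effective_deadline_us(1, 100) gives 10000 (10 ms), while
effective_deadline_us(1, 1000) gives 1000 (1 ms). So with HZ=100 the
nominal deadline is ten times longer than what was asked for.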

>>
>> Signed-off-by: Hannes Reinecke <hare at kernel.org>
>> ---
>>   drivers/nvme/host/tcp.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
>> index 0873b3949355..3cf9a9abb0e0 100644
>> --- a/drivers/nvme/host/tcp.c
>> +++ b/drivers/nvme/host/tcp.c
>> @@ -44,6 +44,10 @@ static bool wq_unbound;
>>   module_param(wq_unbound, bool, 0644);
>>   MODULE_PARM_DESC(wq_unbound, "Use unbound workqueue for nvme-tcp IO context (default false)");
>> +static int deadline = 1000;
>> +module_param(deadline, int, 0644);
>> +MODULE_PARM_DESC(deadline, "RX/TX deadline in microseconds (default: 1000)");
>> +
>>   /*
>>    * TLS handshake timeout
>>    */
>> @@ -1278,7 +1282,8 @@ static void nvme_tcp_io_work(struct work_struct *w)
>>   {
>>       struct nvme_tcp_queue *queue =
>>           container_of(w, struct nvme_tcp_queue, io_work);
>> -    unsigned long deadline = jiffies + msecs_to_jiffies(1);
>> +    u64 start = ktime_to_us(ktime_get());
>> +    u64 tx_deadline = start + deadline;
>>       do {
>>           bool pending = false;
>> @@ -1302,7 +1307,7 @@ static void nvme_tcp_io_work(struct work_struct *w)
>>           if (!pending || !queue->rd_enabled)
>>               return;
>> -    } while (!time_after(jiffies, deadline)); /* quota is exhausted */
>> +    } while (ktime_to_us(ktime_get()) < tx_deadline); /* quota is exhausted */
>>       queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
>>   }
> 
> The rename to tx_deadline does not make sense in the context of this patch.

Ok, I will rename it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich