[PATCH v2 net-next 07/21] nvme-tcp: Add DDP data-path
Boris Pismenny
borispismenny at gmail.com
Sun Jan 31 03:44:12 EST 2021
On 19/01/2021 6:18, David Ahern wrote:
> On 1/14/21 8:10 AM, Boris Pismenny wrote:
>> @@ -664,8 +753,15 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
>> return -EINVAL;
>> }
>>
>> - if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
>> - nvme_complete_rq(rq);
>> + req = blk_mq_rq_to_pdu(rq);
>> + if (req->offloaded) {
>> + req->status = cqe->status;
>> + req->result = cqe->result;
>> + nvme_tcp_teardown_ddp(queue, cqe->command_id, rq);
>> + } else {
>> + if (!nvme_try_complete_req(rq, cqe->status, cqe->result))
>> + nvme_complete_rq(rq);
>> + }
>> queue->nr_cqe++;
>>
>> return 0;
>> @@ -859,9 +955,18 @@ static int nvme_tcp_recv_pdu(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>> static inline void nvme_tcp_end_request(struct request *rq, u16 status)
>> {
>> union nvme_result res = {};
>> + struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
>> + struct nvme_tcp_queue *queue = req->queue;
>> + struct nvme_tcp_data_pdu *pdu = (void *)queue->pdu;
>>
>> - if (!nvme_try_complete_req(rq, cpu_to_le16(status << 1), res))
>> - nvme_complete_rq(rq);
>> + if (req->offloaded) {
>> + req->status = cpu_to_le16(status << 1);
>> + req->result = res;
>> + nvme_tcp_teardown_ddp(queue, pdu->command_id, rq);
>> + } else {
>> + if (!nvme_try_complete_req(rq, cpu_to_le16(status << 1), res))
>> + nvme_complete_rq(rq);
>> + }
>> }
>>
>> static int nvme_tcp_recv_data(struct nvme_tcp_queue *queue, struct sk_buff *skb,
>
>
> The req->offloaded checks assume the offload is to the expected
> offload_netdev, but you do not verify the data arrived as expected. You
> might get lucky if both netdevs belong to the same PCI device (assuming
> the h/w handles it a certain way), but it will not if the netdevs
> belong to different devices.
>
> Consider a system with 2 network cards -- even if it is 2 mlx5 based
> devices. One setup can have the system using a bond with 1 port from
> each PCI device. The tx path picks a leg based on the hash of the ntuple
> and that (with Tariq's bond patches) becomes the expected offload
> device. A similar example holds for a pure routing setup with ECMP. For
> both there is full redundancy in the network - separate NIC cards
> connected to separate TORs to have independent network paths.
>
> A packet arrives on the *other* netdevice - you have *no* control over
> the Rx path. Your current checks will think the packet arrived with DDP
> but it did not.
>
There's no problem if another (non-offload) netdevice receives traffic
that arrives here: that other device will never set the SKB frag pages
to point to the final destination buffers, so the copy offload will not
take place.
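
To make that concrete, here is a rough sketch (not the patch code; the
helper name and the exact checks are only illustrative) of the per-skb
copy decision: the memcpy is skipped only when the frag already points
at the destination buffer, which only the offload netdevice ever
arranges via DDP:

#include <linux/skbuff.h>
#include <linux/bvec.h>
#include <linux/mm.h>
#include <linux/string.h>

/*
 * Rough sketch, not the patch code: the copy decision is made per skb
 * fragment. Only when the frag already points at the destination page
 * (which only the offload netdevice arranges via DDP) is the memcpy
 * skipped. 'len' is assumed to fit both the frag and the bio_vec.
 */
static void nvme_tcp_consume_frag_sketch(const skb_frag_t *frag,
					 const struct bio_vec *dst,
					 size_t len)
{
	if (skb_frag_page(frag) == dst->bv_page &&
	    skb_frag_off(frag) == dst->bv_offset) {
		/* Data already placed by the offload NIC: nothing to copy */
		return;
	}

	/* Any other netdevice lands here: ordinary copy into the buffer */
	memcpy(page_address(dst->bv_page) + dst->bv_offset,
	       skb_frag_address(frag), len);
}

An skb built by any other netdevice carries its own receive pages, so
the comparison above fails and the regular copy path runs.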
The req->offloaded indication is mainly for matching ddp_setup with
ddp_teardown calls; it does not control the copy/CRC offload, as those
are controlled per-skb: frags for the copy, and skb bits for the CRC.
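
The CRC side follows the same per-skb pattern. A rough sketch, with the
NIC's per-skb "digest verified" signal modelled as a plain boolean
argument (the helper name is illustrative, not the patch code, and the
real driver feeds the software digest through crypto/ahash rather than
calling crc32c() directly):

#include <linux/types.h>
#include <linux/crc32c.h>

/*
 * Rough sketch with illustrative names: whether to skip the software
 * data-digest update is decided per skb from a NIC-provided "digest
 * already verified" signal (modelled here as 'hw_verified'), never
 * from req->offloaded.
 */
static u32 nvme_tcp_ddgst_step_sketch(u32 ddgst, const void *data,
				      size_t len, bool hw_verified)
{
	if (hw_verified)
		return ddgst;	/* NIC already validated the CRC for this skb */

	return crc32c(ddgst, data, len);	/* normal software path */
}

req->offloaded never enters this decision; it only pairs the earlier
ddp_setup with the later ddp_teardown for the same command.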