Unexpected issues with 2 NVME initiators using the same target
Gruher, Joseph R
joseph.r.gruher at intel.com
Tue Mar 14 16:42:15 PDT 2017
> We tested the patches
> with a single target system and a single initiator system connected via CX4s at
> 25Gb through an Arista 7060X switch with regular Ethernet flow control
> enabled (no PFC/DCB - but the switch has no other traffic on it). We connected
> 8 Intel P3520 1.2 TB SSDs from the target to the initiator with 16 IO queues per
> disk. Then we ran FIO with a 4KB workload, random IO pattern, 4 jobs per disk,
> queue depth 32 per job, testing 100% read, 70/30 read/write, and 100% write
> workloads. We used the default 4.10-RC8 kernel, then patched the same kernel
> with Sagi's patch, and then separately with Max's patch, and then both patches
> at the same time (just for fun). The patches were applied on both target and
> initiator. In general we do see a performance hit on small block read
> workloads, but it is not massive; it looks like about 10%. We also tested some
> large block transfers and didn't see any impact. Results here are in 4KB IOPS:
>
> Read/Write    4.10-RC8    Patch 1 (Sagi)    Patch 2 (Max)    Both Patches
> 100/0          667,158           611,737          619,586         607,080
> 70/30          941,352           890,962          884,222         876,926
> 0/100          667,379           666,000          666,093         666,144
>
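For reference, the configuration quoted above corresponds roughly to the nvme-cli and fio invocations below. This is only a sketch: the RDMA target address, the subsystem NQN naming (assuming one subsystem per SSD), the device names, and the runtime are assumptions, not the exact commands used.

  # Connect the 8 target SSDs over RDMA with 16 IO queues each
  # (address and NQN naming are assumed; one subsystem per SSD):
  for i in $(seq 0 7); do
      nvme connect -t rdma -a 192.168.1.10 -s 4420 -i 16 \
          -n nqn.2017-03.com.example:p3520-${i}
  done

  # 4KB random read, queue depth 32 per job, 4 jobs per disk, all 8 disks.
  # For 70/30 use --rw=randrw --rwmixread=70; for writes, --rw=randwrite.
  fio --ioengine=libaio --direct=1 --rw=randread --bs=4k \
      --iodepth=32 --numjobs=4 --time_based --runtime=60 --group_reporting \
      --name=d0 --filename=/dev/nvme0n1 --name=d1 --filename=/dev/nvme1n1 \
      --name=d2 --filename=/dev/nvme2n1 --name=d3 --filename=/dev/nvme3n1 \
      --name=d4 --filename=/dev/nvme4n1 --name=d5 --filename=/dev/nvme5n1 \
      --name=d6 --filename=/dev/nvme6n1 --name=d7 --filename=/dev/nvme7n1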
One additional result from our 25Gb testing: we ran an additional test with the same configuration as above, but with just a single disk and a single FIO job at queue depth 8. This is a light workload designed to examine latency under low load, when not bottlenecked on network or disk throughput, as opposed to driving the system to max IOPS. Here we see about a 30 usec (20%) increase in latency on 4KB random reads when we apply Sagi's patch, with a corresponding dip in IOPS (only about a 2% latency hit was seen with Max's patch):
            IOPS      Latency (usec)
4.10-RC8    49,304    160.3
Patch 1     40,490    192.9
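That light load corresponds to a single-job fio run along these lines (again a sketch; the device name and runtime are assumptions):

  # Single disk, single job, 4KB random read at queue depth 8, to measure
  # per-IO latency rather than peak throughput:
  fio --name=lat --filename=/dev/nvme0n1 --ioengine=libaio --direct=1 \
      --rw=randread --bs=4k --iodepth=8 --numjobs=1 \
      --time_based --runtime=60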