Windows NVMeoF client - debug assistance requested

Robert Randall (rrandall) rrandall at micron.com
Mon Jun 20 18:50:22 PDT 2016


Hello all,

We were making great progress on a Windows version of the NVMe over Fabrics host / initiator / client until we ran into something very interesting.  All of read/write requests from the client execute without any errors reported on either side, the Linux server is happy and the Windows client is happy.  The interesting part is that the writes never make it to the device.  The write completes with success but a subsequent read does not return what was written.  We've verified the command packets, etc. and the writes look just fine and do succeed but the data never makes it to the device.  The server reports no errors and returns success.

If we use the Linux host to write data to the device or just 'dd' a good partition table to the device we can read it without any problems from Windows.

I could use some advice on the best strategy to track this down.

The traffic generated by the Linux host and Windows host are quite similar.  The only difference that might be interesting are the RDMA addresses.  On Linux they tend to be low in the address range.  On Windows they tend to be high in the address range.  But this is true of the reads as well.

A sample write from the Linux host:
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087148] nvmet: drivers/nvme/target/core.c nvmet_req_init 499
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087149] nvmet:  struct nvme_common_command {
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087150] nvmet: __u8 opcode 1
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087151] nvmet: __u8 flags 40
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087153] nvmet: __u16 command_id 78
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087154] nvmet: __le32 nsid 1
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087155] nvmet: __le32 cdw2[0] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087156] nvmet: __le32 cdw2[1] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087157] nvmet: __le64 metadata 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087158] nvmet: union nvme_data_ptr { nvme_keyed_sgl_desc
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087159] nvmet: nvme_keyed_sgl_desc.__le64 addr 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087161] nvmet: nvme_keyed_sgl_desc.__u8 length[0] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087162] nvmet: nvme_keyed_sgl_desc.__u8 length[1] 10
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087163] nvmet: nvme_keyed_sgl_desc.__u8 length[2] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087164] nvmet: nvme_keyed_sgl_desc.__u8 key[0] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087166] nvmet: nvme_keyed_sgl_desc.__u8 key[1] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087167] nvmet: nvme_keyed_sgl_desc.__u8 key[2] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087168] nvmet: nvme_keyed_sgl_desc.__u8 key[3] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087169] nvmet: nvme_keyed_sgl_desc.__u8 type 1
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087170] nvmet: __le32 cdw10[0] 9C8
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087171] nvmet: __le32 cdw10[1] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087173] nvmet: __le32 cdw10[2] 7
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087174] nvmet: __le32 cdw10[3] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087175] nvmet: __le32 cdw10[4] 0
Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087176] nvmet: __le32 cdw10[5] 0

A sample write from the Windows host:
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109389] nvmet:  struct nvme_common_command {
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109390] nvmet: __u8 opcode 1
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109391] nvmet: __u8 flags 40
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109392] nvmet: __u16 command_id A
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109394] nvmet: __le32 nsid 1
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109395] nvmet: __le32 cdw2[0] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109396] nvmet: __le32 cdw2[1] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109397] nvmet: __le64 metadata 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109398] nvmet: union nvme_data_ptr { nvme_keyed_sgl_desc
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109400] nvmet: nvme_keyed_sgl_desc.__le64 addr FFFFD0009BB95040
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109401] nvmet: nvme_keyed_sgl_desc.__u8 length[0] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109402] nvmet: nvme_keyed_sgl_desc.__u8 length[1] 2
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109405] nvmet: nvme_keyed_sgl_desc.__u8 length[2] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109406] nvmet: nvme_keyed_sgl_desc.__u8 key[0] A
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109407] nvmet: nvme_keyed_sgl_desc.__u8 key[1] 64
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109408] nvmet: nvme_keyed_sgl_desc.__u8 key[2] 1
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109409] nvmet: nvme_keyed_sgl_desc.__u8 key[3] 92
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109411] nvmet: nvme_keyed_sgl_desc.__u8 type 40
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109412] nvmet: __le32 cdw10[0] 21
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109413] nvmet: __le32 cdw10[1] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109414] nvmet: __le32 cdw10[2] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109415] nvmet: __le32 cdw10[3] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109416] nvmet: __le32 cdw10[4] 0
Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109418] nvmet: __le32 cdw10[5] 0


I haven't updated my code since 6/8 and I'm using the nvmf-all.2 branch.

Any / all assistance would be much appreciated.  We would like to demo this at FMS in August and, assuming Microsoft releases the documentation for the in-kernel RDMA client API we're using, we would like to open source the driver as well.

Best regards,
Robert Randall



More information about the Linux-nvme mailing list