Windows NVMeoF client - debug assistance requested

Robert Randall robert.r.randall at gmail.com
Tue Jun 21 07:55:00 PDT 2016


Correct, we don't use in-capsule data.  What is confusing is that the
reads are working fine without in-capsule data.  The writes are
failing silently.  Nothing in the protocol indicates an error, but the
bits never get to the device.
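
For reference, the descriptor type byte in the two traces quoted below is
what tells the target whether the data is in-capsule or has to be pulled
with an RDMA READ.  A minimal decode sketch; the constant values follow the
NVMe over Fabrics spec, but the names are mine, not the kernel's:

/* Decode nvme_keyed_sgl_desc.type: upper nibble = descriptor type,
 * lower nibble = subtype (values per the NVMe over Fabrics spec). */
#include <stdio.h>
#include <stdint.h>

#define SGL_DESC_DATA_BLOCK        0x0  /* plain data block             */
#define SGL_DESC_KEYED_DATA_BLOCK  0x4  /* keyed data block (RDMA)      */
#define SGL_SUBTYPE_ADDRESS        0x0  /* addr is a remote address     */
#define SGL_SUBTYPE_OFFSET         0x1  /* addr is an in-capsule offset */

static void decode_sgl_type(uint8_t type)
{
	uint8_t desc = type >> 4, sub = type & 0xf;

	if (desc == SGL_DESC_DATA_BLOCK && sub == SGL_SUBTYPE_OFFSET)
		printf("0x%02x: in-capsule data (offset)\n", type);
	else if (desc == SGL_DESC_KEYED_DATA_BLOCK && sub == SGL_SUBTYPE_ADDRESS)
		printf("0x%02x: keyed SGL, target will RDMA READ {addr, len, key}\n", type);
	else
		printf("0x%02x: other (desc %u, subtype %u)\n", type, desc, sub);
}

int main(void)
{
	decode_sgl_type(0x01);	/* Linux write trace below: in-capsule  */
	decode_sgl_type(0x40);	/* Windows write trace below: keyed SGL */
	return 0;
}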

On Tue, Jun 21, 2016 at 9:19 AM, Sagi Grimberg <sagi at grimberg.me> wrote:
>
>> Hello all,
>>
>> We were making great progress on a Windows version of the NVMe over
>> Fabrics host / initiator / client until we ran into something very
>> interesting.  All of the read/write requests from the client execute
>> without any errors reported on either side; the Linux server is happy and
>> the Windows client is happy.  The interesting part is that the writes
>> never make it to the device.  The write completes with success, but a
>> subsequent read does not return what was written.  We've verified the
>> command packets, etc., and the writes look just fine and do succeed, but
>> the data never makes it to the device.  The server reports no errors and
>> returns success.
>>
>> If we use the Linux host to write data to the device or just 'dd' a good
>> partition table to the device we can read it without any problems from
>> Windows.
>>
>> I could use some advice on the best strategy to track this down.
>>
>> The traffic generated by the Linux host and the Windows host is quite
>> similar.  The only difference that might be interesting is the RDMA
>> addresses.  On Linux they tend to be low in the address range; on
>> Windows they tend to be high.  But this is true of the reads as well.
>>
>> A sample write from the Linux host:
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087148] nvmet:
>> drivers/nvme/target/core.c nvmet_req_init 499
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087149] nvmet:  struct
>> nvme_common_command {
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087150] nvmet: __u8 opcode 1
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087151] nvmet: __u8 flags 40
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087153] nvmet: __u16 command_id 78
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087154] nvmet: __le32 nsid 1
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087155] nvmet: __le32 cdw2[0] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087156] nvmet: __le32 cdw2[1] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087157] nvmet: __le64 metadata 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087158] nvmet: union nvme_data_ptr
>> { nvme_keyed_sgl_desc
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087159] nvmet:
>> nvme_keyed_sgl_desc.__le64 addr 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087161] nvmet:
>> nvme_keyed_sgl_desc.__u8 length[0] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087162] nvmet:
>> nvme_keyed_sgl_desc.__u8 length[1] 10
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087163] nvmet:
>> nvme_keyed_sgl_desc.__u8 length[2] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087164] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[0] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087166] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[1] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087167] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[2] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087168] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[3] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087169] nvmet:
>> nvme_keyed_sgl_desc.__u8 type 1
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087170] nvmet: __le32 cdw10[0] 9C8
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087171] nvmet: __le32 cdw10[1] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087173] nvmet: __le32 cdw10[2] 7
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087174] nvmet: __le32 cdw10[3] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087175] nvmet: __le32 cdw10[4] 0
>> Jun 20 16:36:48 nvmeof0 kernel: [ 4024.087176] nvmet: __le32 cdw10[5] 0
>>
>> A sample write from the Windows host:
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109389] nvmet:  struct
>> nvme_common_command {
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109390] nvmet: __u8 opcode 1
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109391] nvmet: __u8 flags 40
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109392] nvmet: __u16 command_id A
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109394] nvmet: __le32 nsid 1
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109395] nvmet: __le32 cdw2[0] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109396] nvmet: __le32 cdw2[1] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109397] nvmet: __le64 metadata 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109398] nvmet: union nvme_data_ptr
>> { nvme_keyed_sgl_desc
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109400] nvmet:
>> nvme_keyed_sgl_desc.__le64 addr FFFFD0009BB95040
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109401] nvmet:
>> nvme_keyed_sgl_desc.__u8 length[0] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109402] nvmet:
>> nvme_keyed_sgl_desc.__u8 length[1] 2
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109405] nvmet:
>> nvme_keyed_sgl_desc.__u8 length[2] 0
>
>
> Looks like the writes differ in length (4k vs. 512).
>
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109406] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[0] A
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109407] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[1] 64
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109408] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[2] 1
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109409] nvmet:
>> nvme_keyed_sgl_desc.__u8 key[3] 92
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109411] nvmet:
>> nvme_keyed_sgl_desc.__u8 type 40
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109412] nvmet: __le32 cdw10[0] 21
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109413] nvmet: __le32 cdw10[1] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109414] nvmet: __le32 cdw10[2] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109415] nvmet: __le32 cdw10[3] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109416] nvmet: __le32 cdw10[4] 0
>> Jun 20 17:26:47 nvmeof0 kernel: [ 7023.109418] nvmet: __le32 cdw10[5] 0
>
>
> Looks like you never use in-capsule data (which we do in Linux for 4k
> or less); you passed to the target:
> {addr=0xFFFFD0009BB95040, length=0x200, key=0x9201640a}
>
> Are you sure that this is the buffer address you have registered with
> the device? Do you see the actual data travel the wire?
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
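
On the registration question above: a minimal sketch of what the host has
to guarantee when it builds a keyed SGL.  This uses the Linux ibverbs API
purely as an analogue (our Windows host goes through the NDK kernel RDMA
interface, which differs); the point is that the addr and key must come
from the registration that actually covers the write buffer, otherwise the
target's RDMA READ can complete cleanly while pulling the wrong memory.

/* Illustrative sketch only, not our actual host code. */
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

struct keyed_sgl {		/* stand-in for nvme_keyed_sgl_desc */
	uint64_t addr;
	uint32_t length;	/* 3-byte field in the real descriptor */
	uint32_t key;
	uint8_t  type;		/* 0x40 = keyed data block, address subtype */
};

static int fill_write_sgl(struct ibv_pd *pd, void *buf, size_t len,
			  struct keyed_sgl *sgl)
{
	/* Register the exact buffer the data sits in; REMOTE_READ is what
	 * lets the target RDMA READ it while servicing the NVMe write. */
	struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
				       IBV_ACCESS_LOCAL_WRITE |
				       IBV_ACCESS_REMOTE_READ);
	if (!mr)
		return -1;

	sgl->addr   = (uint64_t)(uintptr_t)buf;	/* must lie inside this MR   */
	sgl->length = (uint32_t)len;		/* 0x200 in the Windows trace */
	sgl->key    = mr->rkey;			/* rkey of this registration  */
	sgl->type   = 0x40;
	return 0;
}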



-- 
Robert Randall | robert.r.randall at gmail.com


