[PATCH] nvme-tcp: Check if request has started before processing it
Hannes Reinecke
hare at suse.de
Fri Feb 26 11:42:46 EST 2021
On 2/26/21 5:13 PM, Keith Busch wrote:
> On Fri, Feb 26, 2021 at 01:54:00PM +0100, Hannes Reinecke wrote:
>> On 2/26/21 1:35 PM, Daniel Wagner wrote:
>>> On Mon, Feb 15, 2021 at 01:29:45PM -0800, Sagi Grimberg wrote:
>>>> Well, I think we should probably figure out why that is happening first.
>>>
>>> I got my hands on a tcpdump trace. I've trimmed it to this:
>>>
[ .. ]
>>> NVM Express Fabrics TCP
>>> Pdu Type: CapsuleResponse (5)
>>> Pdu Specific Flags: 0x00
>>> .... ...0 = PDU Header Digest: Not set
>>> .... ..0. = PDU Data Digest: Not set
>>> .... .0.. = PDU Data Last: Not set
>>> .... 0... = PDU Data Success: Not set
>>> Pdu Header Length: 24
>>> Pdu Data Offset: 0
>>> Packet Length: 24
>>> Unknown Data: 02000400000000001b0000001f000000
>>>
>>> 0000 00 00 0c 9f f5 a8 b4 96 91 41 16 c0 08 00 45 00 .........A....E.
>>> 0010 00 4c 00 00 40 00 40 06 00 00 0a e4 26 af 0a e4 .L..@.@.....&...
>>> 0020 c2 1e 11 44 88 4f b8 58 90 ec 8e 1b 32 ed 80 18 ...D.O.X....2...
>>> 0030 01 01 fe d3 00 00 01 01 08 0a e6 ed ac be d6 a3 ................
>>> 0040 5d 0c 05 00 18 00 18 00 00 00 02 00 04 00 00 00 ]...............
>>> 0050 00 00 1b 00 00 00 1f 00 00 00 ..........
>>>
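The 16 bytes that the dissector labels "Unknown Data" can be decoded by hand. The following is a minimal userspace sketch, assuming the usual NVMe/TCP layout of an 8-byte common header (type, flags, hlen, pdo, little-endian plen) followed by a standard 16-byte completion queue entry (result, SQ head, SQ ID, command ID, status). The struct and function names (capsule_resp, parse_capsule_resp, trace_pdu) are made up for illustration and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PDU_TYPE_CAPSULE_RESP 5
#define CAPSULE_RESP_LEN      24   /* 8-byte common header + 16-byte CQE */

/* Decoded view of a CapsuleResponse PDU (illustrative, not a kernel struct). */
struct capsule_resp {
    uint8_t  type;        /* PDU type */
    uint8_t  flags;       /* PDU specific flags */
    uint8_t  hlen;        /* header length */
    uint8_t  pdo;         /* data offset */
    uint32_t plen;        /* total PDU length */
    uint64_t result;      /* CQE dwords 0-1, command specific */
    uint16_t sq_head;     /* CQE dword 2, bits 15:0 */
    uint16_t sq_id;       /* CQE dword 2, bits 31:16 */
    uint16_t command_id;  /* CQE dword 3, bits 15:0 */
    uint16_t status;      /* CQE dword 3, bits 31:16 */
};

/* The 24 PDU bytes from the trace, starting at offset 0x42 of the frame. */
static const uint8_t trace_pdu[CAPSULE_RESP_LEN] = {
    0x05, 0x00, 0x18, 0x00, 0x18, 0x00, 0x00, 0x00,
    0x02, 0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x1b, 0x00, 0x00, 0x00, 0x1f, 0x00, 0x00, 0x00,
};

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if buf is not a well-formed 24-byte CapsuleResponse. */
static int parse_capsule_resp(const uint8_t *buf, size_t len,
                              struct capsule_resp *out)
{
    if (len < CAPSULE_RESP_LEN)
        return -1;

    out->type  = buf[0];
    out->flags = buf[1];
    out->hlen  = buf[2];
    out->pdo   = buf[3];
    out->plen  = get_le32(buf + 4);

    if (out->type != PDU_TYPE_CAPSULE_RESP ||
        out->hlen != CAPSULE_RESP_LEN || out->plen != CAPSULE_RESP_LEN)
        return -1;

    out->result     = get_le64(buf + 8);
    out->sq_head    = get_le16(buf + 16);
    out->sq_id      = get_le16(buf + 18);
    out->command_id = get_le16(buf + 20);
    out->status     = get_le16(buf + 22);
    return 0;
}
```

Under that layout assumption, calling parse_capsule_resp(trace_pdu, sizeof(trace_pdu), &r) on the bytes above gives command ID 0x1f, SQ head 0x1b, SQ ID 0 and status 0.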
>> As I suspected, we did receive an invalid frame.
>> Data digest would have saved us, but then it's not enabled.
>>
>> So we do need to check if the request is valid before processing it.
>
> That's just addressing a symptom. You can't fully verify the request is
> valid this way because the host could have started the same command ID
> the very moment before the code checks it, incorrectly completing an
> in-flight command and getting data corruption.
>
Oh, I am fully aware.
Bad frames are just that, bad frames.
We can only fully validate that when digests are enabled, but I gather
that controllers sending out bad frames wouldn't want to enable digests,
either. So relying on that is possibly not an option.
So really what I'm trying to avoid is the host crashing on a bad frame.
That kind of thing always goes down badly with customers.
And tripping over an uninitialized command is just too stupid IMO.
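
To illustrate the kind of check I mean, here is a minimal userspace sketch, not the actual patch. It assumes a fixed-depth tag table indexed by command ID; all names in it (queue, queue_start, queue_complete, QUEUE_DEPTH) are hypothetical. A completion is rejected when the command ID is out of range or the request was never started, so the host bails out instead of touching an uninitialized command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 32  /* hypothetical queue depth */

enum req_state {
    REQ_IDLE = 0,       /* tag is free, request fields are not initialized */
    REQ_IN_FLIGHT,      /* command was sent and awaits a response */
};

struct request {
    enum req_state state;
    uint16_t status;    /* status copied from the response */
};

struct queue {
    struct request tags[QUEUE_DEPTH];
};

/* Mark the request for command ID 'cid' as started. */
static int queue_start(struct queue *q, uint16_t cid)
{
    if (cid >= QUEUE_DEPTH || q->tags[cid].state != REQ_IDLE)
        return -1;
    q->tags[cid].state = REQ_IN_FLIGHT;
    return 0;
}

/*
 * Complete the request named by a response. Returns -1 for a response
 * whose command ID is out of range or refers to a request that was
 * never started; the caller would treat that as a protocol error.
 */
static int queue_complete(struct queue *q, uint16_t cid, uint16_t status)
{
    struct request *rq;

    if (cid >= QUEUE_DEPTH)
        return -1;

    rq = &q->tags[cid];
    if (rq->state != REQ_IN_FLIGHT)
        return -1;      /* bad frame: nothing in flight for this ID */

    rq->status = status;
    rq->state = REQ_IDLE;
    return 0;
}
```

With a zero-initialized queue, queue_complete(&q, 0x1f, 0) returns -1 until queue_start(&q, 0x1f) has been called. As Keith says, this does not catch a bad frame whose command ID happens to match a request that is in flight; it only keeps the host from crashing.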
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer