nvme-format: protection information enabled although metadata size is 0

Binarus lists at binarus.de
Wed Nov 2 12:32:19 PDT 2022


On 02.11.2022 16:59, Keith Busch wrote:
> On Wed, Nov 02, 2022 at 04:42:21PM +0100, Binarus wrote:
>> Thank you very much for confirming that this is a bug. May I steal your time again and ask what you would do in that situation? Throw away the device because we can't trust it, or format it with 8 bytes of metadata and hope that the PI works correctly then?
> 
> I think that's going too far. To the best of my knowledge, the device
> works fine. You just hit an untested parameter combo. The device may
> report you requested PI, but without metadata, it's going to behave the
> same as a non-PI 4k format. If you supply valid parameters for PI
> formats, then the device will correctly honor that.

In the meantime, I have updated the firmware and can confirm that this
did not change the incorrect behavior. There is still no error message
when using nvme-format with the wrong parameters shown in my first post.

Thank you very much for confirming that using the correct parameters 
(i.e. LBAF 4) will actually enable the PI. That's the way I would have 
expected it.
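
To make the inconsistency concrete, here is a minimal sketch of the sanity
check one might expect before formatting. It assumes the classic 8-byte
(16-bit guard) protection information; the helper name pi_can_take_effect
is made up for illustration and is not part of nvme-cli or the kernel.

```python
# Assumption: classic 16-bit guard PI, which occupies 8 bytes of metadata.
PI_SIZE_BYTES = 8


def pi_can_take_effect(metadata_size: int, pi_type: int) -> bool:
    """Return True if the requested PI type has room in the metadata.

    metadata_size: metadata bytes per LBA of the chosen LBA format.
    pi_type: 0 = PI disabled, 1..3 = PI type 1..3.
    """
    if pi_type not in (0, 1, 2, 3):
        raise ValueError(f"unknown PI type: {pi_type}")
    return pi_type != 0 and metadata_size >= PI_SIZE_BYTES


# PI requested on a format without metadata: nothing to protect with.
print(pi_can_take_effect(metadata_size=0, pi_type=1))  # False
# PI requested on a format with 8 bytes of metadata: consistent.
print(pi_can_take_effect(metadata_size=8, pi_type=1))  # True
```

The first call corresponds to the parameter combination from my first post,
the second to a format with 8 bytes of metadata such as LBAF 4 on this device.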

> I'm not sure if you're familiar with the different nvme metadata types
> though, so I'll add that this particular model does not work with the
> Linux kernel's end-to-end protection. This device supports only the
> "extended" metadata, not the "separate" that the Linux block stack
> requires. You won't be able to use the generic block layer for IO with
> protection information, but you should be able to use it in passthrough
> modes. And if you are using the 8-byte format (LBAF 4, I believe), then
> the driver will have the device strip/generate PI without the host ever
> seeing it.

I have a vague notion of the metadata types, and have noticed
something which worries me even more:

In the datasheet / manual for the P3700 from October 2015 (newest 
version I could find), in table 34 on page 38 which describes the 
Identify Namespace data structure, it clearly says that byte 27 will 
report value 0x3, which means that both metadata types (extended and 
separate) are supported. From the "Interpretation" column of the "MC" row:

"Indicated support for metadata transferred with the extended data LBA 
and in separate buffer - both are supported."

However, when I execute nvme id-ns /dev/nvme0n1 on the machine in 
question, it shows the value 0x1 for the MC, which means that it 
supports only the extended LBA metadata.
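
For reference, the MC byte can be decoded by hand. The following is a
minimal sketch, assuming the bit layout from the NVMe base specification
(bit 0: metadata transferred as part of an extended data LBA, bit 1:
metadata transferred in a separate buffer). The function name decode_mc is
made up for illustration and is not part of nvme-cli.

```python
def decode_mc(mc: int) -> dict:
    """Decode the Metadata Capabilities (MC) byte of Identify Namespace."""
    return {
        # Bit 0: metadata may be appended to the data (extended LBA).
        "extended_lba": bool(mc & 0x1),
        # Bit 1: metadata may be transferred in a separate buffer.
        "separate_buffer": bool(mc & 0x2),
    }


# Value claimed by the datasheet: both transfer types supported.
print(decode_mc(0x3))  # {'extended_lba': True, 'separate_buffer': True}
# Value actually reported by the device: extended LBA only.
print(decode_mc(0x1))  # {'extended_lba': True, 'separate_buffer': False}
```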

That means that either the datasheet / manual or the nvme output is
wrong. I guess that the former is the case, and your statement supports that.

I had absolutely no clue that the standard Linux IO does not support 
extended LBA metadata, and thus does not support extended LBA PI. That's 
quite disappointing. Currently, I don't know what the passthrough mode 
you have mentioned is, but I'll research it.

Perhaps I am using it already, because the SSD in question acts as a 
cache device in a ZFS pool. Since ZFS circumvents the normal I/O layer 
at some places, maybe it can use extended LBA PI.

I am aware that I wouldn't need the PI anyway with ZFS (because ZFS has
its own checksumming, which sets it apart from other file systems), but
I'm eager to learn more about NVMe and the PI for future cases and other
configurations, so I'll read about passthrough and Linux. Plus, we have
a few (consumer) Samsung SSD 980 Pro drives in Windows machines here, and
of course we would like to learn how to turn on the PI on them (if
Windows supports it at all).

Thank you very much again, and best regards,

Binarus



