Buffer I/O Errors from Zoned NVME devices

Jeffrey Lien Jeff.Lien at wdc.com
Tue Feb 2 10:06:22 EST 2021


Keith, Christoph, Damien,
These errors are happening on both the 5.9 and 5.10.7 kernels.  CONFIG_BLK_DEV_ZONED is set to y in the .config file.

I will try the patch to disable partition scanning that Keith suggested.  I'll also get the latest FW loaded and see if that resolves the issue.  
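
For anyone triaging similar logs, here is a minimal sketch of how the firmware explanation quoted below could be checked against the kernel output. It pulls the failing sectors out of the blk_update_request lines and tests whether each one falls in a zone's hole, meaning the range past the zone capacity but still inside the zone size. The function names are hypothetical, and the sketch assumes uniformly sized zones starting at sector 0, with zone size and capacity both given in the same 512-byte sector units the kernel logs. The real geometry has to come from the device's zone report; no values from this thread are assumed.

```python
import re

# Matches the block-layer error lines quoted in this thread, e.g.
# "blk_update_request: I/O error, dev nvme1n2, sector 3800039297 op 0x0:(READ) ..."
IO_ERROR_RE = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def failed_sectors(log_text):
    """Return a list of (device, sector) pairs, one per I/O error line."""
    return [
        (m.group("dev"), int(m.group("sector")))
        for m in IO_ERROR_RE.finditer(log_text)
    ]


def in_zone_hole(sector, zone_size, zone_capacity):
    """Return True if the sector lies past the zone capacity of its zone.

    Assumption: every zone has the same size and capacity, and zone 0
    starts at sector 0. All three arguments use the same unit.
    """
    if not 0 < zone_capacity <= zone_size:
        raise ValueError("expected 0 < zone_capacity <= zone_size")
    # The offset inside the zone decides whether the sector is usable.
    return sector % zone_size >= zone_capacity
```

Feeding the dmesg output to failed_sectors() and then calling in_zone_hole() with the drive's actual zone size and capacity would show whether every failing read really targets a hole, as the firmware developer suggests, or whether some land on LBAs that should be readable.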

-----Original Message-----
From: Damien Le Moal <Damien.LeMoal at wdc.com> 
Sent: Monday, February 1, 2021 3:05 PM
To: hch at lst.de; Keith Busch <kbusch at kernel.org>
Cc: Jeffrey Lien <Jeff.Lien at wdc.com>; linux-nvme at lists.infradead.org
Subject: Re: Buffer I/O Errors from Zoned NVME devices

On 2021/02/02 3:03, hch at lst.de wrote:
> On Mon, Feb 01, 2021 at 09:53:06AM -0800, Keith Busch wrote:
>> On Mon, Feb 01, 2021 at 02:36:12PM +0000, Jeffrey Lien wrote:
>>> Christoph, Keith
>>> We're seeing a lot of these Buffer I/O errors with our zoned nvme devices.  One of the FW developers looked into it and had the following explanation:
>>> All of these reads are issued by the kernel during enumeration and target LBAs in the last zone's hole, so they are expected to return a boundary error, which the kernel then logs.
>>>
>>> [65281.936988] Buffer I/O error on dev nvme1n2, logical block 3800039296, async page read
>>> [65281.937165] blk_update_request: I/O error, dev nvme1n2, sector 3800039297 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>> [65281.937166] Buffer I/O error on dev nvme1n2, logical block 3800039297, async page read
>>> [65281.937335] blk_update_request: I/O error, dev nvme1n2, sector 3800039298 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>> [65281.937336] Buffer I/O error on dev nvme1n2, logical block 3800039298, async page read
>>> [65281.937498] blk_update_request: I/O error, dev nvme1n2, sector 3800039299 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>>
>>> Are you aware of this issue and, if so, do you have any recommendations on how to avoid or resolve it?
>>
>> Is this from the partition scanning? We don't partition zoned 
>> devices, so I think we can skip it. Does the following resolve the issue?
> 
> We already have special zoned device handling in the partitioning code.

Partitions are ignored and a warning is printed, but the partition table is still being read...

> 
> But NVMe should make sure to never span a zone boundary as we set the 
> chunk size to avoid that.
> 
> What kernel version is this?  Is CONFIG_BLK_DEV_ZONED enabled?

I had a very similar problem doing zonefs tests on Matias's machine on a ZNS drive last week. The problem was the firmware... An upgrade to the latest version fixed the issue. Not sure what FW rev you are running here, but upgrading might solve this.

> 


--
Damien Le Moal
Western Digital Research


