Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.

Bradley Chapman chapman6235 at comcast.net
Mon Jan 18 13:33:42 EST 2021


Good afternoon!

On 1/17/21 11:36 PM, Chaitanya Kulkarni wrote:
> On 1/17/21 11:05 AM, Bradley Chapman wrote:
>> [ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
>> [ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037
>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672121] nvme nvme1: failed to mark controller live state
>> [ 2836.672123] nvme nvme1: Removing after probe failure status: -19
>> [ 2836.689016] Aborting journal on device dm-0-8.
>> [ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592,
>> lost sync page write
>> [ 2836.689027] JBD2: Error -5 detected when updating journal superblock
>> for dm-0-8.
> Without knowing the fs mount/format command, I can only suspect that the
> superblock zeroing issued with a write-zeroes request is translated into
> REQ_OP_WRITE_ZEROES, which the controller is not able to process,
> resulting in the error. This analysis may be wrong.
> 
> Can you please share the following details:
> 
> nvme id-ns /dev/nvme0n1 -H (we are interested in oncs part here)

I ran the requested command against /dev/nvme1n1 (since /dev/nvme0n1 
works perfectly so far) and here is the result:

NVME Identify Namespace 1:
nsze    : 0x1dcf32b0
ncap    : 0x1dcf32b0
nuse    : 0x1dcf32b0
nsfeat  : 0
   [2:2] : 0     Deallocated or Unwritten Logical Block error Not Supported
   [1:1] : 0     Namespace uses AWUN, AWUPF, and ACWU
   [0:0] : 0     Thin Provisioning Not Supported

nlbaf   : 0
flbas   : 0
   [4:4] : 0     Metadata Transferred in Separate Contiguous Buffer
   [3:0] : 0     Current LBA Format Selected

mc      : 0
   [1:1] : 0     Metadata Pointer Not Supported
   [0:0] : 0     Metadata as Part of Extended Data LBA Not Supported

dpc     : 0
   [4:4] : 0     Protection Information Transferred as Last 8 Bytes of Metadata Not Supported
   [3:3] : 0     Protection Information Transferred as First 8 Bytes of Metadata Not Supported
   [2:2] : 0     Protection Information Type 3 Not Supported
   [1:1] : 0     Protection Information Type 2 Not Supported
   [0:0] : 0     Protection Information Type 1 Not Supported

dps     : 0
   [3:3] : 0     Protection Information is Transferred as Last 8 Bytes of Metadata
   [2:0] : 0     Protection Information Disabled

nmic    : 0
   [0:0] : 0     Namespace Multipath Not Capable

rescap  : 0
   [6:6] : 0     Exclusive Access - All Registrants Not Supported
   [5:5] : 0     Write Exclusive - All Registrants Not Supported
   [4:4] : 0     Exclusive Access - Registrants Only Not Supported
   [3:3] : 0     Write Exclusive - Registrants Only Not Supported
   [2:2] : 0     Exclusive Access Not Supported
   [1:1] : 0     Write Exclusive Not Supported
   [0:0] : 0     Persist Through Power Loss Not Supported

fpi     : 0x80
   [7:7] : 0x1   Format Progress Indicator Supported
   [6:0] : 0     Format Progress Indicator (Remaining 0%)

dlfeat  : 1
   [4:4] : 0     Guard Field of Deallocated Logical Blocks is set to 0xFFFF
   [3:3] : 0     Deallocate Bit in the Write Zeroes Command is Not Supported
   [2:0] : 0x1   Bytes Read From a Deallocated Logical Block and its Metadata are 0x00

nawun   : 0
nawupf  : 0
nacwu   : 0
nabsn   : 0
nabo    : 0
nabspf  : 0
noiob   : 0
nvmcap  : 0
nsattr  : 0
nvmsetid: 0
anagrpid: 0
endgid  : 0
nguid   : 00000000000000000000000000000000
eui64   : 0000000000000000
LBA Format  0 : Metadata Size: 0   bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)
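
As a quick sanity check on the output above, the following minimal sketch converts nsze to bytes using the 512-byte data size of LBA Format 0 and unpacks the dlfeat bit fields using the bit ranges printed by nvme-cli. The function names and dictionary keys are made up for illustration; only the numeric values come from the output.

```python
# Values copied from the id-ns output above.
NSZE = 0x1dcf32b0      # namespace size, in logical blocks
LBA_DATA_SIZE = 512    # bytes per logical block, from "LBA Format 0"
DLFEAT = 1             # deallocate logical block features byte


def namespace_bytes(nsze, lba_size):
    """Return the namespace size in bytes."""
    return nsze * lba_size


def decode_dlfeat(dlfeat):
    """Split dlfeat into the bit ranges shown in the nvme-cli output.

    The key names are illustrative, not official field names.
    """
    return {
        "read_behavior": dlfeat & 0x7,                    # bits [2:0]
        "write_zeroes_deallocate": (dlfeat >> 3) & 0x1,   # bit  [3:3]
        "guard_field": (dlfeat >> 4) & 0x1,               # bit  [4:4]
    }


if __name__ == "__main__":
    size = namespace_bytes(NSZE, LBA_DATA_SIZE)
    print(f"{NSZE} blocks x {LBA_DATA_SIZE} B = {size} bytes ({size / 1e9:.2f} GB)")
    print(decode_dlfeat(DLFEAT))
```

The result is roughly 256 GB, which matches the advertised capacity of the drive, so the namespace geometry itself looks sane.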

> 
> Also, for the above device, what is the value of the block queue
> write-zeroes parameter found in
> /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes ?

$ cat /sys/block/nvme1n1/queue/write_zeroes_max_bytes
131584
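
For reference, here is a small sketch that relates this value to the failed requests in the log quoted earlier. It assumes the "sector" numbers printed by blk_update_request are in 512-byte units, and the helper names are hypothetical. The sector list is copied from the ten WRITE_ZEROES errors above.

```python
WRITE_ZEROES_MAX_BYTES = 131584   # from write_zeroes_max_bytes above
SECTOR_SIZE = 512                 # assumed unit of the logged sector numbers

# Start sectors of the failed WRITE_ZEROES requests, in log order.
FAILED_SECTORS = [16350, 16093, 15836, 15579, 15322,
                  15065, 14808, 14551, 14294, 14037]


def sectors_per_request(max_bytes, sector_size=SECTOR_SIZE):
    """Largest write-zeroes request size, expressed in sectors."""
    sectors, remainder = divmod(max_bytes, sector_size)
    if remainder:
        raise ValueError("limit is not a whole number of sectors")
    return sectors


def sector_gaps(sectors):
    """Distance between each logged start sector and the next one."""
    return [a - b for a, b in zip(sectors, sectors[1:])]


if __name__ == "__main__":
    print("max sectors per write-zeroes request:",
          sectors_per_request(WRITE_ZEROES_MAX_BYTES))
    print("gaps between failed requests:", sector_gaps(FAILED_SECTORS))
```

Both numbers come out to 257 sectors, which would be consistent with each failed request covering one maximum-sized write-zeroes range, although that is only an observation and not a diagnosis.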

> 
> You can also try blkdiscard -z 0 -l 1024 /dev/<nvmeXnY> to see if the
> problem is with write zeroes.

# blkdiscard -z -l 1024 /dev/nvme1n1
blkdiscard: /dev/nvme1n1: BLKZEROOUT ioctl failed: Device or resource busy
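
The EBUSY result does not by itself say anything about write zeroes. One possibility (an assumption, not something confirmed in this thread) is that the device was still claimed by another user, such as the dm-0 mapping seen in the earlier log. A minimal sketch for checking this lists the entries of the device's sysfs "holders" directory; the function name and the overridable sysfs_root parameter are hypothetical and exist only to make the sketch testable.

```python
import os


def block_device_holders(dev_name, sysfs_root="/sys/class/block"):
    """Return the names of devices stacked on top of dev_name.

    Reads <sysfs_root>/<dev_name>/holders, which the kernel populates
    with links to stacked devices (for example device-mapper targets).
    Returns an empty list if the directory does not exist.
    """
    holders_dir = os.path.join(sysfs_root, dev_name, "holders")
    try:
        return sorted(os.listdir(holders_dir))
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # nvme1n1 is the device discussed in this message.
    print(block_device_holders("nvme1n1"))
```

An empty list would not rule out other users of the device (a mounted filesystem or a holder on a partition does not show up here), so this is only a first check.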

> 
> Also, can you please try the latest nvme tree branch, nvme-5.11?
> 

Where do I get that code from? Is it already in the 5.11-rc tree or do I 
need to look somewhere else? I checked https://github.com/linux-nvme but 
I did not see it there.

Brad


