Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.

Sun Jan 17 23:36:01 EST 2021

On 1/17/21 11:05 AM, Bradley Chapman wrote:
> [ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
> [ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037 
> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
> [ 2836.672121] nvme nvme1: failed to mark controller live state
> [ 2836.672123] nvme nvme1: Removing after probe failure status: -19
> [ 2836.689016] Aborting journal on device dm-0-8.
> [ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592, 
> lost sync page write
> [ 2836.689027] JBD2: Error -5 detected when updating journal superblock 
> for dm-0-8.
Without knowing the fs mount/format command, I can only suspect that the
superblock zeroing, issued as a write-zeroes request, is translated into
REQ_OP_WRITE_ZEROES, which the controller is not able to process,
resulting in the error. This analysis may be wrong.

Can you please share the following details:

nvme id-ctrl /dev/nvme1 -H (we are interested in the oncs field here; it
is reported in the Identify Controller data, not the namespace data)
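
For reading the oncs value by hand: it is a bitmask, and in NVMe 1.3 bit 3
signals Write Zeroes support. A minimal sketch of the decoding follows; the
helper names are made up for illustration, and the example values in the
comments are not taken from this drive.

```python
from typing import List

# ONCS bit positions as defined for the Identify Controller data in NVMe 1.3.
ONCS_BITS = {
    0: "Compare",
    1: "Write Uncorrectable",
    2: "Dataset Management",
    3: "Write Zeroes",
    4: "Save/Select field in Set/Get Features",
    5: "Reservations",
    6: "Timestamp",
}

WRITE_ZEROES_BIT = 3


def decode_oncs(oncs: int) -> List[str]:
    """Return the names of the optional commands whose bit is set."""
    return [name for bit, name in sorted(ONCS_BITS.items()) if oncs & (1 << bit)]


def supports_write_zeroes(oncs: int) -> bool:
    """True if the controller advertises the Write Zeroes command."""
    return bool(oncs & (1 << WRITE_ZEROES_BIT))


if __name__ == "__main__":
    # Hypothetical value; substitute the oncs reported by the controller.
    example = 0x5F
    print(decode_oncs(example))
    print("write zeroes advertised:", supports_write_zeroes(example))
```

If the bit is set but the command still times out, that would point at the
drive advertising a command it does not handle properly.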

Also, for the above device, what is the value of the queue write-zeroes
parameter in /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes ?
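
A small sketch for collecting that value, assuming the standard sysfs layout.
The function name and the sysfs_root parameter are hypothetical, added only
so the helper can be pointed at a different directory. A value of 0 means the
block layer does not send REQ_OP_WRITE_ZEROES to the device.

```python
from pathlib import Path


def write_zeroes_max_bytes(disk: str, sysfs_root: str = "/sys/block") -> int:
    """Read queue/write_zeroes_max_bytes for a disk such as 'nvme1n1'.

    sysfs_root is a hypothetical parameter for illustration; on a real
    system the default /sys/block is what you want.
    """
    path = Path(sysfs_root) / disk / "queue" / "write_zeroes_max_bytes"
    return int(path.read_text().strip())


def describe(disk: str, sysfs_root: str = "/sys/block") -> str:
    """Summarise whether write zeroes is offloaded to the device."""
    limit = write_zeroes_max_bytes(disk, sysfs_root)
    if limit == 0:
        return f"{disk}: write zeroes not offloaded to the device"
    return f"{disk}: write zeroes offloaded, up to {limit} bytes per request"
```

Usage would be something like describe("nvme1n1") on the affected machine.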

You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> (this
overwrites the first 1024 bytes of the device with zeroes) to see if the
problem is with write zeroes.

Also, can you please try the latest nvme tree branch, nvme-5.11?