Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
Bradley Chapman
chapman6235 at comcast.net
Mon Jan 18 13:33:42 EST 2021
Good afternoon!
On 1/17/21 11:36 PM, Chaitanya Kulkarni wrote:
> On 1/17/21 11:05 AM, Bradley Chapman wrote:
>> [ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
>> [ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>> [ 2836.672121] nvme nvme1: failed to mark controller live state
>> [ 2836.672123] nvme nvme1: Removing after probe failure status: -19
>> [ 2836.689016] Aborting journal on device dm-0-8.
>> [ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592, lost sync page write
>> [ 2836.689027] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
> Without knowing the fs mount/format commands, I can only suspect that
> superblock zeroing issued as a write-zeroes request is translated into
> REQ_OP_WRITE_ZEROES, which the controller is not able to process,
> resulting in the error. This analysis may be wrong.
>
> Can you please share the following details:
>
> nvme id-ns /dev/nvme0n1 -H (we are interested in the oncs part here)
I ran the requested command against /dev/nvme1n1 (since /dev/nvme0n1
works perfectly so far) and here is the result:
NVME Identify Namespace 1:
nsze : 0x1dcf32b0
ncap : 0x1dcf32b0
nuse : 0x1dcf32b0
nsfeat : 0
[2:2] : 0 Deallocated or Unwritten Logical Block error Not Supported
[1:1] : 0 Namespace uses AWUN, AWUPF, and ACWU
[0:0] : 0 Thin Provisioning Not Supported
nlbaf : 0
flbas : 0
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0 Current LBA Format Selected
mc : 0
[1:1] : 0 Metadata Pointer Not Supported
[0:0] : 0 Metadata as Part of Extended Data LBA Not Supported
dpc : 0
[4:4] : 0 Protection Information Transferred as Last 8 Bytes of Metadata Not Supported
[3:3] : 0 Protection Information Transferred as First 8 Bytes of Metadata Not Supported
[2:2] : 0 Protection Information Type 3 Not Supported
[1:1] : 0 Protection Information Type 2 Not Supported
[0:0] : 0 Protection Information Type 1 Not Supported
dps : 0
[3:3] : 0 Protection Information is Transferred as Last 8 Bytes of Metadata
[2:0] : 0 Protection Information Disabled
nmic : 0
[0:0] : 0 Namespace Multipath Not Capable
rescap : 0
[6:6] : 0 Exclusive Access - All Registrants Not Supported
[5:5] : 0 Write Exclusive - All Registrants Not Supported
[4:4] : 0 Exclusive Access - Registrants Only Not Supported
[3:3] : 0 Write Exclusive - Registrants Only Not Supported
[2:2] : 0 Exclusive Access Not Supported
[1:1] : 0 Write Exclusive Not Supported
[0:0] : 0 Persist Through Power Loss Not Supported
fpi : 0x80
[7:7] : 0x1 Format Progress Indicator Supported
[6:0] : 0 Format Progress Indicator (Remaining 0%)
dlfeat : 1
[4:4] : 0 Guard Field of Deallocated Logical Blocks is set to 0xFFFF
[3:3] : 0 Deallocate Bit in the Write Zeroes Command is Not Supported
[2:0] : 0x1 Bytes Read From a Deallocated Logical Block and its Metadata are 0x00
nawun : 0
nawupf : 0
nacwu : 0
nabsn : 0
nabo : 0
nabspf : 0
noiob : 0
nvmcap : 0
nsattr : 0
nvmsetid: 0
anagrpid: 0
endgid : 0
nguid : 00000000000000000000000000000000
eui64 : 0000000000000000
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)
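The raw fields can be cross-checked by hand: nsze is the namespace size in logical blocks, and dlfeat is a small bitfield. A minimal POSIX shell sketch, assuming 64-bit shell arithmetic and the 512-byte LBA format marked "in use"; the values are copied from the output above and the variable names are only illustrative:

```shell
#!/bin/sh
# Values copied from the nvme id-ns output above (illustrative variable names).
nsze=$((0x1dcf32b0))   # namespace size in logical blocks (500118192)
dlfeat=1               # Deallocate Logical Block Features
lba_size=512           # data size of LBA Format 0 (in use), in bytes

# Capacity: number of logical blocks times the logical block size.
echo "capacity_bytes=$((nsze * lba_size))"

# dlfeat bits [2:0]: what a read of a deallocated block returns (1 = all 0x00).
echo "dealloc_read_value=$((dlfeat & 0x7))"

# dlfeat bit 3: whether Write Zeroes may set the Deallocate bit (0 = no).
echo "wz_dealloc_supported=$(( (dlfeat >> 3) & 1 ))"
```

This prints capacity_bytes=256060514304 (about 256 GB, in line with the drive's rated size), dealloc_read_value=1 and wz_dealloc_supported=0, matching the decoded -H lines.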
>
> Also, for the above device, what is the value of the queue's write-zeroes
> parameter in /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes?
$ cat /sys/block/nvme1n1/queue/write_zeroes_max_bytes
131584
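Dividing this value by the 512-byte LBA data size reported by id-ns gives the per-request write-zeroes limit in logical blocks. A minimal shell sketch, assuming 512-byte logical blocks; `wz_max_lbas` is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# wz_max_lbas: hypothetical helper that converts write_zeroes_max_bytes
# into 512-byte logical blocks (assumes LBA Format 0, 512-byte data size).
wz_max_lbas() {
    echo $(( $1 / 512 ))
}

# Value read from /sys/block/nvme1n1/queue/write_zeroes_max_bytes above.
wz_max_lbas 131584   # prints 257
```

257 blocks is also the spacing between consecutive failing sectors in the kernel log quoted above (16350, 16093, 15836, ...), which is consistent with one large write-zeroes request being split into maximum-sized pieces; this is an observation from the numbers, not a confirmed diagnosis.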
>
> You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> to see whether
> the problem is with write zeroes.
# blkdiscard -z -l 1024 /dev/nvme1n1
blkdiscard: /dev/nvme1n1: BLKZEROOUT ioctl failed: Device or resource busy
>
> Also, can you please try the latest nvme tree branch, nvme-5.11?
>
Where do I get that code from? Is it already in the 5.11-rc tree or do I
need to look somewhere else? I checked https://github.com/linux-nvme but
I did not see it there.
Brad
More information about the Linux-nvme mailing list