Problem with SPCC 256GB NVMe 1.3 drive - refcount_t: underflow; use-after-free.
Chaitanya Kulkarni
Chaitanya.Kulkarni at wdc.com
Tue Jan 19 22:08:32 EST 2021
On 1/18/21 10:33 AM, Bradley Chapman wrote:
> Good afternoon!
>
> On 1/17/21 11:36 PM, Chaitanya Kulkarni wrote:
>> On 1/17/21 11:05 AM, Bradley Chapman wrote:
>>> [ 2836.554298] nvme nvme1: I/O 415 QID 3 timeout, disable controller
>>> [ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672078] blk_update_request: I/O error, dev nvme1n1, sector 15322
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672080] blk_update_request: I/O error, dev nvme1n1, sector 15065
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672082] blk_update_request: I/O error, dev nvme1n1, sector 14808
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672083] blk_update_request: I/O error, dev nvme1n1, sector 14551
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672085] blk_update_request: I/O error, dev nvme1n1, sector 14294
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672087] blk_update_request: I/O error, dev nvme1n1, sector 14037
>>> op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
>>> [ 2836.672121] nvme nvme1: failed to mark controller live state
>>> [ 2836.672123] nvme nvme1: Removing after probe failure status: -19
>>> [ 2836.689016] Aborting journal on device dm-0-8.
>>> [ 2836.689024] Buffer I/O error on dev dm-0, logical block 25198592,
>>> lost sync page write
>>> [ 2836.689027] JBD2: Error -5 detected when updating journal superblock
>>> for dm-0-8.
>> Without knowing the fs mount/format command, I can only suspect that the
>> superblock zeroing is issued as a write-zeroes request and translated into
>> REQ_OP_WRITE_ZEROES, which the controller is not able to process, resulting
>> in the error. This analysis may be wrong.
>>
>> Can you please share the following details:
>>
>> nvme id-ns /dev/nvme0n1 -H (we are interested in oncs part here)
> I ran the requested command against /dev/nvme1n1 (since /dev/nvme0n1
> works perfectly so far) and here is the result:
Sorry, my bad, it was supposed to be nvme id-ctrl /dev/nvme0n1 -H
(ONCS is part of the Identify Controller data).
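
For reference, here is a minimal sketch of how the ONCS field can be decoded once you have the raw value from the id-ctrl output. The bit positions follow the Identify Controller layout in the NVMe specification; the function names and the example value are made up for illustration and are not taken from nvme-cli or the kernel.

```python
# Sketch only: decode the ONCS (Optional NVM Command Support) field.
# Bit positions are from the NVMe Identify Controller data structure.
ONCS_BITS = {
    0: "Compare",
    1: "Write Uncorrectable",
    2: "Dataset Management",
    3: "Write Zeroes",
    4: "Save/Select field in Set/Get Features",
    5: "Reservations",
    6: "Timestamp",
}

ONCS_WRITE_ZEROES = 1 << 3


def decode_oncs(oncs):
    """Return the names of the optional commands advertised in oncs."""
    return [name for bit, name in sorted(ONCS_BITS.items()) if oncs & (1 << bit)]


def supports_write_zeroes(oncs):
    """True if the controller claims Write Zeroes support."""
    return bool(oncs & ONCS_WRITE_ZEROES)


if __name__ == "__main__":
    example = 0x0C  # hypothetical value, not from the drive in this thread
    print(decode_oncs(example))
    print(supports_write_zeroes(example))
```

If bit 3 is set, the controller claims Write Zeroes support, and the driver will normally enable it for the namespace.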
>> Also, for the above device, what is the value of the block queue
>> write-zeroes parameter present in
>> /sys/block/<nvmeXnY>/queue/write_zeroes_max_bytes ?
> $ cat /sys/block/nvme1n1/queue/write_zeroes_max_bytes
> 131584
So write-zeroes is enabled for this device: the limit is non-zero.
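
As a side check on the numbers, the failed sectors in the log are evenly spaced, and that spacing can be compared against write_zeroes_max_bytes. The sketch below assumes the 512-byte sector unit the block layer uses in its blk_update_request messages; the helper names are hypothetical, and the excerpt is the first four of the ten error lines from the report, unwrapped.

```python
import re

SECTOR_SIZE = 512  # block-layer sector unit, assumed for these messages

# First four error lines from the report (line wrapping removed).
LOG = """\
[ 2836.672064] blk_update_request: I/O error, dev nvme1n1, sector 16350 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672072] blk_update_request: I/O error, dev nvme1n1, sector 16093 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672074] blk_update_request: I/O error, dev nvme1n1, sector 15836 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
[ 2836.672076] blk_update_request: I/O error, dev nvme1n1, sector 15579 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
"""

WRITE_ZEROES_MAX_BYTES = 131584  # value reported from sysfs above


def failed_sectors(log):
    """Extract the starting sector of every failed request."""
    return [int(s) for s in re.findall(r"sector (\d+)", log)]


def strides(sectors):
    """Set of distances between consecutive failed sectors."""
    return {a - b for a, b in zip(sectors, sectors[1:])}


if __name__ == "__main__":
    print("stride in sectors:", strides(failed_sectors(LOG)))
    print("max write-zeroes in sectors:", WRITE_ZEROES_MAX_BYTES // SECTOR_SIZE)
```

Both come out to 257 sectors (257 * 512 = 131584), which is consistent with one large zeroing range being split into maximum-size write-zeroes requests. That is only an observation from the numbers, not a diagnosis.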
>> You can also try blkdiscard -z -o 0 -l 1024 /dev/<nvmeXnY> to see if the
>> problem is with write zeroes.
> # blkdiscard -z -l 1024 /dev/nvme1n1
> blkdiscard: /dev/nvme1n1: BLKZEROOUT ioctl failed: Device or resource busy
This is exactly what I thought: we need to add a quirk for this model so that
we don't advertise write-zeroes support, and blk-lib emulates write-zeroes
instead.
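
To illustrate the idea (this is a user-space model, not the kernel code): the driver would only offload write-zeroes when the controller advertises it and no quirk disables it; otherwise the zeroing falls back to ordinary writes of zero-filled buffers. The quirk constant, function names, and chunk size below are hypothetical and exist only for this sketch.

```python
import io

ONCS_WRITE_ZEROES = 1 << 3           # Write Zeroes bit in ONCS (NVMe spec)
QUIRK_DISABLE_WRITE_ZEROES = 1 << 0  # hypothetical quirk flag for this sketch


def write_zeroes_offloaded(oncs, quirks):
    """Offload only if the controller advertises support and no quirk forbids it."""
    if not (oncs & ONCS_WRITE_ZEROES):
        return False
    if quirks & QUIRK_DISABLE_WRITE_ZEROES:
        return False
    return True


def zero_range_emulated(dev, offset, length, chunk=128 * 1024):
    """Fallback path: zero a byte range with plain writes of zero-filled buffers."""
    zeros = bytes(chunk)
    dev.seek(offset)
    remaining = length
    while remaining > 0:
        n = min(chunk, remaining)
        dev.write(zeros[:n])
        remaining -= n


if __name__ == "__main__":
    # An in-memory stand-in for a block device, filled with 0xff.
    disk = io.BytesIO(b"\xff" * 4096)
    if not write_zeroes_offloaded(ONCS_WRITE_ZEROES, QUIRK_DISABLE_WRITE_ZEROES):
        zero_range_emulated(disk, 1024, 1024)
    print(disk.getvalue()[1024:2048] == bytes(1024))
```

With the quirk set, the device never sees a Write Zeroes command; the cost is that zeroing large ranges transfers real data instead of a single command per range.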
>> Also, can you please try the latest nvme tree branch nvme-5.11 ?
>>
> Where do I get that code from? Is it already in the 5.11-rc tree or do I
> need to look somewhere else? I checked https://github.com/linux-nvme but
> I did not see it there.
Here is the link: git://git.infradead.org/nvme.git
Branch 5.12.
> Brad
>