[PATCH] NVMe: Force cancel commands on hot-removal

Mohana Goli mohana.goli at seagate.com
Wed Sep 9 09:05:26 PDT 2015


Keith,

I ran another test after creating filesystems on the NVMe
devices. Unfortunately, I am seeing the I/O timeouts again.
These are the steps I used:

1. Created two different filesystems on two different drives.
2. Mounted both devices on different mount points.
3. Started dd write I/O to both mount points.
4. Pulled one of the drives; at that point I could see the I/O timeouts.

The issue is not seen with my fix: calling nvme_dev_shutdown first
and nvme_dev_remove second in the nvme_remove function, and calling
blk_cleanup_queue(ns->queue) before del_gendisk in the
nvme_ns_remove function.
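
Roughly, that reordering looks like the sketch below. This is only an
illustration against the 2015-era drivers/block/nvme-core.c: the rest
of the teardown in both functions is elided, and the surrounding
context is approximated.

    static void nvme_remove(struct pci_dev *pdev)
    {
            struct nvme_dev *dev = pci_get_drvdata(pdev);

            /* Quiesce the controller and cancel in-flight I/O
             * before the namespaces go away... */
            nvme_dev_shutdown(dev);
            /* ...then remove the namespaces, so nothing new is
             * queued to a dead controller. */
            nvme_dev_remove(dev);
            /* remaining teardown elided */
    }

    static void nvme_ns_remove(struct nvme_ns *ns)
    {
            /* Drain and fail all pending requests first... */
            blk_cleanup_queue(ns->queue);
            /* ...then unregister the gendisk. */
            del_gendisk(ns->disk);
    }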


This is probably uncovering another corner case.

Thanks & Regards,
Mohan.



-----------------------------------------------

[  381.133461] pciehp 0000:0e:09.0:pcie24: pending interrupts 0x0108
from Slot Status
[  381.133472] pciehp 0000:0e:09.0:pcie24: DPC Interrupt status is
set:status = 0x89
[  381.133475] pciehp 0000:0e:09.0:pcie24: DPC is triggered:status = 0x89
[  381.133483] pciehp 0000:0e:09.0:pcie24: Card not present on Slot(9)
[  381.133486] pciehp 0000:0e:09.0:pcie24: slot(9): Link Down event
[  381.133500] pciehp 0000:0e:09.0:pcie24: Handle DPC Event on the slot(9)
[  381.133503] pciehp 0000:0e:09.0:pcie24: handle_dpc_trigger_event:
allocated memory :info =0xffff88102e8cde00
[  381.133508] pciehp 0000:0e:09.0:pcie24: pciehp_unconfigure_device:
domain:bus:dev = 0000:15:00
[  381.133547] nvme_ns_remove :entry kill =1
[  381.133567] nvme 0000:15:00.0: Cancelling I/O 17 QID 7
[  381.133571] blk_update_request: I/O error, dev nvme0n1, sector 25651808
[  381.133749] Buffer I/O error on dev nvme0n1, logical block 3206476,
lost async page write
[  381.133755] Buffer I/O error on dev nvme0n1, logical block 3206477,
lost async page write
[  381.133757] Buffer I/O error on dev nvme0n1, logical block 3206478,
lost async page write
[  381.133758] Buffer I/O error on dev nvme0n1, logical block 3206479,
lost async page write
[  381.133760] Buffer I/O error on dev nvme0n1, logical block 3206480,
lost async page write
[  381.133762] Buffer I/O error on dev nvme0n1, logical block 3206481,
lost async page write
[  381.133763] Buffer I/O error on dev nvme0n1, logical block 3206482,
lost async page write
[  381.133765] Buffer I/O error on dev nvme0n1, logical block 3206483,
lost async page write
[  381.133766] Buffer I/O error on dev nvme0n1, logical block 3206484,
lost async page write
[  381.133768] Buffer I/O error on dev nvme0n1, logical block 3206485,
lost async page write
[  411.339718] nvme 0000:15:00.0: Timeout I/O 468 QID 1
[  411.339740] nvme 0000:15:00.0: Aborting I/O 468 QID 1
[  428.361007] nvme 0000:15:00.0: Timeout I/O 1 QID 67
[  428.361023] nvme 0000:15:00.0: Aborting I/O 1 QID 67
[  428.361029] nvme 0000:15:00.0: Timeout I/O 2 QID 67
[  428.361033] nvme 0000:15:00.0: Aborting I/O 2 QID 67
[  428.361038] nvme 0000:15:00.0: Timeout I/O 3 QID 67
[  428.361042] nvme 0000:15:00.0: Aborting I/O 3 QID 67
[  442.363914] nvme 0000:15:00.0: Timeout I/O 468 QID 1
[  442.363929] nvme 0000:15:00.0: I/O 468 QID 1 timeout, reset controller
[  442.366134] nvme 0000:15:00.0: Cancelling I/O 1 QID 67
[  442.366140] nvme 0000:15:00.0: completing aborted command with status:4007
[  442.366146] blk_update_request: I/O error, dev nvme0n1, sector 1
[  442.366160] nvme 0000:15:00.0: Cancelling I/O 2 QID 67
[  442.366163] nvme 0000:15:00.0: completing aborted command with status:4007
[  442.366165] blk_update_request: I/O error, dev nvme0n1, sector 8
[  442.366169] nvme 0000:15:00.0: Cancelling I/O 3 QID 67
[  442.366172] nvme 0000:15:00.0: completing aborted command with status:4007
[  442.366174] blk_update_request: I/O error, dev nvme0n1, sector 16
[  442.366187] nvme 0000:15:00.0: Cancelling I/O 468 QID 1
[  442.366189] nvme 0000:15:00.0: completing aborted command with status:4007
[  442.366192] blk_update_request: I/O error, dev nvme0n1, sector 390911849
[  442.366200] nvme 0000:15:00.0: Cancelling I/O 1 QID 0
[  442.366204] nvme 0000:15:00.0: Abort status:7 result:288cf000
[  442.366207] nvme 0000:15:00.0: Cancelling I/O 2 QID 0
[  442.366209] nvme 0000:15:00.0: Abort status:7 result:288cf000
[  442.366212] nvme 0000:15:00.0: Cancelling I/O 3 QID 0
[  442.366215] nvme 0000:15:00.0: Abort status:7 result:288cf000
[  442.366217] nvme 0000:15:00.0: Cancelling I/O 4 QID 0
[  442.366220] nvme 0000:15:00.0: Abort status:7 result:288cf000
[  442.366228] XFS (nvme0n1): metadata I/O error: block 0x174cd769
("xlog_iodone") error 5 numblks 64
[  442.366233] XFS (nvme0n1): xfs_do_force_shutdown(0x2) called from
line 1177 of file fs/xfs/xfs_log.c.  Return address =
0xffffffff8127cd00
[  442.366253] XFS (nvme0n1): Log I/O Error Detected.  Shutting down filesystem
[  442.366255] XFS (nvme0n1): Please umount the filesystem and rectify
the problem(s)
[  442.366292] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  442.366298] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  442.366665] device: 'nvme0n1': device_del
[  442.366828] nvme_ns_remove :gen disk removed
[  442.370177] device: '259:1': device_unregister
[  442.370181] device: '259:1': device_del
[  442.370231] device: '259:1': device_create_release
[  442.370239] nvme_ns_remove :exit
[  442.370264] nvme :Removed the namespaces
[  442.370317] device: 'nvme0': device_unregister
[  442.370319] device: 'nvme0': device_del
[  442.370565] nvme :nvme dev completly removed
[  442.370575] device: '0000:15:00.0': device_del
[  442.370645] pciehp 0000:0e:09.0:pcie24: pciehp_unconfigure_device:
domain:bus:dev = 0000:15:00
[  442.370663] pcieport 0000:0e:09.0: Clear the DPC trigger status = 0x89
[  442.376962] pci 0000:15:00.0: Refused to change power state, currently in D3
[  442.376971] pci 0000:15:00.0: can't enable device: BAR 0 [mem
0x95300000-0x95303fff 64bit] not claimed
[  442.376975] pci 0000:15:00.0: Device failed to resume
[  458.128229] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  488.231729] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  518.335223] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  548.438697] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  578.542194] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  608.645685] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  638.749164] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  668.852663] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  698.956143] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  729.059637] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  759.163119] XFS (nvme0n1): xfs_log_force: error -5 returned.
[  789.266620] XFS (nvme0n1): xfs_log_force: error -5 returned.

On Wed, Sep 9, 2015 at 7:09 PM, Keith Busch <keith.busch at intel.com> wrote:
> On Wed, 9 Sep 2015, Mohana Goli wrote:
>>
>> Keith,
>>
>> Don't we need to take a queue spin lock while processing the I/Os on
>> each request queue, the way we do in nvme_clear_queue?
>>
>>
>>     blk_mq_all_tag_busy_iter(hctx->tags,
>>                              nvme_cancel_queue_ios,
>>                              hctx->driver_data);
>
>
> You're right, thanks for the catch.
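
For reference, the locked iteration could look roughly like the sketch
below, modeled on nvme_clear_queue in the driver of that era. The
helper name nvme_cancel_hctx_ios is hypothetical, and the fields used
(nvmeq->q_lock, hctx->driver_data) are assumptions from that driver.

    /* Hypothetical helper: hold the per-queue lock around the
     * busy-tag walk, as nvme_clear_queue() does, so cancellation
     * cannot race with the completion path. */
    static void nvme_cancel_hctx_ios(struct blk_mq_hw_ctx *hctx)
    {
            struct nvme_queue *nvmeq = hctx->driver_data;

            spin_lock_irq(&nvmeq->q_lock);
            blk_mq_all_tag_busy_iter(hctx->tags,
                                     nvme_cancel_queue_ios,
                                     hctx->driver_data);
            spin_unlock_irq(&nvmeq->q_lock);
    }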


