[PATCH 0/2] nvme-pci: fix races and UAF
Casey Chen
cachen at purestorage.com
Wed Jun 30 11:01:37 PDT 2021
This review request has been outstanding for nearly a week. Apart from
one comment from Chaitanya asking to adjust the alignment of the lines
in the commit message of the second patch, I have not seen any other
feedback. Do you have more comments? Let me know!
On 6/24/21 10:31 AM, Casey Chen wrote:
> Found two bugs while power-cycling a PCIe NVMe drive for hours:
> - Races in nvme_setup_io_queues()
> - A UAF introduced by nvme_dev_remove_admin(), which was found
>   after fixing the races. Without the race fixes, the system
>   simply crashes and cannot survive long enough to reproduce the UAF.
>
> The proposed fixes have been tested for several days for correctness.
>
> 0. Code baseline
>
> Tag nvme-5.14-2021-06-08 of repo http://git.infradead.org/nvme.git
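> (to reproduce the baseline: git clone http://git.infradead.org/nvme.git
> and git checkout nvme-5.14-2021-06-08)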
>
> 1. Testing method
>
> # e.g. power-cycling via the pciehp slot control (Slot 402 in the logs below):
> while :; do
>     echo 0 > /sys/bus/pci/slots/402/power    # power off one drive
>     sleep $((RANDOM%3)).$((RANDOM%10))
>     echo 1 > /sys/bus/pci/slots/402/power    # power on the same drive
>     sleep $((RANDOM%3)).$((RANDOM%10))
> done
>
> 2. Sample crash trace due to races in nvme_setup_io_queues()
> (task ID shown after the timestamps)
>
> [11668.533431][ T716] pcieport 0000:87:08.0: pciehp: Slot(402): Card present
> ...
> [11668.681298][T251231] nvme nvme12: pci function 0000:8c:00.0
> [11668.681354][T26714] nvme 0000:8c:00.0: enabling device (0100 -> 0102)
> [11669.046119][ C31] pcieport 0000:87:08.0: pciehp: pending interrupts 0x0108 from Slot Status
> [11669.046142][ T716] pcieport 0000:87:08.0: pciehp: Slot(402): Link Down
> [11669.046146][ T716] pcieport 0000:87:08.0: pciehp: Slot(402): Card not present
> [11669.046149][ T716] pcieport 0000:87:08.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:8c:00
> [11669.077428][ T716] ------------[ cut here ]------------
> [11669.077431][ T716] kernel BUG at drivers/pci/msi.c:348!
> [11669.077555][ T716] invalid opcode: 0000 [#1] SMP KASAN
> [11669.077658][ T716] CPU: 31 PID: 716 Comm: irq/127-pciehp Not tainted 5.13.0-rc3+
> [11669.078022][ T716] RIP: 0010:free_msi_irqs+0x28a/0x2d0
> ...
> [11669.093982][ T716] Call Trace:
> [11669.096850][ T716] pci_free_irq_vectors+0xe/0x20
> [11669.099695][ T716] nvme_dev_disable+0x140/0x760 [nvme]
> [11669.102503][ T716] ? _raw_spin_lock_irqsave+0x9c/0x100
> [11669.105271][ T716] ? trace_hardirqs_on+0x2c/0xe0
> [11669.107994][ T716] nvme_remove+0x191/0x1e0 [nvme]
> [11669.110689][ T716] pci_device_remove+0x6b/0x110
> [11669.113316][ T716] device_release_driver_internal+0x14f/0x280
> [11669.115939][ T716] pci_stop_bus_device+0xcb/0x100
> [11669.118515][ T716] pci_stop_and_remove_bus_device+0xe/0x20
> [11669.121079][ T716] pciehp_unconfigure_device+0xfa/0x200
> [11669.123597][ T716] ? pciehp_configure_device+0x1c0/0x1c0
> [11669.126049][ T716] ? trace_hardirqs_on+0x2c/0xe0
> [11669.128444][ T716] pciehp_disable_slot+0xc4/0x1a0
> [11669.130771][ T716] ? pciehp_runtime_suspend+0x40/0x40
> [11669.133054][ T716] ? __mutex_lock_slowpath+0x10/0x10
> [11669.135289][ T716] ? trace_hardirqs_on+0x2c/0xe0
> [11669.137462][ T716] pciehp_handle_presence_or_link_change+0x15c/0x4f0
> [11669.139632][ T716] ? down_read+0x11f/0x1a0
> [11669.141731][ T716] ? pciehp_handle_disable_request+0x80/0x80
> [11669.143817][ T716] ? rwsem_down_read_slowpath+0x600/0x600
> [11669.145851][ T716] ? __radix_tree_lookup+0xb2/0x130
> [11669.147842][ T716] pciehp_ist+0x19d/0x1a0
> [11669.149790][ T716] ? pciehp_set_indicators+0xe0/0xe0
> [11669.151704][ T716] ? irq_finalize_oneshot.part.46+0x1d0/0x1d0
> [11669.153588][ T716] irq_thread_fn+0x3f/0xa0
> [11669.155407][ T716] irq_thread+0x195/0x290
> [11669.157147][ T716] ? irq_thread_check_affinity.part.49+0xe0/0xe0
> [11669.158883][ T716] ? _raw_read_lock_irq+0x50/0x50
> [11669.160611][ T716] ? _raw_read_lock_irq+0x50/0x50
> [11669.162320][ T716] ? irq_forced_thread_fn+0xf0/0xf0
> [11669.164032][ T716] ? trace_hardirqs_on+0x2c/0xe0
> [11669.165731][ T716] ? irq_thread_check_affinity.part.49+0xe0/0xe0
> [11669.167461][ T716] kthread+0x1c8/0x1f0
> [11669.169173][ T716] ? kthread_parkme+0x40/0x40
> [11669.170883][ T716] ret_from_fork+0x22/0x30
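>
> The BUG above is the BUG_ON() in free_msi_irqs() asserting that no
> IRQ handler is still requested when the MSI-X vectors are freed:
> the hotplug path's nvme_dev_disable() freed the vectors while
> nvme_setup_io_queues() was still allocating and requesting them.
> The idea of the fix, in sketch form (illustrative, not the literal
> patch; see patch 1 for the real code):
>
>     /* Serialize queue setup against controller shutdown. */
>     static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
>     {
>         /* Give up if nvme_dev_disable() holds shutdown_lock. */
>         if (!mutex_trylock(&dev->shutdown_lock))
>             return -ENODEV;
>
>         /* Controller is being torn down; fail early. */
>         if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
>             mutex_unlock(&dev->shutdown_lock);
>             return -ENODEV;
>         }
>
>         return 0;
>     }
>
> With dev->shutdown_lock held across vector allocation and request in
> nvme_setup_io_queues(), nvme_dev_disable() can no longer free the
> vectors mid-setup.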
>
> 3. KASAN report for the UAF introduced by nvme_dev_remove_admin()
> (task ID is shown after the timestamp)
>
> [18319.015748][T246989] nvme nvme13: pci function 0000:8c:00.0
> [18319.015795][T215541] nvme 0000:8c:00.0: enabling device (0100 -> 0102)
> [18319.369086][ C31] pcieport 0000:87:08.0: pciehp: pending interrupts 0x0108 from Slot Status
> [18319.369107][ T716] pcieport 0000:87:08.0: pciehp: Slot(402): Link Down
> [18319.369111][ T716] pcieport 0000:87:08.0: pciehp: Slot(402): Card not present
> [18319.369116][ T716] pcieport 0000:87:08.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:8c:00
> [18320.452045][T215541] nvme nvme13: 88/0/0 default/read/poll queues
> [18320.469475][T215541] nvme nvme13: failed to mark controller live state
> [18320.469483][T215541] nvme nvme13: Removing after probe failure status: -19
> [18320.551295][T215541] ==================================================================
> [18320.551299][T215541] BUG: KASAN: use-after-free in __blk_mq_all_tag_iter+0x9c/0x3f0
> [18320.551311][T215541] Read of size 4 at addr ffff888897904d04 by task kworker/u178:2/215541
> [18320.551315][T215541]
> [18320.551318][T215541] CPU: 86 PID: 215541 Comm: kworker/u178:2 Not tainted 5.13.0-rc3+
> [18320.551327][T215541] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
> [18320.551339][T215541] Call Trace:
> [18320.551343][T215541] dump_stack+0xa4/0xdb
> [18320.551354][T215541] ? __blk_mq_all_tag_iter+0x9c/0x3f0
> [18320.551359][T215541] print_address_description.constprop.10+0x3a/0x60
> [18320.551366][T215541] ? __blk_mq_all_tag_iter+0x9c/0x3f0
> [18320.551372][T215541] ? __blk_mq_all_tag_iter+0x9c/0x3f0
> [18320.551377][T215541] ? blk_mq_update_nr_requests+0x2a0/0x2a0
> [18320.551382][T215541] kasan_report.cold.15+0x7c/0xd8
> [18320.551390][T215541] ? __blk_mq_all_tag_iter+0x9c/0x3f0
> [18320.551395][T215541] __blk_mq_all_tag_iter+0x9c/0x3f0
> [18320.551401][T215541] ? blk_mq_update_nr_requests+0x2a0/0x2a0
> [18320.551407][T215541] ? bt_iter+0xf0/0xf0
> [18320.551412][T215541] ? __blk_mq_all_tag_iter+0x2c9/0x3f0
> [18320.551417][T215541] ? blk_mq_update_nr_requests+0x2a0/0x2a0
> [18320.551422][T215541] ? bt_iter+0xf0/0xf0
> [18320.551427][T215541] ? dev_printk_emit+0x95/0xbb
> [18320.551436][T215541] blk_mq_tagset_busy_iter+0x75/0xa0
> [18320.551441][T215541] ? blk_mq_update_nr_requests+0x2a0/0x2a0
> [18320.551446][T215541] ? blk_mq_update_nr_requests+0x2a0/0x2a0
> [18320.551451][T215541] blk_mq_tagset_wait_completed_request+0x86/0xc0
> [18320.551457][T215541] ? blk_mq_tagset_busy_iter+0xa0/0xa0
> [18320.551463][T215541] ? blk_mq_tagset_busy_iter+0x80/0xa0
> [18320.551469][T215541] ? trace_event_raw_event_nvme_setup_cmd+0x2d0/0x2d0 [nvme_core]
> [18320.551493][T215541] nvme_dev_disable+0x4f6/0x760 [nvme]
> [18320.551502][T215541] ? trace_hardirqs_on+0x2c/0xe0
> [18320.551510][T215541] nvme_reset_work+0x226/0x2060 [nvme]
> [18320.551520][T215541] ? nvme_remove+0x1e0/0x1e0 [nvme]
> [18320.551528][T215541] ? __update_load_avg_cfs_rq+0x1d8/0x550
> [18320.551537][T215541] ? down_read+0x11f/0x1a0
> [18320.551545][T215541] ? newidle_balance+0x444/0x690
> [18320.551552][T215541] ? update_load_avg+0x626/0xbe0
> [18320.551557][T215541] ? update_cfs_group+0x1e/0x150
> [18320.551562][T215541] ? load_balance+0x11d0/0x11d0
> [18320.551567][T215541] ? dequeue_entity+0x150/0x730
> [18320.551573][T215541] ? nvme_irq_check+0x60/0x60 [nvme]
> [18320.551581][T215541] ? finish_task_switch+0x101/0x3d0
> [18320.551588][T215541] ? read_word_at_a_time+0xe/0x20
> [18320.551594][T215541] ? strscpy+0xc1/0x1d0
> [18320.551598][T215541] process_one_work+0x4b9/0x7b0
> [18320.551604][T215541] worker_thread+0x72/0x710
> [18320.551610][T215541] ? process_one_work+0x7b0/0x7b0
> [18320.551614][T215541] kthread+0x1c8/0x1f0
> [18320.551618][T215541] ? kthread_parkme+0x40/0x40
> [18320.551622][T215541] ret_from_fork+0x22/0x30
> [18320.551630][T215541]
> [18320.551632][T215541] Allocated by task 215541:
> [18320.551635][T215541] kasan_save_stack+0x19/0x40
> [18320.551639][T215541] __kasan_kmalloc+0x7f/0xa0
> [18320.551642][T215541] kmem_cache_alloc_node_trace+0x187/0x2b0
> [18320.551648][T215541] blk_mq_init_tags+0x47/0x100
> [18320.551651][T215541] blk_mq_alloc_rq_map+0x44/0xf0
> [18320.551656][T215541] __blk_mq_alloc_map_and_request+0x7f/0x140
> [18320.551661][T215541] blk_mq_alloc_tag_set+0x25e/0x510
> [18320.551666][T215541] nvme_reset_work+0x14f9/0x2060 [nvme]
> [18320.551674][T215541] process_one_work+0x4b9/0x7b0
> [18320.551678][T215541] worker_thread+0x72/0x710
> [18320.551682][T215541] kthread+0x1c8/0x1f0
> [18320.551685][T215541] ret_from_fork+0x22/0x30
> [18320.551689][T215541]
> [18320.551690][T215541] Freed by task 716:
> [18320.551693][T215541] kasan_save_stack+0x19/0x40
> [18320.551696][T215541] kasan_set_track+0x1c/0x30
> [18320.551699][T215541] kasan_set_free_info+0x20/0x30
> [18320.551704][T215541] __kasan_slab_free+0xec/0x130
> [18320.551707][T215541] kfree+0xa8/0x460
> [18320.551712][T215541] blk_mq_free_map_and_requests+0x8d/0xc0
> [18320.551717][T215541] blk_mq_free_tag_set+0x30/0xf0
> [18320.551721][T215541] nvme_remove+0x199/0x1e0 [nvme]
> [18320.551729][T215541] pci_device_remove+0x6b/0x110
> [18320.551735][T215541] device_release_driver_internal+0x14f/0x280
> [18320.551744][T215541] pci_stop_bus_device+0xcb/0x100
> [18320.551750][T215541] pci_stop_and_remove_bus_device+0xe/0x20
> [18320.551754][T215541] pciehp_unconfigure_device+0xfa/0x200
> [18320.551761][T215541] pciehp_disable_slot+0xc4/0x1a0
> [18320.551765][T215541] pciehp_handle_presence_or_link_change+0x15c/0x4f0
> [18320.551770][T215541] pciehp_ist+0x19d/0x1a0
> [18320.551774][T215541] irq_thread_fn+0x3f/0xa0
> [18320.551780][T215541] irq_thread+0x195/0x290
> [18320.551783][T215541] kthread+0x1c8/0x1f0
> [18320.551786][T215541] ret_from_fork+0x22/0x30
> [18320.551791][T215541]
> [18320.551792][T215541] The buggy address belongs to the object at ffff888897904d00
> [18320.551792][T215541] which belongs to the cache kmalloc-192 of size 192
> [18320.551795][T215541] The buggy address is located 4 bytes inside of
> [18320.551795][T215541] 192-byte region [ffff888897904d00, ffff888897904dc0)
> [18320.551800][T215541] The buggy address belongs to the page:
> [18320.551802][T215541] page:000000002f3df664 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x897904
> [18320.551807][T215541] head:000000002f3df664 order:1 compound_mapcount:0
> [18320.551810][T215541] flags: 0x200000000010200(slab|head|node=0|zone=2)
> [18320.551819][T215541] raw: 0200000000010200 dead000000000100 dead000000000122 ffff88810004ca00
> [18320.551824][T215541] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
> [18320.551826][T215541] page dumped because: kasan: bad access detected
> [18320.551828][T215541]
> [18320.551829][T215541] Memory state around the buggy address:
> [18320.551832][T215541] ffff888897904c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [18320.551835][T215541] ffff888897904c80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> [18320.551838][T215541] >ffff888897904d00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [18320.551841][T215541] ^
> [18320.551843][T215541] ffff888897904d80: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> [18320.551846][T215541] ffff888897904e00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [18320.551848][T215541] ==================================================================
> [18320.551850][T215541] Disabling lock debugging due to kernel taint
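>
> Reading the report: the admin tag set was allocated by the reset work
> (task 215541), freed by the hotplug remove path (task 716) via
> nvme_dev_remove_admin() -> blk_mq_free_tag_set(), then used again by
> the still-running reset work from nvme_dev_disable(). In sketch form,
> the rule being violated (names from drivers/nvme/host/pci.c;
> illustrative only, see patch 2 for the real code):
>
>     /*
>      * Nothing may free a blk-mq tag set while another context can
>      * still iterate it; remove must fence off the reset work first.
>      */
>     nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
>     flush_work(&dev->ctrl.reset_work);        /* wait out a running reset */
>     blk_mq_free_tag_set(&dev->admin_tagset);  /* only now safe */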
>
> Casey Chen (2):
> nvme-pci: Fix multiple races in nvme_setup_io_queues()
> nvme-pci: Fix UAF introduced by nvme_dev_remove_admin()
>
> drivers/nvme/host/pci.c | 60 ++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 53 insertions(+), 7 deletions(-)
>