nvme-tcp uaf when tls setup fails

Daniel Wagner dwagner at suse.de
Mon Oct 14 06:08:29 PDT 2024


On Mon, Oct 14, 2024 at 01:54:42PM GMT, Daniel Wagner wrote:
> FYI, I am working on extending nvme-cli to use the newly added
> tls_configured_key and tls_keyring sysfs. While playing around, KASAN
> reported an error on the current 6.12-rc3:
> 
>  nvme nvme1: creating 8 I/O queues.
>  nvme nvme1: mapped 8/0/0 default/read/poll queues.
>  nvme nvme1: Connect command failed, errno: -18
>  nvme nvme1: failed to connect queue: 3 ret=-18
>  ==================================================================
>  BUG: KASAN: slab-use-after-free in blk_mq_queue_tag_busy_iter+0x3ec/0x430
>  Read of size 4 at addr ffff8880156a8194 by task kworker/1:1H/169
> 
>  CPU: 1 UID: 0 PID: 169 Comm: kworker/1:1H Not tainted 6.12.0-rc3-1-default #247 45def3a9beaa5cc1f0fb63a7039570d73ee4a307
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
>  Workqueue: kblockd blk_mq_timeout_work
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x73/0xa0
>   print_report+0x165/0x720
>   ? __virt_addr_valid+0x165/0x340
>   ? __virt_addr_valid+0x2fb/0x340
>   ? blk_mq_queue_tag_busy_iter+0x3ec/0x430
>   kasan_report+0xce/0x110
>   ? blk_mq_queue_tag_busy_iter+0x3ec/0x430
>   blk_mq_queue_tag_busy_iter+0x3ec/0x430
>   ? __pfx_blk_mq_check_expired+0x10/0x10
>   blk_mq_timeout_work+0x94/0x280
>   ? process_scheduled_works+0x841/0x1220
>   process_scheduled_works+0x8c4/0x1220
>   worker_thread+0x8f5/0xd50
>   ? __kthread_parkme+0x7e/0x190
>   ? __kthread_parkme+0x7e/0x190
>   ? __kthread_parkme+0x7e/0x190
>   kthread+0x270/0x2f0
>   ? __pfx_worker_thread+0x10/0x10
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork+0x33/0x70
>   ? __pfx_kthread+0x10/0x10
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
> 
>  Allocated by task 1751:
>   kasan_save_track+0x2b/0x70
>   __kasan_kmalloc+0x89/0xa0
>   __kmalloc_cache_noprof+0x1d9/0x3c0
>   nvme_tcp_create_ctrl+0x57/0xac0 [nvme_tcp]
>   nvmf_dev_write+0x1bd1/0x22c0 [nvme_fabrics]
>   vfs_write+0x1cc/0x9f0
>   ksys_write+0xac/0x150
>   do_syscall_64+0x96/0x160
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
>  Freed by task 1751:
>   kasan_save_track+0x2b/0x70
>   kasan_save_free_info+0x3c/0x50
>   __kasan_slab_free+0x59/0x70
>   kfree+0x171/0x400
>   nvme_free_ctrl+0x38f/0x470 [nvme_core]
>   device_release+0x8d/0x180
>   kobject_put+0x1e8/0x3b0
>   nvme_tcp_create_ctrl+0x85a/0xac0 [nvme_tcp]
>   nvmf_dev_write+0x1bd1/0x22c0 [nvme_fabrics]
>   vfs_write+0x1cc/0x9f0
>   ksys_write+0xac/0x150
>   do_syscall_64+0x96/0x160
>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
>  The buggy address belongs to the object at ffff8880156a8000
>   which belongs to the cache kmalloc-8k of size 8192
>  The buggy address is located 404 bytes inside of
>   freed 8192-byte region [ffff8880156a8000, ffff8880156aa000)
> 
>  The buggy address belongs to the physical page:
>  page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x156a8
>  head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>  flags: 0xfffffc0000040(head|node=0|zone=1|lastcpupid=0x1fffff)
>  page_type: f5(slab)
>  raw: 000fffffc0000040 ffff888008443180 dead000000000100 dead000000000122
>  raw: 0000000000000000 0000000000020002 00000001f5000000 0000000000000000
>  head: 000fffffc0000040 ffff888008443180 dead000000000100 dead000000000122
>  head: 0000000000000000 0000000000020002 00000001f5000000 0000000000000000
>  head: 000fffffc0000003 ffffea000055aa01 ffffffffffffffff 0000000000000000
>  head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
>  page dumped because: kasan: bad access detected
> 
>  Memory state around the buggy address:
>   ffff8880156a8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8880156a8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  >ffff8880156a8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                           ^
>   ffff8880156a8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>   ffff8880156a8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ==================================================================
> 
> 
> ./nvme connect -t tcp -a 192.168.154.148 -s 4420 -q nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36 --hostnqn nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36 --hostid befdec4c-2234-11b2-a85c-ca77c773af36 -n nqn.io-1 --tls -vv
> Error opening /usr/local/etc/nvme/config.json, No such file or directory
> scan controller nvme0
> warning: using auto generated hostid and hostnqn
> lookup subsystem /sys/class/nvme-subsystem/nvme-subsys0/nvme0
> skipping namespace scan for ctrl nvme0
> skipping path scan for ctrl nvme0
> scan subsystem nvme-subsys0
> skipping namespace scan for subsys nqn.2019-08.org.qemu:nvme-0
> kernel supports: instance cntlid transport traddr trsvcid nqn queue_size nr_io_queues reconnect_delay ctrl_loss_tmo keep_alive_tmo hostnqn host_traddr host_iface hostid duplicate_connect disable_sqflow hdr_digest data_digest nr_write_queues nr_poll_queues tos keyring tls_key fast_io_fail_tmo discovery dhchap_secret dhchap_ctrl_secret tls
> option "concat" ignored
> connect ctrl, 'nqn=nqn.io-1,transport=tcp,traddr=192.168.154.148,trsvcid=4420,hostnqn=nqn.2014-08.org.nvmexpress:uuid:befdec4c-2234-11b2-a85c-ca77c773af36,hostid=befdec4c-2234-11b2-a85c-ca77c773af36,ctrl_loss_tmo=600,tls'
> Failed to write to /dev/nvme-fabrics: Invalid cross-device link
> could not add new controller: failed to write to nvme-fabrics device
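
As a side note, for anyone who wants to poke at the tls_configured_key
and tls_keyring attributes mentioned at the top, this is a minimal
sketch for reading them from the shell, assuming the controller
instance is nvme1 and the attributes live directly under the
controller's sysfs directory:

    cat /sys/class/nvme/nvme1/tls_keyring
    cat /sys/class/nvme/nvme1/tls_configured_key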

The logs say that the connect to queue 3 fails, but it seems this
command was never actually sent out (ftrace):

    kworker/4:0H-759     [004] .....  8771.165686: nvme_setup_cmd: nvme1: qid=0, cmdid=0, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_fabrics_type_connect recfmt=0, qid=1, sqsize=127, cattr=0, kato=0)
          <idle>-0       [004] ..s1.  8771.172062: nvme_complete_rq: nvme1: qid=0, cmdid=0, res=0x1, retries=0, flags=0x0, status=0x0
    kworker/5:0H-796     [005] .....  8771.172422: nvme_setup_cmd: nvme1: qid=0, cmdid=0, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_fabrics_type_connect recfmt=0, qid=2, sqsize=127, cattr=0, kato=0)
          <idle>-0       [005] ..s1.  8771.178292: nvme_complete_rq: nvme1: qid=0, cmdid=0, res=0x1, retries=0, flags=0x0, status=0x0

And the issue reproduces very reliably.
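
For completeness, the trace above was captured with the nvme_setup_cmd
and nvme_complete_rq tracepoints; a minimal sketch of the capture,
assuming tracefs is mounted at /sys/kernel/tracing:

    echo 1 > /sys/kernel/tracing/events/nvme/nvme_setup_cmd/enable
    echo 1 > /sys/kernel/tracing/events/nvme/nvme_complete_rq/enable
    cat /sys/kernel/tracing/trace_pipe

With those two events enabled it is easy to see that only the connects
for qid=1 and qid=2 ever go out.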


