[Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `nvmet_tcp_build_iovec`

Alon Zahavi zahavi.alon at gmail.com
Tue Nov 21 03:20:33 PST 2023


On Tue, 21 Nov 2023 at 11:26, Sagi Grimberg <sagi at grimberg.me> wrote:
>
>
>
> On 11/20/23 17:29, Alon Zahavi wrote:
> > On Mon, 20 Nov 2023 at 12:50, Sagi Grimberg <sagi at grimberg.me> wrote:
> >>
> >>
> >>
> >> On 11/15/23 11:35, Alon Zahavi wrote:
> >>> Just sending another reminder for this issue.
> >>> Until there is a fix, a remote DoS can be triggered.
> >>>
> >>> On Mon, 6 Nov 2023 at 15:40, Alon Zahavi <zahavi.alon at gmail.com> wrote:
> >>>>
> >>>> # Bug Overview
> >>>>
> >>>> ## The Bug
> >>>> A null-ptr-deref in `nvmet_tcp_build_pdu_iovec`.
> >>>>
> >>>> ## Bug Location
> >>>> `drivers/nvme/target/tcp.c` in the function `nvmet_tcp_build_pdu_iovec`.
> >>>>
> >>>> ## Bug Class
> >>>> Remote Denial of Service
> >>>>
> >>>> ## Disclaimer:
> >>>> This bug was found using Syzkaller with added NVMe-oF/TCP support.
> >>>>
> >>
> >> Hey Alon, thanks for the report.
> >>
> >>>> # Technical Details
> >>>>
> >>>> ## Kernel Report - NULL Pointer Dereference
> >>>> ```
> >>>> [  157.833470] BUG: kernel NULL pointer dereference, address: 000000000000000c
> >>>> [  157.833478] #PF: supervisor read access in kernel mode
> >>>> [  157.833484] #PF: error_code(0x0000) - not-present page
> >>>> [  157.833490] PGD 126e40067 P4D 126e40067 PUD 130d16067 PMD 0
> >>>> [  157.833506] Oops: 0000 [#1] PREEMPT SMP NOPTI
> >>>> [  157.833515] CPU: 3 PID: 3067 Comm: kworker/3:3H Kdump: loaded Not tainted 6.5.0-rc1+ #5
> >>>> [  157.833525] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
> >>>> [  157.833532] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
> >>>> [  157.833546] RIP: 0010:nvmet_tcp_build_pdu_iovec+0x7a/0x120
> >>>> [  157.833558] Code: fe 44 89 a3 20 02 00 00 49 c1 e4 05 4c 03 63 30 4c 89 75 d0 41 89 c6 e8 34 b8 18 ff 45 85 ff 0f 84 99 00 00 00 e8 06 bd 18 ff <41> 8b 74 24 0c 41 8b 44 24 08 4c 89 e7 49 8b 0c 24 89 f2 41 89 75
> >>>> [  157.833568] RSP: 0018:ffffc9001ab83c28 EFLAGS: 00010293
> >>>> [  157.833576] RAX: 0000000000000000 RBX: ffff88812b9583e0 RCX: 0000000000000000
> >>>> [  157.833584] RDX: ffff888131b10000 RSI: ffffffff82191dda RDI: ffffffff82191dcc
> >>>> [  157.833591] RBP: ffffc9001ab83c58 R08: 0000000000000005 R09: 0000000000000000
> >>>> [  157.833598] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
> >>>> [  157.833605] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000007
> >>>> [  157.833612] FS:  0000000000000000(0000) GS:ffff888233f80000(0000) knlGS:0000000000000000
> >>>> [  157.833630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>> [  157.833638] CR2: 000000000000000c CR3: 0000000122dd4002 CR4: 00000000007706e0
> >>>> [  157.833659] PKRU: 55555554
> >>>> [  157.833686] Call Trace:
> >>>> [  157.833691]  <TASK>
> >>>> [  157.833712]  ? show_regs+0x6e/0x80
> >>>> [  157.833745]  ? __die+0x29/0x70
> >>>> [  157.833757]  ? page_fault_oops+0x278/0x740
> >>>> [  157.833784]  ? up+0x3b/0x70
> >>>> [  157.833835]  ? do_user_addr_fault+0x63b/0x1040
> >>>> [  157.833846]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
> >>>> [  157.833862]  ? irq_work_queue+0x95/0xc0
> >>>> [  157.833874]  ? exc_page_fault+0xcf/0x390
> >>>> [  157.833889]  ? asm_exc_page_fault+0x2b/0x30
> >>>> [  157.833925]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
> >>>> [  157.833958]  ? nvmet_tcp_build_pdu_iovec+0x6c/0x120
> >>>> [  157.833971]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
> >>>> [  157.833998]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
> >>>> [  157.834011]  nvmet_tcp_try_recv_pdu+0x995/0x1310
> >>>> [  157.834066]  nvmet_tcp_io_work+0xe6/0xd90
> >>>> [  157.834081]  process_one_work+0x3da/0x870
> >>>> [  157.834112]  worker_thread+0x67/0x640
> >>>> [  157.834124]  kthread+0x164/0x1b0
> >>>> [  157.834138]  ? __pfx_worker_thread+0x10/0x10
> >>>> [  157.834148]  ? __pfx_kthread+0x10/0x10
> >>>> [  157.834162]  ret_from_fork+0x29/0x50
> >>>> [  157.834180]  </TASK>
> >>>> ```
> >>>>
> >>>> ## Description
> >>>>
> >>>> ### Tracing The Bug
> >>>> As written above, the bug occurs during the execution of
> >>>> `nvmet_tcp_build_pdu_iovec`. Looking at the kernel log, we can see
> >>>> the exact line of code that triggers the bug.
> >>>>
> >>>> Code Block 1:
> >>>> ```
> >>>> static void nvmet_tcp_build_pdu_iovec(struct nvmet_tcp_cmd *cmd)
> >>>> {
> >>>>     ...
> >>>>     sg = &cmd->req.sg[cmd->sg_idx]; // #1
> >>>>
> >>>>     while (length) {
> >>>>       u32 iov_len = min_t(u32, length, sg->length - sg_offset); // #2
> >>>>     ...
> >>>>     }
> >>>> ...
> >>>> }
> >>>> ```
> >>>> Breakdown:
> >>>>
> >>>> 1. The variable `sg` is assigned the value of `&cmd->req.sg[cmd->sg_idx]`.
> >>>> At the assembly level (Intel syntax):
> >>>> ```
> >>>> mov    DWORD PTR [rbx+0x220], r12d     ; rbx holds the `cmd` address, r12d the sg index
> >>>> add    r12, QWORD PTR [rbx+0x30]       ; adding the value of `cmd->req.sg`
> >>>> ```
> >>>>
> >>>> However, `cmd->req.sg` is NULL at this point of execution, so `sg`
> >>>> will point to `NULL + cmd->sg_idx` (scaled by the element size), most
> >>>> likely 0x0 or another small, inaccessible memory address.
> >>>>
> >>>> 2. After moving the address into `sg` the driver will dereference it
> >>>> later, inside the while loop.
> >>>> ```
> >>>> mov    esi, DWORD PTR [r12+0xc]
> >>>> ```
> >>>> At this point, `r12` will (probably) hold 0x0. This means
> >>>> that the CPU will try to read memory address 0xC, which
> >>>> triggers the NULL pointer dereference.
> >>>>
> >>>>
> >>>> ## Root Cause
> >>>> `req` is initialized during `nvmet_req_init`. However, the sequence
> >>>> that leads into `nvmet_tcp_build_pdu_iovec` does not contain any call to
> >>>> `nvmet_req_init`, so the kernel crashes with a NULL pointer
> >>>> dereference. This flow of execution can also create a situation where
> >>>> an uninitialized memory address is dereferenced, which is
> >>>> undefined behaviour.
> >>
> >> If req->sg was not allocated, we shouldn't build a corresponding iovec.
> >> There is a case where we encounter a failure where nvmet_req_init is
> >> not called, but instead nvmet_tcp_handle_req_failure is called and
> >> should properly initialize req->sg and the corresponding iovec.
> >>
> >> The intention is to drain the error request from the socket, or at
> >> least attempt to do so, so that the connection recovers.
> >>
> >> I'd be interested to know if this path (nvmet_tcp_handle_req_failure)
> >> is taken or there is something else going on....
> >>
> >
> > Looking at some other kernel reports (an example below) for the same
> > issue, I can see that the call to `nvmet_tcp_build_pdu_iovec` is
> > executed during `nvmet_tcp_handle_h2c_data_pdu`.
> > The call to `nvmet_tcp_handle_h2c_data_pdu` happens too early and
> > does not perform the required checks before calling
> > `nvmet_tcp_build_pdu_iovec`.
>
> Yes, as Chaitanya pointed out, we are missing proper error handling when
> the host sends a malformed h2cdata pdu.
>
> >
> > ```
> > Call Trace:
> >   <TASK>
> >   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:988 [inline]
> >   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020 [inline]
> >   nvmet_tcp_try_recv_pdu+0xddf/0x1cb0 drivers/nvme/target/tcp.c:1182
> >   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306 [inline]
> >   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338 [inline]
> >   nvmet_tcp_io_work+0x109/0x1510 drivers/nvme/target/tcp.c:1388
> >   process_one_work+0x725/0xdc0 kernel/workqueue.c:2597
> >   worker_thread+0x3e2/0x9f0 kernel/workqueue.c:2748
> >   kthread+0x201/0x250 kernel/kthread.c:389
> >   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
> >   </TASK>
> > ```
> > BTW this is the same root cause as another bug I reported -
> > NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
> > (https://lore.kernel.org/linux-nvme/CAK5usQuk-R1auf3_WSgKK6UAO5qeo9zjeWxEB-Wxs2JvFhgLVA@mail.gmail.com/T/#t)
>
> So I take it that this issue covers all the outstanding reports? I've lost
> track a bit while catching up on a long backlog of threads.

This issue covers only two bug reports I sent - this one, and
https://lore.kernel.org/linux-nvme/CAK5usQuk-R1auf3_WSgKK6UAO5qeo9zjeWxEB-Wxs2JvFhgLVA@mail.gmail.com/T/#t
There is a third report that doesn't seem related.
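
For reference, here is a minimal userspace sketch of the address arithmetic behind both reports. It is not driver code. `struct sketch_sg`, `struct sketch_cmd`, `sketch_fault_addr` and `sketch_can_build_iovec` are hypothetical names, and the layout is only assumed to resemble `struct scatterlist` on this build (32-byte elements, which the `shl r12, 5` in the `Code:` bytes seems to suggest, with `length` at offset 0xc). The guard at the end only illustrates the kind of check that appears to be missing before the iovec is built; it is not the actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scatterlist (assumed layout, not the
 * kernel definition): 32 bytes per element, `length` at offset 0xc. */
struct sketch_sg {
	uint64_t page_link;
	uint32_t offset;
	uint32_t length;
	uint64_t dma_address;
	uint64_t pad;
};

/* The faulting load is `mov esi, DWORD PTR [r12+0xc]`. */
static_assert(offsetof(struct sketch_sg, length) == 0xc,
	      "length is expected at offset 0xc");
static_assert(sizeof(struct sketch_sg) == 32,
	      "elements are expected to be 32 bytes");

/* Hypothetical, heavily reduced view of the command state. */
struct sketch_cmd {
	struct sketch_sg *sg;	/* stands in for cmd->req.sg */
	uint32_t sg_cnt;	/* number of entries behind sg */
	uint32_t sg_idx;	/* stands in for cmd->sg_idx */
};

/* Address read when evaluating sg->length for
 * sg = &base[sg_idx], using plain integers so that nothing
 * is actually dereferenced here. */
static uintptr_t sketch_fault_addr(uintptr_t sg_base, uint32_t sg_idx)
{
	return sg_base + (uintptr_t)sg_idx * sizeof(struct sketch_sg)
		+ offsetof(struct sketch_sg, length);
}

/* Illustrative guard only: refuse to build an iovec when the
 * scatterlist was never set up or the index is out of range. */
static bool sketch_can_build_iovec(const struct sketch_cmd *cmd)
{
	return cmd->sg != NULL && cmd->sg_idx < cmd->sg_cnt;
}
```

With a NULL base and `sg_idx == 0`, `sketch_fault_addr(0, 0)` evaluates to 0xc, which is the address shown in CR2 in the first report.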


