[Bug Report] NVMe-oF/TCP - NULL Pointer Dereference in `nvmet_tcp_build_iovec`

Sagi Grimberg sagi at grimberg.me
Tue Nov 21 01:26:50 PST 2023



On 11/20/23 17:29, Alon Zahavi wrote:
> On Mon, 20 Nov 2023 at 12:50, Sagi Grimberg <sagi at grimberg.me> wrote:
>>
>>
>>
>> On 11/15/23 11:35, Alon Zahavi wrote:
>>> Just sending another reminder for this issue.
>>> Until this is fixed, there is a remote DoS that can be triggered.
>>>
>>> On Mon, 6 Nov 2023 at 15:40, Alon Zahavi <zahavi.alon at gmail.com> wrote:
>>>>
>>>> # Bug Overview
>>>>
>>>> ## The Bug
>>>> A null-ptr-deref in `nvmet_tcp_build_pdu_iovec`.
>>>>
>>>> ## Bug Location
>>>> `drivers/nvme/target/tcp.c`, in the function `nvmet_tcp_build_pdu_iovec`.
>>>>
>>>> ## Bug Class
>>>> Remote Denial of Service
>>>>
>>>> ## Disclaimer
>>>> This bug was found using Syzkaller with added NVMe-oF/TCP support.
>>>>
>>
>> Hey Alon, thanks for the report.
>>
>>>> # Technical Details
>>>>
>>>> ## Kernel Report - NULL Pointer Dereference
>>>> ```
>>>> [  157.833470] BUG: kernel NULL pointer dereference, address:
>>>> 000000000000000c
>>>> [  157.833478] #PF: supervisor read access in kernel mode
>>>> [  157.833484] #PF: error_code(0x0000) - not-present page
>>>> [  157.833490] PGD 126e40067 P4D 126e40067 PUD 130d16067 PMD 0
>>>> [  157.833506] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>>> [  157.833515] CPU: 3 PID: 3067 Comm: kworker/3:3H Kdump: loaded Not
>>>> tainted 6.5.0-rc1+ #5
>>>> [  157.833525] Hardware name: VMware, Inc. VMware Virtual
>>>> Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
>>>> [  157.833532] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work
>>>> [  157.833546] RIP: 0010:nvmet_tcp_build_pdu_iovec+0x7a/0x120
>>>> [  157.833558] Code: fe 44 89 a3 20 02 00 00 49 c1 e4 05 4c 03 63 30
>>>> 4c 89 75 d0 41 89 c6 e8 34 b8 18 ff 45 85 ff 0f 84 99 00 00 00 e8 06
>>>> bd 18 ff <41> 8b 74 24 0c 41 8b 44 24 08 4c 89 e7 49 8b 0c 24 89 f2 41
>>>> 89 75
>>>> [  157.833568] RSP: 0018:ffffc9001ab83c28 EFLAGS: 00010293
>>>> [  157.833576] RAX: 0000000000000000 RBX: ffff88812b9583e0 RCX: 0000000000000000
>>>> [  157.833584] RDX: ffff888131b10000 RSI: ffffffff82191dda RDI: ffffffff82191dcc
>>>> [  157.833591] RBP: ffffc9001ab83c58 R08: 0000000000000005 R09: 0000000000000000
>>>> [  157.833598] R10: 0000000000000007 R11: 0000000000000000 R12: 0000000000000000
>>>> [  157.833605] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000007
>>>> [  157.833612] FS:  0000000000000000(0000) GS:ffff888233f80000(0000)
>>>> knlGS:0000000000000000
>>>> [  157.833630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  157.833638] CR2: 000000000000000c CR3: 0000000122dd4002 CR4: 00000000007706e0
>>>> [  157.833659] PKRU: 55555554
>>>> [  157.833686] Call Trace:
>>>> [  157.833691]  <TASK>
>>>> [  157.833712]  ? show_regs+0x6e/0x80
>>>> [  157.833745]  ? __die+0x29/0x70
>>>> [  157.833757]  ? page_fault_oops+0x278/0x740
>>>> [  157.833784]  ? up+0x3b/0x70
>>>> [  157.833835]  ? do_user_addr_fault+0x63b/0x1040
>>>> [  157.833846]  ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
>>>> [  157.833862]  ? irq_work_queue+0x95/0xc0
>>>> [  157.833874]  ? exc_page_fault+0xcf/0x390
>>>> [  157.833889]  ? asm_exc_page_fault+0x2b/0x30
>>>> [  157.833925]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
>>>> [  157.833958]  ? nvmet_tcp_build_pdu_iovec+0x6c/0x120
>>>> [  157.833971]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
>>>> [  157.833998]  ? nvmet_tcp_build_pdu_iovec+0x7a/0x120
>>>> [  157.834011]  nvmet_tcp_try_recv_pdu+0x995/0x1310
>>>> [  157.834066]  nvmet_tcp_io_work+0xe6/0xd90
>>>> [  157.834081]  process_one_work+0x3da/0x870
>>>> [  157.834112]  worker_thread+0x67/0x640
>>>> [  157.834124]  kthread+0x164/0x1b0
>>>> [  157.834138]  ? __pfx_worker_thread+0x10/0x10
>>>> [  157.834148]  ? __pfx_kthread+0x10/0x10
>>>> [  157.834162]  ret_from_fork+0x29/0x50
>>>> [  157.834180]  </TASK>
>>>> ```
>>>>
>>>> ## Description
>>>>
>>>> ### Tracing The Bug
>>>> As written above, the bug occurs during the execution of
>>>> `nvmet_tcp_build_pdu_iovec`. Looking at the kernel report, we can see
>>>> the exact line of code that triggers the bug.
>>>>
>>>> Code Block 1:
>>>> ```
>>>> static void nvmet_tcp_build_pdu_iovec(struct nvmet_tcp_cmd *cmd)
>>>> {
>>>>     ...
>>>>     sg = &cmd->req.sg[cmd->sg_idx]; // #1
>>>>
>>>>     while (length) {
>>>>       u32 iov_len = min_t(u32, length, sg->length - sg_offset); // #2
>>>>     ...
>>>>     }
>>>> ...
>>>> }
>>>> ```
>>>> Breakdown:
>>>>
>>>> 1. The variable `sg` is assigned the value of `&cmd->req.sg[cmd->sg_idx]`.
>>>> At the assembly level (Intel flavor):
>>>> ```
>>>> mov    DWORD PTR [rbx+0x220], r12d  ; rbx holds `cmd`, r12d holds `sg_idx`
>>>> shl    r12, 5                       ; scale sg_idx by sizeof(struct scatterlist)
>>>> add    r12, QWORD PTR [rbx+0x30]    ; add the base pointer `cmd->req.sg`
>>>> ```
>>>>
>>>> However, `cmd->req.sg` is NULL at this point of execution, thus `sg`
>>>> will point to `0 + cmd->sg_idx * sizeof(struct scatterlist)`; in this
>>>> report that is 0x0 itself (R12 is 0 in the oops above), a
>>>> non-accessible memory address.
>>>>
>>>> 2. After moving the address into `sg`, the driver dereferences it
>>>> later, inside the while loop, when reading `sg->length`.
>>>> ```
>>>> mov    esi, DWORD PTR [r12+0xc]
>>>> ```
>>>> By this point `r12` is (most likely) 0x0, so the CPU tries to read
>>>> memory address 0xC and triggers a NULL pointer dereference, matching
>>>> CR2: 000000000000000c in the oops above (a small sketch of this
>>>> arithmetic follows below).
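>>>>
>>>> To make the arithmetic concrete, here is a minimal user-space sketch.
>>>> The struct below is only a simplified stand-in for the field layout of
>>>> the kernel's `struct scatterlist` on x86-64 (it is not the kernel
>>>> definition); with `req.sg == NULL` and `sg_idx == 0`, the `sg->length`
>>>> read lands exactly at 0xC.
>>>> ```
>>>> #include <stdio.h>
>>>> #include <stddef.h>
>>>>
>>>> /* Simplified stand-in for struct scatterlist (x86-64, no CONFIG_DEBUG_SG,
>>>>  * with CONFIG_NEED_SG_DMA_LENGTH); only the offsets matter here. */
>>>> struct fake_scatterlist {
>>>> 	unsigned long page_link;	/* offset 0x0 */
>>>> 	unsigned int offset;		/* offset 0x8 */
>>>> 	unsigned int length;		/* offset 0xc */
>>>> 	unsigned long long dma_address;	/* offset 0x10 */
>>>> 	unsigned int dma_length;	/* offset 0x18 */
>>>> };
>>>>
>>>> int main(void)
>>>> {
>>>> 	unsigned int sg_idx = 0;	/* cmd->sg_idx; R12 is 0 in the oops */
>>>>
>>>> 	/* With cmd->req.sg == NULL, sg = &cmd->req.sg[cmd->sg_idx] is just
>>>> 	 * sg_idx * sizeof(struct scatterlist), and the sg->length read adds
>>>> 	 * offsetof(length) on top of that. */
>>>> 	size_t sg_addr = (size_t)sg_idx * sizeof(struct fake_scatterlist);
>>>> 	size_t fault_addr = sg_addr + offsetof(struct fake_scatterlist, length);
>>>>
>>>> 	printf("sizeof(sg entry) = %zu\n", sizeof(struct fake_scatterlist));
>>>> 	printf("sg               = 0x%zx\n", sg_addr);
>>>> 	printf("faulting read    = 0x%zx\n", fault_addr);	/* 0xc, as in CR2 */
>>>> 	return 0;
>>>> }
>>>> ```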
>>>>
>>>>
>>>> ## Root Cause
>>>> `req` is initialized during `nvmet_req_init`. However, the sequence
>>>> that leads into `nvmet_tcp_build_pdu_iovec` does not contain any call
>>>> to `nvmet_req_init`, thus crashing the kernel with a NULL pointer
>>>> dereference. This flow of execution can also create a situation where
>>>> an uninitialized memory address is dereferenced, which has undefined
>>>> behaviour.
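>>>>
>>>> For illustration only, a guard like the one below at the top of
>>>> `nvmet_tcp_build_pdu_iovec` would avoid the unchecked dereference.
>>>> This is just a sketch of where the check is missing, not a proposed
>>>> fix; the proper handling belongs earlier, in the PDU processing path.
>>>> ```
>>>> 	/* Sketch only: the command was never mapped, so there is no sg
>>>> 	 * list to build an iovec from; bail out instead of indexing a
>>>> 	 * NULL cmd->req.sg. */
>>>> 	if (WARN_ON_ONCE(!cmd->req.sg))
>>>> 		return;
>>>> ```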
>>
>> If req->sg was not allocated, we shouldn't build a corresponding iovec.
>> There is a failure case where nvmet_req_init is not called; instead,
>> nvmet_tcp_handle_req_failure is called and should properly initialize
>> req->sg and the corresponding iovec.
>>
>> The intention is to drain the error request from the socket, or at
>> least attempt to do so, so that the connection recovers.
>>
>> I'd be interested to know whether this path (nvmet_tcp_handle_req_failure)
>> is taken, or whether something else is going on....
>>
> 
> Looking at some other kernel reports for the same issue (an example
> below), I can see that the call to `nvmet_tcp_build_pdu_iovec` is
> executed during `nvmet_tcp_handle_h2c_data_pdu`.
> The call happens too early: `nvmet_tcp_handle_h2c_data_pdu` does not
> perform the required checks before calling
> `nvmet_tcp_build_pdu_iovec`.

Yes, as Chaitanya pointed out, we are missing proper error handling when
the host sends a malformed H2CData PDU.
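
For illustration, that error handling could take roughly the following
shape (an untested sketch against the tcp.c code quoted in this thread,
not an actual patch; it reuses the driver's existing
nvmet_tcp_fatal_error() teardown, and the specific checks are only
examples): validate the ttag the host sent, and make sure the command
actually expects H2CData, before nvmet_tcp_build_pdu_iovec() is called.

```
static int nvmet_tcp_handle_h2c_data_pdu(struct nvmet_tcp_queue *queue)
{
	struct nvme_tcp_data_pdu *data = &queue->pdu.data;
	struct nvmet_tcp_cmd *cmd;

	if (likely(queue->nr_cmds)) {
		/* Reject a ttag the host made up; it indexes queue->cmds. */
		if (unlikely(data->ttag >= queue->nr_cmds)) {
			pr_err("queue %d: received out of bound ttag %u, nr_cmds %u\n",
			       queue->idx, data->ttag, queue->nr_cmds);
			goto err_fatal;
		}
		cmd = &queue->cmds[data->ttag];
	} else {
		cmd = &queue->connect;
	}

	/* Reject H2CData for a command that was never set up to receive
	 * data (no sg list mapped) or with an offset we did not ask for. */
	if (unlikely(!cmd->req.sg ||
		     le32_to_cpu(data->data_offset) != cmd->rbytes_done)) {
		pr_err("queue %d: ttag %u unexpected H2CData PDU\n",
		       queue->idx, data->ttag);
		goto err_fatal;
	}

	cmd->pdu_len = le32_to_cpu(data->data_length);
	cmd->pdu_recv = 0;
	nvmet_tcp_build_pdu_iovec(cmd);
	queue->cmd = cmd;
	queue->rcv_state = NVMET_TCP_RECV_DATA;

	return 0;

err_fatal:
	/* Malformed PDU from the host: tear the connection down. */
	nvmet_tcp_fatal_error(queue);
	return -EPROTO;
}
```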

> 
> ```
> Call Trace:
>   <TASK>
>   nvmet_tcp_handle_h2c_data_pdu drivers/nvme/target/tcp.c:988 [inline]
>   nvmet_tcp_done_recv_pdu drivers/nvme/target/tcp.c:1020 [inline]
>   nvmet_tcp_try_recv_pdu+0xddf/0x1cb0 drivers/nvme/target/tcp.c:1182
>   nvmet_tcp_try_recv_one drivers/nvme/target/tcp.c:1306 [inline]
>   nvmet_tcp_try_recv drivers/nvme/target/tcp.c:1338 [inline]
>   nvmet_tcp_io_work+0x109/0x1510 drivers/nvme/target/tcp.c:1388
>   process_one_work+0x725/0xdc0 kernel/workqueue.c:2597
>   worker_thread+0x3e2/0x9f0 kernel/workqueue.c:2748
>   kthread+0x201/0x250 kernel/kthread.c:389
>   ret_from_fork+0x29/0x50 arch/x86/entry/entry_64.S:308
>   </TASK>
> ```
> BTW, this is the same root cause as in another bug I reported:
> NVMe-oF/TCP - NULL Pointer Dereference in `__nvmet_req_complete`
> (https://lore.kernel.org/linux-nvme/CAK5usQuk-R1auf3_WSgKK6UAO5qeo9zjeWxEB-Wxs2JvFhgLVA@mail.gmail.com/T/#t)

So I take it that this issue covers all the outstanding reports? I've
lost track a bit while catching up on a long backlog of threads.


