[PATCH 2/2] mm: make copy_to_kernel_nofault() not fault on user addresses

Omar Sandoval osandov at osandov.com
Sun Sep 1 22:31:29 PDT 2024


From: Omar Sandoval <osandov at fb.com>

I found that on x86, copy_to_kernel_nofault() still faults on addresses
outside of the kernel address range (including NULL):

  # echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
  # echo g > /proc/sysrq-trigger
  ...
  [15]kdb> mm 0 1234
  [   94.652476] BUG: kernel NULL pointer dereference, address: 0000000000000000
  [   94.652481] #PF: supervisor write access in kernel mode
  [   94.652483] #PF: error_code(0x0002) - not-present page
  [   94.652485] PGD 0 P4D 0
  [   94.652489] Oops: Oops: 0002 [#1] PREEMPT SMP PTI
  [   94.652493] CPU: 15 UID: 0 PID: 619 Comm: bash Not tainted 6.11.0-rc6 #11
  [   94.652497] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
  [   94.652498] RIP: 0010:copy_to_kernel_nofault+0x1c/0x90
  [   94.652505] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 65 48 8b 0d 14 9d 72 4c 83 81 04 1b 00 00 01 48 83 fa 07 76 18 48 8b 06 <48> 89 07 48 83 ea 08 48 83 c7 08 48 83 c6 08 48 83 fa 07 77 e8 48
  [   94.652507] RSP: 0018:ffffa4a640b1fa48 EFLAGS: 00010002
  [   94.652509] RAX: 00000000000004d2 RBX: 0000000000000003 RCX: ffff8a8251020000
  [   94.652510] RDX: 0000000000000008 RSI: ffffa4a640b1fa68 RDI: 0000000000000000
  [   94.652512] RBP: 0000000000000000 R08: ffffa4a640b1f9f0 R09: ffffa4a640b1f9f8
  [   94.652513] R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000
  [   94.652515] R13: 00000000000004d2 R14: 0000000000000000 R15: ffff8a8251020000
  [   94.652518] FS:  00007fa9c15f6b80(0000) GS:ffff8a895f7c0000(0000) knlGS:0000000000000000
  [   94.652520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   94.652521] CR2: 0000000000000000 CR3: 000000010ff72004 CR4: 0000000000770ef0
  [   94.652522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   94.652523] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   94.652524] PKRU: 55555554
  [   94.652525] Call Trace:
  [   94.652528]  <TASK>
  [   94.652534]  ? __die+0x24/0x70
  [   94.652538]  ? page_fault_oops+0x15a/0x480
  [   94.652543]  ? kdb_msg_write.part.0+0x3e/0xb0
  [   94.652547]  ? vkdb_printf+0x1a6/0x8b0
  [   94.652550]  ? exc_page_fault+0x72/0x160
  [   94.652555]  ? asm_exc_page_fault+0x26/0x30
  [   94.652564]  ? copy_to_kernel_nofault+0x1c/0x90
  [   94.652566]  kdb_putarea_size+0x15/0x80
  [   94.652570]  kdb_putword+0x72/0xc0
  [   94.652573]  kdb_mm+0xdc/0x130
  [   94.652576]  kdb_parse+0x2b7/0x6c0
  [   94.652578]  ? kdb_getstr+0x40a/0x910
  [   94.652581]  kdb_main_loop+0x4a0/0xa40
  [   94.652584]  ? module_event+0x10/0x10
  [   94.652590]  kdb_stub+0x1ae/0x3f0
  [   94.652594]  kgdb_cpu_enter+0x2a8/0x5f0
  [   94.652599]  kgdb_handle_exception+0xbd/0x100
  [   94.652605]  __kgdb_notify+0x30/0xd0
  [   94.652610]  kgdb_notify+0x15/0x30
  [   94.652612]  notifier_call_chain+0x5b/0xd0
  [   94.652618]  notify_die+0x53/0x80
  [   94.652622]  exc_int3+0xf9/0x130
  [   94.652626]  asm_exc_int3+0x39/0x40

Note that copy_to_kernel_nofault() uses pagefault_disable(), but it
still faults. This is because with Supervisor Mode Access Prevention
(SMAP) enabled, do_user_addr_fault() Oopses on a fault for a user
address from kernel space _before_ checking faulthandler_disabled().

copy_from_kernel_nofault() avoids this by checking that the address is
in the kernel before doing the actual memory access. Do the same in
copy_to_kernel_nofault() so that we get an error as expected:

  # echo ttyS0 > /sys/module/kgdboc/parameters/kgdboc
  # echo g > /proc/sysrq-trigger
  ...
  [17]kdb> mm 0 1234
  kdb_putarea_size: Bad address 0x0
  diag: -21: Invalid address

Signed-off-by: Omar Sandoval <osandov at fb.com>
---
 mm/maccess.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/maccess.c b/mm/maccess.c
index 72e9c03ea37f..d67dee51a1cc 100644
--- a/mm/maccess.c
+++ b/mm/maccess.c
@@ -61,6 +61,9 @@ long copy_to_kernel_nofault(void *dst, const void *src, size_t size)
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
 		align = (unsigned long)dst | (unsigned long)src;
 
+	if (!copy_kernel_nofault_allowed(dst, size))
+		return -ERANGE;
+
 	pagefault_disable();
 	if (!(align & 7))
 		copy_to_kernel_nofault_loop(dst, src, size, u64, Efault);
-- 
2.46.0




More information about the linux-um mailing list