BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall

Will Deacon will at kernel.org
Mon Aug 15 07:22:13 PDT 2022


Hi,

On Fri, Aug 12, 2022 at 10:01:06PM +0200, Max Schulze wrote:
> I run a userspace program, which does image analysis. This is compiled
> from freepascal. The program freezes, I get below kernel oops. My program
> is calling SysUtils.ExecuteProcess('/sbin/shutdown') when finished, I have
> traced with strace and it hangs at the *clone syscall*.
> 
> I have 4 different devices where this happens. Tonight I built the latest
> kernel with debug infos (rpi-5.19.y commit c3a3eb5a3).
> 
> $ cat /proc/cmdline
> 
> coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0
> video=HDMI-A-1:1920x1080M@60 smsc95xx.macaddr=<>
> vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1
> root=PARTUUID=<> rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr
> mitigations=off
> 
> 
> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
> [20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G         C        5.19.0-v8+ #1
> [20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
> [20:48:46] Call trace:
> [20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
> [20:48:46]  show_stack+0x24/0x80
> [20:48:46]  dump_stack_lvl+0x8c/0xb8
> [20:48:46]  dump_stack+0x1c/0x38
> [20:48:46]  print_bad_pte+0x2ec/0x350
> [20:48:46]  vm_normal_page+0x16c/0x190
> [20:48:46]  copy_page_range+0x45c/0x13c0
> [20:48:46]  dup_mm+0x5bc/0x7f4
> [20:48:46]  copy_process+0x1354/0x2370
> [20:48:46]  kernel_clone+0xf0/0x590
> [20:48:46]  __do_sys_clone+0xa4/0xe0
> [20:48:46]  __arm64_sys_clone+0x74/0x90
> [20:48:46]  invoke_syscall+0x68/0x1a0
> [20:48:46]  el0_svc_common.constprop.0+0x88/0x170
> [20:48:46]  do_el0_svc+0xcc/0xf0
> [20:48:46]  el0_svc+0x30/0x70
> [20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
> [20:48:46]  el0t_64_sync+0x18c/0x190
> [20:48:46] Disabling lock debugging due to kernel taint
> [20:48:46] get_swap_device: Bad swap file entry 801111112111111
> [20:48:46] BUG: Bad page map in process projecta  pte:1211111111111111 pmd:800000001c40003
> [20:48:46] addr:0000007fa1c02000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c02

I hate to say it, but this all looks like memory corruption hitting the
page table and possibly the 'struct page' array to me :/
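
To illustrate why the reported pte values look like overwritten data, here is a rough Python sketch that splits each value from the log above into bytes and reports the most common one. The helper names are made up for this illustration, and the "repeating byte" check is only a heuristic, not anything the kernel itself does.

```python
def pte_bytes(value_hex: str) -> list:
    """Split a 64-bit value printed as hex into 8 bytes, most significant first."""
    return list(int(value_hex, 16).to_bytes(8, "big"))


def dominant_byte(value_hex: str) -> tuple:
    """Return (most common byte, how many of the 8 bytes equal it)."""
    data = pte_bytes(value_hex)
    most = max(set(data), key=data.count)
    return most, data.count(most)


# The two pte values printed by print_bad_pte() in the report.
for pte in ("1110111111111111", "1211111111111111"):
    byte, count = dominant_byte(pte)
    print(f"pte:{pte} -> byte 0x{byte:02x} appears in {count} of 8 positions")
```

Both values are almost entirely the byte 0x11, which reads more like a fill pattern or stray data than a page table entry.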

Are you able to reproduce this problem using an upstream kernel, rather
than one built from the raspberry pi tree?

Another thing to consider (and apologies if this sounds silly) is the power
supply. Are all four of your devices using the same power supply? Are you
able to try something beefier?

Finally, you could enable CONFIG_MEMTEST and boot with memtest=17 on the
command line, just in case it's a rogue DMA or something.
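
As a quick sanity check after rebuilding, something like the Python sketch below can confirm that the running kernel was built with CONFIG_MEMTEST and booted with a memtest= parameter. The function names are hypothetical, and it assumes /proc/config.gz is available, which is only the case when the kernel was built with CONFIG_IKCONFIG_PROC.

```python
import gzip
from pathlib import Path


def memtest_passes(cmdline: str):
    """Return the integer given to memtest= on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("memtest="):
            try:
                return int(token.split("=", 1)[1])
            except ValueError:
                return None
    return None


def config_has_memtest(config_text: str) -> bool:
    """True if the kernel config text contains CONFIG_MEMTEST=y."""
    return any(line.strip() == "CONFIG_MEMTEST=y"
               for line in config_text.splitlines())


def main() -> None:
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        passes = memtest_passes(cmdline_path.read_text())
        print("memtest passes requested:", passes)
    # Assumption: /proc/config.gz only exists with CONFIG_IKCONFIG_PROC.
    config_path = Path("/proc/config.gz")
    if config_path.exists():
        with gzip.open(config_path, "rt") as handle:
            print("CONFIG_MEMTEST=y:", config_has_memtest(handle.read()))


if __name__ == "__main__":
    main()
```

If memtest finds bad ranges at boot, it reports them in the kernel log, so checking dmesg afterwards is the next step.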

Will



More information about the linux-arm-kernel mailing list