BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall

Max Schulze max.schulze at online.de
Mon Aug 15 07:58:04 PDT 2022


Hi Will,


On 15.08.22 16:22, Will Deacon wrote:
> Hi,
> 
> On Fri, Aug 12, 2022 at 10:01:06PM +0200, Max Schulze wrote:
>> I run a userspace program, which does image analysis. This is compiled
>> from freepascal. The program freezes, I get below kernel oops. My program
>> is calling SysUtils.ExecuteProcess('/sbin/shutdown') when finished, I have
>> traced with strace and it hangs at the *clone syscall*.
>>
>> I have 4 different devices where this happens. Tonight I built the latest
>> kernel with debug infos (rpi-5.19.y commit c3a3eb5a3).
>>
>> $ cat /proc/cmdline
>>
>> coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0
>> video=HDMI-A-1:1920x1080M at 60 smsc95xx.macaddr=<>
>> vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1
>> root=PARTUUID=<> rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr
>> mitigations=off
>>
>>
>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [20:48:46] BUG: Bad page map in process projecta  pte:1110111111111111 pmd:800000001c40003
>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>> [20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G         C        5.19.0-v8+ #1
>> [20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>> [20:48:46] Call trace:
>> [20:48:46]  dump_backtrace.part.0+0x1dc/0x1ec
>> [20:48:46]  show_stack+0x24/0x80
>> [20:48:46]  dump_stack_lvl+0x8c/0xb8
>> [20:48:46]  dump_stack+0x1c/0x38
>> [20:48:46]  print_bad_pte+0x2ec/0x350
>> [20:48:46]  vm_normal_page+0x16c/0x190
>> [20:48:46]  copy_page_range+0x45c/0x13c0
>> [20:48:46]  dup_mm+0x5bc/0x7f4
>> [20:48:46]  copy_process+0x1354/0x2370
>> [20:48:46]  kernel_clone+0xf0/0x590
>> [20:48:46]  __do_sys_clone+0xa4/0xe0
>> [20:48:46]  __arm64_sys_clone+0x74/0x90
>> [20:48:46]  invoke_syscall+0x68/0x1a0
>> [20:48:46]  el0_svc_common.constprop.0+0x88/0x170
>> [20:48:46]  do_el0_svc+0xcc/0xf0
>> [20:48:46]  el0_svc+0x30/0x70
>> [20:48:46]  el0t_64_sync_handler+0x1ac/0x1b0
>> [20:48:46]  el0t_64_sync+0x18c/0x190
>> [20:48:46] Disabling lock debugging due to kernel taint
>> [20:48:46] get_swap_device: Bad swap file entry 801111112111111
>> [20:48:46] BUG: Bad page map in process projecta  pte:1211111111111111 pmd:800000001c40003
>> [20:48:46] addr:0000007fa1c02000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c02
> 

> I hate to say it, but this all looks like memory corruption hitting the
> page table and possibly the 'struct page' array to me :/

Perhaps a note on the occcurence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? Because when not, I think it would be highly unlikely to be the hardware.

It is not only my program that has the problem, I have seen

[Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3  pte:262d2626292a2627 pmd:800000001c01003

and
[Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1  pte:a098a09aa29ea8a4 pmd:800000001c01003
[Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
[Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0


Now I have a kernel with KASAN and KFENCE, I get this additional Output, is that of any help?

 14:26:12 kernel: BUG: Bad page map in process projecta  pte:8e8d8a938a918d88 pmd:800000001c01003
 14:26:12 kernel: addr:0000007fa5601000 vm_flags:00100073 anon_vma:ffffff805e9898f8 mapping:0000000000000000 index:7fa5601
 14:26:12 kernel: file:(null) fault:0x0 mmap:0x0 read_folio:0x0
 14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G         C        5.19.0-v8-0815+ #5
 14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
 14:26:12 kernel: Call trace:
 14:26:12 kernel:  dump_backtrace.part.0+0x1dc/0x1ec
 14:26:12 kernel:  show_stack+0x24/0x80
 14:26:12 kernel:  dump_stack_lvl+0x8c/0xb8
 14:26:12 kernel:  dump_stack+0x1c/0x38
 14:26:12 kernel:  print_bad_pte+0x2ec/0x350
 14:26:12 kernel:  vm_normal_page+0x16c/0x190
 14:26:12 kernel:  copy_page_range+0x45c/0x11f0
 14:26:12 kernel:  dup_mm+0x5bc/0x7f4
 14:26:12 kernel:  copy_process+0x1354/0x2370
 14:26:12 kernel:  kernel_clone+0xf0/0x590
 14:26:12 kernel:  __do_sys_clone+0xa4/0xe0
 14:26:12 kernel:  __arm64_sys_clone+0x74/0x90
 14:26:12 kernel:  invoke_syscall+0x68/0x1a0
 14:26:12 kernel:  el0_svc_common.constprop.0+0x88/0x170
 14:26:12 kernel:  do_el0_svc+0xcc/0xf0
 14:26:12 kernel:  el0_svc+0x30/0x70
 14:26:12 kernel:  el0t_64_sync_handler+0x1ac/0x1b0
 14:26:12 kernel:  el0t_64_sync+0x18c/0x190
 14:26:12 kernel: Disabling lock debugging due to kernel taint
 14:26:12 kernel: ==================================================================
 14:26:12 kernel: BUG: KASAN: wild-memory-access in __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel: Read of size 8 at addr 000002284e2a4600 by task projecta/1053
 14:26:12 kernel: 
 14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G    B    C        5.19.0-v8-0815+ #5
 14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
 14:26:12 kernel: Call trace:
 14:26:12 kernel:  dump_backtrace.part.0+0x1dc/0x1ec
 14:26:12 kernel:  show_stack+0x24/0x80
 14:26:12 kernel:  dump_stack_lvl+0x8c/0xb8
 14:26:12 kernel:  print_report+0xcc/0x580
 14:26:12 kernel:  kasan_report+0xa8/0x170
 14:26:12 kernel:  __asan_load8+0x94/0xd0
 14:26:12 kernel:  __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel:  set_pte_at+0x20c/0x280
 14:26:12 kernel:  copy_page_range+0x804/0x11f0
 14:26:12 kernel:  dup_mm+0x5bc/0x7f4
 14:26:12 kernel:  copy_process+0x1354/0x2370
 14:26:12 kernel:  kernel_clone+0xf0/0x590
 14:26:12 kernel:  __do_sys_clone+0xa4/0xe0
 14:26:12 kernel:  __arm64_sys_clone+0x74/0x90
 14:26:12 kernel:  invoke_syscall+0x68/0x1a0
 14:26:12 kernel:  el0_svc_common.constprop.0+0x88/0x170
 14:26:12 kernel:  do_el0_svc+0xcc/0xf0
 14:26:12 kernel:  el0_svc+0x30/0x70
 14:26:12 kernel:  el0t_64_sync_handler+0x1ac/0x1b0
 14:26:12 kernel:  el0t_64_sync+0x18c/0x190
 14:26:12 kernel: ==================================================================
 14:26:12 kernel: Unable to handle kernel paging request at virtual address 000002284e2a4600
 14:26:12 kernel: Mem abort info:
 14:26:12 kernel:   ESR = 0x0000000096000004
 14:26:12 kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
 14:26:12 kernel:   SET = 0, FnV = 0
 14:26:12 kernel:   EA = 0, S1PTW = 0
 14:26:12 kernel:   FSC = 0x04: level 0 translation fault
 14:26:12 kernel: Data abort info:
 14:26:12 kernel:   ISV = 0, ISS = 0x00000004
 14:26:12 kernel:   CM = 0, WnR = 0
 14:26:12 kernel: [000002284e2a4600] address between user and kernel address ranges
 14:26:12 kernel: Internal error: Oops: 96000004 [#1] PREEMPT SMP
 14:26:12 kernel: Modules linked in: ov9281 rtc_pcf85063 regmap_i2c rfkill v3d bcm2835_unicam rpivid_hevc(C) v4l2_dv_timings bcm2835_v4l2(C) v4l2_fwnode bcm2835_codec(C) v4l2_async videobuf2_vmalloc v4l2_mem2mem gpu_sched bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig drm_shmem_helper videobuf2_memops videobuf2_v4l2 i2c_mux_pinctrl videobuf2_common i2c_mux raspberrypi_hwmon videodev i2c_brcmstb joydev i2c_bcm2835 hid_microsoft ff_memless vc_sm_cma(C) mc nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ipv6
 14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G    B    C        5.19.0-v8-0815+ #5
 14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
 14:26:12 kernel: pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 14:26:12 kernel: pc : __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel: lr : __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel: sp : ffffffc00cbe7590
 14:26:12 kernel: x29: ffffffc00cbe7590 x28: fffffffe00070068 x27: fffffffe00be37e8
 14:26:12 kernel: x26: 0000007fa5601000 x25: 8e858a938a918d88 x24: 0000000000000000
 14:26:12 kernel: x23: fffffffe00000000 x22: 000002284e2a4600 x21: 1ffffff80197ceba
 14:26:12 kernel: x20: 0000000000000000 x19: 0000022a4e2a4600 x18: 0000000000000000
 14:26:12 kernel: x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
 14:26:12 kernel: x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cfe11
 14:26:12 kernel: x11: 1ffffff8014cfe10 x10: ffffffb8014cfe10 x9 : dfffffc000000000
 14:26:12 kernel: x8 : ffffffc00a67f087 x7 : 0000000000000001 x6 : 00000047feb301f0
 14:26:12 kernel: x5 : ffffffc00a67f080 x4 : ffffffb8014cfe11 x3 : ffffffc0080b88a4
 14:26:12 kernel: x2 : 0000000000000000 x1 : ffffff804a0e0040 x0 : 0000000000000001
 14:26:12 kernel: Call trace:
 14:26:12 kernel:  __sync_icache_dcache+0xc0/0x190
 14:26:12 kernel:  set_pte_at+0x20c/0x280
 14:26:12 kernel:  copy_page_range+0x804/0x11f0
 14:26:12 kernel:  dup_mm+0x5bc/0x7f4
 14:26:12 kernel:  copy_process+0x1354/0x2370
 14:26:12 kernel:  kernel_clone+0xf0/0x590
 14:26:12 kernel:  __do_sys_clone+0xa4/0xe0
 14:26:12 kernel:  __arm64_sys_clone+0x74/0x90
 14:26:12 kernel:  invoke_syscall+0x68/0x1a0
 14:26:12 kernel:  el0_svc_common.constprop.0+0x88/0x170
 14:26:12 kernel:  do_el0_svc+0xcc/0xf0
 14:26:12 kernel:  el0_svc+0x30/0x70
 14:26:12 kernel:  el0t_64_sync_handler+0x1ac/0x1b0
 14:26:12 kernel:  el0t_64_sync+0x18c/0x190
 14:26:12 kernel: Code: d37ae673 8b170276 aa1603e0 940fa1f6 (f8776a60) 
 14:26:12 kernel: ---[ end trace 0000000000000000 ]---
 14:26:12 kernel: note: projecta[1053] exited with preempt_count 2





> 
> Are you able to reproduce this problem using an upstream kernel, rather
> than one built from the raspberry pi tree?

I will try to do this, might take a while and report back.

NB: When I began building the kernel with more debug output, this is when I saw the "kmemleak: cannot insert..." I reported separately. They might be related.

> 
> Another thing to consider (and apologies if this sounds silly) is the power
> supply. Are all four of your devices using the same power supply? Are you
> able to try something beefier?

No, they are all using different power supplies. The one on my desk is 12V/5A, should be plenty when recommended minimum is 1A.

> 
> Finally, you could enable CONFIG_MEMTEST and boot with memtest=17 on the
> command line, just in case it's a rogue DMA or something.

I did this just now. Nothing to report:

Mon Aug 15 16:34:24 2022] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[Mon Aug 15 16:34:24 2022] Linux version 5.19.0-v8-0815+ (root at raspberrypi) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #5 SMP PREEMPT Sun Aug 14 23:20:47 CEST 2022
[Mon Aug 15 16:34:24 2022] random: crng init done
[Mon Aug 15 16:34:24 2022] Machine model: Raspberry Pi Compute Module 4 Rev 1.0
[Mon Aug 15 16:34:24 2022] efi: UEFI not found.
[Mon Aug 15 16:34:24 2022] Reserved memory: created CMA memory pool at 0x000000001ac00000, size 320 MiB
[Mon Aug 15 16:34:24 2022] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[Mon Aug 15 16:34:24 2022] early_memtest: # of tests: 17
[Mon Aug 15 16:34:24 2022]   0x0000000000001000 - 0x0000000000210000 pattern 4c494e5558726c7a

[...] long lines of patterns. Full output at https://pastebin.ubuntu.com/p/KgStKrxDfr/

[Mon Aug 15 16:34:24 2022]   0x000000007fbfa87f - 0x000000007fbfa880 pattern 0000000000000000
[Mon Aug 15 16:34:24 2022] Zone ranges:
[Mon Aug 15 16:34:24 2022]   DMA      [mem 0x0000000000000000-0x000000003fffffff]
[Mon Aug 15 16:34:24 2022]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022]   Normal   empty
[Mon Aug 15 16:34:24 2022] Movable zone start for each node
[Mon Aug 15 16:34:24 2022] Early memory node ranges
[Mon Aug 15 16:34:24 2022]   node   0: [mem 0x0000000000000000-0x0000000037ffffff]
[Mon Aug 15 16:34:24 2022]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] Initmem setup node 0 [mem 0x0000000000000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] kasan: KernelAddressSanitizer initialized (generic)
[Mon Aug 15 16:34:24 2022] percpu: Embedded 29 pages/cpu s78888 r8192 d31704 u118784
[Mon Aug 15 16:34:24 2022] pcpu-alloc: s78888 r8192 d31704 u118784 alloc=29*4096
[Mon Aug 15 16:34:24 2022] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[Mon Aug 15 16:34:24 2022] Detected PIPT I-cache on CPU0
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v2
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v3a
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v4
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-BHB
[Mon Aug 15 16:34:24 2022] CPU features: kernel page table isolation forced OFF by kpti command line option
[Mon Aug 15 16:34:24 2022] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[Mon Aug 15 16:34:24 2022] Built 1 zonelists, mobility grouping on.  Total pages: 483840
[Mon Aug 15 16:34:24 2022] Kernel command line: coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0 video=HDMI-A-1:1920x1080M at 60 smsc95xx.macaddr=DC:___ vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000  console=tty1 root=PARTUUID=c-02 rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr mitigations=off memtest=17
[Mon Aug 15 16:34:24 2022] Unknown kernel command line parameters "nokaslr", will be passed to user space.
[Mon Aug 15 16:34:24 2022] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[Mon Aug 15 16:34:24 2022] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[Mon Aug 15 16:34:24 2022] mem auto-init: stack:off, heap alloc:off, heap free:off

Thanks,

Max



More information about the linux-arm-kernel mailing list