BUG: Bad page map in process/Bad Swap file entry, RPI CM4 on clone syscall
Max Schulze
max.schulze at online.de
Mon Aug 15 07:58:04 PDT 2022
Hi Will,
On 15.08.22 16:22, Will Deacon wrote:
> Hi,
>
> On Fri, Aug 12, 2022 at 10:01:06PM +0200, Max Schulze wrote:
>> I run a userspace program, which does image analysis. This is compiled
>> from freepascal. The program freezes, I get below kernel oops. My program
>> is calling SysUtils.ExecuteProcess('/sbin/shutdown') when finished, I have
>> traced with strace and it hangs at the *clone syscall*.
>>
>> I have 4 different devices where this happens. Tonight I built the latest
>> kernel with debug infos (rpi-5.19.y commit c3a3eb5a3).
>>
>> $ cat /proc/cmdline
>>
>> coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0
>> video=HDMI-A-1:1920x1080M at 60 smsc95xx.macaddr=<>
>> vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 console=tty1
>> root=PARTUUID=<> rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr
>> mitigations=off
>>
>>
>> [20:47:09] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [20:48:46] BUG: Bad page map in process projecta pte:1110111111111111 pmd:800000001c40003
>> [20:48:46] addr:0000007fa1c00000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c00
>> [20:48:46] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
>> [20:48:46] CPU: 0 PID: 1069 Comm: projecta Tainted: G C 5.19.0-v8+ #1
>> [20:48:46] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>> [20:48:46] Call trace:
>> [20:48:46] dump_backtrace.part.0+0x1dc/0x1ec
>> [20:48:46] show_stack+0x24/0x80
>> [20:48:46] dump_stack_lvl+0x8c/0xb8
>> [20:48:46] dump_stack+0x1c/0x38
>> [20:48:46] print_bad_pte+0x2ec/0x350
>> [20:48:46] vm_normal_page+0x16c/0x190
>> [20:48:46] copy_page_range+0x45c/0x13c0
>> [20:48:46] dup_mm+0x5bc/0x7f4
>> [20:48:46] copy_process+0x1354/0x2370
>> [20:48:46] kernel_clone+0xf0/0x590
>> [20:48:46] __do_sys_clone+0xa4/0xe0
>> [20:48:46] __arm64_sys_clone+0x74/0x90
>> [20:48:46] invoke_syscall+0x68/0x1a0
>> [20:48:46] el0_svc_common.constprop.0+0x88/0x170
>> [20:48:46] do_el0_svc+0xcc/0xf0
>> [20:48:46] el0_svc+0x30/0x70
>> [20:48:46] el0t_64_sync_handler+0x1ac/0x1b0
>> [20:48:46] el0t_64_sync+0x18c/0x190
>> [20:48:46] Disabling lock debugging due to kernel taint
>> [20:48:46] get_swap_device: Bad swap file entry 801111112111111
>> [20:48:46] BUG: Bad page map in process projecta pte:1211111111111111 pmd:800000001c40003
>> [20:48:46] addr:0000007fa1c02000 vm_flags:00100073 anon_vma:ffffff805bf80d08 mapping:0000000000000000 index:7fa1c02
>
> I hate to say it, but this all looks like memory corruption hitting the
> page table and possibly the 'struct page' array to me :/
Perhaps a note on the occcurence: across devices, the "bad page map" differs at pte, but somehow is mostly consistent at pmd:800000001c40003 (though I have seen 800000001c02003 and 800000001c40003). Is this some "magic value"? Because when not, I think it would be highly unlikely to be the hardware.
It is not only my program that has the problem, I have seen
[Sun Aug 14 17:30:38 2022] BUG: Bad page map in process llvmpipe-3 pte:262d2626292a2627 pmd:800000001c01003
and
[Sat Aug 13 11:53:43 2022] BUG: Bad page map in process Xorg:disk$1 pte:a098a09aa29ea8a4 pmd:800000001c01003
[Sat Aug 13 11:53:43 2022] addr:00000055a961e000 vm_flags:200100073 anon_vma:ffffff804c07d8f8 mapping:0000000000000000 index:55a961e
[Sat Aug 13 11:53:43 2022] file:(null) fault:0x0 mmap:0x0 read_folio:0x0
Now I have a kernel with KASAN and KFENCE, I get this additional Output, is that of any help?
14:26:12 kernel: BUG: Bad page map in process projecta pte:8e8d8a938a918d88 pmd:800000001c01003
14:26:12 kernel: addr:0000007fa5601000 vm_flags:00100073 anon_vma:ffffff805e9898f8 mapping:0000000000000000 index:7fa5601
14:26:12 kernel: file:(null) fault:0x0 mmap:0x0 read_folio:0x0
14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G C 5.19.0-v8-0815+ #5
14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
14:26:12 kernel: Call trace:
14:26:12 kernel: dump_backtrace.part.0+0x1dc/0x1ec
14:26:12 kernel: show_stack+0x24/0x80
14:26:12 kernel: dump_stack_lvl+0x8c/0xb8
14:26:12 kernel: dump_stack+0x1c/0x38
14:26:12 kernel: print_bad_pte+0x2ec/0x350
14:26:12 kernel: vm_normal_page+0x16c/0x190
14:26:12 kernel: copy_page_range+0x45c/0x11f0
14:26:12 kernel: dup_mm+0x5bc/0x7f4
14:26:12 kernel: copy_process+0x1354/0x2370
14:26:12 kernel: kernel_clone+0xf0/0x590
14:26:12 kernel: __do_sys_clone+0xa4/0xe0
14:26:12 kernel: __arm64_sys_clone+0x74/0x90
14:26:12 kernel: invoke_syscall+0x68/0x1a0
14:26:12 kernel: el0_svc_common.constprop.0+0x88/0x170
14:26:12 kernel: do_el0_svc+0xcc/0xf0
14:26:12 kernel: el0_svc+0x30/0x70
14:26:12 kernel: el0t_64_sync_handler+0x1ac/0x1b0
14:26:12 kernel: el0t_64_sync+0x18c/0x190
14:26:12 kernel: Disabling lock debugging due to kernel taint
14:26:12 kernel: ==================================================================
14:26:12 kernel: BUG: KASAN: wild-memory-access in __sync_icache_dcache+0xc0/0x190
14:26:12 kernel: Read of size 8 at addr 000002284e2a4600 by task projecta/1053
14:26:12 kernel:
14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G B C 5.19.0-v8-0815+ #5
14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
14:26:12 kernel: Call trace:
14:26:12 kernel: dump_backtrace.part.0+0x1dc/0x1ec
14:26:12 kernel: show_stack+0x24/0x80
14:26:12 kernel: dump_stack_lvl+0x8c/0xb8
14:26:12 kernel: print_report+0xcc/0x580
14:26:12 kernel: kasan_report+0xa8/0x170
14:26:12 kernel: __asan_load8+0x94/0xd0
14:26:12 kernel: __sync_icache_dcache+0xc0/0x190
14:26:12 kernel: set_pte_at+0x20c/0x280
14:26:12 kernel: copy_page_range+0x804/0x11f0
14:26:12 kernel: dup_mm+0x5bc/0x7f4
14:26:12 kernel: copy_process+0x1354/0x2370
14:26:12 kernel: kernel_clone+0xf0/0x590
14:26:12 kernel: __do_sys_clone+0xa4/0xe0
14:26:12 kernel: __arm64_sys_clone+0x74/0x90
14:26:12 kernel: invoke_syscall+0x68/0x1a0
14:26:12 kernel: el0_svc_common.constprop.0+0x88/0x170
14:26:12 kernel: do_el0_svc+0xcc/0xf0
14:26:12 kernel: el0_svc+0x30/0x70
14:26:12 kernel: el0t_64_sync_handler+0x1ac/0x1b0
14:26:12 kernel: el0t_64_sync+0x18c/0x190
14:26:12 kernel: ==================================================================
14:26:12 kernel: Unable to handle kernel paging request at virtual address 000002284e2a4600
14:26:12 kernel: Mem abort info:
14:26:12 kernel: ESR = 0x0000000096000004
14:26:12 kernel: EC = 0x25: DABT (current EL), IL = 32 bits
14:26:12 kernel: SET = 0, FnV = 0
14:26:12 kernel: EA = 0, S1PTW = 0
14:26:12 kernel: FSC = 0x04: level 0 translation fault
14:26:12 kernel: Data abort info:
14:26:12 kernel: ISV = 0, ISS = 0x00000004
14:26:12 kernel: CM = 0, WnR = 0
14:26:12 kernel: [000002284e2a4600] address between user and kernel address ranges
14:26:12 kernel: Internal error: Oops: 96000004 [#1] PREEMPT SMP
14:26:12 kernel: Modules linked in: ov9281 rtc_pcf85063 regmap_i2c rfkill v3d bcm2835_unicam rpivid_hevc(C) v4l2_dv_timings bcm2835_v4l2(C) v4l2_fwnode bcm2835_codec(C) v4l2_async videobuf2_vmalloc v4l2_mem2mem gpu_sched bcm2835_isp(C) bcm2835_mmal_vchiq(C) videobuf2_dma_contig drm_shmem_helper videobuf2_memops videobuf2_v4l2 i2c_mux_pinctrl videobuf2_common i2c_mux raspberrypi_hwmon videodev i2c_brcmstb joydev i2c_bcm2835 hid_microsoft ff_memless vc_sm_cma(C) mc nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight ipv6
14:26:12 kernel: CPU: 3 PID: 1053 Comm: projecta Tainted: G B C 5.19.0-v8-0815+ #5
14:26:12 kernel: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
14:26:12 kernel: pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
14:26:12 kernel: pc : __sync_icache_dcache+0xc0/0x190
14:26:12 kernel: lr : __sync_icache_dcache+0xc0/0x190
14:26:12 kernel: sp : ffffffc00cbe7590
14:26:12 kernel: x29: ffffffc00cbe7590 x28: fffffffe00070068 x27: fffffffe00be37e8
14:26:12 kernel: x26: 0000007fa5601000 x25: 8e858a938a918d88 x24: 0000000000000000
14:26:12 kernel: x23: fffffffe00000000 x22: 000002284e2a4600 x21: 1ffffff80197ceba
14:26:12 kernel: x20: 0000000000000000 x19: 0000022a4e2a4600 x18: 0000000000000000
14:26:12 kernel: x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
14:26:12 kernel: x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: ffffffb8014cfe11
14:26:12 kernel: x11: 1ffffff8014cfe10 x10: ffffffb8014cfe10 x9 : dfffffc000000000
14:26:12 kernel: x8 : ffffffc00a67f087 x7 : 0000000000000001 x6 : 00000047feb301f0
14:26:12 kernel: x5 : ffffffc00a67f080 x4 : ffffffb8014cfe11 x3 : ffffffc0080b88a4
14:26:12 kernel: x2 : 0000000000000000 x1 : ffffff804a0e0040 x0 : 0000000000000001
14:26:12 kernel: Call trace:
14:26:12 kernel: __sync_icache_dcache+0xc0/0x190
14:26:12 kernel: set_pte_at+0x20c/0x280
14:26:12 kernel: copy_page_range+0x804/0x11f0
14:26:12 kernel: dup_mm+0x5bc/0x7f4
14:26:12 kernel: copy_process+0x1354/0x2370
14:26:12 kernel: kernel_clone+0xf0/0x590
14:26:12 kernel: __do_sys_clone+0xa4/0xe0
14:26:12 kernel: __arm64_sys_clone+0x74/0x90
14:26:12 kernel: invoke_syscall+0x68/0x1a0
14:26:12 kernel: el0_svc_common.constprop.0+0x88/0x170
14:26:12 kernel: do_el0_svc+0xcc/0xf0
14:26:12 kernel: el0_svc+0x30/0x70
14:26:12 kernel: el0t_64_sync_handler+0x1ac/0x1b0
14:26:12 kernel: el0t_64_sync+0x18c/0x190
14:26:12 kernel: Code: d37ae673 8b170276 aa1603e0 940fa1f6 (f8776a60)
14:26:12 kernel: ---[ end trace 0000000000000000 ]---
14:26:12 kernel: note: projecta[1053] exited with preempt_count 2
>
> Are you able to reproduce this problem using an upstream kernel, rather
> than one built from the raspberry pi tree?
I will try to do this, might take a while and report back.
NB: When I began building the kernel with more debug output, this is when I saw the "kmemleak: cannot insert..." I reported separately. They might be related.
>
> Another thing to consider (and apologies if this sounds silly) is the power
> supply. Are all four of your devices using the same power supply? Are you
> able to try something beefier?
No, they are all using different power supplies. The one on my desk is 12V/5A, should be plenty when recommended minimum is 1A.
>
> Finally, you could enable CONFIG_MEMTEST and boot with memtest=17 on the
> command line, just in case it's a rogue DMA or something.
I did this just now. Nothing to report:
Mon Aug 15 16:34:24 2022] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[Mon Aug 15 16:34:24 2022] Linux version 5.19.0-v8-0815+ (root at raspberrypi) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #5 SMP PREEMPT Sun Aug 14 23:20:47 CEST 2022
[Mon Aug 15 16:34:24 2022] random: crng init done
[Mon Aug 15 16:34:24 2022] Machine model: Raspberry Pi Compute Module 4 Rev 1.0
[Mon Aug 15 16:34:24 2022] efi: UEFI not found.
[Mon Aug 15 16:34:24 2022] Reserved memory: created CMA memory pool at 0x000000001ac00000, size 320 MiB
[Mon Aug 15 16:34:24 2022] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[Mon Aug 15 16:34:24 2022] early_memtest: # of tests: 17
[Mon Aug 15 16:34:24 2022] 0x0000000000001000 - 0x0000000000210000 pattern 4c494e5558726c7a
[...] long lines of patterns. Full output at https://pastebin.ubuntu.com/p/KgStKrxDfr/
[Mon Aug 15 16:34:24 2022] 0x000000007fbfa87f - 0x000000007fbfa880 pattern 0000000000000000
[Mon Aug 15 16:34:24 2022] Zone ranges:
[Mon Aug 15 16:34:24 2022] DMA [mem 0x0000000000000000-0x000000003fffffff]
[Mon Aug 15 16:34:24 2022] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] Normal empty
[Mon Aug 15 16:34:24 2022] Movable zone start for each node
[Mon Aug 15 16:34:24 2022] Early memory node ranges
[Mon Aug 15 16:34:24 2022] node 0: [mem 0x0000000000000000-0x0000000037ffffff]
[Mon Aug 15 16:34:24 2022] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] Initmem setup node 0 [mem 0x0000000000000000-0x000000007fffffff]
[Mon Aug 15 16:34:24 2022] kasan: KernelAddressSanitizer initialized (generic)
[Mon Aug 15 16:34:24 2022] percpu: Embedded 29 pages/cpu s78888 r8192 d31704 u118784
[Mon Aug 15 16:34:24 2022] pcpu-alloc: s78888 r8192 d31704 u118784 alloc=29*4096
[Mon Aug 15 16:34:24 2022] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[Mon Aug 15 16:34:24 2022] Detected PIPT I-cache on CPU0
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v2
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v3a
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-v4
[Mon Aug 15 16:34:24 2022] CPU features: detected: Spectre-BHB
[Mon Aug 15 16:34:24 2022] CPU features: kernel page table isolation forced OFF by kpti command line option
[Mon Aug 15 16:34:24 2022] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[Mon Aug 15 16:34:24 2022] Built 1 zonelists, mobility grouping on. Total pages: 483840
[Mon Aug 15 16:34:24 2022] Kernel command line: coherent_pool=1M snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_hdmi=0 video=HDMI-A-1:1920x1080M at 60 smsc95xx.macaddr=DC:___ vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x40000000 console=tty1 root=PARTUUID=c-02 rootfstype=ext4 fsck.repair=yes rootwait kpti=0 nokaslr mitigations=off memtest=17
[Mon Aug 15 16:34:24 2022] Unknown kernel command line parameters "nokaslr", will be passed to user space.
[Mon Aug 15 16:34:24 2022] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[Mon Aug 15 16:34:24 2022] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[Mon Aug 15 16:34:24 2022] mem auto-init: stack:off, heap alloc:off, heap free:off
Thanks,
Max
More information about the linux-arm-kernel
mailing list