m68k 54418 fails to execute user space

Michael Schmitz schmitzmic at gmail.com
Wed Jun 26 12:36:58 PDT 2024


Jean-Michel,

On 27/06/24 01:28, Jean-Michel Hautbois wrote:
> Hi Michael,
>
> On 26/06/2024 03:56, Michael Schmitz wrote:
>> Jean-Michel,
>>
>> On 24/06/24 20:56, Jean-Michel Hautbois wrote:
>>>
>>> When I printk the do_page_fault first debug, I get for the first 
>>> call to ls:
>>> bash-5.2# ls
>>> [   14.700000] do page fault:
>>> [   14.700000] regs->sr=0x0, regs->pc=0x70069ee6, 
>>> address=0x70069ee6, 0, (ptrval)
>>
>> Page not present, read fault. Please disable obfuscation of kernel 
>> pointer addresses by printk. Maybe also disable address space 
>> randomization while debugging this.
>>
>>> This call works almost fine (I still have the assert failed: 
>>> folio->private != NULL issue).
>>>
>>> And when I call it a second time, I get:
>>> bash-5.2# ls
>>> [   19.820000] do page fault:
>>> [   19.820000] regs->sr=0x0, regs->pc=0x6011d65a, 
>>> address=0x700e2004, 2, (ptrval)
>>
>> Page not present, write fault.
>>
>> It would be helpful if you could get a dump of /proc/1/maps before 
>> the execve() syscall in your helloworld init replacement. That might 
>> confirm all these addresses are legit (assuming mappings survive 
>> across execve(), that is), and what they correspond to.
>>
>>>
>>> The address corresponds to the defined zone ELF_ET_DYN_BASE as I set 
>>> it to 0x70000000.
>>>
>>> regs->pc is not the same as the address. It might be irrelevant, but 
>>> any help in understanding the process behind this is appreciated :-).
>>>
>>> I keep digging, and I am now in the asm part, which scares me a bit!
>>
>> I don't see that you'd need to look at any asm code here.
>
> I added a small test in do_page_fault, and in case of an error, it 
> panics. The result follows:

Please take a look at the comments at the start of 
arch/m68k/mm/fault.c:do_page_fault(). The meaning of the bits in 
error_code is explained there.

error_code == 0 is just one possible case out of the four that are 
handled by do_page_fault(). It does not signify 'no error' - if there 
hadn't been a page fault, do_page_fault() would not have been called.
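
As a rough illustration of the four cases, here is a small user-space 
sketch that decodes error_code the way the comment in fault.c describes 
it (bit 0: 0 = no page found, 1 = protection fault; bit 1: 0 = read, 
1 = write). The macro and function names are made up for this example 
and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only; the bit meanings follow the
 * comment above do_page_fault() in arch/m68k/mm/fault.c. */
#define FAULT_PROTECTION 0x1UL /* bit 0: 0 = no page found, 1 = protection fault */
#define FAULT_WRITE      0x2UL /* bit 1: 0 = read access,   1 = write access */

static inline bool fault_is_protection(unsigned long error_code)
{
    return (error_code & FAULT_PROTECTION) != 0;
}

static inline bool fault_is_write(unsigned long error_code)
{
    return (error_code & FAULT_WRITE) != 0;
}

/* Map the two low bits to a description of each of the four cases. */
static inline const char *describe_fault(unsigned long error_code)
{
    static const char *const names[4] = {
        "page not present, read fault",   /* error_code == 0 */
        "protection fault, read access",  /* error_code == 1 */
        "page not present, write fault",  /* error_code == 2 */
        "protection fault, write access", /* error_code == 3 */
    };
    return names[error_code & 3UL];
}

/* Sanity check against the two values seen in the debug output quoted
 * earlier in this thread (0 and 2). */
static inline void check_examples(void)
{
    assert(!fault_is_protection(0) && !fault_is_write(0));
    assert(!fault_is_protection(2) && fault_is_write(2));
}
```

With this decoding, the value 2 in your log is a write to a page that is 
not present yet, which is one of the normal cases and not a failure.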

You just forced a panic each time a write fault and/or a protection 
fault happens. Write faults are absolutely expected to happen when 
loading a library - ld.so needs to perform relocation after loading a 
dynamic library, and that means writes to the GOT in the library's data 
segment (PIC assumed).


>  ./scripts/decode_stacktrace.sh vmlinux < /tmp/trace.log
> [    3.857000] Run /bin/bash as init process
> [    3.858000]   with arguments:
> [    3.861000]     /bin/bash
> [    3.862000]   with environment:
> [    3.863000]     HOME=/
> [    3.864000]     TERM=linux
> [    4.242000] do page fault:
> [    4.242000] regs->sr=0x2000, regs->pc=0x41366924, 
> address=0x700b3364, 2, 41fb0000
> [    4.242000] Kernel panic - not syncing: page fault error
> [    4.242000] CPU: 0 PID: 1 Comm: bash Not tainted 
> 6.10.0-rc5-g927da6cf01fe-dirty #25
> [    4.242000] Stack from 4186dda8:
> [    4.242000]         4186dda8 41423aa4 41423aa4 700b3300 00000001 
> 00000000 4136ee10 41423aa4
> [    4.242000]         41366d7a 700b3364 700b3364 00000000 0000000d 
> 4186de60 41fb0000 41d51a60
> [    4.242000]         41005696 41416a90 41416a4d 00002000 41366924 
> 700b3364 00000002 41fb0000
> [    4.242000]         0000000a 700b3364 00000000 0000000d 00000012 
> 41d51a00 4186de60 41d51a60
> [    4.242000]         41fb81c0 41d51a60 410052fe 4100529a 4186de60 
> 700b3364 00000002 00000000
> [    4.242000]         700bc414 00000003 00008000 700ac000 41003660 
> 4186de60 00000000 00000000
> [    4.242000] Call Trace: dump_stack (lib/dump_stack.c:124)
> [    4.242000] panic (kernel/panic.c:266 kernel/panic.c:368)
> [    4.242000] do_page_fault (arch/m68k/mm/fault.c:88 (discriminator 1))
> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:725 
> arch/m68k/kernel/traps.c:775)
> [    4.242000] buserr_c (arch/m68k/kernel/traps.c:748 
> arch/m68k/kernel/traps.c:775)
> [    4.242000] buserr (arch/m68k/kernel/entry.S:116)
> [    4.242000] ma_slots (lib/maple_tree.c:759)
> [    4.242000] __clear_user (arch/m68k/lib/uaccess.c:108)
> [    4.242000] elf_load (fs/binfmt_elf.c:125 (discriminator 1) 
> fs/binfmt_elf.c:421 (discriminator 1))
> [    4.242000] load_elf_binary (fs/binfmt_elf.c:1132)
> [    4.242000] memset (arch/m68k/lib/memset.c:11)
> [    4.242000] load_misc_binary (fs/binfmt_misc.c:97 
> fs/binfmt_misc.c:146 fs/binfmt_misc.c:213)
> [    4.242000] memset (arch/m68k/lib/memset.c:11)
> [    4.242000] bprm_execve (fs/exec.c:1797 fs/exec.c:1839 
> fs/exec.c:1891 fs/exec.c:1867)
> [    4.242000] copy_strings_kernel (fs/exec.c:669)
> [    4.242000] count_strings_kernel (fs/exec.c:473)
> [    4.242000] kernel_execve (fs/exec.c:2058)
> [    4.242000] __dynamic_pr_debug (lib/dynamic_debug.c:865)
> [    4.242000] run_init_process (init/main.c:1389)
> [    4.242000] _printk (kernel/printk/printk.c:2365)
> [    4.242000] kernel_init (init/main.c:1508)
> [    4.242000] kernel_init (init/main.c:1459)
> [    4.242000] ret_from_kernel_thread (arch/m68k/kernel/entry.S:142)
> [    4.242000]
> [    4.242000] ---[ end Kernel panic - not syncing: page fault error ]---
>
> Looks like a memory mapping failure, but why ?
> My JTAG at this point dumps a list of 0s at 0x41fb0000 and my SDRAM 
> starts at 0x40000000 and ends at 0x50000000 (256MB).

0x41fb0000 seems to be init's page directory. The fault address is in 
the range where I'd expect dynamic libraries to reside.

>
> It looks like a TLB write miss which is obscure to me :-).
>
> I tried to use the /proc but as expected it is not alive after 
> mounting it.

The memory map ought to be accessible through sysrq - an alternative 
would be to modify the ELF binfmt handler and dump the map once ld.so 
has finished with relocations.

Cheers,

     Michael


> Thanks,
> JM
>
>
>> Cheers,
>>
>>      Michael
>>
>>>
>>> Thanks !
>>> JM



More information about the linux-mtd mailing list