cacheflush completely broken, suspecting PAN+LPAE
Michał Pecio
michal.pecio at gmail.com
Tue Nov 12 01:32:29 PST 2024
Hi Linus,
On Tue, 12 Nov 2024 02:15:19 +0100, Linus Walleij wrote:
> We are trying to locate the issue, which I think is the same as this
> but not sure:
> https://bugzilla.kernel.org/show_bug.cgi?id=219247
You can verify by asking the reporter to run the crashing program under
strace. If SIGSEGV follows a failed cacheflush, it's my bug most likely.
A straightforward repro of this bug:
gdb
GUILE_JIT_THRESHOLD=0 gdb
GUILE_JIT_THRESHOLD=-1 gdb
Expected outcome, in that order: segfault, segfault, normal command prompt.
> I have been trying to replicate it on a Chromebook but didn't get so
> far yet because the installation is pretty idiomatic :/ also it
> only appears in a single Qt program and is not as predictable as here.
My bug also appears in a single program ;) This system works fine, but
any JIT is broken by this kind of bug. The failure may be random if the
caches resynchronize by a fluke, but with gdb it was every time so far.
> But. It appears the code is issuing cacheflush() which I guess ends
> up in arm_syscall() here:
>
> case NR(cacheflush):
> return do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);
>
> To here:
>
> static inline int
> do_cache_op(unsigned long start, unsigned long end, int flags)
> {
> if (end < start || flags)
> return -EINVAL;
>
> if (!access_ok((void __user *)start, end - start))
> return -EFAULT;
>
> return __do_cache_op(start, end);
> }
Yep. I added printks here and it is specifically the call to
flush_icache_range() from __do_cache_op() that returns -EFAULT.
> Here userspace access should be fine because we have entered a
> syscall from userspace. I tried to emulate the situation with this
> program:
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <errno.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/mman.h>
>
> #define NR_cacheflush 0xf0002
>
> /* libgcc */
> extern void __clear_cache(void *, void *);
>
> int main (int argc, char **argv) {
> void *addr;
> int ret;
>
> printf("Test()\n");
> addr = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> if (addr == MAP_FAILED) {
> printf("mmap() failed\n");
> exit(1);
> }
This seems incomplete: there is no __clear_cache() call. If you add one
at the end, then yes, it should fail. Confirm it with strace.
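For reference, a minimal sketch of how the test could be completed. It
assumes the raw syscall number 0xf0002 from the quoted program and the
glibc syscall() wrapper, so that the kernel's return value is visible
without strace. The names flush_and_report() and run_cacheflush_test()
are hypothetical and not part of the original test. On non-ARM builds it
falls back to the compiler builtin, which cannot report a failure, so
only the ARM path says anything about this bug.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush [start, end) and return 0 on success,
 * or the errno value reported by the kernel on failure.
 */
int flush_and_report(void *start, void *end)
{
	/* Caller bug if violated; the kernel rejects end < start with -EINVAL. */
	assert((char *)start <= (char *)end);
#if defined(__arm__) && defined(__linux__)
	/* 0xf0002 is the cacheflush number used in the quoted program. */
	long ret = syscall(0xf0002, start, end, 0);

	return ret == 0 ? 0 : errno;
#else
	/* No such syscall here; the builtin has no return value to check. */
	__builtin___clear_cache((char *)start, (char *)end);
	return 0;
#endif
}

/* Hypothetical driver: map one RWX page, dirty it, then flush it. */
int run_cacheflush_test(void)
{
	size_t len = 0x1000;
	unsigned char *addr = mmap(NULL, len,
				   PROT_READ | PROT_WRITE | PROT_EXEC,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int err;

	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Write to the page first, as a JIT would before flushing. */
	memset(addr, 0, len);

	err = flush_and_report(addr, addr + len);
	if (err)
		printf("cacheflush failed: %s\n", strerror(err));
	else
		printf("cacheflush OK\n");

	munmap(addr, len);
	return err ? 1 : 0;
}
```

A main() only needs to return run_cacheflush_test(). If this is the same
bug, I would expect the failure message to report EFAULT on an affected
kernel, matching what strace shows.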
> I added prints in the cacheflush trap:
>
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 480e307501bb..400650519bd1 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -592,11 +592,14 @@ __do_cache_op(unsigned long start, unsigned long end)
> static inline int
> do_cache_op(unsigned long start, unsigned long end, int flags)
> {
> + pr_info("%s(%08lx-%08lx)\n", __func__, start, end);
> if (end < start || flags)
> return -EINVAL;
>
> - if (!access_ok((void __user *)start, end - start))
> + if (!access_ok((void __user *)start, end - start)) {
> + pr_err("ACCESS NOT OK\n");
> return -EFAULT;
> + }
>
> return __do_cache_op(start, end);
> }
You also need to check what __do_cache_op() returns.
Regards,
Michal