Random, rare, but reproducible segmentation faults

Aurelien Jarno aurelien at aurel32.net
Fri Jul 10 15:12:50 EDT 2020


Hi Alex,

On 2020-07-10 01:15, Alex Ghiti wrote:
> I have a debian kernel downloaded from here
> https://people.debian.org/~gio/dqib/ that runs using the following qemu
> command:
> 
> qemu-system-riscv64 -machine virt -cpu rv64 -m 1G -device
> virtio-blk-device,drive=hd -drive file=image.qcow2,if=none,id=hd -device
> virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2222-:22 -bios
> ~/wip/lpc/buildroot/build_rv64/images/fw_jump.elf -kernel kernel -initrd
> initrd -object rng-random,filename=/dev/urandom,id=rng -device
> virtio-rng-device,rng=rng -nographic -append "root=/dev/vda1 console=ttyS0"
> 
> First is this kernel version ok to reproduce the bug ? Or should I download
> another image ? I'd like to avoid having to rebuild the kernel myself if
> possible.

Yes, that should do: it runs kernel 5.7.6, which is recent enough to
reproduce the issue. You just need to give the guest more memory (4 to
8GB) and more CPUs, for example with -smp 4.

> Now I would like to reproduce the bug: can you give me instructions on how
> to compile the qt package ?

The following sequence should allow you to build it:
- sudo apt-get update
- sudo apt-get install build-essential
- sudo apt-get build-dep qtbase-opensource-src
- apt-get source qtbase-opensource-src
- cd qtbase-opensource-src-5.14.2+dfsg/
- dpkg-buildpackage -B

Alternatively, I can prepare an image for you with everything ready.

> Is the page fault address always in the same area ? It might be interesting
> to find some pattern in those addresses, maybe you could also print the
> random offset to try to link both ?

It seems really random to me, with 3 outliers:
0x0000003fe7ef3140
0x0000003fcd16cff0
0x0000003fb9e96170
0x0000003fd3f4a120
0x448173f67cdbc8b0
0x0000003fdfe093f0
0x0000003fdfe093f0
0x0000003fe1d4aa70
0x0000003fc2cfef90
0x0000003fc0f5d050
0x0000003fe1d879d0
0x0000003fe9d3e990
0xf0ef4585be2ae01f
0x00000034484f71b0
0x0000003fde30e960
0x000000156888a430
0x0000003eb8560936
0x0000003fb121a490
0x0000003fb9abddd0
0x0000003fe41fc5d0
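
To look for a pattern, the addresses above can be bucketed with a small
script. This is only a rough sketch: it assumes the guest uses Sv39
paging (so user space ends at 0x3fffffffff and bits 63..39 of a valid
address must all equal bit 38), and the bucket names are made up for
illustration.

```python
from collections import Counter

# Fault addresses copied verbatim from the list above.
FAULT_ADDRESSES = [
    0x0000003fe7ef3140, 0x0000003fcd16cff0, 0x0000003fb9e96170,
    0x0000003fd3f4a120, 0x448173f67cdbc8b0, 0x0000003fdfe093f0,
    0x0000003fdfe093f0, 0x0000003fe1d4aa70, 0x0000003fc2cfef90,
    0x0000003fc0f5d050, 0x0000003fe1d879d0, 0x0000003fe9d3e990,
    0xf0ef4585be2ae01f, 0x00000034484f71b0, 0x0000003fde30e960,
    0x000000156888a430, 0x0000003eb8560936, 0x0000003fb121a490,
    0x0000003fb9abddd0, 0x0000003fe41fc5d0,
]

SV39_VA_BITS = 39  # assumption: the guest kernel uses Sv39 paging


def is_canonical_sv39(addr):
    """Return True if bits 63..39 of addr all equal bit 38."""
    upper = addr >> (SV39_VA_BITS - 1)            # bits 63..38
    all_ones = (1 << (64 - SV39_VA_BITS + 1)) - 1
    return upper == 0 or upper == all_ones


def classify(addr):
    """Put an address into one of a few illustrative buckets."""
    if not is_canonical_sv39(addr):
        return "non-canonical"
    if addr >= 1 << 63:
        return "kernel half"
    if addr >> 32 == 0x3F:
        return "top 4 GiB of user space"
    return "elsewhere in user space"


counts = Counter(classify(a) for a in FAULT_ADDRESSES)
for bucket, n in counts.most_common():
    print(f"{n:2d}  {bucket}")
```

Under that assumption, an address in the "non-canonical" bucket cannot
belong to any mapping, so it is more likely a corrupted pointer than a
missing page.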

> Also print the entire virtual memory
> mapping at the time of the fault (I don't know how to do that) to check what
> the address is close to ?

Yes, I'll try to find a way to do that.
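
One possible approach, as long as the faulting process can still be
inspected, is to read /proc/<pid>/maps and check which mapping the
address is closest to. The sketch below assumes the usual maps line
format ("start-end perms offset dev inode path"); the helper names are
hypothetical, and capturing the file exactly at fault time is not
covered here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int
    end: int    # exclusive
    perms: str
    path: str   # empty for anonymous mappings


def parse_maps(text: str) -> List[Mapping]:
    """Parse the contents of a /proc/<pid>/maps file."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def read_maps(pid="self") -> List[Mapping]:
    """Read and parse the maps of a live process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return parse_maps(f.read())


def locate(addr: int, mappings: List[Mapping]) -> Optional[Tuple[str, Mapping, int]]:
    """Return (relation, mapping, distance) for the mapping nearest to addr.

    relation is "inside" (distance = offset from the mapping start),
    "above" (addr is past the mapping end) or "below" (addr is before
    the mapping start). Returns None if there are no mappings.
    """
    best = None
    for m in mappings:
        if m.start <= addr < m.end:
            return ("inside", m, addr - m.start)
        if addr >= m.end:
            candidate = ("above", m, addr - m.end)
        else:
            candidate = ("below", m, m.start - addr)
        if best is None or candidate[2] < best[2]:
            best = candidate
    return best
```

For example, locate(0x0000003fe7ef3140, read_maps(pid)) would report
whether that fault address sits inside a mapping or how far it is from
the nearest one.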

> The 0xd cause implies that the virtual address
> does not exist at all, which is weird, my guess is that the randomization
> "reveals" the bug but that the bug is still there once the randomization is
> disabled.

I have that feeling too. It could even be a userland issue, with
userland not being able to cope with some memory mappings.

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien at aurel32.net                 http://www.aurel32.net


