[PATCH v1] RISC-V: take text_mutex during alternative patching
Palmer Dabbelt
palmer at dabbelt.com
Wed Feb 22 14:10:28 PST 2023
On Wed, 22 Feb 2023 12:33:47 PST (-0800), heiko at sntech.de wrote:
> Am Mittwoch, 22. Februar 2023, 20:06:56 CET schrieb Guenter Roeck:
>> On 2/22/23 09:43, Heiko Stübner wrote:
>> > Am Mittwoch, 22. Februar 2023, 18:03:55 CET schrieb Heiko Stübner:
>> >> Am Mittwoch, 22. Februar 2023, 17:47:10 CET schrieb Palmer Dabbelt:
>> >>> On Wed, 22 Feb 2023 07:39:11 PST (-0800), heiko at sntech.de wrote:
>> >>>> Am Mittwoch, 22. Februar 2023, 16:31:25 CET schrieb Andrew Jones:
>> >>>>> On Wed, Feb 22, 2023 at 03:27:43PM +0100, Andrew Jones wrote:
>> >>>>>> On Thu, Feb 16, 2023 at 07:33:55AM -0800, Guenter Roeck wrote:
>> >>>>>>> On 2/15/23 14:00, Heiko Stübner wrote:
>> >>>>>>> [ ... ]
>> >>>>>>>>
>> >>>>>>>> So now I've also tested Palmer's for-next at
>> >>>>>>>> commit ec6311919ea6 ("Merge patch series "riscv: Optimize function trace"")
>> >>>>>>>>
>> >>>>>>>> again with the same variants
>> >>>>>>>> - qemu-riscv32 without zbb
>> >>>>>>>> - qemu-riscv32 with zbb
>> >>>>>>>> - qemu-riscv64 without zbb
>> >>>>>>>> - qemu-riscv64 with zbb
>> >>>>>>>>
>> >>>>>>>> And all of them booted fine into a nfs-root (debian for riscv64 and a
>> >>>>>>>> buildroot for riscv32).
>> >>>>>>>>
>> >>>>>>>> I even forced a bug into the zbb code to make sure the patching worked
>> >>>>>>>> correctly (where the kernel failed as expected).
>> >>>>>>>>
>> >>>>>>>> Qemu-version for me was 7.2.50 (v7.2.0-744-g5a3633929a-dirty)
>> >>>>>>>>
>> >>>>>>>> I did try the one from Debian-stable (qemu-5.2) but that was too old and
>> >>>>>>>> didn't support Zbb yet.
>> >>>>>>>>
>> >>>>>>>> One thing of note, the "active" 32bit config I had, somehow didn't produce
>> >>>>>>>> working images and I needed to start a new build using the rv32_defconfig.
>> >>>>>>>>
>> >>>>>>>> So right now, I'm not sure what more to test though.
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Another example:
>> >>>>>>>
>> >>>>>>> - build defconfig
>> >>>>>>> - run
>> >>>>>>> qemu-system-riscv64 -M virt -m 512M -no-reboot -kernel arch/riscv/boot/Image \
>> >>>>>>> -snapshot -device virtio-blk-device,drive=d0 -drive file=rootfs.ext2,if=none,id=d0,format=raw \
>> >>>>>>> -append "root=/dev/vda console=ttyS0,115200 earlycon=uart8250,mmio,0x10000000,115200" \
>> >>>>>>> -nographic -monitor none
>> >>>>>>>
>> >>>>>>> With CONFIG_RISCV_ISA_ZBB=y, that results in
>> >>>>>>>
>> >>>>>>> [ 0.818263] /dev/root: Can't open blockdev
>> >>>>>>> [ 0.818856] VFS: Cannot open root device "/dev/vda" or unknown-block(0,0): error -6
>> >>>>>>> [ 0.819177] Please append a correct "root=" boot option; here are the available partitions:
>> >>>>>>> [ 0.819808] fe00 16384 vda
>> >>>>>>> [ 0.819944] driver: virtio_blk
>> >>>>>>> [ 0.820534] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
>> >>>>>>> [ 0.821101] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc8-next-20230216-00002-g80332825e240 #4
>> >>>>>>> [ 0.821672] Hardware name: riscv-virtio,qemu (DT)
>> >>>>>>> [ 0.822050] Call Trace:
>> >>>>>>> [ 0.822427] [<ffffffff800053e4>] dump_backtrace+0x1c/0x24
>> >>>>>>> [ 0.822834] [<ffffffff807f90e4>] show_stack+0x2c/0x38
>> >>>>>>> [ 0.823085] [<ffffffff80803aea>] dump_stack_lvl+0x3c/0x54
>> >>>>>>> [ 0.823351] [<ffffffff80803b16>] dump_stack+0x14/0x1c
>> >>>>>>> [ 0.823601] [<ffffffff807f944c>] panic+0x102/0x29e
>> >>>>>>> [ 0.823834] [<ffffffff80a015e2>] mount_block_root+0x18c/0x23e
>> >>>>>>> [ 0.824148] [<ffffffff80a0187c>] mount_root+0x1e8/0x218
>> >>>>>>> [ 0.824398] [<ffffffff80a019ee>] prepare_namespace+0x142/0x184
>> >>>>>>> [ 0.824655] [<ffffffff80a01182>] kernel_init_freeable+0x236/0x25a
>> >>>>>>> [ 0.824934] [<ffffffff80804602>] kernel_init+0x1e/0x10a
>> >>>>>>> [ 0.825201] [<ffffffff80003560>] ret_from_exception+0x0/0x16
>> >>>>>>> [ 0.826180] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
>> >>>>>>>
>> >>>>>>> This works fine if CONFIG_RISCV_ISA_ZBB is not enabled.
>> >>>>>>>
>> >>>>>>> Tested with gcc 11.3, binutils 2.39, qemu v7.2.0 and qemu built from mainline.
>> >>>>>>>
>> >>>>>>
>> >>>>>> Just to +1 this, I get the same result (unable to mount root fs) with
>> >>>>>>
>> >>>>>> $QEMU -cpu rv64,zbb=on \
>> >>>>>> -nographic \
>> >>>>>> -machine virt \
>> >>>>>> -kernel $KERNEL \
>> >>>>>> -append 'root=/dev/vda console=ttyS0' \
>> >>>>>> -drive file=disk.ext4,format=raw,id=hd0 \
>> >>>>>> -device virtio-blk-pci,drive=hd0
>> >>>>>>
>> >>>>>> kernel: latest riscv-linux/for-next (8658db0a4a0f), defconfig
>> >>>>>> gcc: riscv-gnu-toolchain (12.1.0)
>> >>>>>> binutils: riscv-gnu-toolchain (2.39)
>> >>>>>> qemu: latest master (79b677d658d3)
>> >>>>>>
>> >>>>>> Flipping the QEMU cpu zbb property off allows boot to succeed, i.e. it's
>> >>>>>> not necessary to compile out the CONFIG_RISCV_ISA_ZBB code from the
>> >>>>>> kernel, it's just necessary to avoid using it.
>> >>>>>>
>> >>>>>
>> >>>>> Looks like something in the strncmp implementation. Only commenting it
>> >>>>> out allows boot to succeed.
>> >>>>
>> >>>> and interestingly it seems to be something very specific. I.e. my setup is
>> >>>> nfsroot-based (qemu is "just" another board in my boardfarm) and booting
>> >>>> into an nfs-root works quite nicely.
>> >>>>
>> >>>> I guess I need to look into how to get an actual disk-image in there.
>> >>>
>> >>> It looks like Drew isn't using an initrd (and with NFS, presumably you
>> >>> are)? That's probably a big difference as well.
>> >>
>> >> There is no initrd involved in my qemu setup ;-) .
>> >> Just a kernel built from current for-next + defconfig and a nfs-server
>> >> holding a full Debian-riscv64 system.
>> >>
>> >>
>> >> The magic qemu incantation is also pretty standard:
>> >>
>> >> /usr/local/bin/qemu-system-riscv64 -M virt -smp 2 -m 1G -display none \
>> >> -cpu rv64,zbb=true,zbc=true,svpbmt=true,Zicbom=true,Zawrs=true,sscofpmf=true,v=true \
>> >> -serial telnet:localhost:5500,server,nowait -kernel /home/devel/nfs/kernel/riscv64/Image \
>> >> -append earlycon=sbi root=/dev/nfs nfsroot=10.0.2.2:/home/devel/nfs/rootfs-riscv64virt ip=dhcp rw \
>> >> -netdev user,id=n1 -device virtio-net-pci,netdev=n1 -name boardfarm-0,process=boardfarm-vm-0 -daemonize
>> >>
>> >>
>> >> And I also checked that my kernel really lands in the Zbb-code by simply
>> >> adding a "ret" [0] and making sure it breaks vs. runs without it
>> >>
>> >>
>> >> Next I'll try to move my rootfs into a real image-file and switch over to
>> >> the commandline Drew calls, to see if it'll reproduce then.
>> >
>> > With the rootfs in an image I could reproduce the issue now.
>> >
>> > I hacked in a very crude change to find the invocation that acts differently
>> > and voila it's a strncmp of the "/dev/vda" against "/dev/" [0].
>> >
>> > It's really strange that all the other invocations produce correct results.
>> >
>>
>> Not really; the failures are the only comparisons which are expected to
>> return 0 and are not a complete string match.
>>
>> I'd suspect that strncmp(s1, s2, strlen(s2)) will always return
>> a bad result if s1 starts with s2 but is not identical to s2.
>
> The issue seems to come into play when s2 is shorter than the stepsize
> for the bitops-accelerated part (reg-size being 8) here.
>
> I.e.
>
> strncmp("/dev/vda", "/dev/", 5): 1
I'd tried pulling out your failures from the previous message into the
selftests, turns out this one happens to fail for me. From looking at
the code I'd figured it was this fetch-past-nul issue so I'd added some
alignment skewing, but it looks like it happens to trigger for me
without any.
I sent a patch to add the test here
<https://lore.kernel.org/r/20230222220435.10688-1-palmer@rivosinc.com/>
> strncmp("/dev/vda", "/dev", 4): 1
> strncmp("/dev/vda", "/de", 3): 1
> ...
> but
> strncmp("/dev/vda12", "/dev/vda", 8): 0
> strncmp("/dev/vda12", "/dev/vda1", 9): 0
>
> For size-9 this succeeds because the first step is "/dev/vda" and the
> 0byte in "12" is detected and jumps to processing the rest individually.
>
> Duplicating the 0-byte check for s2 [1] - i.e. jumping to the byte-wise
> path when a 0-byte is detected in the second string - solves the issue.
>
> Not sure if there is a more performant solution for this, because we do
> have the length available. I'll think some more about that part.
The only way I could come up with was to check the length and fall back
to the byte routines. That may or may not be faster depending on loop
counts and such, I doubt it matters until we get to proper
uarch-optimized routines.
>
>
> Heiko
>
>
> [1]
> @@ -83,6 +83,8 @@ strncmp_zbb:
> REG_L t1, 0(a1)
> orc.b t3, t0
> bne t3, t5, 2f
> + orc.b t3, t1
> + bne t3, t5, 2f
> addi a0, a0, SZREG
> addi a1, a1, SZREG
> beq t0, t1, 1b
>
>
>
>> > [0]
>> > non-working - with zbb-strncmp:
>> > [ 2.619129] strncmp: comparing /dev/vda <-> mtd, count 3 => -1
>> > [ 2.620451] strncmp: comparing /dev/vda <-> ubi, count 3 => -1
>> > [ 2.621163] strncmp: comparing /dev/vda <-> PARTUUID=, count 9 => -1
>> > [ 2.621703] strncmp: comparing /dev/vda <-> PARTLABEL=, count 10 => -1
>> > ------ below is the difference
>> > [ 2.622255] strncmp: comparing /dev/vda <-> /dev/, count 5 => 1
>> > [ 2.623446] strncmp: comparing /dev/vda <-> /dev/, count 5 => 1
>> > ------ above is the difference
>> > [ 2.627064] strncmp: comparing sysfs <-> ext3, count 4 => 14
>> > [ 2.627646] strncmp: comparing tmpfs <-> ext3, count 4 => 15
>> > [ 2.628476] strncmp: comparing bdev <-> ext3, count 4 => -3
>> > [ 2.628990] strncmp: comparing proc <-> ext3, count 4 => 11
>> > [ 2.629422] strncmp: comparing cgroup <-> ext3, count 4 => -2
>> > [ 2.629856] strncmp: comparing cgroup2 <-> ext3, count 4 => -2
>> > [ 2.630345] strncmp: comparing cpuset <-> ext3, count 4 => -2
>> > [ 2.630785] strncmp: comparing devtmpfs <-> ext3, count 4 => -1
>> > [ 2.631267] strncmp: comparing debugfs <-> ext3, count 4 => -1
>> > [ 2.631696] strncmp: comparing securityfs <-> ext3, count 4 => 1
>> > [ 2.632596] strncmp: comparing sockfs <-> ext3, count 4 => 14
>> > [ 2.633095] strncmp: comparing bpf <-> ext3, count 4 => -3
>> > [ 2.633600] strncmp: comparing pipefs <-> ext3, count 4 => 11
>> > [ 2.634071] strncmp: comparing ramfs <-> ext3, count 4 => 13
>> > [ 2.634501] strncmp: comparing hugetlbfs <-> ext3, count 4 => 1
>> > [ 2.634955] strncmp: comparing rpc_pipefs <-> ext3, count 4 => 1
>> > [ 2.635407] strncmp: comparing devpts <-> ext3, count 4 => -1
>> > [ 2.638788] strncmp: comparing ext3 <-> ext3, count 4 => 0
>> >
>> >
>> > working - with normal strncmp:
>> > [ 2.416438] strncmp: comparing /dev/vda <-> mtd, count 3 => -62
>> > [ 2.417312] strncmp: comparing /dev/vda <-> ubi, count 3 => -70
>> > [ 2.418296] strncmp: comparing /dev/vda <-> PARTUUID=, count 9 => -33
>> > [ 2.418921] strncmp: comparing /dev/vda <-> PARTLABEL=, count 10 => -33
>> > ------ below is the difference
>> > [ 2.419450] strncmp: comparing /dev/vda <-> /dev/, count 5 => 0
>> > [ 2.420698] strncmp: comparing /dev/vda <-> /dev/, count 5 => 0
>> > ------ above is the difference
>> > [ 2.424086] strncmp: comparing sysfs <-> ext3, count 4 => 14
>> > [ 2.424663] strncmp: comparing tmpfs <-> ext3, count 4 => 15
>> > [ 2.425111] strncmp: comparing bdev <-> ext3, count 4 => -3
>> > [ 2.425563] strncmp: comparing proc <-> ext3, count 4 => 11
>> > [ 2.426022] strncmp: comparing cgroup <-> ext3, count 4 => -2
>> > [ 2.426717] strncmp: comparing cgroup2 <-> ext3, count 4 => -2
>> > [ 2.427175] strncmp: comparing cpuset <-> ext3, count 4 => -2
>> > [ 2.427608] strncmp: comparing devtmpfs <-> ext3, count 4 => -1
>> > [ 2.428061] strncmp: comparing debugfs <-> ext3, count 4 => -1
>> > [ 2.428526] strncmp: comparing securityfs <-> ext3, count 4 => 14
>> > [ 2.428985] strncmp: comparing sockfs <-> ext3, count 4 => 14
>> > [ 2.429423] strncmp: comparing bpf <-> ext3, count 4 => -3
>> > [ 2.429863] strncmp: comparing pipefs <-> ext3, count 4 => 11
>> > [ 2.430433] strncmp: comparing ramfs <-> ext3, count 4 => 13
>> > [ 2.431202] strncmp: comparing hugetlbfs <-> ext3, count 4 => 3
>> > [ 2.431721] strncmp: comparing rpc_pipefs <-> ext3, count 4 => 13
>> > [ 2.432222] strncmp: comparing devpts <-> ext3, count 4 => -1
>> > [ 2.432685] strncmp: comparing ext3 <-> ext3, count 4 => 0
>> >
>> >
>> >
>>
>>
More information about the linux-riscv
mailing list