[v10, 00/10] riscv: support kernel-mode Vector

Andy Chiu andy.chiu at sifive.com
Fri Jan 12 10:46:01 PST 2024


Hi Björn,

On Sat, Jan 13, 2024 at 12:03 AM Andy Chiu <andy.chiu at sifive.com> wrote:
>
> On Fri, Jan 12, 2024 at 11:29 PM Björn Töpel <bjorn at kernel.org> wrote:
> >
> > Andy,
> >
> > > Hello:
> > >
> > > This series was applied to riscv/linux.git (for-next)
> > > by Palmer Dabbelt <palmer at rivosinc.com>:
> > >
> >
> > I'm getting some boot issues with this series applied to riscv/for-next.
> >
> > The full run (with logs) is here:
> > https://github.com/linux-riscv/linux-riscv/actions/runs/7498706326
> >
> > Typically it fails in two ways:
> > Ubuntu rootfs:
> > --8<--
> > [ 4.346414] (sd-gens)[68]: Failed to extract file name from '': Invalid argument
> > [ 4.390832] systemd[1]: Failed to fork off sandboxing environment for executing generators: Protocol error
> > [!!!!!!] Failed to start up manager.
> > [ 4.440164] systemd[1]: Freezing execution.
> > --8<--
> >
> > or:
> > --8<--
> > [   14.909912] (sd-gens)[71]: Assertion '!strv_isempty(dirs)' failed at src/shared/exec-util.c:211, function execute_directories(). Aborting.
> > [   15.008480] systemd[1]: Failed to fork off sandboxing environment for executing generators: Protocol error
> > [!!!!!!] Failed to start up manager.
> > [   15.111989] systemd[1]: Freezing execution.
> > --8<--
> >
> > and Alpine with:
> > --8<--
> > [ 0.036703] Kernel panic - not syncing: kmem_cache_create_usercopy: Failed to create slab 'riscv_vector_ctx'. Error -22
> > [ 0.039195] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc1-defconfig_plain-gdf944704182e #1
> > [ 0.040744] Hardware name: riscv-virtio,qemu (DT)
> > [ 0.041975] Call Trace:
> > [ 0.042813] [<ffffffff800067a4>] dump_backtrace+0x1c/0x24
> > [ 0.044832] [<ffffffff80945980>] show_stack+0x2c/0x38
> > [ 0.045724] [<ffffffff80952214>] dump_stack_lvl+0x3c/0x54
> > [ 0.046841] [<ffffffff80952240>] dump_stack+0x14/0x1c
> > [ 0.047428] [<ffffffff80945e7c>] panic+0x106/0x29e
> > [ 0.047998] [<ffffffff8015f14c>] kmem_cache_create_usercopy+0x20e/0x258
> > [ 0.048786] [<ffffffff80a044dc>] riscv_v_setup_ctx_cache+0x2c/0x3c
> > [ 0.049521] [<ffffffff80a03a48>] arch_task_cache_init+0x10/0x18
> > [ 0.057832] [<ffffffff80a0706c>] fork_init+0x42/0x168
> > [ 0.058737] [<ffffffff80a00d70>] start_kernel+0x6ba/0x73a
> > --8<--
> >
> > The Alpine boot can be fixed with something like:
> > --8<--
> > diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
> > index f9769703fd39..0ac79a9cdba5 100644
> > --- a/arch/riscv/kernel/vector.c
> > +++ b/arch/riscv/kernel/vector.c
> > @@ -53,6 +53,9 @@ int riscv_v_setup_vsize(void)
> >
> >  void __init riscv_v_setup_ctx_cache(void)
> >  {
> > +       if (!riscv_v_vsize)
> > +               return;
> > +
> >         riscv_v_user_cachep = kmem_cache_create_usercopy("riscv_vector_ctx",
> >                                                          riscv_v_vsize, 16, SLAB_PANIC,
> >                                                          0, riscv_v_vsize, NULL);
> > --8<--
>
> Sorry for that! I forgot to do a has_vector() check before creating
> the cache. I am going to send a patch to fix it.
>
> >
> > but with this "fix" in place I still get Ubuntu boot failures. To
> > reproduce the CI locally:
> >
> >   | git fetch https://github.com/linux-riscv/linux-riscv e2aad75b340d65b0be4d1a689db3e10c6ed3f18e
> >   | git checkout FETCH_HEAD
> >   | docker pull ghcr.io/linux-riscv/pw-builder-multi:latest
> >   | docker run -it --volume $PWD:/build/my-linux ghcr.io/linux-riscv/pw-builder-multi:latest bash
> >   | # In container
> >   | bash -l
> >   | mkdir -p /build/kernels/logs
> >   | .github/scripts/series/prepare_tests.sh
> >   | cd /build/my-linux
> >   | .github/scripts/series/kernel_builder.sh rv64 defconfig plain gcc
> >   | .github/scripts/series/test_runner.sh rv64 defconfig plain gcc ubuntu
> >   | .github/scripts/series/test_runner.sh rv64 defconfig plain gcc alpine
>
> It's weird that these errors do not show up in my test environment. I
> will try to reproduce it with the script above.

I just located the boot failure with some experiments. It is related to
the fallback logic in enter_vector_usercopy(). Booting succeeds if we
restart the scalar fallback with its original copy size. preempt_v is
not affected because it never goes into this branch.

It's late for me. I will figure out the reason and hopefully fix the
root cause over the weekend.

--- a/arch/riscv/lib/riscv_v_helpers.c
+++ b/arch/riscv/lib/riscv_v_helpers.c
@@ -30,9 +30,6 @@ asmlinkage int enter_vector_usercopy(void *dst, void *src, size_t n)
        kernel_vector_end();

        if (remain) {
-               copied = n - remain;
-               dst += copied;
-               src += copied;
                goto fallback;
        }
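
To make the two fallback strategies concrete, here is a minimal userspace
sketch. It is only a model under stated assumptions, not the kernel code:
model_vector_copy(), copy_resume(), copy_restart() and fallback_end() are
hypothetical names, memcpy() stands in for the scalar fallback, and the
`limit` parameter simulates a vector copy that stops early. It does not
claim to identify the root cause.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the vector copy: copies at most `limit` bytes
 * and returns the number of bytes left uncopied, like `remain` in the diff.
 */
static size_t model_vector_copy(unsigned char *dst, const unsigned char *src,
                                size_t n, size_t limit)
{
    size_t done = n < limit ? n : limit;

    memcpy(dst, src, done);
    return n - done;
}

/* Resume strategy: advance both pointers AND shrink the length. */
static void copy_resume(unsigned char *dst, const unsigned char *src,
                        size_t n, size_t limit)
{
    size_t remain = model_vector_copy(dst, src, n, limit);

    if (remain) {
        size_t copied;

        assert(remain <= n);
        copied = n - remain;
        /* memcpy() models the scalar fallback on the uncopied tail. */
        memcpy(dst + copied, src + copied, remain);
    }
}

/* Restart strategy: redo the whole copy with the original arguments. */
static void copy_restart(unsigned char *dst, const unsigned char *src,
                         size_t n, size_t limit)
{
    size_t remain = model_vector_copy(dst, src, n, limit);

    if (remain)
        memcpy(dst, src, n); /* scalar fallback, original size */
}

/*
 * Offset one past the last byte a fallback touches when it starts at
 * offset `start` and copies `len` bytes. The access stays inside an
 * n-byte buffer only if the result is <= n.
 */
static size_t fallback_end(size_t start, size_t len)
{
    return start + len;
}
```

With n = 16 and a modelled vector copy that stops after 5 bytes, both
strategies leave dst equal to src. fallback_end(5, 11) is 16, which stays
in bounds, while fallback_end(5, 16) is 21: resuming is only safe when
the length is reduced together with the pointers, whereas restarting with
the original pointers and size does not depend on that bookkeeping.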

>
> >
> > Logs in /build/tests/run_test*
> >
> > I'll continue to debug in the meantime.
> >
> >
> > Björn
>
> Thanks,
> Andy

Thanks,
Andy


