vector status when vlen doesn't match

Palmer Dabbelt palmer at dabbelt.com
Wed Feb 28 13:56:51 PST 2024


On Wed, 28 Feb 2024 08:02:47 PST (-0800), Conor Dooley wrote:
> Yo Andy,
>
> I was wondering if you could clarify something that came up today on
> IRC.
>
> When we enable vector we check vlenb and if it doesn't match the kernel
> is suppose to not support vector. We do this in a few places, the first
> is in riscv_fill_hwcap():
>
> 	if (elf_hwcap & COMPAT_HWCAP_ISA_V) {
> 		riscv_v_setup_vsize();
> 		/*
> 		 * ISA string in device tree might have 'v' flag, but
> 		 * CONFIG_RISCV_ISA_V is disabled in kernel.
> 		 * Clear V flag in elf_hwcap if CONFIG_RISCV_ISA_V is disabled.
> 		 */
> 		if (!IS_ENABLED(CONFIG_RISCV_ISA_V))
> 			elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
> 	}
>
> Why does this riscv_v_setup_vsize() failing have no impact? Because this
> is called only once on the boot CPU, right? I feel like it deserves a
> comment as to why.

Unless I'm missing something, that first call can't fail: there's no 
previous riscv_v_size set, so there's nothing to trigger the failure.

Might warrant a comment, though...

> But the main thing I was wondering about was the other time it is called,
> in smp_callin():
> 	if (has_vector()) {
> 		if (riscv_v_setup_vsize())
> 			elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
> 	}
>
> If this fails, we clear the HWCAP bit, but I am not immediately grasping
> how this affects the rest of the kernel.

I think it's actually a bug: we're not clearing the bit in riscv_isa, so 
this only ends up disabling the V bit in AT_HWCAP and thus doesn't do 
anything in the kernel.

> has_vector() just checks if the extension is supported by all cores on
> the cpu using either an alternative or the extension support bitmap.
> There are lots of sites in the kernel that use has_vector() as the
> gating check (as far as I can tell), so if vlen does not match between
> CPUs should we not be returning an error for has_vector()?
>
> The alternative that has_vector() relies on is patched before the non-boot
> CPUs are enabled, so we cannot modify the result once the non-boot CPUs
> are in their callin functions and detect a vlen mismatch while at at the
> same time using it in smp_calling(), so should this code be changed to
> something along the lines of:
>
> 	if (riscv_has_extension_unlikely(v)) {
> 		if (riscv_v_setup_vsize()) {
> 			elf_hwcap &= ~COMPAT_HWCAP_ISA_V;
> 			riscv_v_vlen_mismatch = true;
> 		}
> 	}
> and then has_vector() becomes something like
> static __always_inline bool has_vector(void)
> {
> 	return riscv_has_extension_unlikely(RISCV_ISA_EXT_v) && unlikely(riscv_v_vlen_mismatch);
> }
>
> Probably there's value to be gained in static branches etc here, but I
> was just trying to explain what I was getting at. Maybe I am missing
> something though? I do remember talking about this back when the vector
> patches were still in review.

I actually don't think that alone fixes it: even if we make has_vector() 
respect the changes from the new CPU (whether we re-apply the 
alternatives or make a more dynamic check), we'd need to make sure the 
code that has already looked at has_vector() doesn't get scheduled on 
one of the shorter-VLENB CPUs.

So maybe the right answer here is to flip things around, and just refuse 
to enable CPUs that come in with a VLENB different than the boot CPU?  
That way we could avoid being broken by systems that behave this way, if 
someone ends up building one then they can figure out how to make it 
work better.

> Thanks,
> Conor.
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



More information about the linux-riscv mailing list