[RFC patch 0/4] riscv: introduce alternative mechanism to apply errata patches

Vincent Chen vincent.chen at sifive.com
Thu Mar 11 15:29:04 GMT 2021


On Thu, Mar 11, 2021 at 11:51 AM Ruinland ChuanTzu Tsai
<ruinland at andestech.com> wrote:
>
>
> > With the emergence of more and more RISC-V CPUs, the request for how to
> > upstream the vendor errata patch may gradually appear. In order to resolve
> > this issue, this patch introduces the alternative mechanism from ARM64 and
> > x86 to enable the kernel to patch code at runtime according to the
> > manufacturer information of the running CPU. The main purpose of this patch
> > set is to propose a framework to apply vendor's errata solutions. Based on
> > this framework, it can be ensured that the errata only applies to the
> > specified CPU cores. Other CPU cores do not be affected. Therefore, some
> > complicated scenarios are unsupported in this patch set, such as patching
> > code to the kernel module, doing relocation in patching code, and
> > heterogeneous CPU topology.
> >
> > In the "alternative" scheme, Users could use the macro ALTERNATIVE to apply
> > an errata to the existing code flow. In the macro ALTERNATIVE, users need
>
> In general I love the idea of fixing errata dynamically, yet in the past,
> vendors sometimes bump into the dilemma on the necessity of using alternative
> framework or just plainly "ifdefine"-ing the CONFIG_ERRATUM_XXX to toggle their
> fixups.
>
> Quoting from the Qualcomm Falkor E1041 dicssusion threads [1][2] :
>
> "Just do it with an #ifdef"
>
> "There's really no need for this to be an alternative. It makes the kernel
> larger and more complex due to all the altinstr data and probing code."
>
> I wonder if there will be similar disputes end up here. And having a conclusion
> on what should be resolved by alternative and what shouldn't might be more
> complicated since we're having a more diversed group of vendors (or "users" who
> can access the errata as you mentioned).
>
> Cordially yours,
> Ruinland
>
> [1] https://lore.kernel.org/kvmarm/20171201112457.GE18083@arm.com/
> [2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1512957823-18064-2-git-send-email-shankerd@codeaurora.org/#21249629
>

To be honest, We had a similar discussion in our internal group, and
someone voted for the ifdef method. However, in the end, I chose the
alternative scheme because it will not have any performance impact on
other vendor's core.

For the "#ifdef" mechanism, if we want one kernel image that can
include all vendor's errata, all CONFIG_ERRATA_XXX should be able to
enable at the same time. In this case, the code patching may become
the following pattern.

if (IS_ENABLED(CONFIG_ERRATA_XXX ) && unlikely(has_errata_xxx))
{
      vendor's errata ......
}
The has_errata_xxx is a global variable and its default value is 0. At
runtime, the customized function provided by the vendor should
properly set has_errata_xxx as 1 for the specified CPU cores. Because
the default value of has_errata_xxx for certain cores is 0, It is OK
for all vendor's to enable the CONFIG_ERRATA_XXX. However, this
implementation will cause other vendors to suffer some performance
impact. I used sifive errata cip 1200 as an example:

+        if (IS_ENABLED(CONFIG_ERRATA_SIFIVE_CIP_1200) &&
unlikely(has_errata_sifive_cip_1200)) {
+                asm("sfence.vma\n");
+        } else
+               local_flush_tlb_page(addr);

In this case, except for the particular sifive cores, all riscv cores
will execute local_flush_tlb_page(addr) instead of sfence.vma. The
corresponding assembly code is listed below.

        if (IS_ENABLED(CONFIG_ERRATA_SIFIVE_CIP_1200) &&
unlikely(has_errata_sifive_cip_1200)) {
ffffffe000007416:       012dd797                auipc   a5,0x12dd
ffffffe00000741a:       cfa7c783                lbu     a5,-774(a5) #
ffffffe0012e4110 <has_errata_sifive_cip_1200>
ffffffe00000741e:       e7e9                    bnez
a5,ffffffe0000074e8 <do_page_fault+0x33c>
local_flush_tlb_page():
/home/vincentchen/work/64bit/linux/./arch/riscv/include/asm/tlbflush.h:22
ffffffe000007420:       12098073                sfence.vma      s3

According to the objdump results, before executing
local_flush_tlb_page(), other vendors' cores need to execute three
additional instructions (from 0xffffffe000007416 to
0xffffffe00000741e) due to this errata. The performance impact from
these 3 instructions may be slight. ( 1. The branch predictor may
always do the correct prediction at 0xffffffe00000741e. 2. If this
errata is inserted into a hot spot path, has_errata_sifive_cip_1200
may be stored in the L2 cache.) However, I am still not sure whether
each vendor is willing to suffer the performance impact. Besides, I am
also worried that if we have a lot of errata, the impact on
performance will become significant. Therefore, I finally chose the
alternative scheme.

However, if all vendors are willing to suffer the performance impact
from other vendor's errata, I can change the framework to "#ifdef"
scheme. After all, it may need to take a lot of time to accumulate a
lot of errata to make the performance impact obvious.



More information about the linux-riscv mailing list