[PATCH] riscv: add copy_mc_to_{kernel,user} support to enable MC fault tolerance
Ruidong Tian
tianruidong at linux.alibaba.com
Thu May 7 23:24:39 PDT 2026
copy_mc_to_kernel() and copy_mc_to_user() are architecture hooks that
let the kernel survive an uncorrectable hardware memory error (e.g. an
uncorrectable ECC fault) raised during the *source* read of a memory
copy. They are the cornerstone of graceful error recovery on every
path that has to duplicate a page whose contents might already be bad:
- COW (copy-on-write): wp_page_copy(), do_cow_fault(),
copy_present_page() in fork, and __wp_page_copy_user() all
route their per-page copy through copy_mc_user_highpage();
- hugetlb / THP: copy_user_gigantic_page(), copy_subpage(),
__collapse_huge_page_copy() and collapse_file() rely on the
same hook (copy_mc_user_highpage / copy_mc_highpage) to clone
or collapse 2 MiB / 1 GiB folios without tearing the kernel
down on a single bad cacheline;
- page reclaim / migration / KSM: folio_mc_copy(),
ksm_might_need_to_copy(), compaction and NUMA balancing;
- file I/O on byte-addressable memory (DAX, CXL.mem, and the
iov_iter MC helpers) and core-dump writeout.
When any of these callers hits a hardware error during the load, the
copy_mc_* helper returns a non-zero byte count instead of oopsing the
kernel. The caller can then react in whatever way fits the context:
propagate -EFAULT back to the originating syscall, isolate the poisoned
page through memory_failure_queue(), retry on a clean replica, or as
a last resort kill the owning task. The system as a whole keeps
running.
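To make the error handling concrete, here is a minimal sketch (not part
of this patch) of a per-page COW helper consuming the return value. It
is modeled loosely on copy_mc_user_highpage() in include/linux/highmem.h;
the helper name cow_copy_page_mc() is made up for illustration and the
in-tree helper differs in detail:

  #include <linux/highmem.h>
  #include <linux/mm.h>
  #include <linux/uaccess.h>

  static int cow_copy_page_mc(struct page *to, struct page *from)
  {
          char *vfrom = kmap_local_page(from);
          char *vto   = kmap_local_page(to);
          unsigned long rem;

          /* MC-aware copy: a poisoned source load returns non-zero
           * instead of taking the kernel down. */
          rem = copy_mc_to_kernel(vto, vfrom, PAGE_SIZE);

          kunmap_local(vto);
          kunmap_local(vfrom);

          if (rem) {
                  /* Isolate the bad source page and let the caller back
                   * out (e.g. fail the fault with VM_FAULT_HWPOISON). */
                  memory_failure_queue(page_to_pfn(from), 0);
                  return -EHWPOISON;
          }
          return 0;
  }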
This is also why a new copy routine is required rather than reusing
the existing memcpy(). The C contract for memcpy() is
  void *memcpy(void *dst, const void *src, size_t n);
it returns dst unconditionally and has no out-of-band way to tell
the caller whether the copy actually succeeded. MC-aware callers
need exactly that signal - a single "did the hardware raise an
exception during this copy or not" bit - so the API has to be
  unsigned long memcpy_mc(void *dst, const void *src, size_t n);
where the return value serves as the error indicator (0 on success,
non-zero when a load faulted). The fact that the non-zero value
happens to be the remaining byte count is just a useful implementation
detail for optional follow-up work such as poisoning the exact
sub-range; the essential point is that a successful copy and a
failed copy cannot be distinguished from the outside with
memcpy()'s void-pointer return, so a new function is unavoidable.
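In caller terms the difference is just this (sketch only, with
hypothetical dst/src/n buffers):

  unsigned long rem = memcpy_mc(dst, src, n);
  if (rem)
          return -EFAULT;  /* a load faulted; rem is an upper bound on
                              the bytes that were not copied */
  /* rem == 0: the whole n-byte copy completed normally */

whereas memcpy(dst, src, n) gives the caller nothing to test.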
RISC-V previously did not provide either of the copy_mc_* hooks, so
the generic fallback in <linux/uaccess.h> was used:
  static inline unsigned long
  copy_mc_to_kernel(void *dst, const void *src, size_t cnt)
  {
          memcpy(dst, src, cnt);
          return 0;
  }
That fallback has no exception-table entry on the load side, so an
access to poisoned memory (reported through the RAS/AIA path) takes
the kernel down just like any other unhandled fault - defeating the
whole point of the copy_mc_* API and leaving every COW / hugetlb /
THP-collapse / migration path above exposed on RISC-V. A native
implementation that actually stops on the faulting load and signals
the error through its return value is therefore required.
A word on the exception-table entry type used by this patch: x86 and
arm64 both carry dedicated "MC-safe" flavours in their extable
infrastructure. x86 defines EX_TYPE_DEFAULT_MCE_SAFE and
EX_TYPE_FAULT_MCE_SAFE (see arch/x86/include/asm/extable_fixup_types.h
and the fixups in arch/x86/lib/copy_mc_64.S), while arm64 ends up
reusing EX_TYPE_KACCESS_ERR_ZERO for the same purpose. Tempting as
it is to mirror that and introduce a new RISC-V-specific type, it
turns out to buy very little in practice: inspecting
arch/x86/mm/extable.c shows that EX_TYPE_DEFAULT_MCE_SAFE shares its
handler with EX_TYPE_DEFAULT, and EX_TYPE_FAULT_MCE_SAFE shares its
handler with EX_TYPE_FAULT - in every case the fixup simply redirects
PC to the fixup label and lets the caller return. The tag mostly
serves as documentation. Because the fix-up behaviour we need is
identical to that of a plain extable entry, this patch keeps things
simple and uses the existing _asm_extable helper rather than growing
a new EX_TYPE_* constant; if a future RAS integration ever needs to
discriminate MC-safe sites from ordinary ones, the tag can be added
later without touching any call site.
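For reference, the plain-entry fixup on RISC-V really is just "move the
exception PC to the fixup target"; a paraphrased sketch of the handler
in arch/riscv/mm/extable.c (see the tree for the authoritative version):

  /* Paraphrased: plain extable entries resume at their fixup label. */
  static bool ex_handler_fixup(const struct exception_table_entry *ex,
                               struct pt_regs *regs)
  {
          regs->epc = get_ex_fixup(ex);
          return true;
  }

which is exactly the behaviour __memcpy_mc() needs: land on label 6:
and return the error indication.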
Implement it by factoring the existing hand-written memcpy into a
shared template and reusing it for the MC variant:
- memcpy_template.S
The whole body of the original memcpy.S is moved here verbatim,
with every load/store expressed through six parametric macros:
LOAD_B / STORE_B, LOAD_W / STORE_W and LOAD_REG / STORE_REG.
The template is #include'd into a SYM_FUNC_START/END wrapper by
the caller, which also supplies the macro definitions.
- memcpy.S
Now only defines plain lb/sb, lw/sw and REG_L/REG_S macros and
includes the template. The generated code for __memcpy() is
byte-for-byte equivalent to the previous open-coded version, so
the hot path for regular kernel memcpy is unchanged.
- memcpy_mc.S (new)
Defines the same macros, but every *load* is wrapped in a "fixup"
macro that emits an _asm_extable entry pointing at a local label
6:. On the happy path __memcpy_mc() returns 0; on a hardware
error the exception handler jumps to label 6:, which returns a
non-zero value (the byte count still held in a2, a conservative
upper bound on the bytes left uncopied) to flag the failure to the caller.
Signed-off-by: Ruidong Tian <tianruidong at linux.alibaba.com>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/uaccess.h | 25 +++++++
arch/riscv/lib/Makefile | 2 +
arch/riscv/lib/memcpy.S | 113 ++++++----------------------
arch/riscv/lib/memcpy_mc.S | 57 ++++++++++++++
arch/riscv/lib/memcpy_template.S | 124 +++++++++++++++++++++++++++++++
6 files changed, 233 insertions(+), 89 deletions(-)
create mode 100644 arch/riscv/lib/memcpy_mc.S
create mode 100644 arch/riscv/lib/memcpy_template.S
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9fb053839feb..83b4d313fafd 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -25,6 +25,7 @@ config RISCV
select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
select ARCH_HAS_BINFMT_FLAT
+ select ARCH_HAS_COPY_MC
select ARCH_HAS_CURRENT_STACK_POINTER
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select ARCH_HAS_DEBUG_VM_PGTABLE
select ARCH_HAS_DEBUG_WX
diff --git a/arch/riscv/include/asm/uaccess.h b/arch/riscv/include/asm/uaccess.h
index 11c9886c3b70..0d93fff8329a 100644
--- a/arch/riscv/include/asm/uaccess.h
+++ b/arch/riscv/include/asm/uaccess.h
@@ -487,6 +487,31 @@ static inline void user_access_restore(unsigned long enabled) { }
if (__asm_copy_from_user_sum_enabled(_dst, _src, _len)) \
goto label;
+#ifdef CONFIG_ARCH_HAS_COPY_MC
+unsigned long memcpy_mc(void *dst, const void *src, unsigned long n);
+/**
+ * copy_mc_to_kernel/user - memory copy that handles source exceptions
+ *
+ * @to: destination address
+ * @from: source address
+ * @size: number of bytes to copy
+ *
+ * Return 0 for success, or bytes not copied.
+ */
+static inline unsigned long __must_check
+copy_mc_to_kernel(void *to, const void *from, unsigned long size)
+{
+ return memcpy_mc(to, from, size);
+}
+#define copy_mc_to_kernel copy_mc_to_kernel
+
+static inline unsigned long __must_check
+copy_mc_to_user(void __user *to, const void *from, unsigned long size)
+{
+ return raw_copy_to_user(to, from, size);
+}
+#endif
+
#else /* CONFIG_MMU */
#include <asm-generic/uaccess.h>
#endif /* CONFIG_MMU */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 6f767b2a349d..2cad46615640 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -20,3 +20,5 @@ lib-$(CONFIG_64BIT) += tishift.o
lib-$(CONFIG_RISCV_ISA_ZICBOZ) += clear_page.o
obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
lib-$(CONFIG_RISCV_ISA_V) += riscv_v_helpers.o
+
+lib-$(CONFIG_ARCH_HAS_COPY_MC) += memcpy_mc.o
diff --git a/arch/riscv/lib/memcpy.S b/arch/riscv/lib/memcpy.S
index 44e009ec5fef..55216111901e 100644
--- a/arch/riscv/lib/memcpy.S
+++ b/arch/riscv/lib/memcpy.S
@@ -7,102 +7,37 @@
#include <asm/asm.h>
/* void *memcpy(void *, const void *, size_t) */
-SYM_FUNC_START(__memcpy)
- move t6, a0 /* Preserve return value */
- /* Defer to byte-oriented copy for small sizes */
- sltiu a3, a2, 128
- bnez a3, 4f
- /* Use word-oriented copy only if low-order bits match */
- andi a3, t6, SZREG-1
- andi a4, a1, SZREG-1
- bne a3, a4, 4f
+/*
+ * Plain load/store macros for kernel memory copy (no fixup tables).
+ * These are used by memcpy_template.S, which is #include'd below.
+ */
+ .macro LOAD_B reg, offset, ptr
+ lb \reg, \offset(\ptr)
+ .endm
- beqz a3, 2f /* Skip if already aligned */
- /*
- * Round to nearest double word-aligned address
- * greater than or equal to start address
- */
- andi a3, a1, ~(SZREG-1)
- addi a3, a3, SZREG
- /* Handle initial misalignment */
- sub a4, a3, a1
-1:
- lb a5, 0(a1)
- addi a1, a1, 1
- sb a5, 0(t6)
- addi t6, t6, 1
- bltu a1, a3, 1b
- sub a2, a2, a4 /* Update count */
+ .macro STORE_B reg, offset, ptr
+ sb \reg, \offset(\ptr)
+ .endm
-2:
- andi a4, a2, ~((16*SZREG)-1)
- beqz a4, 4f
- add a3, a1, a4
-3:
- REG_L a4, 0(a1)
- REG_L a5, SZREG(a1)
- REG_L a6, 2*SZREG(a1)
- REG_L a7, 3*SZREG(a1)
- REG_L t0, 4*SZREG(a1)
- REG_L t1, 5*SZREG(a1)
- REG_L t2, 6*SZREG(a1)
- REG_L t3, 7*SZREG(a1)
- REG_L t4, 8*SZREG(a1)
- REG_L t5, 9*SZREG(a1)
- REG_S a4, 0(t6)
- REG_S a5, SZREG(t6)
- REG_S a6, 2*SZREG(t6)
- REG_S a7, 3*SZREG(t6)
- REG_S t0, 4*SZREG(t6)
- REG_S t1, 5*SZREG(t6)
- REG_S t2, 6*SZREG(t6)
- REG_S t3, 7*SZREG(t6)
- REG_S t4, 8*SZREG(t6)
- REG_S t5, 9*SZREG(t6)
- REG_L a4, 10*SZREG(a1)
- REG_L a5, 11*SZREG(a1)
- REG_L a6, 12*SZREG(a1)
- REG_L a7, 13*SZREG(a1)
- REG_L t0, 14*SZREG(a1)
- REG_L t1, 15*SZREG(a1)
- addi a1, a1, 16*SZREG
- REG_S a4, 10*SZREG(t6)
- REG_S a5, 11*SZREG(t6)
- REG_S a6, 12*SZREG(t6)
- REG_S a7, 13*SZREG(t6)
- REG_S t0, 14*SZREG(t6)
- REG_S t1, 15*SZREG(t6)
- addi t6, t6, 16*SZREG
- bltu a1, a3, 3b
- andi a2, a2, (16*SZREG)-1 /* Update count */
+ .macro LOAD_REG reg, offset, ptr
+ REG_L \reg, \offset(\ptr)
+ .endm
-4:
- /* Handle trailing misalignment */
- beqz a2, 6f
- add a3, a1, a2
+ .macro STORE_REG reg, offset, ptr
+ REG_S \reg, \offset(\ptr)
+ .endm
- /* Use word-oriented copy if co-aligned to word boundary */
- or a5, a1, t6
- or a5, a5, a3
- andi a5, a5, 3
- bnez a5, 5f
-7:
- lw a4, 0(a1)
- addi a1, a1, 4
- sw a4, 0(t6)
- addi t6, t6, 4
- bltu a1, a3, 7b
+ .macro LOAD_W reg, offset, ptr
+ lw \reg, \offset(\ptr)
+ .endm
- ret
+ .macro STORE_W reg, offset, ptr
+ sw \reg, \offset(\ptr)
+ .endm
-5:
- lb a4, 0(a1)
- addi a1, a1, 1
- sb a4, 0(t6)
- addi t6, t6, 1
- bltu a1, a3, 5b
-6:
+SYM_FUNC_START(__memcpy)
+#include "memcpy_template.S"
ret
SYM_FUNC_END(__memcpy)
SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy)
diff --git a/arch/riscv/lib/memcpy_mc.S b/arch/riscv/lib/memcpy_mc.S
new file mode 100644
index 000000000000..76825cabfc16
--- /dev/null
+++ b/arch/riscv/lib/memcpy_mc.S
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2013 Regents of the University of California
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/asm-extable.h>
+
+ .macro fixup op reg addr lbl
+100:
+ \op \reg, \addr
+ _asm_extable 100b, \lbl
+ .endm
+
+/* unsigned long memcpy_mc(void *dst, const void *src, unsigned long n) */
+
+/*
+ * Load macros with fixup for MC-aware copy: only source (load) operations
+ * are protected by exception table entries, since the destination buffer
+ * is known-good kernel memory. Store operations are plain.
+ */
+ .macro LOAD_B reg, offset, ptr
+ fixup lb \reg, \offset(\ptr), 6f
+ .endm
+
+ .macro STORE_B reg, offset, ptr
+ sb \reg, \offset(\ptr)
+ .endm
+
+ .macro LOAD_REG reg, offset, ptr
+ fixup REG_L \reg, \offset(\ptr), 6f
+ .endm
+
+ .macro STORE_REG reg, offset, ptr
+ REG_S \reg, \offset(\ptr)
+ .endm
+
+ .macro LOAD_W reg, offset, ptr
+ fixup lw \reg, \offset(\ptr), 6f
+ .endm
+
+ .macro STORE_W reg, offset, ptr
+ sw \reg, \offset(\ptr)
+ .endm
+
+SYM_FUNC_START(__memcpy_mc)
+#include "memcpy_template.S"
+ move a0, zero
+ ret
+6:
+ move a0, a2
+ ret
+SYM_FUNC_END(__memcpy_mc)
+SYM_FUNC_ALIAS_WEAK(memcpy_mc, __memcpy_mc)
+SYM_FUNC_ALIAS(__pi_memcpy_mc, __memcpy_mc)
+SYM_FUNC_ALIAS(__pi___memcpy_mc, __memcpy_mc)
diff --git a/arch/riscv/lib/memcpy_template.S b/arch/riscv/lib/memcpy_template.S
new file mode 100644
index 000000000000..6d9a34e652d0
--- /dev/null
+++ b/arch/riscv/lib/memcpy_template.S
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2013 Regents of the University of California
+ *
+ * RISC-V memory copy template. This file is #include'd by memcpy.S and
+ * memcpy_mc.S. The includer must define the following macros before the
+ * #include:
+ *
+ * LOAD_B(reg, offset, ptr) - byte load from source
+ * STORE_B(reg, offset, ptr) - byte store to destination
+ * LOAD_REG(reg, offset, ptr) - SZREG-sized load from source
+ * STORE_REG(reg, offset, ptr) - SZREG-sized store to destination
+ * LOAD_W(reg, offset, ptr) - 32-bit word load from source
+ * STORE_W(reg, offset, ptr) - 32-bit word store to destination
+ *
+ * For MC-aware copies, LOAD_* macros should use fixup with label "6f".
+ * The includer must provide the "6:" fixup handler that returns the
+ * number of bytes not copied in a0.
+ *
+ * Register usage (a0/a1/a2 hold the standard memcpy arguments on entry):
+ *   a0 - original destination address (also the return value)
+ *   a1 - source pointer (advanced during the copy)
+ *   a2 - number of bytes to copy / remaining count
+ *   t6 - working destination pointer
+ */
+
+ move t6, a0 /* Preserve return value */
+
+ /* Defer to byte-oriented copy for small sizes */
+ sltiu a3, a2, 128
+ bnez a3, .Ltail
+
+ /* Use word-oriented copy only if low-order bits match */
+ andi a3, t6, SZREG-1
+ andi a4, a1, SZREG-1
+ bne a3, a4, .Ltail
+
+ beqz a3, .Laligned /* Skip if already aligned */
+ /*
+ * Round to nearest double word-aligned address
+ * greater than or equal to start address
+ */
+ andi a3, a1, ~(SZREG-1)
+ addi a3, a3, SZREG
+ /* Handle initial misalignment */
+ sub a4, a3, a1
+.Llead_copy:
+ LOAD_B a5, 0, a1
+ addi a1, a1, 1
+ STORE_B a5, 0, t6
+ addi t6, t6, 1
+ bltu a1, a3, .Llead_copy
+ sub a2, a2, a4 /* Update count */
+
+.Laligned:
+ andi a4, a2, ~((16*SZREG)-1)
+ beqz a4, .Ltail
+ add a3, a1, a4
+.Lunrolled_loop:
+ LOAD_REG a4, 0, a1
+ LOAD_REG a5, SZREG, a1
+ LOAD_REG a6, 2*SZREG, a1
+ LOAD_REG a7, 3*SZREG, a1
+ LOAD_REG t0, 4*SZREG, a1
+ LOAD_REG t1, 5*SZREG, a1
+ LOAD_REG t2, 6*SZREG, a1
+ LOAD_REG t3, 7*SZREG, a1
+ LOAD_REG t4, 8*SZREG, a1
+ LOAD_REG t5, 9*SZREG, a1
+ STORE_REG a4, 0, t6
+ STORE_REG a5, SZREG, t6
+ STORE_REG a6, 2*SZREG, t6
+ STORE_REG a7, 3*SZREG, t6
+ STORE_REG t0, 4*SZREG, t6
+ STORE_REG t1, 5*SZREG, t6
+ STORE_REG t2, 6*SZREG, t6
+ STORE_REG t3, 7*SZREG, t6
+ STORE_REG t4, 8*SZREG, t6
+ STORE_REG t5, 9*SZREG, t6
+ LOAD_REG a4, 10*SZREG, a1
+ LOAD_REG a5, 11*SZREG, a1
+ LOAD_REG a6, 12*SZREG, a1
+ LOAD_REG a7, 13*SZREG, a1
+ LOAD_REG t0, 14*SZREG, a1
+ LOAD_REG t1, 15*SZREG, a1
+ addi a1, a1, 16*SZREG
+ STORE_REG a4, 10*SZREG, t6
+ STORE_REG a5, 11*SZREG, t6
+ STORE_REG a6, 12*SZREG, t6
+ STORE_REG a7, 13*SZREG, t6
+ STORE_REG t0, 14*SZREG, t6
+ STORE_REG t1, 15*SZREG, t6
+ addi t6, t6, 16*SZREG
+ bltu a1, a3, .Lunrolled_loop
+ andi a2, a2, (16*SZREG)-1 /* Update count */
+
+.Ltail:
+ /* Handle trailing misalignment */
+ beqz a2, .Lcopy_done
+ add a3, a1, a2
+
+ /* Use word-oriented copy if co-aligned to word boundary */
+ or a5, a1, t6
+ or a5, a5, a3
+ andi a5, a5, 3
+ bnez a5, .Lbyte_tail
+
+.Lword_tail:
+ LOAD_W a4, 0, a1
+ addi a1, a1, 4
+ STORE_W a4, 0, t6
+ addi t6, t6, 4
+ bltu a1, a3, .Lword_tail
+ j .Lcopy_done
+
+.Lbyte_tail:
+ LOAD_B a4, 0, a1
+ addi a1, a1, 1
+ STORE_B a4, 0, t6
+ addi t6, t6, 1
+ bltu a1, a3, .Lbyte_tail
+
+.Lcopy_done:
+ /* Fall through to includer's return or fixup code */
--
2.51.2.612.gdc70283dfc