[PATCH RFC] riscv: disable local interrupts and stop other CPUs before restart
Samuel Holland
samuel.holland at sifive.com
Mon Mar 16 21:07:20 PDT 2026
Hi Troy,
On 2026-03-16 9:45 PM, Troy Mitchell wrote:
> On Mon Mar 16, 2026 at 9:19 PM CST, Samuel Holland wrote:
>> On 2026-03-16 2:23 AM, Troy Mitchell wrote:
>>> On Thu Mar 12, 2026 at 11:05 AM CST, Vivian Wang wrote:
>>>> On 3/11/26 10:51, Troy Mitchell wrote:
>>>>> Currently, the RISC-V implementation of machine_restart() directly calls
>>>>> do_kernel_restart() without disabling local interrupts or stopping other
>>>>> CPUs. This missing architectural setup causes fatal issues for systems
>>>>> that rely on external peripherals (e.g., I2C PMICs) to execute the system
>>>>> restart when CONFIG_PREEMPT_RCU is enabled.
>>>>>
>>>>> When a restart handler relies on the I2C subsystem, the I2C core checks
>>>>> i2c_in_atomic_xfer_mode() to decide whether to use the sleepable xfer
>>>>> or the polling atomic_xfer. This check evaluates to true if
>>>>> (!preemptible() || irqs_disabled()).
>>>>>
>>>>> During do_kernel_restart(), the restart handlers are invoked via
>>>>> atomic_notifier_call_chain(), which holds an RCU read lock.
>>>>> The behavior diverges based on the preemption model:
>>>>> 1. Under CONFIG_PREEMPT_VOLUNTARY or CONFIG_PREEMPT_NONE, rcu_read_lock()
>>>>> implicitly disables preemption. preemptible() evaluates to false, and
>>>>> the I2C core correctly routes to the atomic, polling transfer path.
>>>>> 2. Under CONFIG_PREEMPT_RCU, rcu_read_lock() does NOT disable preemption.
>>>>> Since machine_restart() left local interrupts enabled, irqs_disabled()
>>>>> is false, and preempt_count is 0. Consequently, preemptible() evaluates
>>>>> to true.
>>>>>
>>>>> As a result, the I2C core falsely assumes a sleepable context and routes
>>>>> the transfer to the standard master_xfer path. This inevitably triggers a
>>>>> schedule() call while holding the RCU read lock, resulting in a fatal splat:
>>>>> "Voluntary context switch within RCU read-side critical section!" and
>>>>> a system hang.
>>>>>
>>>>> Align RISC-V with other major architectures (e.g., ARM64) by adding
>>>>> local_irq_disable() and smp_send_stop() to machine_restart().
>>>>> - local_irq_disable() guarantees a strict atomic context, forcing sub-
>>>>> systems like I2C to always fall back to polling mode.
>>>>> - smp_send_stop() ensures exclusive hardware access by quiescing other
>>>>> CPUs, preventing them from holding bus locks (e.g., I2C spinlocks)
>>>>> during the final restart phase.
>>>>
>>>> Maybe while we're at it, we can fix the other functions in this file as
>>>> well?
>>> Nice catch. I'll fix other functions in the next version.
>>>>
>>>> I think the reason we ended up with the "unsafe" implementations of the
>>>> reboot/shutdown functions is that on the backend it is usually SBI SRST
>>>> calls, which can protect itself from other CPUs and interrupts. Since on
>>>> K1 we're going to be poking I2C directly, we run into the problem
>>>> described above. So all of these should disable interrupts and stop
>>>> other CPUs before calling the handlers, and can't assume the handlers
>>>> are all SBI SRST.
>>> Yes, we cannot assume that all platforms rely on this.
>>
>> Why isn't K1 using the SBI SRST extension? Resetting the platform from S-mode
>> directly causes problems if you ever want to run another supervisor domain (for
>> example a TEE or EFI runtime services), which may need to clean up before a
>> system reset.
>>
>> As you mention, other platforms use the standard SBI SRST interface, event if
>> they need to poke a PMIC to perform a system reset. Is there something
>> preventing K1 from following this path?
>
> Ideally, yes, resetting the platform should be handled by the firmware (SBI SRST) to
> properly tear down M-mode/TEE states before pulling the plug.
>
> However, for the K1 platform, while the SoC itself can be reset via SBI SRST, this warm
> reset does not propagate to the external PMIC. For instance, if the kernel switches the
> SD card to 1.8V signaling for high-speed mode during runtime, an SoC-only reset leaves
> the PMIC still supplying 1.8V. Upon reboot, the bootrom/early kernel expects the SD card
> to be in the default 3.3V state for initialization, which inevitably leads to a boot hang.
Right, so you would need a driver in M-mode that performs a PMIC-level reset
instead of a SoC-level reset when the SBI SRST function is called. Several
platforms already do this.
> Setting the K1-specific hardware design aside, I believe this patch is fundamentally necessary
> to fix a latent bug in the RISC-V Linux kernel itself, regardless of whether the final
> reset is backed by SBI or not.
>
> The core issue here is the context in which do_kernel_restart() invokes the restart_handler_list.
> It does so via an atomic_notifier_call_chain. If RISC-V enters this phase without disabling local IRQs,
> we are leaving a trap for any generic driver or kernel subsystem invoked during this teardown phase.
>
> Under CONFIG_PREEMPT_RCU, the RCU read lock does not bump the preempt count.
> Without local_irq_disable(), the preemptible() check will unexpectedly evaluate to true.
>
> A quick grep shows that preemptible() is widely relied upon by generic subsystems to
> determine safe execution paths (~16 occurrences in kernel/ and ~11 in drivers/).
> This means generic code implicitly trusts the architecture to set up the correct
> atomic context during a system restart.
I agree with the conceptual issue, that do_kernel_restart() should be called
from an atomic context, but may not be. But there seems to be no practical
problem if the first/only entry in restart_handler_list is sbi_srst_reboot(),
which is why we haven't seen problems on other platforms.
> The I2C PMIC crash we encountered is just one symptom of this missing context. If we don't
> fix this at the architecture level, it leaves the door open for other undiscovered panics.
> Any driver (watchdog, display, network, etc.) that registers a restart handler and internally
> relies on preemptible() to choose between a sleepable or polling path will inevitably trigger
> a schedule(), attempt to acquire a sleeping lock (e.g., a mutex), or call other blocking functions
> while holding the RCU read lock, resulting in a fatal splat.
Are you possibly confusing restart_handler_list and reboot_notifier_list here? A
display or network driver would not register a restart handler. And similarly to
the PMIC case, I would expect a platform that resets by watchdog to have a
M-mode driver for it and still use the SBI SRST extension.
> *Aligning machine_restart() with other architectures* by adding local_irq_disable() and smp_send_stop()
> ensures a deterministic, single-threaded atomic context.
> This protects the Linux teardown sequence from SMP deadlocks and context misidentification,
> well before the control is ever handed over to the firmware or hardware.
>
> Does this perspective make sense?
Yes, I'm not opposed to making this change, because it is conceptually the right
thing to do. I'm only questioning why you are in a position to notice this issue
in the first place: you only see this because Linux is driving the PMIC directly
instead of going through firmware.
Regards,
Samuel
More information about the linux-riscv
mailing list