Shutdown problem on SMP systems, seen on Tegra20
bilhuang at nvidia.com
Fri Aug 24 04:23:39 EDT 2012
When shutting down Tegra20/Tegra30, we need to read/write PMIC registers over I2C
to perform the power-off sequence. Unfortunately, shutdown sometimes fails on Tegra20
because of an I2C timeout. The timeout happens because the CPU that the I2C controller
IRQ is affined to can be taken offline without the IRQs affined to it being migrated,
so the following I2C transactions fail (no CPU is left to handle that interrupt).
A snippet of the shutdown code:
pm_power_off(); /* this is where we send I2C write to shutdown */
In smp_send_stop(), IPI_CPU_STOP is sent to take all CPUs offline except the current
one (smp_processor_id()). However, the current CPU is not always cpu0, at least on
Tegra20: for example, cpu1 might be the current CPU, so cpu0 gets taken offline, and
that is why the I2C transaction times out.
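To make the failure sequence concrete, here is a minimal user-space model in C (not kernel code). It assumes a two-CPU system, that the I2C controller IRQ is affined to cpu0, and that "offline" simply means a CPU no longer handles interrupts. `struct smp_model`, `model_send_stop()`, `model_power_off()` and `model_shutdown()` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code. Assumes a 2-CPU
 * system (like the dual-core Tegra20) with the I2C IRQ on cpu0. */
#define MODEL_NR_CPUS 2

struct smp_model {
	bool online[MODEL_NR_CPUS]; /* true if the CPU still takes IRQs */
	int i2c_irq_cpu;            /* CPU the I2C controller IRQ targets */
};

/* Models smp_send_stop(): every CPU except 'cur' stops, and IRQ
 * affinity is left unchanged (nothing migrates the I2C IRQ). */
static void model_send_stop(struct smp_model *m, int cur)
{
	assert(cur >= 0 && cur < MODEL_NR_CPUS);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != cur)
			m->online[cpu] = false;
}

/* Models pm_power_off(): the I2C write to the PMIC completes only if
 * the CPU targeted by the I2C IRQ is still online; otherwise the
 * transfer would time out. */
static bool model_power_off(const struct smp_model *m)
{
	return m->online[m->i2c_irq_cpu];
}

/* Full sequence with the shutdown running on CPU 'cur'. */
static bool model_shutdown(int cur)
{
	struct smp_model m = { .online = { true, true }, .i2c_irq_cpu = 0 };

	model_send_stop(&m, cur);
	return model_power_off(&m);
}
```

In this model, model_shutdown(0) returns true, while model_shutdown(1) returns false: cpu0, which owns the I2C IRQ, has been stopped, matching the Tegra20 timeout described above.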
In the normal case, disable_nonboot_cpus() takes all CPUs except cpu0 offline, so we
would not hit this problem, because cpu0 would always be the current CPU when
smp_send_stop() is called. However, disable_nonboot_cpus() only does this when
CONFIG_PM_SLEEP_SMP is enabled, which is not the case for Tegra20/Tegra30: we don't
support suspend yet, so it can't be enabled.
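The difference between the two configurations can be sketched with the same kind of user-space model (again not kernel code, with hypothetical names). It assumes two CPUs, the I2C IRQ initially on cpu0, that a proper CPU hot-unplug moves IRQs off the dying CPU, and that after disable_nonboot_cpus() the caller continues on cpu0. The `pm_sleep_smp` flag stands in for CONFIG_PM_SLEEP_SMP.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code: 2 CPUs, I2C IRQ
 * initially on cpu0. */
#define MODEL_NR_CPUS 2

struct smp_model {
	bool online[MODEL_NR_CPUS]; /* true if the CPU still takes IRQs */
	int i2c_irq_cpu;            /* CPU the I2C controller IRQ targets */
};

/* Models a proper CPU hot-unplug: an IRQ targeting 'cpu' is moved to
 * another online CPU when 'cpu' goes down. */
static void model_cpu_down(struct smp_model *m, int cpu)
{
	m->online[cpu] = false;
	if (m->i2c_irq_cpu != cpu)
		return;
	for (int t = 0; t < MODEL_NR_CPUS; t++)
		if (m->online[t]) {
			m->i2c_irq_cpu = t;
			return;
		}
}

/* Models disable_nonboot_cpus() with CONFIG_PM_SLEEP_SMP=y: all CPUs
 * but cpu0 are hot-unplugged, and the caller continues on cpu0.
 * Returns the CPU the caller now runs on. */
static int model_disable_nonboot_cpus(struct smp_model *m)
{
	for (int cpu = 1; cpu < MODEL_NR_CPUS; cpu++)
		model_cpu_down(m, cpu);
	return 0;
}

/* Models smp_send_stop(): stops every CPU but 'cur', no IRQ migration. */
static void model_send_stop(struct smp_model *m, int cur)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != cur)
			m->online[cpu] = false;
}

/* Shutdown started on 'start_cpu'. Returns true if the PMIC write
 * over I2C in pm_power_off() could complete. */
static bool model_shutdown(int start_cpu, bool pm_sleep_smp)
{
	struct smp_model m = { .online = { true, true }, .i2c_irq_cpu = 0 };
	int cur = start_cpu;

	if (pm_sleep_smp) /* otherwise disable_nonboot_cpus() does nothing */
		cur = model_disable_nonboot_cpus(&m);
	model_send_stop(&m, cur);
	return m.online[m.i2c_irq_cpu];
}
```

With `pm_sleep_smp` true, the model succeeds whichever CPU started the shutdown; with it false, only a shutdown that happens to run on cpu0 succeeds, which is the Tegra20 situation.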
There are two known fixes for this. The first is to enable suspend
(ARCH_SUSPEND_POSSIBLE), so that cpu0 is the only online CPU during
machine_shutdown(). The second is to add a call to migrate_irqs() in ipi_cpu_stop(),
so that all IRQs are migrated to the active CPU.
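The idea behind the second fix can be illustrated with a small user-space model (not kernel code; all names are hypothetical). It assumes two CPUs and the I2C IRQ on cpu0. A stopping CPU first marks itself offline and then hands any IRQ still targeting it to an online CPU, which is the job migrate_irqs() would be doing in ipi_cpu_stop(). The `migrate` flag toggles the proposed change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code: 2 CPUs, I2C IRQ
 * initially on cpu0. */
#define MODEL_NR_CPUS 2

struct smp_model {
	bool online[MODEL_NR_CPUS]; /* true if the CPU still takes IRQs */
	int i2c_irq_cpu;            /* CPU the I2C controller IRQ targets */
};

/* Models ipi_cpu_stop() on 'cpu'. Marking the CPU offline first means
 * it cannot pick itself as the migration target. */
static void model_ipi_cpu_stop(struct smp_model *m, int cpu, bool migrate)
{
	m->online[cpu] = false;
	if (!migrate || m->i2c_irq_cpu != cpu)
		return;
	for (int t = 0; t < MODEL_NR_CPUS; t++)
		if (m->online[t]) {
			m->i2c_irq_cpu = t; /* IRQ now served by an online CPU */
			return;
		}
}

/* Models smp_send_stop() from CPU 'cur': every other CPU runs
 * ipi_cpu_stop(). Returns true if pm_power_off()'s I2C write could
 * complete. */
static bool model_shutdown(int cur, bool migrate)
{
	struct smp_model m = { .online = { true, true }, .i2c_irq_cpu = 0 };

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != cur)
			model_ipi_cpu_stop(&m, cpu, migrate);
	return m.online[m.i2c_irq_cpu];
}
```

The model runs the stops one after another; on real hardware the other CPUs handle the IPI concurrently, so it only shows the intended end state (the I2C IRQ ends up on the CPU still running the shutdown), not the locking a real patch would need.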
Could someone familiar with the ARM SMP design help answer my two questions?
1. Does it make sense that smp_processor_id() can return a CPU other than cpu0 in
smp_send_stop()? On Tegra30 it is always cpu0, but on Tegra20 it is 50-50, and I
can't find the reason.
2. If the current CPU is not necessarily cpu0 in smp_send_stop(), does it make sense
to add migrate_irqs() to ipi_cpu_stop()? Or is there another fix that makes more
sense?
More information about the linux-arm-kernel mailing list