[PATCH AUTOSEL 6.6 54/66] genirq: Retain disable depth for managed interrupts across CPU hotplug

Sasha Levin sashal at kernel.org
Sun Jun 1 16:37:31 PDT 2025


From: Brian Norris <briannorris at chromium.org>

[ Upstream commit 788019eb559fd0b365f501467ceafce540e377cc ]

Affinity-managed interrupts can be shut down and restarted during CPU
hotunplug/plug. Thereby the interrupt may be left in an unexpected state.
Specifically:

 1. Interrupt is affine to CPU N
 2. disable_irq() -> depth is 1
 3. CPU N goes offline
 4. irq_shutdown() -> depth is set to 1 (again)
 5. CPU N goes online
 6. irq_startup() -> depth is set to 0 (BUG! driver expects that the interrupt
    		     	      	        still disabled)
 7. enable_irq() -> depth underflow / unbalanced enable_irq() warning

This is only a problem for managed interrupts and CPU hotplug, all other
cases like request()/free()/request() truly needs to reset a possibly stale
disable depth value.

Provide a startup function, which takes the disable depth into account, and
invoked it for the managed interrupts in the CPU hotplug path.

This requires to change irq_shutdown() to do a depth increment instead of
setting it to 1, which allows to retain the disable depth, but is harmless
for the other code paths using irq_startup(), which will still reset the
disable depth unconditionally to keep the original correct behaviour.

A kunit tests will be added separately to cover some of these aspects.

[ tglx: Massaged changelog ]

Suggested-by: Thomas Gleixner <tglx at linutronix.de>
Signed-off-by: Brian Norris <briannorris at chromium.org>
Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
Link: https://lore.kernel.org/all/20250514201353.3481400-2-briannorris@chromium.org
Signed-off-by: Sasha Levin <sashal at kernel.org>
---

Based on my analysis of the commit and the current kernel code, here is
my assessment: **YES** This commit should be backported to stable kernel
trees. Here's my extensive analysis: ## Bug Analysis The commit fixes a
critical bug in the interrupt subsystem's CPU hotplug handling that
affects affinity-managed interrupts. The bug manifests as follows: 1.
**Problem Scenario**: When a managed interrupt is disabled (depth=1),
then its CPU goes offline causing `irq_shutdown()`, and later the CPU
comes back online triggering `irq_startup()`, the disable depth
accounting becomes corrupted. 2. **Current Broken Behavior** (from
existing kernel code): - `irq_shutdown()` sets `desc->depth = 1`
unconditionally (line seen in grep output) - `irq_startup()` sets
`desc->depth = 0` unconditionally - This overwrites any previous disable
state, violating the disable/enable balance 3. **User-Visible Impact**:
The bug causes: - Unbalanced `enable_irq()` warnings - Interrupts being
unexpectedly enabled during CPU hotplug - Potential system instability
or incorrect device behavior ## Code Changes Analysis The fix makes
surgical changes to preserve disable depth: ### 1. **irq_shutdown()
Change** (kernel/irq/chip.c): ```c - desc->depth = 1; + /bin /bin.usr-
is-merged /boot /dev /etc /home /init /lib /lib.usr-is-merged /lib64
/lost+found /media /mnt /opt /proc /root /run /sbin /sbin.usr-is-merged
/snap /srv /sys /tmp /usr /var +
amd_display_timing_generator_analysis.md
amd_doorbell_analysis_framework.md
amd_gpu_documentation_backport_analysis.md amd_si_chip_analysis.md
amdgpu_csb_analysis.md amdgpu_mqd_prop_backport_analysis.md
analogix_dp_backport_analysis.md cgroup_rstat_analysis.sh
cgroup_rstat_analysis_guide.md cpuset_comment_fix_backport_analysis.md
csiphy_x1e80100_backport_analysis.md
dcn315_smu_indirect_register_access_analysis.md
drm_dp_helper_dpcd_refactoring_backport_analysis.md
drm_format_helper_generic_8bit_conversion_backport_analysis.md
es8375_codec_driver_backport_analysis.md expected_findings_summary.md
hdmi_read_request_analysis.md kernel_analysis_commands.sh
kselftest_harness_teardown_metadata_backport_analysis.md linux-kernel
mediatek_vcodec_h264_backport_analysis.md
pidfd_open_kselftest_harness_compatibility_analysis.md
qualcomm_camss_bus_type_backport_analysis.md
selftests_harness_libatomic_backport_analysis.md
sphinx_version_compatibility_backport_analysis.md
spi_sh_msiof_grpmask_backport_analysis.md
spi_sh_msiof_sitmdr2_sirmdr2_bitfield_conversion_backport_analysis.md
sta2x11_removal_analysis.md test_unused_param.c test_unused_param.o
ti_bridge_encoder_crtc_refactoring_backport_analysis.md
xe_driver_flr_vf_restriction_analysis.md
xe_firmware_per_gt_backport_analysis.md
xe_sriov_logging_backport_analysis.md Increment disable depth, so that a
managed shutdown on + amd_display_timing_generator_analysis.md
amd_doorbell_analysis_framework.md
amd_gpu_documentation_backport_analysis.md amd_si_chip_analysis.md
amdgpu_csb_analysis.md amdgpu_mqd_prop_backport_analysis.md
analogix_dp_backport_analysis.md cgroup_rstat_analysis.sh
cgroup_rstat_analysis_guide.md cpuset_comment_fix_backport_analysis.md
csiphy_x1e80100_backport_analysis.md
dcn315_smu_indirect_register_access_analysis.md
drm_dp_helper_dpcd_refactoring_backport_analysis.md
drm_format_helper_generic_8bit_conversion_backport_analysis.md
es8375_codec_driver_backport_analysis.md expected_findings_summary.md
hdmi_read_request_analysis.md kernel_analysis_commands.sh
kselftest_harness_teardown_metadata_backport_analysis.md linux-kernel
mediatek_vcodec_h264_backport_analysis.md
pidfd_open_kselftest_harness_compatibility_analysis.md
qualcomm_camss_bus_type_backport_analysis.md
selftests_harness_libatomic_backport_analysis.md
sphinx_version_compatibility_backport_analysis.md
spi_sh_msiof_grpmask_backport_analysis.md
spi_sh_msiof_sitmdr2_sirmdr2_bitfield_conversion_backport_analysis.md
sta2x11_removal_analysis.md test_unused_param.c test_unused_param.o
ti_bridge_encoder_crtc_refactoring_backport_analysis.md
xe_driver_flr_vf_restriction_analysis.md
xe_firmware_per_gt_backport_analysis.md
xe_sriov_logging_backport_analysis.md CPU hotunplug preserves the actual
disabled state when the + amd_display_timing_generator_analysis.md
amd_doorbell_analysis_framework.md
amd_gpu_documentation_backport_analysis.md amd_si_chip_analysis.md
amdgpu_csb_analysis.md amdgpu_mqd_prop_backport_analysis.md
analogix_dp_backport_analysis.md cgroup_rstat_analysis.sh
cgroup_rstat_analysis_guide.md cpuset_comment_fix_backport_analysis.md
csiphy_x1e80100_backport_analysis.md
dcn315_smu_indirect_register_access_analysis.md
drm_dp_helper_dpcd_refactoring_backport_analysis.md
drm_format_helper_generic_8bit_conversion_backport_analysis.md
es8375_codec_driver_backport_analysis.md expected_findings_summary.md
hdmi_read_request_analysis.md kernel_analysis_commands.sh
kselftest_harness_teardown_metadata_backport_analysis.md linux-kernel
mediatek_vcodec_h264_backport_analysis.md
pidfd_open_kselftest_harness_compatibility_analysis.md
qualcomm_camss_bus_type_backport_analysis.md
selftests_harness_libatomic_backport_analysis.md
sphinx_version_compatibility_backport_analysis.md
spi_sh_msiof_grpmask_backport_analysis.md
spi_sh_msiof_sitmdr2_sirmdr2_bitfield_conversion_backport_analysis.md
sta2x11_removal_analysis.md test_unused_param.c test_unused_param.o
ti_bridge_encoder_crtc_refactoring_backport_analysis.md
xe_driver_flr_vf_restriction_analysis.md
xe_firmware_per_gt_backport_analysis.md
xe_sriov_logging_backport_analysis.md CPU comes back online. See
irq_startup_managed(). + linux-kernel/ + desc->depth++; ``` This
preserves the original disable state instead of overwriting it. ### 2.
**New irq_startup_managed() Function**: ```c +void
irq_startup_managed(struct irq_desc *desc) +{ + /bin /bin.usr-is-merged
/boot /dev /etc /home /init /lib /lib.usr-is-merged /lib64 /lost+found
/media /mnt /opt /proc /root /run /sbin /sbin.usr-is-merged /snap /srv
/sys /tmp /usr /var + amd_display_timing_generator_analysis.md
amd_doorbell_analysis_framework.md
amd_gpu_documentation_backport_analysis.md amd_si_chip_analysis.md
amdgpu_csb_analysis.md amdgpu_mqd_prop_backport_analysis.md
analogix_dp_backport_analysis.md cgroup_rstat_analysis.sh
cgroup_rstat_analysis_guide.md cpuset_comment_fix_backport_analysis.md
csiphy_x1e80100_backport_analysis.md
dcn315_smu_indirect_register_access_analysis.md
drm_dp_helper_dpcd_refactoring_backport_analysis.md
drm_format_helper_generic_8bit_conversion_backport_analysis.md
es8375_codec_driver_backport_analysis.md expected_findings_summary.md
hdmi_read_request_analysis.md kernel_analysis_commands.sh
kselftest_harness_teardown_metadata_backport_analysis.md linux-kernel
mediatek_vcodec_h264_backport_analysis.md
pidfd_open_kselftest_harness_compatibility_analysis.md
qualcomm_camss_bus_type_backport_analysis.md
selftests_harness_libatomic_backport_analysis.md
sphinx_version_compatibility_backport_analysis.md
spi_sh_msiof_grpmask_backport_analysis.md
spi_sh_msiof_sitmdr2_sirmdr2_bitfield_conversion_backport_analysis.md
sta2x11_removal_analysis.md test_unused_param.c test_unused_param.o
ti_bridge_encoder_crtc_refactoring_backport_analysis.md
xe_driver_flr_vf_restriction_analysis.md
xe_firmware_per_gt_backport_analysis.md
xe_sriov_logging_backport_analysis.md Only start it up when the disable
depth is 1, so that a disable, +
amd_display_timing_generator_analysis.md
amd_doorbell_analysis_framework.md
amd_gpu_documentation_backport_analysis.md amd_si_chip_analysis.md
amdgpu_csb_analysis.md amdgpu_mqd_prop_backport_analysis.md
analogix_dp_backport_analysis.md cgroup_rstat_analysis.sh
cgroup_rstat_analysis_guide.md cpuset_comment_fix_backport_analysis.md
csiphy_x1e80100_backport_analysis.md
dcn315_smu_indirect_register_access_analysis.md
drm_dp_helper_dpcd_refactoring_backport_analysis.md
drm_format_helper_generic_8bit_conversion_backport_analysis.md
es8375_codec_driver_backport_analysis.md expected_findings_summary.md
hdmi_read_request_analysis.md kernel_analysis_commands.sh
kselftest_harness_teardown_metadata_backport_analysis.md linux-kernel
mediatek_vcodec_h264_backport_analysis.md
pidfd_open_kselftest_harness_compatibility_analysis.md
qualcomm_camss_bus_type_backport_analysis.md
selftests_harness_libatomic_backport_analysis.md
sphinx_version_compatibility_backport_analysis.md
spi_sh_msiof_grpmask_backport_analysis.md
spi_sh_msiof_sitmdr2_sirmdr2_bitfield_conversion_backport_analysis.md
sta2x11_removal_analysis.md test_unused_param.c test_unused_param.o
ti_bridge_encoder_crtc_refactoring_backport_analysis.md
xe_driver_flr_vf_restriction_analysis.md
xe_firmware_per_gt_backport_analysis.md
xe_sriov_logging_backport_analysis.md hotunplug, hotplug sequence does
not end up enabling it during + amd_display_timing_generator_analysis.md
amd_doorbell_analysis_framework.md
amd_gpu_documentation_backport_analysis.md amd_si_chip_analysis.md
amdgpu_csb_analysis.md amdgpu_mqd_prop_backport_analysis.md
analogix_dp_backport_analysis.md cgroup_rstat_analysis.sh
cgroup_rstat_analysis_guide.md cpuset_comment_fix_backport_analysis.md
csiphy_x1e80100_backport_analysis.md
dcn315_smu_indirect_register_access_analysis.md
drm_dp_helper_dpcd_refactoring_backport_analysis.md
drm_format_helper_generic_8bit_conversion_backport_analysis.md
es8375_codec_driver_backport_analysis.md expected_findings_summary.md
hdmi_read_request_analysis.md kernel_analysis_commands.sh
kselftest_harness_teardown_metadata_backport_analysis.md linux-kernel
mediatek_vcodec_h264_backport_analysis.md
pidfd_open_kselftest_harness_compatibility_analysis.md
qualcomm_camss_bus_type_backport_analysis.md
selftests_harness_libatomic_backport_analysis.md
sphinx_version_compatibility_backport_analysis.md
spi_sh_msiof_grpmask_backport_analysis.md
spi_sh_msiof_sitmdr2_sirmdr2_bitfield_conversion_backport_analysis.md
sta2x11_removal_analysis.md test_unused_param.c test_unused_param.o
ti_bridge_encoder_crtc_refactoring_backport_analysis.md
xe_driver_flr_vf_restriction_analysis.md
xe_firmware_per_gt_backport_analysis.md
xe_sriov_logging_backport_analysis.md hotplug unconditionally. + linux-
kernel/ + desc->depth--; + if (!desc->depth) + irq_startup(desc,
IRQ_RESEND, IRQ_START_COND); +} ``` This provides proper depth-aware
startup specifically for managed interrupts. ### 3. **IRQ_STARTUP_ABORT
Case Update**: ```c case IRQ_STARTUP_ABORT: + desc->depth = 1;
irqd_set_managed_shutdown(d); return 0; ``` Ensures consistent depth
handling in abort cases. ### 4. **CPU Hotplug Integration**
(kernel/irq/cpuhotplug.c): ```c - irq_startup(desc, IRQ_RESEND,
IRQ_START_COND); + irq_startup_managed(desc); ``` Uses the new depth-
aware function for managed interrupt restoration. ## Backport
Suitability Assessment **Meets All Stable Criteria:** 1. **Fixes
Important Bug**: ✅ Corrects interrupt enable/disable balance corruption
2. **Small & Contained**: ✅ Minimal changes confined to IRQ subsystem 3.
**Low Regression Risk**: ✅ Only affects managed interrupts during CPU
hotplug 4. **No New Features**: ✅ Pure bug fix, no architectural changes
5. **Critical Subsystem**: ✅ IRQ handling is fundamental to system
stability **Similar Pattern Match**: This closely resembles Similar
Commit #3 and #5 (both marked YES) which also fix managed interrupt CPU
hotplug issues with surgical changes to preserve state across hotplug
events. **Real-World Impact**: The bug can cause: - Server/datacenter
systems using CPU hotplug for power management to experience interrupt
handling issues - Multi-queue device drivers to malfunction after CPU
hotplug cycles - Unbalanced IRQ state leading to system instability The
fix is well-contained, addresses a clear functional bug with user-
visible symptoms, and follows established patterns for stable backports
in the IRQ subsystem.

 kernel/irq/chip.c       | 22 +++++++++++++++++++++-
 kernel/irq/cpuhotplug.c |  2 +-
 kernel/irq/internals.h  |  1 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index dc94e0bf2c940..cb8dc8d3d8f76 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -223,6 +223,19 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 		return IRQ_STARTUP_ABORT;
 	return IRQ_STARTUP_MANAGED;
 }
+
+void irq_startup_managed(struct irq_desc *desc)
+{
+	/*
+	 * Only start it up when the disable depth is 1, so that a disable,
+	 * hotunplug, hotplug sequence does not end up enabling it during
+	 * hotplug unconditionally.
+	 */
+	desc->depth--;
+	if (!desc->depth)
+		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
+}
+
 #else
 static __always_inline int
 __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
@@ -275,6 +288,7 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force)
 			ret = __irq_startup(desc);
 			break;
 		case IRQ_STARTUP_ABORT:
+			desc->depth = 1;
 			irqd_set_managed_shutdown(d);
 			return 0;
 		}
@@ -307,7 +321,13 @@ void irq_shutdown(struct irq_desc *desc)
 {
 	if (irqd_is_started(&desc->irq_data)) {
 		clear_irq_resend(desc);
-		desc->depth = 1;
+		/*
+		 * Increment disable depth, so that a managed shutdown on
+		 * CPU hotunplug preserves the actual disabled state when the
+		 * CPU comes back online. See irq_startup_managed().
+		 */
+		desc->depth++;
+
 		if (desc->irq_data.chip->irq_shutdown) {
 			desc->irq_data.chip->irq_shutdown(&desc->irq_data);
 			irq_state_set_disabled(desc);
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index eb86283901565..20067a655e203 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -219,7 +219,7 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
 		return;
 
 	if (irqd_is_managed_and_shutdown(data))
-		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
+		irq_startup_managed(desc);
 
 	/*
 	 * If the interrupt can only be directed to a single target
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index fbeecc608f54c..0ebdcbc5f6bef 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -87,6 +87,7 @@ extern void __enable_irq(struct irq_desc *desc);
 extern int irq_activate(struct irq_desc *desc);
 extern int irq_activate_and_startup(struct irq_desc *desc, bool resend);
 extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
+extern void irq_startup_managed(struct irq_desc *desc);
 
 extern void irq_shutdown(struct irq_desc *desc);
 extern void irq_shutdown_and_deactivate(struct irq_desc *desc);
-- 
2.39.5




More information about the Linux-mediatek mailing list