[PATCH AUTOSEL 6.19] ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn()

Sasha Levin sashal at kernel.org
Wed Feb 11 04:30:22 PST 2026


From: Yicong Yang <yang.yicong at picoheart.com>

[ Upstream commit 7cf28b3797a81b616bb7eb3e90cf131afc452919 ]

The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a
system workqueue which is not guaranteed to be finished before entering
userspace. This may cause some key devices to be missing when userspace
init task tries to find them. Two issues observed on RISCV platforms:

 - Kernel panic due to userspace init cannot have an opened
   console.

   The console device scanning is queued by acpi_scan_clear_dep_queue()
   and not finished by the time userspace init process running, thus by
   the time userspace init runs, no console is present.

 - Entering rescue shell due to the lack of root devices (PCIe nvme in
   our case).

   Same reason as above, the PCIe host bridge scanning is queued on
   a system workqueue and finished after init process runs.

The reason is because both devices (console, PCIe host bridge) depend on
riscv-aplic irqchip to serve their interrupts (console's wired interrupt
and PCI's INTx interrupts). In order to keep the dependency, these
devices are scanned and created after initializing riscv-aplic. The
riscv-aplic is initialized in device_initcall() and a device scan work
is queued via acpi_scan_clear_dep_queue(), which is close to the time
userspace init process is run. Since system_dfl_wq is used in
acpi_scan_clear_dep_queue() with no synchronization, the issues will
happen if userspace init runs before these devices are ready.

The solution is to wait for the queued work to complete before entering
userspace init. One possible way would be to use a dedicated workqueue
instead of system_dfl_wq, and explicitly flush it somewhere in the
initcall stage before entering userspace. Another way is to use
async_schedule_dev_nocall() for scanning these devices. It's designed
for asynchronous initialization and will work in the same way as before
because it's using a dedicated unbound workqueue as well, but the kernel
init code calls async_synchronize_full() right before entering userspace
init which will wait for the work to complete.

Compared to a dedicated workqueue, the second approach is simpler
because the async schedule framework takes care of all of the details.
The ACPI code only needs to focus on its job. A dedicated workqueue for
this could also be redundant because some platforms don't need
acpi_scan_clear_dep_queue() for their device scanning.

Signed-off-by: Yicong Yang <yang.yicong at picoheart.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20260128132848.93638-1-yang.yicong@picoheart.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki at intel.com>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Detailed Analysis

### 1. COMMIT MESSAGE ANALYSIS

The commit fixes two concrete, severe boot failures on RISC-V platforms
with ACPI:

1. **Kernel panic** - userspace init cannot open a console because the
   console device hasn't been scanned yet. The device scan work was
   queued via `acpi_scan_clear_dep_queue()` to `system_dfl_wq` (or
   `system_unbound_wq` in stable) and hasn't completed by the time
   userspace init runs.

2. **Boot failure into rescue shell** - root device (PCIe NVMe via PCIe
   host bridge) is missing for the same reason: the scan work is still
   queued and not completed.

Both are caused by a race: the deferred device scan (queued by
`acpi_scan_clear_dep_queue()`) is scheduled on a system workqueue with
**no synchronization barrier** before userspace init starts. Devices
that depend on RISC-V APLIC (interrupt controller) are scanned
asynchronously after APLIC initialization in `device_initcall()`, and if
init runs before the workqueue work completes, critical devices are
missing.

The commit message is well-written by both the author (Yicong Yang) and
was reviewed/edited by the ACPI maintainer (Rafael J. Wysocki), who
signed it off.

### 2. CODE CHANGE ANALYSIS

The change is **small and surgical** (~30 net lines removed):

**Before (old code):**
- A `struct acpi_scan_clear_dep_work` wraps `work_struct` + `acpi_device
  *`
- `acpi_scan_clear_dep_fn()` is a `work_struct` callback that calls
  `acpi_bus_attach()` under `acpi_scan_lock`, then releases the device
  reference and frees the wrapper
- `acpi_scan_clear_dep_queue()` allocates the wrapper via `kmalloc()`,
  initializes the work, and queues it on
  `system_dfl_wq`/`system_unbound_wq`

**After (new code):**
- `acpi_scan_clear_dep_fn()` signature changes to `(void *dev,
  async_cookie_t cookie)` - an `async_func_t` callback
- It uses `to_acpi_device(dev)` directly instead of `container_of` on a
  wrapper struct
- `acpi_scan_clear_dep_queue()` calls `async_schedule_dev_nocall()`
  instead of `queue_work()`
- The `struct acpi_scan_clear_dep_work` wrapper is removed entirely
- No more `kmalloc()` for the wrapper (the async framework handles its
  own allocation internally)

**Why this fixes the bug:** `async_schedule_dev_nocall()` schedules work
on the async framework's dedicated domain (`async_dfl_domain`). The
critical property is that `kernel_init()` in `init/main.c` calls
`async_synchronize_full()` **before** entering userspace (before
`run_init_process()`):

```1569:1642:init/main.c
static int __ref kernel_init(void *unused)
{
        // ...
        kernel_init_freeable();
        /* need to finish all async __init code before freeing the
memory */
        async_synchronize_full();
        // ...
        // <userspace init happens after this point>
```

This guarantees all async-scheduled work (including the device scans)
completes before userspace init starts. The old
`queue_work(system_unbound_wq, ...)` had no such synchronization
barrier.

**Reference counting correctness:** The reference counting is preserved
identically:
- On success: `acpi_scan_clear_dep_fn()` releases the reference via
  `acpi_dev_put(adev)`
- On failure: `acpi_scan_clear_dep_queue()` returns `false`, and the
  caller `acpi_scan_clear_dep()` releases the reference via
  `acpi_dev_put(adev)`

### 3. CLASSIFICATION

This is a **real bug fix** for a **race condition** that causes **kernel
panics and boot failures**. It is not a feature, cleanup, or
optimization.

### 4. SCOPE AND RISK ASSESSMENT

- **Files changed:** 1 (`drivers/acpi/scan.c`)
- **Net lines:** Reduced - removes the wrapper struct, simplifies both
  functions
- **Subsystem:** ACPI scan, a core subsystem
- **Risk:** LOW. The change replaces one deferred scheduling mechanism
  (workqueue) with another (async framework) that has the specific
  property of being synchronized before userspace init. The functional
  behavior of the callback is identical. The async framework is well-
  established and already used extensively in the kernel for device
  probing.
- **Could this break something?** Very unlikely. The
  `async_schedule_dev_nocall()` function uses an unbound workqueue
  internally just like the old code, with the added benefit of the
  synchronization barrier. The only behavior change is that work is
  guaranteed to complete before userspace init, which is strictly
  desirable.

### 5. USER IMPACT

- **Severity:** CRITICAL - kernel panics and inability to boot
- **Affected platforms:** Primarily RISC-V ACPI platforms right now, but
  the underlying race could affect any platform using
  `acpi_dev_clear_dependencies()` (Intel camera IVSC, INT3472, Surface
  devices, ACPI EC, PCI link, GPIO, I2C - 18 different callers)
- **Who benefits:** RISC-V ACPI users are the primary beneficiaries.
  Other platforms could theoretically hit this race too under heavy load
  at boot time, though it's most likely on RISC-V where interrupt
  controller dependency chains are deeper.

### 6. DEPENDENCY CHECK

- **`async_schedule_dev_nocall()`:** Already backported to all active
  stable trees (6.1.y, 6.6.y, 6.12.y) - verified by checking commit
  history
- **`acpi_scan_clear_dep_queue()` / `acpi_scan_clear_dep_fn()`:**
  Present in all active stable trees (introduced before v5.17)
- **`#include <linux/async.h>`:** The new header include is the only
  other change needed
- **Backport adjustment:** Stable trees (6.1.y through 6.14.y) use
  `system_unbound_wq` instead of `system_dfl_wq`, but since the commit
  removes the `queue_work()` call entirely, the diff will need a trivial
  context adjustment (the old `queue_work(system_unbound_wq, ...)` line
  vs `queue_work(system_dfl_wq, ...)`)
- **RISC-V ACPI dependency IDs:** The RSCV0001/RSCV0002 entries are in
  6.12.y but not in 6.6.y or 6.1.y. However, the fix is still valid for
  older trees because the same race exists for Intel camera/IVSC/PCI-
  link devices that use this code path

### 7. STABILITY INDICATORS

- Authored by Yicong Yang (kernel contributor)
- Reviewed and edited by Rafael J. Wysocki (ACPI maintainer, signed off)
- The approach was explicitly chosen over a dedicated workqueue because
  it leverages the existing async synchronization framework which "takes
  care of all the details"
- Real-world tested (the bug was observed on actual RISC-V hardware)

### CONCLUSION

This commit fixes a **critical race condition** causing **kernel panics
and boot failures**. The fix is:
- **Small and surgical** (single file, removes complexity)
- **Obviously correct** (leverages the well-established async
  synchronization barrier before userspace init)
- **Self-contained** (no dependencies beyond `async_schedule_dev_nocall`
  which already exists in all stable trees)
- **Low risk** (functionally equivalent deferred execution, just with
  proper synchronization)
- **High impact** (prevents kernel panics and boot failures on RISC-V
  ACPI platforms; potentially prevents similar races on other platforms
  using the same code path)

The only minor concern is the trivial context difference
(`system_unbound_wq` vs `system_dfl_wq`) for backport to pre-6.19 stable
trees, which requires a trivial adaptation of the diff context but does
not affect the logic (the line is removed, not modified).

**YES**

 drivers/acpi/scan.c | 41 +++++++++++++++--------------------------
 1 file changed, 15 insertions(+), 26 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 416d87f9bd107..b78f6be2f9468 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -5,6 +5,7 @@
 
 #define pr_fmt(fmt) "ACPI: " fmt
 
+#include <linux/async.h>
 #include <linux/module.h>
 #include <linux/init.h>
 #include <linux/slab.h>
@@ -2360,46 +2361,34 @@ static int acpi_dev_get_next_consumer_dev_cb(struct acpi_dep_data *dep, void *da
 	return 0;
 }
 
-struct acpi_scan_clear_dep_work {
-	struct work_struct work;
-	struct acpi_device *adev;
-};
-
-static void acpi_scan_clear_dep_fn(struct work_struct *work)
+static void acpi_scan_clear_dep_fn(void *dev, async_cookie_t cookie)
 {
-	struct acpi_scan_clear_dep_work *cdw;
-
-	cdw = container_of(work, struct acpi_scan_clear_dep_work, work);
+	struct acpi_device *adev = to_acpi_device(dev);
 
 	acpi_scan_lock_acquire();
-	acpi_bus_attach(cdw->adev, (void *)true);
+	acpi_bus_attach(adev, (void *)true);
 	acpi_scan_lock_release();
 
-	acpi_dev_put(cdw->adev);
-	kfree(cdw);
+	acpi_dev_put(adev);
 }
 
 static bool acpi_scan_clear_dep_queue(struct acpi_device *adev)
 {
-	struct acpi_scan_clear_dep_work *cdw;
-
 	if (adev->dep_unmet)
 		return false;
 
-	cdw = kmalloc(sizeof(*cdw), GFP_KERNEL);
-	if (!cdw)
-		return false;
-
-	cdw->adev = adev;
-	INIT_WORK(&cdw->work, acpi_scan_clear_dep_fn);
 	/*
-	 * Since the work function may block on the lock until the entire
-	 * initial enumeration of devices is complete, put it into the unbound
-	 * workqueue.
+	 * Async schedule the deferred acpi_scan_clear_dep_fn() since:
+	 * - acpi_bus_attach() needs to hold acpi_scan_lock which cannot
+	 *   be acquired under acpi_dep_list_lock (held here)
+	 * - the deferred work at boot stage is ensured to be finished
+	 *   before userspace init task by the async_synchronize_full()
+	 *   barrier
+	 *
+	 * Use _nocall variant since it'll return on failure instead of
+	 * run the function synchronously.
 	 */
-	queue_work(system_dfl_wq, &cdw->work);
-
-	return true;
+	return async_schedule_dev_nocall(acpi_scan_clear_dep_fn, &adev->dev);
 }
 
 static void acpi_scan_delete_dep_data(struct acpi_dep_data *dep)
-- 
2.51.0




More information about the linux-riscv mailing list