[BUG] MTD: refcount underflow/use-after-free during rapid SPI NOR unbind/bind cycles

He, Guocai (CN) Guocai.He.CN at windriver.com
Sun Dec 7 23:52:32 PST 2025


Got it. Thanks for the details.

________________________________________
From: Pratyush Yadav <pratyush at kernel.org>
Sent: Saturday, December 6, 2025 1:43 AM
To: Miquel Raynal
Cc: He, Guocai (CN); richard at nod.at; vigneshr at ti.com; linux-mtd at lists.infradead.org; Tudor Ambarus; Michael Walle; Pratyush Yadav
Subject: Re: [BUG] MTD: refcount underflow/use-after-free during rapid SPI NOR unbind/bind cycles

On Fri, Dec 05 2025, Miquel Raynal wrote:

> Hello,
>
> On 28/11/2025 at 02:24:11 GMT, "He, Guocai (CN)" <Guocai.He.CN at windriver.com> wrote:
>
>> Hi MTD maintainers,
>>
>> I'm reporting a kernel bug in the MTD subsystem that causes a refcount
>> underflow and use-after-free warning during rapid SPI NOR driver
>> unbind/bind operations.
>
> Adding all SPI NOR gurus in the Cc list.

This sounds similar to something we got reports for earlier [0], though
it could be a related but different bug. IIRC there was another
patch/thread for it, but I can't seem to find it.

I think we should do some sort of locking or ref counting to make sure
no operations are in progress during the driver bind or unbind. I never
got around to poking into it too deeply, and I am not sure I can find
time for it in the near future either.

This is something we should fix, but I am not too worried about it since
I don't think anyone should be rapidly binding and unbinding the driver
in any real workload.

[0] https://lore.kernel.org/linux-mtd/20250325133954.3699535-1-liwei.song.lsong@gmail.com/T/#u

>
> Thanks,
> Miquèl
>
>> ## Environment
>> - Kernel version: 6.6.116-yocto-standard #1 (6.12 has the same issue)
>> - Architecture: SoCFPGA Stratix 10 SoCDK (ARM64 has the same issue)
>> - Device: SPI NOR flash (mt25qu02g, 262144 Kbytes)
>> - SPI controller: ff8d2000.spi.0
>>
>> ## Reproduction Steps
>> 1. In one SSH session, run continuous unbind/bind:
>>    ```bash
>>    while :; do
>>        echo spi0.0 >/sys/bus/spi/devices/spi0.0/driver/unbind
>>        echo spi0.0 >/sys/bus/spi/drivers/spi-nor/bind
>>    done
>>    ```
>>
>> 2. In another SSH session, continuously read MTD info:
>>    ```bash
>>    while :; do cat /proc/mtd; done
>>    ```
>>
>> 3. After running for some time, the following call trace appears:
>>
>> ## Call Trace
>> ```
>> Deleting MTD partitions on "ff8d2000.spi.0":
>> Deleting u-boot MTD partition
>> ------------[ cut here ]------------
>> refcount_t: underflow; use-after-free.
>> WARNING: CPU: 2 PID: 921 at /lib/refcount.c:28 refcount_warn_saturate+0xf4/0x148
>> Modules linked in: sch_fq_codel openvswitch nsh nf_conncount nf_nat fuse nfnetlink
>> CPU: 2 PID: 921 Comm: sh Not tainted 6.6.116-yocto-standard #1
>> Hardware name: SoCFPGA Stratix 10 SoCDK (DT)
>> pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : refcount_warn_saturate+0xf4/0x148
>> lr : refcount_warn_saturate+0xf4/0x148
>> sp : ffff8000829abaf0
>>
>> Call trace:
>>  refcount_warn_saturate+0xf4/0x148
>>  del_mtd_device+0x118/0x140
>>  __del_mtd_partitions+0x94/0xf8
>>  del_mtd_partitions+0x50/0x80
>>  mtd_device_unregister+0x50/0x90
>>  spi_nor_remove+0x2c/0x48
>>  spi_mem_remove+0x28/0x40
>>  spi_remove+0x38/0x60
>>  device_remove+0x54/0x90
>>  device_release_driver_internal+0x1d4/0x238
>>  device_driver_detach+0x20/0x38
>>  unbind_store+0xbc/0xc8
>>  drv_attr_store+0x2c/0x48
>>  sysfs_kf_write+0x4c/0x68
>>  kernfs_fop_write_iter+0x138/0x1f0
>>  vfs_write+0x1b8/0x2e0
>>  ksys_write+0x7c/0x120
>>  __arm64_sys_write+0x24/0x38
>>  invoke_syscall+0x5c/0x138
>>  el0_svc_common.constprop.0+0x48/0xf0
>>  do_el0_svc+0x24/0x38
>>  el0_svc+0x38/0x108
>>  el0t_64_sync_handler+0x120/0x130
>>  el0t_64_sync+0x190/0x198
>> ```
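
For anyone who wants to reproduce this unattended, the two loops from the
steps above could be combined into one script that stops once the warning
shows up in the kernel log. This is an untested sketch, not part of the
original report: the function names (has_refcount_underflow, rebind_loop,
read_loop, run_repro) and the 600-second default timeout are made up for
illustration. The device name, sysfs paths and warning string are taken
from the report.

```bash
# Values taken from the report; override DEV via the environment on other boards.
DEV="${DEV:-spi0.0}"
SIGNATURE='refcount_t: underflow; use-after-free.'

# Succeeds if the kernel log text on stdin contains the warning from the trace.
has_refcount_underflow() {
    grep -qF -- "$SIGNATURE"
}

# Step 1: unbind and rebind the SPI NOR driver in a tight loop.
rebind_loop() {
    while :; do
        echo "$DEV" > "/sys/bus/spi/devices/$DEV/driver/unbind"
        echo "$DEV" > /sys/bus/spi/drivers/spi-nor/bind
    done
}

# Step 2: keep reading /proc/mtd while the driver is being rebound.
read_loop() {
    while :; do cat /proc/mtd > /dev/null; done
}

# Run both loops until the warning appears in dmesg or the timeout
# (seconds; the 600 default is an arbitrary assumption) expires.
# Returns 0 if the warning was seen, 1 otherwise.
run_repro() {
    timeout="${1:-600}"
    rebind_loop 2>/dev/null &
    rebind_pid=$!
    read_loop 2>/dev/null &
    read_pid=$!
    deadline=$(( $(date +%s) + timeout ))
    found=1
    while [ "$(date +%s)" -lt "$deadline" ]; do
        if dmesg | has_refcount_underflow; then
            found=0
            break
        fi
        sleep 1
    done
    # Killing the rebind loop mid-cycle can leave the device unbound.
    kill "$rebind_pid" "$read_pid" 2>/dev/null
    wait "$rebind_pid" "$read_pid" 2>/dev/null
    return "$found"
}
```

Run it as root after clearing the log with `dmesg -C`, for example
`run_repro 600 && echo reproduced`. Without clearing, a stale warning from
an earlier run would match immediately.
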
>>
>> ## Analysis
>> This appears to be a race condition:
>> 1. The unbind operation triggers MTD partition deletion via del_mtd_device()
>> 2. Simultaneously, another process reading /proc/mtd holds references to the MTD device
>> 3. The reference count goes negative, indicating the device was freed while still being accessed
>>
>> ## Additional Information
>> - The issue is reproducible with the above test case
>>
>> Please let me know if you need any additional information or testing.
>>
>> Best regards,
>> Guocai He

--
Regards,
Pratyush Yadav


