[bug report] mtd: bad block counter inflated when repeatedly marking the same block

Wang Zhaolong wangzhaolong at huaweicloud.com
Mon Sep 1 02:26:45 PDT 2025


Hi all,

I’d like to report a mismatch between bad-block statistics and actual
on-flash state when repeatedly calling MEMSETBADBLOCK on the same
eraseblock.

Summary
- Repeatedly marking the same block bad (e.g., 5 times) makes
   /sys/class/mtd/mtdX/bad_blocks increase by 5.
- After a reboot, the counter returns to the correct value of 1.
- So the runtime counter (ecc_stats.badblocks) is inflated.

Repro (with nandsim.ko)

```bash
# ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
# modprobe nandsim id_bytes=$ID
# ~/mtd-utils/mtd_markbad /dev/mtd1 10 1 # Repeat 5 times
......
# ~/mtd-utils/mtd_markbad /dev/mtd1 10 1

# -- The statistics now report 5 bad blocks.
# cat /sys/class/mtd/mtd1/bad_blocks
5

# -- In fact, a scan finds only 1 bad block.
# ubiformat -v /dev/mtd1  | grep "bad eraseblock"
ubiformat: 1 bad eraseblocks found, numbers: 10
```

Root cause analysis (kernel-side)

```
mtd_block_markbad
   mtd->_block_markbad()
     nand_block_markbad
       ret = nand_block_isbad
       return 0; // taken when ret > 0, i.e. the block is already bad
   mtd->ecc_stats.badblocks++;  // No new bad block was marked, but it is still counted.
```

Relevant code
- drivers/mtd/nand/raw/nand_base.c:nand_block_markbad()
- drivers/mtd/mtdcore.c:mtd_block_markbad()

nand_block_markbad() returns 0 both for “newly marked” and “already bad”.
mtdcore cannot tell whether this call actually added a new bad block,
but still increments ecc_stats.badblocks.
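
To make the flow above concrete, here is a small user-space model of it. This is only a sketch, not kernel code: struct model_mtd, MODEL_NBLOCKS and the model_* functions are hypothetical stand-ins for the on-flash markers, ecc_stats.badblocks, nand_block_markbad() and mtd_block_markbad().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NBLOCKS 16	/* hypothetical device size, in eraseblocks */

/* Hypothetical stand-in for on-flash state plus the runtime counter. */
struct model_mtd {
	bool bad[MODEL_NBLOCKS];	/* models the on-flash bad-block markers */
	unsigned int badblocks;		/* models ecc_stats.badblocks */
};

/* Models nand_block_markbad(): 0 for "newly marked" and for "already bad". */
static int model_driver_markbad(struct model_mtd *m, size_t block)
{
	assert(block < MODEL_NBLOCKS);
	if (m->bad[block])
		return 0;		/* already bad: nothing is written */
	m->bad[block] = true;
	return 0;
}

/* Models mtd_block_markbad() as described: every successful call is counted. */
static int model_core_markbad(struct model_mtd *m, size_t block)
{
	int ret = model_driver_markbad(m, block);

	if (ret)
		return ret;
	m->badblocks++;
	return 0;
}

/* Models what a fresh scan (e.g. after reboot) would find on flash. */
static unsigned int model_scan_badblocks(const struct model_mtd *m)
{
	unsigned int n = 0;

	for (size_t i = 0; i < MODEL_NBLOCKS; i++)
		if (m->bad[i])
			n++;
	return n;
}
```

Calling model_core_markbad() five times on block 10 of a zero-initialized model leaves badblocks at 5 while model_scan_badblocks() reports 1, which matches the repro.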

Possible fixes (high level)
- Core-side conservative fix (minimal ABI change):
   * In mtd_block_markbad(), probe _block_isbad(master, ofs) before
     calling _block_markbad(), and (if available) probe again after success.
   * Only increment ecc_stats.badblocks if the state transitioned from
     “good” to “bad”.

- Teach *_block_markbad() to return a distinct positive code for
   “already bad” vs “newly marked”, so the core can increment only on
   “newly marked”.
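
The core-side option could look roughly like the following. Again this is a user-space sketch under assumptions, not a patch: struct sketch_mtd, SKETCH_NBLOCKS and the sketch_* functions are hypothetical and only model _block_isbad(), _block_markbad() and the counter in ecc_stats.badblocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NBLOCKS 16	/* hypothetical device size, in eraseblocks */

/* Hypothetical stand-in for on-flash state plus the runtime counter. */
struct sketch_mtd {
	bool bad[SKETCH_NBLOCKS];	/* models the on-flash bad-block markers */
	unsigned int badblocks;		/* models ecc_stats.badblocks */
};

/* Models _block_isbad(): > 0 if bad, 0 if good (a real driver may also fail). */
static int sketch_block_isbad(const struct sketch_mtd *m, size_t block)
{
	assert(block < SKETCH_NBLOCKS);
	return m->bad[block] ? 1 : 0;
}

/* Models _block_markbad(): returns 0 whether or not the block was already bad. */
static int sketch_driver_markbad(struct sketch_mtd *m, size_t block)
{
	assert(block < SKETCH_NBLOCKS);
	m->bad[block] = true;
	return 0;
}

/* Core-side idea: count only a confirmed good -> bad transition. */
static int sketch_core_markbad(struct sketch_mtd *m, size_t block)
{
	int was_bad = sketch_block_isbad(m, block);	/* probe before marking */
	int ret;

	if (was_bad < 0)
		return was_bad;		/* propagate a probe error, if any */

	ret = sketch_driver_markbad(m, block);
	if (ret)
		return ret;

	/* Probe again after success; increment only on a real transition. */
	if (!was_bad && sketch_block_isbad(m, block) > 0)
		m->badblocks++;
	return 0;
}
```

With this model, marking block 10 five times leaves badblocks at 1, and marking a second, previously good block raises it to 2. The cost is the extra isbad probes on every markbad call.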

What I want to know is:
- Would the core-side pre/post _block_isbad check be acceptable as a short-term fix?
- Any objections regarding the extra isbad IO in the markbad path?
- Longer-term, is there interest in an explicit API/return-code semantics
   to differentiate “already bad” vs “newly marked”?

I’m very interested in helping resolve this issue and would be grateful
for any guidance or suggestions.

Best regards,
Wang Zhaolong



