[LEDE-DEV] Untangling 4K to 64K EB size JFFS2 migration

Thibaut VARÈNE hacks at slashdirt.org
Tue Nov 14 10:18:55 PST 2017


Summary of the situation:

Situation:
Following c082938, PR#1495 attempts to switch ramips devices from 4K to 64K eraseblocks to speed up flash operations, which are currently very slow. Furthermore, mktplinkfw does not support 4K sectors, which means that on Archer ramips devices the current image will always fail to restore the config.

Issue:
On NOR flash devices with a jffs2 overlay, running `sysupgrade -c` from a system with 4K sectors to flash an image with 64K sectors triggers filesystem corruption after a few reboots.

Assumptions:
- we want to preserve the config backup when flashing an image with 64K sectors onto a system running with 4K sectors
- using an intermediary upgrade image before changing blocksize is not acceptable

Observations:
1. when sysupgrade -c is invoked to preserve config files over reflash, it saves the current configuration files into `/sysupgrade.tgz` and appends that data as jffs2 via `mtd -j` option (in /lib/common.sh `default_do_upgrade()`)
2. the resulting jffs2 data has an EB size that matches the settings from the current running kernel, i.e. 4k: jffs2 nodes are written at 4k boundary with cleanmarkers and deadc0de marker also located at 4k boundary.
3. when the system is rebooted after flashing the new image, the jffs2 driver finds the backup data before the deadc0de marker and complains about the invalid alignment of cleanmarkers; however, the data appears to be valid at this point;
4. the preinit job will detect the presence of /sysupgrade.tgz in the resulting overlay, it will extract it and after the boot has completed it will remove `/sysupgrade.tgz` (in `/lib/preinit/80_mount_root` and then `/etc/init.d/done`)
5. when the device is rebooted one more time, the jffs2 filesystem apparently gets so badly corrupted that the device can no longer complete the boot sequence.
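
To illustrate observations 2 and 3, here is a minimal Python sketch of the alignment arithmetic. It assumes one cleanmarker at the start of every eraseblock; the function names and the two-block partition size are hypothetical and only serve the example, they are not taken from `mtd` or the jffs2 driver.

```python
EB_4K = 4 * 1024
EB_64K = 64 * 1024


def marker_offsets(partition_size, eb_size):
    """Offsets of cleanmarkers, assuming one at the start of each eraseblock."""
    return list(range(0, partition_size, eb_size))


def misaligned(offsets, eb_size):
    """Offsets that do not fall on an eraseblock boundary of the given size."""
    return [off for off in offsets if off % eb_size != 0]


if __name__ == "__main__":
    # Hypothetical partition covering two 64K eraseblocks, written with 4K EBs.
    offsets = marker_offsets(2 * EB_64K, EB_4K)
    bad = misaligned(offsets, EB_64K)
    print(f"{len(offsets)} markers at 4K boundaries, "
          f"{len(bad)} of them not on a 64K boundary")
```

With these assumptions, only one marker in sixteen lands where a kernel using 64K eraseblocks would expect it, which would be consistent with the alignment complaints seen on first boot.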

Preliminary analysis:
When the system is rebooted after the flash, the initial jffs2 nodes that have been written by `mtd` at step 1. above have a 4k alignment and length. I assume (but I haven’t checked the code) that this sets jffs2 to operate on a filesystem that has 4k nodes, and this conflicts with the real EB size as reported by the kernel which is now 64k. This ends up in filesystem corruption.
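
One way to check this hypothesis on a real device would be to dump the overlay partition and look at where the cleanmarkers actually sit. The Python sketch below is only that, a sketch: it assumes a little-endian jffs2 image, it matches the 8 leading bytes of the cleanmarker header (magic 0x1985, node type 0x2003, total length 12) without verifying the header CRC, and the helper names are hypothetical.

```python
import struct

JFFS2_MAGIC = 0x1985
JFFS2_NODETYPE_CLEANMARKER = 0x2003
CLEANMARKER_LEN = 12  # magic (2) + nodetype (2) + totlen (4) + hdr_crc (4)


def find_cleanmarkers(data, step=4096):
    """Return offsets (multiples of step) where a cleanmarker header starts.

    Assumes little-endian node headers; the header CRC is not verified.
    """
    found = []
    for off in range(0, len(data) - 7, step):
        magic, nodetype, totlen = struct.unpack_from("<HHI", data, off)
        if (magic == JFFS2_MAGIC
                and nodetype == JFFS2_NODETYPE_CLEANMARKER
                and totlen == CLEANMARKER_LEN):
            found.append(off)
    return found


def split_by_alignment(offsets, eb_size):
    """Split offsets into (aligned, misaligned) relative to eb_size."""
    aligned = [off for off in offsets if off % eb_size == 0]
    stray = [off for off in offsets if off % eb_size != 0]
    return aligned, stray


if __name__ == "__main__":
    # Synthetic stand-in for a partition dump: erased flash (0xff) with
    # cleanmarkers written at a few 4K boundaries.
    marker = struct.pack("<HHI", JFFS2_MAGIC, JFFS2_NODETYPE_CLEANMARKER,
                         CLEANMARKER_LEN)
    dump = bytearray(b"\xff" * (2 * 65536))
    for off in (0, 4096, 65536):
        dump[off:off + len(marker)] = marker
    aligned, stray = split_by_alignment(find_cleanmarkers(bytes(dump)), 65536)
    print("on 64K boundary:", aligned)
    print("not on 64K boundary:", stray)
```

On a real dump, the `data` argument would be the raw bytes of the partition. Any cleanmarker reported as not on a 64K boundary would be a leftover from the 4K layout.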

Questions:
- To David: can jffs2 be “fixed” to cope with such a situation? Ideally it would preserve the backup data, but if that is not possible it should probably start from a clean slate instead of totally corrupting the partition?
- To LEDE devs: in the latter case, is it okay to break config preservation this one time? Note: 17.01 has 4K sectors enabled since 925e63e

Thanks,
Thibaut

PS: I’m not subscribed

