sam9x5: MTD numbering changed

Boris Brezillon boris.brezillon at free-electrons.com
Fri Nov 3 01:06:55 PDT 2017


On Fri, 3 Nov 2017 00:12:17 +0100
Cyrille Pitchen <cyrille.pitchen at wedev4u.fr> wrote:

> Hi all,
> 
> On 02/11/2017 at 18:58, Boris Brezillon wrote:
> > On Thu, 2 Nov 2017 18:36:35 +0100
> > Richard Genoud <richard.genoud at gmail.com> wrote:
> >   
> >> 2017-11-02 16:45 GMT+01:00 Boris Brezillon <boris.brezillon at free-electrons.com>:  
> >>> On Thu, 02 Nov 2017 16:28:13 +0100
> >>> Richard Genoud <richard.genoud at gmail.com> wrote:
> >>>    
> >>>> +Nicolas
> >>>> [his email got lost somehow]
> >>>> On Thursday, 2 November 2017 at 16:09 +0100, Boris Brezillon wrote:
> >>>>> On Thu, 02 Nov 2017 15:13:47 +0100
> >>>>> Richard Genoud <richard.genoud at gmail.com> wrote:
> >>>>>    
> >>>>>> On Thursday, 2 November 2017 at 13:39 +0100, Boris Brezillon wrote:
> >>>>>>> +Nicolas
> >>>>>>>
> >>>>>>> Hi Richard,
> >>>>>>>
> >>>>>>> On Thu, 02 Nov 2017 12:17:16 +0100
> >>>>>>> Richard Genoud <richard.genoud at gmail.com> wrote:
> >>>>>>>    
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> I've got an at91sam9g35-cm based board, with 4 partitions on the
> >>>>>>>> spi-dataflash and 5 partitions on the NAND flash.
> >>>>>>>> Before commit 1004a2977bdc ("ARM: dts: at91: Switch to the new
> >>>>>>>> NAND bindings"),
> >>>>>>>> the NAND partitions were mtd0-4 and spi-dataflash partitions
> >>>>>>>> mtd5-8.
> >>>>>>>>
> >>>>>>>> Since commit 1004a2977bdc ("ARM: dts: at91: Switch to the new
> >>>>>>>> NAND
> >>>>>>>> bindings"),
> >>>>>>>> the spi-dataflash partitions are discovered before the NAND
> >>>>>>>> partitions.
> >>>>>>>> So the NAND partitions became mtd4-8 and the spi-dataflash
> >>>>>>>> partitions mtd0-3.
> >>>>>>>>
> >>>>>>>> This broke some scripts that relied on the mtd numbering.
> >>>>>>>>
> >>>>>>>> Updating those scripts to rely on the mtd device name instead of
> >>>>>>>> the number is not really a problem. The real problem is when an
> >>>>>>>> older script using mtd numbering is run on the new system: I
> >>>>>>>> expect dead kittens everywhere!
> >>>>>>>
> >>>>>>> Crap! That was one of the things I was afraid of when changing the
> >>>>>>> binding: probe order has an impact on the ids assigned to MTD devs,
> >>>>>>> and since things are not defined at the same place in the DT, it
> >>>>>>> changes the probe order.
> >>>>>>>    
> >>>>>>>>
> >>>>>>>> So, I'd like to know if there's a way to force the older
> >>>>>>>> numbering?
> >>>>>>>
> >>>>>>> Reverting the patches is probably the easiest way (and it's
> >>>>>>> easily
> >>>>>>> backportable). Now, if we want to switch to the new bindings at
> >>>>>>> some
> >>>>>>> point we'll need to support DT aliases for mtd devs:
> >>>>>>>
> >>>>>>> aliases {
> >>>>>>>         mtdX = &flashpartN;
> >>>>>>>         mtdY = &flashdevM;
> >>>>>>> };
> >>>>>>>
> >>>>>>> The problem with this solution is that it only works if all
> >>>>>>> partitions
> >>>>>>> are defined in the DT, which is not always the case (they can be
> >>>>>>> defined
> >>>>>>> on the command line with mtdparts=).    
> >>>>>>
> >>>>>> Yes, and if they are different from the ones declared in
> >>>>>> at91sam9x5cm.dtsi, they are likely defined with mtdparts=, since
> >>>>>> AFAIK, we can't remove a declared partitioning.
> >>>>>>
> >>>>>> I'll disable the ebi and switch back to the old binding in my dts
> >>>>>> for now.
> >>>>>>>    
> >>>>>>>> (I tried poking around the DTS without success).
> >>>>>>>>
> >>>>>>>> Any idea?
> >>>>>>>
> >>>>>>> I don't have a perfect solution, but the problem you report
> >>>>>>> clearly
> >>>>>>> shows that relying on MTD numbering is unsafe and should be
> >>>>>>> avoided.    
> >>>>>>
> >>>>>> Clearly, but who doesn't? ;)
> >>>>>>    
> >>>>>
> >>>>> Just had a lengthy discussion with Alexandre, and he brought up a
> >>>>> valid point: there has never been any guarantee on MTD numbering.
> >>>>> Not only does the order of DT nodes have an impact on the probe
> >>>>> order, but so does the order in which drivers are linked when
> >>>>> creating the kernel image. Yes, these things usually don't change,
> >>>>> but I'm not sure it's a good idea to let user-space apps think they
> >>>>> will never change in the future.
> >>>>>
> >>>>> How about fixing the scripts you were referring to instead of
> >>>>> reverting
> >>>>> the change? What's the blocking point?    
> >>>>
> >>>> I already fixed the user-space scripts (and actually, they predate the
> >>>> device-tree era, so at that time, relying on MTD numbering wasn't so
> >>>> bad :)).
> >>>> Anyway, here's the blocking point:
> >>>>
> >>>> We have firmwares with an embedded script to update our boards. (more
> >>>> or less a FW + script in a zip file).
> >>>> Those old firmwares are already in the wild and rely on the old MTD
> >>>> numbering (yes, that's bad).
> >>>>
> >>>>
> >>>> So, even if all new scripts are corrected in the new firmwares,
> >>>> downgrading a board with an old firmware/old script will brick the
> >>>> board.
> >>>> So I'll have to detect that and forbid downgrading.    
> >>>
> >>> Not sure what you refer to when you're talking about 'FW', but I'd
> >>> expect it to contain a kernel [+ a dtb] + a rootfs, so if you're using
> >>> an old "FW+update-script", you will still have the old numbering and
> >>> the old script will work just fine, and if you switch to a newer version
> >>> of the "FW+update-script" based on a 4.13 kernel+dtb you will anyway
> >>> have the updated script. Am I missing something?    
> >>
> >> Ok, let's have an example.
> >> New firmware: kernel 4.14 + dtb + rootfs + uboot + uboot_env +
> >> at91bootstrap + new update script.
> >> Old firmware: kernel 4.11 + dtb + rootfs + uboot + uboot_env +
> >> at91bootstrap + old update script.
> >>
> >> Let's suppose the new firmware is on the board (let's call it the
> >> 4.14 firmware):
> >> MTD0 is the dataflash partition 0 (at91bootstrap)
> >> MTD1 is the dataflash part1 (uboot)
> >> MTD2 is the dataflash part2 (ubootenv)
> >> MTD3 is the dataflash part3 (free space)
> >> MTD4 is the NAND part 0 (dtb)
> >> MTD5 is the NAND part 1 (kernel)
> >> MTD6 is the NAND part 2 (UBI) <- this one holds 3 ubifs volumes:
> >> rootfs/rootfs2/data
> >>
> >> The thing is, the update script runs in user-space and updates the
> >> kernel, dtb, uboot, bootstrap and flashes the rootfs2.
> >> (on next boot, the rootfs2/rootfs will be atomically swapped)
> >> So, when downgrading, the old update script is executed on the board,
> >> under the 4.14 firmware, thinking that MTD0 is the NAND part0 (dtb),
> >> while it is actually the dataflash part0 (bootstrap).  
> >> => the bootstrap is erased and replaced by the dtb (well, actually, it    
> >> won't even be like that since there will be a mismatch between
> >> nandwrite/flash_part).  
> > 
> > Can you re-generate zip archives for old versions? If you can, maybe
> > it would be simpler to just embed the new update script (assuming this
> > script works fine with both old and new FW) in old FW archives.
> >  
> 
> Not sure I've totally understood the issue; if not, sorry for the noise!
> 
> Would it be possible to use some udev rules to create some symbolic links
> in /dev based on ATTRS{name}, the mtd partition names?
> 
> something like:
> /dev/mtd-at91bootstrap -> /dev/mtd0
> /dev/mtd-uboot -> /dev/mtd1
> /dev/mtd-ubootenv -> /dev/mtd2
> /dev/mtd-free -> /dev/mtd3
> /dev/mtd-dtb -> /dev/mtd4
> /dev/mtd-kernel -> /dev/mtd5
> /dev/mtd-ubi -> /dev/mtd6

Yep, we should definitely have a udev rule populating
a /dev/mtd/by-name/ directory, just like there is a /dev/disk/by-uuid.
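
Until such a rule exists, the same mapping can be approximated from
user space. The following is only a rough sketch in Python, not a udev
rule: it assumes the kernel exposes each partition name in
/sys/class/mtd/mtdN/name, and the function names (mtd_names,
populate_by_name) are made up for illustration.

```python
import os


def mtd_names(sysfs_root="/sys/class/mtd"):
    """Return {partition name: "mtdN"} read from <sysfs_root>/mtdN/name."""
    names = {}
    for entry in sorted(os.listdir(sysfs_root)):
        # Keep only mtd<digits>; this skips the read-only mtdNro entries.
        if not (entry.startswith("mtd") and entry[3:].isdigit()):
            continue
        try:
            with open(os.path.join(sysfs_root, entry, "name")) as f:
                name = f.read().strip()
        except OSError:
            continue
        if name in names:
            # Two partitions with the same name: a by-name link would be
            # ambiguous, so refuse instead of guessing.
            raise ValueError("duplicate MTD name: %r" % name)
        names[name] = entry
    return names


def populate_by_name(sysfs_root="/sys/class/mtd", dev_root="/dev"):
    """Create <dev_root>/mtd/by-name/<name> -> <dev_root>/mtdN symlinks."""
    by_name = os.path.join(dev_root, "mtd", "by-name")
    os.makedirs(by_name, exist_ok=True)
    for name, dev in mtd_names(sysfs_root).items():
        link = os.path.join(by_name, name)
        if os.path.islink(link):
            os.unlink(link)  # refresh a stale link from a previous run
        os.symlink(os.path.join(dev_root, dev), link)
    return by_name
```

Calling populate_by_name() once at boot would let scripts open
/dev/mtd/by-name/<name> whatever the probe order is. Names containing
a '/' are not handled here.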

> 
> Then if the new update script only uses symbolic names, it should work with
> both linux 4.11 and 4.14 kernels.
> 
> But I think I didn't totally understand why/how the old update script would
> be run under the new kernel :p

The problem is that the update script is embedded in the archive
containing the image(s) to flash, so old archives still have the old
script. This means that if you try to downgrade to an older version from
a new version, you're screwed, because the old script will use /dev/mtdX
directly without checking that this is the right partition.
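
This cannot fix archives that are already in the wild, but for future
scripts the missing check could look roughly like the sketch below. It
assumes the usual /proc/mtd layout (mtdN: size erasesize "name"); the
helper names and the partition names in the usage note are hypothetical,
borrowed from the example layout earlier in this thread.

```python
import re

# One /proc/mtd entry looks like: mtd0: 00040000 00020000 "at91bootstrap"
_PROC_MTD_LINE = re.compile(
    r'^(mtd\d+):\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+"(.*)"$'
)


def parse_proc_mtd(text):
    """Return {"mtdN": partition name} from the content of /proc/mtd."""
    table = {}
    for line in text.splitlines():
        match = _PROC_MTD_LINE.match(line.strip())
        if match:  # the "dev: size erasesize name" header does not match
            table[match.group(1)] = match.group(2)
    return table


def check_target(text, dev, expected_name):
    """Return "/dev/<dev>" only if <dev> carries expected_name."""
    actual = parse_proc_mtd(text).get(dev)
    if actual != expected_name:
        raise RuntimeError(
            "%s is %r, expected %r: refusing to flash"
            % (dev, actual, expected_name)
        )
    return "/dev/" + dev
```

With the content of /proc/mtd read into text on the 4.14 layout,
check_target(text, "mtd0", "dtb") would raise instead of letting the
script overwrite the bootstrap, while check_target(text, "mtd4", "dtb")
would return "/dev/mtd4".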


