UBIFS panic during brown-out

Richard Weinberger richard at nod.at
Tue Oct 12 06:36:18 PDT 2021


Kristóf,

----- Ursprüngliche Mail -----
> Von: "Kristof Havasi" <havasiefr at gmail.com>
>> So, are you really talking about brown-out? If so, better talk to you hardware
>> guys.
> 
> Sorry, s/brown-out/power-cut/.

:-)

>>
>> > Does the Auth+Encryption combo increase the chances for corrupting the FS
>> > during brown-out, due to the extra operations?
>>
>> Assuming you actually meant power-cut.
>> Both features are rather new, so there can be still bugs.
> 
> If the logs do not reveal any pointers, we could re-test with disabled enc/auth.
> 
>>
>> > Would you suggest turning on any of the chk_* knobs in the debugfs?
>> > I am not sure that they are helpful, or will just modify the behaviour of the
>> > timings of the system too much.
>>
>> Let's start with logs first.
>>
>> > Would it be a last resort in case the brown-out is detected to switch the FS
>> > into ro mode? Is there any API/ABI apart from the debugfs's ro_error knob to
>> > switch the FS into read-only mode and so trying to prevent file-system
>> > corruption?
>>
>> Not really. You need to make sure that the current NAND command is finished
>> and no new one will be scheduled. Depending on your hardware design this can
>> be a challenge.
> 
> So would a "successful `sync` and ro mode switch" be a safe bet, if
> during the two
> actions no new data is written to the FS?

Kind of. Actually UBIFS should handle this just fine.
Before we consider workarounds, let's give the problem more thought.

>>
>> > Any pointers are welcomed!
>> >
>> > We are on Kernel v5.4.150
>> > We use _both_ UBIFS Authentication and Encryption.
>> > The board is a SAMA5D3 powered embedded device with 1GB NAND flash,
>> > which is not even nearly full (10% used).
>> >
>> > Kernel panic after the last brown-out:
>> > ubi0: scanning is finished
>> > ubi0: attached mtd1 (name "ubivols", size 1023 MiB)
>> > ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
>> > ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
>> > ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
>> > ubi0: good PEBs: 4082, bad PEBs: 13, corrupted PEBs: 0
>> > ubi0: user volume: 5, internal volumes: 1, max. volumes count: 128
>> > ubi0: max/mean erase counter: 118/11, WL threshold: 4096, image
>> > sequence number: 921361235
>> >>ubi0: available PEBs: 0, total reserved PEBs: 4082, PEBs reserved for bad PEB
>> >>handling: 67
>> > ubi0: background thread "ubi_bgt0d" started, PID 617
>> > UBIFS (ubi0:4): Mounting in authenticated mode
>> > UBIFS (ubi0:4): recovery needed
>> >>VFS: Cannot open root device "ubi0:rootfs" or ubi0:rootfs: error -1
>>
>> There is a lot of context missing. Can you please share the full logs?
>> Usually UBIFS prints in detail what went wrong.
> 
> I couldn't see any UBIFS logs over the serial (but the kernel log
> level was also only 1)
> and the application log was also redirected there...
> But here is the full log from the application logs through the power-cut and the
> boot-up
> 
> [Sat Oct 09 14:15:00 2021] pwr outage WARN: Power outage detected ...
> [Sat Oct 09 14:15:00 2021] pwr outage INFO: Low Power mode activated
> [Sat Oct 09 14:15:00 2021] event manager TRACE: Database
> /data/logsystem.db opened successfully
> [Sat Oct 09 14:15:00 2021] event manager TRACE: SQLite prepare :
> SELECT * FROM log_sys WHERE  event_flag=2 OR event_flag=1 ORDER BY
> timestamp ASC LIMIT 1;
> [Sat Oct 09 14:15:00 2021] fnn logger TRACE: Database
> /data/logcustomer.db opened successfully
> [snip] some 10s of lines of application log related to SQL operations
> [Sat Oct 09 14:15:00 2021] fnn logger TRACE: SQLite prepare : SELECT
> MAX(repetition_counter), MAX(record_number), MAX(parent_record_number)
> FROM log_customer WHERE  timestamp > ?1 AND event_id = ?2 AND
> event_sub_id = ?3 AND user_identity = ?4 AND subject_identity = ?5 AND
> message_extensions = ?6 AND peer_identity = ?7;
> [Sat Oct 09 14:15:00 2021] fnn logger TRACE: Stepping:
> SELECT MAX(repetition_counter), MAX(record_number),
> MAX(parent_record_number) FROM log_customer WHERE  timestamp >
> 1633695300 AND event_id = 19005 AND event_sub_id = 1 AND user_identity
> = '' AND subject_identity = '01005e3180a0.eefr2000003006.sm' AND
> message_extensions = '' AND peer_identity = '';
> [Sat Oct 09 14:15:00 2021] fnn logger TRACE: SQL-Step executed successfully
> [Sat Oct 09 14:15:00 2021] fnn logger TRACE: SQLite prepare : UPDATE
> log_customer SET [snip] WHERE record_number = ?21 AND
> repetition_counter = ?22;
> [Sat Oct 09 14:15:01 2021] fnn logger TRACE: Stepping:
> UPDATE log_customer SET repetition_counter = 372, parent_record_number
> = 3031, seconds_index = 246, timestamp = 1633781700, level = 2,
> version = 1, length = 0, device_typ¦RomBOOT
> RomBOOT
> 
> AT91Bootstrap for SMGW 3.8.12-00020-g0f08dbf (Wed Apr 14 18:46:12 CEST 2021)
> 
> NAND: ONFI flash detected
> NAND: Manufacturer ID: 0x2c Chip ID: 0xd3
> NAND: Page Bytes: 4096, Spare Bytes: 224
> NAND: ECC Correctability Bits: 8, ECC Sector Bytes: 512
> NAND: Disable On-Die ECC
> NAND: Initialize PMECC params, cap: 8, sector: 512
> NAND: Initialize UBI...
> UBI: Loading volume-table 0 at PEB 202 with copy flag!
> NAND: Loading UBI volume kernel kernel-spare to @0x22000000
> (0xffffffff bytes length)...
> UBI: Length of Vol-2 updated to 3128 KB
> NAND: Loading UBI volume dtb dtb-spare to @0x21000000 (0xffffffff
> bytes length)...
> UBI: Length of Vol-0 updated to 33 KB
> 
> Booting zImage ......
> 
> Using device tree in place at 0x210001f4
> DT: Injected MAC addresses
> DT: Populated virtual-eprom
> 
> Starting linux kernel ..., machid: 0xffffffff
> 
> Booting Linux on physical CPU 0x0
> Linux version 5.4.109-00033-ga88943c1e68
> (root at runner-tzue45hw-project-205-concurrent-0) (gcc version 6.5.0
> (Linaro GCC 6.5-2018.12)) #1 Wed Apr 14 18:46:17 CEST 2021
> CPU: ARMv7 Processor [410fc051] revision 1 (ARMv7), cr=10c53c7d
> CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> OF: fdt: Machine model: SMGW
> Memory policy: Data cache writeback
> CPU: All CPU(s) started in SVC mode.
> Built 1 zonelists, mobility grouping on.  Total pages: 32511
> Kernel command line: rootfstype=ubifs ubi.mtd=1 root=ubi0:rootfs
> rootflags=auth_hash_name=sha256,auth_key=ubifsauth:0000aaaaffff0000
> Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
> mem auto-init: stack:off, heap alloc:on, heap free:on
> mem auto-init: clearing system memory may take some time...
> Memory: 121196K/131068K available (5120K kernel code, 197K rwdata,
> 956K rodata, 1024K init, 153K bss, 9872K reserved, 0K cma-reserved)
> NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
> random: get_random_bytes called from start_kernel+0x2ec/0x41c with crng_init=0
> master clk is overclocked
> clocksource: pit: mask: 0xfffffff max_cycles: 0xfffffff, max_idle_ns:
> 14479245754 ns
> Console: colour dummy device 80x30
> printk: console [tty0] enabled
> sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every
> 21474836475000000ns
> Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> CPU: Testing write buffer coherency: ok
> Setting up static identity map for 0x20100000 - 0x2010003c
> devtmpfs: initialized
> VFP support v0.3: implementor 41 architecture 2 part 30 variant 5 rev 1
> clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> max_idle_ns: 19112604462750000 ns
> futex hash table entries: 256 (order: -1, 3072 bytes, linear)
> pinctrl core: initialized pinctrl subsystem
> NET: Registered protocol family 16
> DMA: preallocated 256 KiB pool for atomic coherent allocations
> cpuidle: using governor ladder
> gpio-at91 fffff200.gpio: at address (ptrval)
> gpio-at91 fffff400.gpio: at address (ptrval)
> gpio-at91 fffff600.gpio: at address (ptrval)
> gpio-at91 fffff800.gpio: at address (ptrval)
> gpio-at91 fffffa00.gpio: at address (ptrval)
> pinctrl-at91 ahb:apb:pinctrl at fffff200: initialized AT91 pinctrl driver
> atmel_tcb f0010000.timer: IRQ index 1 not found
> atmel_tcb f0010000.timer: IRQ index 2 not found
> atmel_tcb f8014000.timer: IRQ index 1 not found
> atmel_tcb f8014000.timer: IRQ index 2 not found
> at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set
> slave ), 8 channels
> at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set
> slave ), 8 channels
> AT91: Detected SoC family: sama5d3
> AT91: Detected SoC: sama5d35, revision 2
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers
> at91_i2c f0014000.i2c: AT91 i2c bus driver (hw version: 0x402).
> at91_i2c f0018000.i2c: can't get DMA channel, continue without DMA support
> at91_i2c f0018000.i2c: AT91 i2c bus driver (hw version: 0x402).
> clocksource: Switched to clocksource pit
> NET: Registered protocol family 2
> tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
> TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
> TCP bind hash table entries: 1024 (order: 0, 4096 bytes, linear)
> TCP: Hash tables configured (established 1024 bind 1024)
> UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
> UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
> NET: Registered protocol family 1
> Initialise system trusted keyrings
> workingset: timestamp_bits=30 max_order=15 bucket_order=0
> NET: Registered protocol family 38
> Key type asymmetric registered
> Asymmetric key parser 'x509' registered
> atmel_usart_serial.0.auto: ttyS1 at MMIO 0xf001c000 (irq = 21,
> base_baud = 4125000) is a ATMEL_SERIAL
> atmel_usart_serial.1.auto: ttyS2 at MMIO 0xf0020000 (irq = 22,
> base_baud = 8250000) is a ATMEL_SERIAL
> atmel_usart_serial.2.auto: ttyS3 at MMIO 0xf8020000 (irq = 24,
> base_baud = 4125000) is a ATMEL_SERIAL
> atmel_usart_serial.3.auto: ttyS0 at MMIO 0xffffee00 (irq = 32,
> base_baud = 8250000) is a ATMEL_SERIAL
> printk: console [ttyS0] enabled
> atmel_spi f8008000.spi: DMA TX channel not available, SPI unable to use DMA
> atmel_spi f8008000.spi: Atmel SPI Controller using PIO only
> ksz8863 at 0 enforce active low on chipselect handle
> atmel_spi f8008000.spi: Atmel SPI Controller version 0x213 at
> 0xf8008000 (irq 23)
> libphy: Fixed MDIO Bus: probed
> macb f0028000.ethernet: invalid hw address, using random
> random: fast init done
> libphy: MACB_mii_bus: probed
> Generic PHY fixed-0:00: attached PHY driver [Generic PHY]
> (mii_bus:phy_addr=fixed-0:00, irq=POLL)
> macb f0028000.ethernet eth0: Cadence GEM rev 0x00020119 at 0xf0028000
> irq 40 (12:71:da:58:0b:2d)
> libphy: MACB_mii_bus: probed
> Micrel KSZ8081 or KSZ8091 f802c000.ethernet-ffffffff:00: attached PHY
> driver [Micrel KSZ8081 or KSZ8091]
> (mii_bus:phy_addr=f802c000.ethernet-ffffffff:00, irq=45)
> macb f802c000.ethernet eth1: Cadence MACB rev 0x0001010c at 0xf802c000
> irq 41 (d8:c4:97:1c:9a:d7)
> usbcore: registered new interface driver cdc_ether
> usbcore: registered new interface driver cdc_ncm
> usbcore: registered new interface driver cdc_mbim
> ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> ehci-platform: EHCI generic platform driver
> ehci-atmel: EHCI Atmel driver
> atmel-ehci 700000.ehci: EHCI Host Controller
> atmel-ehci 700000.ehci: new USB bus registered, assigned bus number 1
> atmel-ehci 700000.ehci: irq 43, io mem 0x00700000
> atmel-ehci 700000.ehci: USB 2.0 started, EHCI 1.00
> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
> usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> usb usb1: Product: EHCI Host Controller
> usb usb1: Manufacturer: Linux 5.4.109-00033-ga88943c1e68 ehci_hcd
> usb usb1: SerialNumber: 700000.ehci
> hub 1-0:1.0: USB hub found
> hub 1-0:1.0: 3 ports detected
> ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> ohci-platform: OHCI generic platform driver
> ohci-atmel: OHCI Atmel driver
> at91_ohci 600000.ohci: USB Host Controller
> at91_ohci 600000.ohci: new USB bus registered, assigned bus number 2
> at91_ohci 600000.ohci: irq 43, io mem 0x00600000
> usb usb2: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 5.04
> usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
> usb usb2: Product: USB Host Controller
> usb usb2: Manufacturer: Linux 5.4.109-00033-ga88943c1e68 ohci_hcd
> usb usb2: SerialNumber: at91
> hub 2-0:1.0: USB hub found
> hub 2-0:1.0: 3 ports detected
> usbcore: registered new interface driver cdc_acm
> cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
> usbcore: registered new interface driver cdc_wdm
> usbcore: registered new interface driver usbserial_generic
> usbserial: USB Serial support registered for generic
> usbcore: registered new interface driver option
> usbserial: USB Serial support registered for GSM modem (1-port)
> usbcore: registered new interface driver usb_serial_simple
> usbserial: USB Serial support registered for carelink
> usbserial: USB Serial support registered for zio
> usbserial: USB Serial support registered for funsoft
> usbserial: USB Serial support registered for flashloader
> usbserial: USB Serial support registered for google
> usbserial: USB Serial support registered for libtransistor
> usbserial: USB Serial support registered for vivopay
> usbserial: USB Serial support registered for moto_modem
> usbserial: USB Serial support registered for motorola_tetra
> usbserial: USB Serial support registered for novatel_gps
> usbserial: USB Serial support registered for hp4x
> usbserial: USB Serial support registered for suunto
> usbserial: USB Serial support registered for siemens_mpi
> rtc-rv8803 1-0032: registered as rtc0
> i2c /dev entries driver
> at91-reset fffffe00.rstc: Starting after wakeup
> at91sam9_wdt: enabled (heartbeat=15 sec, nowayout=1)
> atmel_aes f8038000.aes: version: 0x135
> atmel_aes f8038000.aes: Atmel AES - Using dma1chan0, dma1chan1 for DMA transfers
> atmel_sha f8034000.sha: version: 0x410
> atmel_sha f8034000.sha: using dma1chan2 for DMA transfers
> atmel_sha f8034000.sha: Atmel SHA1/SHA256/SHA224/SHA384/SHA512
> atmel_tdes f803c000.tdes: version: 0x701
> atmel_tdes f803c000.tdes: using dma1chan3, dma1chan4 for DMA transfers
> atmel_tdes f803c000.tdes: Atmel DES/TDES
> usbcore: registered new interface driver usbhid
> usbhid: USB HID core driver
> nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd3
> nand: Micron MT29F8G08ABACAWP
> nand: 1024 MiB, SLC, erase size: 256 KiB, page size: 4096, OOB size: 224
> Bad block table found at page 262080, version 0x01
> Bad block table found at page 262016, version 0x01
> nand_read_bbt: bad block at 0x000000dc0000
> nand_read_bbt: bad block at 0x00000e280000
> nand_read_bbt: bad block at 0x000024f40000
> nand_read_bbt: bad block at 0x000026ec0000
> nand_read_bbt: bad block at 0x000027ec0000
> nand_read_bbt: bad block at 0x00002f140000
> nand_read_bbt: bad block at 0x000030ec0000
> nand_read_bbt: bad block at 0x000033b40000
> nand_read_bbt: bad block at 0x000038840000
> 2 fixed-partitions partitions found on MTD device atmel_nand
> Creating 2 MTD partitions on "atmel_nand":
> 0x000000000000-0x000000040000 : "at91bootstrap"
> 0x000000040000-0x000040000000 : "ubivols"
> NET: Registered protocol family 10
> Segment Routing with IPv6
> sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
> NET: Registered protocol family 17
> 8021q: 802.1Q VLAN Support v1.8
> Loading compiled-in X.509 certificates
> Loaded X.509 cert 'EFR: efr.de: a54308f075327fb6c3a6fb687fdabc10ce4c51c6'
> Key type ._fscrypt registered
> Key type .fscrypt registered
> ksz8795-switch spi0.0: active low/falling IRQ
> libphy: dsa slave smi: probed
> ksz8795-switch spi0.0 han (uninitialized): PHY [dsa-0.0:00] driver
> [Micrel KSZ886X Switch]
> ksz8795-switch spi0.0 cls (uninitialized): PHY [dsa-0.0:01] driver
> [Micrel KSZ886X Switch]
> ksz8795-switch spi0.0: Using legacy PHYLIB callbacks. Please migrate to PHYLINK!
> DSA: tree 0 setup
> ubi0: attaching mtd1
> usb 2-3: new full-speed USB device number 2 using at91_ohci
> usb 2-3: New USB device found, idVendor=08e6, idProduct=3437, bcdDevice= 2.01
> usb 2-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> usb 2-3: Product: USB SmartCard Reader
> usb 2-3: Manufacturer: Gemalto
> random: crng init done
> ubi0: scanning is finished
> ubi0: attached mtd1 (name "ubivols", size 1023 MiB)
> ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> ubi0: good PEBs: 4082, bad PEBs: 13, corrupted PEBs: 0
> ubi0: user volume: 5, internal volumes: 1, max. volumes count: 128
> ubi0: max/mean erase counter: 118/11, WL threshold: 4096, image
> sequence number: 921361235
> ubi0: available PEBs: 0, total reserved PEBs: 4082, PEBs reserved for
> bad PEB handling: 67
> ubi0: background thread "ubi_bgt0d" started, PID 617
> rtc-rv8803 1-0032: setting system clock to 2021-10-09T12:16:01 UTC (1633781761)
> UBIFS (ubi0:4): Mounting in authenticated mode
> UBIFS (ubi0:4): recovery needed
> VFS: Cannot open root device "ubi0:rootfs" or ubi0:rootfs: error -1

Hmm. -1 is -EPERM.
So some authentification check fails.

Can you find out which -EPERM in fs/ubifs this is?
E.g. by adding a printk() before the return.
Maybe this gives us a clue.

> Please append a correct "root=" boot option; here are the available partitions:
> Kernel panic - not syncing: VFS: Unable to mount root fs on ubi0:rootfs
> CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.109-00033-ga88943c1e68 #1
> Hardware name: Atmel SAMA5
> [<c010babc>] (unwind_backtrace) from [<c010a8d0>] (show_stack+0x10/0x14)
> [<c010a8d0>] (show_stack) from [<c0569970>] (panic+0xfc/0x2e8)
> [<c0569970>] (panic) from [<c0801598>] (mount_block_root+0x268/0x2a4)
> [<c0801598>] (mount_block_root) from [<c0801674>] (prepare_namespace+0x9c/0x184)
> [<c0801674>] (prepare_namespace) from [<c056b774>] (kernel_init+0x8/0x108)
> [<c056b774>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
> Exception stack(0xc7829fb0 to 0xc7829ff8)
> 9fa0:                                     00000000 00000000 00000000 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on
> ubi0:rootfs ]---
> RomBOOT
> RomBOOT
> 
>>
>> 5.4 is not fresh. Can you use a more recent kernel?
> 
> We are tracking the latest patches on 5.4, but will try whether a
> rebase to 5.10 is straight-forward or not.
> Or at least port the minimal changes which are needed to re-create the crash.
> 
> But first, I will try to reproduce the crash with a stripped down
> program, because with the current tests,
> it can take anywhere between 20-500 iterations of power-cuts with the
> full business logic to trigger the crash.
> (5m ON 30s OFF)

Thanks,
//richard



More information about the linux-mtd mailing list