UBIFS panic during brown-out

Kristof Havasi havasiefr at gmail.com
Tue Oct 12 06:02:20 PDT 2021


Hello Richard,

On Mon, 11 Oct 2021 at 21:02, Richard Weinberger <richard at nod.at> wrote:
>
> Kristóf,
>
> ----- Ursprüngliche Mail -----
> > Von: "Kristof Havasi" <havasiefr at gmail.com>
> > An: "richard" <richard at nod.at>
> > CC: "linux-mtd" <linux-mtd at lists.infradead.org>
> > Gesendet: Montag, 11. Oktober 2021 19:16:01
> > Betreff: UBIFS panic during brown-out
>
> > Dear Richard,
> >
> > we have been in contact regarding another (similar) kernel panic a year ago:
> > http://lists.infradead.org/pipermail/linux-mtd/2020-September/082175.html
> >
> > Now we are stress testing our board for brown-out stability, and we can
> > see reproducible file system corruptions.
> > The culprit was narrowed down to a pending SQLite operation during brown-out,
> > in the business-logic.
> >
> > As far as I understood, UBIFS is extremely robust in such cases, so I would
> > expect a corrupted file as the worst case scenario, not an unbootable system.
> >
> > Am I too "optimistic" about UBIFS's brown-out stability?
>
> A brown-out is something very bad. Electronic components show undefined behavior
> in this phase. UBIFS (and Linux) can nothing do there.
> But both UBI and UBIFS try to be robust against sudden power loss (power-cut).
> E.g. an interrupted erase or program operation.
>
> So, are you really talking about brown-out? If so, better talk to you hardware guys.

Sorry, s/brown-out/power-cut/.

>
> > Does the Auth+Encryption combo increase the chances for corrupting the FS
> > during brown-out, due to the extra operations?
>
> Assuming you actually meant power-cut.
> Both features are rather new, so there can be still bugs.

If the logs do not reveal any pointers, we could re-test with disabled enc/auth.

>
> > Would you suggest turning on any of the chk_* knobs in the debugfs?
> > I am not sure that they are helpful, or will just modify the behaviour of the
> > timings of the system too much.
>
> Let's start with logs first.
>
> > Would it be a last resort in case the brown-out is detected to switch the FS
> > into ro mode? Is there any API/ABI apart from the debugfs's ro_error knob to
> > switch the FS into read-only mode and so trying to prevent file-system
> > corruption?
>
> Not really. You need to make sure that the current NAND command is finished
> and no new one will be scheduled. Depending on your hardware design this can
> be a challenge.

So would a "successful `sync` and ro mode switch" be a safe bet, if
during the two
actions no new data is written to the FS?

>
> > Any pointers are welcomed!
> >
> > We are on Kernel v5.4.150
> > We use _both_ UBIFS Authentication and Encryption.
> > The board is a SAMA5D3 powered embedded device with 1GB NAND flash,
> > which is not even nearly full (10% used).
> >
> > Kernel panic after the last brown-out:
> > ubi0: scanning is finished
> > ubi0: attached mtd1 (name "ubivols", size 1023 MiB)
> > ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
> > ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
> > ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
> > ubi0: good PEBs: 4082, bad PEBs: 13, corrupted PEBs: 0
> > ubi0: user volume: 5, internal volumes: 1, max. volumes count: 128
> > ubi0: max/mean erase counter: 118/11, WL threshold: 4096, image
> > sequence number: 921361235
> >>ubi0: available PEBs: 0, total reserved PEBs: 4082, PEBs reserved for bad PEB
> >>handling: 67
> > ubi0: background thread "ubi_bgt0d" started, PID 617
> > UBIFS (ubi0:4): Mounting in authenticated mode
> > UBIFS (ubi0:4): recovery needed
> >>VFS: Cannot open root device "ubi0:rootfs" or ubi0:rootfs: error -1
>
> There is a lot of context missing. Can you please share the full logs?
> Usually UBIFS prints in detail what went wrong.

I couldn't see any UBIFS logs over the serial (but the kernel log
level was also only 1)
and the application log was also redirected there...
But here is the full log from the application logs through the power-cut and the
boot-up

[Sat Oct 09 14:15:00 2021] pwr outage WARN: Power outage detected ...
[Sat Oct 09 14:15:00 2021] pwr outage INFO: Low Power mode activated
[Sat Oct 09 14:15:00 2021] event manager TRACE: Database
/data/logsystem.db opened successfully
[Sat Oct 09 14:15:00 2021] event manager TRACE: SQLite prepare :
SELECT * FROM log_sys WHERE  event_flag=2 OR event_flag=1 ORDER BY
timestamp ASC LIMIT 1;
[Sat Oct 09 14:15:00 2021] fnn logger TRACE: Database
/data/logcustomer.db opened successfully
[snip] some 10s of lines of application log related to SQL operations
[Sat Oct 09 14:15:00 2021] fnn logger TRACE: SQLite prepare : SELECT
MAX(repetition_counter), MAX(record_number), MAX(parent_record_number)
FROM log_customer WHERE  timestamp > ?1 AND event_id = ?2 AND
event_sub_id = ?3 AND user_identity = ?4 AND subject_identity = ?5 AND
message_extensions = ?6 AND peer_identity = ?7;
[Sat Oct 09 14:15:00 2021] fnn logger TRACE: Stepping:
SELECT MAX(repetition_counter), MAX(record_number),
MAX(parent_record_number) FROM log_customer WHERE  timestamp >
1633695300 AND event_id = 19005 AND event_sub_id = 1 AND user_identity
= '' AND subject_identity = '01005e3180a0.eefr2000003006.sm' AND
message_extensions = '' AND peer_identity = '';
[Sat Oct 09 14:15:00 2021] fnn logger TRACE: SQL-Step executed successfully
[Sat Oct 09 14:15:00 2021] fnn logger TRACE: SQLite prepare : UPDATE
log_customer SET [snip] WHERE record_number = ?21 AND
repetition_counter = ?22;
[Sat Oct 09 14:15:01 2021] fnn logger TRACE: Stepping:
UPDATE log_customer SET repetition_counter = 372, parent_record_number
= 3031, seconds_index = 246, timestamp = 1633781700, level = 2,
version = 1, length = 0, device_typ¦RomBOOT
RomBOOT

AT91Bootstrap for SMGW 3.8.12-00020-g0f08dbf (Wed Apr 14 18:46:12 CEST 2021)

NAND: ONFI flash detected
NAND: Manufacturer ID: 0x2c Chip ID: 0xd3
NAND: Page Bytes: 4096, Spare Bytes: 224
NAND: ECC Correctability Bits: 8, ECC Sector Bytes: 512
NAND: Disable On-Die ECC
NAND: Initialize PMECC params, cap: 8, sector: 512
NAND: Initialize UBI...
UBI: Loading volume-table 0 at PEB 202 with copy flag!
NAND: Loading UBI volume kernel kernel-spare to @0x22000000
(0xffffffff bytes length)...
UBI: Length of Vol-2 updated to 3128 KB
NAND: Loading UBI volume dtb dtb-spare to @0x21000000 (0xffffffff
bytes length)...
UBI: Length of Vol-0 updated to 33 KB

Booting zImage ......

Using device tree in place at 0x210001f4
DT: Injected MAC addresses
DT: Populated virtual-eprom

Starting linux kernel ..., machid: 0xffffffff

Booting Linux on physical CPU 0x0
Linux version 5.4.109-00033-ga88943c1e68
(root at runner-tzue45hw-project-205-concurrent-0) (gcc version 6.5.0
(Linaro GCC 6.5-2018.12)) #1 Wed Apr 14 18:46:17 CEST 2021
CPU: ARMv7 Processor [410fc051] revision 1 (ARMv7), cr=10c53c7d
CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
OF: fdt: Machine model: SMGW
Memory policy: Data cache writeback
CPU: All CPU(s) started in SVC mode.
Built 1 zonelists, mobility grouping on.  Total pages: 32511
Kernel command line: rootfstype=ubifs ubi.mtd=1 root=ubi0:rootfs
rootflags=auth_hash_name=sha256,auth_key=ubifsauth:0000aaaaffff0000
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
mem auto-init: stack:off, heap alloc:on, heap free:on
mem auto-init: clearing system memory may take some time...
Memory: 121196K/131068K available (5120K kernel code, 197K rwdata,
956K rodata, 1024K init, 153K bss, 9872K reserved, 0K cma-reserved)
NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
random: get_random_bytes called from start_kernel+0x2ec/0x41c with crng_init=0
master clk is overclocked
clocksource: pit: mask: 0xfffffff max_cycles: 0xfffffff, max_idle_ns:
14479245754 ns
Console: colour dummy device 80x30
printk: console [tty0] enabled
sched_clock: 32 bits at 100 Hz, resolution 10000000ns, wraps every
21474836475000000ns
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
CPU: Testing write buffer coherency: ok
Setting up static identity map for 0x20100000 - 0x2010003c
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 2 part 30 variant 5 rev 1
clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
max_idle_ns: 19112604462750000 ns
futex hash table entries: 256 (order: -1, 3072 bytes, linear)
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
cpuidle: using governor ladder
gpio-at91 fffff200.gpio: at address (ptrval)
gpio-at91 fffff400.gpio: at address (ptrval)
gpio-at91 fffff600.gpio: at address (ptrval)
gpio-at91 fffff800.gpio: at address (ptrval)
gpio-at91 fffffa00.gpio: at address (ptrval)
pinctrl-at91 ahb:apb:pinctrl at fffff200: initialized AT91 pinctrl driver
atmel_tcb f0010000.timer: IRQ index 1 not found
atmel_tcb f0010000.timer: IRQ index 2 not found
atmel_tcb f8014000.timer: IRQ index 1 not found
atmel_tcb f8014000.timer: IRQ index 2 not found
at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set
slave ), 8 channels
at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set
slave ), 8 channels
AT91: Detected SoC family: sama5d3
AT91: Detected SoC: sama5d35, revision 2
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers
at91_i2c f0014000.i2c: AT91 i2c bus driver (hw version: 0x402).
at91_i2c f0018000.i2c: can't get DMA channel, continue without DMA support
at91_i2c f0018000.i2c: AT91 i2c bus driver (hw version: 0x402).
clocksource: Switched to clocksource pit
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 4096 bytes, linear)
TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
TCP bind hash table entries: 1024 (order: 0, 4096 bytes, linear)
TCP: Hash tables configured (established 1024 bind 1024)
UDP hash table entries: 256 (order: 0, 4096 bytes, linear)
UDP-Lite hash table entries: 256 (order: 0, 4096 bytes, linear)
NET: Registered protocol family 1
Initialise system trusted keyrings
workingset: timestamp_bits=30 max_order=15 bucket_order=0
NET: Registered protocol family 38
Key type asymmetric registered
Asymmetric key parser 'x509' registered
atmel_usart_serial.0.auto: ttyS1 at MMIO 0xf001c000 (irq = 21,
base_baud = 4125000) is a ATMEL_SERIAL
atmel_usart_serial.1.auto: ttyS2 at MMIO 0xf0020000 (irq = 22,
base_baud = 8250000) is a ATMEL_SERIAL
atmel_usart_serial.2.auto: ttyS3 at MMIO 0xf8020000 (irq = 24,
base_baud = 4125000) is a ATMEL_SERIAL
atmel_usart_serial.3.auto: ttyS0 at MMIO 0xffffee00 (irq = 32,
base_baud = 8250000) is a ATMEL_SERIAL
printk: console [ttyS0] enabled
atmel_spi f8008000.spi: DMA TX channel not available, SPI unable to use DMA
atmel_spi f8008000.spi: Atmel SPI Controller using PIO only
ksz8863 at 0 enforce active low on chipselect handle
atmel_spi f8008000.spi: Atmel SPI Controller version 0x213 at
0xf8008000 (irq 23)
libphy: Fixed MDIO Bus: probed
macb f0028000.ethernet: invalid hw address, using random
random: fast init done
libphy: MACB_mii_bus: probed
Generic PHY fixed-0:00: attached PHY driver [Generic PHY]
(mii_bus:phy_addr=fixed-0:00, irq=POLL)
macb f0028000.ethernet eth0: Cadence GEM rev 0x00020119 at 0xf0028000
irq 40 (12:71:da:58:0b:2d)
libphy: MACB_mii_bus: probed
Micrel KSZ8081 or KSZ8091 f802c000.ethernet-ffffffff:00: attached PHY
driver [Micrel KSZ8081 or KSZ8091]
(mii_bus:phy_addr=f802c000.ethernet-ffffffff:00, irq=45)
macb f802c000.ethernet eth1: Cadence MACB rev 0x0001010c at 0xf802c000
irq 41 (d8:c4:97:1c:9a:d7)
usbcore: registered new interface driver cdc_ether
usbcore: registered new interface driver cdc_ncm
usbcore: registered new interface driver cdc_mbim
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-platform: EHCI generic platform driver
ehci-atmel: EHCI Atmel driver
atmel-ehci 700000.ehci: EHCI Host Controller
atmel-ehci 700000.ehci: new USB bus registered, assigned bus number 1
atmel-ehci 700000.ehci: irq 43, io mem 0x00700000
atmel-ehci 700000.ehci: USB 2.0 started, EHCI 1.00
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 5.4.109-00033-ga88943c1e68 ehci_hcd
usb usb1: SerialNumber: 700000.ehci
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 3 ports detected
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-platform: OHCI generic platform driver
ohci-atmel: OHCI Atmel driver
at91_ohci 600000.ohci: USB Host Controller
at91_ohci 600000.ohci: new USB bus registered, assigned bus number 2
at91_ohci 600000.ohci: irq 43, io mem 0x00600000
usb usb2: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 5.04
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: USB Host Controller
usb usb2: Manufacturer: Linux 5.4.109-00033-ga88943c1e68 ohci_hcd
usb usb2: SerialNumber: at91
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
usbcore: registered new interface driver cdc_acm
cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
usbcore: registered new interface driver cdc_wdm
usbcore: registered new interface driver usbserial_generic
usbserial: USB Serial support registered for generic
usbcore: registered new interface driver option
usbserial: USB Serial support registered for GSM modem (1-port)
usbcore: registered new interface driver usb_serial_simple
usbserial: USB Serial support registered for carelink
usbserial: USB Serial support registered for zio
usbserial: USB Serial support registered for funsoft
usbserial: USB Serial support registered for flashloader
usbserial: USB Serial support registered for google
usbserial: USB Serial support registered for libtransistor
usbserial: USB Serial support registered for vivopay
usbserial: USB Serial support registered for moto_modem
usbserial: USB Serial support registered for motorola_tetra
usbserial: USB Serial support registered for novatel_gps
usbserial: USB Serial support registered for hp4x
usbserial: USB Serial support registered for suunto
usbserial: USB Serial support registered for siemens_mpi
rtc-rv8803 1-0032: registered as rtc0
i2c /dev entries driver
at91-reset fffffe00.rstc: Starting after wakeup
at91sam9_wdt: enabled (heartbeat=15 sec, nowayout=1)
atmel_aes f8038000.aes: version: 0x135
atmel_aes f8038000.aes: Atmel AES - Using dma1chan0, dma1chan1 for DMA transfers
atmel_sha f8034000.sha: version: 0x410
atmel_sha f8034000.sha: using dma1chan2 for DMA transfers
atmel_sha f8034000.sha: Atmel SHA1/SHA256/SHA224/SHA384/SHA512
atmel_tdes f803c000.tdes: version: 0x701
atmel_tdes f803c000.tdes: using dma1chan3, dma1chan4 for DMA transfers
atmel_tdes f803c000.tdes: Atmel DES/TDES
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xd3
nand: Micron MT29F8G08ABACAWP
nand: 1024 MiB, SLC, erase size: 256 KiB, page size: 4096, OOB size: 224
Bad block table found at page 262080, version 0x01
Bad block table found at page 262016, version 0x01
nand_read_bbt: bad block at 0x000000dc0000
nand_read_bbt: bad block at 0x00000e280000
nand_read_bbt: bad block at 0x000024f40000
nand_read_bbt: bad block at 0x000026ec0000
nand_read_bbt: bad block at 0x000027ec0000
nand_read_bbt: bad block at 0x00002f140000
nand_read_bbt: bad block at 0x000030ec0000
nand_read_bbt: bad block at 0x000033b40000
nand_read_bbt: bad block at 0x000038840000
2 fixed-partitions partitions found on MTD device atmel_nand
Creating 2 MTD partitions on "atmel_nand":
0x000000000000-0x000000040000 : "at91bootstrap"
0x000000040000-0x000040000000 : "ubivols"
NET: Registered protocol family 10
Segment Routing with IPv6
sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
NET: Registered protocol family 17
8021q: 802.1Q VLAN Support v1.8
Loading compiled-in X.509 certificates
Loaded X.509 cert 'EFR: efr.de: a54308f075327fb6c3a6fb687fdabc10ce4c51c6'
Key type ._fscrypt registered
Key type .fscrypt registered
ksz8795-switch spi0.0: active low/falling IRQ
libphy: dsa slave smi: probed
ksz8795-switch spi0.0 han (uninitialized): PHY [dsa-0.0:00] driver
[Micrel KSZ886X Switch]
ksz8795-switch spi0.0 cls (uninitialized): PHY [dsa-0.0:01] driver
[Micrel KSZ886X Switch]
ksz8795-switch spi0.0: Using legacy PHYLIB callbacks. Please migrate to PHYLINK!
DSA: tree 0 setup
ubi0: attaching mtd1
usb 2-3: new full-speed USB device number 2 using at91_ohci
usb 2-3: New USB device found, idVendor=08e6, idProduct=3437, bcdDevice= 2.01
usb 2-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 2-3: Product: USB SmartCard Reader
usb 2-3: Manufacturer: Gemalto
random: crng init done
ubi0: scanning is finished
ubi0: attached mtd1 (name "ubivols", size 1023 MiB)
ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 253952 bytes
ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 4096
ubi0: VID header offset: 4096 (aligned 4096), data offset: 8192
ubi0: good PEBs: 4082, bad PEBs: 13, corrupted PEBs: 0
ubi0: user volume: 5, internal volumes: 1, max. volumes count: 128
ubi0: max/mean erase counter: 118/11, WL threshold: 4096, image
sequence number: 921361235
ubi0: available PEBs: 0, total reserved PEBs: 4082, PEBs reserved for
bad PEB handling: 67
ubi0: background thread "ubi_bgt0d" started, PID 617
rtc-rv8803 1-0032: setting system clock to 2021-10-09T12:16:01 UTC (1633781761)
UBIFS (ubi0:4): Mounting in authenticated mode
UBIFS (ubi0:4): recovery needed
VFS: Cannot open root device "ubi0:rootfs" or ubi0:rootfs: error -1
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on ubi0:rootfs
CPU: 0 PID: 1 Comm: swapper Not tainted 5.4.109-00033-ga88943c1e68 #1
Hardware name: Atmel SAMA5
[<c010babc>] (unwind_backtrace) from [<c010a8d0>] (show_stack+0x10/0x14)
[<c010a8d0>] (show_stack) from [<c0569970>] (panic+0xfc/0x2e8)
[<c0569970>] (panic) from [<c0801598>] (mount_block_root+0x268/0x2a4)
[<c0801598>] (mount_block_root) from [<c0801674>] (prepare_namespace+0x9c/0x184)
[<c0801674>] (prepare_namespace) from [<c056b774>] (kernel_init+0x8/0x108)
[<c056b774>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xc7829fb0 to 0xc7829ff8)
9fa0:                                     00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on
ubi0:rootfs ]---
RomBOOT
RomBOOT

>
> 5.4 is not fresh. Can you use a more recent kernel?

We are tracking the latest patches on 5.4, but will try whether a
rebase to 5.10 is straight-forward or not.
Or at least port the minimal changes which are needed to re-create the crash.

But first, I will try to reproduce the crash with a stripped down
program, because with the current tests,
it can take anywhere between 20-500 iterations of power-cuts with the
full business logic to trigger the crash.
(5m ON 30s OFF)

Best Regards,
Kristóf



More information about the linux-mtd mailing list