UBIFS-AUTH panic after reboot

Thu Sep 17 10:48:42 EDT 2020

In the meantime we were able to create a simple "clean trigger", independent of
our business logic! It works by creating random files and then deleting them.
BUT *the size* and *the number* of the files do have an influence, as does
whether we delete them or not.
We could reproduce this behaviour on multiple devices.

A stripped-down version of the trigger script (without echos and such):

```
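#!/bin/bash
DEST_DIR=/root/panictest
FILE_CNT=15
FILE_SIZE="10M"
DO_CLEANUP='y'
POST_SYNC_SLEEP=1

# Start from an empty test directory
[ -d "$DEST_DIR" ] && rm -rf "$DEST_DIR"
mkdir -p "$DEST_DIR"

# Write FILE_CNT files of FILE_SIZE random bytes each; stop early as soon
# as dd fails (e.g. once the filesystem is full)
for i in $(seq 1 "$FILE_CNT"); do
    OUT_FILE="$DEST_DIR/file$i"
    # dd reports its transfer statistics on stderr, not stdout
    if ! dd if=/dev/urandom of="$OUT_FILE" bs="$FILE_SIZE" count=1 2> /dev/null; then
        break
    fi
done

# Flush everything to flash and show how full the filesystem is
sync
sleep "$POST_SYNC_SLEEP"
df -h

# Optionally delete the files again and flush the deletion as well
if [[ "$DO_CLEANUP" == 'y' ]]; then
    rm -rf "$DEST_DIR"
    sync
    sleep "$POST_SYNC_SLEEP"
fi

reboot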
```

Without "chk_lprops" the panic only becomes visible after the restart, but with
it enabled the following assertion fired immediately:
(Should I "reformat" long log lines in the future or leave them verbatim?)

```
UBIFS error (ubi0:4 pid 649): ubifs_assert_failed: UBIFS assert failed: (val >> nrbits) == 0 || nrbits == 32, in fs/ubifs/lpt.c:231
UBIFS warning (ubi0:4 pid 649): ubifs_ro_mode.part.1: switched to read-only mode, error -22
CPU: 0 PID: 649 Comm: ubifs_bgt0_4 Not tainted 5.4.64-00030-gbcb07cf6f1bb #35
Hardware name: Atmel SAMA5
[<c010bb5c>] (unwind_backtrace) from [<c010a7c8>] (show_stack+0x10/0x14)
[<c010a7c8>] (show_stack) from [<c0260c40>] (pack_bits+0x1ac/0x1d0)
[<c0260c40>] (pack_bits) from [<c0261854>] (ubifs_pack_pnode+0xa4/0x144)
[<c0261854>] (ubifs_pack_pnode) from [<c02631c4>] (ubifs_lpt_calc_hash+0x134/0x220)
[<c02631c4>] (ubifs_lpt_calc_hash) from [<c026acdc>] (ubifs_lpt_start_commit+0x754/0xe54)
[<c026acdc>] (ubifs_lpt_start_commit) from [<c02599c8>] (do_commit+0x1d0/0x484)
[<c02599c8>] (do_commit) from [<c0259dbc>] (ubifs_bg_thread+0x140/0x154)
[<c0259dbc>] (ubifs_bg_thread) from [<c012fef8>] (kthread+0x114/0x144)
[<c012fef8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
Exception stack(0xc71fffb0 to 0xc71ffff8)
ffa0:                                     00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
UBIFS error (ubi0:4 pid 649): ubifs_assert_failed: UBIFS assert failed: (val >> nrbits) == 0 || nrbits == 32, in fs/ubifs/lpt.c:231
UBIFS error (ubi0:4 pid 649): ubifs_assert_failed: UBIFS assert failed: (val >> nrbits) == 0 || nrbits == 32, in fs/ubifs/lpt.c:231
```
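The failing check is the assertion in pack_bits() (fs/ubifs/lpt.c:231 in this kernel), reached here from ubifs_pack_pnode(). Its condition only holds if the value being packed fits into nrbits bits, i.e. no bit at position nrbits or above is set. A small shell re-statement of that condition, for illustration only; fits_in_bits is a hypothetical helper name, not a UBIFS function:

```shell
#!/bin/bash
# Illustration only: re-states the condition of the UBIFS assertion
#   (val >> nrbits) == 0 || nrbits == 32
# fits_in_bits is a hypothetical helper, not part of UBIFS or mtd-utils.
# Bash arithmetic is 64-bit; the "nrbits == 32" branch just mirrors the
# kernel condition, where shifting a 32-bit value by 32 is not allowed.
fits_in_bits() {
    local val=$1 nrbits=$2
    (( (val >> nrbits) == 0 || nrbits == 32 ))
}

# 32767 = 2^15 - 1 still fits into 15 bits, 32768 = 2^15 does not
fits_in_bits 32767 15 && echo "32767 fits in 15 bits"
fits_in_bits 32768 15 || echo "32768 does not fit in 15 bits"
```

So the assertion means that some value handed to pack_bits() was too large for the bit field reserved for it in the packed pnode.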

Regarding the parameter space (file size, file count, cleanup or not), these
were our first rudimentary findings:

FILE_SIZE | FILE_CNT | ~used% | DO_CLEANUP | panic? |
--------- | -------- | ------ | ---------- | ------ |
   10M    |    10    |  ~10%  |    yes     |   no   |
   10M    |    15    |  ~15%  |    yes     |   yes  |
   10M    |    NA    |  ~25%  |    yes     |   yes  |
   10M    |    NA    |  ~50%  |    yes     |   yes  |
   10M    |    NA    | ~100%  |    yes     |   yes  |
   10M    |    NA    | ~100%  |    no      |   no   |
    1M    |    NA    | ~100%  |    yes     |   no   |

So if we write more than 100M worth of 10M files and then delete them, we can
robustly trigger the panic. If we do not delete them, the panic does not occur.

If we fill the flash with 1M files and delete them, however, we do not see the
panic.
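To collect rows like the ones in the table above, the usage can be read from df right after the post-write sync, before the cleanup step. A minimal sketch, assuming a df that supports the POSIX output format (-P) so each filesystem stays on one line; used_pct and table_row are hypothetical helper names:

```shell
#!/bin/bash
# Print the "Use%" column of the filesystem that holds the given path.
# -P (POSIX output) keeps each filesystem on a single line, so the
# second line of output is always the one we want.
used_pct() {
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

# Print one row similar to the findings table:
#   FILE_SIZE | FILE_CNT | used | DO_CLEANUP |
# The "panic?" column can only be filled in after the reboot.
table_row() {
    local size=$1 count=$2 cleanup=$3 path=$4
    printf '%s | %s | %s | %s |\n' "$size" "$count" "$(used_pct "$path")" "$cleanup"
}

table_row 10M 15 yes /
```

In the trigger script, calling it as `table_row "$FILE_SIZE" "$FILE_CNT" "$DO_CLEANUP" "$DEST_DIR"` next to the existing `df -h` would record the usage before the files are deleted.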

On Thu, 17 Sep 2020 at 15:24, Richard Weinberger
<richard.weinberger at gmail.com> wrote:
>
> On Tue, Sep 15, 2020 at 4:51 PM Kristof Havasi <havasiefr at gmail.com> wrote:
> > What I have tried:
> > =================
> > Based on the panic log [1], I can see that the panic happens here:
> >     ubifs_lpt_calc_hash
> >         `->ubifs_get_pnode
> > inside the iteration over the LPT pnodes with hashing
>
> Hmm.
>
> > Questions:
> > =========
> > Q1: Are the chk_* knobs authentication aware? Or do they report so loudly
> >     because I enabled the authentication and they cannot handle it yet?
>
> They should. If not they need fixing. :-)

They do seem to be OK: with chk_lprops enabled and the clean trigger script,
the problem (a UBIFS assertion) was caught immediately, not only after a
restart.
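For reference, chk_lprops can be switched on at runtime through debugfs. A minimal sketch, assuming the knob layout of this kernel series: a global file under /sys/kernel/debug/ubifs/ and a per-volume directory named ubi<ubi_num>_<vol_id> (ubi0_4 for the volume in the logs here). enable_ubifs_check is a hypothetical helper, and the DEBUGFS override only exists to make it easy to try outside the target:

```shell
#!/bin/bash
# Sketch: switch on a UBIFS self-check (e.g. chk_lprops) at runtime via debugfs.
# Assumed layout: $DEBUGFS/ubifs/<knob> for the global setting and
# $DEBUGFS/ubifs/ubi<ubi_num>_<vol_id>/<knob> for a single mounted volume.
# enable_ubifs_check is a hypothetical helper, not part of mtd-utils.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

enable_ubifs_check() {
    local knob=$1
    local scope=${2:-}   # empty: global knob; e.g. "ubi0_4": one volume only
    local path="$DEBUGFS/ubifs${scope:+/$scope}/$knob"

    if [ ! -w "$path" ]; then
        echo "cannot write $path (is debugfs mounted? running as root?)" >&2
        return 1
    fi
    # Writing "1" enables the check, "0" disables it again
    echo 1 > "$path"
}
```

For the volume from the logs this would be `enable_ubifs_check chk_lprops ubi0_4`, run as root with debugfs mounted (e.g. `mount -t debugfs none /sys/kernel/debug`).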

>
> > Q2: Could I use `integr_chk` with authentication and so that the UBI volume is
> >     my root filesystem?
>
> What is "integr_chk"? Do you mean the integ test from mtd-utils?

Sorry, I got the name wrong: I meant "integck" from mtd-utils.

> > UBIFS (ubi0:4): Mounting in authenticated mode
> > UBIFS (ubi0:4): background thread "ubifs_bgt0_4" started, PID 632
> > UBIFS error (ubi0:4 pid 1): ubifs_get_pnode.part.6: error -22 reading
>
> So, it returns -EINVAL. Is this with chk_* enabled?

In both cases: with and without the chk_* knobs enabled.

>
> > pnode at 7:37186
> > (pid 1) dumping pnode:
> >         address c7138c80 parent c7138e80 cnext 0
> >         flags 0 iip 3 level 0 num 0
> >         0: free 0 dirty 255408 flags 1 lnum 0
> >         1: free 0 dirty 190192 flags 1 lnum 0
> >         2: free 0 dirty 255360 flags 1 lnum 0
> >         3: free 0 dirty 248896 flags 1 lnum 0
>
> Sascha, does this ring a bell?

FYI: here I sometimes see node(s) with a non-zero value for "free".

>
> --
> Thanks,
> //richard

Just let me know if I can provide any other useful information.

Thank you,
    Kristof