[RFC] UBIFS authentication

Mon Apr 9 08:23:05 PDT 2018

Hi Sascha,

> On 09.04.2018, at 11:59, Sascha Hauer <s.hauer at pengutronix.de> wrote:
> 
> Hi David,
> 
> On Wed, Jan 17, 2018 at 04:19:14PM +0100, David Gstir wrote:
>> Hi everybody!
>> 
>> ### Index Authentication
>> 
>> Through UBIFS' concept of a wandering tree, it already takes care of only
>> updating and persisting changed parts from leaf node up to the root node
>> of the full B+ tree. This enables us to augment the index nodes of the tree
>> with a hash over each node's child nodes. As a result, the index basically also
>> a Merkle tree. Since the leaf nodes of the index contain the actual filesystem
>> data, the hashes of their parent index nodes thus cover all the file contents
>> and file metadata. When a file changes, the UBIFS index is updated accordingly
>> from the leaf nodes up to the root node including the master node. This process
>> can be hooked to recompute the hash only for each changed node at the same time.
>> Whenever a file is read, UBIFS can verify the hashes from each leaf node up to
>> the root node to ensure the node's integrity.
>> 
>> To ensure the authenticity of the whole index, the UBIFS master node stores a
>> keyed hash (HMAC) over its own contents (which includes the location of the root
>> node) and the root node of the index tree. As mentioned above, the master node
>> is always written to the flash whenever the index is persisted (ie. on index
>> commit).
>> 
>> Using this approach only UBIFS index nodes and the master node are changed to
>> include a hash. All other types of nodes will remain unchanged. This reduces
>> the storage overhead which is precious for users of UBIFS (ie. embedded
>> devices).
>> 
>> 
>>                                            HMAC Master Node
>>                                                 |
>>                      ' ' ' ' ' ' ' ' ' ' ' ' ' '|' '
>>                      '     +---------------+    o  '
>>                      '     |  Master Node  |       '
>>                      '     +---------------+       '       Hash Index Node #1
>>                      '             |               '                |
>>       . . . . . . . .'. . . . . .  v . . . . . . . . . . . . . . . .|. . . . .
>>       .              '      +---------------+      '                o        .
>>       .              '      | Index Node #1 |      '                         .
>>       .              '      +---------------+      '                         .
>>       .              '        |    ...   |  (fanout: 8)                      .
>>       .              ' ' ' '  | ' ' ' '  | ' ' ' ' '                         .
>>       .               +-------+          +------+                            .
>>       .               |                         |                            .
>>       .     ' ' ' ' ' v ' ' ' ' '  ' ' ' ' ' '  v  ' ' ' ' ' ' ' ' ' ' '     .
>>       .     ' +---------------+ '  '     +---------------+             '     .
>>       .     ' | Index Node #2 | '  '     | Index Node #3 |             '     .
>>       .     ' +---------------+ '  '     +---------------+             '     .
>>       .     '   |   ...         '  '        |   ...   |                '     .
>>       . . . ' . v . . . . . . . '. ' . . .  v . . . . v . . . . . . . .' . . .
>>             ' +-----------+     '  '+----------+  +-----------+        '
>>             ' | Data Node |     '  '| INO Node |  | DENT Node |        '
>>             ' +-----------+     '  '+----------+  +-----------+        '
>>             '  o                '  '                                o  '
>>             ' '|' ' ' ' ' ' ' ' '  ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' ' '|' '
>>                |                                                    |
>>          Hash Index Node #2                                Hash Index Node #3
> 
> When a hash covers an index node and also its children then of course
> this is really space efficient, but this also means that in order to
> read a node we always have to read all of its children. Also adding a
> new leaf node means rehashing all siblings. Is it really worth paying
> this price to save a few bytes for more hashes?
> I would rather suggest to add a hash per child to each index node.

What you propose is a simple tradeoff between the required on-flash storage
space and the amount of read operations needed to verify the index node integrity.

To get some numbers here: In case of SHA256 and a fanout of 8, we'd store an
additional 224 bytes per index node and safe 7 index node reads per tree level.

I hope it is clear that having a single hash does _not_ mean we have to read
the whole tree! :)

Considering that UBIFS is mostly used in embedded where storage space is rather
precious, we opted for storing a single hash. But sure, storing a hash per
branch/child  is a possibility and we have to discuss if that is acceptable
in most use cases.

As for adding new leaf nodes: If we add a leaf node, we'll always have to recompute
the hash of every index node on the path to the root node. Even with a hash per
branch/child. In case of TNC split/merge operations likely even more.
But I don't get why you think that would mean we have to recompute the hash
on all siblings of that nodes? Or do you simply mean that we have to re-read all
those siblings to compute the hash on the parent?

Thanks,
David