[PATCH v4 6/8] fsverity: improve performance by using multibuffer hashing

Wed Jun 5 11:58:13 PDT 2024

On Wed, Jun 05, 2024 at 05:19:01PM +0800, Herbert Xu wrote:
> On Tue, Jun 04, 2024 at 11:42:20AM -0700, Eric Biggers wrote:
> >
> > This doesn't make any sense, though.  First, the requests need to be enqueued
> > for the task, but crypto_ahash_finup() would only have the ability to enqueue it
> > in a queue associated with the tfm, which is shared by many tasks.  So it can't
> 
> OK I screwed up that one.  But that's not hard to fix.  We could
> simply add request chaining:
> 
> 	ahash_request_chain(req1, req2);
> 	ahash_request_chain(req2, req3);
> 	...
> 	ahash_request_chain(reqn1, reqn);
> 
> 	err = crypto_ahash_finup(req1);

So after several complete changes, your proposal is getting a bit closer to
mine, but it still uses a very clumsy API based around ahash that would be
much harder to use and implement correctly.  It also says nothing about what the
API would look like on the shash side, which would be needed anyway because
ahash is almost always just a pointless wrapper for shash.  Is there a different
API that you are asking for on the shash side?  If so, what?

> > actually work unless the tfm maintained a separate queue for each task, which
> > would be really complex.  Second, it adds a memory allocation per block which is
> > very undesirable.  You claim that it's needed anyway, but actually it's not;
> > with my API there is only one initial hash state regardless of how high the
> > interleaving factor is.  In fact, if multiple initial states were allowed,
> 
> Sure you don't need it for two interleaved requests.  But as you
> scale up to 16 and beyond, surely at some point you're going to
> want to move the hash states off the stack.

To reiterate, with my proposal there is only one state in memory.  It's a simple
API that can't be misused by providing inconsistent properties in the requests.
Yes, separate states would be needed if we were to support arbitrary updates,
but why add all that complexity before it's actually needed?
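
As a rough illustration only, here is a toy userspace sketch of the "one state
in memory" idea.  FNV-1a stands in for the real hash, and the names toy_state,
toy_init, toy_update and toy_finup2x are made up for this example; they are
not the functions from the patch series.  Both messages continue from the same
shared state and have the same length, so there is nothing per-request that
could be inconsistent:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FNV_OFFSET 0xcbf29ce484222325ULL	/* FNV-1a 64-bit offset basis */
#define TOY_FNV_PRIME  0x100000001b3ULL		/* FNV-1a 64-bit prime */

/* Toy stand-in for a hash state; NOT the kernel's struct shash_desc. */
struct toy_state {
	uint64_t h;
};

static void toy_init(struct toy_state *st)
{
	st->h = TOY_FNV_OFFSET;
}

/* Absorb len bytes into the state (used here for the common prefix, e.g. a salt). */
static void toy_update(struct toy_state *st, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		st->h ^= data[i];
		st->h *= TOY_FNV_PRIME;
	}
}

/*
 * Hypothetical two-way "finup": finish two equal-length messages that both
 * start from the single shared state *st.  The two lanes advance in lockstep,
 * which is the property an interleaved implementation relies on.  The shared
 * state itself is only read, never modified.
 */
static void toy_finup2x(const struct toy_state *st,
			const uint8_t *data1, const uint8_t *data2,
			size_t len, uint64_t *out1, uint64_t *out2)
{
	uint64_t h1 = st->h;
	uint64_t h2 = st->h;

	for (size_t i = 0; i < len; i++) {
		h1 = (h1 ^ data1[i]) * TOY_FNV_PRIME;
		h2 = (h2 ^ data2[i]) * TOY_FNV_PRIME;
	}
	*out1 = h1;
	*out2 = h2;
}
```

A caller would do toy_init(), toy_update() with the salt once, and then one
toy_finup2x() call per pair of data blocks.  The results match hashing each
block separately from a copy of the salted state.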

Also, "16 and beyond" is highly unlikely to be useful for kernel use cases.

> > multibuffer hashing would become much more complex because the underlying
> > algorithm would need to validate that these different states are synced up.  My
> > proposal is much simpler and avoids all this unnecessary overhead.
> 
> We could simply state that these chained requests must be on block
> boundaries, similar to how we handle block ciphers.  This is not a
> big deal.

... which would make it useless for most dm-verity users, as dm-verity uses a
32-byte salt with SHA-256 (which has a 64-byte block size) by default.
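
To spell out the arithmetic, here is a minimal sketch, assuming the salt is fed
to the hash ahead of the data block.  The helper name is made up for this
example.  With a 32-byte salt the data begins at offset 32 within the first
64-byte SHA-256 block, so the chained requests would not start on a block
boundary:

```c
#include <assert.h>
#include <stddef.h>

#define SHA256_BLOCK_SIZE 64	/* SHA-256 input block size in bytes */

/*
 * Hypothetical helper: returns nonzero if data hashed right after a salt of
 * salt_size bytes would start on a SHA-256 block boundary.
 */
static int data_starts_on_block_boundary(size_t salt_size)
{
	return salt_size % SHA256_BLOCK_SIZE == 0;
}
```

data_starts_on_block_boundary(32) is 0, whereas a 64-byte salt (or no salt at
all) would satisfy the proposed restriction.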

> 
> > Really the only reason to even consider ahash at all would be to try to support
> > software hashing and off-CPU hardware accelerators using the "same" code.
> > However, your proposal would not achieve that either, as it would not use the
> > async callback.  Note, as far as I know no one actually cares about off-CPU
> > hardware accelerator support in fsverity anyway...
> 
> The other thing is that verity doesn't benefit from shash at all.
> It appears to be doing kmap's on every single request.

The diff from switching fsverity from ahash to shash clearly demonstrates
otherwise.  Yes, fsverity has to map the pages to pass into shash, but that's a
very minor thing compared to all the complexity of ahash that was saved.  And
fsverity already had to map most of the pages anyway to access them.

- Eric