[PATCH v3 6/8] fsverity: improve performance by using multibuffer hashing

Thu May 30 23:52:58 PDT 2024

On Thu, May 30, 2024 at 11:13:48PM -0700, Eric Biggers wrote:
> On Fri, May 31, 2024 at 12:50:20PM +0800, Herbert Xu wrote:
> > Eric Biggers <ebiggers at kernel.org> wrote:
> > >
> > > +               if (multibuffer) {
> > > +                       if (ctx->pending_data) {
> > > +                               /* Hash and verify two data blocks. */
> > > +                               err = fsverity_hash_2_blocks(params,
> > > +                                                            inode,
> > > +                                                            ctx->pending_data,
> > > +                                                            data,
> > > +                                                            ctx->hash1,
> > > +                                                            ctx->hash2);
> > > +                               kunmap_local(data);
> > > +                               kunmap_local(ctx->pending_data);
> > > +                               ctx->pending_data = NULL;
> > > +                               if (err != 0 ||
> > > +                                   !verify_data_block(inode, vi, ctx->hash1,
> > > +                                                      ctx->pending_pos,
> > > +                                                      ctx->max_ra_pages) ||
> > > +                                   !verify_data_block(inode, vi, ctx->hash2,
> > > +                                                      pos, ctx->max_ra_pages))
> > > +                                       return false;
> > > +                       } else {
> > > +                               /* Wait and see if there's another block. */
> > > +                               ctx->pending_data = data;
> > > +                               ctx->pending_pos = pos;
> > > +                       }
> > > +               } else {
> > > +                       /* Hash and verify one data block. */
> > > +                       err = fsverity_hash_block(params, inode, data,
> > > +                                                 ctx->hash1);
> > > +                       kunmap_local(data);
> > > +                       if (err != 0 ||
> > > +                           !verify_data_block(inode, vi, ctx->hash1,
> > > +                                              pos, ctx->max_ra_pages))
> > > +                               return false;
> > > +               }
> > > +               pos += block_size;
> > 
> > I think this complexity is gross.  Look at how we did GSO in
> > networking.  There should be a unified code-path for aggregated
> > data and simple data, not an aggregated path versus a simple path.
> > 
> > I think ultimately it stems from the fact that this code went from
> > ahash to shash.  What were the issues back then? If it's just vmalloc
> > we should fix ahash to support that, rather than making users of the
> > Crypto API go through contortions like this.
> 
> It can't be asynchronous, period.  As I've explained, that would be far too
> complex, and it would also defeat the purpose because it would make performance
> worse.  Messages *must* be queued up and hashed in the caller's context.
> 
> What could make sense would be some helper functions and an associated struct
> for queueing up messages for a particular crypto_shash, up to its mb_max_msgs
> value, and then flushing them and retrieving the digests.  These would be
> provided by the crypto API.
> 
> I think this would address your concern, in that the users (fsverity and
> dm-verity) would have a unified code path for multiple vs. single blocks.
> 
> I didn't think it would be worthwhile to go there yet, given that fsverity and
> dm-verity just want 2x or 1x, and it ends up being simpler and more efficient to
> handle those cases directly.  But we could go with the more general queueing
> helper functions instead if you feel they should be included from the start.
> 

Looking at it again a bit more closely, both fsverity and dm-verity have
per-block information that they need to keep track of in the queue in addition
to the data buffers and hashes: the block number, and in dm-verity's case also a
bvec_iter pointing to that block.

So I think it really does make sense to have them both handle the queueing
themselves, and not have it split between them and some crypto API helper
functions (i.e. two queues that mirror each other).

It would be possible, though, to organize the code in dm-verity and fsverity to
represent the queue as an array and operate on it as such.  That would also
address your concern about the two code paths.  Again, this would end up being
a bit less efficient than my current code, which is specialized for the 1x and
2x cases (all that's actually needed for now), but it would work, I think.
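
To make the array-based idea concrete, here is a rough userspace sketch, not
kernel code and not what the patch does.  Every name in it (pending_block,
verify_queue, queue_add, queue_flush, verify_blocks, toy_hash) is hypothetical,
the digest is a toy FNV-1a stand-in for the real hash, and the "expected" array
stands in for the Merkle tree lookup.  The point is only that each queue entry
carries its own per-block info (here just the position) next to the data
pointer and digest, and that max_pending = 1 and max_pending = 2 go through the
same code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING_BLOCKS 2	/* assumption: at most 2x interleaving */

/* Hypothetical queue entry: data, digest, and per-block info together. */
struct pending_block {
	const uint8_t *data;	/* block contents */
	uint64_t pos;		/* block position, needed to verify it */
	uint32_t hash;		/* toy digest, filled in at flush time */
};

struct verify_queue {
	struct pending_block blocks[MAX_PENDING_BLOCKS];
	size_t num_pending;
	size_t max_pending;	/* 1 = single-buffer, 2 = multibuffer */
	size_t block_size;
	const uint32_t *expected; /* stand-in for the Merkle tree lookup */
};

/* Toy stand-in for the real hash function (32-bit FNV-1a). */
static uint32_t toy_hash(const uint8_t *data, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash everything queued, then verify each entry using its own pos. */
static bool queue_flush(struct verify_queue *q)
{
	bool ok = true;

	/* A multibuffer hash would handle all entries in one call here. */
	for (size_t i = 0; i < q->num_pending; i++)
		q->blocks[i].hash = toy_hash(q->blocks[i].data, q->block_size);

	for (size_t i = 0; i < q->num_pending; i++) {
		const struct pending_block *b = &q->blocks[i];

		if (b->hash != q->expected[b->pos / q->block_size])
			ok = false;
	}
	q->num_pending = 0;
	return ok;
}

/* Queue one block; flush automatically once the queue is full. */
static bool queue_add(struct verify_queue *q, const uint8_t *data, uint64_t pos)
{
	assert(q->num_pending < q->max_pending);
	q->blocks[q->num_pending].data = data;
	q->blocks[q->num_pending].pos = pos;
	q->num_pending++;
	if (q->num_pending == q->max_pending)
		return queue_flush(q);
	return true;
}

/* Verify num_blocks contiguous blocks; same path for 1x and 2x. */
static bool verify_blocks(const uint8_t *buf, size_t num_blocks,
			  size_t block_size, const uint32_t *expected,
			  size_t max_pending)
{
	struct verify_queue q = {
		.max_pending = max_pending,
		.block_size = block_size,
		.expected = expected,
	};

	assert(max_pending >= 1 && max_pending <= MAX_PENDING_BLOCKS);
	for (size_t i = 0; i < num_blocks; i++) {
		if (!queue_add(&q, buf + i * block_size, i * block_size))
			return false;
	}
	return queue_flush(&q);	/* handle a trailing partial batch */
}
```

In this sketch the caller only ever calls queue_add() per block and
queue_flush() at the end, so an odd trailing block is handled by the final
flush rather than by a separate single-block branch.  The cost is the extra
loop and indirection compared with code written specifically for 1x and 2x.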

- Eric