RFE: Support for client-side content encryption in AFS
Eric Biggers
ebiggers at kernel.org
Tue Mar 4 18:23:53 PST 2025
On Fri, Feb 28, 2025 at 11:42:34AM +0000, David Howells wrote:
> Hi,
>
> I would like to build support for file content encryption in the kAFS
> filesystem driver in the Linux kernel - but this needs standardising so that
> other AFS filesystems can make use of it also.
>
> Note that by "content encryption", I mean that only the permitted clients have
> a key to the content. The server does not. Further, filenames may also be
> encrypted.
>
> For the kAFS filesystem, content encryption would be provided by the netfs
> library. The intention is that netfslib will provide such service to any
> filesystem that uses it (afs, 9p, cifs and, hopefully soon, ceph) using Linux
> fscrypt where possible (but not mandatory). netfslib would then store the
> encrypted content in the local cache also and only decrypt it when it's
> brought into memory.
>
> Now, the way I would envision this working is:
>
> (1) Each file is divided into units of 4KiB, each of which is encrypted
> separately with its own block key. The block key is derived from the
> file key and the block offset.
>
> (2) Unfortunately, AFS does not have anywhere to store additional information
> for a file, such as xattrs, but the last block must be rounded out to at
> least the crypto block size and maybe the unit size - and we need to
> stash the real file size somewhere. There are a number of ways this
> could be dealt with:
>
> (a) Store this extra metadata in a separate file. This has a potential
> integrity issue if we fail to update that due to EDQUOT/ENOSPC,
> network loss, etc.
>
> (b) Round up the data part of the file to 4KiB and tack on a trailer at
> the end of file that has the real EOF in it. This has the advantages
> that the trailer and the last block can be updated in a single
> StoreData RPC and that the real EOF can be encrypted, but the
> disadvantage that we can't return accurate info with stat() unless we
> can read (and decrypt) the trailer - and we have to do that in
> stat().
>
> (c) Stick a fixed-len trailer at the real EOF and just encrypt over part
> of that. Again, this can be updated in a single StoreData RPC and
> the real EOF can be calculated by simple subtraction. The trailer
> only need be one crypto block (say 16 bytes) in size, not the full
> 4K.
>
> (d) Find a hole somewhere in the protocol and the on-server-disk metadata
> to store a number in the range 0-4095 that is backed up and
> transferred during a volume release. I suspect this is infeasible.
>
> (e) Provide xattr support. Probably also infeasible - though it might
> help with other things such as stacked filesystem support.
>
> (3) Mark a whole volume as being content-encrypted. That is that content
> encryption is only available on a whole-volume basis unless we can find a
> way to mark individual vnodes as being encrypted - but this has the same
> issues as storing the real EOF length.
>
> This could be done in a number of ways:
>
> (a) A volume flag, passed to the client through the VLDB and the volume
> server. The flag would need to be passed on to clone volumes and
> would need to be set at volume creation time or shortly thereafter.
>
> This might need a new RPC, say VOLSER.CreateEncryptedVolume, as
> VOLSER.CreateVolume doesn't seem to offer a way to indicate this, but
> maybe VOLSER.SetFlags would suffice: you turn it on and everything is
> suddenly encrypted.
>
> (b) Storing a magic file in the root directory of the volume
> (".afs_encrypted" say) that the client can look for. This file could
> contain info about the algorithms used and the information about key
> needed to decrypt it.
>
> (4) Encrypt filenames in an encrypted directory. Whilst we could just
> directly pass encrypted filenames in the protocol as the names are XDR
> strings with a length count, they can't be stored in the standard AFS
> directory format as they may include NUL and '/'. I can see two
> possibilities here:
>
> (a) base64 encode the encrypted filenames (using a modified base64 to
> exclude '/'). This has two disadvantages: it reduces the maximum
> name length to 3/4 and makes all names longer, reducing the capacity
> of the directory.
>
> (b) Use the key to generate a series of numbers and then use each number
> to map a character of the filename, being careful to break the range
> around 0 and 47 so that we can map backwards. This may result in
> less secure filename encryption than (a) and is trickier to do.
>
> (5) Derive file keys by combining a per-volume key with the vnode ID and the
> uniquifier. Marking files with the 'name' of a specific key could be
> possible, but again this requires somewhere to store these as discussed
> in (2).
>
> Possibly 'file keys' could be skipped, deriving each block key from:
>
> RW vol ID || vnode ID || uniquifier || block pos
>
> The cell name cannot be included due to aliasing unless the canonical
> cell name can be queried.
>
> (6) Provide a conditional FS.StoreData RPC that takes a Data Version number
> as an additional parameter and fails if that doesn't match the current
> DV. The issue is that even if just a byte is changed, an entire crypto
> unit must be written and truncation may also have to reencrypt the tail.
>
> (And by "fail", I'd prefer if it returned the updated stats rather than
> simply aborting - but I understand that we really want to close off the
> data transmission).
>
> (7) Though it's not strictly required for this, similar to (6), a conditional
> FS.FetchData could be useful as well for speculatively reading from a RO
> clone of a RW volume.
>
> Again, rather than failing with an abort, I'd prefer this to return no
> data and just the updated stats. The client should then check the DV in
> the updated stats.
>
> The simplest way to do this need not involve any changes on the server, though
> having a conditional store would make it safer.
>
I haven't had a chance to look at this in detail, but a couple things:

First, CephFS already supports fscrypt. Have you looked at how it works and
solves some of these issues?

Second, per-block keys would be really inefficient and are unnecessary. The way
that fscrypt works is that the keys are (usually) per-file, and within each file
each block has a different IV (initialization vector). That is sufficient to
make each block be encrypted differently.
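
To make the per-file key / per-block IV split concrete, here is a rough sketch
in Python. It is only an illustration, not fscrypt's actual code: the function
names, the "example-file-key" context string and the 8-byte field widths are
made up for this example, and using the vnode ID and uniquifier as the key
derivation context is just borrowed from point (5) above. It stops short of the
cipher itself (e.g. AES-256-XTS), since that is not in the Python standard
library.

```python
import hashlib
import hmac

UNIT_SIZE = 4096   # crypto unit size suggested in the proposal
IV_SIZE = 16       # one AES block


def hkdf_sha512(key: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA512 and an all-zero salt."""
    salt = bytes(hashlib.sha512().digest_size)
    prk = hmac.new(salt, key, hashlib.sha512).digest()
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_file_key(volume_key: bytes, vnode_id: int, uniquifier: int) -> bytes:
    """Derive one key per file (hypothetical context layout, done once per file)."""
    info = (b"example-file-key"
            + vnode_id.to_bytes(8, "little")
            + uniquifier.to_bytes(8, "little"))
    return hkdf_sha512(volume_key, info, 64)


def block_iv(file_offset: int) -> bytes:
    """IV for the crypto unit containing file_offset: the unit index, little-endian."""
    index = file_offset // UNIT_SIZE
    return index.to_bytes(8, "little") + bytes(IV_SIZE - 8)


if __name__ == "__main__":
    volume_key = bytes(range(32))            # placeholder key material
    key = derive_file_key(volume_key, vnode_id=42, uniquifier=7)
    # The same key is used for every unit; only the cheap IV changes.
    for offset in (0, 4096, 8192):
        print(offset, key[:8].hex(), block_iv(offset).hex())
```

The point is that the expensive step (the key derivation) happens once per
file, while the per-block work is just encoding an integer.
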
- Eric