[PATCH 22/23] Add file data and debugfs read/write commands
Zach Brown
zab at zabbo.net
Wed Apr 9 18:18:37 PDT 2025
On Fri, Apr 04, 2025 at 08:45:38PM +0200, Valerie Aurora wrote:
> Implement ngnfs_data_read() and ngnfs_data_write() and corresponding
> debugfs commands.
> +/*
> + * Return the offset past the beginning of a block.
> + */
> +static u64 offset_in_blk(u64 offset)
> +{
> + return offset & (NGNFS_BLOCK_SIZE - 1ULL);
This is fine, but I also wouldn't be opposed to adding a _MASK variant
of the existing _SHIFT and _SIZE block size macros. That triad is very
familiar (PAGE_* in the kernel).
> +/*
> + * Calculate the index of the block reference for this logical block
> + * within an indirect block at this level (1 = pointers to data blocks).
> + */
> +static u32 calc_ref_ind(u64 dblk, int level)
> +{
> + u32 ind;
> + int i;
> +
> + BUG_ON(level < 1);
> +
> + for (i = 1; i <= level; i++)
> + dblk = div_u64_rem(dblk, NGNFS_DATA_REFS_PER_BLOCK, &ind);
> +
> + return ind;
We have a power of two number of block references per indirect block.
Let's verify that with some BUILD_BUG_ON compile-time assertions and
then implement these with shifts and masks.
> +/*
> + * Calculate the height of the tree (level of the root pointer) needed
> + * to index the logical block dblk in this file.
> + */
> +static u8 height_from_dblk(u64 dblk)
> +{
> + u64 total = NGNFS_DATA_REFS_PER_BLOCK;
> + u8 height = 2;
> +
> + if (dblk == 0)
> + return 1;
> +
> + while (dblk >= total) {
> + height++;
> + total *= NGNFS_DATA_REFS_PER_BLOCK;
> + }
> +
> + return height;
> +}
With a power of two fanout this'd become a function of ffs().
> +/*
> + * Get the block at this logical file data offset and return it in
> + * da->data. If it doesn't exist, allocate it and all indirect blocks on
> + * its path if necessary. This is only called after checking that a read
> + * is from a valid range of the file, so it should always allocate
> + * missing blocks to zero-fill holes.
> + */
> +static int get_data_block(struct ngnfs_fs_info *nfi, struct ngnfs_transaction *txn,
> + struct data_path_args *da)
The functions that fill/grow the tree could all be implemented in this
one function. The core loop is to get a block at each level. If the
block isn't found, and we're not inserting, then we hit sparse. If it's
not found, and we're inserting, then we allocate and link in. We either
link to the root or the previous parent. That little bit of control
logic gets rid of the duplication in all these helper functions.
And we get get rid of the data_path_args struct.
> +static int read_write_range(struct ngnfs_fs_info *nfi, struct ngnfs_transaction *txn,
> + struct data_path_args *da, size_t *bytes)
> +{
> + size_t start, bytes_done, bytes_todo;
> + int ret;
> +
> + if (!da->write)
> + trim_read_range(da);
> +
> + bytes_done = 0;
> + start = offset_in_blk(da->offset);
> +
> + while (bytes_done < da->len) {
> + ret = get_data_block(nfi, txn, da);
> + if (ret < 0)
> + goto out;
A full descent per block is too expensive. This gets us to what
get_data_block should return. Rather than each data block, it should
return the parent block so read_write_range can iterate over the
references to the data blocks in its offset region.
'course, there isn't a parent block when there's a single data block. I
could see working in terms of arrays of block refs (with a single entry
array for the root :)). But I could also see making it a lot easier and
special casing the trivial single block case and let the rest of the
fiddly indirection code handle cases where (off+len) > block_size.
> +static ssize_t read_write_data(struct ngnfs_fs_info *nfi, struct ngnfs_inode_ino_gen *ig,
> + u64 offset, void *buf, size_t len, int write)
> +{
> + struct {
> + struct data_path_args da;
> + struct ngnfs_transaction txn;
> + } *op;
> + size_t bytes;
> + int ret;
> +
> + op = kmalloc(sizeof(*op), GFP_NOFS);
> + if (!op)
> + return -ENOMEM;
> +
> + bytes = 0;
> + init_data_path_args(&op->da, offset, buf, len, write);
> + ngnfs_txn_init(&op->txn);
> +
> + /* XXX break up into smaller transactions */
> + do {
> + ret = ngnfs_inode_get(nfi, &op->txn, op->da.nbf, ig, &op->da.ino) ?:
> + read_write_range(nfi, &op->txn, &op->da, &bytes);
> +
> + } while (ngnfs_txn_retry(nfi, &op->txn, &ret));
> +
> + ngnfs_txn_teardown(nfi, &op->txn);
> +
> + if (ret == 0)
> + ret = bytes;
> +
> + return ret;
> +}
I agree with that XXX! :) This can't pin all the blocks for the len
argument. It needs to be broken up into chunks. This highest level can
choose some chunk length that limits the number of blocks referenced by
each transaction. Maybe something easy like 16 or 32 blocks?
This'd increment off/len as each transaction returns successful bytes,
handing each region to read_write_range.
> +/*
> + * Indirect blocks are a simple array of block refs.
> + */
> +#define NGNFS_DATA_REFS_PER_BLOCK (NGNFS_BLOCK_SIZE/sizeof(struct ngnfs_block_ref))
Whitespace for my poor old eyes, plz: (a / b)
- z
More information about the ngnfs-devel
mailing list